This blog is going away soon! :( Check out my new site where you can read the latest and subscribe for updates!

Lessons from Site Reliability Engineering: Part 1

About 17 months ago, our marketing data infrastructure went through a significant change. Red Hat Summit 2016 was quickly approaching, and we were concerned about the volume of event data that would flood our system. With multiple steps in our data flow, both vendors and internal processes, small mistakes could quickly propagate into issues that would take months to fix. So, we took a big step forward.

Taking a page from Google’s Site Reliability Engineering practices, we decided to automate our monitoring and add alerts for potential data issues. We identified Service-Level Objectives (SLOs) for our data, set thresholds, and got to work. We implemented monitoring and alerts using Eloqua’s APIs, Prometheus...
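
A minimal Python sketch of the "SLO plus threshold" idea, for illustration only. The SLO name, the 99% target, and the record counts are hypothetical; the excerpt does not state the actual objectives, and this standalone function is not the Eloqua/Prometheus setup the post refers to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """A hypothetical service-level objective for one step of a data flow."""
    name: str
    target_ratio: float  # e.g. 0.99 means 99% of records must arrive


def evaluate_slo(slo: Slo, records_sent: int, records_received: int) -> dict:
    """Compare the observed delivery ratio against the SLO target."""
    if records_sent <= 0:
        # Nothing was sent, so there is nothing to violate.
        ratio = 1.0
    else:
        ratio = records_received / records_sent
    return {
        "slo": slo.name,
        "ratio": ratio,
        "alert": ratio < slo.target_ratio,  # True means the threshold was missed
    }


# Hypothetical numbers: 10,000 records sent, 9,850 landed downstream.
form_sync = Slo(name="form_submissions_synced", target_ratio=0.99)
result = evaluate_slo(form_sync, records_sent=10_000, records_received=9_850)
print(result)
```

In a real deployment the ratio would more likely be exported as a metric and the comparison expressed as an alerting rule, so that the threshold lives in configuration rather than in code.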

Oct 23, 2017

All Things Open 2017 - Data Washing Machine

This year marks the second time I have attended All Things Open, and it continues to be awesome! There were some amazing keynote speakers, including Sara Chipps of Jewelbots and Kelsey Hightower from Google Cloud Platform.

Today, I was honored to present on how Marketing Operations at Red Hat tackles the problem of data quality. Specifically, we dove into how we abstracted the process that many data scientists and data engineers otherwise carry out in an ad-hoc manner.

Below is the slide deck, along with a link to the GitHub repo:

https://www.slideshare.net/secret/KRDIMK9nJNikmW

https://github.com/rh-marketingops/dwm

Hope to see you at ATO 2018!

Feel free to connect with me!

www.linkedin.com/in/jeremiah-coleman-product
https://twitter.com/nerds_s
jeremiah.coleman@daasnerds.com
https://g...

Using Python to import contacts to Eloqua

Being a data engineer in marketing lends itself to a lot of fun projects, and working at Red Hat (which is, from its roots, an open-source company) offers plenty of opportunities to share those projects.

Last year, while moving our team into data automation with Python, I developed a package called pyeloqua, which makes it easier to work with the APIs of the marketing automation platform Eloqua. Several other Eloqua users have reached out with questions about the package, most recently asking for some simple examples, which I had overlooked in the project documentation.

In response, I put together a simple gist showing the import of example contact data: https://gist.github.com/colemanja91/532a269eaaa09ef9e3c2a8984bf69bb8
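
For readers who want a feel for what a contact import involves before opening the gist, here is a minimal Python sketch. It is not the gist's contents and does not use pyeloqua; it only builds the JSON body of a contact import definition as described in Eloqua's public Bulk API documentation. The function name, the import name, and the sample fields are hypothetical, and the field markup should be checked against your own instance.

```python
import json


def build_contact_import_definition(name, field_names, identifier_field):
    """Build the JSON body for a contact import definition.

    Assumption: each field is mapped with an Eloqua markup statement of the
    form {{Contact.Field(<internal name>)}}.
    """
    if identifier_field not in field_names:
        raise ValueError("identifier_field must be one of field_names")
    return {
        "name": name,
        # %-formatting keeps the doubled braces literal in the output.
        "fields": {f: "{{Contact.Field(%s)}}" % f for f in field_names},
        "identifierFieldName": identifier_field,
    }


definition = build_contact_import_definition(
    name="Example contact import",  # hypothetical name
    field_names=["C_EmailAddress", "C_FirstName"],
    identifier_field="C_EmailAddress",
)

# Rows to upload use the same keys as the definition's "fields" mapping.
contacts = [{"C_EmailAddress": "test1@example.com", "C_FirstName": "Test"}]

print(json.dumps(definition, indent=2))
```

As far as the Bulk API flow goes, the definition is created first, the rows are then sent to that import's data endpoint, and a sync is started to load them. A wrapper package mainly saves you from writing those HTTP calls by hand.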

Hopefully others working in marketing automation will find this...

Spark Summit 2017 - Day 2 Takeaways

Day 2 was a bustle of activity at Spark Summit 2017. I started the day catching up with a bit of work at Workshop Cafe in the Financial District, then made my way down to Moscone for the opening keynotes and a day of sessions.

The official beginning of Spark Summit, billed as Developer Day, had a lot of great content to digest. Most surprising were the trends, developed independently at different companies, that signal a clear direction for the immediate future of Spark and big data processing.

Compute-as-a-service is the future of data

Running a Spark cluster in production (or any other framework for data engineering) requires a lot of up-front investment, and the cost does not end there. Once usage catches on, you quickly have to deal with multi-user tenancy, data security...

Spark Summit 2017 - Day 1 Takeaways

I arrived in San Francisco for the first time Sunday evening to attend Spark Summit 2017. Spark is the hip-tool-on-the-block for data engineering and data science, so it’s really exciting to see how other companies are using it and which use cases we may be able to adapt on my team.

I’m going to try sharing some of my key takeaways from each day (although delaying 1–2 days to give the concepts some time to gel).

Day 1 was “Training Day” - I attended the session Architecting a Data Platform, given by the top-notch team at Silicon Valley Data Science.

Build a Lab and a Factory

This one stuck out because it is a philosophy we’ve already adopted. The most frequent analogy we use to describe “what we do” is that we are a data factory. Not only do we focus on automation, but we have to...

What is a “Data Engineer”?

Earlier this year, my boss worked with me and a coworker to define a new position for our team - the data engineer position. This was exciting for me because, since entering my post-college career, I’ve had an unfortunate habit of holding jobs that didn’t match their title or description. After researching how other companies describe the same position, I realized that it was closest to my current responsibilities (as well as my interests).

Last week, I had the opportunity to describe my career path and how, based on that, I interpret my current responsibilities as a data engineer. I was further able to explain what I see as the differences between “data engineer” and several of the other hot data-related careers out there (including the ever-famous “data scientist”). The end...

Running Luigi on OpenShift 3

In a previous post, I outlined how Red Hat’s Marketing Operations group is leveraging the power of OpenShift 3 and Kubernetes along with Spotify’s open source project Luigi (see here for more details: https://github.com/rh-marketingops/rh-mo-scc-luigi). This architecture has allowed us to greatly expand the power of our data processing pipelines.

Today, I’d like to share an example of running the Luigi central scheduler app on OpenShift. The following GitHub repo gives the basics for running the scheduler: https://github.com/colemanja91/os3-luigi

The central scheduler is a powerful tool for managing multiple worker nodes; at a high level, it tracks task progress in a visual way and ensures no two workers attempt to execute the exact same task.
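
To make the "no two workers run the same task" guarantee concrete, here is a toy Python sketch. It is not Luigi's implementation or API; the ToyScheduler class, its method names, and the task id string are all hypothetical, and it only illustrates the idea of a shared registry that hands each task to at most one worker while recording its status.

```python
import threading


class ToyScheduler:
    """Hypothetical stand-in for a central scheduler; not Luigi's code."""

    def __init__(self):
        self._lock = threading.Lock()
        self._status = {}  # task_id -> (status, worker_id)

    def claim(self, task_id, worker_id):
        """Return True if this worker may run the task, False otherwise."""
        with self._lock:
            if task_id in self._status:
                return False  # already running or finished elsewhere
            self._status[task_id] = ("RUNNING", worker_id)
            return True

    def mark_done(self, task_id):
        """Record that a previously claimed task has finished."""
        with self._lock:
            _, worker_id = self._status[task_id]
            self._status[task_id] = ("DONE", worker_id)

    def status(self, task_id):
        """Report the task's status; unknown tasks count as pending."""
        with self._lock:
            return self._status.get(task_id, ("PENDING", None))


scheduler = ToyScheduler()
task_id = "LoadContacts(date=2017-01-01)"  # hypothetical task identifier

print(scheduler.claim(task_id, "worker-a"))  # first claim succeeds
print(scheduler.claim(task_id, "worker-b"))  # duplicate claim is refused
scheduler.mark_done(task_id)
print(scheduler.status(task_id))
```

The real scheduler does far more (dependency tracking, retries, the web UI), but the core benefit for multiple worker nodes is the same: one shared place that knows which task is where.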

Leveraging the power of...

Jeremiah Coleman

Thoughts about product management, engineering, and storytelling; based in Raleigh, NC

