Jeremiah Coleman

Thoughts about product management, engineering, and storytelling; based in Raleigh, NC


Examining your Data-User Experience

Originally published in MyTake

This blog is going away soon! :( Check out my new site where you can read the latest and subscribe for updates!

Understanding your DUX

Source: Monty Python and the Holy Grail

Everyone wants to benefit from data. There are so many ways to collect and use data, so why not? In theory, a big break in an industry is developing some “magic insight” and putting it in an app for users to see.

Why is it never actually that easy?

I believe teams often ignore their Data-User Experience until it’s too late, leaving frustrated users who didn’t get the “magic insight” they wanted.

Traditional UX is a very broad-ranging activity - your (many) goals include making it easy and intuitive for the user to perform in-app actions, building a sense of how those actions help them, drawing them into a routine of opening and using your app, and creating a positive mental association with the value you...

Continue reading →


How I Grew as an Engineer in 2018

Photo by Jake Hills on Unsplash

OK, I’m a little late this year… It’s time to reflect on what happened in the last year, and the most important lessons learned.

This year was actually quite a bit different from 2017. In January, I wrapped up work with my old team and moved on to an amazing new challenge by joining the product engineering team at First! I’ve been blessed to work with some amazing engineers, product managers, and many others. First has drastically challenged me to grow in ways that I am excited to share.

Incrementalization is difficult - but magical

The biggest challenge I’ve seen with implementing Agile - all throughout my career - is properly breaking down stories from the backlog. Estimated hours turn into days, and occasionally days into weeks. Here’s what I’ve learned about story...

Continue reading →


Data is not important

When I was right out of college - all bright-eyed and optimistic - I wanted to change businesses using math. Before college, I loved the CBS show NUMB3RS, which heavily played into that view. It was a good time for that, as data science was just coming into its own as a profession. I was convinced that, given the right algorithms, we could make a significant impact on the bottom line.

A few years in the corporate world taught me that it wasn’t more algorithms that were needed. It was better access to data, in general. Silos, lack of standards, and lack of purpose (toward solving specific business problems) stood out to me as the major problems. That drove my career and passion toward data engineering.

Now, I have reached another milestone in my view of data. Data is not the driving...

Continue reading →


Why Tech Storytelling Matters

Last week, I had a chance to meet up with two former colleagues who were visiting their company HQ in Raleigh. We met at the new Food Hall downtown, had some drinks, and talked about life and work. Eventually the topic turned to engineering, technology vendors, and storytelling.

They relayed a story of enduring a recent all-day training session. The presenter had immediately launched into a series of menus, steps, and templates, jumping back and forth between multiple platforms. My friends lamented that the goal of the product, or even the goal of that day’s training, was lost due to an unclear story.

I realized that I had done the same thing that very evening. When we started talking about work, I demoed for them our wizard new mobile UI, and talked about our predictive...

Continue reading →


Managing a Databricks Spark Environment with Ansible

Bringing configuration management to Big Data

Apache Spark is an absolutely amazing tool for working with big data. It hides and optimizes all the complexity of Hadoop MapReduce, and reduces coding to (mostly) simple Scala, Python, or SQL statements.

Databricks takes it a step further by drastically reducing the DevOps complexity involved in using Spark. They do this by providing native implementations of notebooks, jobs, easy-to-use cluster configurations (optimized for Spark), scheduling, and monitoring. Sure, all of these can be done natively in AWS, but it ends up being a pain. Instead, you can just set up your jobs via the Databricks UI.

Easy, right? Yes, but there’s always a risk associated with ease. It becomes much more difficult to track versions, especially when your Spark job is part of a larger pipeline.

Thankfully, Databricks also provides an amazing API to...
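To give a rough sense of what working against that API looks like, here is a minimal Python sketch that applies a version-controlled job definition through the Jobs REST API. The workspace URL, token variables, and job.json layout are illustrative assumptions, and the full post drives this kind of change through Ansible rather than hand-rolled Python.

```python
# Minimal sketch: apply a version-controlled Databricks job definition
# via the Jobs REST API. Assumes DATABRICKS_HOST and DATABRICKS_TOKEN are set
# and that job.json lives alongside your pipeline code in source control.
import json
import os

import requests

HOST = os.environ["DATABRICKS_HOST"]    # e.g. your workspace URL
TOKEN = os.environ["DATABRICKS_TOKEN"]
HEADERS = {"Authorization": f"Bearer {TOKEN}"}


def apply_job(path="job.json"):
    """Create the job if it doesn't exist yet; otherwise overwrite its settings in place."""
    with open(path) as f:
        settings = json.load(f)

    jobs = requests.get(f"{HOST}/api/2.0/jobs/list", headers=HEADERS).json().get("jobs", [])
    match = [j for j in jobs if j["settings"]["name"] == settings["name"]]

    if match:
        # jobs/reset swaps in the new settings while keeping the existing job_id
        payload = {"job_id": match[0]["job_id"], "new_settings": settings}
        resp = requests.post(f"{HOST}/api/2.0/jobs/reset", headers=HEADERS, json=payload)
    else:
        resp = requests.post(f"{HOST}/api/2.0/jobs/create", headers=HEADERS, json=settings)

    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    print(apply_job())
```

With the job definition living in the repo, the cluster config and schedule get reviewed and versioned along with the code they run.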

Continue reading →


The Fallacy of “Configuration-less” Systems

The risk of untracked infrastructure

The idea of a cloud service provider is amazing for many reasons. On-demand server instances were just the start - in the last few years, there’s been an explosion in easy-to-use services that manage tasks which were once difficult.

To run a PostgreSQL database, you used to provision a server, then set up and run your DB. For large amounts of non-DB data, you’d configure and run HDFS on a Hadoop cluster. And to capture streaming data, you would run a Kafka cluster.

Now there are many easier options that, at the end of the day, are way more cost-effective. Spin up an AWS RDS instance for your DB, use Azure Blob Storage for big data, and set up a Google PubSub topic for streams.
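To illustrate just how low that barrier is, publishing to a Pub/Sub topic takes only a few lines. This is a minimal sketch; the project and topic names are placeholders, and the topic is assumed to already exist.

```python
# Minimal "Hello World" publish to an existing Pub/Sub topic.
# Project and topic names are placeholders for illustration.
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "my-topic")

# publish() returns a future; result() blocks until the server assigns a message ID
future = publisher.publish(topic_path, b"Hello World!")
print(f"Published message {future.result()}")
```

Notice that nothing in that snippet records how the topic itself was created or configured - which is exactly the untracked infrastructure at issue here.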

Nearly every example of these systems has a “Hello World!”...

Continue reading →


What I learned as a Data Engineer in 2017

Another year coming to a close, another chance to reflect on my data engineering journey. This year, I had a challenge which was entirely new to me: make myself replaceable by building a team to do what I do. It pushed me to learn how to plan/prioritize work and mentor engineers. I also learned a lot about the “ideal” structure of a data engineering team (I plan to write on this later).

But there are many other ways I grew, and I figure I should pass on that knowledge to help other fledgling data engineers/teams.

Code reviews are important

How a team approaches code review says a lot about their priorities as a group. For example, it can be easy to make reviews only about running a linter and unit tests - but these are practices that can (and should) be automated with...

Continue reading →


How-To: Loading Eloqua Activity Data into Google BigQuery

Integrating Marketing Automation and Google Cloud Platform

Previously, we discussed various approaches to data pipelines and their integration with Marketing Automation Platforms (MAPs) (see: Scaling Marketing Data Pipelines). The final state outlined was a Big Data platform as the center for data management, with the MAP becoming a source for email activity data and a sink for contact/segmentation data, as shown below:

Target big data architecture with marketing automation

This also sets the stage for future enhancements, such as advanced data science analyses, including machine learning.

In this article, we’ll walk through an example of importing email send data from Eloqua into Google BigQuery (part of Google Cloud Platform). We’ll use an Apache Beam pipeline deployed in Google Cloud Dataflow to make this happen...
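To give a feel for the shape of that pipeline, here is a stripped-down Python sketch. It assumes the Eloqua activity records have already been exported as newline-delimited JSON to Cloud Storage; the bucket, table, schema, and field names are placeholders, and the full post covers the Eloqua extraction and the Dataflow deployment details.

```python
# Skeleton Beam pipeline: newline-delimited JSON of Eloqua email send activity
# (already landed in Cloud Storage) -> BigQuery. Bucket, table, and field names
# below are placeholders for illustration.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    # Runner, project, temp_location, etc. come from the command line,
    # e.g. --runner=DataflowRunner --project=my-project
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadExport" >> beam.io.ReadFromText("gs://my-bucket/eloqua/email_sends/*.json")
            | "ParseJson" >> beam.Map(json.loads)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "my-project:marketing.eloqua_email_sends",
                schema="contact_id:STRING,email_address:STRING,activity_date:TIMESTAMP",
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run()
```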

Continue reading →


Scaling Marketing Data Pipelines

Lessons from Site Reliability Engineering: Part 2

If you’ve ever felt overwhelmed by the choices of vendors and tools available for data-driven marketing automation, you are far from alone. The need for operational, time-sensitive data pipelines is skyrocketing, largely driven by the fast response times that customers now expect.

When you break the problems/solutions of data processing down into its components, there are three phases of evolution that data pipelines go through: Entry-Level, basic functionality provided by many platforms; Mid-Level, usually custom-scripted and implemented in-house; and Big Data, which turns marketing automation standards on their heads in order to innovate.

Operational / Time-Sensitive vs. Reporting

Before we dive in, it’s...

Continue reading →


Reading for Growing Data Engineers - 2017

Books which have shaped my path in the last six months

It’s safe to say that I invest way too much in books. When I was in college, I got my hands on as many mathematics books as I could (after going through what was available - and understandable at my level - in the library). Thankfully that has carried over into my data science and data engineering career.

I tend to buy books about any technology I have an interest in learning. While there is ample material online for learning Hadoop, Spark, Kubernetes, and others, it’s very easy to gloss over finer details in the interest of putting the tech into practice. Books tend to cover more in-depth knowledge, such as best practices for production, design nuances for long-term maintainability and scalability, and security (let’s...

Continue reading →