December 30, 2017

What I learned as a Data Engineer in 2017


Another year coming to a close, another chance to reflect on my data engineering journey. This year, I had a challenge which was entirely new to me: make myself replaceable by building a team to do what I do. It pushed me to learn how to plan/prioritize work and mentor engineers. I also learned a lot about the “ideal” structure of a data engineering team (I plan to write on this later).

But there are many other ways I grew, and I figure I should pass on that knowledge to help other fledgling data engineers/teams.

Code reviews are important

How a team approaches code review says a lot about its priorities as a group. For example, it can be easy to make reviews only about running a linter and unit tests - but these are practices that can (and should) be automated with CI/CD. Alternatively, reviews might get weighed down by the particulars of a programming language. Holding each other to high standards is important, but this kind of review doesn't contribute much to the growth of the team. In fact, it can quickly exacerbate frustrations with project momentum.
Instead, I believe the following questions are most beneficial to a data engineering team:

How does this impact business process or measurement? Are we putting any of our solutions at risk?
Are there design patterns we should standardize/automate?
How are we tackling common scale issues (e.g., throughput and utilization)?

What about reviewing details of syntax and convention? I believe those should be pushed to smaller, one-off meetings. Most of my DEs go over these details in short one-on-one meetings.

Keep in mind that this approach seems to work for a distributed DE team of five people. Larger (or smaller) teams may need to modify their approach.
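
To make the point about automating the linter and unit tests concrete, here is a minimal sketch of a check runner that a CI job could call, so that human reviewers never have to. The tool choices (flake8 and pytest) and the names `run_checks`, `DEFAULT_CHECKS`, and `main` are my own assumptions for illustration, not what any particular team uses.

```python
import subprocess
import sys
from typing import Dict, List

# Assumed tooling: flake8 for linting, pytest for unit tests.
# Swap in whatever your team actually runs.
DEFAULT_CHECKS: Dict[str, List[str]] = {
    "lint": [sys.executable, "-m", "flake8", "."],
    "unit tests": [sys.executable, "-m", "pytest", "-q"],
}


def run_checks(checks: Dict[str, List[str]]) -> List[str]:
    """Run every command and return the names of the checks that failed."""
    failed = []
    for name, command in checks.items():
        # A non-zero exit code is treated as a failed check.
        result = subprocess.run(command)
        if result.returncode != 0:
            failed.append(name)
    return failed


def main() -> int:
    """Return a process exit code: 0 if every check passed, 1 otherwise."""
    failed = run_checks(DEFAULT_CHECKS)
    if failed:
        print("Failed checks: " + ", ".join(failed))
    return 1 if failed else 0
```

A CI step would end with something like `sys.exit(main())`, so a failing linter or test suite blocks the merge before a reviewer ever opens the change. That leaves the review itself free for the questions above.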

CI/CD can be your greatest teaching tool

CI/CD pipelines are an industry norm for many reasons. They automate testing and deployment so that dev teams can deploy updates faster.
However, I need to make an honest and open confession: at the beginning of this year, I thought they were a waste of time. And at that time, they were - using OpenShift V2, deploying new code was as simple as running a git push to the app server. Two things happened which changed my views.

1. Migration to OpenShift V3
The internally supported PaaS was being upgraded, which was a major shift: going from a simple server-based deployment to Kubernetes. While there were immediate benefits (scalability and flexibility), complexity also increased sharply. Now rolling out new code meant a git push, running a new Docker image build, and ensuring the correct image was being used. Then there were the OpenShift config objects themselves - these were also complex to manage, and it quickly became clear that the whole process needed to be automated if we were going to fully benefit from OpenShift.
2. Teaching
Building off #1, during this migration I was also building a brand new data engineering team. The people I brought on had amazing skill sets, but most were not immediately familiar with cloud-based app deployment. They were, instead, amazing Python and SQL engineers, and that's where I needed their initial focus. CI/CD then became a teaching tool: those who were familiar with Kubernetes could build CI/CD for new projects, and the rest could focus on feature development while learning Kubernetes/OpenShift/CI/CD at their own pace. Essentially, our dev onboarding/training philosophy was “shoot first, ask questions later.”
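
The rollout steps described in #1 (a git push, an image build, and a check that the right image is in use) can be sketched roughly as below. This is only an illustration under my own assumptions: the `Release` class, the helper names, the registry path, and the convention of tagging images with a short commit hash are all hypothetical, and the commands are built as lists rather than executed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Release:
    """One deployable version of an app (hypothetical structure)."""
    app: str
    registry: str
    commit: str  # full git commit hash

    @property
    def image(self) -> str:
        # Assumed convention: tag each image with the short commit hash,
        # so every image can be traced back to the code it was built from.
        return f"{self.registry}/{self.app}:{self.commit[:7]}"


def deploy_steps(release: Release) -> List[List[str]]:
    """List the commands a pipeline would run, in order."""
    return [
        ["git", "push"],
        ["docker", "build", "-t", release.image, "."],
        ["docker", "push", release.image],
    ]


def uses_expected_image(deployed_image: str, release: Release) -> bool:
    """Check that the running deployment points at the image just built."""
    return deployed_image == release.image
```

Even in this toy form there are three steps and one verification to get right on every release, which is the kind of repetition a pipeline should own so that new team members don't have to memorize it.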

Embrace the architecture review

Architecture became an important focus for me right off the bat this year. IT engineers within the company had always spoken of it as something to be avoided - it was seen as a tedious process which slowed results. Given that our Marketing Operations team was edging more and more into the IT space, I decided to tackle it head-on.

In order, this meant:

Taking a class offered by Red Hat’s EA group, “How to think like an architect,” which focused on what EA is about and how it is enacted within the company
Acting as a business stakeholder in an application architecture review
Taking an application architecture of our own for review, acting as the architect

Thankfully, Red Hat’s EA group is focused on teaching, not governing. (With so many internal groups building IT structures, it’s the only chance they have to scale.) For me, this meant a massive information download on not only systems design, but also storytelling in the technical sense. It forced me to learn how to distill the business problem into a story that could be told repeatably, and to present the technical solution in a way that shows the investment strategy (i.e., risk reduction and cost/benefit analysis).

If your org does not have an EA group, then open yourself up to critical feedback from peers. Seek out people who will actively point out every flaw in the big picture of your presentation. Most importantly, learn to explain your problem and solution to people who are not subject matter experts; once you can help them understand your architecture, you’ll be able to understand it yourself, and save a lot of heartache in the future.

I hope you find some helpful elements here, and hopefully avoid making mistakes that I made. Please leave a comment if you have any additions, criticisms, or questions!

Feel free to connect with me!
