What I Wish I Knew Before I Became A Data Engineer
As 2022 approaches, Data Engineering is primed to be one of the most important and fastest-growing fields of work. According to Forbes, data scientists, machine learning engineers, and big data engineers are three of the top trending professions on LinkedIn.
This is further backed by a survey from InterviewQuery that looked at 10,000 data science and engineering roles and found that data engineering interviews were on the rise.
This is causing people to wonder:
Should I become a data engineer?
To better help those who are thinking about taking on a career in data engineering, I wanted to reflect upon my own career.
What are things I wish I knew prior to starting my career as a data engineer?
How could I have better approached some of my earlier years?
These are questions I hope to answer.
Here are the top five things that I wish I knew before beginning my career in data engineering.
1. Don't Get Distracted by the Hype.
The first thing I wish I had known when beginning my data engineering journey was not to get distracted by the hype!
With endless software being created seemingly every day, it is essential that you understand that mastering your fundamental data engineering skills and core frameworks is more important than getting caught up in the latest and greatest tech product.
I have discussed some of these core concepts in my data engineering roadmap.
But to summarize: focus on learning skills like programming, SQL, basic data analysis, data modeling, data pipelines, and data warehousing.
Here are some of the main reasons why it is so important to focus on learning the fundamentals before using the "shiny" new software:
- The new tools are often trends and have a non-guaranteed lifespan. One of the issues with the latest software is there always seems to be that "next" shiny new tool right around the corner. If you do not fully grasp the core skills and move on to software that has a short life cycle, you can drastically set back the progress of your career.
- Mastering the fundamentals will help you understand how to utilize new software best. If you start by first mastering the fundamental data processes, it will be easier to understand how new software can help you streamline your work. Many data engineers get stuck on some of the latest tools and do not efficiently utilize them because they do not fully grasp the fundamentals.
- Mastering the fundamentals allows you to switch careers if you don't like data engineering. Although roles like software engineering are different from data engineering, there is some crossover. Both roles require programming, understanding data structures, and working with cloud components. If you really end up disliking data engineering, then you might be able to switch. Of course, this will depend on how much code you write as a DE, because if you're using low-code solutions, it might not be an easy switch.
2. Focus on Developing Maintainable Code
Learning how to create maintainable code is one of the essential steps in becoming a data engineer.
When you are new to the industry, it is tempting to build a complex system that stitches together everything from Python to Bash to whatever other coding tool is at hand. When you do that, you are essentially creating software that only you can maintain.
While this may be fine in the short term, if you decide to move on with your career or other team members have to use your software, it will either have to be rewritten or thrown out.
For the sake of the project's longevity, make sure that whatever you create is simple for other data engineers to read and interpret.
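To make "simple to read and interpret" a bit more concrete, here is a minimal sketch of what that can look like in Python. Everything in it is hypothetical and made up for illustration (the Order record, the field names, the function names). The idea is simply that each step is a small, named function with a docstring, so a new teammate can follow the pipeline from top to bottom.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """One cleaned order record (hypothetical schema for this example)."""
    order_id: int
    amount_usd: float
    status: str


def parse_order(raw: dict) -> Order:
    """Convert one raw record into a typed Order and normalize its status."""
    return Order(
        order_id=int(raw["order_id"]),
        amount_usd=float(raw["amount"]),
        status=str(raw["status"]).strip().lower(),
    )


def keep_paid(orders: list) -> list:
    """Keep only the orders whose status is 'paid'."""
    return [order for order in orders if order.status == "paid"]


def total_revenue(orders: list) -> float:
    """Sum the amounts of the given orders."""
    return sum(order.amount_usd for order in orders)


def run_pipeline(raw_records: list) -> float:
    """Parse, filter, and aggregate: each step is one readable line."""
    orders = [parse_order(raw) for raw in raw_records]
    return total_revenue(keep_paid(orders))
```

None of this is clever, and that is the point. If the definition of "paid" changes, there is one obvious place to change it.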
3. The Lie That Is Source of Truth
A source of truth in the data world means that there is one canonical copy of the data.
This means that any time changes are made, they will be tracked and documented, so everyone knows what has changed with each revision.
If you can master this skill early on in your data engineering career, it will save you a lot of headaches down the line when organizing datasets for different purposes or making sure that all stakeholders have access to an up-to-date version of the dataset.
One common mistake I see new data engineers make is not deciding how they want to manage their source of truth from day one. This often ends up costing them hours upon hours of getting organized later on because they did not decide which tool best suited their needs from the beginning.
Something important to remember about the source of truth is that it is a moving target rather than a final destination.
4. Save Your SQL
SQL queries often carry a lot of complex business logic, and that logic can be very challenging to reconstruct from memory once you have made even the slightest change.
When I began my career as a data engineer, I did not know how important it was to save my queries from the beginning. I spent hours fixing datasets because they were suddenly not functioning how I wanted them to.
Make sure that every time you create a dataset, the query behind it is saved in its current state, so if something breaks or needs to be changed later on, you do not have to recreate the entire thing from scratch.
Remembering your initial queries is not realistic. Use version control, so you do not have to remember your exact queries. There are a lot of different options in how you could approach this. You could create a repository for your analytical teams in GitHub or use tools like dbt that help you follow best practices like separate dev and prod environments as well as version control.
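As a rough sketch of the repository approach, you can keep each query in its own .sql file and have your code load it by name instead of pasting SQL into scripts. The file name, table, and helper functions below are all hypothetical, and the example uses a temporary directory and an in-memory SQLite database only so that it runs on its own. In practice the .sql files would live in a Git repository.

```python
import sqlite3
import tempfile
from pathlib import Path


def load_query(sql_dir: Path, name: str) -> str:
    """Read a saved query from <sql_dir>/<name>.sql."""
    return (sql_dir / f"{name}.sql").read_text(encoding="utf-8")


def run_saved_query(conn: sqlite3.Connection, sql_dir: Path, name: str, params=()) -> list:
    """Load a saved query by name and run it with the given parameters."""
    return conn.execute(load_query(sql_dir, name), params).fetchall()


def demo() -> list:
    """Write one query file, build a tiny table, and run the saved query."""
    with tempfile.TemporaryDirectory() as tmp:
        sql_dir = Path(tmp)  # stands in for a version-controlled sql/ folder
        (sql_dir / "daily_revenue.sql").write_text(
            "SELECT order_date, SUM(amount) AS revenue\n"
            "FROM orders\n"
            "WHERE status = ?\n"
            "GROUP BY order_date\n"
            "ORDER BY order_date\n",
            encoding="utf-8",
        )

        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE orders (order_date TEXT, amount REAL, status TEXT)"
        )
        conn.executemany(
            "INSERT INTO orders VALUES (?, ?, ?)",
            [
                ("2021-12-01", 10.0, "paid"),
                ("2021-12-01", 5.0, "paid"),
                ("2021-12-02", 7.5, "refunded"),
                ("2021-12-02", 20.0, "paid"),
            ],
        )
        rows = run_saved_query(conn, sql_dir, "daily_revenue", ("paid",))
        conn.close()
        return rows


if __name__ == "__main__":
    for order_date, revenue in demo():
        print(order_date, revenue)
```

Because the query lives in a file, every change to the business logic shows up as a diff in version control, and you can always go back to the version that produced last month's numbers.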
5. Saying Yes to Every Request Is Not The Answer
A widespread problem that many data engineers face is saying yes to every request! Ok, let's be real, this is a widespread issue for most people regardless of career choice.
But that doesn't change its importance.
Of course, you want to help as many people as possible, but the reality of the matter is that your time is valuable, and even the smallest tasks can be very time-consuming. I have had tasks as simple as adding a column turn into a 2-month project because a Python 3 migration was required before I could even add the column (don't ask).
Something helpful to do before agreeing to a task is first to analyze the level of importance that the task may have. If you decide that the task is of low importance, you may want to consider declining the task.
Also, consider how much work you already have on your plate. Even if you have a high-priority task come across your desk, you can't manifest new time.
So if you're already short on capacity, make sure the person making the request, and possibly your manager, know your current constraints.
That way you can reassess and reprioritize what needs to get done.
This will not only save you time but ensure that your team is aware of what projects are more important than others, and they can prioritize accordingly!
Conclusion
Data engineering can be a very rewarding profession, and it is increasing in demand and importance with the data-driven world that we now live in. However, there are many things that you must be aware of before starting your data engineering career!
Focusing on your core skills, developing maintainable software, understanding the source of truth, saving your SQL, and learning how to say no are some of the most important lessons I have learned over my career as a data engineer.