Data Scientist or Data Engineer? Think Rock Star and Roadie
November 10, 2016
Featured article by Neeraj Chadha, product manager, Learning@Cisco
Data scientists are the rock stars for several Internet of Things (IoT) applications. They get most of the attention and acclaim. They extract critical intelligence from Big Data so businesses can make informed decisions on the spot.
But they don’t do their work in a vacuum. Data scientists can’t rock the IoT arena without roadies, otherwise known as data engineers. These unheralded champions ensure that Big Data keeps flowing.
Data engineers design and maintain the networks and software that keep the Big Data pipeline operating. Like the rock band’s crew, data engineers set the stage and keep it humming.
The roles of data scientist and data engineer can be confusing because there is some overlap. Data engineer and data scientist are not different titles for the same job, however.
The two jobs need different skills and experience. Some data scientists can do data engineering. Some data engineers can do data analysis and data visualization.
Large applications call for the skills of data engineers. Research is a primary focus of the data scientist.
Like roadies, data engineers are a special breed. The best have certain personality traits that help them excel: focus, mechanical aptitude, patience and persistence.
Good data engineers get down in the trenches. They want to understand how and why data pipelines work, or don’t. Data engineers need patience and persistence to set things right.
What data engineers do
Data engineers make it possible for data scientists to do modeling. They gather, store and process data so that data scientists can analyze it for insights.
Responsible for data management, data engineers handle procedures, guidelines and standards. They develop data management technologies and software engineering tools.
They design custom software and discover ways to recover from disasters. They improve data reliability, efficiency and quality. User-defined functions and analytics are part of a data engineer’s job, too.
Data scientists have a less nuts-and-bolts relationship to data. They handle analytic projects that arise from the needs of the business.
Data scientists also take on data mining architectures, modeling standards, reporting and data methodologies. They manage data mining system performance and efficiency, too.
Data engineers’ work is valuable because they build and maintain the data pipelines that send information to data scientists. They can run basic learning models if they understand algorithms.
But data scientists tackle business problems that take sophisticated machine learning algorithms. Really good data scientists adapt machine learning models to meet changing requirements of the business or agency.
Tackling Big Data’s toughest challenges
Meanwhile, the data engineers take on the tough challenges of database integration and unstructured Big Data. They must clean up that unstructured data before they pass it to anyone in the organization who needs it.
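As a rough illustration of what cleaning up unstructured data can look like, the sketch below turns free-text device messages into structured records and drops lines that carry no usable reading. The message format, the field names, and the `clean` helper are all assumptions made up for this example, not something taken from a particular product or pipeline.

```python
import re

# Hypothetical free-text device messages (illustrative only).
RAW_LINES = [
    "  2016-11-10 sensor-1 TEMP=21.5C ",
    "garbage line with no reading",
    "2016-11-10 Sensor-2 temp=19C",
]

# Assumed message layout: date, device name, then "temp=<number>C".
PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<device>\S+)\s+temp=(?P<temp>\d+(?:\.\d+)?)c",
    re.IGNORECASE,
)


def clean(lines):
    """Return one structured record per line that matches the assumed layout."""
    records = []
    for line in lines:
        match = PATTERN.search(line.strip())
        if match is None:
            continue  # skip lines without a recognizable reading
        records.append(
            {
                "date": match["date"],
                "device": match["device"].lower(),  # normalize device names
                "temperature_c": float(match["temp"]),
            }
        )
    return records


records = clean(RAW_LINES)
for record in records:
    print(record)
```

Real unstructured data is rarely this tidy, so a production pipeline would typically also log or quarantine the rejected lines rather than silently skipping them.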
Like roadies building a sound stage, data engineers set up the foundations for data scientists to work easily with data. Data engineers should know data warehousing, database design, data collection and transfer, and coding.
The tools data engineers use depend mostly on which part of the data pipeline they focus on. Data engineers at the rear of the pipeline build APIs for data consumption, integrate datasets from external sources and analyze how the data is used to support business growth.
Python is a good language for them. They use it to write code related to data ingestion. Python has client libraries for most data stores, whether NoSQL or RDBMS.
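To make the ingestion idea concrete, here is a minimal sketch that loads a few CSV rows into a relational store using only the Python standard library. It assumes an in-memory SQLite database standing in for a real RDBMS; the sample data, the `readings` table, and the `ingest` function are hypothetical names chosen for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sensor readings; in practice these might come from a file or an API.
RAW_CSV = """device_id,temperature_c
sensor-1,21.5
sensor-2,19.0
sensor-1,22.1
"""


def ingest(conn, csv_text):
    """Load CSV rows into a 'readings' table and return the number of rows inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (device_id TEXT, temperature_c REAL)"
    )
    rows = [
        (row["device_id"], float(row["temperature_c"]))
        for row in csv.DictReader(io.StringIO(csv_text))
    ]
    # Parameterized inserts keep values separate from the SQL text.
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # stand-in for a real database server
count = ingest(conn, RAW_CSV)
avg = conn.execute(
    "SELECT AVG(temperature_c) FROM readings WHERE device_id = ?", ("sensor-1",)
).fetchone()[0]
print(count, avg)
```

The same pattern (parse, validate, bulk insert) would apply to other stores, with the SQLite calls swapped for whichever client library that store provides.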
Data engineers might have to use Big Data technologies like Hadoop and Spark to suggest improvements based on how data is used.
Among the important tools for a data engineer are:
– Hadoop and related tools such as HBase, Hive, Pig, etc.
– Spark.
– NoSQL databases, e.g., Cassandra and MongoDB.
– Pentaho.
– JavaScript.
– VMware.
Looking Ahead: Demand Growing
In the United States, data engineers’ average salary is $95,526. Their salary low is $65,000 and the top reaches $121,000.
U.S. demand for these jobs should grow 15 percent by 2024. That is faster than the average for all U.S. occupations.
Some of the biggest names in business and the U.S. government are ramping up their requirements for both positions.
A 2015 survey polled 422 executives in the U.S. and Europe. The survey asked them about the digital skills most in demand in industries like financial services, healthcare, manufacturing and retail. Forty-three percent of the executives said that analytics and big data skills will be the most important digital capabilities at their companies in three years.
Demand for both data engineers and data scientists is strong now and growing. Those who invest in developing or updating their IT skills to acquire the ones needed for either job will be in a strong position to reap the career rewards.
About Neeraj Chadha
Learning@Cisco product manager Neeraj Chadha has more than 20 years of experience in the networking industry. Over that time, he has functioned as a software developer and network engineer, and in various aspects of product management. Currently, he guides the overall product strategy and evolution of Cisco courseware and certifications around wireless, collaboration, and Big Data and analytics. Neeraj’s primary areas of focus include technology trends, digital transformation, continuing education and product strategy.