A data scientist is a professional responsible for collecting, analyzing and interpreting large amounts of data to identify ways to help a business improve operations and gain a competitive edge over rivals.
The data scientist role is an offshoot of the statistician role that includes the use of advanced analytics technologies, including machine learning and predictive modeling, to provide insights beyond statistical analysis. The demand for data science skills has grown significantly in recent years as companies look to glean useful information from the large volumes of structured, unstructured and semistructured data that a large enterprise produces and collects, collectively referred to as big data.
Data scientist roles and responsibilities
The mix of personality traits, experience and analytics skills required for the data scientist role is considered difficult to find, so the demand for qualified data scientists has exceeded supply in recent years. Data scientist topped Glassdoor.com's list of the 50 Best Jobs in America in 2016 and again in 2017, based on metrics such as job satisfaction, number of job openings and median base salary.
Basic responsibilities include gathering and analyzing data, and using various types of analytics and reporting tools to detect patterns, trends and relationships in data sets. Data scientists typically work in teams to mine big data for information that can be used to predict customer behavior and identify business risks and opportunities.
These professionals are tasked with developing statistical learning models for data analysis and must have experience using statistical tools, as well as the ability to create and assess complex predictive models.
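As a rough illustration of what creating and assessing a predictive model can look like, the sketch below fits a one-variable least-squares line and then checks its error on data held back from fitting. The data set (ad spend versus units sold), the function names and the choice of mean absolute error as the metric are all assumptions made for this example, not something the role prescribes.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def mean_absolute_error(model, xs, ys):
    """Average absolute gap between predictions and observed values."""
    slope, intercept = model
    return mean(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))


# Hypothetical data: monthly ad spend (in $1,000s) vs. units sold.
train_x = [1, 2, 3, 4, 5]
train_y = [12, 19, 31, 42, 48]
# Held-out observations the model never saw during fitting.
test_x = [6, 7]
test_y = [61, 70]

model = fit_line(train_x, train_y)
test_mae = mean_absolute_error(model, test_x, test_y)
print(f"slope={model[0]:.2f} intercept={model[1]:.2f} test MAE={test_mae:.2f}")
```

Scoring the model on held-out data rather than on the training data is the assessment step: it gives a more honest estimate of how the model would perform on new observations.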
A data scientist uses large amounts of data to develop hypotheses, make inferences and home in on customer, business and market trends. The data scientist must be able to communicate how to use analytics data to drive business decisions that may include changing course, improving a process or product, or creating new services or products. When new services or products are being created, the data scientist is involved in the development process. In the case of software, for example, the data scientist's role involves using data analytics to prescribe new features.
Data scientists also set best practices for data collection, use of analytics technology and data interpretation.
Data scientist skills
Soft skills required for data scientists include intellectual curiosity combined with skepticism and intuition, along with creativity. Interpersonal skills are also a critical part of the role, and many employers want their data scientists to be data storytellers who know how to present data insights to people at all levels of an organization. They also need leadership skills to steer data-driven decision-making processes in an organization.
The education requirements for data scientists typically include a bachelor's degree in statistics, data science, computer science or mathematics.
Hard skills required for the job include data mining, machine learning and the ability to integrate structured and unstructured data. Experience with statistical research techniques, such as modeling, clustering and segmentation, is also often necessary.
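To make clustering and segmentation concrete, here is a minimal k-means sketch in plain Python. It assumes a made-up set of customers described by two numbers (orders per year and average order value) and fixed starting centroids. In practice a data scientist would more likely use a library implementation and scale the features first.

```python
from math import dist


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (orders per year, average order value in dollars).
customers = [(2, 20), (3, 25), (2, 22), (20, 90), (22, 95), (21, 88)]
# Assumed starting centroids: one customer picked from each end of the data.
start = [customers[0], customers[3]]

centroids, segments = k_means(customers, start)
for centroid, segment in zip(centroids, segments):
    print(centroid, segment)
```

With this toy data, the occasional low-spend buyers and the frequent high-spend buyers end up in separate segments, which is the kind of grouping a business might then target differently.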
Data science requires knowledge of a number of big data platforms and tools, including Hadoop, Pig, Hive, Spark and MapReduce, and programming languages that include Structured Query Language (SQL), Python, Scala and Perl, as well as statistical computing languages such as R.
Data scientist vs. data analyst
The role of data scientist is often confused with that of data analyst, but while there is overlap in many of the skills, there are significant differences.
Though the role of a data analyst varies depending on the company, in general, these professionals collect data, process that data and perform statistical analysis using standard statistical tools and techniques. Analysts also identify patterns and correlations in data sets to find new opportunities for improvements in business processes, products or services. In some cases, data analysts also design, build and maintain big data and relational database systems. The average U.S. data analyst salary as of May 2017 was $60,476, according to Glassdoor.com.
Data scientists are responsible for those tasks and many more. These professionals are equipped to analyze big data using advanced analytics tools and are expected to have the research background to develop new algorithms for specific problems. They may also be tasked with exploring data without a specific problem to solve. In that scenario, they need to understand the data and the business well enough to formulate questions and deliver insights back to business executives with the goal of improving business operations, products, services or customer relations.
Those additional responsibilities come with a salary nearly double that of a data analyst; Glassdoor.com put the U.S. average data scientist salary at $113,436, as of May 2017.