Anyone who works in the data science community interacts with data scientists, who hold an increasingly popular job. As it turns out, not all data scientists are alike. To explore the differences, researchers at UCLA and Microsoft conducted a study. Their research found nine distinct types of data scientists, each with its own working style.
Miryung Kim, an associate professor in UCLA's computer science department, gave a talk at the Strata Data Conference presenting her research into the software development and data science communities. For that research, she surveyed 793 professional data scientists at Microsoft. Kim analyzed how they spent their professional and private time, including what tools they use and the obstacles they come across in their jobs.
Kim and a team of researchers took the survey responses and analyzed them with a clustering algorithm. The results were published in a 17-page paper called “Data Scientists in Software Teams: State of the Art and Challenges,” available at the IEEE Xplore Digital Library.
The first thing the researchers found is that not everyone who practices data science thinks of themselves as a data scientist. Only about 40 percent of the respondents identified as data scientists, while 24 percent said they were software engineers. Kim nevertheless concluded that 532 of the respondents were really data scientists.
The education levels of these people varied widely: one-third had bachelor's degrees, 41 percent had master's degrees, and 22 percent had Ph.D.s. The group had an average of 13.6 years of experience, with an average of 10 of those years spent analyzing data.
The team's clustering algorithm showed patterns in how data scientists spent their days. Because the patterns were distinct, Kim gave each resulting group a nickname.
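To make the idea of clustering people by how they spend their time concrete, here is a minimal sketch. It is not the study's method: the article does not name the algorithm the team used, so plain k-means stands in for it, and the activity names and respondent numbers are invented for illustration.

```python
# Hypothetical activity categories; the survey's real categories may differ.
ACTIVITIES = ["query", "prepare", "analyze", "build_platform", "share_insight"]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iterations=100):
    """Plain k-means (a stand-in technique); returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean_point(members)
    return centroids, labels


# Invented data: each row is one respondent's share of time per activity.
respondents = [
    [0.25, 0.20, 0.10, 0.05, 0.05],  # mostly querying and preparing
    [0.30, 0.25, 0.05, 0.05, 0.05],
    [0.05, 0.10, 0.60, 0.05, 0.10],  # mostly analyzing
    [0.05, 0.05, 0.65, 0.05, 0.10],
    [0.05, 0.05, 0.05, 0.50, 0.05],  # mostly building platforms
    [0.05, 0.10, 0.05, 0.55, 0.05],
]

centroids, labels = kmeans(respondents, k=3)
for c, centroid in enumerate(centroids):
    top = ACTIVITIES[centroid.index(max(centroid))]
    print(f"cluster {c}: {labels.count(c)} respondents, mostly '{top}'")
```

Each cluster's centroid is an average time profile, which is the kind of summary a nickname can then be attached to.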
Let's take a look at the nine different kinds of data scientists that the research revealed.
The Data Preparer
This kind of data scientist spends 25 percent of their day querying data and another 20 percent preparing it for those who analyze it. They work with SQL and are unlikely to work with machine learning algorithms.
The Data Shaper
The Data Shaper has many of the same skills as the data preparer above. However, they bring an additional layer of expertise, such as machine learning or experience with MATLAB, Python, and other tools. They're much more likely to have an advanced degree, such as a Ph.D., and are far less likely to use SQL or another structured query language.
The Data Analyzer
Data scientists who spend most of the day analyzing data are included here. This group is likely to have a lot of experience with math, classical statistics, and data manipulation. According to Kim, they also like to use R.
The Platform Builder
You could be a Platform Builder if half of your time goes toward building platforms or writing code to collect data. Platform Builders are far more likely to use tools like Hadoop, a distributed data processing framework. You'll find “engineer” in most of their titles, but a Ph.D. is rare.
This professional spends much of their time engaging with line-of-business stakeholders and colleagues in product development. They are not likely to use SQL or work with structured data.
This data scientist dedicates 60 percent of their day to acting on insight and 20 percent to sharing the insights they discover in the data. This is a small group, but it represented a statistically significant faction.
You might be a data scientist without realizing it. Software engineers and program managers who spend about half their time using data science skills and the other half doing something else fall into this category.
This category is reserved for engineers and managers who dabble in data science.
The Polymath
This “jack of all trades” data scientist completes a variety of data-centered tasks. They might build platforms or collect data one day and analyze and act on it another day. Polymaths typically have a Ph.D. and probably use Python and statistical techniques such as Bayesian-style Monte Carlo methods.
Kim concluded that data science is a buzzword that covers a wide range of people who do different kinds of work.
If you work in data science, the biggest challenges reported by this group may sound familiar. The main categories were analysis, data, and people.
Poor data quality was one of the most common problems. Survey respondents said that the people who supply data expect data scientists to correct quality issues, so the cleanup typically falls to those who consume the data.
The availability of data was also cited as a common problem. This refers to the inability to access legacy systems and to missing values in current databases. Merging data streams, known as data integration, also remains a challenge for many data scientists.
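As a rough illustration of data integration and missing values, the sketch below merges two small record sets and reports which fields are still empty afterward. The record layouts, field names, and helper functions are all hypothetical; real pipelines usually rely on a database or a dataframe library rather than hand-written joins.

```python
# Hypothetical records from a legacy system and a current database.
legacy_records = [
    {"user_id": 1, "signup_year": 2012},
    {"user_id": 2, "signup_year": None},  # value missing at the source
    {"user_id": 3, "signup_year": 2015},
]
current_records = [
    {"user_id": 2, "sessions": 14},
    {"user_id": 3, "sessions": None},  # value missing at the source
    {"user_id": 4, "sessions": 9},
]


def integrate(left, right, key):
    """Full outer join of two record lists on `key`.

    Every merged row carries every field; fields that no source
    supplied are left as None.
    """
    fields = sorted({name for row in left + right for name in row if name != key})
    merged = {}
    for row in left + right:
        blank = {key: row[key], **{name: None for name in fields}}
        record = merged.setdefault(row[key], blank)
        for name, value in row.items():
            if value is not None:
                record[name] = value
    return [merged[k] for k in sorted(merged)]


def missing_report(rows, key):
    """Map each key to the fields that are still missing after the merge."""
    return {
        row[key]: [name for name, value in row.items() if value is None]
        for row in rows
        if any(value is None for value in row.values())
    }


combined = integrate(legacy_records, current_records, key="user_id")
report = missing_report(combined, key="user_id")
for user_id, fields in report.items():
    print(f"user {user_id} is missing: {', '.join(fields)}")
```

Gaps appear for two different reasons here: a value that was blank in its source, and a user who exists in only one of the two systems. Both show up as missing values once the streams are merged.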
Scale is another big challenge when it comes to analysis. The term "big data" is still used to describe the difficulty of slicing and dicing huge data warehouses. Survey respondents thought that it took too long to collect data in their current systems, whether they were using Hadoop or Cosmos, a Microsoft framework for distributing and processing data.
When it comes to people, the researchers found a major impediment to the overall success of data science: a lag in delivering results to the decision makers who need to see them. If you are in this field, you may also recognize the intense pressure to stay current on technologies and tools.
Utilize Your Unique Personality
Regardless of your group, your job outlook is great if you have the right skills. When you're ready to look for your next opportunity, come talk to us. Our recruiters will be happy to help you take the next step in your career.