Data Scientists in Software Teams: State of the Art and Challenges (survey…
Main questions
RQ1. What is the demographic and educational background of data scientists at Microsoft?
RQ2. How do data scientists work? What tasks do they work on, how do they spend their time, and what tools do they use?
RQ3. What challenges do data scientists face, and what are the best practices and advice to overcome those challenges?
RQ4. How do data scientists increase confidence about the correctness of their work?
Purpose of data science
understanding customers
understanding user behavior
assessing developer productivity
assessing software quality
Data science usage
automated telemetry instrumentation
live monitoring
survey design
Working styles
Time spent
Skills and self-perception
Challenges
Demographics
Best practices
Correctness
Target population
Employees with interest in data science
Full-time data scientist employees
Data analyzed by clustering
Defining data scientists at Microsoft
Job title
Education
Professional experience
Skill
Some statistics
51 percent build predictive models from the data (Modeling Specialists)
36 percent build data engineering platforms to collect and process a large quantity of data and use big data cloud computing platforms (Platform Builders)
60 percent use big data cloud computing platforms to analyze large data
31 percent add logging code or other forms of instrumentation to collect the data required for analysis (Polymaths)
76 percent communicate results and insights to business leaders (Insight Providers)
12 percent manage a team of data scientists (Team Leaders)
81 percent report that they analyze product and customer data
Main topics data analysts work on
Software Productivity and Quality
Domain-Specific Problems
User Engagement
Business Intelligence
Tool Box
SQL and Excel are popular (48 and 59 percent, respectively)
R (29 percent)
MATLAB (5 percent)
Minitab (4 percent)
SPSS (3 percent)
JMP (2 percent)
Challenges faced
Data Quality
Data Availability
Data Preparation
Analysis
Scale
Machine Learning
Ensuring quality
Cross Validation (Multi-Dimensional)
Check Data Distribution
Dogfood Simulation
Type and Schema Checking
Repeatability
Check Implicit Constraints
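Several of the quality checks listed above can be illustrated with a minimal Python sketch. This is not taken from the survey: the record fields (`user_id`, `session_seconds`, `clicks`), the function names, and the three-standard-deviation tolerance are all hypothetical choices made for illustration. It shows one possible shape for type and schema checking, implicit-constraint checking, a crude data-distribution check, and a seeded k-fold split that supports cross validation and repeatability.

```python
import random
import statistics

# Hypothetical schema for a telemetry record: field name -> expected type.
SCHEMA = {"user_id": str, "session_seconds": float, "clicks": int}


def check_schema(record, schema=SCHEMA):
    """Type and schema checking: list the problems found in one record."""
    problems = []
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field}: expected {expected.__name__}")
    return problems


def check_implicit_constraints(record):
    """Check rules the schema does not state (assumed here: no negatives)."""
    problems = []
    for field in ("session_seconds", "clicks"):
        value = record.get(field)
        if isinstance(value, (int, float)) and value < 0:
            problems.append(f"{field} must be non-negative")
    return problems


def distribution_shifted(baseline, current, tolerance=3.0):
    """Flag a batch whose mean lies more than `tolerance` baseline
    standard deviations away from the baseline mean."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)  # needs at least two baseline points
    return abs(statistics.mean(current) - mean) > tolerance * stdev


def k_fold_indices(n, k, seed=0):
    """Split range(n) into k shuffled folds; a fixed seed keeps it repeatable."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]
```

In practice such checks would run before any analysis, with failing records logged rather than silently dropped. The mean-based distribution check is deliberately simple; a proper statistical test would be a reasonable substitute.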