
Data Scientists in Software Teams: State of the Art and Challenges

The demand for analyzing large scale telemetry, machine, and quality data is rapidly increasing in software industry. Data scientists are becoming popular within software teams, e.g., Facebook, LinkedIn and Microsoft are creating a new career path for data scientists. In this paper, we present a lar...

Bibliographic Details
Published in: IEEE transactions on software engineering 2018-11, Vol.44 (11), p.1024-1038
Main Authors: Kim, Miryung, Zimmermann, Thomas, DeLine, Robert, Begel, Andrew
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c404t-aa1197028420ef0e4c18a6a526846940c413b336d0067f7326a78fec76a650893
cites cdi_FETCH-LOGICAL-c404t-aa1197028420ef0e4c18a6a526846940c413b336d0067f7326a78fec76a650893
container_end_page 1038
container_issue 11
container_start_page 1024
container_title IEEE transactions on software engineering
container_volume 44
creator Kim, Miryung
Zimmermann, Thomas
DeLine, Robert
Begel, Andrew
description The demand for analyzing large scale telemetry, machine, and quality data is rapidly increasing in software industry. Data scientists are becoming popular within software teams, e.g., Facebook, LinkedIn and Microsoft are creating a new career path for data scientists. In this paper, we present a large-scale survey with 793 professional data scientists at Microsoft to understand their educational background, problem topics that they work on, tool usages, and activities. We cluster these data scientists based on the time spent for various activities and identify 9 distinct clusters of data scientists, and their corresponding characteristics. We also discuss the challenges that they face and the best practices they share with other data scientists. Our study finds several trends about data scientists in the software engineering context at Microsoft, and should inform managers on how to leverage data science capability effectively within their teams.
doi_str_mv 10.1109/TSE.2017.2754374
format article
fullrecord <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_8046093</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8046093</ieee_id><sourcerecordid>2132075479</sourcerecordid><originalsourceid>FETCH-LOGICAL-c404t-aa1197028420ef0e4c18a6a526846940c413b336d0067f7326a78fec76a650893</originalsourceid><addsrcrecordid>eNo9kEtPAjEUhRujiYjuTdw0cT14-27dIeAjIXExuG7qcEeGwAy2JcZ_7xCIq7P5zr05HyG3DEaMgXtYlLMRB2ZG3CgpjDwjA-aEK4TicE4GAM4WSll3Sa5SWgOAMkYNyNM05EDLqsE2Nykn2rS07Or8EyLSBYZteqRlDhlpV9O8QjqOmYZ2SSersNlg-4XpmlzUYZPw5pRD8vE8W0xei_n7y9tkPC8qCTIXITDmDHArOWANKCtmgw6Kayu1k1BJJj6F0EsAbWojuA7G1lgZHbQC68SQ3B_v7mL3vceU_brbx7Z_6TkTHPrd5kDBkapil1LE2u9isw3x1zPwB1O-N-UPpvzJVF-5O1YaRPzHLUgNTog_2_FhUw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2132075479</pqid></control><display><type>article</type><title>Data Scientists in Software Teams: State of the Art and Challenges</title><source>IEEE Xplore (Online service)</source><creator>Kim, Miryung ; Zimmermann, Thomas ; DeLine, Robert ; Begel, Andrew</creator><creatorcontrib>Kim, Miryung ; Zimmermann, Thomas ; DeLine, Robert ; Begel, Andrew</creatorcontrib><description>The demand for analyzing large scale telemetry, machine, and quality data is rapidly increasing in software industry. Data scientists are becoming popular within software teams, e.g., Facebook, LinkedIn and Microsoft are creating a new career path for data scientists. In this paper, we present a large-scale survey with 793 professional data scientists at Microsoft to understand their educational background, problem topics that they work on, tool usages, and activities. We cluster these data scientists based on the time spent for various activities and identify 9 distinct clusters of data scientists, and their corresponding characteristics. 
We also discuss the challenges that they face and the best practices they share with other data scientists. Our study finds several trends about data scientists in the software engineering context at Microsoft, and should inform managers on how to leverage data science capability effectively within their teams.</description><identifier>ISSN: 0098-5589</identifier><identifier>EISSN: 1939-3520</identifier><identifier>DOI: 10.1109/TSE.2017.2754374</identifier><identifier>CODEN: IESEDJ</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Best practices ; Data science ; Demand analysis ; development roles ; industry ; Interviews ; Scientists ; Sociology ; Software ; Software engineering ; State of the art ; Statistics ; Telemetry</subject><ispartof>IEEE transactions on software engineering, 2018-11, Vol.44 (11), p.1024-1038</ispartof><rights>Copyright IEEE Computer Society 2018</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c404t-aa1197028420ef0e4c18a6a526846940c413b336d0067f7326a78fec76a650893</citedby><cites>FETCH-LOGICAL-c404t-aa1197028420ef0e4c18a6a526846940c413b336d0067f7326a78fec76a650893</cites><orcidid>0000-0003-4905-1469 ; 0000-0003-3802-1512</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8046093$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,54796</link.rule.ids></links><search><creatorcontrib>Kim, Miryung</creatorcontrib><creatorcontrib>Zimmermann, Thomas</creatorcontrib><creatorcontrib>DeLine, Robert</creatorcontrib><creatorcontrib>Begel, Andrew</creatorcontrib><title>Data Scientists in Software Teams: State of the Art and Challenges</title><title>IEEE transactions on software engineering</title><addtitle>TSE</addtitle><description>The demand for analyzing 
large scale telemetry, machine, and quality data is rapidly increasing in software industry. Data scientists are becoming popular within software teams, e.g., Facebook, LinkedIn and Microsoft are creating a new career path for data scientists. In this paper, we present a large-scale survey with 793 professional data scientists at Microsoft to understand their educational background, problem topics that they work on, tool usages, and activities. We cluster these data scientists based on the time spent for various activities and identify 9 distinct clusters of data scientists, and their corresponding characteristics. We also discuss the challenges that they face and the best practices they share with other data scientists. Our study finds several trends about data scientists in the software engineering context at Microsoft, and should inform managers on how to leverage data science capability effectively within their teams.</description><subject>Best practices</subject><subject>Data science</subject><subject>Demand analysis</subject><subject>development roles</subject><subject>industry</subject><subject>Interviews</subject><subject>Scientists</subject><subject>Sociology</subject><subject>Software</subject><subject>Software engineering</subject><subject>State of the art</subject><subject>Statistics</subject><subject>Telemetry</subject><issn>0098-5589</issn><issn>1939-3520</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><recordid>eNo9kEtPAjEUhRujiYjuTdw0cT14-27dIeAjIXExuG7qcEeGwAy2JcZ_7xCIq7P5zr05HyG3DEaMgXtYlLMRB2ZG3CgpjDwjA-aEK4TicE4GAM4WSll3Sa5SWgOAMkYNyNM05EDLqsE2Nykn2rS07Or8EyLSBYZteqRlDhlpV9O8QjqOmYZ2SSersNlg-4XpmlzUYZPw5pRD8vE8W0xei_n7y9tkPC8qCTIXITDmDHArOWANKCtmgw6Kayu1k1BJJj6F0EsAbWojuA7G1lgZHbQC68SQ3B_v7mL3vceU_brbx7Z_6TkTHPrd5kDBkapil1LE2u9isw3x1zPwB1O-N-UPpvzJVF-5O1YaRPzHLUgNTog_2_FhUw</recordid><startdate>20181101</startdate><enddate>20181101</enddate><creator>Kim, 
Miryung</creator><creator>Zimmermann, Thomas</creator><creator>DeLine, Robert</creator><creator>Begel, Andrew</creator><general>IEEE</general><general>IEEE Computer Society</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>JQ2</scope><scope>K9.</scope><orcidid>https://orcid.org/0000-0003-4905-1469</orcidid><orcidid>https://orcid.org/0000-0003-3802-1512</orcidid></search><sort><creationdate>20181101</creationdate><title>Data Scientists in Software Teams: State of the Art and Challenges</title><author>Kim, Miryung ; Zimmermann, Thomas ; DeLine, Robert ; Begel, Andrew</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c404t-aa1197028420ef0e4c18a6a526846940c413b336d0067f7326a78fec76a650893</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Best practices</topic><topic>Data science</topic><topic>Demand analysis</topic><topic>development roles</topic><topic>industry</topic><topic>Interviews</topic><topic>Scientists</topic><topic>Sociology</topic><topic>Software</topic><topic>Software engineering</topic><topic>State of the art</topic><topic>Statistics</topic><topic>Telemetry</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kim, Miryung</creatorcontrib><creatorcontrib>Zimmermann, Thomas</creatorcontrib><creatorcontrib>DeLine, Robert</creatorcontrib><creatorcontrib>Begel, Andrew</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998–Present</collection><collection>IEEE Xplore (Online service)</collection><collection>CrossRef</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><jtitle>IEEE transactions on software 
engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kim, Miryung</au><au>Zimmermann, Thomas</au><au>DeLine, Robert</au><au>Begel, Andrew</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Data Scientists in Software Teams: State of the Art and Challenges</atitle><jtitle>IEEE transactions on software engineering</jtitle><stitle>TSE</stitle><date>2018-11-01</date><risdate>2018</risdate><volume>44</volume><issue>11</issue><spage>1024</spage><epage>1038</epage><pages>1024-1038</pages><issn>0098-5589</issn><eissn>1939-3520</eissn><coden>IESEDJ</coden><abstract>The demand for analyzing large scale telemetry, machine, and quality data is rapidly increasing in software industry. Data scientists are becoming popular within software teams, e.g., Facebook, LinkedIn and Microsoft are creating a new career path for data scientists. In this paper, we present a large-scale survey with 793 professional data scientists at Microsoft to understand their educational background, problem topics that they work on, tool usages, and activities. We cluster these data scientists based on the time spent for various activities and identify 9 distinct clusters of data scientists, and their corresponding characteristics. We also discuss the challenges that they face and the best practices they share with other data scientists. Our study finds several trends about data scientists in the software engineering context at Microsoft, and should inform managers on how to leverage data science capability effectively within their teams.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TSE.2017.2754374</doi><tpages>15</tpages><orcidid>https://orcid.org/0000-0003-4905-1469</orcidid><orcidid>https://orcid.org/0000-0003-3802-1512</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0098-5589
ispartof IEEE transactions on software engineering, 2018-11, Vol.44 (11), p.1024-1038
issn 0098-5589
1939-3520
language eng
recordid cdi_ieee_primary_8046093
source IEEE Xplore (Online service)
subjects Best practices
Data science
Demand analysis
development roles
industry
Interviews
Scientists
Sociology
Software
Software engineering
State of the art
Statistics
Telemetry
title Data Scientists in Software Teams: State of the Art and Challenges
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T23%3A25%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Data%20Scientists%20in%20Software%20Teams:%20State%20of%20the%20Art%20and%20Challenges&rft.jtitle=IEEE%20transactions%20on%20software%20engineering&rft.au=Kim,%20Miryung&rft.date=2018-11-01&rft.volume=44&rft.issue=11&rft.spage=1024&rft.epage=1038&rft.pages=1024-1038&rft.issn=0098-5589&rft.eissn=1939-3520&rft.coden=IESEDJ&rft_id=info:doi/10.1109/TSE.2017.2754374&rft_dat=%3Cproquest_ieee_%3E2132075479%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c404t-aa1197028420ef0e4c18a6a526846940c413b336d0067f7326a78fec76a650893%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2132075479&rft_id=info:pmid/&rft_ieee_id=8046093&rfr_iscdi=true