
A Review of Best Practice Recommendations for Text Analysis in R (and a User-Friendly App)

In recent decades, the amount of text available for organizational science research has grown tremendously. Despite the availability of text and advances in text analysis methods, many of these techniques remain largely segmented by discipline. Moreover, there is an increasing number of open-source...


Bibliographic Details
Published in: Journal of business and psychology 2018-08, Vol.33 (4), p.445-459
Main Authors: Banks, George C., Woznyj, Haley M., Wesslen, Ryan S., Ross, Roxanne L.
Format: Article
Language: English
container_end_page 459
container_issue 4
container_start_page 445
container_title Journal of business and psychology
container_volume 33
creator Banks, George C.
Woznyj, Haley M.
Wesslen, Ryan S.
Ross, Roxanne L.
description In recent decades, the amount of text available for organizational science research has grown tremendously. Despite the availability of text and advances in text analysis methods, many of these techniques remain largely segmented by discipline. Moreover, there is an increasing number of open-source tools (R, Python) for text analysis, yet these tools are not easily taken advantage of by social science researchers who likely have limited programming knowledge and exposure to computational methods. In this article, we compare quantitative and qualitative text analysis methods used across social sciences. We describe basic terminology and the overlooked, but critically important, steps in pre-processing raw text (e.g., selection of stop words; stemming). Next, we provide an exploratory analysis of open-ended responses from a prototypical survey dataset using topic modeling with R. We provide a list of best practice recommendations for text analysis focused on (1) hypothesis and question formation, (2) design and data collection, (3) data pre-processing, and (4) topic modeling. We also discuss the creation of scale scores for more traditional correlation and regression analyses. All the data are available in an online repository for the interested reader to practice with, along with a reference list for additional reading, an R markdown file, and an open source interactive topic model tool (topicApp; see https://github.com/wesslen/topicApp, https://github.com/wesslen/text-analysis-org-science, https://dataverse.unc.edu/dataset.xhtml?persistentId=doi:10.15139/S3/R4W7ZS).
doi_str_mv 10.1007/s10869-017-9528-3
format article
publisher New York: Springer Science + Business Media
rights Springer Science+Business Media, LLC, part of Springer Nature 2018; Journal of Business and Psychology is a copyright of Springer, (2018). All Rights Reserved.
toplevel peer_reviewed; online_resources
identifier ISSN: 0889-3268
ispartof Journal of business and psychology, 2018-08, Vol.33 (4), p.445-459
issn 0889-3268
1573-353X
language eng
recordid cdi_proquest_journals_1992788273
source EBSCOhost Business Source Ultimate; ABI/INFORM global; JSTOR Archival Journals and Primary Sources Collection; Springer Nature
subjects Behavioral Science and Psychology
Business and Management
Community and Environmental Psychology
Computer programming
Data mining
Industrial and Organizational Psychology
Information technology
Organization theory
ORIGINAL PAPER
Personality and Social Psychology
Psychology
Social Sciences
Text analysis
title A Review of Best Practice Recommendations for Text Analysis in R (and a User-Friendly App)
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T02%3A17%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Review%20of%20Best%20Practice%20Recommendations%20for%20Text%20Analysis%20in%20R%20(and%20a%20User-Friendly%20App)&rft.jtitle=Journal%20of%20business%20and%20psychology&rft.au=Banks,%20George%20C.&rft.date=2018-08-01&rft.volume=33&rft.issue=4&rft.spage=445&rft.epage=459&rft.pages=445-459&rft.issn=0889-3268&rft.eissn=1573-353X&rft_id=info:doi/10.1007/s10869-017-9528-3&rft_dat=%3Cjstor_proqu%3E48700765%3C/jstor_proqu%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c418t-d544b875c4dc108fce9250f192b8d6d9944001d91791048540e0dcd38c8c59183%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1992788273&rft_id=info:pmid/&rft_jstor_id=48700765&rfr_iscdi=true