Using text data instead of SIC codes to tag innovative firms and classify industrial activities

Bibliographic Details
Published in: PloS one, 2022-06, Vol. 17 (6), p. e0270041
Main Authors: Marra, Alessandro, Baldassari, Cristiano
Format: Article
Language: English
container_end_page e0270041
container_issue 6
container_start_page e0270041
container_title PloS one
container_volume 17
creator Marra, Alessandro
Baldassari, Cristiano
description The paper uses text mining and semantic algorithms to tag innovative firms and offer an alternative perspective on classifying industrial activities. Instead of referring to firms’ standard industrial classification codes, we gather information from companies’ websites and corporate purposes, extract keywords and generate tags concerning firms’ activities, specializations, and competences. The evidence is interesting because it allows us to understand ‘what firms do’ in a more penetrating and up-to-date way than standard industrial classification codes do. Moreover, by matching firms’ keywords, we can explore the degree of closeness between the firms under observation, a measure from which researchers can derive industrial proximity. The analysis can provide policymakers with a detailed and comprehensive picture of the innovative trajectories underlying the industrial structure in a geographic area.
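
The description mentions extracting keywords from firms' texts and matching them to gauge how close firms are, but it does not state which extraction method or similarity score is used. As a rough illustration only, the sketch below assumes a plain bag-of-words extractor (most frequent tokens after stopword removal) and Jaccard overlap of keyword sets as the closeness measure. This is not the authors' semantic algorithm; the stopword list, the example firm texts, and the names `extract_keywords`, `jaccard`, and `proximity_matrix` are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "and", "of", "for", "in", "to", "a", "an", "we", "our", "with", "on", "is", "are"}


def extract_keywords(text: str, top_n: int = 10) -> set[str]:
    """Crude stand-in for keyword extraction: the top_n most frequent non-stopword tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return {word for word, _ in counts.most_common(top_n)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Assumed closeness measure: |A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def proximity_matrix(firm_texts: dict[str, str], top_n: int = 10) -> dict[tuple[str, str], float]:
    """Keyword-overlap score for every unordered pair of firms."""
    keywords = {firm: extract_keywords(text, top_n) for firm, text in firm_texts.items()}
    return {
        (f1, f2): jaccard(keywords[f1], keywords[f2])
        for f1, f2 in combinations(sorted(keywords), 2)
    }


# Hypothetical example texts standing in for scraped website descriptions.
firms = {
    "FirmA": "We design robotics software and machine vision systems for industrial automation.",
    "FirmB": "Industrial automation: robotics integration, machine vision and sensor software.",
    "FirmC": "Organic olive oil production and bottling.",
}

for pair, score in proximity_matrix(firms).items():
    print(pair, round(score, 2))
# ('FirmA', 'FirmB') 0.6
# ('FirmA', 'FirmC') 0.0
# ('FirmB', 'FirmC') 0.0
```

Jaccard overlap is used here only because it is a simple, bounded (0 to 1) score on sets; weighted or embedding-based similarities would be equally plausible choices.
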
doi_str_mv 10.1371/journal.pone.0270041
format article
contributor Kim, Wonjoon
publisher San Francisco: Public Library of Science
date 2022-06-30
pmid 35771893
rights COPYRIGHT 2022 Public Library of Science
2022 Marra, Baldassari. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
orcidid https://orcid.org/0000-0002-6313-5047
https://orcid.org/0000-0002-3963-3003
toplevel peer_reviewed
oa free_for_read
linktopdf https://www.proquest.com/docview/2686271710/fulltextPDF?pq-origsite=primo
linktohtml https://www.proquest.com/docview/2686271710?pq-origsite=primo
identifier ISSN: 1932-6203
ispartof PloS one, 2022-06, Vol.17 (6), p.e0270041-e0270041
issn 1932-6203
1932-6203
language eng
recordid cdi_plos_journals_2686271710
source Publicly Available Content Database; PubMed Central
subjects Algorithms
Analysis
Classification
Codes
Computer and Information Sciences
Data mining
Economic activity
Engineering and Technology
Evaluation
Industrial areas
Industrial research
Innovations
Italy
Keywords
North American Industry Classification System
Product reviews
Semantics
Social Sciences
Websites
title Using text data instead of SIC codes to tag innovative firms and classify industrial activities
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T19%3A30%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Using%20text%20data%20instead%20of%20SIC%20codes%20to%20tag%20innovative%20firms%20and%20classify%20industrial%20activities&rft.jtitle=PloS%20one&rft.au=Marra,%20Alessandro&rft.date=2022-06-30&rft.volume=17&rft.issue=6&rft.spage=e0270041&rft.epage=e0270041&rft.pages=e0270041-e0270041&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0270041&rft_dat=%3Cgale_plos_%3EA708717405%3C/gale_plos_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c669t-319f6c5c5d585da14e2507b6c2f31edaaf67cb576590e138cc36b4dddfa4c4733%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2686271710&rft_id=info:pmid/35771893&rft_galeid=A708717405&rfr_iscdi=true