A Multi-Input Machine Learning Approach to Classifying Sex Trafficking from Online Escort Advertisements
Published in: | Machine learning and knowledge extraction 2023-06, Vol.5 (2), p.460-472 |
---|---|
Main Authors: | Summers, Lucia; Shallenberger, Alyssa N.; Cruz, John; Fulton, Lawrence V. |
Format: | Article |
Language: | English |
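Subjects: | (convolutional) neural networks; Artificial neural networks; Classification; Criminal investigations; Datasets; Deep learning; Drug trafficking; Emojis; Human smuggling; Human trafficking; Image classification; Law enforcement; Machine learning; multi-input models; natural language processing; Neural networks; Prostitution; Random access memory; Recurrent neural networks; Regression analysis; Sex; sex trafficking; Support vector machines; Test sets |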
cited_by | |
---|---|
cites | cdi_FETCH-LOGICAL-c361t-b553becc88824062447faac33ed4b0c6d490f242100f76613456fc1c15e589a33 |
container_end_page | 472 |
container_issue | 2 |
container_start_page | 460 |
container_title | Machine learning and knowledge extraction |
container_volume | 5 |
creator | Summers, Lucia Shallenberger, Alyssa N. Cruz, John Fulton, Lawrence V. |
description | Sex trafficking victims are often advertised through online escort sites. These ads can be publicly accessed, but law enforcement lacks the resources to comb through hundreds of ads to identify those that may feature sex-trafficked individuals. The purpose of this study was to implement and test multi-input, deep learning (DL) binary classification models to predict the probability of an online escort ad being associated with sex trafficking (ST) activity and aid in the detection and investigation of ST. Data from 12,350 scraped and classified ads were split into training and test sets (80% and 20%, respectively). Multi-input models that included recurrent neural networks (RNN) for text classification, convolutional neural networks (CNN, specifically EfficientNetB6 or ENET) for image/emoji classification, and neural networks (NN) for feature classification were trained and used to classify the 20% test set. The best-performing DL model included text and imagery inputs, resulting in an accuracy of 0.82 and an F1 score of 0.70. More importantly, the best classifier (RNN + ENET) correctly identified 14 of 14 sites that had classification probability estimates of 0.845 or greater (1.0 precision); precision was 96% for the multi-input model (NN + RNN + ENET) when only the ads associated with the highest positive classification probabilities (>0.90) were considered (n = 202 ads). The models developed could be productionalized and piloted with criminal investigators, as they could potentially increase their efficiency in identifying potential ST victims. |
doi_str_mv | 10.3390/make5020028 |
format | article |
fullrecord | <record><control><sourceid>gale_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_76fc3eae60ca4bf0a973d6e0c90f7c00</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A758381051</galeid><doaj_id>oai_doaj_org_article_76fc3eae60ca4bf0a973d6e0c90f7c00</doaj_id><sourcerecordid>A758381051</sourcerecordid><originalsourceid>FETCH-LOGICAL-c361t-b553becc88824062447faac33ed4b0c6d490f242100f76613456fc1c15e589a33</originalsourceid><addsrcrecordid>eNpNUU1rGzEQXUoLDUlO_QOCHsumo69d7XExaWNwyKHpWcjakSNnV3IluTT_PnJdSpiDRo_33jxmmuYThRvOB_i6mGeUwACYetdcMAmiFcMA79_0H5vrnPdQKf0gKIiL5mkk98e5-HYdDsdC7o198gHJBk0KPuzIeDikWEFSIlnNJmfvXk74D_xDHpNxztvn09-luJCHMJ_Et9nGVMg4_cZUfMYFQ8lXzQdn5ozX_97L5ue328fVXbt5-L5ejZvW8o6Wdisl36K1SikmoGNC9M4YyzlOYgu2m8QAjglGAVzfdZQL2TlLLZUo1WA4v2zWZ98pmr0-JL-Y9KKj8fovENNOm5rKzqj7quRosANrxNaBGXo-dQi2jugtQPX6fPaqO_h1xFz0Ph5TqPE1U2xQde2sq6ybM2tnqqkPLpZkbK0JF29jQOcrPvZScUVB0ir4chbYFHNO6P7HpKBPp9RvTslfAa81kIc</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2829833926</pqid></control><display><type>article</type><title>A Multi-Input Machine Learning Approach to Classifying Sex Trafficking from Online Escort Advertisements</title><source>Publicly Available Content (ProQuest)</source><creator>Summers, Lucia ; Shallenberger, Alyssa N. ; Cruz, John ; Fulton, Lawrence V.</creator><creatorcontrib>Summers, Lucia ; Shallenberger, Alyssa N. ; Cruz, John ; Fulton, Lawrence V.</creatorcontrib><description>Sex trafficking victims are often advertised through online escort sites. These ads can be publicly accessed, but law enforcement lacks the resources to comb through hundreds of ads to identify those that may feature sex-trafficked individuals. 
The purpose of this study was to implement and test multi-input, deep learning (DL) binary classification models to predict the probability of an online escort ad being associated with sex trafficking (ST) activity and aid in the detection and investigation of ST. Data from 12,350 scraped and classified ads were split into training and test sets (80% and 20%, respectively). Multi-input models that included recurrent neural networks (RNN) for text classification, convolutional neural networks (CNN, specifically EfficientNetB6 or ENET) for image/emoji classification, and neural networks (NN) for feature classification were trained and used to classify the 20% test set. The best-performing DL model included text and imagery inputs, resulting in an accuracy of 0.82 and an F1 score of 0.70. More importantly, the best classifier (RNN + ENET) correctly identified 14 of 14 sites that had classification probability estimates of 0.845 or greater (1.0 precision); precision was 96% for the multi-input model (NN + RNN + ENET) when only the ads associated with the highest positive classification probabilities (>0.90) were considered (n = 202 ads). 
The models developed could be productionalized and piloted with criminal investigators, as they could potentially increase their efficiency in identifying potential ST victims.</description><identifier>ISSN: 2504-4990</identifier><identifier>EISSN: 2504-4990</identifier><identifier>DOI: 10.3390/make5020028</identifier><language>eng</language><publisher>Basel: MDPI AG</publisher><subject>(convolutional) neural networks ; Artificial neural networks ; Classification ; Criminal investigations ; Datasets ; Deep learning ; Drug trafficking ; Emojis ; Human smuggling ; Human trafficking ; Image classification ; Law enforcement ; Machine learning ; multi-input models ; natural language processing ; Neural networks ; Prostitution ; Random access memory ; Recurrent neural networks ; Regression analysis ; Sex ; sex trafficking ; Support vector machines ; Test sets</subject><ispartof>Machine learning and knowledge extraction, 2023-06, Vol.5 (2), p.460-472</ispartof><rights>COPYRIGHT 2023 MDPI AG</rights><rights>2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c361t-b553becc88824062447faac33ed4b0c6d490f242100f76613456fc1c15e589a33</cites><orcidid>0000-0001-8674-5369 ; 0000-0001-8603-1913 ; 0000-0002-0633-0372</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/2829833926/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2829833926?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,25753,27924,27925,37012,44590,75126</link.rule.ids></links><search><creatorcontrib>Summers, Lucia</creatorcontrib><creatorcontrib>Shallenberger, Alyssa N.</creatorcontrib><creatorcontrib>Cruz, John</creatorcontrib><creatorcontrib>Fulton, Lawrence V.</creatorcontrib><title>A Multi-Input Machine Learning Approach to Classifying Sex Trafficking from Online Escort Advertisements</title><title>Machine learning and knowledge extraction</title><description>Sex trafficking victims are often advertised through online escort sites. These ads can be publicly accessed, but law enforcement lacks the resources to comb through hundreds of ads to identify those that may feature sex-trafficked individuals. The purpose of this study was to implement and test multi-input, deep learning (DL) binary classification models to predict the probability of an online escort ad being associated with sex trafficking (ST) activity and aid in the detection and investigation of ST. Data from 12,350 scraped and classified ads were split into training and test sets (80% and 20%, respectively). 
Multi-input models that included recurrent neural networks (RNN) for text classification, convolutional neural networks (CNN, specifically EfficientNetB6 or ENET) for image/emoji classification, and neural networks (NN) for feature classification were trained and used to classify the 20% test set. The best-performing DL model included text and imagery inputs, resulting in an accuracy of 0.82 and an F1 score of 0.70. More importantly, the best classifier (RNN + ENET) correctly identified 14 of 14 sites that had classification probability estimates of 0.845 or greater (1.0 precision); precision was 96% for the multi-input model (NN + RNN + ENET) when only the ads associated with the highest positive classification probabilities (>0.90) were considered (n = 202 ads). The models developed could be productionalized and piloted with criminal investigators, as they could potentially increase their efficiency in identifying potential ST victims.</description><subject>(convolutional) neural networks</subject><subject>Artificial neural networks</subject><subject>Classification</subject><subject>Criminal investigations</subject><subject>Datasets</subject><subject>Deep learning</subject><subject>Drug trafficking</subject><subject>Emojis</subject><subject>Human smuggling</subject><subject>Human trafficking</subject><subject>Image classification</subject><subject>Law enforcement</subject><subject>Machine learning</subject><subject>multi-input models</subject><subject>natural language processing</subject><subject>Neural networks</subject><subject>Prostitution</subject><subject>Random access memory</subject><subject>Recurrent neural networks</subject><subject>Regression analysis</subject><subject>Sex</subject><subject>sex trafficking</subject><subject>Support vector machines</subject><subject>Test 
sets</subject><issn>2504-4990</issn><issn>2504-4990</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNpNUU1rGzEQXUoLDUlO_QOCHsumo69d7XExaWNwyKHpWcjakSNnV3IluTT_PnJdSpiDRo_33jxmmuYThRvOB_i6mGeUwACYetdcMAmiFcMA79_0H5vrnPdQKf0gKIiL5mkk98e5-HYdDsdC7o198gHJBk0KPuzIeDikWEFSIlnNJmfvXk74D_xDHpNxztvn09-luJCHMJ_Et9nGVMg4_cZUfMYFQ8lXzQdn5ozX_97L5ue328fVXbt5-L5ejZvW8o6Wdisl36K1SikmoGNC9M4YyzlOYgu2m8QAjglGAVzfdZQL2TlLLZUo1WA4v2zWZ98pmr0-JL-Y9KKj8fovENNOm5rKzqj7quRosANrxNaBGXo-dQi2jugtQPX6fPaqO_h1xFz0Ph5TqPE1U2xQde2sq6ybM2tnqqkPLpZkbK0JF29jQOcrPvZScUVB0ir4chbYFHNO6P7HpKBPp9RvTslfAa81kIc</recordid><startdate>20230601</startdate><enddate>20230601</enddate><creator>Summers, Lucia</creator><creator>Shallenberger, Alyssa N.</creator><creator>Cruz, John</creator><creator>Fulton, Lawrence V.</creator><general>MDPI AG</general><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FG</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>P5Z</scope><scope>P62</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-8674-5369</orcidid><orcidid>https://orcid.org/0000-0001-8603-1913</orcidid><orcidid>https://orcid.org/0000-0002-0633-0372</orcidid></search><sort><creationdate>20230601</creationdate><title>A Multi-Input Machine Learning Approach to Classifying Sex Trafficking from Online Escort Advertisements</title><author>Summers, Lucia ; Shallenberger, Alyssa N. 
; Cruz, John ; Fulton, Lawrence V.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c361t-b553becc88824062447faac33ed4b0c6d490f242100f76613456fc1c15e589a33</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>(convolutional) neural networks</topic><topic>Artificial neural networks</topic><topic>Classification</topic><topic>Criminal investigations</topic><topic>Datasets</topic><topic>Deep learning</topic><topic>Drug trafficking</topic><topic>Emojis</topic><topic>Human smuggling</topic><topic>Human trafficking</topic><topic>Image classification</topic><topic>Law enforcement</topic><topic>Machine learning</topic><topic>multi-input models</topic><topic>natural language processing</topic><topic>Neural networks</topic><topic>Prostitution</topic><topic>Random access memory</topic><topic>Recurrent neural networks</topic><topic>Regression analysis</topic><topic>Sex</topic><topic>sex trafficking</topic><topic>Support vector machines</topic><topic>Test sets</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Summers, Lucia</creatorcontrib><creatorcontrib>Shallenberger, Alyssa N.</creatorcontrib><creatorcontrib>Cruz, John</creatorcontrib><creatorcontrib>Fulton, Lawrence V.</creatorcontrib><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium 
Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Publicly Available Content (ProQuest)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>Machine learning and knowledge extraction</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Summers, Lucia</au><au>Shallenberger, Alyssa N.</au><au>Cruz, John</au><au>Fulton, Lawrence V.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Multi-Input Machine Learning Approach to Classifying Sex Trafficking from Online Escort Advertisements</atitle><jtitle>Machine learning and knowledge extraction</jtitle><date>2023-06-01</date><risdate>2023</risdate><volume>5</volume><issue>2</issue><spage>460</spage><epage>472</epage><pages>460-472</pages><issn>2504-4990</issn><eissn>2504-4990</eissn><abstract>Sex trafficking victims are often advertised through online escort sites. These ads can be publicly accessed, but law enforcement lacks the resources to comb through hundreds of ads to identify those that may feature sex-trafficked individuals. The purpose of this study was to implement and test multi-input, deep learning (DL) binary classification models to predict the probability of an online escort ad being associated with sex trafficking (ST) activity and aid in the detection and investigation of ST. Data from 12,350 scraped and classified ads were split into training and test sets (80% and 20%, respectively). 
Multi-input models that included recurrent neural networks (RNN) for text classification, convolutional neural networks (CNN, specifically EfficientNetB6 or ENET) for image/emoji classification, and neural networks (NN) for feature classification were trained and used to classify the 20% test set. The best-performing DL model included text and imagery inputs, resulting in an accuracy of 0.82 and an F1 score of 0.70. More importantly, the best classifier (RNN + ENET) correctly identified 14 of 14 sites that had classification probability estimates of 0.845 or greater (1.0 precision); precision was 96% for the multi-input model (NN + RNN + ENET) when only the ads associated with the highest positive classification probabilities (>0.90) were considered (n = 202 ads). The models developed could be productionalized and piloted with criminal investigators, as they could potentially increase their efficiency in identifying potential ST victims.</abstract><cop>Basel</cop><pub>MDPI AG</pub><doi>10.3390/make5020028</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0001-8674-5369</orcidid><orcidid>https://orcid.org/0000-0001-8603-1913</orcidid><orcidid>https://orcid.org/0000-0002-0633-0372</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2504-4990 |
ispartof | Machine learning and knowledge extraction, 2023-06, Vol.5 (2), p.460-472 |
issn | 2504-4990 (ISSN and EISSN) |
language | eng |
recordid | cdi_doaj_primary_oai_doaj_org_article_76fc3eae60ca4bf0a973d6e0c90f7c00 |
source | Publicly Available Content (ProQuest) |
subjects | (convolutional) neural networks Artificial neural networks Classification Criminal investigations Datasets Deep learning Drug trafficking Emojis Human smuggling Human trafficking Image classification Law enforcement Machine learning multi-input models natural language processing Neural networks Prostitution Random access memory Recurrent neural networks Regression analysis Sex sex trafficking Support vector machines Test sets |
title | A Multi-Input Machine Learning Approach to Classifying Sex Trafficking from Online Escort Advertisements |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T01%3A34%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Multi-Input%20Machine%20Learning%20Approach%20to%20Classifying%20Sex%20Trafficking%20from%20Online%20Escort%20Advertisements&rft.jtitle=Machine%20learning%20and%20knowledge%20extraction&rft.au=Summers,%20Lucia&rft.date=2023-06-01&rft.volume=5&rft.issue=2&rft.spage=460&rft.epage=472&rft.pages=460-472&rft.issn=2504-4990&rft.eissn=2504-4990&rft_id=info:doi/10.3390/make5020028&rft_dat=%3Cgale_doaj_%3EA758381051%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c361t-b553becc88824062447faac33ed4b0c6d490f242100f76613456fc1c15e589a33%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2829833926&rft_id=info:pmid/&rft_galeid=A758381051&rfr_iscdi=true |
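The abstract reports precision measured only over ads whose predicted probability clears a cutoff (0.845 or greater for RNN + ENET; above 0.90 for NN + RNN + ENET), alongside an F1 score and an 80/20 train/test split of 12,350 ads. The snippet below is a minimal, hypothetical Python sketch of those calculations, not the authors' code: the function names `split_sizes`, `precision_at_threshold`, and `f1_from_counts` are illustrative, and the probabilities and labels in the usage lines are toy values rather than data from the study.

```python
from typing import Sequence, Tuple


def split_sizes(n_total: int, test_fraction: float = 0.2) -> Tuple[int, int]:
    """Return (n_train, n_test) for a simple hold-out split."""
    n_test = round(n_total * test_fraction)
    return n_total - n_test, n_test


def precision_at_threshold(
    probs: Sequence[float],
    labels: Sequence[int],
    threshold: float,
    inclusive: bool = True,
) -> Tuple[float, int]:
    """Precision over items whose predicted probability clears `threshold`.

    Labels use 1 for the positive class (an ad associated with ST) and 0
    otherwise. Returns (precision, number_flagged). Precision is undefined
    when nothing is flagged; this sketch returns 0.0 in that case.
    """
    # Keep only the true labels of items the model flags as positive.
    flagged = [
        y for p, y in zip(probs, labels)
        if (p >= threshold if inclusive else p > threshold)
    ]
    if not flagged:
        return 0.0, 0
    return sum(flagged) / len(flagged), len(flagged)


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 score: harmonic mean of precision and recall from confusion counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    probs = [0.95, 0.91, 0.88, 0.60, 0.30]
    labels = [1, 1, 0, 1, 0]
    print(split_sizes(12350))  # (9880, 2470)
    print(precision_at_threshold(probs, labels, 0.90, inclusive=False))  # (1.0, 2)
    print(precision_at_threshold(probs, labels, 0.5))  # (0.75, 4)
    print(f1_from_counts(tp=7, fp=3, fn=3))
```

`split_sizes(12350)` gives the 9,880 training and 2,470 test ads implied by the 80/20 split. The `inclusive` flag separates a cutoff of "0.845 or greater" (`inclusive=True`) from "> 0.90" (`inclusive=False`). Returning the number of flagged items alongside precision is deliberate: a precision of 1.0 over 14 items carries much less evidential weight than 96% over 202.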