
A Multi-Input Machine Learning Approach to Classifying Sex Trafficking from Online Escort Advertisements

Sex trafficking victims are often advertised through online escort sites. These ads can be publicly accessed, but law enforcement lacks the resources to comb through hundreds of ads to identify those that may feature sex-trafficked individuals. The purpose of this study was to implement and test mul...

Bibliographic Details
Published in: Machine learning and knowledge extraction 2023-06, Vol.5 (2), p.460-472
Main Authors: Summers, Lucia; Shallenberger, Alyssa N.; Cruz, John; Fulton, Lawrence V.
Format: Article
Language: English
cited_by
cites cdi_FETCH-LOGICAL-c361t-b553becc88824062447faac33ed4b0c6d490f242100f76613456fc1c15e589a33
container_end_page 472
container_issue 2
container_start_page 460
container_title Machine learning and knowledge extraction
container_volume 5
creator Summers, Lucia
Shallenberger, Alyssa N.
Cruz, John
Fulton, Lawrence V.
description Sex trafficking victims are often advertised through online escort sites. These ads can be publicly accessed, but law enforcement lacks the resources to comb through hundreds of ads to identify those that may feature sex-trafficked individuals. The purpose of this study was to implement and test multi-input, deep learning (DL) binary classification models to predict the probability of an online escort ad being associated with sex trafficking (ST) activity and aid in the detection and investigation of ST. Data from 12,350 scraped and classified ads were split into training and test sets (80% and 20%, respectively). Multi-input models that included recurrent neural networks (RNN) for text classification, convolutional neural networks (CNN, specifically EfficientNetB6 or ENET) for image/emoji classification, and neural networks (NN) for feature classification were trained and used to classify the 20% test set. The best-performing DL model included text and imagery inputs, resulting in an accuracy of 0.82 and an F1 score of 0.70. More importantly, the best classifier (RNN + ENET) correctly identified 14 of 14 sites that had classification probability estimates of 0.845 or greater (1.0 precision); precision was 96% for the multi-input model (NN + RNN + ENET) when only the ads associated with the highest positive classification probabilities (>0.90) were considered (n = 202 ads). The models developed could be productionalized and piloted with criminal investigators, as they could potentially increase their efficiency in identifying potential ST victims.
doi_str_mv 10.3390/make5020028
format article
publisher Basel: MDPI AG
rights COPYRIGHT 2023 MDPI AG
2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
orcidid https://orcid.org/0000-0001-8674-5369
https://orcid.org/0000-0001-8603-1913
https://orcid.org/0000-0002-0633-0372
date 2023-06-01
tpages 13
eissn 2504-4990
toplevel peer_reviewed
online_resources
oa free_for_read
fulltext fulltext
identifier ISSN: 2504-4990
ispartof Machine learning and knowledge extraction, 2023-06, Vol.5 (2), p.460-472
issn 2504-4990
2504-4990
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_76fc3eae60ca4bf0a973d6e0c90f7c00
source Publicly Available Content (ProQuest)
subjects (convolutional) neural networks
Artificial neural networks
Classification
Criminal investigations
Datasets
Deep learning
Drug trafficking
Emojis
Human smuggling
Human trafficking
Image classification
Law enforcement
Machine learning
multi-input models
natural language processing
Neural networks
Prostitution
Random access memory
Recurrent neural networks
Regression analysis
Sex
sex trafficking
Support vector machines
Test sets
title A Multi-Input Machine Learning Approach to Classifying Sex Trafficking from Online Escort Advertisements
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T01%3A34%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Multi-Input%20Machine%20Learning%20Approach%20to%20Classifying%20Sex%20Trafficking%20from%20Online%20Escort%20Advertisements&rft.jtitle=Machine%20learning%20and%20knowledge%20extraction&rft.au=Summers,%20Lucia&rft.date=2023-06-01&rft.volume=5&rft.issue=2&rft.spage=460&rft.epage=472&rft.pages=460-472&rft.issn=2504-4990&rft.eissn=2504-4990&rft_id=info:doi/10.3390/make5020028&rft_dat=%3Cgale_doaj_%3EA758381051%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c361t-b553becc88824062447faac33ed4b0c6d490f242100f76613456fc1c15e589a33%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2829833926&rft_id=info:pmid/&rft_galeid=A758381051&rfr_iscdi=true