
File Fragment Analysis Using Machine Learning



Bibliographic Details
Main Authors: Jinad, Razaq; Islam, ABM; Shashidhar, Narasimha
Format: Conference Proceeding
Language: English
container_start_page 0956
container_end_page 0962
creator Jinad, Razaq
Islam, ABM
Shashidhar, Narasimha
description When files on a storage device are deleted, irrespective of the file system, their storage is unallocated and marked for garbage collection by the system. Once unallocated, the locations they occupied become available to newly created files. As a result, remnants of deleted files often persist on the storage medium. Unless the medium has been forensically wiped by overwriting it with random bits, one can confidently expect to find fragments of deleted files on any modern digital storage medium. Suspects also often delete files in a desperate attempt to destroy evidence. In these instances, a forensic examiner must be able to identify any remnant file fragments and, with some luck, reassemble them into a composite whole to conduct a sound investigation of a digital crime. Fragments are also commonplace in other scenarios, including network traffic and malware infections. Most file fragment analysis and restoration techniques described in the literature rely on offline algorithms with hardcoded file signature (header/footer) analysis or, in the absence of header/footer information, byte frequency analysis and similar processes. We contend that file fragment analysis is critical to a sound forensic examination and therefore demands a more sophisticated recovery strategy. To this end, we propose applying machine learning techniques to this problem. We show that byte frequency analysis and grayscale imaging improve the classification accuracy of file fragments over the traditional techniques in the existing body of work. We also compare the effectiveness of these two analysis methods and their applicability to this problem.
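The two fragment representations named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the 64-byte image width, and the toy training fragments are hypothetical, and a simple nearest-centroid classifier stands in for whatever machine learning models the paper actually evaluates. byte_frequency computes a normalized 256-bin byte frequency distribution, to_grayscale lays a fragment's bytes out as rows of 8-bit pixel intensities, and classify assigns a fragment to the label whose average byte frequency vector is closest in Euclidean distance.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def byte_frequency(fragment: bytes) -> List[float]:
    """Normalized 256-bin byte frequency distribution of a fragment."""
    if not fragment:
        return [0.0] * 256
    counts = Counter(fragment)  # iterating bytes yields ints 0..255
    return [counts[b] / len(fragment) for b in range(256)]


def to_grayscale(fragment: bytes, width: int = 64) -> List[List[int]]:
    """Lay bytes out row by row as 8-bit pixel intensities (0 = black, 255 = white).

    The default width of 64 is a hypothetical choice; the last row is zero-padded.
    """
    rows = []
    for start in range(0, len(fragment), width):
        row = list(fragment[start:start + width])
        row.extend([0] * (width - len(row)))
        rows.append(row)
    return rows


def train_centroids(samples: Dict[str, Sequence[bytes]]) -> Dict[str, List[float]]:
    """Average the byte frequency vectors of each label's training fragments."""
    centroids = {}
    for label, fragments in samples.items():
        vectors = [byte_frequency(f) for f in fragments]
        centroids[label] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return centroids


def classify(fragment: bytes, centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is nearest in Euclidean distance."""
    vector = byte_frequency(fragment)
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


# Hypothetical toy training data: ASCII text versus uniformly distributed bytes.
centroids = train_centroids({
    "text": [b"the quick brown fox jumps over the lazy dog " * 20,
             b"hello world, this is plain ascii text " * 20],
    "binary": [bytes(range(256)) * 4,
               bytes((i * 7) % 256 for i in range(1024))],
})

print(classify(b"another ordinary english sentence " * 10, centroids))  # → text
print(classify(bytes((i * 13) % 256 for i in range(512)), centroids))  # → binary
print(to_grayscale(b"\x00\xff\x10", width=2))  # → [[0, 255], [16, 0]]
```

The centroid classifier uses only the byte frequency features; in a fuller treatment the grayscale rows would instead feed an image-based model trained on a labeled fragment corpus.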
doi_str_mv 10.1109/DASC/PiCom/CBDCom/Cy59711.2023.10361430
format conference_proceeding
fullrecord <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_10361430</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10361430</ieee_id><sourcerecordid>10361430</sourcerecordid><originalsourceid>FETCH-LOGICAL-i119t-7818dc7ed630b86900898376098e405b06f05493f70d08a444bc42b27afedd113</originalsourceid><addsrcrecordid>eNo1j8tOhEAURFsTEycjf-CCHwDu7dv0Y4mMOCYYTXTWkwaasQ2goWfD30t8rE5SlZxUMZYhpIhgsl3xWmYvvvwcs_Ju94MlNwox5cApRSCJguCCRUYZTTkQCAn8km24JpWAEnDNohA-ANYKDGrasKTyg4ur2Z5GN53jYrLDEnyID8FPp_jJtu9-cnHt7DytwQ276u0QXPTHLTtU92_lPqmfHx7Lok48ojknSqPuWuU6SdBoaQD0OkhJMNoJyBuQPeTCUK-gA22FEE0reMOV7V3XIdKW3f56vXPu-DX70c7L8f8hfQM7FEfn</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>File Fragment Analysis Using Machine Learning</title><source>IEEE Xplore All Conference Series</source><creator>Jinad, Razaq ; Islam, ABM ; Shashidhar, Narasimha</creator><creatorcontrib>Jinad, Razaq ; Islam, ABM ; Shashidhar, Narasimha</creatorcontrib><description>Files on a storage device, irrespective of the file system, when deleted, are unallocated and marked for garbage collection by the system. Once unallocated, the storage locations they once occupied are now available for use by newer files created in the system. This often leads to remnants of the deleted files persisting on the storage medium. Unless forensically erased by rewriting random bits on the storage medium, one can confidently expect to find fragments of deleted files on any modern digital storage medium. Often, suspects also delete files in a desperate attempt to destroy evidence. In these instances, it becomes imperative that a forensic examiner can identify any remnant file fragments and, with some luck, assemble them into a composite whole to conduct a sound investigation of a digital crime. 
Other scenarios where fragments are commonplace include network traffic, infection by malware, and other related circumstances. Most file fragment analysis and restoration techniques illustrated in the literature are based on offline algorithms, with hardcoded file signature (header/footer) analysis, or in the absence of header/footer information, byte frequency analysis, and similar processes. We contend that file fragment analysis is critical to a sound forensic examination and, as such, demands a more sophisticated recovery strategy. To this end, in this research, we propose to apply machine learning techniques to solve this problem. We show the efficacy of byte frequency analysis and grayscale imaging in improving the classification of the accuracy of file fragments over the traditional techniques in the existing body of work. We also demonstrate the comparative effectiveness of these two analysis methods and their applicability to this problem.</description><identifier>EISSN: 2837-0740</identifier><identifier>EISBN: 9798350304602</identifier><identifier>DOI: 10.1109/DASC/PiCom/CBDCom/Cy59711.2023.10361430</identifier><language>eng</language><publisher>IEEE</publisher><subject>Analytical models ; Byte frequency analysis ; File fragment ; File systems ; Forensics ; Gray-scale ; Grayscale imaging ; Imaging ; Machine learning ; Telecommunication traffic</subject><ispartof>2023 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), 2023, 
p.0956-0962</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10361430$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,27902,54530,54907</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10361430$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Jinad, Razaq</creatorcontrib><creatorcontrib>Islam, ABM</creatorcontrib><creatorcontrib>Shashidhar, Narasimha</creatorcontrib><title>File Fragment Analysis Using Machine Learning</title><title>2023 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech)</title><addtitle>DASC/PICOM/CBDCOM/CYBERSCITECH</addtitle><description>Files on a storage device, irrespective of the file system, when deleted, are unallocated and marked for garbage collection by the system. Once unallocated, the storage locations they once occupied are now available for use by newer files created in the system. This often leads to remnants of the deleted files persisting on the storage medium. Unless forensically erased by rewriting random bits on the storage medium, one can confidently expect to find fragments of deleted files on any modern digital storage medium. Often, suspects also delete files in a desperate attempt to destroy evidence. In these instances, it becomes imperative that a forensic examiner can identify any remnant file fragments and, with some luck, assemble them into a composite whole to conduct a sound investigation of a digital crime. 
Other scenarios where fragments are commonplace include network traffic, infection by malware, and other related circumstances. Most file fragment analysis and restoration techniques illustrated in the literature are based on offline algorithms, with hardcoded file signature (header/footer) analysis, or in the absence of header/footer information, byte frequency analysis, and similar processes. We contend that file fragment analysis is critical to a sound forensic examination and, as such, demands a more sophisticated recovery strategy. To this end, in this research, we propose to apply machine learning techniques to solve this problem. We show the efficacy of byte frequency analysis and grayscale imaging in improving the classification of the accuracy of file fragments over the traditional techniques in the existing body of work. We also demonstrate the comparative effectiveness of these two analysis methods and their applicability to this problem.</description><subject>Analytical models</subject><subject>Byte frequency analysis</subject><subject>File fragment</subject><subject>File systems</subject><subject>Forensics</subject><subject>Gray-scale</subject><subject>Grayscale imaging</subject><subject>Imaging</subject><subject>Machine learning</subject><subject>Telecommunication traffic</subject><issn>2837-0740</issn><isbn>9798350304602</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2023</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNo1j8tOhEAURFsTEycjf-CCHwDu7dv0Y4mMOCYYTXTWkwaasQ2goWfD30t8rE5SlZxUMZYhpIhgsl3xWmYvvvwcs_Ju94MlNwox5cApRSCJguCCRUYZTTkQCAn8km24JpWAEnDNohA-ANYKDGrasKTyg4ur2Z5GN53jYrLDEnyID8FPp_jJtu9-cnHt7DytwQ276u0QXPTHLTtU92_lPqmfHx7Lok48ojknSqPuWuU6SdBoaQD0OkhJMNoJyBuQPeTCUK-gA22FEE0reMOV7V3XIdKW3f56vXPu-DX70c7L8f8hfQM7FEfn</recordid><startdate>20231114</startdate><enddate>20231114</enddate><creator>Jinad, Razaq</creator><creator>Islam, 
ABM</creator><creator>Shashidhar, Narasimha</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>20231114</creationdate><title>File Fragment Analysis Using Machine Learning</title><author>Jinad, Razaq ; Islam, ABM ; Shashidhar, Narasimha</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i119t-7818dc7ed630b86900898376098e405b06f05493f70d08a444bc42b27afedd113</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Analytical models</topic><topic>Byte frequency analysis</topic><topic>File fragment</topic><topic>File systems</topic><topic>Forensics</topic><topic>Gray-scale</topic><topic>Grayscale imaging</topic><topic>Imaging</topic><topic>Machine learning</topic><topic>Telecommunication traffic</topic><toplevel>online_resources</toplevel><creatorcontrib>Jinad, Razaq</creatorcontrib><creatorcontrib>Islam, ABM</creatorcontrib><creatorcontrib>Shashidhar, Narasimha</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library Online</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Jinad, Razaq</au><au>Islam, ABM</au><au>Shashidhar, Narasimha</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>File Fragment Analysis Using Machine Learning</atitle><btitle>2023 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl 
Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech)</btitle><stitle>DASC/PICOM/CBDCOM/CYBERSCITECH</stitle><date>2023-11-14</date><risdate>2023</risdate><spage>0956</spage><epage>0962</epage><pages>0956-0962</pages><eissn>2837-0740</eissn><eisbn>9798350304602</eisbn><abstract>Files on a storage device, irrespective of the file system, when deleted, are unallocated and marked for garbage collection by the system. Once unallocated, the storage locations they once occupied are now available for use by newer files created in the system. This often leads to remnants of the deleted files persisting on the storage medium. Unless forensically erased by rewriting random bits on the storage medium, one can confidently expect to find fragments of deleted files on any modern digital storage medium. Often, suspects also delete files in a desperate attempt to destroy evidence. In these instances, it becomes imperative that a forensic examiner can identify any remnant file fragments and, with some luck, assemble them into a composite whole to conduct a sound investigation of a digital crime. Other scenarios where fragments are commonplace include network traffic, infection by malware, and other related circumstances. Most file fragment analysis and restoration techniques illustrated in the literature are based on offline algorithms, with hardcoded file signature (header/footer) analysis, or in the absence of header/footer information, byte frequency analysis, and similar processes. We contend that file fragment analysis is critical to a sound forensic examination and, as such, demands a more sophisticated recovery strategy. To this end, in this research, we propose to apply machine learning techniques to solve this problem. We show the efficacy of byte frequency analysis and grayscale imaging in improving the classification of the accuracy of file fragments over the traditional techniques in the existing body of work. 
We also demonstrate the comparative effectiveness of these two analysis methods and their applicability to this problem.</abstract><pub>IEEE</pub><doi>10.1109/DASC/PiCom/CBDCom/Cy59711.2023.10361430</doi><tpages>7</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier EISSN: 2837-0740
ispartof 2023 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), 2023, p.0956-0962
issn 2837-0740
language eng
recordid cdi_ieee_primary_10361430
source IEEE Xplore All Conference Series
subjects Analytical models
Byte frequency analysis
File fragment
File systems
Forensics
Gray-scale
Grayscale imaging
Imaging
Machine learning
Telecommunication traffic
title File Fragment Analysis Using Machine Learning
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-06T03%3A04%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=File%20Fragment%20Analysis%20Using%20Machine%20Learning&rft.btitle=2023%20IEEE%20Intl%20Conf%20on%20Dependable,%20Autonomic%20and%20Secure%20Computing,%20Intl%20Conf%20on%20Pervasive%20Intelligence%20and%20Computing,%20Intl%20Conf%20on%20Cloud%20and%20Big%20Data%20Computing,%20Intl%20Conf%20on%20Cyber%20Science%20and%20Technology%20Congress%20(DASC/PiCom/CBDCom/CyberSciTech)&rft.au=Jinad,%20Razaq&rft.date=2023-11-14&rft.spage=0956&rft.epage=0962&rft.pages=0956-0962&rft.eissn=2837-0740&rft_id=info:doi/10.1109/DASC/PiCom/CBDCom/Cy59711.2023.10361430&rft.eisbn=9798350304602&rft_dat=%3Cieee_CHZPO%3E10361430%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i119t-7818dc7ed630b86900898376098e405b06f05493f70d08a444bc42b27afedd113%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10361430&rfr_iscdi=true