File Fragment Analysis Using Machine Learning

Bibliographic Details
Main Authors: Jinad, Razaq; Islam, ABM; Shashidhar, Narasimha
Format: Conference Proceeding
Language:English
Description
Summary: Files on a storage device, irrespective of the file system, are unallocated and marked for garbage collection by the system when they are deleted. Once unallocated, the storage locations they occupied become available for use by newer files created on the system. This often leads to remnants of the deleted files persisting on the storage medium. Unless the medium has been forensically erased by overwriting it with random bits, one can confidently expect to find fragments of deleted files on any modern digital storage medium. Suspects also often delete files in a desperate attempt to destroy evidence. In these instances, it is imperative that a forensic examiner can identify any remnant file fragments and, with some luck, assemble them into a composite whole to conduct a sound investigation of a digital crime. Other scenarios where fragments are commonplace include network traffic and malware infection. Most file fragment analysis and restoration techniques in the literature are based on offline algorithms that rely on hardcoded file signature (header/footer) analysis or, in the absence of header/footer information, on byte frequency analysis and similar processes. We contend that file fragment analysis is critical to a sound forensic examination and, as such, demands a more sophisticated recovery strategy. To this end, we propose applying machine learning techniques to this problem. We show the efficacy of byte frequency analysis and grayscale imaging in improving the accuracy of file fragment classification over the traditional techniques in the existing body of work. We also compare the effectiveness of these two analysis methods and their applicability to this problem.
ISSN:2837-0740
DOI:10.1109/DASC/PiCom/CBDCom/Cy59711.2023.10361430
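
The summary names two fragment representations, byte frequency analysis and grayscale imaging, without giving details. As a rough illustration only, the following Python sketch shows one common way such features could be derived from a raw fragment before being passed to a classifier. The function names, the 64-byte row width, and the zero padding are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def byte_frequency(fragment: bytes) -> list[float]:
    """Return a normalized 256-bin histogram of byte values.

    Hypothetical helper: each bin holds the share of the fragment
    made up of that byte value, so the bins sum to 1 for non-empty input.
    """
    if not fragment:
        return [0.0] * 256
    counts = Counter(fragment)
    total = len(fragment)
    return [counts.get(value, 0) / total for value in range(256)]


def to_grayscale_rows(fragment: bytes, width: int = 64) -> list[list[int]]:
    """Lay the fragment out row by row as 0-255 pixel intensities.

    Hypothetical helper: the row width is an assumed parameter, and the
    last row is zero-padded so that every row has the same length.
    """
    padding = -len(fragment) % width
    padded = fragment + bytes(padding)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


if __name__ == "__main__":
    sample = b"\x00\x00\xff\x10"
    histogram = byte_frequency(sample)
    image = to_grayscale_rows(bytes(range(6)), width=4)
    print(histogram[0], histogram[255])
    print(image)
```

Either representation yields a fixed-shape numeric input (a 256-element vector, or a matrix of pixel rows) that a standard classifier could consume; which models and parameters the authors actually used is not stated in this record.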