A criteria-based classification model using augmentation and contrastive learning for analyzing imbalanced statement data

Bibliographic Details
Published in: Heliyon 2024-06, Vol.10 (12), p.e32929, Article e32929
Main Authors: Shin, Junho; Kwak, Jinhee; Jung, Jaehee
Format: Article
Language: English
Description
Summary: Criteria Based Content Analysis (CBCA) is a forensic tool that analyzes victim statements. It involves the categorization of victims' statements into 19 distinct criteria classifications, playing a crucial role in evaluating the authenticity of testimonies by discerning whether they are rooted in genuine experiences or fabricated accounts. The exclusion of subjective opinions becomes imperative to assess statements through this forensic tool objectively. This study proposes developing an objective classification model for CBCA-based statement analysis using natural language processing techniques. Nevertheless, achieving optimal classification performance proves challenging due to imbalances in data distribution among the various criterion classifications. To enhance the accuracy and reliability of the classification model, this research employs data augmentation techniques and dual contrastive learning methods for fine-tuning the RoBERTa language model. Furthermore, model-based optimization techniques are also applied to identify augmented hyper-parameters and maximize the model's classification performance. The study's findings, including an 8.5% improvement in macro F1 score compared to human classification results, a 24% improvement in macro F1 score, and a 13% improvement in accuracy compared to previous human classification results, suggest that the proposed model is highly effective in reducing the influence of human subjectivity in statement analysis. The proposed model has significant implications for legal proceedings and criminal investigations, as it can provide a more objective and reliable method for evaluating the credibility of victim statements. Reducing human subjectivity in the statement analysis process can increase the accuracy of verdicts and help ensure that justice is served.
ISSN: 2405-8440
DOI: 10.1016/j.heliyon.2024.e32929