
Crash causing information extraction via text mining techniques: Implementation of the Chinese state-related crash narratives

Bibliographic Details
Published in: Transportation Safety and Environment (Online), 2024-09
Main Authors: Zou, Guoqing; Huang, Helai; Zhou, Hanchu; Li, Jipu
Format: Article
Language: English
Description
Summary: Crash data are the foundation of traffic safety analysis, helping experts identify the causes of crashes and propose corresponding countermeasures. In China, the accident reporting form (ARF) allows only one crash cause to be recorded for each crash, selected from a set of prespecified crash cause codes. This design may lead to inaccurate crash records, especially for state-related crashes. Crash narratives, the responding officer's written account of what occurred before, during, and after the crash, contain considerable free-form information about how the crash occurred. This study investigated the direct contributory factors behind state-related crashes by developing natural language processing (NLP) and deep learning models based on 1 625 state-related crash narratives. According to the direct causative factors described in the narratives, the state-related crashes were labeled as speed-related, turning-related, or other causes. The crash narratives were then vectorized for model training and frequency analysis. Text-CNN, LSTM, GRU, and SVM models were applied to reclassify the vectorized crash narratives. The results showed that the text-CNN model achieved the best text classification performance, reaching an AUC of 0.90 on the micro-average ROC curve. The findings can promote the use of crash narratives and help identify the actual causes hidden behind inaccurately recorded crash causes.
ISSN: 2631-4428
DOI: 10.1093/tse/tdae018
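The summary reports a micro-average AUC of 0.90 for the text-CNN classifier. The Python sketch below illustrates what micro-averaging means for a three-class cause label; it is not the authors' evaluation code. It assumes each model outputs one score per class for every narrative. It binarizes each true label one-vs-rest, pools all (narrative, class) pairs, and computes a single ROC AUC using the pairwise (Mann-Whitney) formulation, where ties count as half. The class names `speed`, `turning`, `other` and all scores are hypothetical toy values.

```python
def binary_auc(labels, scores):
    """ROC AUC for 0/1 labels: the probability that a randomly chosen
    positive is scored above a randomly chosen negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def micro_average_auc(y_true, y_score, classes):
    """Micro-averaged one-vs-rest AUC.

    y_true:  one class name per narrative
    y_score: one list of per-class scores per narrative, ordered like `classes`
    Every (narrative, class) pair becomes one binary example, and a single
    AUC is computed over all pooled examples.
    """
    pooled_labels, pooled_scores = [], []
    for label, scores in zip(y_true, y_score):
        if len(scores) != len(classes):
            raise ValueError("each score vector must have one entry per class")
        for cls, score in zip(classes, scores):
            pooled_labels.append(1 if label == cls else 0)
            pooled_scores.append(score)
    return binary_auc(pooled_labels, pooled_scores)


# Hypothetical toy example: three narratives, three cause classes.
classes = ["speed", "turning", "other"]
y_true = ["speed", "turning", "other"]
y_score = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.5, 0.2, 0.3],  # true "other" score (0.3) falls below a wrong-class score (0.5)
]
print(micro_average_auc(y_true, y_score, classes))  # 17/18, about 0.944
```

Pooling weights every (narrative, class) decision equally, so frequent classes influence the result more than under macro-averaging, which averages per-class AUCs. The pairwise loop is quadratic and meant only for illustration. With scikit-learn, the same pooled quantity can be obtained by passing a binarized (label-indicator) `y_true` to `roc_auc_score` with `average="micro"`.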