Crash causing information extraction via text mining techniques: Implementation of the Chinese state-related crash narratives
Published in: | Transportation safety and environment Online 2024-09 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Summary: | Crash data are the foundation of traffic safety analysis, helping experts identify the causes of crashes and propose corresponding countermeasures. In China, the accident reporting form (ARF) allows only one crash cause to be reported for each crash, chosen from prespecified crash cause codes. This design may lead to inaccurate crash records, especially for state-related crashes. Crash narratives, the responding officer's written account of what occurred before, during, and after the crash, contain considerable free-form information about the crash occurrence. This study investigated the directly contributing factors behind state-related crashes by developing natural language processing (NLP) and deep learning models based on 1,625 state-related crash narratives. According to the directly causative factors described in the narratives, the state-related crashes were labeled as speed-related, turning-related, or other causes. The crash narratives were then vectorized for model training and frequency analysis. Text-CNN, LSTM, GRU, and SVM models were applied to reclassify the vectorized crash narratives. The text-CNN model performed best in text classification, with an AUC of 0.90 for the micro-average curve. The results can encourage the use of crash narratives and help identify the actual cause hidden behind some inaccurate crash cause designations. |
---|---|
ISSN: | 2631-4428 |
DOI: | 10.1093/tse/tdae018 |
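
The summary reports model quality as a micro-average AUC over three cause classes (speed-related, turning-related, other). The sketch below illustrates how such a figure is commonly computed: each narrative's true label is one-hot encoded, all (narrative, class) pairs are pooled into a single binary problem, and the AUC is taken over the pooled scores. The function names, label strings, and example scores are hypothetical and chosen for illustration only; they are not the study's code or data.

```python
from typing import List, Sequence

# Assumed label names, mirroring the three classes named in the summary.
LABELS = ["speed", "turning", "other"]


def binary_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def micro_average_auc(true_labels: Sequence[str],
                      class_scores: Sequence[Sequence[float]]) -> float:
    """Pool every (narrative, class) pair into one binary problem,
    then compute a single AUC over the pooled scores."""
    flat_true: List[int] = []
    flat_scores: List[float] = []
    for label, scores in zip(true_labels, class_scores):
        for k, name in enumerate(LABELS):
            flat_true.append(1 if label == name else 0)  # one-hot entry
            flat_scores.append(scores[k])
    return binary_auc(flat_true, flat_scores)


# Invented classifier scores for four narratives (columns follow LABELS).
example_true = ["speed", "turning", "other", "speed"]
example_scores = [
    [0.7, 0.2, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.3, 0.5],
    [0.4, 0.5, 0.1],  # misclassified: true class is "speed"
]
example_auc = micro_average_auc(example_true, example_scores)
```

Because the pairs are pooled before the AUC is computed, the micro-average weights every narrative equally, so frequent classes influence the result more than rare ones.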