Toward Dual-View X-Ray Baggage Inspection: A Large-Scale Benchmark and Adaptive Hierarchical Cross Refinement for Prohibited Item Discovery
Published in: IEEE Transactions on Information Forensics and Security, 2024, Vol. 19, pp. 3866-3878
Format: Article
Language: English
Summary: Dual-view baggage inspection has been widely applied in real-world scenarios, where orthogonal viewpoints are deployed to capture diverse and complementary information. Compared with single-view inspection, it can effectively improve identification performance when rotation and overlay hinder the viewability of objects. However, this topic has not been rigorously explored due to the scarcity of datasets. To overcome this limitation, we contribute the first fully public large-scale Dual-view X-ray dataset. Our dataset, named DvXray, contains 16,000 pairs (32,000 X-ray images), in which 5,496 prohibited items across 15 common classes are manually labeled. Besides, we propose an approach named Adaptive Hierarchical Cross Refinement (AHCR) to establish a strong baseline for prohibited item discovery in dual-view X-ray images. AHCR hypothesizes that each input pair is sampled from one mixture distribution; it therefore gathers non-overlapping, position-aware cues along the shared axis and delivers them complementarily to the other view in a hierarchical structure, enriching the feature discriminability of objects of interest against background overlaps. Upon this structure, we propose an adaptive control strategy and a confidence-weighted view fusion term to make it robust to difficult samples. Extensive experiments on DvXray show that AHCR not only brings significant classification gains over various backbones, such as the recent Swin Transformer and ConvNeXt, but also exhibits an impressively better ability to localize objects. In addition, AHCR performs favorably against its counterparts and some recent multi-view learning approaches, moving a step closer towards practical application. Dataset and code are available at https://github.com/Mbwslib/DvXray.
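The confidence-weighted view fusion mentioned in the summary could, for instance, combine per-view class scores weighted by how confident each view's prediction is. A minimal sketch in plain NumPy; using the max softmax probability as the confidence proxy is an assumption here, not the paper's exact AHCR formulation:

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_views(logits_a, logits_b):
    """Confidence-weighted fusion of two views' class logits.

    Each view's confidence is taken as its max softmax probability
    (an assumed proxy; the paper's fusion term may differ).
    """
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    c_a, c_b = p_a.max(axis=-1), p_b.max(axis=-1)
    w_a = c_a / (c_a + c_b)          # normalize the two confidences
    w_b = 1.0 - w_a
    return w_a[..., None] * p_a + w_b[..., None] * p_b

# Example: a confident view (peaked logits) dominates a near-uniform one,
# so the fused prediction follows the confident view's class.
fused = fuse_views(np.array([4.0, 0.0, 0.0]),   # confident: class 0
                   np.array([0.2, 0.3, 0.1]))   # near-uniform
```

Because the fused output is a convex combination of the two probability vectors, it remains a valid distribution, and the more confident view contributes the larger share.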
ISSN: 1556-6013, 1556-6021
DOI: 10.1109/TIFS.2024.3372797