SSM-Det: State Space Model-based Object Detector for Intelligent Transportation System

1Beihang University, 2Beijing University of Technology, 3Fujian Agriculture and Forestry University

Abstract

The State Space Model (SSM) has been a growth of interest in computer vision due to its long-term dependency modeling with linear complexity. Despite massive endeavor, it has not been extensively explored in intelligent transportation system (ITS) yet. In this paper, we propose State Space Model-based object Detector (SSM-Det), that is meticulously curated with Direction-aware Visual State Space Encoder (D-VSSE). Specifically, it customizes multi-path pixel exchange and patch re-arrangement via four-direction scanning mechanism, promoting for information communication. To bridge the information bottleneck across high-low level, we further design Split-Fusion (SF) and Skip-Connection (SC) modules for contextual feature propagation before decoding: SF performs multi-channel semantic separation and re-weighting in global-local scope, while SC is responsible for cross-layer feature interaction in a cascaded manner. Empirical studies is conducted on both VisDrone2019-DET and SEU_PML benchmarks, and our proposed SSM-Det reports the state-of-the-art performance against all counterparts by a substantial margin, while maintaining the real-time inference speed. We hope this work contributes to the in-depth investigation of SSM-based detector for intelligent transportation applications.

Pipeline

pipeline

The overall structure of State Space Model-based object Detector (SSM-Det), that is mainly composed of Backbone, Direction-aware Visual State-Space Encoder (D-VSSE), Split-Fusion (SF), Skip-Connection (SC), Transformer Decoder and Header. After feature extraction, D-VSSE performs the long-term dependency modeling at the expense of linear complexity, and SF-SC fuses multi-scale representations for cross-layer information interaction. Finally, Transformer Decoder adopts Uncertainty-minimal Query Selection and Detection Header for high-quality box prediction. The U, D and Rep mark Up-sampled, Down-sampled convolution and Re-parametrization.

Quantitative Results

Visdrone 2019-DET

(a) Performance comparisons with the state-of-the-arts on the Visdrone 2019-DET.

seu_pml

(b) Performance comparisons with the state-of-the-arts on the SEU_PML dataset.

Visualization

Visdrone 2019-DET

(a) Performance comparisons with the state-of-the-arts on the Visdrone 2019-DET.

seu_pml

(b) Performance comparisons with the state-of-the-arts on the SEU_PML dataset.

BibTeX


      @article{wangSSM-Det,
        author={Wang, Jiaqi and Lin, Chunmian and Guo, Kan and Guo, Jiangang},
        journal={IEEE Transactions on Intelligent Transportation Systems}, 
        title={SSM-Det: State Space Model-Based Object Detector for Intelligent Transportation System}, 
        year={2026},
        volume={27},
        number={4},
        pages={4587--4597},
        doi={10.1109/TITS.2025.3640934}}