V2M: Visual 2-Dimensional Mamba for Image Representation Learning
Publication date: 14 Oct 2024
Topic: Object detection
Paper: https://arxiv.org/pdf/2410.10382v1.pdf
GitHub: https://github.com/wangck20/v2m
Description:
In this paper, we propose Visual 2-Dimensional Mamba (V2M), a model that processes image tokens directly in 2D space. We first generalize the state space model (SSM) to two dimensions, so that each new state is generated from the two adjacent states, one along each dimension (e.g., columns and rows). We then build V2M on this 2-dimensional SSM formulation and incorporate Mamba to achieve hardware-efficient parallel processing. The proposed V2M incorporates the 2D locality prior while inheriting the efficiency and input-dependent scalability of Mamba.
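
To make the idea of a state that depends on two adjacent states concrete, here is a minimal NumPy sketch of a 2D recurrence. It is an illustration under stated assumptions, not the formulation or implementation from the paper: the update rule h[i, j] = A_row h[i-1, j] + A_col h[i, j-1] + B x[i, j], the zero boundary states, and the names ssm_2d_scan, A_row, A_col, B, and C are all hypothetical choices made for this example. The parameters are also fixed here, whereas Mamba makes them input-dependent.

```python
import numpy as np

def ssm_2d_scan(x, A_row, A_col, B, C):
    """Naive 2D state-space recurrence over an H x W grid of tokens.

    Assumed shapes (illustrative, not taken from the paper):
      x: (H, W, D_in), A_row and A_col: (N, N), B: (N, D_in), C: (D_out, N)
    """
    H, W, _ = x.shape
    N = A_row.shape[0]
    # Pad with one zero row and one zero column so that border cells
    # see zero states for their missing neighbours.
    h = np.zeros((H + 1, W + 1, N))
    y = np.zeros((H, W, C.shape[0]))
    for i in range(H):
        for j in range(W):
            # The new state mixes the state from the previous row (same column)
            # and the state from the previous column (same row) with the input.
            h[i + 1, j + 1] = (
                A_row @ h[i, j + 1] + A_col @ h[i + 1, j] + B @ x[i, j]
            )
            y[i, j] = C @ h[i + 1, j + 1]
    return y

# Usage: a single impulse at the top-left token, with scalar identity parameters.
x = np.zeros((3, 3, 1))
x[0, 0, 0] = 1.0
one = np.ones((1, 1))
y = ssm_2d_scan(x, one, one, one, one)
print(y[:, :, 0])
```

With identity parameters and a single impulse, each output counts the monotone lattice paths from the top-left token, which shows how information spreads along both axes at once. This nested loop is sequential and is only meant to show the data dependency; it does not reflect the hardware-efficient parallel processing the description attributes to V2M.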