Text4Seg: Reimagining Image Segmentation as Text Generation
Publication date: 13 Oct 2024
Topic: Semantic Segmentation
Paper: https://arxiv.org/pdf/2410.09855v1.pdf
GitHub: https://github.com/mc-lan/text4seg
Description:
In this paper, we introduce Text4Seg, a novel text-as-mask paradigm that casts image segmentation as a text generation problem, eliminating the need for additional decoders and significantly simplifying the segmentation process. Our key innovation is semantic descriptors, a new textual representation of segmentation masks where each image patch is mapped to its corresponding text label. This unified representation allows seamless integration into the auto-regressive training pipeline of MLLMs for easier optimization. We demonstrate that representing an image with semantic descriptors yields competitive segmentation performance.
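
Illustration (not from the paper): the idea of mapping each image patch to a text label can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function name mask_to_semantic_descriptors, the majority-vote rule for choosing a patch's label, the grid size, and the comma/newline layout of the output text. The authors' actual encoding may differ; see the GitHub repository for their implementation.

```python
from collections import Counter


def mask_to_semantic_descriptors(mask, id_to_label, grid=2, sep=", "):
    """Turn a pixel-level mask into patch-level text labels.

    mask        : 2D list of integer class ids (height x width)
    id_to_label : dict mapping class id -> label string
    grid        : number of patches per side (hypothetical default)

    Assumption: each patch takes the most frequent class id among its
    pixels, and the mask height and width are divisible by `grid`.
    """
    height, width = len(mask), len(mask[0])
    patch_h, patch_w = height // grid, width // grid

    rows = []
    for gy in range(grid):
        row_labels = []
        for gx in range(grid):
            # Collect every pixel id that falls inside this patch.
            pixels = [
                mask[y][x]
                for y in range(gy * patch_h, (gy + 1) * patch_h)
                for x in range(gx * patch_w, (gx + 1) * patch_w)
            ]
            majority_id = Counter(pixels).most_common(1)[0][0]
            row_labels.append(id_to_label[majority_id])
        rows.append(sep.join(row_labels))

    # One line of text per patch row; a language model would be trained
    # to generate a string like this instead of a pixel mask.
    return "\n".join(rows)


# Toy 4x4 mask split into a 2x2 grid of patches.
toy_mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
]
toy_labels = {0: "sky", 1: "tree", 2: "grass"}
print(mask_to_semantic_descriptors(toy_mask, toy_labels, grid=2))
```

Because the output is ordinary text, it can be placed in the same token stream as any other model response, which is the property the description credits for removing the need for an additional decoder.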