Data science research papers

Channel, 2.09K subscribers

Stay updated with the latest data science research! Join our Telegram channel for quick insights, cutting-edge papers, and trends in AI, machine learning, and big data.

Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation

Publication date: 11 Oct 2024

Topic: Semantic Segmentation

Paper: https://arxiv.org/pdf/2410.08613v1.pdf

GitHub: https://github.com/hit-sirs/crobim

Description:

In contrast to natural scenarios, expressions in referring remote sensing image segmentation (RRSIS) often involve complex geospatial relationships, with target objects of interest that vary significantly in scale and lack visual saliency, thereby increasing the difficulty of achieving precise segmentation. To address these challenges, a novel RRSIS framework is proposed, termed the cross-modal bidirectional interaction model (CroBIM). Specifically, a context-aware prompt modulation (CAPM) module is designed to integrate spatial positional relationships and task-specific knowledge into the linguistic features, thereby enhancing the ability to capture the target object.

Hyper-Representations: Learning from Populations of Neural Networks

Publication date: 7 Oct 2024

Topic: Representation Learning

Paper: https://arxiv.org/pdf/2410.05107v1.pdf

GitHub: https://github.com/hsg-aiml/sane

Description:

This thesis addresses the challenge of understanding Neural Networks through the lens of their most fundamental component: the weights, which encapsulate the learned information and determine the model behavior. At the core of this thesis is a fundamental question: Can we learn general, task-agnostic representations from populations of Neural Network models? The key contribution of this thesis toward answering that question is hyper-representations, a self-supervised method to learn representations of NN weights. Work in this thesis finds that trained NN models indeed occupy meaningful structures in the weight space that can be learned and used.

CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models

Publication date: 17 Oct 2024

Topic: Contrastive Learning

Paper: https://arxiv.org/pdf/2410.13267v1.pdf

GitHub: https://github.com/sanderwood/clamp2

Description:

We introduce CLaMP 2, a system compatible with 101 languages that supports both ABC notation (a text-based musical notation format) and MIDI (Musical Instrument Digital Interface) for music information retrieval. CLaMP 2, pre-trained on 1.5 million ABC-MIDI-text triplets, includes a multilingual text encoder and a multimodal music encoder aligned via contrastive learning. By leveraging large language models, we obtain refined and consistent multilingual descriptions at scale, significantly reducing textual noise and balancing language distribution.

V2M: Visual 2-Dimensional Mamba for Image Representation Learning

Publication date: 14 Oct 2024

Topic: Object Detection

Paper: https://arxiv.org/pdf/2410.10382v1.pdf

GitHub: https://github.com/wangck20/v2m

Description:

In this paper, we propose a Visual 2-Dimensional Mamba (V2M) model as a complete solution, which directly processes image tokens in 2D space. We first generalize the state space model (SSM) to 2-dimensional space, generating the next state from two adjacent states on both dimensions (e.g., columns and rows). We then construct V2M based on the 2-dimensional SSM formulation and incorporate Mamba to achieve hardware-efficient parallel processing. The proposed V2M effectively incorporates the 2D locality prior yet inherits the efficiency and input-dependent scalability of Mamba.
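
The 2D recurrence described above can be illustrated with a toy scalar version, in which each state depends on the state above and the state to the left. This is only a sketch: the function name `scan_2d` and the fixed coefficients `a_up`, `a_left`, and `b` are hypothetical, and the paper's input-dependent parameters and hardware-efficient parallel scan are not modeled.

```python
def scan_2d(x, a_up=0.5, a_left=0.5, b=1.0):
    """Run a scalar 2D linear recurrence over a grid of inputs.

    h[i][j] = a_up * h[i-1][j] + a_left * h[i][j-1] + b * x[i][j]
    States outside the grid are treated as zero.
    """
    rows, cols = len(x), len(x[0])
    h = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            up = h[i - 1][j] if i > 0 else 0.0
            left = h[i][j - 1] if j > 0 else 0.0
            h[i][j] = a_up * up + a_left * left + b * x[i][j]
    return h


# A single impulse in the top-left corner spreads along both dimensions.
grid = [[1.0, 0.0], [0.0, 0.0]]
states = scan_2d(grid)
```

The bottom-right state receives contributions through both the row path and the column path, which is the 2D locality that a 1D scan over flattened tokens would not capture directly.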

SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification

Publication date: 7 Oct 2024

Topic: Image Classification

Paper: https://arxiv.org/pdf/2410.05057v1.pdf

GitHub: https://github.com/jimmyxu123/select

Description:

In this work, we take steps towards a formal evaluation of data curation strategies and introduce SELECT, the first large-scale benchmark of curation strategies for image classification. In order to generate baseline methods for the SELECT benchmark, we create a new dataset, ImageNet++, which constitutes the largest superset of ImageNet-1K to date. Our dataset extends ImageNet with 5 new training-data shifts, each approximately the size of ImageNet-1K itself, and each assembled using a distinct curation strategy. We evaluate our data curation baselines in two ways: (i) using each training-data shift to train identical image classification models from scratch, and (ii) using the data itself to fit a pretrained self-supervised representation.

Text4Seg: Reimagining Image Segmentation as Text Generation

Publication date: 13 Oct 2024

Topic: Semantic Segmentation

Paper: https://arxiv.org/pdf/2410.09855v1.pdf

GitHub: https://github.com/mc-lan/text4seg

Description:

In this paper, we introduce Text4Seg, a novel text-as-mask paradigm that casts image segmentation as a text generation problem, eliminating the need for additional decoders and significantly simplifying the segmentation process. Our key innovation is semantic descriptors, a new textual representation of segmentation masks where each image patch is mapped to its corresponding text label. This unified representation allows seamless integration into the auto-regressive training pipeline of MLLMs for easier optimization. We demonstrate that representing an image with semantic descriptors yields competitive segmentation performance.
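
To make the patch-to-label idea concrete, here is a minimal sketch that serializes a grid of per-patch labels into text and parses it back. The function names and the separators (`", "` between patches, newline between rows) are assumptions for illustration; the exact text format used by Text4Seg may differ.

```python
def to_semantic_descriptors(label_grid, sep=", ", row_sep="\n"):
    """Serialize a grid of per-patch text labels into a single string."""
    return row_sep.join(sep.join(row) for row in label_grid)


def from_semantic_descriptors(text, sep=", ", row_sep="\n"):
    """Parse the string back into a grid of per-patch labels."""
    return [row.split(sep) for row in text.split(row_sep)]


# Toy 2x3 patch grid; labels must not contain the separators themselves.
patch_labels = [
    ["sky", "sky", "sky"],
    ["tree", "dog", "grass"],
]
text = to_semantic_descriptors(patch_labels)
recovered = from_semantic_descriptors(text)
```

Because the mask is now plain text, a language model can emit it token by token, and the coarse mask is recovered by parsing the generated string.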

Explanation-Preserving Augmentation for Semi-Supervised Graph Representation Learning

Publication date: 16 Oct 2024

Topic: Representation Learning

Paper: https://arxiv.org/pdf/2410.12657v1.pdf

GitHub: https://github.com/junxia97/simgrace

Description:

In this paper, we propose a novel method, Explanation-Preserving Augmentation (EPA), that leverages graph explanation techniques for generating augmented graphs that can bridge the gap between semantics-preservation and data-perturbation. EPA first uses a small number of labels to train a graph explainer to infer sub-structures (explanations) that are most relevant to a graph's semantics. These explanations are then used to generate semantics-preserving augmentations for self-supervised GRL, namely EPA-GRL. We demonstrate theoretically, using an analytical example, and through extensive experiments on a variety of benchmark datasets that EPA-GRL outperforms the state-of-the-art (SOTA) GRL methods, which are built upon semantics-agnostic data augmentations.
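
The augmentation step can be sketched as follows, assuming the explainer has already marked a set of edges as the explanation. The function name, the edge-dropping perturbation, and the `drop_prob` parameter are hypothetical choices for illustration; the paper's actual perturbation of the non-explanation part may differ.

```python
import random


def explanation_preserving_augment(edges, explanation_edges, drop_prob, rng):
    """Randomly drop edges, but never drop an edge that belongs to the explanation."""
    protected = set(explanation_edges)
    return [e for e in edges if e in protected or rng.random() >= drop_prob]


rng = random.Random(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
explanation = [(0, 1), (1, 2)]  # assumed output of a trained graph explainer

augmented = explanation_preserving_augment(edges, explanation, drop_prob=0.5, rng=rng)
```

Whatever happens to the remaining edges, the sub-structure judged relevant to the graph's semantics survives in every augmented view.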

SimCSE: Simple Contrastive Learning of Sentence Embeddings

Publication date: EMNLP 2021

Topic: Contrastive Learning

Paper: https://arxiv.org/pdf/2104.08821v4.pdf

GitHub: https://github.com/princeton-nlp/SimCSE

Description:

This paper presents SimCSE, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings. We first describe an unsupervised approach, which takes an input sentence and predicts itself in a contrastive objective, with only standard dropout used as noise. This simple method works surprisingly well, performing on par with previous supervised counterparts. We find that dropout acts as minimal data augmentation, and removing it leads to a representation collapse. Then, we propose a supervised approach, which incorporates annotated pairs from natural language inference datasets into our contrastive learning framework by using "entailment" pairs as positives and "contradiction" pairs as hard negatives.
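
A minimal sketch of the unsupervised objective, under simplifying assumptions: random vectors stand in for sentence features, dropout is applied directly to them rather than inside an encoder, and the temperature of 0.05 is an assumed value. The helper names are hypothetical.

```python
import math
import random


def dropout(vec, p, rng):
    """Zero each element with probability p and rescale the rest (inverted dropout)."""
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in vec]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def info_nce(views_a, views_b, temperature=0.05):
    """Mean contrastive loss: row i of views_a should match row i of views_b."""
    losses = []
    for i, anchor in enumerate(views_a):
        logits = [cosine(anchor, other) / temperature for other in views_b]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


rng = random.Random(0)
features = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(4)]
# Two passes with independent dropout masks give two views of each "sentence".
view_1 = [dropout(f, 0.1, rng) for f in features]
view_2 = [dropout(f, 0.1, rng) for f in features]
loss = info_nce(view_1, view_2)
```

The two views of the same input form the positive pair, and the other inputs in the batch act as negatives. The supervised variant would add one more logit per anchor for its "contradiction" hard negative.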

OrientedFormer: An End-to-End Transformer-Based Oriented Object Detector in Remote Sensing Images

Publication date: IEEE Transactions on Geoscience and Remote Sensing 2024

Topic: Object Detection

Paper: https://arxiv.org/pdf/2409.19648v1.pdf

GitHub: https://github.com/wokaikaixinxin/OrientedFormer

Description:

In this paper, we propose an end-to-end transformer-based oriented object detector, consisting of three dedicated modules to address these issues. First, Gaussian positional encoding is proposed to encode the angle, position, and size of oriented boxes using Gaussian distributions. Second, Wasserstein self-attention is proposed to introduce geometric relations and facilitate interaction between content and positional queries by utilizing Gaussian Wasserstein distance scores. Third, oriented cross-attention is proposed to align values and positional queries by rotating sampling points around the positional query according to their angles.
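
As a rough illustration of the Gaussian view of oriented boxes, the sketch below converts a box to a 2D Gaussian and computes the squared 2-Wasserstein distance between two such Gaussians in closed form. The covariance convention (half-width and half-height as standard deviations) is an assumption borrowed from common practice, the function names are hypothetical, and how the paper turns distances into attention scores is not shown.

```python
import math


def box_to_gaussian(cx, cy, w, h, angle):
    """Mean is the box center; covariance is R diag(w^2/4, h^2/4) R^T (assumed convention)."""
    c, s = math.cos(angle), math.sin(angle)
    a, b = (w / 2.0) ** 2, (h / 2.0) ** 2
    sxx = a * c * c + b * s * s
    syy = a * s * s + b * c * c
    sxy = (a - b) * c * s
    return (cx, cy), (sxx, sxy, syy)


def wasserstein2_squared(g1, g2):
    """Squared 2-Wasserstein distance between two 2D Gaussians.

    For 2x2 covariances the trace of the matrix square root has the closed form
    sqrt(tr(S1 S2) + 2 * sqrt(det(S1) * det(S2))).
    """
    (m1, (a1, b1, c1)), (m2, (a2, b2, c2)) = g1, g2
    mean_term = (m1[0] - m2[0]) ** 2 + (m1[1] - m2[1]) ** 2
    trace_sum = a1 + c1 + a2 + c2
    trace_prod = a1 * a2 + 2.0 * b1 * b2 + c1 * c2  # tr(S1 S2) for symmetric matrices
    det_prod = (a1 * c1 - b1 * b1) * (a2 * c2 - b2 * b2)
    cross = math.sqrt(max(trace_prod + 2.0 * math.sqrt(max(det_prod, 0.0)), 0.0))
    return mean_term + max(trace_sum - 2.0 * cross, 0.0)


box = box_to_gaussian(0.0, 0.0, 4.0, 2.0, 0.3)
shifted = box_to_gaussian(3.0, 4.0, 4.0, 2.0, 0.3)
distance_sq = wasserstein2_squared(box, shifted)
```

One property of this representation is that a box rotated by 180 degrees maps to the same Gaussian, so its distance to the original is zero.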

KPCA-CAM: Visual Explainability of Deep Computer Vision Models using Kernel PCA

Publication date: 30 Sep 2024

Topic: Image Classification

Paper: https://arxiv.org/pdf/2410.00267v1.pdf

GitHub: https://github.com/jacobgil/pytorch-grad-cam

Description:

This research introduces KPCA-CAM, a technique designed to enhance the interpretability of Convolutional Neural Networks (CNNs) through improved class activation maps. KPCA-CAM leverages Principal Component Analysis (PCA) with the kernel trick to capture nonlinear relationships within CNN activations more effectively. By mapping data into higher-dimensional spaces with kernel functions and extracting principal components from this transformed hyperplane, KPCA-CAM provides more accurate representations of the underlying data manifold. This enables a deeper understanding of the features influencing CNN decisions.
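
The kernel PCA step can be sketched in plain Python, assuming each spatial location of a feature map is treated as one sample whose features are its channel activations. The RBF kernel, the `gamma` value, the power-iteration solver, and all function names are illustrative assumptions, not the paper's implementation.

```python
import math


def rbf_kernel_matrix(X, gamma):
    n = len(X)
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
             for j in range(n)] for i in range(n)]


def center_kernel(K):
    """Center the kernel matrix in feature space (K is symmetric)."""
    n = len(K)
    row_mean = [sum(r) / n for r in K]
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def first_kernel_component(X, gamma=1.0, iters=500):
    """Project each sample onto the first kernel principal component."""
    Kc = center_kernel(rbf_kernel_matrix(X, gamma))
    n = len(Kc)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    eigenvalue = 0.0
    for _ in range(iters):
        w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
        eigenvalue = norm  # converges to the top eigenvalue once v has unit length
    return [math.sqrt(eigenvalue) * c for c in v]


# Four spatial locations with two channels each: two clearly separated groups.
activations = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
scores = first_kernel_component(activations)
```

Reshaping `scores` back to the spatial grid would give a coarse activation map. Note that the sign of a principal component is arbitrary, so a real pipeline needs a rule for orienting it.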

MedUniSeg: 2D and 3D Medical Image Segmentation via a Prompt-driven Universal Model

Publication date: 8 Oct 2024

Topic: Semantic Segmentation

Paper: https://arxiv.org/pdf/2410.05905v1.pdf

GitHub: https://github.com/yeerwen/uniseg

Description:

We evaluate MedUniSeg on a comprehensive multi-modal upstream dataset consisting of 17 sub-datasets. The results demonstrate that MedUniSeg achieves superior multi-task segmentation performance, attaining a 1.2% improvement in the mean Dice score across the 17 upstream tasks compared to nnUNet baselines, while using less than 1/10 of the parameters. For tasks that underperform during the initial multi-task joint training, we freeze MedUniSeg and introduce new modules to re-learn these tasks. This approach yields an enhanced version, MedUniSeg*, which consistently outperforms MedUniSeg across all tasks.
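
For reference, the Dice score mentioned above measures the overlap between a predicted mask and a ground-truth mask. A minimal sketch on flattened binary masks follows; the function names and the convention of returning 1.0 for two empty masks are assumptions.

```python
def dice_score(pred, target):
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for binary masks."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_dice(mask_pairs):
    """Average the Dice score over several (pred, target) pairs, e.g. one per task."""
    return sum(dice_score(p, t) for p, t in mask_pairs) / len(mask_pairs)


pairs = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # partial overlap
    ([0, 1, 1, 0], [0, 1, 1, 0]),  # perfect overlap
]
average = mean_dice(pairs)
```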

Unsupervised Representation Learning from Sparse Transformation Analysis

Publication date: 7 Oct 2024

Topic: Representation Learning

Paper: https://arxiv.org/pdf/2410.05564v1.pdf

GitHub: https://github.com/kingjamessong/latent-flow

Description:

In this paper we propose to learn representations from sequence data by factorizing the transformations of the latent variables into sparse components. Input data are first encoded as distributions of latent activations and subsequently transformed using a probability flow model, before being decoded to predict a future input state. The flow model is decomposed into a number of rotational (divergence-free) vector fields and a number of potential flow (curl-free) fields. Our sparsity prior encourages only a small number of these fields to be active at any instant and infers the speed with which the probability flows along these fields.

Improved Baselines with Momentum Contrastive Learning

Publication date: 9 Mar 2020

Topic: Contrastive Learning

Paper: https://arxiv.org/pdf/2003.04297v1.pdf

GitHub: https://github.com/facebookresearch/moco

Description:

Contrastive unsupervised learning has recently shown encouraging progress, e.g., in Momentum Contrast (MoCo) and SimCLR. In this note, we verify the effectiveness of two of SimCLR's design improvements by implementing them in the MoCo framework. With simple modifications to MoCo, namely using an MLP projection head and more data augmentation, we establish stronger baselines that outperform SimCLR and do not require large training batches. We hope this will make state-of-the-art unsupervised learning research more accessible. Code will be made public.

HazyDet: Open-source Benchmark for Drone-view Object Detection with Depth-cues in Hazy Scenes

Publication date: 30 Sep 2024

Topic: Object Detection

Paper: https://arxiv.org/pdf/2409.19833v1.pdf

GitHub: https://github.com/grokcv/hazydet

Description:

We introduce HazyDet, a large-scale dataset tailored for drone-based object detection in hazy scenes. It encompasses 383,000 real-world instances, collected from both naturally hazy environments and normal scenes with synthetically imposed haze effects to simulate adverse weather conditions. By observing the significant variations in object scale and clarity under different depth and haze conditions, we designed a Depth Conditioned Detector (DeCoDet) to incorporate this prior knowledge. DeCoDet features a Multi-scale Depth-aware Detection Head that seamlessly integrates depth perception, with the resulting depth cues harnessed by a dynamic Depth Condition Kernel module.

One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation

Publication date: 9 Oct 2024

Topic: Image Classification

Paper: https://arxiv.org/pdf/2410.07170v1.pdf

GitHub: https://github.com/ml-jku/EVA

Description:

We propose to enhance LoRA by initializing the new weights in a data-driven manner, computing a singular value decomposition on minibatches of activation vectors. We then initialize the LoRA matrices with the obtained right-singular vectors, redistribute ranks among all weight matrices to explain the maximal amount of variance, and continue with the standard LoRA fine-tuning procedure. This results in our new method, Explained Variance Adaptation (EVA). We apply EVA to a variety of fine-tuning tasks ranging from language generation and understanding to image classification and reinforcement learning. EVA exhibits faster convergence than competitors and attains the highest average score across a multitude of tasks per domain.
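
A minimal sketch of the data-driven initialization, assuming a single weight matrix and a toy activation minibatch. Power iteration with deflation stands in for a full SVD routine, the zero initialization of `lora_B` is an assumption carried over from standard LoRA, the rank redistribution step is not implemented, and all names are hypothetical.

```python
import math


def top_right_singular_vectors(X, rank, iters=200):
    """Approximate the leading right-singular vectors of X.

    Uses power iteration on X^T X with deflation. Returns a list of
    (vector, squared_singular_value) pairs, largest first.
    """
    X = [list(row) for row in X]  # work on a copy so deflation leaves the input intact
    d = len(X[0])
    results = []
    for _ in range(rank):
        v = [1.0 / math.sqrt(d)] * d
        for _ in range(iters):
            xv = [sum(r[j] * v[j] for j in range(d)) for r in X]          # X v
            w = [sum(r[j] * s for r, s in zip(X, xv)) for j in range(d)]  # X^T (X v)
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break
            v = [c / norm for c in w]
        xv = [sum(r[j] * v[j] for j in range(d)) for r in X]
        results.append((v, sum(s * s for s in xv)))
        # Deflate: remove the component just found from every row.
        X = [[r[j] - s * v[j] for j in range(d)] for r, s in zip(X, xv)]
    return results


activations = [[2.0, 0.0], [0.0, 1.0]]  # toy minibatch: 2 samples, 2 features
components = top_right_singular_vectors(activations, rank=2)
lora_A = [vec for vec, _ in components]           # rows are right-singular vectors
lora_B = [[0.0] * len(lora_A) for _ in range(2)]  # assumed zero init; 2 output units
explained = [var for _, var in components]        # variance captured per direction
```

The per-direction variances in `explained` are the kind of quantity that could drive a rank budget across layers, giving more rank to weight matrices whose activations concentrate more variance.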

Towards Natural Image Matting in the Wild via Real-Scenario Prior

Publication date: 9 Oct 2024

Topic: Semantic Segmentation

Paper: https://arxiv.org/pdf/2410.06593v1.pdf

GitHub: https://github.com/xiarho/semat

Description:

We propose SEMat, which revamps the network architecture and training objectives. For network architecture, the proposed feature-aligned transformer learns to extract fine-grained edge and transparency features. The proposed matte-aligned decoder aims to segment matting-specific objects and convert coarse masks into high-precision mattes. For training objectives, the proposed regularization and trimap loss aim to retain the prior from the pre-trained model and push the matting logits extracted from the mask decoder to contain trimap-based semantic information. Extensive experiments across seven diverse datasets demonstrate the superior performance of our method, proving its efficacy in interactive natural image matting.

UniMatch V2: Pushing the Limit of Semi-Supervised Semantic Segmentation

Publication date: 14 Oct 2024

Topic: Semantic Segmentation

Paper: https://arxiv.org/pdf/2410.10777v1.pdf

GitHub: https://github.com/LiheYoung/UniMatch-V2

Description:

In this work, we argue that it is necessary to switch the baseline of semi-supervised semantic segmentation (SSS) from ResNet-based encoders to more capable ViT-based encoders (e.g., DINOv2) that are pre-trained on massive data. A simple update on the encoder (even using 2x fewer parameters) can bring more significant improvement than careful method designs. Built on this competitive baseline, we present our upgraded and simplified UniMatch V2, inheriting the core spirit of weak-to-strong consistency from V1, but requiring less training cost and providing consistently better results. Additionally, having observed gradually saturating performance on Pascal and Cityscapes, we urge a focus on more challenging benchmarks with complex taxonomy, such as the ADE20K and COCO datasets.
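
Weak-to-strong consistency can be sketched on per-pixel class probabilities: confident predictions on a weakly augmented view become pseudo-labels that supervise the strongly augmented view. The 0.95 threshold, the averaging over kept pixels, and the function names are assumptions for illustration rather than the exact UniMatch V2 recipe.

```python
import math


def pseudo_labels(weak_probs, threshold=0.95):
    """Return (label, keep) per pixel from weak-view class probabilities."""
    labels = []
    for probs in weak_probs:
        confidence = max(probs)
        labels.append((probs.index(confidence), confidence >= threshold))
    return labels


def consistency_loss(weak_probs, strong_probs, threshold=0.95):
    """Cross-entropy of strong-view predictions against confident weak-view pseudo-labels."""
    total, kept = 0.0, 0
    for (label, keep), probs in zip(pseudo_labels(weak_probs, threshold), strong_probs):
        if keep:
            total += -math.log(max(probs[label], 1e-12))
            kept += 1
    return total / kept if kept else 0.0


weak = [[0.97, 0.03], [0.60, 0.40]]    # pixel 0 is confident, pixel 1 is not
strong = [[0.50, 0.50], [0.10, 0.90]]  # predictions on the strongly augmented view
loss = consistency_loss(weak, strong)
```

Only the first pixel passes the threshold, so the loss is the cross-entropy of the strong view on that pixel alone.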

MatMamba: A Matryoshka State Space Model

Publication date: 9 Oct 2024

Topic: Representation Learning

Paper: https://arxiv.org/pdf/2410.06718v1.pdf

GitHub: https://github.com/scaledfoundations/matmamba

Description:

In this work, we present MatMamba: a state space model which combines Matryoshka-style learning with Mamba2, by modifying the block to contain nested dimensions to enable joint training and adaptive inference. MatMamba allows for efficient and adaptive deployment across various model sizes. We train a single large MatMamba model and are able to get a number of smaller nested models for free, while maintaining or improving upon the performance of a baseline smaller model trained from scratch. We train language and image models at a variety of parameter sizes from 35M to 1.4B. Our results on ImageNet and FineWeb show that MatMamba models scale comparably to Transformers, while having more efficient inference characteristics.
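
The nesting idea can be illustrated with a plain dense layer, where a smaller model is the top-left block of the larger model's weight matrix. This is only a sketch of Matryoshka-style parameter sharing: the Mamba2 block itself is not modeled, and the helper names are hypothetical.

```python
def linear(weights, x):
    """Dense layer without bias: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def nested_weights(weights, out_dim, in_dim):
    """Take the top-left block, so the small model reuses the large model's parameters."""
    return [row[:in_dim] for row in weights[:out_dim]]


full = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 0.5, 0.5, 0.5],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
]
half = nested_weights(full, out_dim=2, in_dim=2)

x = [1.0, 1.0, 1.0, 1.0]
y_full = linear(full, x)      # uses all 16 weights
y_half = linear(half, x[:2])  # uses only the 4 shared weights
```

Joint training would sum the losses of several such nested sizes in one step, so each prefix of the dimensions is pushed to work as a model on its own. At inference time a size is picked by slicing, with no retraining.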

Momentum Contrast for Unsupervised Visual Representation Learning

Publication date: CVPR 2020

Topic: Contrastive Learning

Paper: https://arxiv.org/pdf/1911.05722v3.pdf

GitHub: https://github.com/facebookresearch/moco

Description:

We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. MoCo provides competitive results under the common linear protocol on ImageNet classification. More importantly, the representations learned by MoCo transfer well to downstream tasks. MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes surpassing it by large margins.
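
The two mechanisms named above, the queue and the moving-averaged encoder, can be sketched without any deep learning framework. Parameters are flat lists of floats here, the momentum value and queue size are illustrative assumptions, and the names are hypothetical rather than taken from the official repository.

```python
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """Move the key encoder slowly toward the query encoder: k = m * k + (1 - m) * q."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


class KeyQueue:
    """Fixed-size FIFO dictionary of encoded keys; the oldest keys are dropped first."""

    def __init__(self, max_size):
        self.keys = deque(maxlen=max_size)

    def enqueue(self, batch_keys):
        self.keys.extend(batch_keys)


queue = KeyQueue(max_size=4)
queue.enqueue(["k1", "k2", "k3"])
queue.enqueue(["k4", "k5"])  # "k1" falls out because the queue holds 4 keys

key_params = momentum_update([1.0, 1.0], [0.0, 2.0], m=0.9)
```

Decoupling the dictionary size from the batch size is what lets the dictionary be large, while the slow update keeps keys from different steps consistent with each other.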

OSSA: Unsupervised One-Shot Style Adaptation

Publication date: 1 Oct 2024

Topic: Object Detection

Paper: https://arxiv.org/pdf/2410.00900v1.pdf

GitHub: https://github.com/robingerster7/ossa

Description:

We introduce One-Shot Style Adaptation (OSSA), a novel unsupervised domain adaptation method for object detection that utilizes a single, unlabeled target image to approximate the target domain style. Specifically, OSSA generates diverse target styles by perturbing the style statistics derived from a single target image and then applies these styles to a labeled source dataset at the feature level using Adaptive Instance Normalization (AdaIN). Extensive experiments show that OSSA establishes a new state-of-the-art among one-shot domain adaptation methods by a significant margin, and in some cases, even outperforms strong baselines that use thousands of unlabeled target images.
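
A minimal single-channel sketch of the style transfer step: take the mean and standard deviation of the target feature, perturb them, and apply them to a source feature with AdaIN. The Gaussian perturbation, its `scale`, and the helper names are assumptions for illustration; the paper's actual perturbation scheme may differ.

```python
import math
import random


def mean_std(values, eps=1e-5):
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(var + eps)


def adain(source, style_mean, style_std):
    """Normalize the source channel, then rescale it to the given style statistics."""
    mu, sigma = mean_std(source)
    return [style_std * (v - mu) / sigma + style_mean for v in source]


def perturb_style(style_mean, style_std, scale, rng):
    """Hypothetical perturbation: add Gaussian noise to the style statistics."""
    return style_mean + rng.gauss(0, scale), abs(style_std + rng.gauss(0, scale))


rng = random.Random(0)
target_feature = [9.0, 10.0, 11.0, 12.0]  # one channel of the single target image
source_feature = [1.0, 2.0, 3.0, 4.0]     # one channel of a labeled source image

mu_t, sigma_t = mean_std(target_feature)
styles = [perturb_style(mu_t, sigma_t, 0.1, rng) for _ in range(3)]
stylized = [adain(source_feature, m, s) for m, s in styles]
```

Each stylized copy keeps the spatial pattern of the source feature, so the source labels remain valid, while its statistics follow a slightly different version of the target style.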
