source: export.arxiv.org/rss/cs.CV
maintainer: @tmaehara.bsky.social
UniFit: Towards Universal Virtual Try-on with MLLM-Guided Semantic Alignment
https://arxiv.org/abs/2511.15831
EfficientSAM3: Progressive Hierarchical Distillation for Video Concept Segmentation from SAM1, 2, and 3
https://arxiv.org/abs/2511.15833
WALDO: Where Unseen Model-based 6D Pose Estimation Meets Occlusion
https://arxiv.org/abs/2511.15874
Automatic Uncertainty-Aware Synthetic Data Bootstrapping for Historical Map Segmentation
https://arxiv.org/abs/2511.15875
Box6D: Zero-shot Category-level 6D Pose Estimation of Warehouse Boxes
https://arxiv.org/abs/2511.15884
RB-FT: Rationale-Bootstrapped Fine-Tuning for Video Classification
https://arxiv.org/abs/2511.15923
Boosting Medical Visual Understanding From Multi-Granular Language Learning
https://arxiv.org/abs/2511.15943
Automated Interpretable 2D Video Extraction from 3D Echocardiography
https://arxiv.org/abs/2511.15946
Click2Graph: Interactive Panoptic Video Scene Graphs from a Single Click
https://arxiv.org/abs/2511.15948
InfoCLIP: Bridging Vision-Language Pretraining and Open-Vocabulary Semantic Segmentation via Information-Theoretic Alignment Transfer
https://arxiv.org/abs/2511.15967
Externally Validated Multi-Task Learning via Consistency Regularization Using Differentiable BI-RADS Features for Breast Ultrasound Tumor Segmentation
https://arxiv.org/abs/2511.15968
UniDGF: A Unified Detection-to-Generation Framework for Hierarchical Object Visual Recognition
https://arxiv.org/abs/2511.15984
Fairness in Multi-modal Medical Diagnosis with Demonstration Selection
https://arxiv.org/abs/2511.15986
Exploiting Inter-Sample Information for Long-tailed Out-of-Distribution Detection
https://arxiv.org/abs/2511.16015
Physically Realistic Sequence-Level Adversarial Clothing for Robust Human-Detection Evasion
https://arxiv.org/abs/2511.16020
Mixture of Ranks with Degradation-Aware Routing for One-Step Real-World Image Super-Resolution
https://arxiv.org/abs/2511.16024
Towards a Safer and Sustainable Manufacturing Process: Material classification in Laser Cutting Using Deep Learning
https://arxiv.org/abs/2511.16026
CuriGS: Curriculum-Guided Gaussian Splatting for Sparse View Synthesis
https://arxiv.org/abs/2511.16030
Crossmodal learning for Crop Canopy Trait Estimation
https://arxiv.org/abs/2511.16031
LLMs-based Augmentation for Domain Adaptation in Long-tailed Food Datasets
https://arxiv.org/abs/2511.16037
AMS-KV: Adaptive KV Caching in Multi-Scale Visual Autoregressive Transformers
https://arxiv.org/abs/2511.16047
LiSTAR: Ray-Centric World Models for 4D LiDAR Sequences in Autonomous Driving
https://arxiv.org/abs/2511.16049
VideoSeg-R1: Reasoning Video Object Segmentation via Reinforcement Learning
https://arxiv.org/abs/2511.16077
SpectralTrain: A Universal Framework for Hyperspectral Image Classification
https://arxiv.org/abs/2511.16084
Rad-GS: Radar-Vision Integration for 3D Gaussian Splatting SLAM in Outdoor Environments
https://arxiv.org/abs/2511.16091