ArXiv Computer Vision research for Wednesday, June 12, 2024.
00:20: From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition
02:09: APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation
03:57: 2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction
05:47: DDR: Exploiting Deep Degradation Response as Flexible Image Descriptor
06:58: Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze
08:02: LaneCPP: Continuous 3D Lane Detection using Physical Priors
09:23: FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation
11:10: VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
12:46: MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
14:39: OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
16:49: AWGUNET: Attention-Aided Wavelet Guided U-Net for Nuclei Segmentation in Histopathology Images
18:15: Diffusion Soup: Model Merging for Text-to-Image Diffusion Models
19:58: Coherent Optical Modems for Full-Wavefield Lidar
21:32: Transformation-Dependent Adversarial Attacks
22:45: PixMamba: Leveraging State Space Models in a Dual-Level Architecture for Underwater Image Enhancement
24:10: GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
25:57: ConceptHash: Interpretable Fine-Grained Hashing via Concept Discovery
27:26: Self-supervised Learning of Neural Implicit Feature Fields for Camera Pose Refinement
28:51: Real2Code: Reconstruct Articulated Objects via Code Generation
30:02: Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models
31:42: RMem: Restricted Memory Banks Improve Video Object Segmentation
33:12: What If We Recaption Billions of Web Images with LLaMA-3?
34:42: Real3D: Scaling Up Large Reconstruction Models with Real-World Images
36:07: Enhancing End-to-End Autonomous Driving with Latent World Model
37:12: Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation
38:43: On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models
40:16: Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
42:15: ICE-G: Image Conditional Editing of 3D Gaussian Splats