• Ep. 247 - Part 3 - June 13, 2024

  • Jun 15 2024
  • Length: 52 mins
  • Podcast

  • Summary

  • ArXiv Computer Vision research for Thursday, June 13, 2024.


    00:21: LRM-Zero: Training Large Reconstruction Models with Synthesized Data

    01:56: Scale-Invariant Monocular Depth Estimation via SSI Depth

    03:08: GGHead: Fast and Generalizable 3D Gaussian Heads

    04:55: Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset

    06:34: Towards Vision-Language Geo-Foundation Model: A Survey

    08:11: SimGen: Simulator-conditioned Driving Scene Generation

    09:44: Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition

    11:03: Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior

    12:32: LLAVIDAL: Benchmarking Large Language Vision Models for Daily Activities of Living

    13:56: WonderWorld: Interactive 3D Scene Generation from a Single Image

    15:21: Modeling Ambient Scene Dynamics for Free-view Synthesis

    16:29: Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA

    17:50: Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms

    19:39: Real-Time Deepfake Detection in the Real-World

    21:17: OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation

    23:02: Yo'LLaVA: Your Personalized Language and Vision Assistant

    24:30: MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations

    26:26: Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion

    28:03: Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

    29:59: ConsistDreamer: 3D-Consistent 2D Diffusion for High-Fidelity Scene Editing

    31:24: 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

    33:16: Towards Evaluating the Robustness of Visual State Space Models

    34:57: Data Attribution for Text-to-Image Models by Unlearning Synthesized Images

    36:09: CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras

    37:37: Scene Graph Generation in Large-Size VHR Satellite Imagery: A Large-Scale Dataset and A Context-Aware Approach

    40:02: MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

    41:40: Explore the Limits of Omni-modal Pretraining at Scale

    42:46: Interpreting the Weight Space of Customized Diffusion Models

    43:58: Depth Anything V2

    45:12: An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

    46:23: Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

    48:11: Rethinking Score Distillation as a Bridge Between Image Distributions

    49:44: VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
