• Ep. 246 - Part 3 - June 12, 2024

  • Jun 13 2024
  • Length: 44 mins
  • Podcast

  • Summary

  • ArXiv Computer Vision research for Wednesday, June 12, 2024.


    00:20: From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition

    02:09: APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation

    03:57: 2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction

    05:47: DDR: Exploiting Deep Degradation Response as Flexible Image Descriptor

    06:58: Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze

    08:02: LaneCPP: Continuous 3D Lane Detection using Physical Priors

    09:23: FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation

    11:10: VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

    12:46: MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

    14:39: OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

    16:49: AWGUNET: Attention-Aided Wavelet Guided U-Net for Nuclei Segmentation in Histopathology Images

    18:15: Diffusion Soup: Model Merging for Text-to-Image Diffusion Models

    19:58: Coherent Optical Modems for Full-Wavefield Lidar

    21:32: Transformation-Dependent Adversarial Attacks

    22:45: PixMamba: Leveraging State Space Models in a Dual-Level Architecture for Underwater Image Enhancement

    24:10: GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

    25:57: ConceptHash: Interpretable Fine-Grained Hashing via Concept Discovery

    27:26: Self-supervised Learning of Neural Implicit Feature Fields for Camera Pose Refinement

    28:51: Real2Code: Reconstruct Articulated Objects via Code Generation

    30:02: Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models

    31:42: RMem: Restricted Memory Banks Improve Video Object Segmentation

    33:12: What If We Recaption Billions of Web Images with LLaMA-3?

    34:42: Real3D: Scaling Up Large Reconstruction Models with Real-World Images

    36:07: Enhancing End-to-End Autonomous Driving with Latent World Model

    37:12: Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation

    38:43: On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models

    40:16: Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

    42:15: ICE-G: Image Conditional Editing of 3D Gaussian Splats
