• Ep. 245 - Part 3 - June 11, 2024

  • Jun 13 2024
  • Length: 38 mins
  • Podcast

  • Summary

  • ArXiv Computer Vision research for Tuesday, June 11, 2024.


    00:21: DERM12345: A Large, Multisource Dermatoscopic Skin Lesion Dataset with 38 Subclasses

    01:44: Beware of Aliases -- Signal Preservation is Crucial for Robust Image Restoration

    02:49: Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning

    04:04: OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding

    06:01: 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

    07:24: VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

    08:58: Image Neural Field Diffusion Models

    10:11: Comparing Deep Learning Models for Rice Mapping in Bhutan Using High Resolution Satellite Imagery

    12:29: GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection

    14:26: ReduceFormer: Attention with Tensor Reduction by Summation

    15:23: Trim 3D Gaussian Splatting for Accurate Geometry Representation

    16:44: SPIN: Spacecraft Imagery for Navigation

    18:24: Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions

    20:00: Understanding Visual Concepts Across Models

    21:12: Instant 3D Human Avatar Generation using Image Diffusion Models

    22:47: Neural Gaffer: Relighting Any Object via Diffusion

    24:19: Autoregressive Pretraining with Mamba in Vision

    25:51: Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance

    27:19: Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

    28:50: Situational Awareness Matters in 3D Vision Language Reasoning

    30:10: Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?

    31:46: Zero-shot Image Editing with Reference Imitation

    33:08: Image and Video Tokenization with Binary Spherical Quantization

    34:18: An Image is Worth 32 Tokens for Reconstruction and Generation

    36:28: Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring
