• Ep. 246 - Part 1 - June 12, 2024

  • Jun 13 2024
  • Length: 46 mins
  • Podcast

  • Summary

  • ArXiv Computer Vision research for Wednesday, June 12, 2024.


    00:20: FaithFill: Faithful Inpainting for Object Completion Using a Single Reference Image

    01:21: Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

    02:49: Unveiling the Power of Wavelets: A Wavelet-based Kolmogorov-Arnold Network for Hyperspectral Image Classification

    04:26: Flexible Music-Conditioned Dance Generation with Style Description Prompts

    05:52: Robust 3D Face Alignment with Multi-Path Neural Architecture Search

    07:00: Small Scale Data-Free Knowledge Distillation

    08:48: KernelWarehouse: Rethinking the Design of Dynamic Convolution

    10:31: A Comprehensive Survey on Machine Learning Driven Material Defect Detection: Challenges, Solutions, and Future Prospects

    12:34: Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation

    14:02: IFTD: Image Feature Triangle Descriptor for Loop Detection in Driving Scenes

    14:54: Multi-Teacher Multi-Objective Meta-Learning for Zero-Shot Hyperspectral Band Selection

    16:30: DemosaicFormer: Coarse-to-Fine Demosaicing Network for HybridEVS Camera

    18:10: Spatial-Frequency Dual Progressive Attention Network For Medical Image Segmentation

    20:07: Accurate Explanation Model for Image Classifiers using Class Association Embedding

    21:55: Real-world Image Dehazing with Coherence-based Label Generator and Cooperative Unfolding Network

    23:11: SimSAM: Simple Siamese Representations Based Semantic Affinity Matrix for Unsupervised Image Segmentation

    24:06: Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization

    25:34: OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields with Fine-Grained Understanding

    26:58: Generalizable Disaster Damage Assessment via Change Detection with Vision Foundation Model

    28:26: Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models

    29:52: Deep Learning for Slum Mapping in Remote Sensing Images: A Meta-analysis and Review

    31:49: LVBench: An Extreme Long Video Understanding Benchmark

    33:14: Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking

    34:48: A Robust Pipeline for Classification and Detection of Bleeding Frames in Wireless Capsule Endoscopy using Swin Transformer and RT-DETR

    36:23: 3D CBCT Challenge 2024: Improved Cone Beam CT Reconstruction using SwinIR-Based Sinogram and Image Enhancement

    37:29: MWIRSTD: A MWIR Small Target Detection Dataset

    38:34: CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models

    40:27: A$^{2}$-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder

    42:35: Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams

    44:26: Identification of Conversation Partners from Egocentric Video
