• Ep. 247 - Part 2 - June 13, 2024

  • Jun 15 2024
  • Length: 53 mins
  • Podcast

  • Summary

  • ArXiv Computer Vision research for Thursday, June 13, 2024.


    00:21: INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance

    02:11: Large-Scale Evaluation of Open-Set Image Classification Techniques

    03:43: PC-LoRA: Low-Rank Adaptation for Progressive Model Compression with Knowledge Distillation

    05:00: MMRel: A Relation Understanding Dataset and Benchmark in the MLLM Era

    06:41: Auto-Vocabulary Segmentation for LiDAR Points

    07:30: AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring

    08:43: EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

    10:23: Fine-Grained Domain Generalization with Feature Structuralization

    12:03: SR-CACO-2: A Dataset for Confocal Fluorescence Microscopy Image Super-Resolution

    14:13: ReMI: A Dataset for Reasoning with Multiple Images

    15:41: A Large-scale Universal Evaluation Benchmark For Face Forgery Detection

    17:26: Thoracic Surgery Video Analysis for Surgical Phase Recognition

    18:58: Reducing Task Discrepancy of Text Encoders for Zero-Shot Composed Image Retrieval

    20:40: Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

    22:26: CLIP-Driven Cloth-Agnostic Feature Learning for Cloth-Changing Person Re-Identification

    24:22: Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024

    25:21: Optimizing Visual Question Answering Models for Driving: Bridging the Gap Between Human and Machine Attention Patterns

    26:30: WildlifeReID-10k: Wildlife re-identification dataset with 10k individual animals

    27:44: MGRQ: Post-Training Quantization For Vision Transformer With Mixed Granularity Reconstruction

    29:28: Comparison Visual Instruction Tuning

    30:51: MirrorCheck: Efficient Adversarial Defense for Vision-Language Models

    32:14: Deep Transformer Network for Monocular Pose Estimation of Ship-Based UAV

    33:10: Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

    34:33: Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models

    36:04: StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning

    37:30: Parameter-Efficient Active Learning for Foundational models

    38:31: Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation

    40:22: Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    42:38: Towards AI Lesion Tracking in PET/CT Imaging: A Siamese-based CNN Pipeline applied on PSMA PET/CT Scans

    44:36: Memory-Efficient Sparse Pyramid Attention Networks for Whole Slide Image Analysis

    46:19: Instance-level quantitative saliency in multiple sclerosis lesion segmentation

    48:37: CMC-Bench: Towards a New Paradigm of Visual Signal Compression

    50:05: Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs

    52:05: CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models


