• Ep. 246 - Part 1 - June 12, 2024

  • Jun 13 2024
  • Length: 46 mins
  • Podcast

  • Summary

  • ArXiv Computer Vision research for Wednesday, June 12, 2024.


    00:20: FaithFill: Faithful Inpainting for Object Completion Using a Single Reference Image

    01:21: Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

    02:49: Unveiling the Power of Wavelets: A Wavelet-based Kolmogorov-Arnold Network for Hyperspectral Image Classification

    04:26: Flexible Music-Conditioned Dance Generation with Style Description Prompts

    05:52: Robust 3D Face Alignment with Multi-Path Neural Architecture Search

    07:00: Small Scale Data-Free Knowledge Distillation

    08:48: KernelWarehouse: Rethinking the Design of Dynamic Convolution

    10:31: A Comprehensive Survey on Machine Learning Driven Material Defect Detection: Challenges, Solutions, and Future Prospects

    12:34: Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation

    14:02: IFTD: Image Feature Triangle Descriptor for Loop Detection in Driving Scenes

    14:54: Multi-Teacher Multi-Objective Meta-Learning for Zero-Shot Hyperspectral Band Selection

    16:30: DemosaicFormer: Coarse-to-Fine Demosaicing Network for HybridEVS Camera

    18:10: Spatial-Frequency Dual Progressive Attention Network For Medical Image Segmentation

    20:07: Accurate Explanation Model for Image Classifiers using Class Association Embedding

    21:55: Real-world Image Dehazing with Coherence-based Label Generator and Cooperative Unfolding Network

    23:11: SimSAM: Simple Siamese Representations Based Semantic Affinity Matrix for Unsupervised Image Segmentation

    24:06: Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization

    25:34: OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields with Fine-Grained Understanding

    26:58: Generalizable Disaster Damage Assessment via Change Detection with Vision Foundation Model

    28:26: Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models

    29:52: Deep Learning for Slum Mapping in Remote Sensing Images: A Meta-analysis and Review

    31:49: LVBench: An Extreme Long Video Understanding Benchmark

    33:14: Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking

    34:48: A Robust Pipeline for Classification and Detection of Bleeding Frames in Wireless Capsule Endoscopy using Swin Transformer and RT-DETR

    36:23: 3D CBCT Challenge 2024: Improved Cone Beam CT Reconstruction using SwinIR-Based Sinogram and Image Enhancement

    37:29: MWIRSTD: A MWIR Small Target Detection Dataset

    38:34: CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models

    40:27: A$^{2}$-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder

    42:35: Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams

    44:26: Identification of Conversation Partners from Egocentric Video

