• Ep. 245 - Part 3 - June 11, 2024

  • Jun 13 2024
  • Length: 38 mins
  • Podcast

  • Summary

  • ArXiv Computer Vision research for Tuesday, June 11, 2024.


    00:21: DERM12345: A Large, Multisource Dermatoscopic Skin Lesion Dataset with 38 Subclasses

    01:44: Beware of Aliases -- Signal Preservation is Crucial for Robust Image Restoration

    02:49: Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning

    04:04: OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding

    06:01: 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

    07:24: VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

    08:58: Image Neural Field Diffusion Models

    10:11: Comparing Deep Learning Models for Rice Mapping in Bhutan Using High Resolution Satellite Imagery

    12:29: GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection

    14:26: ReduceFormer: Attention with Tensor Reduction by Summation

    15:23: Trim 3D Gaussian Splatting for Accurate Geometry Representation

    16:44: SPIN: Spacecraft Imagery for Navigation

    18:24: Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions

    20:00: Understanding Visual Concepts Across Models

    21:12: Instant 3D Human Avatar Generation using Image Diffusion Models

    22:47: Neural Gaffer: Relighting Any Object via Diffusion

    24:19: Autoregressive Pretraining with Mamba in Vision

    25:51: Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance

    27:19: Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

    28:50: Situational Awareness Matters in 3D Vision Language Reasoning

    30:10: Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?

    31:46: Zero-shot Image Editing with Reference Imitation

    33:08: Image and Video Tokenization with Binary Spherical Quantization

    34:18: An Image is Worth 32 Tokens for Reconstruction and Generation

    36:28: Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring

