Ep. 245 - Part 3 - June 11, 2024
Jun 13 2024
Length: 38 mins
Podcast

Summary
ArXiv Computer Vision research for Tuesday, June 11, 2024.

00:21: DERM12345: A Large, Multisource Dermatoscopic Skin Lesion Dataset with 38 Subclasses

01:44: Beware of Aliases -- Signal Preservation is Crucial for Robust Image Restoration

02:49: Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning

04:04: OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding

06:01: 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

07:24: VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

08:58: Image Neural Field Diffusion Models

10:11: Comparing Deep Learning Models for Rice Mapping in Bhutan Using High Resolution Satellite Imagery

12:29: GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection

14:26: ReduceFormer: Attention with Tensor Reduction by Summation

15:23: Trim 3D Gaussian Splatting for Accurate Geometry Representation

16:44: SPIN: Spacecraft Imagery for Navigation

18:24: Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions

20:00: Understanding Visual Concepts Across Models

21:12: Instant 3D Human Avatar Generation using Image Diffusion Models

22:47: Neural Gaffer: Relighting Any Object via Diffusion

24:19: Autoregressive Pretraining with Mamba in Vision

25:51: Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance

27:19: Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

28:50: Situational Awareness Matters in 3D Vision Language Reasoning

30:10: Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?

31:46: Zero-shot Image Editing with Reference Imitation

33:08: Image and Video Tokenization with Binary Spherical Quantization

34:18: An Image is Worth 32 Tokens for Reconstruction and Generation

36:28: Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring
