• Ep. 246 - Part 3 - June 12, 2024

  • Jun 13 2024
  • Length: 44 mins
  • Podcast

  • Summary

  • ArXiv Computer Vision research for Wednesday, June 12, 2024.


    00:20: From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition

    02:09: APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation

    03:57: 2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction

    05:47: DDR: Exploiting Deep Degradation Response as Flexible Image Descriptor

    06:58: Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze

    08:02: LaneCPP: Continuous 3D Lane Detection using Physical Priors

    09:23: FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation

    11:10: VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

    12:46: MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

    14:39: OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

    16:49: AWGUNET: Attention-Aided Wavelet Guided U-Net for Nuclei Segmentation in Histopathology Images

    18:15: Diffusion Soup: Model Merging for Text-to-Image Diffusion Models

    19:58: Coherent Optical Modems for Full-Wavefield Lidar

    21:32: Transformation-Dependent Adversarial Attacks

    22:45: PixMamba: Leveraging State Space Models in a Dual-Level Architecture for Underwater Image Enhancement

    24:10: GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

    25:57: ConceptHash: Interpretable Fine-Grained Hashing via Concept Discovery

    27:26: Self-supervised Learning of Neural Implicit Feature Fields for Camera Pose Refinement

    28:51: Real2Code: Reconstruct Articulated Objects via Code Generation

    30:02: Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models

    31:42: RMem: Restricted Memory Banks Improve Video Object Segmentation

    33:12: What If We Recaption Billions of Web Images with LLaMA-3?

    34:42: Real3D: Scaling Up Large Reconstruction Models with Real-World Images

    36:07: Enhancing End-to-End Autonomous Driving with Latent World Model

    37:12: Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation

    38:43: On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models

    40:16: Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

    42:15: ICE-G: Image Conditional Editing of 3D Gaussian Splats

