Table of Contents
Most and Least Retrievable Images in Visual-Language Query Systems
Sports Video Analysis on Large-Scale Data
Grounding Visual Representations with Texts for Domain Generalization
Bridging the Visual Semantic Gap in VLN via Semantically Richer Instructions
StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance
Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation
End-to-End Active Speaker Detection
Emotion Recognition for Multiple Context Awareness
Adaptive Fine-Grained Sketch-Based Image Retrieval
Quantized GAN for Complex Music Generation from Dance Videos
Uncertainty-Aware Multi-modal Learning via Cross-Modal Random Network Prediction
Localizing Visual Sounds the Easy Way
Learning Visual Styles from Audio-Visual Associations
Remote Respiration Monitoring of Moving Person Using Radio Signals
Camera Pose Estimation and Localization with Active Audio Sensing
PACS: A Dataset for Physical Audiovisual Commonsense Reasoning
VoViT: Low Latency Graph-Based Audio-Visual Voice Separation Transformer
Telepresence Video Quality Assessment
MultiMAE: Multi-modal Multi-task Masked Autoencoders
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
Audio-Visual Segmentation
Unsupervised Night Image Enhancement: When Layer Decomposition Meets Light-Effects Suppression
Relationformer: A Unified Framework for Image-to-Graph Generation
GAMa: Cross-view Video Geo-localization
Revisiting a kNN-based Image Classification System with High-capacity Storage
Geometric Representation Learning for Document Image Rectification
S2-VER: Semi-Supervised Visual Emotion Recognition
Image Coding for Machines with Omnipotent Feature Learning
Feature Representation Learning for Unsupervised Cross-Domain Image Retrieval
Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition
Semantic-Guided Multi-Mask Image Harmonization
Learning an Isometric Surface Parameterization for Texture Unwrapping
Towards Regression-Free Neural Networks for Diverse Compute Platforms
Relationship Spatialization for Depth Estimation
Image2Point: 3D Point-Cloud Understanding with 2D Image Pretrained Models
FAR: Fourier Aerial Video Recognition
Translating a Visual LEGO Manual to a Machine-Executable Plan
Fabric Material Recovery from Video Using Multi-Scale Geometric Auto-Encoder
MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment
The One Where They Reconstructed 3D Humans and Environments in TV Shows