BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
Category-Level 6D Object Pose and Size Estimation Using Self-Supervised Deep Prior Deformation Networks
Dense Teacher: Dense Pseudo-Labels for Semi-Supervised Object Detection
Point-to-Box Network for Accurate Object Detection via Single Point Supervision
Domain Adaptive Hand Keypoint and Pixel Localization in the Wild
Towards Data-Efficient Detection Transformers
Open-Vocabulary DETR with Conditional Matching
Prediction-Guided Distillation for Dense Object Detection
Multimodal Object Detection via Probabilistic Ensembling
Exploiting Unlabeled Data with Vision and Language Models for Object Detection
CPO: Change Robust Panorama to Point Cloud Localization
INT: Towards Infinite-Frames 3D Detection with an Efficient Framework
End-to-End Weakly Supervised Object Detection with Sparse Proposal Evolution
Calibration-Free Multi-View Crowd Counting
Unsupervised Domain Adaptation for Monocular 3D Object Detection via Self-Training
SuperLine3D: Self-Supervised Line Segmentation and Description for LiDAR Point Cloud
Exploring Plain Vision Transformer Backbones for Object Detection
Adversarially-Aware Robust Object Detector
HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors
You Should Look at All Objects
Detecting Twenty-Thousand Classes Using Image-Level Supervision
DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation
Monocular 3D Object Detection with Depth from Motion
DISP6D: Disentangled Implicit Shape and Pose Learning for Scalable 6D Pose Estimation
Distilling Object Detectors with Global Knowledge
Unifying Visual Perception by Dispersible Points Learning
PseCo: Pseudo Labeling and Consistency Training for Semi-Supervised Object Detection
Exploring Resolution and Degradation Clues As Self-Supervised Signal for Low Quality Object Detection
Robust Category-Level 6D Pose Estimation with Coarse-to-Fine Rendering of Neural Features
Translation, Scale and Rotation: Cross-Modal Alignment Meets RGB-Infrared Vehicle Detection
RFLA: Gaussian Receptive Field Based Label Assignment for Tiny Object Detection
Rethinking IoU-Based Optimization for Single-Stage 3D Object Detection
TD-Road: Top-Down Road Network Extraction with Holistic Graph Construction
Multi-faceted Distillation of Base-Novel Commonality for Few-Shot Object Detection
PointCLM: A Contrastive Learning-Based Framework for Multi-Instance Point Cloud Registration
Weakly Supervised Object Localization via Transformer with Implicit Spatial Calibration
MTTrans: Cross-Domain Object Detection with Mean Teacher Transformer
Multi-Domain Multi-Definition Landmark Localization for Small Datasets
DEVIANT: Depth EquiVarIAnt NeTwork for Monocular 3D Object Detection
Label-Guided Auxiliary Training Improves 3D Object Detector
PromptDet: Towards Open-Vocabulary Detection Using Uncurated Images
Densely Constrained Depth Estimator for Monocular 3D Object Detection
Polarimetric Pose Prediction

