CVPR 2021 Papers and Open-Source Projects (Papers with Code)
- Best Paper
- Backbone
- NAS
- GAN
- VAE
- Visual Transformer
- Regularization
- SLAM
- Long-Tailed
- Data Augmentation
- Unsupervised/Self-Supervised
- Semi-Supervised
- Capsule Network
- Image Classification
- 2D Object Detection
- Single/Multi-Object Tracking
- Semantic Segmentation
- Instance Segmentation
- Panoptic Segmentation
- Medical Image Segmentation
- Video Object Segmentation
- Interactive Video Object Segmentation
- Saliency Detection
- Camouflaged Object Detection
- Co-Salient Object Detection
- Image Matting
- Person Re-identification
- Person Search
- Video Understanding/Action Recognition
- Face Recognition
- Face Detection
- Face Anti-Spoofing
- Deepfake Detection
- Face Age Estimation
- Facial Expression Recognition
- Deepfakes
- Human Parsing
- 2D/3D Human Pose Estimation
- Animal Pose Estimation
- Hand Pose Estimation
- Human Volumetric Capture
- Scene Text Recognition
- Image Compression
- Model Compression/Pruning/Quantization
- Knowledge Distillation
- Super-Resolution
- Dehazing
- Image Restoration
- Image Inpainting
- Image Editing
- Image Captioning
- Font Generation
- Image Matching
- Image Blending
- Reflection Removal
- 3D Point Cloud Classification
- 3D Object Detection
- 3D Semantic Segmentation
- 3D Panoptic Segmentation
- 3D Object Tracking
- 3D Point Cloud Registration
- 3D Point Cloud Completion
- 3D Reconstruction
- 6D Pose Estimation
- Camera Pose Estimation
- Depth Estimation
- Stereo Matching
- Optical Flow Estimation
- Lane Detection
- Trajectory Prediction
- Crowd Counting
- Adversarial Examples
- Image Retrieval
- Video Retrieval
- Cross-modal Retrieval
- Zero-Shot Learning
- Federated Learning
- Video Frame Interpolation
- Visual Reasoning
- Image Synthesis
- View Synthesis
- Style Transfer
- Layout Generation
- Domain Generalization
- Domain Adaptation
- Open-Set
- Adversarial Attack
- Human-Object Interaction (HOI) Detection
- Shadow Removal
- Virtual Try-On
- Label Noise
- Video Stabilization
- Datasets
- Others
- TODO
- Not Sure
Best Paper
GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields
- Homepage: m-niemeyer.github.io/project-pag…
- Paper(Oral): arxiv.org/abs/2011.12…
- Code: github.com/autonomousv…
- Demo: www.youtube.com/watch?v=fIa…
Backbone
HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers
- Paper(Oral): arxiv.org/abs/2106.06…
- Code: github.com/dingmyu/HR-…
BCNet: Searching for Network Width with Bilaterally Coupled Network
- Paper: arxiv.org/abs/2105.10…
- Code: None
Decoupled Dynamic Filter Networks
- Homepage: thefoxofsky.github.io/project_pag…
- Paper: arxiv.org/abs/2104.14…
- Code: github.com/thefoxofsky…
Lite-HRNet: A Lightweight High-Resolution Network
CondenseNet V2: Sparse Feature Reactivation for Deep Networks
- Paper: arxiv.org/abs/2104.04…
- Code: github.com/jianghaojun…
Diverse Branch Block: Building a Convolution as an Inception-like Unit
- Paper: arxiv.org/abs/2103.13…
- Code: github.com/DingXiaoH/D…
Scaling Local Self-Attention For Parameter Efficient Visual Backbones
- Paper(Oral): arxiv.org/abs/2103.12…
- Code: None
ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network
- Paper: arxiv.org/abs/2007.00…
- Code: github.com/clovaai/rex…
Involution: Inverting the Inherence of Convolution for Visual Recognition
- Paper: arxiv.org/abs/2103.06…
- Code: github.com/d-li14/invo…
Coordinate Attention for Efficient Mobile Network Design
- Paper: arxiv.org/abs/2103.02…
- Code: github.com/Andrew-Qibi…
Inception Convolution with Efficient Dilation Search
- Paper: arxiv.org/abs/2012.13…
- Code: github.com/yifan123/IC…
RepVGG: Making VGG-style ConvNets Great Again
- Paper: arxiv.org/abs/2101.03…
- Code: github.com/DingXiaoH/R…
NAS
HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers
- Paper(Oral): arxiv.org/abs/2106.06…
- Code: github.com/dingmyu/HR-…
BCNet: Searching for Network Width with Bilaterally Coupled Network
- Paper: arxiv.org/abs/2105.10…
- Code: None
ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search
- Paper: https://arxiv.org/abs/2105.10154
- Code: None
Combined Depth Space based Architecture Search For Person Re-identification
- Paper: arxiv.org/abs/2104.04…
- Code: None
DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation
- Paper(Oral): arxiv.org/abs/2103.15…
- Code: None
Neural Architecture Search with Random Labels
- Paper: arxiv.org/abs/2101.11…
- Code: None
Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search
- Paper: arxiv.org/abs/2101.11…
- Code: None
Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation
- Paper: arxiv.org/abs/2105.12…
- Code: None
Prioritized Architecture Sampling with Monto-Carlo Tree Search
- Paper: arxiv.org/abs/2103.11…
- Code: github.com/xiusu/NAS-B…
Contrastive Neural Architecture Search with Neural Architecture Comparators
- Paper: arxiv.org/abs/2103.05…
- Code: github.com/chenyaofo/C…
AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling
- Paper: arxiv.org/abs/2011.09…
- Code: None
ReNAS: Relativistic Evaluation of Neural Architecture Search
- Paper: arxiv.org/abs/1910.01…
- Code: None
HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens
- Paper: arxiv.org/abs/2005.14…
- Code: None
Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator
- Paper: arxiv.org/abs/2103.07…
- Code: github.com/eric8607242…
OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection
- Paper: arxiv.org/abs/2103.04…
- Code: github.com/VDIGPKU/OPA…
Inception Convolution with Efficient Dilation Search
- Paper: arxiv.org/abs/2012.13…
- Code: github.com/yifan123/IC…
GAN
High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network
- Paper: arxiv.org/abs/2105.09…
- Code: github.com/csjliang/LP…
- Dataset: github.com/csjliang/LP…
DG-Font: Deformable Generative Networks for Unsupervised Font Generation
- Paper: arxiv.org/abs/2104.03…
- Code: github.com/ecnuycxie/D…
PD-GAN: Probabilistic Diverse GAN for Image Inpainting
- Paper: arxiv.org/abs/2105.02…
- Code: github.com/KumapowerLI…
StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
- Paper: arxiv.org/abs/2104.14…
- Code: github.com/naver-ai/St…
- Demo Video: youtu.be/qCapNyRA_Ng
Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer
- Paper: arxiv.org/abs/2104.05…
- Code: github.com/PaddlePaddl…
Regularizing Generative Adversarial Networks under Limited Data
- Homepage: hytseng0509.github.io/lecam-gan/
- Paper: faculty.ucmerced.edu/mhyang/pape…
- Code: github.com/google/leca…
Towards Real-World Blind Face Restoration with Generative Facial Prior
- Paper: arxiv.org/abs/2101.04…
- Code: None
TediGAN: Text-Guided Diverse Image Generation and Manipulation
- Homepage: xiaweihao.com/projects/te…
- Paper: arxiv.org/abs/2012.03…
- Code: github.com/weihaox/Ted…
Generative Hierarchical Features from Synthesizing Images
- Homepage: genforce.github.io/ghfeat/
- Paper(Oral): arxiv.org/abs/2007.10…
- Code: github.com/genforce/gh…
Teachers Do More Than Teach: Compressing Image-to-Image Models
- Paper: arxiv.org/abs/2103.03…
- Code: github.com/snap-resear…
HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms
- Paper: arxiv.org/abs/2011.11…
- Code: github.com/mahmoudnafi…
pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis
- Homepage: marcoamonteiro.github.io/pi-GAN-webs…
- Paper(Oral): arxiv.org/abs/2012.00…
- Code: None
DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network
- Paper: arxiv.org/abs/2103.07…
- Code: None
Diverse Semantic Image Synthesis via Probability Distribution Modeling
- Paper: arxiv.org/abs/2103.06…
- Code: github.com/tzt101/INAD…
LOHO: Latent Optimization of Hairstyles via Orthogonalization
- Paper: arxiv.org/abs/2103.03…
- Code: None
PISE: Person Image Synthesis and Editing with Decoupled GAN
- Paper: arxiv.org/abs/2103.04…
- Code: github.com/Zhangjinso/…
DeFLOCNet: Deep Image Editing via Flexible Low-level Controls
- Paper: raywzy.com/
- Code: raywzy.com/
Efficient Conditional GAN Transfer with Knowledge Propagation across Classes
- Paper: www.researchgate.net/publication…
- Code: github.com/mshahbazi72…
Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs
- Paper: arxiv.org/abs/2011.14…
- Code: None
Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
- Homepage: eladrich.github.io/pixel2style…
- Paper: arxiv.org/abs/2008.00…
- Code: github.com/eladrich/pi…
A 3D GAN for Improved Large-pose Facial Recognition
- Paper: arxiv.org/abs/2012.10…
- Code: None
HumanGAN: A Generative Model of Humans Images
- Paper: arxiv.org/abs/2103.06…
- Code: None
ID-Unet: Iterative Soft and Hard Deformation for View Synthesis
- Paper: arxiv.org/abs/2103.02…
- Code: github.com/MingyuY/Ite…
CoMoGAN: continuous model-guided image-to-image translation
- Paper(Oral): arxiv.org/abs/2103.06…
- Code: github.com/cv-rits/CoM…
Training Generative Adversarial Networks in One Stage
- Paper: arxiv.org/abs/2103.00…
- Code: None
Closed-Form Factorization of Latent Semantics in GANs
- Homepage: genforce.github.io/sefa/
- Paper(Oral): arxiv.org/abs/2007.06…
- Code: github.com/genforce/se…
Anycost GANs for Interactive Image Synthesis and Editing
- Paper: arxiv.org/abs/2103.03…
- Code: github.com/mit-han-lab…
Image-to-image Translation via Hierarchical Style Disentanglement
- Paper: arxiv.org/abs/2103.01…
- Code: github.com/imlixinyang…
VAE
Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders
- Homepage: taldatech.github.io/soft-intro-…
- Paper: arxiv.org/abs/2012.13…
- Code: github.com/taldatech/s…
Visual Transformer
1. End-to-End Human Pose and Mesh Reconstruction with Transformers
- Paper: arxiv.org/abs/2012.09…
- Code: github.com/microsoft/M…
2. Temporal-Relational CrossTransformers for Few-Shot Action Recognition
- Paper: arxiv.org/abs/2101.06…
- Code: github.com/tobyperrett…
3. Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
- Paper: arxiv.org/abs/2103.16…
- Code: github.com/mczhuge/Kal…
4. HOTR: End-to-End Human-Object Interaction Detection with Transformers
- Paper: arxiv.org/abs/2104.13…
- Code: github.com/kakaobrain/…
5. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
- Paper: arxiv.org/abs/2104.09…
- Code: github.com/autonomousv…
6. Pose Recognition with Cascade Transformers
- Paper: arxiv.org/abs/2104.06…
- Code: github.com/mlpc-ucsd/P…
7. Variational Transformer Networks for Layout Generation
- Paper: arxiv.org/abs/2104.02…
- Code: None
8. LoFTR: Detector-Free Local Feature Matching with Transformers
- Homepage: zju3dv.github.io/loftr/
- Paper: arxiv.org/abs/2104.00…
- Code: github.com/zju3dv/LoFT…
9. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
- Paper: arxiv.org/abs/2012.15…
- Code: github.com/fudan-zvg/S…
10. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
- Paper: arxiv.org/abs/2103.16…
- Code: None
11. Transformer Tracking
- Paper: arxiv.org/abs/2103.15…
- Code: github.com/chenxin-dlu…
12. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers
- Paper(Oral): arxiv.org/abs/2106.06…
- Code: github.com/dingmyu/HR-…
13. MIST: Multiple Instance Spatial Transformer
- Paper: arxiv.org/abs/1811.10…
- Code: None
14. Multimodal Motion Prediction with Stacked Transformers
15. Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning
- Paper: www.amazon.science/publication…
- Code: github.com/amzn/image-…
16. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
- Paper(Oral): arxiv.org/abs/2103.11…
- Code: github.com/594422814/T…
17. Pre-Trained Image Processing Transformer
- Paper: arxiv.org/abs/2012.00…
- Code: None
18. End-to-End Video Instance Segmentation with Transformers
- Paper(Oral): arxiv.org/abs/2011.14…
- Code: github.com/Epiphqny/Vi…
19. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
- Paper(Oral): arxiv.org/abs/2011.09…
- Code: github.com/dddzg/up-de…
20. End-to-End Human Object Interaction Detection with HOI Transformer
- Paper: arxiv.org/abs/2103.04…
- Code: github.com/bbepoch/Hoi…
21. Transformer Interpretability Beyond Attention Visualization
- Paper: arxiv.org/abs/2012.09…
- Code: github.com/hila-chefer…
22. Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer
- Paper: None
- Code: None
23. LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity
- Paper: None
- Code: None
24. Line Segment Detection Using Transformers without Edges
- Paper(Oral): arxiv.org/abs/2101.01…
- Code: None
25. MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
26. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
- Paper(Oral): arxiv.org/abs/2101.08…
- Code: github.com/dukebw/SSTV…
27. Facial Action Unit Detection With Transformers
- Paper: None
- Code: None
28. Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition
- Paper: None
- Code: None
29. Lesion-Aware Transformers for Diabetic Retinopathy Grading
- Paper: None
- Code: None
30. Topological Planning With Transformers for Vision-and-Language Navigation
- Paper: arxiv.org/abs/2012.05…
- Code: None
31. Adaptive Image Transformer for One-Shot Object Detection
- Paper: None
- Code: None
32. Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos
- Paper: None
- Code: None
33. Taming Transformers for High-Resolution Image Synthesis
- Homepage: compvis.github.io/taming-tran…
- Paper(Oral): arxiv.org/abs/2012.09…
- Code: github.com/CompVis/tam…
34. Self-Supervised Video Hashing via Bidirectional Transformers
- Paper: None
- Code: None
35. Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos
- Paper(Oral): hehefan.github.io/pdfs/p4tran…
- Code: None
36. Gaussian Context Transformer
- Paper: None
- Code: None
37. General Multi-Label Image Classification With Transformers
- Paper: arxiv.org/abs/2011.14…
- Code: None
38. Bottleneck Transformers for Visual Recognition
- Paper: arxiv.org/abs/2101.11…
- Code: None
39. VLN BERT: A Recurrent Vision-and-Language BERT for Navigation
- Paper(Oral): arxiv.org/abs/2011.13…
- Code: github.com/YicongHong/…
40. Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
- Paper(Oral): arxiv.org/abs/2102.06…
- Code: github.com/jayleicn/Cl…
41. Self-attention based Text Knowledge Mining for Text Detection
- Paper: None
- Code: github.com/CVI-SZU/STK…
42. SSAN: Separable Self-Attention Network for Video Representation Learning
- Paper: None
- Code: None
43. Scaling Local Self-Attention For Parameter Efficient Visual Backbones
- Paper(Oral): arxiv.org/abs/2103.12…
- Code: None
Regularization
Regularizing Neural Networks via Adversarial Model Perturbation
- Paper: arxiv.org/abs/2010.04…
- Code: github.com/hiyouga/AMP…
SLAM
Differentiable SLAM-net: Learning Particle SLAM for Visual Navigation
- Paper: arxiv.org/abs/2105.07…
- Code: None
Generalizing to the Open World: Deep Visual Odometry with Online Adaptation
- Paper: arxiv.org/abs/2103.15…
- Code: arxiv.org/abs/2103.15…
Long-Tailed
Adversarial Robustness under Long-Tailed Distribution
- Paper(Oral): arxiv.org/abs/2104.02…
- Code: github.com/wutong16/Ad…
Distribution Alignment: A Unified Framework for Long-tail Visual Recognition
- Paper: arxiv.org/abs/2103.16…
- Code: github.com/Megvii-Base…
Adaptive Class Suppression Loss for Long-Tail Object Detection
- Paper: arxiv.org/abs/2104.00…
- Code: github.com/CASIA-IVA-L…
Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification
- Paper: arxiv.org/abs/2103.14…
- Code: None
Data Augmentation
Scale-aware Automatic Augmentation for Object Detection
- Paper: arxiv.org/abs/2103.17…
- Code: github.com/Jia-Researc…
Unsupervised/Self-Supervised
Domain-Specific Suppression for Adaptive Object Detection
- Paper: arxiv.org/abs/2105.03…
- Code: None
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
- Paper: arxiv.org/abs/2104.14…
- Code: github.com/facebookres…
Unsupervised Multi-Source Domain Adaptation for Person Re-Identification
- Paper: arxiv.org/abs/2104.12…
- Code: None
Self-supervised Video Representation Learning by Context and Motion Decoupling
- Paper: arxiv.org/abs/2104.00…
- Code: None
Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning
- Homepage: fingerrec.github.io/index_files…
- Paper: arxiv.org/abs/2009.05…
- Code: github.com/FingerRec/B…
Spatially Consistent Representation Learning
- Paper: arxiv.org/abs/2103.06…
- Code: None
VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples
- Paper: arxiv.org/abs/2103.05…
- Code: github.com/tinapan-pt/…
Exploring Simple Siamese Representation Learning
- Paper(Oral): arxiv.org/abs/2011.10…
- Code: None
Dense Contrastive Learning for Self-Supervised Visual Pre-Training
- Paper(Oral): arxiv.org/abs/2011.09…
- Code: github.com/WXinlong/De…
Semi-Supervised
Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework
- Affiliations: Alibaba
- Paper: arxiv.org/abs/2103.11…
- Code: None
Adaptive Consistency Regularization for Semi-Supervised Transfer Learning
- Paper: arxiv.org/abs/2103.02…
- Code: github.com/SHI-Labs/Se…
Capsule Network
Capsule Network is Not More Robust than Convolutional Network
- Paper: arxiv.org/abs/2103.15…
- Code: None
Image Classification
Correlated Input-Dependent Label Noise in Large-Scale Image Classification
- Paper(Oral): arxiv.org/abs/2105.10…
- Code: github.com/google/unce…
2D Object Detection
2D Object Detection
1. Scaled-YOLOv4: Scaling Cross Stage Partial Network
- Affiliations: Academia Sinica, Intel, Providence University
- Paper: arxiv.org/abs/2011.08…
- Code: github.com/WongKinYiu/…
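- Chinese write-up: The official improved version of YOLOv4 is here! 55.8% AP, up to 1774 FPS; Scaled-YOLOv4 is officially open source!
2. You Only Look One-level Feature
- Affiliations: Chinese Academy of Sciences, University of Chinese Academy of Sciences, Megvii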
- Paper: arxiv.org/abs/2103.09…
- Code: github.com/megvii-mode…
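- Chinese write-up: CVPR 2021 | No FPN! CAS and Megvii propose YOLOF: You Only Look One-level Feature
3. Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
- Affiliations: The University of Hong Kong, Tongji University, ByteDance AI Lab, UC Berkeley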
- Paper: arxiv.org/abs/2011.12…
- Code: github.com/PeizeSun/Sp…
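- Chinese write-up: A new paradigm for object detection! HKU, Tongji, and Berkeley propose Sparse R-CNN, with code just open-sourced!
4. End-to-End Object Detection with Fully Convolutional Network
- Affiliations: Megvii, Xi'an Jiaotong University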
- Paper: arxiv.org/abs/2012.03…
- Code: github.com/Megvii-Base…
5. Dynamic Head: Unifying Object Detection Heads with Attentions
- Affiliations: Microsoft
- Paper: arxiv.org/abs/2106.08…
- Code: github.com/microsoft/D…
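- Chinese write-up: 60.6 AP, a new COCO record! Microsoft proposes DyHead, unifying attention with object detection heads
6. Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection
- Affiliations: Nanjing University of Science and Technology, Momenta, Nanjing University, Tsinghua University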
- Paper: arxiv.org/abs/2011.12…
- Code: github.com/implus/GFoc…
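- Chinese write-up: CVPR 2021 | GFLV2: a solid object detection technique that improves accuracy at no extra cost!
7. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
- Affiliations: South China University of Technology, Tencent WeChat AI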
- Paper(Oral): arxiv.org/abs/2011.09…
- Code: github.com/dddzg/up-de…
- Chinese write-up: CVPR 2021 Oral | Transformers strike again! SCUT and WeChat propose UP-DETR: an unsupervised pre-trained detector
8. MobileDets: Searching for Object Detection Architectures for Mobile Accelerators
- Affiliations: University of Wisconsin, Google
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/tensorflow/…
9. Tracking Pedestrian Heads in Dense Crowd
- Affiliations: University of Rennes 1
- Homepage: project.inria.fr/crowdscienc…
- Paper: openaccess.thecvf.com/content/CVP…
- Code1: github.com/Sentient07/…
- Code2: github.com/Sentient07/…
- Dataset: project.inria.fr/crowdscienc…
10. Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation
- Affiliations: The Hong Kong University of Science and Technology, Huawei Noah's Ark Lab
- Paper: arxiv.org/abs/2105.12…
- Code: None
11. PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery
- Affiliations: A*STAR, Sichuan University, Nanyang Technological University
- Paper: arxiv.org/abs/2105.12…
- Code: None
12. IQDet: Instance-wise Quality Distribution Sampling for Object Detection
- Affiliations: Megvii
- Paper: arxiv.org/abs/2104.06…
- Code: None
13. Multi-Scale Aligned Distillation for Low-Resolution Detection
- Affiliations: The Chinese University of Hong Kong, Adobe Research, SmartMore
- Paper: jiaya.me/papers/ms_a…
- Code: github.com/Jia-Researc…
14. Adaptive Class Suppression Loss for Long-Tail Object Detection
- Affiliations: Chinese Academy of Sciences, University of Chinese Academy of Sciences, ObjectEye, Peking University, Peng Cheng Laboratory, Nexwise
- Paper: arxiv.org/abs/2104.00…
- Code: github.com/CASIA-IVA-L…
15. VarifocalNet: An IoU-aware Dense Object Detector
- Affiliations: Queensland University of Technology, University of Queensland
- Paper(Oral): arxiv.org/abs/2008.13…
- Code: github.com/hyz-xmaster…
16. OTA: Optimal Transport Assignment for Object Detection
- Affiliations: Waseda University, Megvii
- Paper: arxiv.org/abs/2103.14…
- Code: github.com/Megvii-Base…
17. Distilling Object Detectors via Decoupled Features
- Affiliations: Huawei Noah's Ark Lab, University of Sydney
- Paper: arxiv.org/abs/2103.14…
- Code: github.com/ggjy/DeFeat…
18. Robust and Accurate Object Detection via Adversarial Learning
- Affiliations: Google, UCLA, UCSC
- Paper: arxiv.org/abs/2103.13…
- Code: None
19. OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection
- Affiliations: Peking University, Anyvision, Stony Brook University
- Paper: arxiv.org/abs/2103.04…
- Code: github.com/VDIGPKU/OPA…
20. Multiple Instance Active Learning for Object Detection
- Affiliations: University of Chinese Academy of Sciences, Huawei Noah's Ark Lab, Tsinghua University
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/yuantn/MI-A…
21. Towards Open World Object Detection
- Affiliations: Indian Institute of Technology, MBZUAI, Australian National University, Linköping University
- Paper(Oral): arxiv.org/abs/2103.02…
- Code: github.com/JosephKJ/OW…
22. RankDetNet: Delving Into Ranking Constraints for Object Detection
- Affiliations: Xilinx
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
Rotated Object Detection
23. Dense Label Encoding for Boundary Discontinuity Free Rotation Detection
- Affiliations: Shanghai Jiao Tong University, University of Chinese Academy of Sciences
- Paper: arxiv.org/abs/2011.09…
- Code1: github.com/Thinklab-SJ…
- Code2: github.com/yangxue0827…
24. ReDet: A Rotation-equivariant Detector for Aerial Object Detection
- Affiliations: Wuhan University
- Paper: arxiv.org/abs/2103.07…
- Code: github.com/csuhan/ReDe…
25. Beyond Bounding-Box: Convex-Hull Feature Adaptation for Oriented and Densely Packed Object Detection
- Affiliations: University of Chinese Academy of Sciences, Tsinghua University
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/SDL-GuoZong…
Few-Shot Object Detection
26. Accurate Few-Shot Object Detection With Support-Query Mutual Guidance and Hybrid Loss
- Affiliations: Fudan University, Tongji University, Zhejiang University
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
27. Adaptive Image Transformer for One-Shot Object Detection
- Affiliations: Academia Sinica, Taiwan AI Labs
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
28. Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection
- Affiliations: Peking University, Beijing University of Posts and Telecommunications
- Paper: arxiv.org/abs/2103.17…
- Code: github.com/hzhupku/DCN…
29. Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection
- Affiliations: Carnegie Mellon University (CMU)
- Paper: arxiv.org/abs/2103.01…
- Code: None
30. FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding
- Affiliations: University of Southern California, Megvii
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/MegviiDetec…
31. Hallucination Improves Few-Shot Object Detection
- Affiliations: University of Illinois Urbana-Champaign
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/pppplin/Hal…
32. Few-Shot Object Detection via Classification Refinement and Distractor Retreatment
- Affiliations: National University of Singapore, SIMTech
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
33. Generalized Few-Shot Object Detection Without Forgetting
- Affiliations: Megvii
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
34. Transformation Invariant Few-Shot Object Detection
- Affiliations: Huawei Noah's Ark Lab
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
35. UniT: Unified Knowledge Transfer for Any-Shot Object Detection and Segmentation
- Affiliations: University of British Columbia, Vector AI, CIFAR AI Chair
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/ubc-vision/…
36. Beyond Max-Margin: Class Margin Equilibrium for Few-Shot Object Detection
- Affiliations: University of Chinese Academy of Sciences, Xiamen University, Peng Cheng Laboratory
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/Bohao-Lee/C…
Semi-Supervised Object Detection
37. Points As Queries: Weakly Semi-Supervised Object Detection by Points
- Affiliations: Megvii, Fudan University
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
38. Data-Uncertainty Guided Multi-Phase Learning for Semi-Supervised Object Detection
- Affiliations: Tsinghua University
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
39. Positive-Unlabeled Data Purification in the Wild for Object Detection
- Affiliations: Huawei Noah's Ark Lab, University of Sydney, Peking University
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
40. Interactive Self-Training With Mean Teachers for Semi-Supervised Object Detection
- Affiliations: Alibaba, The Hong Kong Polytechnic University
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
41. Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework
- Affiliations: Alibaba
- Paper: arxiv.org/abs/2103.11…
- Code: None
42. Humble Teachers Teach Better Students for Semi-Supervised Object Detection
- Affiliations: Carnegie Mellon University (CMU), Amazon
- Homepage: yihet.com/humble-teac…
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/lryta/Humbl…
43. Interpolation-Based Semi-Supervised Learning for Object Detection
- Affiliations: Seoul National University, Aalto University, et al.
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/soo89/ISD-S…
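Domain-Adaptive Object Detection
44. Domain-Specific Suppression for Adaptive Object Detection
- Affiliations: Chinese Academy of Sciences, Cambricon, University of Chinese Academy of Sciences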
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
45. MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection
- Affiliations: Johns Hopkins University, Mercedes-Benz
- Paper: arxiv.org/abs/2103.04…
- Code: None
46. Unbiased Mean Teacher for Cross-Domain Object Detection
- Affiliations: University of Electronic Science and Technology of China, ETH Zurich
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/kinredon/um…
47. I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors
- Affiliations: The University of Hong Kong, Xiamen University, Deepwise AI Lab
- Paper: arxiv.org/abs/2103.13…
- Code: None
Self-Supervised Object Detection
48. There Is More Than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking With Sound by Distilling Multimodal Knowledge
- Affiliations: University of Freiburg
- Paper: openaccess.thecvf.com/content/CVP…
- Code: rl.uni-freiburg.de/research/mu…
49. Instance Localization for Self-supervised Detection Pretraining
- Affiliations: The Chinese University of Hong Kong, Microsoft Research Asia
- Paper: arxiv.org/abs/2102.08…
- Code: github.com/limbo0000/I…
Weakly Supervised Object Detection
50. Informative and Consistent Correspondence Mining for Cross-Domain Weakly Supervised Object Detection
- Affiliations: Beihang University, Peng Cheng Laboratory, SenseTime
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
51. DAP: Detection-Aware Pre-training with Weak Supervision
- Affiliations: UIUC, Microsoft
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
Others
52. Open-Vocabulary Object Detection Using Captions
- Affiliations: Snap, Columbia University
- Paper(Oral): openaccess.thecvf.com/content/CVP…
- Code: github.com/alirezazare…
53. Depth From Camera Motion and Object Detection
- Affiliations: University of Michigan, SIAI
- Paper: arxiv.org/abs/2103.01…
- Code: github.com/griffbr/ODM…
- Dataset: github.com/griffbr/ODM…
54. Unsupervised Object Detection With LIDAR Clues
- Affiliations: SenseTime, University of Chinese Academy of Sciences, University of Science and Technology of China
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
55. GAIA: A Transfer Learning System of Object Detection That Fits Your Needs
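- Affiliations: University of Chinese Academy of Sciences, Beijing Institute of Technology, Chinese Academy of Sciences, SenseTime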
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/GAIA-vision…
56. General Instance Distillation for Object Detection
- Affiliations: Megvii, Beihang University
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
57. AQD: Towards Accurate Quantized Object Detection
- Affiliations: Monash University, University of Adelaide, South China University of Technology
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/aim-uofa/mo…
58. Scale-Aware Automatic Augmentation for Object Detection
- Affiliations: The Chinese University of Hong Kong, ByteDance AI Lab, SmartMore
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/Jia-Researc…
59. Equalization Loss v2: A New Gradient Balance Approach for Long-Tailed Object Detection
- Affiliations: Tongji University, SenseTime, Tsinghua University
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/tztztztztz/…
60. Class-Aware Robust Adversarial Training for Object Detection
- Affiliations: Columbia University, Academia Sinica
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
61. Improved Handling of Motion Blur in Online Object Detection
- Affiliations: University College London
- Homepage: visual.cs.ucl.ac.uk/pubs/handli…
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
62. Multiple Instance Active Learning for Object Detection
- Affiliations: University of Chinese Academy of Sciences, Huawei Noah's Ark Lab
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/yuantn/MI-A…
63. Neural Auto-Exposure for High-Dynamic Range Object Detection
- Affiliations: Algolux, Princeton University
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
64. Generalizable Pedestrian Detection: The Elephant in the Room
- Affiliations: IIAI, Aalto University
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/hasanirtiza…
Single/Multi-Object Tracking
Single-Object Tracking
LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search
- Paper: arxiv.org/abs/2104.14…
- Code: github.com/researchmm/…
Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark
- Homepage: sites.google.com/view/langtr…
- Paper: arxiv.org/abs/2103.16…
- Evaluation Toolkit: github.com/wangxiao579…
- Demo Video: www.youtube.com/watch?v=7lv…
IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking
- Paper: arxiv.org/abs/2103.14…
- Code: github.com/VISION-SJTU…
Graph Attention Tracking
- Paper: arxiv.org/abs/2011.11…
- Code: github.com/ohhhyeahhh/…
Rotation Equivariant Siamese Networks for Tracking
- Paper: arxiv.org/abs/2012.13…
- Code: None
Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
- Paper(Oral): arxiv.org/abs/2103.11…
- Code: github.com/594422814/T…
Transformer Tracking
- Paper: arxiv.org/abs/2103.15…
- Code: github.com/chenxin-dlu…
Multi-Object Tracking
Tracking Pedestrian Heads in Dense Crowd
- Homepage: project.inria.fr/crowdscienc…
- Paper: openaccess.thecvf.com/content/CVP…
- Code1: github.com/Sentient07/…
- Code2: github.com/Sentient07/…
- Dataset: project.inria.fr/crowdscienc…
Multiple Object Tracking with Correlation Learning
- Paper: arxiv.org/abs/2104.03…
- Code: None
Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking
- Paper: arxiv.org/abs/2012.02…
- Code: None
Learning a Proposal Classifier for Multiple Object Tracking
- Paper: arxiv.org/abs/2103.07…
- Code: github.com/daip13/LPC_…
Track to Detect and Segment: An Online Multi-Object Tracker
- Homepage: jialianwu.com/projects/Tr…
- Paper: arxiv.org/abs/2103.08…
- Code: github.com/JialianW/Tr…
Semantic Segmentation
1. HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation
- Affiliations: Facebook AI, Bar-Ilan University, Tel Aviv University
- Homepage: nirkin.com/hyperseg/
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/YuvalNirkin…
2. Rethinking BiSeNet For Real-time Semantic Segmentation
- Affiliations: Meituan
- Paper: arxiv.org/abs/2104.13…
- Code: github.com/MichaelFan0…
3. Progressive Semantic Segmentation
- Affiliations: VinAI Research, VinUniversity, University of Arkansas, Stony Brook University
- Paper: arxiv.org/abs/2104.03…
- Code: github.com/VinAIResear…
4. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
- Affiliations: Fudan University, University of Oxford, University of Surrey, Tencent Youtu Lab, Facebook AI
- Homepage: fudan-zvg.github.io/SETR
- Paper: arxiv.org/abs/2012.15…
- Code: github.com/fudan-zvg/S…
5. Capturing Omni-Range Context for Omnidirectional Segmentation
- Affiliations: Karlsruhe Institute of Technology, Carl Zeiss, Huawei
- Paper: arxiv.org/abs/2103.05…
- Code: None
6. Learning Statistical Texture for Semantic Segmentation
- Affiliations: Beihang University, SenseTime
- Paper: arxiv.org/abs/2103.04…
- Code: None
7. InverseForm: A Loss Function for Structured Boundary-Aware Segmentation
- Affiliations: Qualcomm AI Research
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
8. DCNAS: Densely Connected Neural Architecture Search for Semantic Image Segmentation
- Affiliations: Joyy Inc, Kuaishou, Beihang University, et al.
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
Weakly Supervised Semantic Segmentation
9. Railroad Is Not a Train: Saliency As Pseudo-Pixel Supervision for Weakly Supervised Semantic Segmentation
- Affiliations: Yonsei University, Sungkyunkwan University
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/halbielee/E…
10. Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation
- Affiliations: Yonsei University
- Homepage: cvlab.yonsei.ac.kr/projects/BA…
- Paper: arxiv.org/abs/2104.00…
- Code: None
11. Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation
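- Affiliations: Nanjing University of Science and Technology, MBZUAI, University of Electronic Science and Technology of China, University of Adelaide, University of Technology Sydney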
- Paper: arxiv.org/abs/2103.14…
- Code: github.com/NUST-Machin…
12. Embedded Discriminative Attention Mechanism for Weakly Supervised Semantic Segmentation
- Affiliations: Beijing Institute of Technology, Meituan
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/allenwu97/E…
13. BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation
- Affiliations: Seoul National University
- Paper: arxiv.org/abs/2103.08…
- Code: github.com/jbeomlee93/…
Semi-Supervised Semantic Segmentation
14. Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision
- Affiliations: Peking University, Microsoft Research Asia
- Paper: arxiv.org/abs/2106.01…
- Code: github.com/charlesCXK/…
15. Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation
- Affiliations: Huawei, Dalian University of Technology, Peking University
- Paper: arxiv.org/abs/2103.04…
- Code: None
16. Semi-Supervised Semantic Segmentation With Directional Context-Aware Consistency
- Affiliations: The Chinese University of Hong Kong, SmartMore, University of Oxford
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
17. Semantic Segmentation With Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization
- Affiliations: NVIDIA, University of Toronto, Yale University, MIT, Vector Institute
- Paper: openaccess.thecvf.com/content/CVP…
- Code: nv-tlabs.github.io/semanticGAN…
18. Three Ways To Improve Semantic Segmentation With Self-Supervised Depth Estimation
- Affiliations: ETH Zurich, University of Bern, KU Leuven
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/lhoyer/impr…
Domain Adaptive Semantic Segmentation
19. Cluster, Split, Fuse, and Update: Meta-Learning for Open Compound Domain Adaptive Semantic Segmentation
- Affiliations: ETH Zurich, KU Leuven, University of Electronic Science and Technology of China
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
20. Source-Free Domain Adaptation for Semantic Segmentation
- Affiliations: East China Normal University
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
21. Uncertainty Reduction for Model Adaptation in Semantic Segmentation
- Affiliations: Idiap Research Institute, EPFL, University of Geneva
- Paper: openaccess.thecvf.com/content/CVP…
- Code: git.io/JthPp
22. Self-Supervised Augmentation Consistency for Adapting Semantic Segmentation
- Affiliations: TU Darmstadt, hessian.AI
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/visinf/da-s…
23. RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening
- Affiliations: LG AI Research, KAIST, et al.
- Paper: arxiv.org/abs/2103.15…
- Code: github.com/shachoi/Rob…
24. Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization
- Affiliations: The University of Hong Kong, Deepwise AI Lab
- Paper: arxiv.org/abs/2103.13…
- Code: None
25. MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation
- Affiliations: City University of Hong Kong, Baidu
- Paper: arxiv.org/abs/2103.05…
- Code: github.com/cyang-cityu…
26. Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation
- Affiliations: Huawei Cloud, Huawei Noah's Ark Lab, Dalian University of Technology
- Paper: arxiv.org/abs/2103.04…
- Code: None
27. Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation
- Affiliations: University of Science and Technology of China, Microsoft Research Asia
- Paper: arxiv.org/abs/2101.10…
- Code: github.com/microsoft/P…
28. DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation
- Affiliations: University of South Carolina, Farsee2 Technology
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/W-zx-Y/DANN…
Few-Shot Semantic Segmentation
29. Scale-Aware Graph Neural Network for Few-Shot Semantic Segmentation
- Affiliations: MBZUAI, IIAI, Harbin Institute of Technology
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
30. Anti-Aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation
- Affiliations: University of Chinese Academy of Sciences, Tsinghua University
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/Bibkiller/A…
Unsupervised Semantic Segmentation
31. PiCIE: Unsupervised Semantic Segmentation Using Invariance and Equivariance in Clustering
- Affiliations: UT-Austin, Cornell University
- Paper: openaccess.thecvf.com/content/CVP…
- Code: https://github.com/janghyuncho/PiCIE
Video Semantic Segmentation
32. VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild
- Affiliations: Zhejiang University, Baidu, University of Technology Sydney
- Homepage: www.vspwdataset.com/
- Paper: www.vspwdataset.com/CVPR2021__m…
- GitHub: github.com/sssdddwww2/…
Others
33. Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations
- Affiliations: University of Padova
- Paper: openaccess.thecvf.com/content/CVP…
- Code: lttm.dei.unipd.it/paper_data/…
34. Exploit Visual Dependency Relations for Semantic Segmentation
- Affiliations: University of Illinois at Chicago
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
35. Revisiting Superpixels for Active Learning in Semantic Segmentation With Realistic Annotation Costs
- Affiliations: Institute for Infocomm Research, National University of Singapore
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
36. PLOP: Learning without Forgetting for Continual Semantic Segmentation
- Affiliations: Sorbonne University, Heuritech, Datakalab, Valeo.ai
- Paper: arxiv.org/abs/2011.11…
- Code: github.com/arthurdouil…
37. 3D-to-2D Distillation for Indoor Scene Parsing
- Affiliations: The Chinese University of Hong Kong, The University of Hong Kong
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
38. Bidirectional Projection Network for Cross Dimension Scene Understanding
- Affiliations: The Chinese University of Hong Kong, University of Oxford, et al.
- Paper(Oral): arxiv.org/abs/2103.14…
- Code: github.com/wbhu/BPNet
39. PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation
- Affiliations: Peking University, Chinese Academy of Sciences, University of Chinese Academy of Sciences, ETH Zurich, SenseTime, et al.
- Paper: openaccess.thecvf.com/content/CVP…
- Code: github.com/lxtGH/PFSeg…
Instance Segmentation
DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation
- Paper: arxiv.org/abs/2011.09…
- Code: github.com/aliyun/DCT-…
Incremental Few-Shot Instance Segmentation
- Paper: arxiv.org/abs/2105.05…
- Code: github.com/danganea/iM…
A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation
- Paper: arxiv.org/abs/2105.03…
- Code: None
RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features
- Paper: arxiv.org/abs/2104.08…
- Code: github.com/zhanggang00…
Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation
- Paper: arxiv.org/abs/2104.05…
- Code: github.com/tinyalpha/B…
Multi-Scale Aligned Distillation for Low-Resolution Detection
- Paper: jiaya.me/papers/ms_a…
- Code: github.com/Jia-Researc…
Boundary IoU: Improving Object-Centric Image Segmentation Evaluation
- Homepage: bowenc0221.github.io/boundary-io…
- Paper: arxiv.org/abs/2103.16…
- Code: github.com/bowenc0221/…
Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers
- Paper: arxiv.org/abs/2103.12…
- Code: github.com/lkeab/BCNet
Zero-shot instance segmentation(Not Sure)
- Paper: None
- Code: github.com/CVPR2021-pa…
Video Instance Segmentation
STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation
End-to-End Video Instance Segmentation with Transformers
- Paper(Oral): arxiv.org/abs/2011.14…
- Code: github.com/Epiphqny/Vi…
Panoptic Segmentation
ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation
- Paper: arxiv.org/abs/2012.05…
- Code: github.com/joe-siyuan-…
- Dataset: github.com/joe-siyuan-…
Part-aware Panoptic Segmentation
- Paper: arxiv.org/abs/2106.06…
- Code: github.com/tue-mps/pan…
- Dataset: github.com/tue-mps/pan…
Exemplar-Based Open-Set Panoptic Segmentation Network
- Homepage: cv.snu.ac.kr/research/EO…
- Paper: arxiv.org/abs/2105.08…
- Code: github.com/jd730/EOPSN
MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
Panoptic Segmentation Forecasting
- Paper: arxiv.org/abs/2104.03…
- Code: github.com/nianticlabs…
Fully Convolutional Networks for Panoptic Segmentation
- Paper: arxiv.org/abs/2012.00…
- Code: github.com/yanwei-li/P…
Cross-View Regularization for Domain Adaptive Panoptic Segmentation
- Paper: arxiv.org/abs/2103.02…
- Code: None
Medical Image Segmentation
1. Learning Calibrated Medical Image Segmentation via Multi-Rater Agreement Modeling
- Affiliations: Tencent Jarvis Lab, Beijing Tongren Hospital
- Paper(Best Paper Candidate): openaccess.thecvf.com/content/CVP…
- Code: github.com/jiwei0921/M…
2. Every Annotation Counts: Multi-Label Deep Supervision for Medical Image Segmentation
- Affiliations: Karlsruhe Institute of Technology, Carl Zeiss, et al.
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
3. FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
- Affiliations: The Chinese University of Hong Kong, The Hong Kong Polytechnic University
- Paper: arxiv.org/abs/2103.06…
- Code: github.com/liuquande/F…
4. DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation
- Affiliations: Johns Hopkins University, NVIDIA
- Paper(Oral): arxiv.org/abs/2103.15…
- Code: None
5. DARCNN: Domain Adaptive Region-Based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images
- Affiliations: Stanford University
- Paper: openaccess.thecvf.com/content/CVP…
- Code: None
Video Object Segmentation
Learning Position and Target Consistency for Memory-based Video Object Segmentation
- Paper: arxiv.org/abs/2104.04…
- Code: None
SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
- Paper(Oral): arxiv.org/abs/2101.08…
- Code: github.com/dukebw/SSTV…
Interactive Video Object Segmentation
Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion
- Homepage: hkchengrex.github.io/MiVOS/
- Paper: arxiv.org/abs/2103.07…
- Code: github.com/hkchengrex/…
- Demo: hkchengrex.github.io/MiVOS/video…
Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild
- Paper: arxiv.org/abs/2103.10…
- Code: github.com/svip-lab/IV…
Saliency Detection
Uncertainty-aware Joint Salient Object and Camouflaged Object Detection
- Paper: arxiv.org/abs/2104.02…
- Code: github.com/JingZhang61…
Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion
- Paper(Oral): arxiv.org/abs/2103.11…
- Code: github.com/sunpeng1996…
Camouflaged Object Detection
Uncertainty-aware Joint Salient Object and Camouflaged Object Detection
- Paper: arxiv.org/abs/2104.02…
- Code: github.com/JingZhang61…
Co-Salient Object Detection
Group Collaborative Learning for Co-Salient Object Detection
- Paper: arxiv.org/abs/2104.01…
- Code: github.com/fanq15/GCoN…
Image Matting
Semantic Image Matting
- Paper: arxiv.org/abs/2104.08…
- Code: github.com/nowsyn/SIM
- Dataset: github.com/nowsyn/SIM
Person Re-identification
Generalizable Person Re-identification with Relevance-aware Mixture of Experts
- Paper: arxiv.org/abs/2105.09…
- Code: None
Unsupervised Multi-Source Domain Adaptation for Person Re-Identification
- Paper: arxiv.org/abs/2104.12…
- Code: None
Combined Depth Space based Architecture Search For Person Re-identification
- Paper: arxiv.org/abs/2104.04…
- Code: None
Person Search
Anchor-Free Person Search
- Paper: arxiv.org/abs/2103.11…
- Code: github.com/daodaofr/Al…
- Interpretation: The First Anchor-Free Person Search Framework | CVPR 2021
Video Understanding / Action Recognition
Temporal-Relational CrossTransformers for Few-Shot Action Recognition
- Paper: arxiv.org/abs/2101.06…
- Code: github.com/tobyperrett…
FrameExit: Conditional Early Exiting for Efficient Video Recognition
- Paper(Oral): arxiv.org/abs/2104.13…
- Code: None
No frame left behind: Full Video Action Recognition
- Paper: arxiv.org/abs/2103.15…
- Code: None
Learning Salient Boundary Feature for Anchor-free Temporal Action Localization
- Paper: arxiv.org/abs/2103.13…
- Code: None
Temporal Context Aggregation Network for Temporal Action Proposal Refinement
- Paper: arxiv.org/abs/2103.13…
- Code: None
- Interpretation: CVPR 2021 | TCANet: The Strongest Temporal Action Proposal Refinement Network
ACTION-Net: Multipath Excitation for Action Recognition
- Paper: arxiv.org/abs/2103.07…
- Code: github.com/V-Sense/ACT…
Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning
- Homepage: fingerrec.github.io/index_files…
- Paper: arxiv.org/abs/2009.05…
- Code: github.com/FingerRec/B…
TDN: Temporal Difference Networks for Efficient Action Recognition
- Paper: arxiv.org/abs/2012.10…
- Code: github.com/MCG-NJU/TDN
Face Recognition
A 3D GAN for Improved Large-pose Facial Recognition
- Paper: arxiv.org/abs/2012.10…
- Code: None
MagFace: A Universal Representation for Face Recognition and Quality Assessment
- Paper(Oral): arxiv.org/abs/2103.06…
- Code: github.com/IrvingMeng/…
WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition
- Homepage: www.face-benchmark.org/
- Paper: arxiv.org/abs/2103.04…
- Dataset: www.face-benchmark.org/
When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework
- Paper(Oral): arxiv.org/abs/2103.01…
- Code: github.com/Hzzone/MTLF…
- Dataset: github.com/Hzzone/MTLF…
Face Detection
HLA-Face: Joint High-Low Adaptation for Low Light Face Detection
- Homepage: daooshee.github.io/HLA-Face-We…
- Paper: arxiv.org/abs/2104.01…
- Code: github.com/daooshee/HL…
CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement
- Paper: arxiv.org/abs/2103.07…
- Code: None
Face Anti-Spoofing
Cross Modal Focal Loss for RGBD Face Anti-Spoofing
- Paper: arxiv.org/abs/2103.00…
- Code: None
Deepfake Detection
Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain
- Paper: arxiv.org/abs/2103.01…
- Code: None
Multi-attentional Deepfake Detection
- Paper: arxiv.org/abs/2103.02…
- Code: None
Age Estimation
Continuous Face Aging via Self-estimated Residual Age Embedding
- Paper: arxiv.org/abs/2105.00…
- Code: None
PML: Progressive Margin Loss for Long-tailed Age Classification
- Paper: arxiv.org/abs/2103.02…
- Code: None
Facial Expression Recognition
Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition
- Paper: arxiv.org/abs/2103.13…
- Code: None
Deepfakes
MagDR: Mask-guided Detection and Reconstruction for Defending Deepfakes
- Paper: arxiv.org/abs/2103.14…
- Code: None
Human Parsing
Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing
- Paper: arxiv.org/abs/2103.04…
- Code: github.com/tfzhou/MG-H…
2D/3D Human Pose Estimation
2D Human Pose Estimation
ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search
- Paper: https://arxiv.org/abs/2105.10154
- Code: None
When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks
- Paper: arxiv.org/abs/2105.06…
- Code: None
Pose Recognition with Cascade Transformers
- Paper: arxiv.org/abs/2104.06…
- Code: github.com/mlpc-ucsd/P…
DCPose: Deep Dual Consecutive Network for Human Pose Estimation
- Paper: arxiv.org/abs/2103.07…
- Code: github.com/Pose-Group/…
3D Human Pose Estimation
End-to-End Human Pose and Mesh Reconstruction with Transformers
- Paper: arxiv.org/abs/2012.09…
- Code: github.com/microsoft/M…
PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation
- Paper(Oral): arxiv.org/abs/2105.02…
- Code: github.com/jfzhang95/P…
Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration
- Paper: arxiv.org/abs/2103.02…
- Code: github.com/SeanChenxy/…
Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks
HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation
- Homepage: jeffli.site/HybrIK/
- Paper: arxiv.org/abs/2011.14…
- Code: github.com/Jeff-sjtu/H…
Animal Pose Estimation
From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation
- Paper: arxiv.org/abs/2103.14…
- Code: None
Hand Pose Estimation
Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time
- Homepage: stevenlsw.github.io/Semi-Hand-O…
- Paper: arxiv.org/abs/2106.05…
- Code: github.com/stevenlsw/S…
Human Volumetric Capture
POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture
- Homepage: www.liuyebin.com/posefusion/…
- Paper(Oral): arxiv.org/abs/2103.15…
- Code: None
Scene Text Detection
Fourier Contour Embedding for Arbitrary-Shaped Text Detection
- Paper: arxiv.org/abs/2104.10…
- Code: None
Scene Text Recognition
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
- Paper: arxiv.org/abs/2103.06…
- Code: github.com/FangShanche…
Image Compression
Checkerboard Context Model for Efficient Learned Image Compression
- Paper: arxiv.org/abs/2103.15…
- Code: None
Slimmable Compressive Autoencoders for Practical Neural Image Compression
- Paper: arxiv.org/abs/2103.15…
- Code: None
Attention-guided Image Compression by Deep Reconstruction of Compressive Sensed Saliency Skeleton
- Paper: arxiv.org/abs/2103.15…
- Code: None
Model Compression / Pruning / Quantization
Teachers Do More Than Teach: Compressing Image-to-Image Models
- Paper: arxiv.org/abs/2103.03…
- Code: github.com/snap-resear…
Model Pruning
Dynamic Slimmable Network
- Paper: arxiv.org/abs/2103.13…
- Code: github.com/changlin31/…
Model Quantization
Network Quantization with Element-wise Gradient Scaling
- Paper: arxiv.org/abs/2104.00…
- Code: None
Zero-shot Adversarial Quantization
- Paper(Oral): arxiv.org/abs/2103.15…
- Code: git.io/Jqc0y
Learnable Companding Quantization for Accurate Low-bit Neural Networks
- Paper: arxiv.org/abs/2103.07…
- Code: None
Knowledge Distillation
Distilling Knowledge via Knowledge Review
- Paper: arxiv.org/abs/2104.09…
- Code: github.com/Jia-Researc…
Distilling Object Detectors via Decoupled Features
- Paper: arxiv.org/abs/2103.14…
- Code: github.com/ggjy/DeFeat…
Super-Resolution
Image Super-Resolution with Non-Local Sparse Attention
Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline
- Homepage: mepro.bjtu.edu.cn/resource.ht…
- Paper: arxiv.org/abs/2104.06…
- Code: None
ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic
- Paper: arxiv.org/abs/2103.04…
- Code: github.com/Xiangtaokon…
AdderSR: Towards Energy Efficient Image Super-Resolution
- Paper: arxiv.org/abs/2009.08…
- Code: None
Dehazing
Contrastive Learning for Compact Single Image Dehazing
- Paper: arxiv.org/abs/2104.09…
- Code: github.com/GlassyWu/AE…
Video Super-Resolution
Temporal Modulation Network for Controllable Space-Time Video Super-Resolution
- Paper: None
- Code: github.com/CS-GangXu/T…
Image Restoration
Multi-Stage Progressive Image Restoration
- Paper: arxiv.org/abs/2102.02…
- Code: github.com/swz30/MPRNe…
Image Inpainting
PD-GAN: Probabilistic Diverse GAN for Image Inpainting
- Paper: arxiv.org/abs/2105.02…
- Code: github.com/KumapowerLI…
TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations
- Homepage: yzhouas.github.io/projects/Tr…
- Paper: arxiv.org/abs/2103.15…
- Code: None
Image Editing
StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
- Paper: arxiv.org/abs/2104.14…
- Code: github.com/naver-ai/St…
- Demo Video: youtu.be/qCapNyRA_Ng
High-Fidelity and Arbitrary Face Editing
- Paper: arxiv.org/abs/2103.15…
- Code: None
Anycost GANs for Interactive Image Synthesis and Editing
- Paper: arxiv.org/abs/2103.03…
- Code: github.com/mit-han-lab…
PISE: Person Image Synthesis and Editing with Decoupled GAN
- Paper: arxiv.org/abs/2103.04…
- Code: github.com/Zhangjinso/…
DeFLOCNet: Deep Image Editing via Flexible Low-level Controls
- Paper: raywzy.com/
- Code: raywzy.com/
Image Captioning
Towards Accurate Text-based Image Captioning with Content Diversity Exploration
- Paper: arxiv.org/abs/2105.03…
- Code: None
Font Generation
DG-Font: Deformable Generative Networks for Unsupervised Font Generation
- Paper: arxiv.org/abs/2104.03…
- Code: github.com/ecnuycxie/D…
Image Matching
LoFTR: Detector-Free Local Feature Matching with Transformers
- Homepage: zju3dv.github.io/loftr/
- Paper: arxiv.org/abs/2104.00…
- Code: github.com/zju3dv/LoFT…
Convolutional Hough Matching Networks
- Homepage: cvlab.postech.ac.kr/research/CH…
- Paper(Oral): arxiv.org/abs/2103.16…
- Code: None
Image Blending
Bridging the Visual Gap: Wide-Range Image Blending
- Paper: arxiv.org/abs/2103.15…
- Code: github.com/julia0607/W…
Reflection Removal
Robust Reflection Removal with Reflection-free Flash-only Cues
- Paper: arxiv.org/abs/2103.04…
- Code: github.com/ChenyangLEI…
3D Point Cloud Classification
Equivariant Point Network for 3D Point Cloud Analysis
- Paper: arxiv.org/abs/2103.14…
- Code: None
PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds
- Paper: arxiv.org/abs/2103.14…
- Code: github.com/CVMI-Lab/PA…
3D Object Detection
3D-MAN: 3D Multi-frame Attention Network for Object Detection
- Paper: arxiv.org/abs/2103.16…
- Code: None
Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds
- Paper: arxiv.org/abs/2104.06…
- Code: github.com/cheng052/BR…
HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection
- Homepage: cvlab.yonsei.ac.kr/projects/HV…
- Paper: arxiv.org/abs/2104.00…
- Code: github.com/cvlab-yonse…
LiDAR R-CNN: An Efficient and Universal 3D Object Detector
- Paper: arxiv.org/abs/2103.15…
- Code: github.com/tusimple/Li…
M3DSSD: Monocular 3D Single Stage Object Detector
- Paper: arxiv.org/abs/2103.13…
- Code: github.com/mumianyuxin…
SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud
- Paper: None
- Code: github.com/Vegeta2020/…
Center-based 3D Object Detection and Tracking
- Paper: arxiv.org/abs/2006.11…
- Code: github.com/tianweiy/Ce…
Categorical Depth Distribution Network for Monocular 3D Object Detection
- Paper: arxiv.org/abs/2103.01…
- Code: None
3D Semantic Segmentation
Bidirectional Projection Network for Cross Dimension Scene Understanding
- Paper(Oral): arxiv.org/abs/2103.14…
- Code: github.com/wbhu/BPNet
Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion
- Paper: arxiv.org/abs/2103.07…
- Code: github.com/ShiQiu0419/…
Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation
- Paper: arxiv.org/abs/2011.10…
- Code: github.com/xinge008/Cy…
Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
- Homepage: github.com/QingyongHu/…
- Paper: arxiv.org/abs/2009.03…
- Code: github.com/QingyongHu/…
- Dataset: github.com/QingyongHu/…
3D Panoptic Segmentation
Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation
- Paper: arxiv.org/abs/2103.14…
- Code: github.com/edwardzhou1…
3D Object Tracking
Center-based 3D Object Detection and Tracking
- Paper: arxiv.org/abs/2006.11…
- Code: github.com/tianweiy/Ce…
3D Point Cloud Registration
ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning
- Paper: arxiv.org/abs/2103.15…
- Code: None
PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency
- Paper: arxiv.org/abs/2103.05…
- Code: github.com/XuyangBai/P…
PREDATOR: Registration of 3D Point Clouds with Low Overlap
- Paper: arxiv.org/abs/2011.13…
- Code: github.com/ShengyuH/Ov…
3D Point Cloud Completion
Unsupervised 3D Shape Completion through GAN Inversion
- Homepage: junzhezhang.github.io/projects/Sh…
- Paper: arxiv.org/abs/2104.13…
- Code: github.com/junzhezhang…
Variational Relational Point Completion Network
- Homepage: paul007pl.github.io/projects/VR…
- Paper: arxiv.org/abs/2104.10…
- Code: github.com/paul007pl/V…
Style-based Point Generator with Adversarial Rendering for Point Cloud Completion
- Homepage: alphapav.github.io/SpareNet/
- Paper: arxiv.org/abs/2103.02…
- Code: github.com/microsoft/S…
3D Reconstruction
Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection
- Paper: arxiv.org/abs/2106.07…
- Code: github.com/TencentYout…
Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction
- Paper: arxiv.org/abs/2104.00…
- Code: None
NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video
- Homepage: zju3dv.github.io/neuralrecon…
- Paper(Oral): arxiv.org/abs/2104.00…
- Code: github.com/zju3dv/Neur…
6D Pose Estimation
FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism
- Paper(Oral): arxiv.org/abs/2103.07…
- Code: github.com/DC1991/FS-N…
GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation
- Paper: arxiv.org/abs/2102.12…
- Code: git.io/GDR-Net
FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation
- Paper: arxiv.org/abs/2103.02…
- Code: github.com/ethnhe/FFB6…
Camera Pose Estimation
Back to the Feature: Learning Robust Camera Localization from Pixels to Pose
- Paper: arxiv.org/abs/2103.09…
- Code: github.com/cvg/pixloc
Depth Estimation
S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation
- Paper(Oral): arxiv.org/abs/2104.00…
- Code: None
Beyond Image to Depth: Improving Depth Prediction using Echoes
- Homepage: krantiparida.github.io/projects/bi…
- Paper: arxiv.org/abs/2103.08…
- Code: github.com/krantiparid…
S3: Learnable Sparse Signal Superdensity for Guided Depth Estimation
- Paper: arxiv.org/abs/2103.02…
- Code: None
Depth from Camera Motion and Object Detection
- Paper: arxiv.org/abs/2103.01…
- Code: github.com/griffbr/ODM…
- Dataset: github.com/griffbr/ODM…
Stereo Matching
A Decomposition Model for Stereo Matching
- Paper: arxiv.org/abs/2104.07…
- Code: None
Flow Estimation
Self-Supervised Multi-Frame Monocular Scene Flow
- Paper: arxiv.org/abs/2105.02…
- Code: github.com/visinf/mult…
RAFT-3D: Scene Flow using Rigid-Motion Embeddings
- Paper: arxiv.org/abs/2012.00…
- Code: None
Learning Optical Flow From Still Images
- Homepage: mattpoggi.github.io/projects/cv…
- Paper: mattpoggi.github.io/assets/pape…
- Code: github.com/mattpoggi/d…
FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds
- Paper: arxiv.org/abs/2104.00…
- Code: None
Lane Detection
Focus on Local: Detecting Lane Marker from Bottom Up via Key Point
- Paper: arxiv.org/abs/2105.13…
- Code: None
Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection
- Paper: arxiv.org/abs/2010.12…
- Code: github.com/lucastabeli…
Trajectory Prediction
Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction
- Paper(Oral): arxiv.org/abs/2104.08…
- Code: None
Crowd Counting
Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark
- Paper: arxiv.org/abs/2105.02…
- Code: github.com/VisDrone/Dr…
- Dataset: github.com/VisDrone/Dr…
Adversarial Examples
Enhancing the Transferability of Adversarial Attacks through Variance Tuning
- Paper: arxiv.org/abs/2103.15…
- Code: github.com/JHL-HUST/VT
LiBRe: A Practical Bayesian Approach to Adversarial Detection
- Paper: arxiv.org/abs/2103.14…
- Code: None
Natural Adversarial Examples
- Paper: arxiv.org/abs/1907.07…
- Code: github.com/hendrycks/n…
Image Retrieval
StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval
- Paper: arxiv.org/abs/2103.15…
- Code: None
QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval
- Paper: arxiv.org/abs/2103.02…
- Code: None
Video Retrieval
On Semantic Similarity in Video Retrieval
- Paper: arxiv.org/abs/2103.10…
- Homepage: mwray.github.io/SSVR/
- Code: github.com/mwray/Seman…
Cross-modal Retrieval
Cross-Modal Center Loss for 3D Cross-Modal Retrieval
- Paper: arxiv.org/abs/2008.03…
- Code: github.com/LongLong-Ji…
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
- Paper: arxiv.org/abs/2103.16…
- Code: None
Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning
- Paper: www.amazon.science/publication…
- Code: github.com/amzn/image-…
Zero-Shot Learning
Counterfactual Zero-Shot and Open-Set Visual Recognition
- Paper: arxiv.org/abs/2103.00…
- Code: github.com/yue-zhongqi…
Federated Learning
FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
- Paper: arxiv.org/abs/2103.06…
- Code: github.com/liuquande/F…
Video Frame Interpolation
CDFI: Compression-Driven Network Design for Frame Interpolation
- Paper: None
- Code: github.com/tding1/CDFI
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation
- Homepage: tarun005.github.io/FLAVR/
- Paper: arxiv.org/abs/2012.08…
- Code: github.com/tarun005/FL…
Visual Reasoning
Transformation Driven Visual Reasoning
- Homepage: hongxin2019.github.io/TVR/
- Paper: arxiv.org/abs/2011.13…
- Code: github.com/hughplay/TV…
Image Synthesis
GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields
- Homepage: m-niemeyer.github.io/project-pag…
- Paper(Oral): arxiv.org/abs/2011.12…
- Code: github.com/autonomousv…
- Demo: www.youtube.com/watch?v=fIa…
Taming Transformers for High-Resolution Image Synthesis
- Homepage: compvis.github.io/taming-tran…
- Paper(Oral): arxiv.org/abs/2012.09…
- Code: github.com/CompVis/tam…
View Synthesis
Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes
- Homepage: virtualhumans.mpi-inf.mpg.de/srf/
- Paper: arxiv.org/abs/2104.06…
Self-Supervised Visibility Learning for Novel View Synthesis
- Paper: arxiv.org/abs/2103.15…
- Code: None
NeX: Real-time View Synthesis with Neural Basis Expansion
- Homepage: nex-mpi.github.io/
- Paper(Oral): arxiv.org/abs/2103.05…
Style Transfer
Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer
- Paper: arxiv.org/abs/2104.05…
- Code: github.com/PaddlePaddl…
Layout Generation
LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity
- Paper: None
- Code: None
Variational Transformer Networks for Layout Generation
- Paper: arxiv.org/abs/2104.02…
- Code: None
Domain Generalization
Generalizable Person Re-identification with Relevance-aware Mixture of Experts
- Paper: arxiv.org/abs/2105.09…
- Code: None
RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening
- Paper: arxiv.org/abs/2103.15…
- Code: github.com/shachoi/Rob…
Adaptive Methods for Real-World Domain Generalization
- Paper: arxiv.org/abs/2103.15…
- Code: None
FSDR: Frequency Space Domain Randomization for Domain Generalization
- Paper: arxiv.org/abs/2103.02…
- Code: None
Domain Adaptation
Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation
- Paper: arxiv.org/abs/2104.00…
- Code: None
Domain Consensus Clustering for Universal Domain Adaptation
- Paper: reler.net/papers/guan…
- Code: github.com/Solacex/Dom…
Open-Set
Towards Open World Object Detection
- Paper(Oral): arxiv.org/abs/2103.02…
- Code: github.com/JosephKJ/OW…
Exemplar-Based Open-Set Panoptic Segmentation Network
- Homepage: cv.snu.ac.kr/research/EO…
- Paper: arxiv.org/abs/2105.08…
- Code: github.com/jd730/EOPSN
Learning Placeholders for Open-Set Recognition
- Paper(Oral): arxiv.org/abs/2103.15…
- Code: None
Adversarial Attack
IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking
- Paper: arxiv.org/abs/2103.14…
- Code: github.com/VISION-SJTU…
Human-Object Interaction (HOI) Detection
HOTR: End-to-End Human-Object Interaction Detection with Transformers
- Paper: arxiv.org/abs/2104.13…
- Code: None
Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information
- Paper: arxiv.org/abs/2103.05…
- Code: github.com/hitachi-rd-…
Reformulating HOI Detection as Adaptive Set Prediction
- Paper: arxiv.org/abs/2103.05…
- Code: github.com/yoyomimi/AS…
Detecting Human-Object Interaction via Fabricated Compositional Learning
- Paper: arxiv.org/abs/2103.08…
- Code: github.com/zhihou7/FCL
End-to-End Human Object Interaction Detection with HOI Transformer
- Paper: arxiv.org/abs/2103.04…
- Code: github.com/bbepoch/Hoi…
Shadow Removal
Auto-Exposure Fusion for Single-Image Shadow Removal
- Paper: arxiv.org/abs/2103.01…
- Code: github.com/tsingqguo/e…
Virtual Try-On
Parser-Free Virtual Try-on via Distilling Appearance Flows
- Paper: arxiv.org/abs/2103.04…
- Code: github.com/geyuying/PF…
Label Noise
A Second-Order Approach to Learning with Instance-Dependent Label Noise
- Paper(Oral): arxiv.org/abs/2012.11…
- Code: github.com/UCSC-REAL/C…
Video Stabilization
Real-Time Selfie Video Stabilization
Datasets
Tracking Pedestrian Heads in Dense Crowd
- Homepage: project.inria.fr/crowdscienc…
- Paper: openaccess.thecvf.com/content/CVP…
- Code1: github.com/Sentient07/…
- Code2: github.com/Sentient07/…
- Dataset: project.inria.fr/crowdscienc…
Part-aware Panoptic Segmentation
- Paper: arxiv.org/abs/2106.06…
- Code: github.com/tue-mps/pan…
- Dataset: github.com/tue-mps/pan…
Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos
- Homepage: www.yasamin.page/hdnet_tikto…
- Paper(Oral): arxiv.org/abs/2103.03…
- Code: github.com/yasaminjafa…
- Dataset: www.yasamin.page/hdnet_tikto…
High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network
- Paper: arxiv.org/abs/2105.09…
- Code: github.com/csjliang/LP…
- Dataset: github.com/csjliang/LP…
Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark
- Paper: arxiv.org/abs/2105.02…
- Code: github.com/VisDrone/Dr…
- Dataset: github.com/VisDrone/Dr…
Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets
- Homepage: fidler-lab.github.io/efficient-a…
- Paper(Oral): arxiv.org/abs/2104.12…
- Code: github.com/fidler-lab/…
ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation
- Paper: arxiv.org/abs/2012.05…
- Code: github.com/joe-siyuan-…
- Dataset: github.com/joe-siyuan-…
Learning To Count Everything
- Paper: arxiv.org/abs/2104.08…
- Code: github.com/cvlab-stony…
- Dataset: github.com/cvlab-stony…
Semantic Image Matting
- Paper: arxiv.org/abs/2104.08…
- Code: github.com/nowsyn/SIM
- Dataset: github.com/nowsyn/SIM
Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline
- Homepage: mepro.bjtu.edu.cn/resource.ht…
- Paper: arxiv.org/abs/2104.06…
- Code: None
Visual Semantic Role Labeling for Video Understanding
- Homepage: vidsitu.org/
- Paper: arxiv.org/abs/2104.00…
- Code: github.com/TheShadow29…
- Dataset: github.com/TheShadow29…
VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild
- Homepage: www.vspwdataset.com/
- Paper: www.vspwdataset.com/CVPR2021__m…
- Code: github.com/sssdddwww2/…
Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark
- Homepage: vap.aau.dk/sewer-ml/
- Paper: arxiv.org/abs/2103.10…
Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food
- Paper: arxiv.org/abs/2103.03…
- Dataset: None
Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
- Homepage: github.com/QingyongHu/…
- Paper: arxiv.org/abs/2009.03…
- Code: github.com/QingyongHu/…
- Dataset: github.com/QingyongHu/…
When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework
- Paper(Oral): arxiv.org/abs/2103.01…
- Code: github.com/Hzzone/MTLF…
- Dataset: github.com/Hzzone/MTLF…
Depth from Camera Motion and Object Detection
- Paper: arxiv.org/abs/2103.01…
- Code: github.com/griffbr/ODM…
- Dataset: github.com/griffbr/ODM…
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
- Homepage: rl.uni-freiburg.de/research/mu…
- Paper: arxiv.org/abs/2103.01…
- Code: rl.uni-freiburg.de/research/mu…
- Dataset: rl.uni-freiburg.de/research/mu…
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
- Paper: arxiv.org/abs/2012.02…
- Code: github.com/daveredrum/…
- Dataset: github.com/daveredrum/…
Others
Fast and Accurate Model Scaling
Omnimatte: Associating Objects and Their Effects in Video
- Homepage: omnimatte.github.io/
- Paper(Oral): arxiv.org/abs/2105.06…
- Code: omnimatte.github.io/#code
Motion Representations for Articulated Animation
- Paper: arxiv.org/abs/2104.11…
- Code: github.com/snap-resear…
Deep Lucas-Kanade Homography for Multimodal Image Alignment
- Paper: arxiv.org/abs/2104.11…
- Code: github.com/placeforyim…
Skip-Convolutions for Efficient Video Processing
- Paper: arxiv.org/abs/2104.11…
- Code: None
KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control
- Homepage: tomasjakab.github.io/KeypointDef…
- Paper(Oral): arxiv.org/abs/2104.11…
- Code: github.com/tomasjakab/…
SOLD2: Self-supervised Occlusion-aware Line Description and Detection
- Paper(Oral): arxiv.org/abs/2104.03…
- Code: github.com/cvg/SOLD2
Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression
- Homepage: li-wanhua.github.io/POEs/
- Paper: arxiv.org/abs/2103.13…
- Code: github.com/Li-Wanhua/P…
LEAP: Learning Articulated Occupancy of People
- Paper: arxiv.org/abs/2104.06…
- Code: None
UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles
- Paper: arxiv.org/abs/2104.00…
- Code: github.com/SUTDCV/UAV-…
Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning
- Paper(Oral): arxiv.org/abs/2104.00…
- Code: None
Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction
- Paper: arxiv.org/abs/2104.00…
- Code: None
Towards High Fidelity Face Relighting with Realistic Shadows
- Paper: arxiv.org/abs/2104.00…
- Code: None
BRepNet: A topological message passing system for solid models
- Paper(Oral): arxiv.org/abs/2104.00…
- Code: None
Visually Informed Binaural Audio Generation without Binaural Audios
- Homepage: sheldontsui.github.io/projects/Ps…
- Paper: None
- Code: github.com/SheldonTsui…
- Demo: www.youtube.com/watch?v=r-u…
Exploring intermediate representation for monocular vehicle pose estimation
- Paper: None
- Code: github.com/Nicholasli1…
Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGB
- Paper(Oral): arxiv.org/abs/2103.14…
- Code: None
Invertible Image Signal Processing
- Paper: arxiv.org/abs/2103.15…
- Code: github.com/yzxing87/In…
Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling
- Paper: arxiv.org/abs/2103.14…
- Code: None
SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences
- Paper: arxiv.org/abs/2103.14…
- Code: None
Embedding Transfer with Label Relaxation for Improved Metric Learning
- Paper: arxiv.org/abs/2103.14…
- Code: None
Picasso: A CUDA-based Library for Deep Learning over 3D Meshes
- Paper: arxiv.org/abs/2103.15…
- Code: github.com/hlei-ziyan/…
Meta-Mining Discriminative Samples for Kinship Verification
- Paper: arxiv.org/abs/2103.15…
- Code: None
Cloud2Curve: Generation and Vectorization of Parametric Sketches
- Paper: arxiv.org/abs/2103.15…
- Code: None
TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
- Paper: arxiv.org/abs/2103.15…
- Code: github.com/SUTDCV/SUTD…
Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution
- Homepage: wellyzhang.github.io/project/pra…
- Paper: arxiv.org/abs/2103.14…
- Code: None
ACRE: Abstract Causal REasoning Beyond Covariation
- Homepage: wellyzhang.github.io/project/acr…
- Paper: arxiv.org/abs/2103.14…
- Code: None
Confluent Vessel Trees with Accurate Bifurcations
- Paper: arxiv.org/abs/2103.14…
- Code: None
Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling
- Paper: arxiv.org/abs/2103.14…
- Code: github.com/HuangZhiCha…
Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks
- Homepage: paschalidoud.github.io/neural_part…
- Paper: None
- Code: github.com/paschalidou…
Knowledge Evolution in Neural Networks
- Paper(Oral): arxiv.org/abs/2103.05…
- Code: github.com/ahmdtaha/kn…
Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning
- Paper: arxiv.org/abs/2103.02…
- Code: github.com/guopengf/FL…
SGP: Self-supervised Geometric Perception
- Paper(Oral): arxiv.org/abs/2103.03…
- Code: github.com/theNded/SGP
Diffusion Probabilistic Models for 3D Point Cloud Generation
- Paper: arxiv.org/abs/2103.01…
- Code: github.com/luost26/dif…
TODO
Not Sure (acceptance unconfirmed)
CT Film Recovery via Disentangling Geometric Deformation and Photometric Degradation: Simulated Datasets and Deep Models
- Paper: None
- Code: github.com/transcenden…
Toward Explainable Reflection Removal with Distilling and Model Uncertainty
- Paper: None
- Code: github.com/ytpeng-aiml…
DeepOIS: Gyroscope-Guided Deep Optical Image Stabilizer Compensation
- Paper: None
- Code: github.com/lhaippp/Dee…
Exploring Adversarial Fake Images on Face Manifold
- Paper: None
- Code: github.com/ldz666666/S…
Uncertainty-Aware Semi-Supervised Crowd Counting via Consistency-Regularized Surrogate Task
- Paper: None
- Code: github.com/yandamengda…
Temporal Contrastive Graph for Self-supervised Video Representation Learning
- Paper: None
- Code: github.com/YangLiu9208…
Boosting Monocular Depth Estimation Models to High-Resolution via Context-Aware Patching
- Paper: None
- Code: github.com/ouranonymou…
Fast and Memory-Efficient Compact Bilinear Pooling
- Paper: None
- Code: github.com/cvpr2021kp2…
Identification of Empty Shelves in Supermarkets using Domain-inspired Features with Structural Support Vector Machine
- Paper: None
- Code: github.com/gapDetectio…
Estimating A Child’s Growth Potential From Cephalometric X-Ray Image via Morphology-Aware Interactive Keypoint Estimation
- Paper: None
- Code: github.com/interactive…