CVPR 2021 Papers and Open-Source Projects Collection (Papers with Code)

Address: github.com/amusi/CVPR2…

Best Paper
Backbone
NAS
GAN
VAE
Visual Transformer
Regularization
SLAM
Long-Tailed Distribution
Data Augmentation
Unsupervised/Self-Supervised
Semi-Supervised
Capsule Network
Image Classification
2D Object Detection
Single/Multi-Object Tracking
Semantic Segmentation
Instance Segmentation
Panoptic Segmentation
Medical Image Segmentation
Video Object Segmentation
Interactive Video Object Segmentation
Saliency Detection
Camouflaged Object Detection
Co-Salient Object Detection
Image Matting
Person Re-identification
Person Search
Video Understanding/Action Recognition
Face Recognition
Face Detection
Face Anti-Spoofing
Deepfake Detection
Face Age Estimation
Facial Expression Recognition
Deepfakes
Human Parsing
2D/3D Human Pose Estimation
Animal Pose Estimation
Hand Pose Estimation
Human Volumetric Capture
Scene Text Recognition
Image Compression
Model Compression/Pruning/Quantization
Knowledge Distillation
Super-Resolution
Dehazing
Image Restoration
Image Inpainting
Image Editing
Image Captioning
Font Generation
Image Matching
Image Blending
Reflection Removal
3D Point Cloud Classification
3D Object Detection
3D Semantic Segmentation
3D Panoptic Segmentation
3D Object Tracking
3D Point Cloud Registration
3D Point Cloud Completion
3D Reconstruction
6D Pose Estimation
Camera Pose Estimation
Depth Estimation
Stereo Matching
Optical Flow Estimation
Lane Detection
Trajectory Prediction
Crowd Counting
Adversarial Examples
Image Retrieval
Video Retrieval
Cross-Modal Retrieval
Zero-Shot Learning
Federated Learning
Video Frame Interpolation
Visual Reasoning
Image Synthesis
View Synthesis
Style Transfer
Layout Generation
Domain Generalization
Domain Adaptation
Open-Set
Adversarial Attack
Human-Object Interaction (HOI) Detection
Shadow Removal
Virtual Try-On
Label Noise
Video Stabilization
Datasets
Others
To Be Added (TODO)
Not Sure Whether Accepted (Not Sure)

Best Paper

GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields

Homepage: m-niemeyer.github.io/project-pag…
Paper(Oral): arxiv.org/abs/2011.12…
Code: github.com/autonomousv…
Demo: www.youtube.com/watch?v=fIa…

Backbone

HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers

Paper(Oral): arxiv.org/abs/2106.06…
Code: github.com/dingmyu/HR-…

BCNet: Searching for Network Width with Bilaterally Coupled Network

Paper: arxiv.org/abs/2105.10…
Code: None

Decoupled Dynamic Filter Networks

Homepage: thefoxofsky.github.io/project_pag…
Paper: arxiv.org/abs/2104.14…
Code: github.com/thefoxofsky…

Lite-HRNet: A Lightweight High-Resolution Network

Paper: arxiv.org/abs/2104.06…
Code: github.com/HRNet/Lite-…

CondenseNet V2: Sparse Feature Reactivation for Deep Networks

Paper: arxiv.org/abs/2104.04…
Code: github.com/jianghaojun…

Diverse Branch Block: Building a Convolution as an Inception-like Unit

Paper: arxiv.org/abs/2103.13…
Code: github.com/DingXiaoH/D…

Scaling Local Self-Attention For Parameter Efficient Visual Backbones

Paper(Oral): arxiv.org/abs/2103.12…
Code: None

ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network

Paper: arxiv.org/abs/2007.00…
Code: github.com/clovaai/rex…

Involution: Inverting the Inherence of Convolution for Visual Recognition

Paper: arxiv.org/abs/2103.06…
Code: github.com/d-li14/invo…

Coordinate Attention for Efficient Mobile Network Design

Paper: arxiv.org/abs/2103.02…
Code: github.com/Andrew-Qibi…

Inception Convolution with Efficient Dilation Search

Paper: arxiv.org/abs/2012.13…
Code: github.com/yifan123/IC…

RepVGG: Making VGG-style ConvNets Great Again

Paper: arxiv.org/abs/2101.03…
Code: github.com/DingXiaoH/R…

NAS

HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers

Paper(Oral): arxiv.org/abs/2106.06…
Code: github.com/dingmyu/HR-…

BCNet: Searching for Network Width with Bilaterally Coupled Network

Paper: arxiv.org/abs/2105.10…
Code: None

ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search

Paper: https://arxiv.org/abs/2105.10154
Code: None

Combined Depth Space based Architecture Search For Person Re-identification

Paper: arxiv.org/abs/2104.04…
Code: None

DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation

Paper(Oral): arxiv.org/abs/2103.15…
Code: None

Neural Architecture Search with Random Labels

Paper: arxiv.org/abs/2101.11…
Code: None

Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search

Paper: arxiv.org/abs/2101.11…
Code: None

Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation

Paper: arxiv.org/abs/2105.12…
Code: None

Prioritized Architecture Sampling with Monto-Carlo Tree Search

Paper: arxiv.org/abs/2103.11…
Code: github.com/xiusu/NAS-B…

Contrastive Neural Architecture Search with Neural Architecture Comparators

Paper: arxiv.org/abs/2103.05…
Code: github.com/chenyaofo/C…

AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling

Paper: arxiv.org/abs/2011.09…
Code: None

ReNAS: Relativistic Evaluation of Neural Architecture Search

Paper: arxiv.org/abs/1910.01…
Code: None

HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens

Paper: arxiv.org/abs/2005.14…
Code: None

Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator

Paper: arxiv.org/abs/2103.07…
Code: github.com/eric8607242…

OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection

Paper: arxiv.org/abs/2103.04…
Code: github.com/VDIGPKU/OPA…

Inception Convolution with Efficient Dilation Search

Paper: arxiv.org/abs/2012.13…
Code: github.com/yifan123/IC…

GAN

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network

DG-Font: Deformable Generative Networks for Unsupervised Font Generation

Paper: arxiv.org/abs/2104.03…
Code: github.com/ecnuycxie/D…

PD-GAN: Probabilistic Diverse GAN for Image Inpainting

Paper: arxiv.org/abs/2105.02…
Code: github.com/KumapowerLI…

StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer

Paper: arxiv.org/abs/2104.05…
Code: github.com/PaddlePaddl…

Regularizing Generative Adversarial Networks under Limited Data

Homepage: hytseng0509.github.io/lecam-gan/
Paper: faculty.ucmerced.edu/mhyang/pape…
Code: github.com/google/leca…

Towards Real-World Blind Face Restoration with Generative Facial Prior

Paper: arxiv.org/abs/2101.04…
Code: None

TediGAN: Text-Guided Diverse Image Generation and Manipulation

Generative Hierarchical Features from Synthesizing Image

Teachers Do More Than Teach: Compressing Image-to-Image Models

Paper: arxiv.org/abs/2103.03…
Code: github.com/snap-resear…

HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms

Paper: arxiv.org/abs/2011.11…
Code: github.com/mahmoudnafi…

pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis

Homepage: marcoamonteiro.github.io/pi-GAN-webs…
Paper(Oral): arxiv.org/abs/2012.00…
Code: None

DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network

Paper: arxiv.org/abs/2103.07…
Code: None

Diverse Semantic Image Synthesis via Probability Distribution Modeling

Paper: arxiv.org/abs/2103.06…
Code: github.com/tzt101/INAD…

LOHO: Latent Optimization of Hairstyles via Orthogonalization

Paper: arxiv.org/abs/2103.03…
Code: None

PISE: Person Image Synthesis and Editing with Decoupled GAN

Paper: arxiv.org/abs/2103.04…
Code: github.com/Zhangjinso/…

DeFLOCNet: Deep Image Editing via Flexible Low-level Controls

Paper: raywzy.com/
Code: raywzy.com/

PD-GAN: Probabilistic Diverse GAN for Image Inpainting

Paper: raywzy.com/
Code: raywzy.com/

Efficient Conditional GAN Transfer with Knowledge Propagation across Classes

Paper: www.researchgate.net/publication…
Code: github.com/mshahbazi72…

Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

Paper: None
Code: None

Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs

Paper: arxiv.org/abs/2011.14…
Code: None

Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation

Homepage: eladrich.github.io/pixel2style…
Paper: arxiv.org/abs/2008.00…
Code: github.com/eladrich/pi…

A 3D GAN for Improved Large-pose Facial Recognition

Paper: arxiv.org/abs/2012.10…
Code: None

HumanGAN: A Generative Model of Humans Images

Paper: arxiv.org/abs/2103.06…
Code: None

ID-Unet: Iterative Soft and Hard Deformation for View Synthesis

Paper: arxiv.org/abs/2103.02…
Code: github.com/MingyuY/Ite…

CoMoGAN: continuous model-guided image-to-image translation

Paper(Oral): arxiv.org/abs/2103.06…
Code: github.com/cv-rits/CoM…

Training Generative Adversarial Networks in One Stage

Paper: arxiv.org/abs/2103.00…
Code: None

Closed-Form Factorization of Latent Semantics in GANs

Anycost GANs for Interactive Image Synthesis and Editing

Paper: arxiv.org/abs/2103.03…
Code: github.com/mit-han-lab…

Image-to-image Translation via Hierarchical Style Disentanglement

Paper: arxiv.org/abs/2103.01…
Code: github.com/imlixinyang…

VAE

Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders

Homepage: taldatech.github.io/soft-intro-…
Paper: arxiv.org/abs/2012.13…
Code: github.com/taldatech/s…

Visual Transformer

1. End-to-End Human Pose and Mesh Reconstruction with Transformers

Paper: arxiv.org/abs/2012.09…
Code: github.com/microsoft/M…

2. Temporal-Relational CrossTransformers for Few-Shot Action Recognition

Paper: arxiv.org/abs/2101.06…
Code: github.com/tobyperrett…

3. Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

Paper: arxiv.org/abs/2103.16…
Code: github.com/mczhuge/Kal…

4. HOTR: End-to-End Human-Object Interaction Detection with Transformers

Paper: arxiv.org/abs/2104.13…
Code: github.com/kakaobrain/…

5. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

Paper: arxiv.org/abs/2104.09…
Code: github.com/autonomousv…

6. Pose Recognition with Cascade Transformers

Paper: arxiv.org/abs/2104.06…
Code: github.com/mlpc-ucsd/P…

7. Variational Transformer Networks for Layout Generation

Paper: arxiv.org/abs/2104.02…
Code: None

8. LoFTR: Detector-Free Local Feature Matching with Transformers

9. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Paper: arxiv.org/abs/2012.15…
Code: github.com/fudan-zvg/S…

10. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers

Paper: arxiv.org/abs/2103.16…
Code: None

11. Transformer Tracking

Paper: arxiv.org/abs/2103.15…
Code: github.com/chenxin-dlu…

12. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers

Paper(Oral): arxiv.org/abs/2106.06…
Code: github.com/dingmyu/HR-…

13. MIST: Multiple Instance Spatial Transformer

Paper: arxiv.org/abs/1811.10…
Code: None

14. Multimodal Motion Prediction with Stacked Transformers

Paper: arxiv.org/abs/2103.11…
Code: decisionforce.github.io/mmTransform…

15. Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning

Paper: www.amazon.science/publication…
Code: github.com/amzn/image-…

16. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

Paper(Oral): arxiv.org/abs/2103.11…
Code: github.com/594422814/T…

17. Pre-Trained Image Processing Transformer

Paper: arxiv.org/abs/2012.00…
Code: None

18. End-to-End Video Instance Segmentation with Transformers

Paper(Oral): arxiv.org/abs/2011.14…
Code: github.com/Epiphqny/Vi…

19. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

Paper(Oral): arxiv.org/abs/2011.09…
Code: github.com/dddzg/up-de…

20. End-to-End Human Object Interaction Detection with HOI Transformer

Paper: arxiv.org/abs/2103.04…
Code: github.com/bbepoch/Hoi…

21. Transformer Interpretability Beyond Attention Visualization

Paper: arxiv.org/abs/2012.09…
Code: github.com/hila-chefer…

22. Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer

Paper: None
Code: None

23. LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity

Paper: None
Code: None

24. Line Segment Detection Using Transformers without Edges

Paper(Oral): arxiv.org/abs/2101.01…
Code: None

25. MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers

Paper: openaccess.thecvf.com/content/CVP…
Code: None

26. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation

Paper(Oral): arxiv.org/abs/2101.08…
Code: github.com/dukebw/SSTV…

27. Facial Action Unit Detection With Transformers

Paper: None
Code: None

28. Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition

Paper: None
Code: None

29. Lesion-Aware Transformers for Diabetic Retinopathy Grading

Paper: None
Code: None

30. Topological Planning With Transformers for Vision-and-Language Navigation

Paper: arxiv.org/abs/2012.05…
Code: None

31. Adaptive Image Transformer for One-Shot Object Detection

Paper: None
Code: None

32. Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos

Paper: None
Code: None

33. Taming Transformers for High-Resolution Image Synthesis

Homepage: compvis.github.io/taming-tran…
Paper(Oral): arxiv.org/abs/2012.09…
Code: github.com/CompVis/tam…

34. Self-Supervised Video Hashing via Bidirectional Transformers

Paper: None
Code: None

35. Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos

Paper(Oral): hehefan.github.io/pdfs/p4tran…
Code: None

36. Gaussian Context Transformer

Paper: None
Code: None

37. General Multi-Label Image Classification With Transformers

Paper: arxiv.org/abs/2011.14…
Code: None

38. Bottleneck Transformers for Visual Recognition

Paper: arxiv.org/abs/2101.11…
Code: None

39. VLN BERT: A Recurrent Vision-and-Language BERT for Navigation

Paper(Oral): arxiv.org/abs/2011.13…
Code: github.com/YicongHong/…

40. Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

Paper(Oral): arxiv.org/abs/2102.06…
Code: github.com/jayleicn/Cl…

41. Self-attention based Text Knowledge Mining for Text Detection

Paper: None
Code: github.com/CVI-SZU/STK…

42. SSAN: Separable Self-Attention Network for Video Representation Learning

Paper: None
Code: None

43. Scaling Local Self-Attention For Parameter Efficient Visual Backbones

Paper(Oral): arxiv.org/abs/2103.12…
Code: None

Regularization

Regularizing Neural Networks via Adversarial Model Perturbation

Paper: arxiv.org/abs/2010.04…
Code: github.com/hiyouga/AMP…

SLAM

Differentiable SLAM-net: Learning Particle SLAM for Visual Navigation

Paper: arxiv.org/abs/2105.07…
Code: None

Generalizing to the Open World: Deep Visual Odometry with Online Adaptation

Paper: arxiv.org/abs/2103.15…
Code: arxiv.org/abs/2103.15…

Long-Tailed Distribution

Adversarial Robustness under Long-Tailed Distribution

Paper(Oral): arxiv.org/abs/2104.02…
Code: github.com/wutong16/Ad…

Distribution Alignment: A Unified Framework for Long-tail Visual Recognition

Paper: arxiv.org/abs/2103.16…
Code: github.com/Megvii-Base…

Adaptive Class Suppression Loss for Long-Tail Object Detection

Paper: arxiv.org/abs/2104.00…
Code: github.com/CASIA-IVA-L…

Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification

Paper: arxiv.org/abs/2103.14…
Code: None

Data Augmentation

Scale-aware Automatic Augmentation for Object Detection

Paper: arxiv.org/abs/2103.17…
Code: github.com/Jia-Researc…

Unsupervised/Self-Supervised

Domain-Specific Suppression for Adaptive Object Detection

Paper: arxiv.org/abs/2105.03…
Code: None

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

Paper: arxiv.org/abs/2104.14…
Code: github.com/facebookres…

Unsupervised Multi-Source Domain Adaptation for Person Re-Identification

Paper: arxiv.org/abs/2104.12…
Code: None

Self-supervised Video Representation Learning by Context and Motion Decoupling

Paper: arxiv.org/abs/2104.00…
Code: None

Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning

Homepage: fingerrec.github.io/index_files…
Paper: arxiv.org/abs/2009.05…
Code: github.com/FingerRec/B…

Spatially Consistent Representation Learning

Paper: arxiv.org/abs/2103.06…
Code: None

VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples

Paper: arxiv.org/abs/2103.05…
Code: github.com/tinapan-pt/…

Exploring Simple Siamese Representation Learning

Paper(Oral): arxiv.org/abs/2011.10…
Code: None

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Paper(Oral): arxiv.org/abs/2011.09…
Code: github.com/WXinlong/De…

Semi-Supervised Learning

Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

Affiliations: Alibaba
Paper: arxiv.org/abs/2103.11…
Code: None

Adaptive Consistency Regularization for Semi-Supervised Transfer Learning

Paper: arxiv.org/abs/2103.02…
Code: github.com/SHI-Labs/Se…

Capsule Network

Capsule Network is Not More Robust than Convolutional Network

Paper: arxiv.org/abs/2103.15…
Code: None

Image Classification

Correlated Input-Dependent Label Noise in Large-Scale Image Classification

Paper(Oral): arxiv.org/abs/2105.10…
Code: github.com/google/unce…

2D Object Detection

2D Object Detection

1. Scaled-YOLOv4: Scaling Cross Stage Partial Network

Affiliations: Academia Sinica, Intel, Providence University
Paper: arxiv.org/abs/2011.08…
Code: github.com/WongKinYiu/…
Chinese commentary: The official improved version of YOLOv4 is here! 55.8% AP! Speeds of up to 1774 FPS. Scaled-YOLOv4 is officially open source!

2. You Only Look One-level Feature

Affiliations: Chinese Academy of Sciences, University of Chinese Academy of Sciences, Megvii
Paper: arxiv.org/abs/2103.09…
Code: github.com/megvii-mode…
Chinese commentary: CVPR 2021 | No FPN! Chinese Academy of Sciences and Megvii propose YOLOF: You Only Look One-level Feature

3. Sparse R-CNN: End-to-End Object Detection with Learnable Proposals

Affiliations: The University of Hong Kong, Tongji University, ByteDance AI Lab, University of California, Berkeley
Paper: arxiv.org/abs/2011.12…
Code: github.com/PeizeSun/Sp…
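Chinese commentary: A new paradigm for object detection! HKU, Tongji, and Berkeley propose Sparse R-CNN, and the code has just been open-sourced!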

4. End-to-End Object Detection with Fully Convolutional Network

Affiliations: Megvii, Xi'an Jiaotong University
Paper: arxiv.org/abs/2012.03…
Code: github.com/Megvii-Base…

5. Dynamic Head: Unifying Object Detection Heads with Attentions

Affiliations: Microsoft
Paper: arxiv.org/abs/2106.08…
Code: github.com/microsoft/D…
Chinese commentary: 60.6 AP! A new COCO record! Microsoft proposes DyHead: unifying attention with object detection heads

6. Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection

Affiliations: Nanjing University of Science and Technology, Momenta, Nanjing University, Tsinghua University
Paper: arxiv.org/abs/2011.12…
Code: github.com/implus/GFoc…
Chinese commentary: CVPR 2021 | GFLV2: an honest object detection technique that improves accuracy at no extra cost!

7. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

Affiliations: South China University of Technology, Tencent WeChat AI
Paper(Oral): arxiv.org/abs/2011.09…
Code: github.com/dddzg/up-de…
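Chinese commentary: CVPR 2021 Oral | Transformers strike again! South China University of Technology and WeChat propose UP-DETR: an unsupervised pre-trained detector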

8. MobileDets: Searching for Object Detection Architectures for Mobile Accelerators

Affiliations: University of Wisconsin, Google
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/tensorflow/…

9. Tracking Pedestrian Heads in Dense Crowd

Affiliations: University of Rennes 1
Homepage: project.inria.fr/crowdscienc…
Paper: openaccess.thecvf.com/content/CVP…
Code1: github.com/Sentient07/…
Code2: github.com/Sentient07/…
Dataset: project.inria.fr/crowdscienc…

10. Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation

Affiliations: The Hong Kong University of Science and Technology, Huawei Noah's Ark Lab
Paper: arxiv.org/abs/2105.12…
Code: None

11. PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery

Affiliations: A*STAR, Sichuan University, Nanyang Technological University
Paper: arxiv.org/abs/2105.12…
Code: None

12. IQDet: Instance-wise Quality Distribution Sampling for Object Detection

Affiliations: Megvii
Paper: arxiv.org/abs/2104.06…
Code: None

13. Multi-Scale Aligned Distillation for Low-Resolution Detection

Affiliations: The Chinese University of Hong Kong, Adobe Research, SmartMore
Paper: jiaya.me/papers/ms_a…
Code: github.com/Jia-Researc…

14. Adaptive Class Suppression Loss for Long-Tail Object Detection

Affiliations: Chinese Academy of Sciences, University of Chinese Academy of Sciences, ObjectEye, Peking University, Peng Cheng Laboratory, Nexwise
Paper: arxiv.org/abs/2104.00…
Code: github.com/CASIA-IVA-L…

15. VarifocalNet: An IoU-aware Dense Object Detector

Affiliations: Queensland University of Technology, University of Queensland
Paper(Oral): arxiv.org/abs/2008.13…
Code: github.com/hyz-xmaster…

16. OTA: Optimal Transport Assignment for Object Detection

Affiliations: Waseda University, Megvii
Paper: arxiv.org/abs/2103.14…
Code: github.com/Megvii-Base…

17. Distilling Object Detectors via Decoupled Features

Affiliations: Huawei Noah's Ark Lab, University of Sydney
Paper: arxiv.org/abs/2103.14…
Code: github.com/ggjy/DeFeat…

18. Robust and Accurate Object Detection via Adversarial Learning

Affiliations: Google, UCLA, UCSC
Paper: arxiv.org/abs/2103.13…
Code: None

19. OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection

Affiliations: Peking University, Anyvision, Stony Brook University
Paper: arxiv.org/abs/2103.04…
Code: github.com/VDIGPKU/OPA…

20. Multiple Instance Active Learning for Object Detection

Affiliations: University of Chinese Academy of Sciences, Huawei Noah's Ark Lab, Tsinghua University
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/yuantn/MI-A…

21. Towards Open World Object Detection

Affiliations: Indian Institute of Technology, MBZUAI, Australian National University, Linköping University
Paper(Oral): arxiv.org/abs/2103.02…
Code: github.com/JosephKJ/OW…

22. RankDetNet: Delving Into Ranking Constraints for Object Detection

Affiliations: Xilinx
Paper: openaccess.thecvf.com/content/CVP…
Code: None

Rotated Object Detection

23. Dense Label Encoding for Boundary Discontinuity Free Rotation Detection

Affiliations: Shanghai Jiao Tong University, University of Chinese Academy of Sciences
Paper: arxiv.org/abs/2011.09…
Code1: github.com/Thinklab-SJ…
Code2: github.com/yangxue0827…

24. ReDet: A Rotation-equivariant Detector for Aerial Object Detection

Affiliations: Wuhan University
Paper: arxiv.org/abs/2103.07…
Code: github.com/csuhan/ReDe…

25. Beyond Bounding-Box: Convex-Hull Feature Adaptation for Oriented and Densely Packed Object Detection

Affiliations: University of Chinese Academy of Sciences, Tsinghua University
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/SDL-GuoZong…

Few-Shot Object Detection

26. Accurate Few-Shot Object Detection With Support-Query Mutual Guidance and Hybrid Loss

Affiliations: Fudan University, Tongji University, Zhejiang University
Paper: openaccess.thecvf.com/content/CVP…
Code: None

27. Adaptive Image Transformer for One-Shot Object Detection

Affiliations: Academia Sinica, Taiwan AI Labs
Paper: openaccess.thecvf.com/content/CVP…
Code: None

28. Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection

Affiliations: Peking University, Beijing University of Posts and Telecommunications
Paper: arxiv.org/abs/2103.17…
Code: github.com/hzhupku/DCN…

29. Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection

Affiliations: Carnegie Mellon University (CMU)
Paper: arxiv.org/abs/2103.01…
Code: None

30. FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding

Affiliations: University of Southern California, Megvii
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/MegviiDetec…

31. Hallucination Improves Few-Shot Object Detection

Affiliations: University of Illinois Urbana-Champaign
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/pppplin/Hal…

32. Few-Shot Object Detection via Classification Refinement and Distractor Retreatment

Affiliations: National University of Singapore, SIMTech
Paper: openaccess.thecvf.com/content/CVP…
Code: None

33. Generalized Few-Shot Object Detection Without Forgetting

Affiliations: Megvii
Paper: openaccess.thecvf.com/content/CVP…
Code: None

34. Transformation Invariant Few-Shot Object Detection

Affiliations: Huawei Noah's Ark Lab
Paper: openaccess.thecvf.com/content/CVP…
Code: None

35. UniT: Unified Knowledge Transfer for Any-Shot Object Detection and Segmentation

Affiliations: University of British Columbia, Vector AI, CIFAR AI Chair
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/ubc-vision/…

36. Beyond Max-Margin: Class Margin Equilibrium for Few-Shot Object Detection

Affiliations: University of Chinese Academy of Sciences, Xiamen University, Peng Cheng Laboratory
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/Bohao-Lee/C…

Semi-Supervised Object Detection

37. Points As Queries: Weakly Semi-Supervised Object Detection by Points

Affiliations: Megvii, Fudan University
Paper: openaccess.thecvf.com/content/CVP…
Code: None

38. Data-Uncertainty Guided Multi-Phase Learning for Semi-Supervised Object Detection

Affiliations: Tsinghua University
Paper: openaccess.thecvf.com/content/CVP…
Code: None

39. Positive-Unlabeled Data Purification in the Wild for Object Detection

Affiliations: Huawei Noah's Ark Lab, University of Sydney, Peking University
Paper: openaccess.thecvf.com/content/CVP…
Code: None

40. Interactive Self-Training With Mean Teachers for Semi-Supervised Object Detection

Affiliations: Alibaba, The Hong Kong Polytechnic University
Paper: openaccess.thecvf.com/content/CVP…
Code: None

41. Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

Affiliations: Alibaba
Paper: arxiv.org/abs/2103.11…
Code: None

42. Humble Teachers Teach Better Students for Semi-Supervised Object Detection

Affiliations: Carnegie Mellon University (CMU), Amazon
Homepage: yihet.com/humble-teac…
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/lryta/Humbl…

43. Interpolation-Based Semi-Supervised Learning for Object Detection

Affiliations: Seoul National University, Aalto University, and others
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/soo89/ISD-S…

Domain-Adaptive Object Detection

44. Domain-Specific Suppression for Adaptive Object Detection

Affiliations: Chinese Academy of Sciences, Cambricon, University of Chinese Academy of Sciences
Paper: openaccess.thecvf.com/content/CVP…
Code: None

45. MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection

Affiliations: Johns Hopkins University, Mercedes-Benz
Paper: arxiv.org/abs/2103.04…
Code: None

46. Unbiased Mean Teacher for Cross-Domain Object Detection

Affiliations: University of Electronic Science and Technology of China, ETH Zurich
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/kinredon/um…

47. I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors

Affiliations: The University of Hong Kong, Xiamen University, Deepwise AI Lab
Paper: arxiv.org/abs/2103.13…
Code: None

Self-Supervised Object Detection

48. There Is More Than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking With Sound by Distilling Multimodal Knowledge

Affiliations: University of Freiburg
Paper: openaccess.thecvf.com/content/CVP…
Code: rl.uni-freiburg.de/research/mu…

49. Instance Localization for Self-supervised Detection Pretraining

Affiliations: The Chinese University of Hong Kong, Microsoft Research Asia
Paper: arxiv.org/abs/2102.08…
Code: github.com/limbo0000/I…

Weakly Supervised Object Detection

50. Informative and Consistent Correspondence Mining for Cross-Domain Weakly Supervised Object Detection

Affiliations: Beihang University, Peng Cheng Laboratory, SenseTime
Paper: openaccess.thecvf.com/content/CVP…
Code: None

51. DAP: Detection-Aware Pre-training with Weak Supervision

Affiliations: UIUC, Microsoft
Paper: openaccess.thecvf.com/content/CVP…
Code: None

Others

52. Open-Vocabulary Object Detection Using Captions

Affiliations: Snap, Columbia University
Paper(Oral): openaccess.thecvf.com/content/CVP…
Code: github.com/alirezazare…

53. Depth From Camera Motion and Object Detection

Affiliations: University of Michigan, SIAI
Paper: arxiv.org/abs/2103.01…
Code: github.com/griffbr/ODM…
Dataset: github.com/griffbr/ODM…

54. Unsupervised Object Detection With LIDAR Clues

Affiliations: SenseTime, University of Chinese Academy of Sciences, University of Science and Technology of China
Paper: openaccess.thecvf.com/content/CVP…
Code: None

55. GAIA: A Transfer Learning System of Object Detection That Fits Your Needs

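Affiliations: University of Chinese Academy of Sciences, Beijing Institute of Technology, Chinese Academy of Sciences, SenseTime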
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/GAIA-vision…

56. General Instance Distillation for Object Detection

Affiliations: Megvii, Beihang University
Paper: openaccess.thecvf.com/content/CVP…
Code: None

57. AQD: Towards Accurate Quantized Object Detection

Affiliations: Monash University, University of Adelaide, South China University of Technology
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/aim-uofa/mo…

58. Scale-Aware Automatic Augmentation for Object Detection

Affiliations: The Chinese University of Hong Kong, ByteDance AI Lab, SmartMore
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/Jia-Researc…

59. Equalization Loss v2: A New Gradient Balance Approach for Long-Tailed Object Detection

Affiliations: Tongji University, SenseTime, Tsinghua University
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/tztztztztz/…

60. Class-Aware Robust Adversarial Training for Object Detection

Affiliations: Columbia University, Academia Sinica
Paper: openaccess.thecvf.com/content/CVP…
Code: None

61. Improved Handling of Motion Blur in Online Object Detection

Affiliations: University College London
Homepage: visual.cs.ucl.ac.uk/pubs/handli…
Paper: openaccess.thecvf.com/content/CVP…
Code: None

62. Multiple Instance Active Learning for Object Detection

Affiliations: University of Chinese Academy of Sciences, Huawei Noah's Ark Lab
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/yuantn/MI-A…

63. Neural Auto-Exposure for High-Dynamic Range Object Detection

Affiliations: Algolux, Princeton University
Paper: openaccess.thecvf.com/content/CVP…
Code: None

64. Generalizable Pedestrian Detection: The Elephant in the Room

Affiliations: IIAI, Aalto University
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/hasanirtiza…

Single/Multi-Object Tracking

Single-Object Tracking

LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search

Paper: arxiv.org/abs/2104.14…
Code: github.com/researchmm/…

Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark

Homepage: sites.google.com/view/langtr…
Paper: arxiv.org/abs/2103.16…
Evaluation Toolkit: github.com/wangxiao579…
Demo Video: www.youtube.com/watch?v=7lv…

IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking

Paper: arxiv.org/abs/2103.14…
Code: github.com/VISION-SJTU…

Graph Attention Tracking

Paper: arxiv.org/abs/2011.11…
Code: github.com/ohhhyeahhh/…

Rotation Equivariant Siamese Networks for Tracking

Paper: arxiv.org/abs/2012.13…
Code: None

Track to Detect and Segment: An Online Multi-Object Tracker

Homepage: jialianwu.com/projects/Tr…
Paper: None
Code: None

Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

Paper(Oral): arxiv.org/abs/2103.11…
Code: github.com/594422814/T…

Transformer Tracking

Paper: arxiv.org/abs/2103.15…
Code: github.com/chenxin-dlu…

Multi-Object Tracking

Tracking Pedestrian Heads in Dense Crowd

Homepage: project.inria.fr/crowdscienc…
Paper: openaccess.thecvf.com/content/CVP…
Code1: github.com/Sentient07/…
Code2: github.com/Sentient07/…
Dataset: project.inria.fr/crowdscienc…

Multiple Object Tracking with Correlation Learning

Paper: arxiv.org/abs/2104.03…
Code: None

Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking

Paper: arxiv.org/abs/2012.02…
Code: None

Learning a Proposal Classifier for Multiple Object Tracking

Paper: arxiv.org/abs/2103.07…
Code: github.com/daip13/LPC_…

Track to Detect and Segment: An Online Multi-Object Tracker

Semantic Segmentation

1. HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation

Affiliations: Facebook AI, Bar-Ilan University, Tel Aviv University
Homepage: nirkin.com/hyperseg/
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/YuvalNirkin…

2. Rethinking BiSeNet For Real-time Semantic Segmentation

Affiliations: Meituan
Paper: arxiv.org/abs/2104.13…
Code: github.com/MichaelFan0…

3. Progressive Semantic Segmentation

Affiliations: VinAI Research, VinUniversity, University of Arkansas, Stony Brook University
Paper: arxiv.org/abs/2104.03…
Code: github.com/VinAIResear…

4. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Affiliations: Fudan University, University of Oxford, University of Surrey, Tencent Youtu Lab, Facebook AI
Homepage: fudan-zvg.github.io/SETR
Paper: arxiv.org/abs/2012.15…
Code: github.com/fudan-zvg/S…

5. Capturing Omni-Range Context for Omnidirectional Segmentation

Affiliations: Karlsruhe Institute of Technology, Carl Zeiss, Huawei
Paper: arxiv.org/abs/2103.05…
Code: None

6. Learning Statistical Texture for Semantic Segmentation

Affiliations: Beihang University, SenseTime
Paper: arxiv.org/abs/2103.04…
Code: None

7. InverseForm: A Loss Function for Structured Boundary-Aware Segmentation

Affiliations: Qualcomm AI Research
Paper: openaccess.thecvf.com/content/CVP…
Code: None

8. DCNAS: Densely Connected Neural Architecture Search for Semantic Image Segmentation

Affiliations: Joyy Inc, Kuaishou, Beihang University, and others
Paper: openaccess.thecvf.com/content/CVP…
Code: None

Weakly Supervised Semantic Segmentation

9. Railroad Is Not a Train: Saliency As Pseudo-Pixel Supervision for Weakly Supervised Semantic Segmentation

Affiliations: Yonsei University, Sungkyunkwan University
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/halbielee/E…

10. Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation

Affiliations: Yonsei University
Homepage: cvlab.yonsei.ac.kr/projects/BA…
Paper: arxiv.org/abs/2104.00…
Code: None

11. Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation

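Affiliations: Nanjing University of Science and Technology, MBZUAI, University of Electronic Science and Technology of China, University of Adelaide, University of Technology Sydney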
Paper: arxiv.org/abs/2103.14…
Code: github.com/NUST-Machin…

12. Embedded Discriminative Attention Mechanism for Weakly Supervised Semantic Segmentation

Affiliations: Beijing Institute of Technology, Meituan
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/allenwu97/E…

13. BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation

Affiliations: Seoul National University
Paper: arxiv.org/abs/2103.08…
Code: github.com/jbeomlee93/…

Semi-Supervised Semantic Segmentation

14. Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision

Affiliations: Peking University, Microsoft Research Asia
Paper: arxiv.org/abs/2106.01…
Code: github.com/charlesCXK/…

15. Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation

Affiliations: Huawei, Dalian University of Technology, Peking University
Paper: arxiv.org/abs/2103.04…
Code: None

16. Semi-Supervised Semantic Segmentation With Directional Context-Aware Consistency

Affiliations: The Chinese University of Hong Kong, SmartMore, University of Oxford
Paper: openaccess.thecvf.com/content/CVP…
Code: None

17. Semantic Segmentation With Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization

Affiliations: NVIDIA, University of Toronto, Yale University, MIT, Vector Institute
Paper: openaccess.thecvf.com/content/CVP…
Code: nv-tlabs.github.io/semanticGAN…

18. Three Ways To Improve Semantic Segmentation With Self-Supervised Depth Estimation

Affiliations: ETH Zurich, University of Bern, KU Leuven
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/lhoyer/impr…

Domain Adaptive Semantic Segmentation

19. Cluster, Split, Fuse, and Update: Meta-Learning for Open Compound Domain Adaptive Semantic Segmentation

Affiliations: ETH Zurich, KU Leuven, University of Electronic Science and Technology of China
Paper: openaccess.thecvf.com/content/CVP…
Code: None

20. Source-Free Domain Adaptation for Semantic Segmentation

Affiliations: East China Normal University
Paper: openaccess.thecvf.com/content/CVP…
Code: None

21. Uncertainty Reduction for Model Adaptation in Semantic Segmentation

Affiliations: Idiap Research Institute, EPFL, University of Geneva
Paper: openaccess.thecvf.com/content/CVP…
Code: git.io/JthPp

22. Self-Supervised Augmentation Consistency for Adapting Semantic Segmentation

Affiliations: TU Darmstadt, hessian.AI
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/visinf/da-s…

23. RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening

Affiliations: LG AI Research, KAIST, and others
Paper: arxiv.org/abs/2103.15…
Code: github.com/shachoi/Rob…

24. Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization

Affiliations: The University of Hong Kong, Deepwise AI Lab
Paper: arxiv.org/abs/2103.13…
Code: None

25. MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation

Affiliations: City University of Hong Kong, Baidu
Paper: arxiv.org/abs/2103.05…
Code: github.com/cyang-cityu…

26. Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation

Affiliations: Huawei Cloud, Huawei Noah's Ark Lab, Dalian University of Technology
Paper: arxiv.org/abs/2103.04…
Code: None

27. Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation

Affiliations: University of Science and Technology of China, Microsoft Research Asia
Paper: arxiv.org/abs/2101.10…
Code: github.com/microsoft/P…

28. DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation

Affiliations: University of South Carolina, Farsee2 Technology
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/W-zx-Y/DANN…

Few-Shot Semantic Segmentation

29. Scale-Aware Graph Neural Network for Few-Shot Semantic Segmentation

Affiliations: MBZUAI, IIAI, Harbin Institute of Technology
Paper: openaccess.thecvf.com/content/CVP…
Code: None

30. Anti-Aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation

Affiliations: University of Chinese Academy of Sciences, Tsinghua University
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/Bibkiller/A…

Unsupervised Semantic Segmentation

31. PiCIE: Unsupervised Semantic Segmentation Using Invariance and Equivariance in Clustering

Affiliations: UT-Austin, Cornell University
Paper: openaccess.thecvf.com/content/CVP…
Code: https://github.com/janghyuncho/PiCIE

Video Semantic Segmentation

32. VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild

Affiliations: Zhejiang University, Baidu, University of Technology Sydney
Homepage: www.vspwdataset.com/
Paper: www.vspwdataset.com/CVPR2021__m…
Code: github.com/sssdddwww2/…

Others

33. Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations

Affiliations: University of Padova
Paper: openaccess.thecvf.com/content/CVP…
Code: lttm.dei.unipd.it/paper_data/…

34. Exploit Visual Dependency Relations for Semantic Segmentation

Affiliations: University of Illinois at Chicago
Paper: openaccess.thecvf.com/content/CVP…
Code: None

35. Revisiting Superpixels for Active Learning in Semantic Segmentation With Realistic Annotation Costs

Affiliations: Institute for Infocomm Research, National University of Singapore
Paper: openaccess.thecvf.com/content/CVP…
Code: None

36. PLOP: Learning without Forgetting for Continual Semantic Segmentation

Affiliations: Sorbonne University, Heuritech, Datakalab, Valeo.ai
Paper: arxiv.org/abs/2011.11…
Code: github.com/arthurdouil…

37. 3D-to-2D Distillation for Indoor Scene Parsing

Affiliations: The Chinese University of Hong Kong, The University of Hong Kong
Paper: openaccess.thecvf.com/content/CVP…
Code: None

38. Bidirectional Projection Network for Cross Dimension Scene Understanding

Affiliations: The Chinese University of Hong Kong, University of Oxford, and others
Paper(Oral): arxiv.org/abs/2103.14…
Code: github.com/wbhu/BPNet

39. PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation

Affiliations: Peking University, Chinese Academy of Sciences, University of Chinese Academy of Sciences, ETH Zurich, SenseTime, and others
Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/lxtGH/PFSeg…

Instance Segmentation

DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

Paper: arxiv.org/abs/2011.09…
Code: github.com/aliyun/DCT-…

Incremental Few-Shot Instance Segmentation

Paper: arxiv.org/abs/2105.05…
Code: github.com/danganea/iM…

A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation

Paper: arxiv.org/abs/2105.03…
Code: None

RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features

Paper: arxiv.org/abs/2104.08…
Code: github.com/zhanggang00…

Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation

Paper: arxiv.org/abs/2104.05…
Code: github.com/tinyalpha/B…

Multi-Scale Aligned Distillation for Low-Resolution Detection

Paper: jiaya.me/papers/ms_a…
Code: github.com/Jia-Researc…

Boundary IoU: Improving Object-Centric Image Segmentation Evaluation

Homepage: bowenc0221.github.io/boundary-io…
Paper: arxiv.org/abs/2103.16…
Code: github.com/bowenc0221/…

Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers

Paper: arxiv.org/abs/2103.12…
Code: github.com/lkeab/BCNet

Zero-shot instance segmentation (Not Sure)

Paper: None
Code: github.com/CVPR2021-pa…

Video Instance Segmentation

STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation

Paper: www4.comp.polyu.edu.hk/~cslzhang/p…
Code: github.com/MinghanLi/S…

End-to-End Video Instance Segmentation with Transformers

Paper(Oral): arxiv.org/abs/2011.14…
Code: github.com/Epiphqny/Vi…

Panoptic Segmentation

ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation

Part-aware Panoptic Segmentation

Exemplar-Based Open-Set Panoptic Segmentation Network

MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers

Paper: openaccess.thecvf.com/content/CVP…
Code: None

Panoptic Segmentation Forecasting

Paper: arxiv.org/abs/2104.03…
Code: github.com/nianticlabs…

Fully Convolutional Networks for Panoptic Segmentation

Paper: arxiv.org/abs/2012.00…
Code: github.com/yanwei-li/P…

Cross-View Regularization for Domain Adaptive Panoptic Segmentation

Paper: arxiv.org/abs/2103.02…
Code: None

Medical Image Segmentation

1. Learning Calibrated Medical Image Segmentation via Multi-Rater Agreement Modeling

Affiliations: Tencent Jarvis Lab, Beijing Tongren Hospital
Paper(Best Paper Candidate): openaccess.thecvf.com/content/CVP…
Code: github.com/jiwei0921/M…

2. Every Annotation Counts: Multi-Label Deep Supervision for Medical Image Segmentation

Affiliations: Karlsruhe Institute of Technology, Carl Zeiss, and others
Paper: openaccess.thecvf.com/content/CVP…
Code: None

3. FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space

Affiliations: The Chinese University of Hong Kong, The Hong Kong Polytechnic University
Paper: arxiv.org/abs/2103.06…
Code: github.com/liuquande/F…

4. DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation

Affiliations: Johns Hopkins University, NVIDIA
Paper(Oral): arxiv.org/abs/2103.15…
Code: None

5. DARCNN: Domain Adaptive Region-Based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images

Affiliations: Stanford University
Paper: openaccess.thecvf.com/content/CVP…
Code: None

Video Object Segmentation

Learning Position and Target Consistency for Memory-based Video Object Segmentation

Paper: arxiv.org/abs/2104.04…
Code: None

SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation

Paper(Oral): arxiv.org/abs/2101.08…
Code: github.com/dukebw/SSTV…

Interactive Video Object Segmentation

Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild

Paper: arxiv.org/abs/2103.10…
Code: github.com/svip-lab/IV…

Saliency Detection

Uncertainty-aware Joint Salient Object and Camouflaged Object Detection

Paper: arxiv.org/abs/2104.02…
Code: github.com/JingZhang61…

Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion

Paper(Oral): arxiv.org/abs/2103.11…
Code: github.com/sunpeng1996…

Camouflaged Object Detection

Uncertainty-aware Joint Salient Object and Camouflaged Object Detection

Paper: arxiv.org/abs/2104.02…
Code: github.com/JingZhang61…

Co-Salient Object Detection

Group Collaborative Learning for Co-Salient Object Detection

Paper: arxiv.org/abs/2104.01…
Code: github.com/fanq15/GCoN…

Image Matting

Semantic Image Matting

Person Re-identification

Generalizable Person Re-identification with Relevance-aware Mixture of Experts

Paper: arxiv.org/abs/2105.09…
Code: None

Unsupervised Multi-Source Domain Adaptation for Person Re-Identification

Paper: arxiv.org/abs/2104.12…
Code: None

Combined Depth Space based Architecture Search For Person Re-identification

Paper: arxiv.org/abs/2104.04…
Code: None

Person Search

Anchor-Free Person Search

Paper: arxiv.org/abs/2103.11…
Code: github.com/daodaofr/Al…
Interpretation: The First Anchor-Free Person Search Framework | CVPR 2021

Video Understanding / Action Recognition

Temporal-Relational CrossTransformers for Few-Shot Action Recognition

Paper: arxiv.org/abs/2101.06…
Code: github.com/tobyperrett…

FrameExit: Conditional Early Exiting for Efficient Video Recognition

Paper(Oral): arxiv.org/abs/2104.13…
Code: None

No frame left behind: Full Video Action Recognition

Paper: arxiv.org/abs/2103.15…
Code: None

Learning Salient Boundary Feature for Anchor-free Temporal Action Localization

Paper: arxiv.org/abs/2103.13…
Code: None

Temporal Context Aggregation Network for Temporal Action Proposal Refinement

Paper: arxiv.org/abs/2103.13…
Code: None
Interpretation: CVPR 2021 | TCANet: The Strongest Temporal Action Proposal Refinement Network

ACTION-Net: Multipath Excitation for Action Recognition

Paper: arxiv.org/abs/2103.07…
Code: github.com/V-Sense/ACT…

Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning

Homepage: fingerrec.github.io/index_files…
Paper: arxiv.org/abs/2009.05…
Code: github.com/FingerRec/B…

TDN: Temporal Difference Networks for Efficient Action Recognition

Paper: arxiv.org/abs/2012.10…
Code: github.com/MCG-NJU/TDN

Face Recognition

A 3D GAN for Improved Large-pose Facial Recognition

Paper: arxiv.org/abs/2012.10…
Code: None

MagFace: A Universal Representation for Face Recognition and Quality Assessment

Paper(Oral): arxiv.org/abs/2103.06…
Code: github.com/IrvingMeng/…

WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition

When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework

Face Detection

HLA-Face: Joint High-Low Adaptation for Low Light Face Detection

Homepage: daooshee.github.io/HLA-Face-We…
Paper: arxiv.org/abs/2104.01…
Code: github.com/daooshee/HL…

CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement

Paper: arxiv.org/abs/2103.07…
Code: None

Face Anti-Spoofing

Cross Modal Focal Loss for RGBD Face Anti-Spoofing

Paper: arxiv.org/abs/2103.00…
Code: None

Deepfake Detection

Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain

Paper: arxiv.org/abs/2103.01…
Code: None

Multi-attentional Deepfake Detection

Paper: arxiv.org/abs/2103.02…
Code: None

Face Age Estimation

Continuous Face Aging via Self-estimated Residual Age Embedding

Paper: arxiv.org/abs/2105.00…
Code: None

PML: Progressive Margin Loss for Long-tailed Age Classification

Paper: arxiv.org/abs/2103.02…
Code: None

Facial Expression Recognition

Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition

Paper: arxiv.org/abs/2103.13…
Code: None

Deepfakes

MagDR: Mask-guided Detection and Reconstruction for Defending Deepfakes

Paper: arxiv.org/abs/2103.14…
Code: None

Human Parsing

Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing

Paper: arxiv.org/abs/2103.04…
Code: github.com/tfzhou/MG-H…

2D/3D Human Pose Estimation

2D Human Pose Estimation

ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search

Paper: https://arxiv.org/abs/2105.10154
Code: None

When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks

Paper: arxiv.org/abs/2105.06…
Code: None

Pose Recognition with Cascade Transformers

Paper: arxiv.org/abs/2104.06…
Code: github.com/mlpc-ucsd/P…

DCPose: Deep Dual Consecutive Network for Human Pose Estimation

Paper: arxiv.org/abs/2103.07…
Code: github.com/Pose-Group/…

3D Human Pose Estimation

End-to-End Human Pose and Mesh Reconstruction with Transformers

Paper: arxiv.org/abs/2012.09…
Code: github.com/microsoft/M…

PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation

Paper(Oral): arxiv.org/abs/2105.02…
Code: github.com/jfzhang95/P…

Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration

Paper: arxiv.org/abs/2103.02…
Code: github.com/SeanChenxy/…

Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks

Paper: arxiv.org/abs/2104.01…
Code: github.com/3dpose/3D-M…

HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation

Animal Pose Estimation

From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation

Paper: arxiv.org/abs/2103.14…
Code: None

Hand Pose Estimation

Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time

Homepage: stevenlsw.github.io/Semi-Hand-O…
Paper: arxiv.org/abs/2106.05…
Code: github.com/stevenlsw/S…

Human Volumetric Capture

POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture

Homepage: www.liuyebin.com/posefusion/…
Paper(Oral): arxiv.org/abs/2103.15…
Code: None

Scene Text Detection

Fourier Contour Embedding for Arbitrary-Shaped Text Detection

Paper: arxiv.org/abs/2104.10…
Code: None

Scene Text Recognition

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Paper: arxiv.org/abs/2103.06…
Code: github.com/FangShanche…

Image Compression

Checkerboard Context Model for Efficient Learned Image Compression

Paper: arxiv.org/abs/2103.15…
Code: None

Slimmable Compressive Autoencoders for Practical Neural Image Compression

Paper: arxiv.org/abs/2103.15…
Code: None

Attention-guided Image Compression by Deep Reconstruction of Compressive Sensed Saliency Skeleton

Paper: arxiv.org/abs/2103.15…
Code: None

Model Compression / Pruning / Quantization

Teachers Do More Than Teach: Compressing Image-to-Image Models

Paper: arxiv.org/abs/2103.03…
Code: github.com/snap-resear…

Model Pruning

Dynamic Slimmable Network

Paper: arxiv.org/abs/2103.13…
Code: github.com/changlin31/…

Model Quantization

Network Quantization with Element-wise Gradient Scaling

Paper: arxiv.org/abs/2104.00…
Code: None

Zero-shot Adversarial Quantization

Paper(Oral): arxiv.org/abs/2103.15…
Code: git.io/Jqc0y

Learnable Companding Quantization for Accurate Low-bit Neural Networks

Paper: arxiv.org/abs/2103.07…
Code: None

Knowledge Distillation

Distilling Knowledge via Knowledge Review

Paper: arxiv.org/abs/2104.09…
Code: github.com/Jia-Researc…

Distilling Object Detectors via Decoupled Features

Paper: arxiv.org/abs/2103.14…
Code: github.com/ggjy/DeFeat…

Super-Resolution

Image Super-Resolution with Non-Local Sparse Attention

Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/HarukiYqM/N…

Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline

Homepage: mepro.bjtu.edu.cn/resource.ht…
Paper: arxiv.org/abs/2104.06…
Code: None

ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic

Paper: arxiv.org/abs/2103.04…
Code: github.com/Xiangtaokon…

AdderSR: Towards Energy Efficient Image Super-Resolution

Paper: arxiv.org/abs/2009.08…
Code: None

Dehazing

Contrastive Learning for Compact Single Image Dehazing

Paper: arxiv.org/abs/2104.09…
Code: github.com/GlassyWu/AE…

Video Super-Resolution

Temporal Modulation Network for Controllable Space-Time Video Super-Resolution

Paper: None
Code: github.com/CS-GangXu/T…

Image Restoration

Multi-Stage Progressive Image Restoration

Paper: arxiv.org/abs/2102.02…
Code: github.com/swz30/MPRNe…

Image Inpainting

PD-GAN: Probabilistic Diverse GAN for Image Inpainting

Paper: arxiv.org/abs/2105.02…
Code: github.com/KumapowerLI…

TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations

Homepage: yzhouas.github.io/projects/Tr…
Paper: arxiv.org/abs/2103.15…
Code: None

Image Editing

StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

High-Fidelity and Arbitrary Face Editing

Paper: arxiv.org/abs/2103.15…
Code: None

Anycost GANs for Interactive Image Synthesis and Editing

Paper: arxiv.org/abs/2103.03…
Code: github.com/mit-han-lab…

PISE: Person Image Synthesis and Editing with Decoupled GAN

Paper: arxiv.org/abs/2103.04…
Code: github.com/Zhangjinso/…

DeFLOCNet: Deep Image Editing via Flexible Low-level Controls

Paper: raywzy.com/
Code: raywzy.com/

Image Captioning

Towards Accurate Text-based Image Captioning with Content Diversity Exploration

Paper: arxiv.org/abs/2105.03…
Code: None

Font Generation

DG-Font: Deformable Generative Networks for Unsupervised Font Generation

Paper: arxiv.org/abs/2104.03…
Code: github.com/ecnuycxie/D…

Image Matching

LoFTR: Detector-Free Local Feature Matching with Transformers

Convolutional Hough Matching Networks

Homepage: cvlab.postech.ac.kr/research/CH…
Paper(Oral): arxiv.org/abs/2103.16…
Code: None

Image Blending

Bridging the Visual Gap: Wide-Range Image Blending

Paper: arxiv.org/abs/2103.15…
Code: github.com/julia0607/W…

Reflection Removal

Robust Reflection Removal with Reflection-free Flash-only Cues

Paper: arxiv.org/abs/2103.04…
Code: github.com/ChenyangLEI…

3D Point Cloud Classification

Equivariant Point Network for 3D Point Cloud Analysis

Paper: arxiv.org/abs/2103.14…
Code: None

PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds

Paper: arxiv.org/abs/2103.14…
Code: github.com/CVMI-Lab/PA…

3D Object Detection

3D-MAN: 3D Multi-frame Attention Network for Object Detection

Paper: arxiv.org/abs/2103.16…
Code: None

Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds

Paper: arxiv.org/abs/2104.06…
Code: github.com/cheng052/BR…

HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection

Homepage: cvlab.yonsei.ac.kr/projects/HV…
Paper: arxiv.org/abs/2104.00…
Code: github.com/cvlab-yonse…

LiDAR R-CNN: An Efficient and Universal 3D Object Detector

Paper: arxiv.org/abs/2103.15…
Code: github.com/tusimple/Li…

M3DSSD: Monocular 3D Single Stage Object Detector

Paper: arxiv.org/abs/2103.13…
Code: github.com/mumianyuxin…

SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud

Paper: None
Code: github.com/Vegeta2020/…

Center-based 3D Object Detection and Tracking

Paper: arxiv.org/abs/2006.11…
Code: github.com/tianweiy/Ce…

Categorical Depth Distribution Network for Monocular 3D Object Detection

Paper: arxiv.org/abs/2103.01…
Code: None

3D Semantic Segmentation

Bidirectional Projection Network for Cross Dimension Scene Understanding

Paper(Oral): arxiv.org/abs/2103.14…
Code: github.com/wbhu/BPNet

Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion

Paper: arxiv.org/abs/2103.07…
Code: github.com/ShiQiu0419/…

Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation

Paper: arxiv.org/abs/2011.10…
Code: github.com/xinge008/Cy…

Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges

3D Panoptic Segmentation

Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation

Paper: arxiv.org/abs/2103.14…
Code: github.com/edwardzhou1…

3D Object Tracking

Center-based 3D Object Detection and Tracking

Paper: arxiv.org/abs/2006.11…
Code: github.com/tianweiy/Ce…

3D Point Cloud Registration

ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning

Paper: arxiv.org/abs/2103.15…
Code: None

PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency

Paper: arxiv.org/abs/2103.05…
Code: github.com/XuyangBai/P…

PREDATOR: Registration of 3D Point Clouds with Low Overlap

Paper: arxiv.org/abs/2011.13…
Code: github.com/ShengyuH/Ov…

3D Point Cloud Completion

Unsupervised 3D Shape Completion through GAN Inversion

Homepage: junzhezhang.github.io/projects/Sh…
Paper: arxiv.org/abs/2104.13…
Code: github.com/junzhezhang…

Variational Relational Point Completion Network

Homepage: paul007pl.github.io/projects/VR…
Paper: arxiv.org/abs/2104.10…
Code: github.com/paul007pl/V…

Style-based Point Generator with Adversarial Rendering for Point Cloud Completion

3D Reconstruction

Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection

Paper: arxiv.org/abs/2106.07…
Code: github.com/TencentYout…

Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction

Paper: arxiv.org/abs/2104.00…
Code: None

NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video

6D Pose Estimation

FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism

Paper(Oral): arxiv.org/abs/2103.07…
Code: github.com/DC1991/FS-N…

GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation

Paper: arxiv.org/abs/2102.12…
Code: git.io/GDR-Net

FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation

Paper: arxiv.org/abs/2103.02…
Code: github.com/ethnhe/FFB6…

Camera Pose Estimation

Back to the Feature: Learning Robust Camera Localization from Pixels to Pose

Paper: arxiv.org/abs/2103.09…
Code: github.com/cvg/pixloc

Depth Estimation

S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation

Paper(Oral): arxiv.org/abs/2104.00…
Code: None

Beyond Image to Depth: Improving Depth Prediction using Echoes

Homepage: krantiparida.github.io/projects/bi…
Paper: arxiv.org/abs/2103.08…
Code: github.com/krantiparid…

S3: Learnable Sparse Signal Superdensity for Guided Depth Estimation

Paper: arxiv.org/abs/2103.02…
Code: None

Depth from Camera Motion and Object Detection

Stereo Matching

A Decomposition Model for Stereo Matching

Paper: arxiv.org/abs/2104.07…
Code: None

Flow Estimation

Self-Supervised Multi-Frame Monocular Scene Flow

Paper: arxiv.org/abs/2105.02…
Code: github.com/visinf/mult…

RAFT-3D: Scene Flow using Rigid-Motion Embeddings

Paper: arxiv.org/abs/2012.00…
Code: None

Learning Optical Flow From Still Images

Homepage: mattpoggi.github.io/projects/cv…
Paper: mattpoggi.github.io/assets/pape…
Code: github.com/mattpoggi/d…

FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds

Paper: arxiv.org/abs/2104.00…
Code: None

Lane Detection

Focus on Local: Detecting Lane Marker from Bottom Up via Key Point

Paper: arxiv.org/abs/2105.13…
Code: None

Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection

Paper: arxiv.org/abs/2010.12…
Code: github.com/lucastabeli…

Trajectory Prediction

Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction

Paper(Oral): arxiv.org/abs/2104.08…
Code: None

Crowd Counting

Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark

Adversarial Examples

Enhancing the Transferability of Adversarial Attacks through Variance Tuning

Paper: arxiv.org/abs/2103.15…
Code: github.com/JHL-HUST/VT

LiBRe: A Practical Bayesian Approach to Adversarial Detection

Paper: arxiv.org/abs/2103.14…
Code: None

Natural Adversarial Examples

Paper: arxiv.org/abs/1907.07…
Code: github.com/hendrycks/n…

Image Retrieval

StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval

Paper: arxiv.org/abs/2103.15…
Code: None

QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval

Paper: arxiv.org/abs/2103.02…
Code: None

Video Retrieval

On Semantic Similarity in Video Retrieval

Cross-modal Retrieval

Cross-Modal Center Loss for 3D Cross-Modal Retrieval

Paper: arxiv.org/abs/2008.03…
Code: github.com/LongLong-Ji…

Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers

Paper: arxiv.org/abs/2103.16…
Code: None

Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning

Paper: www.amazon.science/publication…
Code: github.com/amzn/image-…

Zero-Shot Learning

Counterfactual Zero-Shot and Open-Set Visual Recognition

Paper: arxiv.org/abs/2103.00…
Code: github.com/yue-zhongqi…

Federated Learning

FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space

Paper: arxiv.org/abs/2103.06…
Code: github.com/liuquande/F…

Video Frame Interpolation

CDFI: Compression-Driven Network Design for Frame Interpolation

Paper: None
Code: github.com/tding1/CDFI

FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation

Visual Reasoning

Transformation Driven Visual Reasoning

Image Synthesis

GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields

Homepage: m-niemeyer.github.io/project-pag…
Paper(Oral): arxiv.org/abs/2011.12…
Code: github.com/autonomousv…
Demo: www.youtube.com/watch?v=fIa…

Taming Transformers for High-Resolution Image Synthesis

Homepage: compvis.github.io/taming-tran…
Paper(Oral): arxiv.org/abs/2012.09…
Code: github.com/CompVis/tam…

View Synthesis

Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes

Homepage: virtualhumans.mpi-inf.mpg.de/srf/
Paper: arxiv.org/abs/2104.06…

Self-Supervised Visibility Learning for Novel View Synthesis

Paper: arxiv.org/abs/2103.15…
Code: None

NeX: Real-time View Synthesis with Neural Basis Expansion

Homepage: nex-mpi.github.io/
Paper(Oral): arxiv.org/abs/2103.05…

Style Transfer

Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer

Paper: arxiv.org/abs/2104.05…
Code: github.com/PaddlePaddl…

Layout Generation

LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity

Paper: None
Code: None

Variational Transformer Networks for Layout Generation

Paper: arxiv.org/abs/2104.02…
Code: None

Domain Generalization

Generalizable Person Re-identification with Relevance-aware Mixture of Experts

Paper: arxiv.org/abs/2105.09…
Code: None

RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening

Paper: arxiv.org/abs/2103.15…
Code: github.com/shachoi/Rob…

Adaptive Methods for Real-World Domain Generalization

Paper: arxiv.org/abs/2103.15…
Code: None

FSDR: Frequency Space Domain Randomization for Domain Generalization

Paper: arxiv.org/abs/2103.02…
Code: None

Domain Adaptation

Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation

Paper: arxiv.org/abs/2104.00…
Code: None

Domain Consensus Clustering for Universal Domain Adaptation

Paper: reler.net/papers/guan…
Code: github.com/Solacex/Dom…

Open-Set

Towards Open World Object Detection

Paper(Oral): arxiv.org/abs/2103.02…
Code: github.com/JosephKJ/OW…

Exemplar-Based Open-Set Panoptic Segmentation Network

Learning Placeholders for Open-Set Recognition

Paper(Oral): arxiv.org/abs/2103.15…
Code: None

Adversarial Attack

IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking

Paper: arxiv.org/abs/2103.14…
Code: github.com/VISION-SJTU…

Human-Object Interaction (HOI) Detection

HOTR: End-to-End Human-Object Interaction Detection with Transformers

Paper: arxiv.org/abs/2104.13…
Code: None

Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information

Paper: arxiv.org/abs/2103.05…
Code: github.com/hitachi-rd-…

Reformulating HOI Detection as Adaptive Set Prediction

Paper: arxiv.org/abs/2103.05…
Code: github.com/yoyomimi/AS…

Detecting Human-Object Interaction via Fabricated Compositional Learning

Paper: arxiv.org/abs/2103.08…
Code: github.com/zhihou7/FCL

End-to-End Human Object Interaction Detection with HOI Transformer

Paper: arxiv.org/abs/2103.04…
Code: github.com/bbepoch/Hoi…

Shadow Removal

Auto-Exposure Fusion for Single-Image Shadow Removal

Paper: arxiv.org/abs/2103.01…
Code: github.com/tsingqguo/e…

Virtual Try-On

Parser-Free Virtual Try-on via Distilling Appearance Flows

Paper: arxiv.org/abs/2103.04…
Code: github.com/geyuying/PF…

Label Noise

A Second-Order Approach to Learning with Instance-Dependent Label Noise

Paper(Oral): arxiv.org/abs/2012.11…
Code: github.com/UCSC-REAL/C…

Video Stabilization

Real-Time Selfie Video Stabilization

Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/jiy173/self…

Datasets

Tracking Pedestrian Heads in Dense Crowd

Homepage: project.inria.fr/crowdscienc…
Paper: openaccess.thecvf.com/content/CVP…
Code1: github.com/Sentient07/…
Code2: github.com/Sentient07/…
Dataset: project.inria.fr/crowdscienc…

Part-aware Panoptic Segmentation

Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network

Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark

Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets

Homepage: fidler-lab.github.io/efficient-a…
Paper(Oral): arxiv.org/abs/2104.12…
Code: github.com/fidler-lab/…

ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation

Learning To Count Everything

Semantic Image Matting

Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline

Homepage: mepro.bjtu.edu.cn/resource.ht…
Paper: arxiv.org/abs/2104.06…
Code: None

Visual Semantic Role Labeling for Video Understanding

VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild

Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark

Homepage: vap.aau.dk/sewer-ml/
Paper: arxiv.org/abs/2103.10…

Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food

Paper: arxiv.org/abs/2103.03…
Dataset: None

Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges

When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework

Depth from Camera Motion and Object Detection

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

Homepage: rl.uni-freiburg.de/research/mu…
Paper: arxiv.org/abs/2103.01…
Code: rl.uni-freiburg.de/research/mu…

Scan2Cap: Context-aware Dense Captioning in RGB-D Scans

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

Paper: arxiv.org/abs/2103.01…
Code: rl.uni-freiburg.de/research/mu…
Dataset: rl.uni-freiburg.de/research/mu…

Others

Fast and Accurate Model Scaling

Paper: openaccess.thecvf.com/content/CVP…
Code: github.com/facebookres…

Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos

Omnimatte: Associating Objects and Their Effects in Video

Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets

Homepage: fidler-lab.github.io/efficient-a…
Paper(Oral): arxiv.org/abs/2104.12…
Code: github.com/fidler-lab/…

Motion Representations for Articulated Animation

Paper: arxiv.org/abs/2104.11…
Code: github.com/snap-resear…

Deep Lucas-Kanade Homography for Multimodal Image Alignment

Paper: arxiv.org/abs/2104.11…
Code: github.com/placeforyim…

Skip-Convolutions for Efficient Video Processing

Paper: arxiv.org/abs/2104.11…
Code: None

KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control

Homepage: tomasjakab.github.io/KeypointDef…
Paper(Oral): arxiv.org/abs/2104.11…
Code: github.com/tomasjakab/…

Learning To Count Everything

SOLD2: Self-supervised Occlusion-aware Line Description and Detection

Paper(Oral): arxiv.org/abs/2104.03…
Code: github.com/cvg/SOLD2

Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression

LEAP: Learning Articulated Occupancy of People

Paper: arxiv.org/abs/2104.06…
Code: None

Visual Semantic Role Labeling for Video Understanding

UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles

Paper: arxiv.org/abs/2104.00…
Code: github.com/SUTDCV/UAV-…

Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning

Paper(Oral): arxiv.org/abs/2104.00…
Code: None

Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction

Paper: arxiv.org/abs/2104.00…
Code: None

Towards High Fidelity Face Relighting with Realistic Shadows

Paper: arxiv.org/abs/2104.00…
Code: None

BRepNet: A topological message passing system for solid models

Paper(Oral): arxiv.org/abs/2104.00…
Code: None

Visually Informed Binaural Audio Generation without Binaural Audios

Homepage: sheldontsui.github.io/projects/Ps…
Paper: None
GitHub: github.com/SheldonTsui…
Demo: www.youtube.com/watch?v=r-u…

Exploring intermediate representation for monocular vehicle pose estimation

Paper: None
Code: github.com/Nicholasli1…

Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGB

Paper(Oral): arxiv.org/abs/2103.14…
Code: None

Invertible Image Signal Processing

Paper: arxiv.org/abs/2103.15…
Code: github.com/yzxing87/In…

Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling

Paper: arxiv.org/abs/2103.14…
Code: None

SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences

Paper: arxiv.org/abs/2103.14…
Code: None

Embedding Transfer with Label Relaxation for Improved Metric Learning

Paper: arxiv.org/abs/2103.14…
Code: None

Picasso: A CUDA-based Library for Deep Learning over 3D Meshes

Paper: arxiv.org/abs/2103.15…
Code: github.com/hlei-ziyan/…

Meta-Mining Discriminative Samples for Kinship Verification

Paper: arxiv.org/abs/2103.15…
Code: None

Cloud2Curve: Generation and Vectorization of Parametric Sketches

Paper: arxiv.org/abs/2103.15…
Code: None

TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events

Paper: arxiv.org/abs/2103.15…
Code: github.com/SUTDCV/SUTD…

Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution

Homepage: wellyzhang.github.io/project/pra…
Paper: arxiv.org/abs/2103.14…
Code: None

ACRE: Abstract Causal REasoning Beyond Covariation

Homepage: wellyzhang.github.io/project/acr…
Paper: arxiv.org/abs/2103.14…
Code: None

Confluent Vessel Trees with Accurate Bifurcations

Paper: arxiv.org/abs/2103.14…
Code: None

Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling

Paper: arxiv.org/abs/2103.14…
Code: github.com/HuangZhiCha…

Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks

Homepage: paschalidoud.github.io/neural_part…
Paper: None
Code: github.com/paschalidou…

Knowledge Evolution in Neural Networks

Paper(Oral): arxiv.org/abs/2103.05…
Code: github.com/ahmdtaha/kn…

Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning

Paper: arxiv.org/abs/2103.02…
Code: github.com/guopengf/FL…

SGP: Self-supervised Geometric Perception

Diffusion Probabilistic Models for 3D Point Cloud Generation

Paper: arxiv.org/abs/2103.01…
Code: github.com/luost26/dif…

Scan2Cap: Context-aware Dense Captioning in RGB-D Scans

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

Paper: arxiv.org/abs/2103.01…
Code: rl.uni-freiburg.de/research/mu…
Dataset: rl.uni-freiburg.de/research/mu…

To Be Added (TODO)

Acceptance Unconfirmed (Not Sure)

CT Film Recovery via Disentangling Geometric Deformation and Photometric Degradation: Simulated Datasets and Deep Models

Paper: None
Code: github.com/transcenden…

Toward Explainable Reflection Removal with Distilling and Model Uncertainty

Paper: None
Code: github.com/ytpeng-aiml…

DeepOIS: Gyroscope-Guided Deep Optical Image Stabilizer Compensation

Paper: None
Code: github.com/lhaippp/Dee…

Exploring Adversarial Fake Images on Face Manifold

Paper: None
Code: github.com/ldz666666/S…

Uncertainty-Aware Semi-Supervised Crowd Counting via Consistency-Regularized Surrogate Task

Paper: None
Code: github.com/yandamengda…

Temporal Contrastive Graph for Self-supervised Video Representation Learning

Paper: None
Code: github.com/YangLiu9208…

Boosting Monocular Depth Estimation Models to High-Resolution via Context-Aware Patching

Paper: None
Code: github.com/ouranonymou…

Fast and Memory-Efficient Compact Bilinear Pooling

Paper: None
Code: github.com/cvpr2021kp2…

Identification of Empty Shelves in Supermarkets using Domain-inspired Features with Structural Support Vector Machine

Paper: None
Code: github.com/gapDetectio…

Estimating A Child’s Growth Potential From Cephalometric X-Ray Image via Morphology-Aware Interactive Keypoint Estimation

Paper: None
Code: github.com/interactive…

github.com/ShaoQiangSh…

github.com/gillesflash…

github.com/anonymous-s…

github.com/cvpr2021dcb…

github.com/anonymousau…

github.com/AldrichZeng…

github.com/Anonymous-A…

github.com/ddfss/datad…