3D perception from a single camera: no depth sensor, no stereo rig — monocular 3D bounding boxes with full target 3D information.


📷 Monocular 3D Perception System - Beyond Traditional Multi-Camera Vision

📋 README.md

Monocular 3D Perception System - Monocular 3D Object Detection

🎯 Project Overview

This project implements 3D object detection using only a single camera: no depth camera or multi-camera rig is required to output each target's 3D bounding box, distance, and speed.

🔥 Industry Pain Points

Traditional 3D perception schemes suffer from three pain points:

  1. High cost: multi-camera plus depth-camera rigs cost over ¥5,000
  2. Complex calibration: spatio-temporal calibration across multiple sensors takes days
  3. System redundancy: multi-camera systems are compute-heavy and struggle with real-time operation

🚀 Core Innovations

  • ✅ Monocular 2D detection → geometric constraint solving → prior size estimation → 3D box reconstruction
  • ✅ No depth information needed; pure visual-geometric inference
  • ✅ Inference up to 30 FPS, runs on a CPU

📦 Installation

```bash
pip install torch torchvision opencv-python numpy scipy matplotlib
```

🎮 Usage

```bash
# Image test
python demo_image.py --image_path test.jpg

# Video test
python demo_video.py --video_path test.mp4

# Live camera
python demo_camera.py
```

📁 Project Structure

```
monocular_3d_perception/
├── core/
│   ├── detector.py             # 2D object detector
│   ├── geometry.py             # Geometric constraint solver
│   ├── priors.py               # Object size prior library
│   └── reconstructor.py        # 3D reconstructor
├── models/
│   ├── __init__.py
│   └── pretrained/             # Pretrained models
├── utils/
│   ├── camera_calibration.py   # Camera calibration tools
│   ├── visualization.py        # Visualization tools
│   └── metrics.py              # Evaluation metrics
├── config.py                   # Configuration
├── demo_image.py               # Image demo
├── demo_video.py               # Video demo
├── demo_camera.py              # Live camera demo
└── requirements.txt            # Dependency list
```

⚙️ Core Algorithm

  1. 2D detection: YOLOv8 produces 2D bounding boxes
  2. Keypoint localization: find the object's bottom-center point
  3. Geometric constraints: solve the 3D position under a ground-plane assumption
  4. Size priors: a class-conditioned database of vehicle dimensions
  5. Pose estimation: solve for the target's heading angle
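As a rough sketch of how these steps combine for a single detection (all numbers and helper names below are illustrative, not the project's actual code), the similar-triangles relation D = H·f/h recovers depth from a class height prior, and the pinhole model recovers the lateral offset of the bottom-center keypoint:

```python
# Assumed intrinsics (illustrative; they happen to match the config defaults)
FX, FY = 1000.0, 1000.0
CX, CY = 640.0, 360.0

# Class-conditioned height prior in meters (step 4)
HEIGHT_PRIOR = {'car': 1.5, 'pedestrian': 1.7}

def estimate_3d_position(bbox, class_name):
    """Steps 3-4: depth from the height prior, lateral offset from the
    pinhole model, using the bottom-center keypoint from step 2."""
    x1, y1, x2, y2 = bbox
    h_pixels = y2 - y1
    H = HEIGHT_PRIOR[class_name]

    # Similar triangles: H / D = h / f  =>  D = H * f / h
    depth = H * FY / h_pixels

    # Back-project the bottom-center pixel column to a lateral offset
    u = (x1 + x2) / 2
    x_lateral = (u - CX) * depth / FX
    return x_lateral, depth

# A 1.5 m tall car whose 2D box is 150 px tall sits ~10 m ahead,
# centered on the optical axis
x, d = estimate_3d_position((540, 300, 740, 450), 'car')
print(round(d, 1), round(x, 1))  # → 10.0 0.0
```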

📖 Usage Notes

See the "Usage" section above and the code walkthrough below for details.

🎓 Application Scenarios

  • Low-cost autonomous-driving solutions
  • Drone obstacle avoidance
  • 3D analysis for security surveillance
  • Robot navigation and perception

📝 Core Code Implementation

  1. config.py - Configuration

```python
"""
Monocular 3D perception system configuration.
Contains model parameters, geometry parameters, and prior knowledge.
"""

import json
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Optional

import numpy as np


@dataclass
class CameraConfig:
    """Camera intrinsics configuration."""
    # Image resolution
    image_width: int = 1280
    image_height: int = 720

    # Intrinsic matrix K
    fx: float = 1000.0  # focal length x
    fy: float = 1000.0  # focal length y
    cx: float = 640.0   # principal point x
    cy: float = 360.0   # principal point y

    # Distortion coefficients
    k1: float = 0.0    # radial distortion 1
    k2: float = 0.0    # radial distortion 2
    p1: float = 0.0    # tangential distortion 1
    p2: float = 0.0    # tangential distortion 2

    def get_intrinsic_matrix(self) -> np.ndarray:
        """Return the 3x3 camera intrinsic matrix."""
        return np.array([
            [self.fx, 0, self.cx],
            [0, self.fy, self.cy],
            [0, 0, 1]
        ])

    def get_distortion_coeffs(self) -> np.ndarray:
        """Return the distortion coefficients."""
        return np.array([self.k1, self.k2, self.p1, self.p2])


@dataclass
class GeometryConfig:
    """Geometric constraint configuration."""
    # Ground plane equation: ax + by + cz + d = 0
    # Assume the ground is horizontal: z = 0
    ground_plane: Tuple[float, float, float, float] = (0, 0, 1, 0)

    # Camera mounting height (meters)
    camera_height: float = 1.5

    # Camera pitch angle (radians, positive looking down)
    camera_pitch: float = 0.1

    # Camera yaw angle (radians)
    camera_yaw: float = 0.0

    # Vertical distance from camera to ground
    camera_to_ground: float = 1.5


@dataclass
class PriorConfig:
    """Object prior dimensions (unit: meters)."""
    # Vehicle dimensions: [length, width, height]
    vehicle_dimensions: Dict[str, List[float]] = field(default_factory=lambda: {
        'car': [4.5, 1.8, 1.5],
        'truck': [8.0, 2.5, 3.0],
        'bus': [12.0, 2.5, 3.2],
        'motorcycle': [2.0, 0.8, 1.2],
        'bicycle': [1.8, 0.6, 1.1],
        'pedestrian': [0.5, 0.5, 1.7],
        'default_car': [4.2, 1.8, 1.5]
    })

    # Dimension uncertainty (standard deviation)
    dimension_std: Dict[str, float] = field(default_factory=lambda: {
        'car': 0.3,
        'truck': 0.5,
        'bus': 0.8,
        'motorcycle': 0.2,
        'bicycle': 0.15,
        'pedestrian': 0.1
    })


@dataclass
class DetectionConfig:
    """Detection configuration."""
    # Confidence threshold
    confidence_threshold: float = 0.5

    # IoU threshold
    iou_threshold: float = 0.45

    # Input image size
    input_size: Tuple[int, int] = (640, 640)

    # 2D detector model to use
    detector_model: str = "yolov8n.pt"

    # Whether to use the GPU
    use_gpu: bool = True


@dataclass
class ReconstructionConfig:
    """Reconstruction configuration."""
    # Uncertainty factor for depth estimation
    depth_uncertainty_factor: float = 0.1

    # Angle estimation uncertainty (radians)
    angle_uncertainty: float = 0.2

    # Minimum / maximum detection distance
    min_detection_distance: float = 1.0
    max_detection_distance: float = 100.0

    # Number of 3D box corners
    box_corner_count: int = 8


# Global configuration instances
CAMERA_CONFIG = CameraConfig()
GEOMETRY_CONFIG = GeometryConfig()
PRIOR_CONFIG = PriorConfig()
DETECTION_CONFIG = DetectionConfig()
RECONSTRUCTION_CONFIG = ReconstructionConfig()


class Monocular3DConfig:
    """Complete monocular 3D perception configuration."""

    def __init__(
        self,
        camera: Dict = None,
        geometry: Dict = None,
        priors: Dict = None,
        detection: Dict = None,
        reconstruction: Dict = None
    ):
        self.camera = CameraConfig(**(camera or {}))
        self.geometry = GeometryConfig(**(geometry or {}))
        self.priors = PriorConfig(**(priors or {}))
        self.detection = DetectionConfig(**(detection or {}))
        self.reconstruction = ReconstructionConfig(**(reconstruction or {}))


def load_config_from_file(config_path: str) -> 'Monocular3DConfig':
    """Load the configuration from a JSON file."""
    with open(config_path, 'r') as f:
        config_dict = json.load(f)
    return Monocular3DConfig(**config_dict)


# Default configuration export
DEFAULT_CONFIG = Monocular3DConfig()
```
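If the camera intrinsics are unknown, an approximate focal length in pixels can be derived from the lens's horizontal field of view and plugged into `CameraConfig` (a hedged sketch; `fx_from_hfov` and the 65.2° figure are illustrative, not part of the project):

```python
import math

def fx_from_hfov(image_width: int, hfov_deg: float) -> float:
    """Pinhole model: fx = (W / 2) / tan(HFOV / 2)."""
    return (image_width / 2) / math.tan(math.radians(hfov_deg) / 2)

# A 1280-px-wide image with a ~65.2° horizontal FOV gives fx ≈ 1000 px,
# consistent with the CameraConfig defaults above.
fx = fx_from_hfov(1280, 65.2)
```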

  2. core/detector.py - 2D Object Detector

```python
"""
2D object detector.
Efficient detection based on YOLOv8; outputs 2D bounding boxes,
classes, and confidence scores.
"""

import time
from dataclasses import dataclass
from typing import List, Dict, Tuple, Optional

import cv2
import numpy as np
import torch


@dataclass
class BoundingBox2D:
    """2D bounding box."""
    x1: float          # top-left x
    y1: float          # top-left y
    x2: float          # bottom-right x
    y2: float          # bottom-right y
    confidence: float  # detection confidence
    class_id: int      # class ID
    class_name: str    # class name


@dataclass
class DetectionResult:
    """Detection result."""
    boxes_2d: List[BoundingBox2D]
    inference_time: float
    raw_output: Optional[np.ndarray]


class YOLOv8Detector:
    """
    YOLOv8 2D detector.

    Core functionality:
    1. Load a pretrained model
    2. Run object detection
    3. Filter low-confidence results
    4. Non-maximum suppression (NMS)
    """

    # COCO class mapping (vehicle- and pedestrian-related)
    COCO_CLASSES = {
        0: 'person',
        1: 'bicycle',
        2: 'car',
        3: 'motorcycle',
        5: 'bus',
        7: 'truck'
    }

    def __init__(
        self,
        model_path: str = "yolov8n.pt",
        confidence_threshold: float = 0.5,
        iou_threshold: float = 0.45,
        device: Optional[str] = None
    ):
        """
        Initialize the detector.

        Args:
            model_path: path to the model weights
            confidence_threshold: confidence threshold
            iou_threshold: IoU threshold for NMS
            device: compute device ('cuda' or 'cpu')
        """
        self.confidence_threshold = confidence_threshold
        self.iou_threshold = iou_threshold

        # Pick a device automatically
        if device is None:
            self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
        else:
            self.device = device

        print(f"🚀 Loading YOLOv8 model: {model_path}")
        print(f"📱 Using device: {self.device}")

        # Load the model
        try:
            from ultralytics import YOLO
            self.model = YOLO(model_path)
            self.model.to(self.device)
        except ImportError:
            print("⚠️ ultralytics not installed; falling back to OpenCV DNN")
            self.model = None
            self._init_opencv_backup()

    def _init_opencv_backup(self):
        """Initialize the OpenCV DNN fallback."""
        # Use YOLOv3-tiny as the fallback
        self.net = cv2.dnn.readNetFromDarknet(
            "yolov3-tiny.cfg",
            "yolov3-tiny.weights"
        )
        if self.device == 'cuda':
            self.net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
            self.net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
        else:
            self.net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
            self.net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

        # Get output layer names
        layer_names = self.net.getLayerNames()
        self.output_layers = [layer_names[i - 1] for i in self.net.getUnconnectedOutLayers().flatten()]

    def detect(self, image: np.ndarray) -> DetectionResult:
        """
        Run object detection.

        Args:
            image: input image (BGR)

        Returns:
            DetectionResult
        """
        start_time = time.time()

        if self.model is not None:
            results = self._detect_with_yolov8(image)
        else:
            results = self._detect_with_opencv(image)

        inference_time = time.time() - start_time

        return DetectionResult(
            boxes_2d=results,
            inference_time=inference_time,
            raw_output=None
        )

    def _detect_with_yolov8(self, image: np.ndarray) -> List[BoundingBox2D]:
        """Detect with YOLOv8."""
        results = self.model(
            image,
            conf=self.confidence_threshold,
            iou=self.iou_threshold,
            verbose=False
        )

        boxes = []
        for result in results:
            boxes_result = result.boxes
            if boxes_result is not None:
                for box in boxes_result:
                    # Box coordinates
                    x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()

                    # Confidence and class
                    conf = float(box.conf[0].cpu().numpy())
                    cls_id = int(box.cls[0].cpu().numpy())

                    # Class name
                    class_name = self.COCO_CLASSES.get(cls_id, f'class_{cls_id}')

                    boxes.append(BoundingBox2D(
                        x1=x1, y1=y1, x2=x2, y2=y2,
                        confidence=conf,
                        class_id=cls_id,
                        class_name=class_name
                    ))

        return boxes

    def _detect_with_opencv(self, image: np.ndarray) -> List[BoundingBox2D]:
        """Detect with OpenCV DNN (fallback)."""
        height, width = image.shape[:2]

        # Preprocess
        blob = cv2.dnn.blobFromImage(
            image, 1/255.0, (416, 416),
            swapRB=True, crop=False
        )
        self.net.setInput(blob)

        # Forward pass
        outputs = self.net.forward(self.output_layers)

        # Parse outputs
        boxes = []
        confidences = []
        class_ids = []

        for output in outputs:
            for detection in output:
                scores = detection[5:]
                class_id = np.argmax(scores)
                confidence = scores[class_id]

                if confidence > self.confidence_threshold:
                    # Bounding box
                    center_x = int(detection[0] * width)
                    center_y = int(detection[1] * height)
                    w = int(detection[2] * width)
                    h = int(detection[3] * height)

                    x1 = int(center_x - w / 2)
                    y1 = int(center_y - h / 2)
                    x2 = x1 + w
                    y2 = y1 + h

                    boxes.append([x1, y1, x2, y2])
                    confidences.append(float(confidence))
                    class_ids.append(class_id)

        # NMS (cv2.dnn.NMSBoxes expects boxes as [x, y, w, h])
        nms_boxes = [[b[0], b[1], b[2] - b[0], b[3] - b[1]] for b in boxes]
        indices = cv2.dnn.NMSBoxes(
            nms_boxes, confidences,
            self.confidence_threshold,
            self.iou_threshold
        )

        results = []
        if len(indices) > 0:
            for i in indices.flatten():
                class_name = self.COCO_CLASSES.get(class_ids[i], f'class_{class_ids[i]}')
                results.append(BoundingBox2D(
                    x1=float(boxes[i][0]),
                    y1=float(boxes[i][1]),
                    x2=float(boxes[i][2]),
                    y2=float(boxes[i][3]),
                    confidence=confidences[i],
                    class_id=class_ids[i],
                    class_name=class_name
                ))

        return results

    def visualize_detections(
        self,
        image: np.ndarray,
        detections: DetectionResult
    ) -> np.ndarray:
        """
        Visualize detection results.

        Args:
            image: original image
            detections: detection results

        Returns:
            annotated image
        """
        vis_image = image.copy()

        for box in detections.boxes_2d:
            # Draw the bounding box
            color = self._get_class_color(box.class_name)
            cv2.rectangle(
                vis_image,
                (int(box.x1), int(box.y1)),
                (int(box.x2), int(box.y2)),
                color, 2
            )

            # Draw the label
            label = f"{box.class_name}: {box.confidence:.2f}"
            label_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 2)[0]

            cv2.rectangle(
                vis_image,
                (int(box.x1), int(box.y1) - label_size[1] - 10),
                (int(box.x1) + label_size[0], int(box.y1)),
                color, -1
            )

            cv2.putText(
                vis_image, label,
                (int(box.x1), int(box.y1) - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 2
            )

        # FPS overlay
        fps = 1.0 / detections.inference_time
        cv2.putText(
            vis_image, f"FPS: {fps:.1f}",
            (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2
        )

        return vis_image

    def _get_class_color(self, class_name: str) -> Tuple[int, int, int]:
        """Color for each class (BGR)."""
        colors = {
            'person': (255, 0, 0),       # blue
            'bicycle': (0, 255, 0),      # green
            'car': (0, 0, 255),          # red
            'motorcycle': (255, 255, 0), # cyan
            'bus': (255, 0, 255),        # magenta
            'truck': (0, 255, 255),      # yellow
        }
        return colors.get(class_name, (128, 128, 128))


class KeypointDetector:
    """
    Keypoint detector.

    Detects the target's bottom points for 3D reconstruction.
    Keypoint types:
    - bottom_center: bottom center point (most important)
    - bottom_left: bottom-left corner
    - bottom_right: bottom-right corner
    """

    def __init__(self):
        """Initialize the keypoint detector."""
        # Keypoint model (simplified contour-based method)
        pass

    def detect_bottom_center(
        self,
        image: np.ndarray,
        bbox_2d: BoundingBox2D
    ) -> Tuple[float, float]:
        """
        Detect the target's bottom center point.

        Method: midpoint of the 2D box's bottom edge.

        Args:
            image: input image
            bbox_2d: 2D bounding box

        Returns:
            (x, y): bottom center in pixel coordinates
        """
        # Simple method: midpoint of the bottom edge
        center_x = (bbox_2d.x1 + bbox_2d.x2) / 2
        bottom_y = bbox_2d.y2

        return center_x, bottom_y

    def detect_bottom_corners(
        self,
        image: np.ndarray,
        bbox_2d: BoundingBox2D
    ) -> Tuple[Tuple[float, float], Tuple[float, float]]:
        """
        Detect the two bottom corner points.

        Args:
            image: input image
            bbox_2d: 2D bounding box

        Returns:
            ((left_x, left_y), (right_x, right_y)): bottom-left and bottom-right corners
        """
        bottom_y = bbox_2d.y2

        left_corner = (bbox_2d.x1, bottom_y)
        right_corner = (bbox_2d.x2, bottom_y)

        return left_corner, right_corner

    def refine_bottom_point(
        self,
        image: np.ndarray,
        bbox_2d: BoundingBox2D,
        method: str = 'gradient'
    ) -> Tuple[float, float]:
        """
        Refine the bottom center point.

        Uses edge gradients to find the true ground-contact point.

        Args:
            image: input image
            bbox_2d: 2D bounding box
            method: refinement method

        Returns:
            (x, y): refined bottom center
        """
        # Extract the ROI
        x1, y1, x2, y2 = int(bbox_2d.x1), int(bbox_2d.y1), int(bbox_2d.x2), int(bbox_2d.y2)
        roi = image[y1:y2, x1:x2]

        if roi.size == 0:
            return self.detect_bottom_center(image, bbox_2d)

        # Convert to grayscale
        if len(roi.shape) == 3:
            gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
        else:
            gray = roi

        # Edge detection to find the bottom
        edges = cv2.Canny(gray, 50, 150)

        # Bottom edge row
        bottom_row = edges[-1, :]
        non_zero = np.where(bottom_row > 0)[0]

        if len(non_zero) > 0:
            # Midpoint of the bottom edge
            local_x = (non_zero[0] + non_zero[-1]) / 2
            global_x = x1 + local_x
            global_y = y2
            return global_x, global_y

        # Fall back to the simple method
        return self.detect_bottom_center(image, bbox_2d)


# Test code
if __name__ == "__main__":
    print("=" * 50)
    print("2D detector test")
    print("=" * 50)

    # Create the detector
    detector = YOLOv8Detector(
        model_path="yolov8n.pt",
        confidence_threshold=0.5
    )

    # Read the test image
    test_image_path = "test_image.jpg"
    image = cv2.imread(test_image_path)

    if image is None:
        print(f"❌ Cannot read image: {test_image_path}")
        print("📝 Creating a synthetic test image...")

        # Synthetic image
        image = np.random.randint(0, 255, (720, 1280, 3), dtype=np.uint8)

        # Add a few mock vehicles
        cv2.rectangle(image, (200, 400), (400, 500), (0, 0, 255), -1)
        cv2.rectangle(image, (800, 380), (1100, 520), (0, 255, 0), -1)
        cv2.rectangle(image, (500, 350), (700, 480), (255, 0, 0), -1)

    print(f"📷 Image shape: {image.shape}")

    # Run detection
    results = detector.detect(image)

    print("\n🎯 Detection results:")
    print(f"  - Inference time: {results.inference_time*1000:.2f}ms")
    print(f"  - {len(results.boxes_2d)} targets detected")

    for i, box in enumerate(results.boxes_2d):
        print(f"  [{i+1}] {box.class_name}: "
              f"conf={box.confidence:.2f}, "
              f"box=[{box.x1:.0f},{box.y1:.0f},{box.x2:.0f},{box.y2:.0f}]")

    # Visualize
    vis_image = detector.visualize_detections(image, results)
    cv2.imshow("Detection Results", vis_image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    print("\n✅ Test complete!")
```

  3. core/geometry.py - Geometric Constraint Solver

```python
"""
Geometric constraint solver.
Uses the camera model and a ground-plane assumption to solve for 3D
position. This is the core math module of monocular 3D perception.
"""

import math
from dataclasses import dataclass
from typing import Tuple, Optional, List

import numpy as np


@dataclass
class CameraIntrinsics:
    """Camera intrinsics."""
    fx: float  # focal length x
    fy: float  # focal length y
    cx: float  # principal point x
    cy: float  # principal point y

    @classmethod
    def from_matrix(cls, K: np.ndarray) -> 'CameraIntrinsics':
        """Create an instance from an intrinsic matrix."""
        return cls(
            fx=K[0, 0],
            fy=K[1, 1],
            cx=K[0, 2],
            cy=K[1, 2]
        )


@dataclass
class CameraExtrinsics:
    """Camera extrinsics (relative to the ground)."""
    height: float      # camera height above ground
    pitch: float       # pitch angle (positive looking down)
    yaw: float         # yaw angle
    roll: float = 0.0  # roll angle


@dataclass
class Point3D:
    """3D point."""
    x: float
    y: float
    z: float

    def to_array(self) -> np.ndarray:
        return np.array([self.x, self.y, self.z])


@dataclass
class BoundingBox3D:
    """3D bounding box."""
    center: Point3D                         # 3D center point
    dimensions: Tuple[float, float, float]  # length, width, height
    orientation: float                      # heading angle (about the Y axis)
    confidence: float                       # confidence
    class_name: str                         # class name


class GeometricSolver:
    """
    Geometric constraint solver.

    Core principles:
    ================

    1. Camera projection model:

       world point P_w ──→ camera frame P_c ──→ pixel p_img

       Projection: u = fx * X/Z + cx
                   v = fy * Y/Z + cy

    2. Ground-plane assumption:
       All targets stand on the ground, Z = 0.

    3. Monocular depth estimation:
       Uses the ratio between the object's known real size and its
       2D box size.

    4. Key derivation:
       Let the object's real height be H, its 2D box height h,
       its distance D, and the focal length f.

       Similar triangles: H / D = h / f
       Therefore: D = H * f / h
    """

    def __init__(
        self,
        intrinsics: CameraIntrinsics,
        extrinsics: CameraExtrinsics,
        image_size: Tuple[int, int]
    ):
        """
        Initialize the geometric solver.

        Args:
            intrinsics: camera intrinsics
            extrinsics: camera extrinsics
            image_size: image size (width, height)
        """
        self.K = self._build_intrinsic_matrix(intrinsics)
        self.intrinsics = intrinsics
        self.extrinsics = extrinsics
        self.image_width, self.image_height = image_size

        # Build the camera-to-world transform
        self.T_cam_to_world = self._build_extrinsic_matrix(extrinsics)
```
Solving real problems with AI — if you find this tool useful, follow 长安牧笛!