Day 3: Shared Memory and the Zero-Copy Mechanism


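Learning objectives

  • Understand the architecture of the shared-memory transport
  • Understand how zero-copy is implemented
  • Be able to configure and debug the SHM transport
  • Learn industrial-grade low-latency optimization techniques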

1. Architecture Design

1.1 Traditional Copying vs Zero-Copy

graph TB
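    subgraph "Traditional transfer (4 copies)"
        A1[Application buffer] -->|copy| K1[Kernel buffer]
        K1 -->|copy| N1[NIC / network]
        N1 -->|copy| K2[Peer kernel buffer]
        K2 -->|copy| A2[Peer application buffer]
    end
    
    subgraph "Shared-memory zero-copy (0 copies)"
        B1[Publisher application] -->|pointer passing| SHM[(Shared memory segment<br/>/dev/shm/)]
        SHM -->|pointer passing| B2[Subscriber application]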
        B1 -.->|mmap| SHM
        B2 -.->|mmap| SHM
    end
    
    style SHM fill:#c8e6c9

1.2 Fast-DDS SHM Architecture

flowchart TB
    subgraph "Publisher process"
        PUB[PublisherApp]
        DW[DataWriter]
        SHM_W[SharedMemTransport<br/>write side]
        SEG_W[(Shared memory<br/>Segment)]
    end
    
    subgraph "Operating system kernel"
        MMAP[mmap/munmap<br/>memory mapping]
        EVENT[eventfd<br/>event notification]
        SEM[Semaphores/mutexes<br/>synchronization]
    end
    
    subgraph "Subscriber process"
        SHM_R[SharedMemTransport<br/>read side]
        SEG_R[(Shared memory<br/>Segment)]
        DR[DataReader]
        SUB[SubscriberApp]
    end
    
    PUB --> DW --> SHM_W --> SEG_W
    SEG_W -.->|maps| MMAP
    SHM_W -.->|notifies| EVENT
    SHM_R -.->|listens| EVENT
    MMAP -.->|maps| SEG_R
    SEG_R --> SHM_R --> DR --> SUB
    
    style SEG_W fill:#e3f2fd
    style SEG_R fill:#e3f2fd
    style EVENT fill:#fff3e0

2. Core Mechanisms in Detail

2.1 Shared Memory Segment Management

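// Simplified SharedMemSegment sketch (POSIX; error handling kept minimal)
#include <fcntl.h>     // O_CREAT, O_RDWR
#include <sys/mman.h>  // shm_open, mmap, MAP_FAILED
#include <unistd.h>    // ftruncate, close
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <string>

class SharedMemSegment {
    int fd_ = -1;                      // file descriptor of the object under /dev/shm/
    void* base_address_ = MAP_FAILED;  // base address returned by mmap
    size_t segment_size_ = 0;          // segment size in bytes (e.g. 1 MB or 8 MB)

    // Memory layout at the start of the segment
    struct SegmentHeader {
        uint64_t magic_number;            // magic number, validates the segment
        uint32_t version;                 // layout version
        std::atomic<uint32_t> ref_count;  // reference count
    };

    // Placeholder allocator: hands out space after the header and never frees it
    size_t next_offset_ = sizeof(SegmentHeader);
    size_t allocate_offset(size_t size) {
        size_t offset = next_offset_;
        next_offset_ += size;
        return offset;
    }
    void* offset_to_address(size_t offset) {
        return static_cast<char*>(base_address_) + offset;
    }

public:
    // Create or open the shared memory segment and map it
    bool open(const std::string& name, size_t size) {
        fd_ = shm_open(name.c_str(), O_CREAT | O_RDWR, 0666);
        if (fd_ < 0) return false;
        if (ftruncate(fd_, static_cast<off_t>(size)) != 0) {
            close(fd_);
            fd_ = -1;
            return false;
        }
        base_address_ = mmap(nullptr, size,
                            PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd_, 0);
        if (base_address_ == MAP_FAILED) return false;
        segment_size_ = size;
        return true;
    }

    // Zero-copy write: return an address inside the mapping, or nullptr when full
    void* allocate_buffer(size_t size) {
        if (next_offset_ > segment_size_) return nullptr;
        if (size > segment_size_ - next_offset_) return nullptr;
        return offset_to_address(allocate_offset(size));
    }
};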
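
The segment code above hands out addresses, but an address is only meaningful inside the process that created the mapping. What can be shared is the offset from the start of the segment. The following is a minimal sketch of that idea, assuming a Linux/POSIX system: it maps one shared-memory object twice in the same process, with the two mappings standing in for the Publisher and the Subscriber. The object name `/shm_offset_demo` and the helper `roundtrip_via_offset` are hypothetical and are not part of Fast DDS.

```cpp
#include <fcntl.h>      // O_CREAT, O_EXCL, O_RDWR
#include <sys/mman.h>   // shm_open, shm_unlink, mmap, munmap
#include <unistd.h>     // ftruncate, close
#include <cstddef>
#include <cstring>
#include <string>

// Writes `text` through one mapping and reads it back through a second,
// independent mapping of the same object. Only the offset is shared.
// Returns an empty string on any failure.
std::string roundtrip_via_offset(const std::string& text) {
    const char* name = "/shm_offset_demo";  // hypothetical object name
    const std::size_t size = 4096;
    const std::size_t offset = 128;         // payload position inside the segment
    if (text.size() + 1 > size - offset) return {};

    shm_unlink(name);  // drop a stale object left by an earlier run, if any
    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0) return {};
    if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
        close(fd);
        shm_unlink(name);
        return {};
    }

    void* writer = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void* reader = mmap(nullptr, size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);         // existing mappings stay valid after close
    shm_unlink(name);  // the name is no longer needed once both mappings exist

    std::string result;
    if (writer != MAP_FAILED && reader != MAP_FAILED) {
        // Writer side: base address of the first mapping plus the offset.
        std::memcpy(static_cast<char*>(writer) + offset, text.c_str(), text.size() + 1);
        // Reader side: a different base address, the same offset, the same bytes.
        result = static_cast<const char*>(reader) + offset;
    }
    if (writer != MAP_FAILED) munmap(writer, size);
    if (reader != MAP_FAILED) munmap(reader, size);
    return result;
}
```

Calling `roundtrip_via_offset("hello")` is expected to return the same text even though the two mappings normally sit at different virtual addresses. On older glibc versions the program may need to be linked with `-lrt`.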

2.2 Zero-Copy Data Flow

sequenceDiagram
    participant P as Publisher
    participant SHM as SharedMemManager
    participant SEG as /dev/shm/fastdds_xxx
    participant EVT as eventfd
    participant S as Subscriber
    
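    Note over P,S: Initialization phase
    P->>SHM: create_segment("fastdds_pub")
    SHM->>SEG: shm_open + mmap
    S->>SHM: open_segment("fastdds_pub")
    SHM->>SEG: shm_open + mmap (read-only or read-write)
    
    Note over P,S: Data transfer phase (zero-copy)
    P->>SHM: get_buffer(size)
    SHM-->>P: returns a pointer into shared memory
    P->>P: writes directly into shared memory (serialization)
    P->>SHM: notify(data_offset, size)
    SHM->>EVT: write(eventfd, 1)
    
    EVT->>S: read(eventfd) wakes the subscriber
    S->>SHM: get_data(offset)
    SHM-->>S: returns a pointer (same physical memory)
    S->>S: reads directly (deserialization)
    S->>SHM: release_buffer()
    
    Note over P,S: No payload copy through the kernel, only the eventfd wakeup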
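
The notification step in the flow above can be tried in isolation. The sketch below is a Linux-only illustration of how an eventfd counter behaves, not a copy of any Fast DDS code: every `write` adds to a kernel-side counter and a single `read` returns the accumulated value and resets it. The function name `signal_and_drain` is hypothetical.

```cpp
#include <sys/eventfd.h>  // eventfd (Linux-specific)
#include <unistd.h>       // read, write, close
#include <cstdint>

// Signals an eventfd `signals` times, then drains it with a single read.
// Returns the counter value delivered by that read, or -1 on failure.
long long signal_and_drain(int signals) {
    if (signals < 1) return -1;  // a read on a zero counter would block

    int efd = eventfd(0, 0);     // counter starts at 0, blocking mode
    if (efd < 0) return -1;

    const std::uint64_t one = 1;
    for (int i = 0; i < signals; ++i) {
        // Each 8-byte write adds its value to the kernel-side counter.
        if (write(efd, &one, sizeof(one)) != static_cast<ssize_t>(sizeof(one))) {
            close(efd);
            return -1;
        }
    }

    std::uint64_t pending = 0;
    // One read returns the accumulated counter and resets it to 0.
    ssize_t n = read(efd, &pending, sizeof(pending));
    close(efd);
    if (n != static_cast<ssize_t>(sizeof(pending))) return -1;
    return static_cast<long long>(pending);
}
```

The payload never travels through the descriptor: only a count does. That is why several notifications can collapse into one wakeup, and why the offset and size of each sample have to be carried separately, for example in a queue inside the shared segment.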

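2.3 Comparison with UDP/TCP

| Feature | UDP | TCP | SHM (zero-copy) |
| --- | --- | --- | --- |
| Copies | 2-4 | 2-4 | 0 |
| Latency | 10-100 μs | 1-10 ms | 0.5-2 μs |
| Throughput | | | Very high |
| Cross-node | Supported | Supported | Not supported (same host only) |
| Reliability | Unreliable | Reliable | Reliable (backed by shared memory) |
| CPU usage | Medium (kernel network stack) | High (kernel network stack) | Low (user space) |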

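3. Code Walkthrough

3.1 Key Files

| File | Responsibility |
| --- | --- |
| SharedMemTransport.cpp | SharedMemTransport: transport-layer implementation |
| SharedMemManager.cpp | SharedMemManager: shared memory management |
| SharedMemSegment.cpp | SharedMemSegment: memory segment operations |
| SharedMemChannel.cpp | SharedMemChannel: inter-process communication channel |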

3.2 Enabling SHM Through Configuration

<!-- Option 1: XML configuration -->
<transport_descriptors>
    <transport_descriptor>
        <transport_id>shm_transport</transport_id>
        <type>SHM</type>
        <segment_size>8388608</segment_size>  <!-- 8MB -->
        <port_queue_capacity>1024</port_queue_capacity>
        <healthy_check_timeout_ms>1000</healthy_check_timeout_ms>
    </transport_descriptor>
</transport_descriptors>

<participant profile_name="shm_participant">
    <rtps>
        <userTransports>
            <transport_id>shm_transport</transport_id>
        </userTransports>
        <useBuiltinTransports>false</useBuiltinTransports>
    </rtps>
</participant>

// Option 2: configuration in code
DomainParticipantQos pqos;

// Create the shared-memory transport
auto shm_transport = std::make_shared<SharedMemTransportDescriptor>();
shm_transport->segment_size(8 * 1024 * 1024);  // 8 MB
shm_transport->port_queue_capacity(1024);

// Attach it to the participant
pqos.transport().user_transports.push_back(shm_transport);
pqos.transport().use_builtin_transports = false;

auto participant = DomainParticipantFactory::get_instance()->
    create_participant(0, pqos);
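
The 8 MB value passed to `segment_size` has to come from somewhere. One possible starting point, sketched below, is to multiply the largest sample size by the number of samples that may be in flight and round the result up to a 2 MiB boundary. The helper names and the sizing rule are assumptions made for illustration. They are not a Fast DDS recommendation, and they ignore any per-buffer bookkeeping the transport adds.

```cpp
#include <cstdint>

// 2 MiB, a common huge-page size on x86-64 (an assumption about the target machine).
constexpr std::uint64_t kHugePageBytes = 2ull * 1024 * 1024;

// Rounds `bytes` up to the next multiple of `alignment` (alignment must be > 0).
constexpr std::uint64_t round_up(std::uint64_t bytes, std::uint64_t alignment) {
    return (bytes + alignment - 1) / alignment * alignment;
}

// Suggests a segment size that can hold `in_flight` samples of `sample_bytes`
// each, rounded up to a whole number of 2 MiB pages.
constexpr std::uint64_t suggest_segment_size(std::uint64_t sample_bytes,
                                             std::uint64_t in_flight) {
    return round_up(sample_bytes * in_flight, kHugePageBytes);
}
```

For example, 100 samples of 64 KiB need 6,553,600 bytes, which rounds up to 8,388,608 bytes (8 MiB), the value used in the configuration above.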

3.3 Key Zero-Copy Code Path

// SharedMemTransport::send (simplified)
bool SharedMemTransport::send(
    const fastrtps::rtps::Locator_t& locator,
    const fastrtps::rtps::SerializedPayload_t& payload,
    const std::vector<GuidPrefix_t>& remote_participants)
{
    // 1. Allocate a buffer inside the shared memory segment (the key to zero-copy)
    SharedMemBuffer* buffer = segment_->alloc_buffer(payload.length);
    if (buffer == nullptr) {
        return false;  // no room left in the segment
    }
    
    // 2. Copy the payload into shared memory. This is the only copy on the path:
    //    nothing is copied again when the data crosses the process boundary.
    memcpy(buffer->data(), payload.data, payload.length);
    
    // 3. Notify the peers (eventfd, carries no payload data)
    for (auto& listener : port_listeners_) {
        listener->notify(buffer->offset(), payload.length);
    }
    
    return true;
}

// SharedMemTransport::receive (simplified)
bool SharedMemTransport::receive(
    fastrtps::rtps::Locator_t& locator,
    fastrtps::rtps::SerializedPayload_t& payload,
    std::chrono::milliseconds timeout)
{
    // 1. Wait for a notification (eventfd/epoll)
    if (!channel_->wait_notification(timeout)) {
        return false;
    }
    
    // 2. Fetch the offset of the data inside shared memory (zero-copy)
    SharedMemBuffer::Offset offset;
    channel_->pop_notification(offset);
    
    // 3. Resolve the offset to an address in this process's mapping (no memcpy)
    SharedMemBuffer* buffer = segment_->get_buffer(offset);
    
    // 4. Point the payload at shared memory (the key step)
    payload.data = buffer->data();  // plain pointer assignment, no copy
    payload.length = buffer->size();
    
    return true;
}
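
The send and receive paths above can be modeled inside one process to see where the single copy happens. In the sketch below a `std::vector` stands in for the mapped segment. `ToySegment` and this `BufferDescriptor` struct are hypothetical types written for the example, not the Fast DDS classes of similar name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// What travels between the two sides: a position and a length, never the bytes.
struct BufferDescriptor {
    std::size_t offset;
    std::size_t length;
};

// Stand-in for a mapped segment. A real one would come from mmap.
class ToySegment {
public:
    explicit ToySegment(std::size_t size) : storage_(size), used_(0) {}

    // Send side: the single copy, from the caller's payload into the segment.
    // Returns false when the segment has no room left.
    bool send(const void* payload, std::size_t length, BufferDescriptor& out) {
        if (length > storage_.size() - used_) return false;
        std::memcpy(storage_.data() + used_, payload, length);
        out = BufferDescriptor{used_, length};
        used_ += length;
        return true;
    }

    // Receive side: resolve the descriptor to a pointer inside the segment.
    // No bytes are copied. Returns nullptr for an out-of-range descriptor.
    const std::uint8_t* receive(const BufferDescriptor& d) const {
        if (d.offset > storage_.size()) return nullptr;
        if (d.length > storage_.size() - d.offset) return nullptr;
        return storage_.data() + d.offset;
    }

    const std::uint8_t* base() const { return storage_.data(); }

private:
    std::vector<std::uint8_t> storage_;
    std::size_t used_;  // bytes handed out so far (never reclaimed in this sketch)
};
```

The pointer returned by `receive` always lies inside the segment's own storage, so the receiver reads the bytes the sender wrote rather than a private copy of them. A real transport also has to reclaim buffers once every reader has released them, which this sketch leaves out.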

4. Debugging SHM in VSCode

4.1 Inspecting the Shared Memory Segment

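# Set a breakpoint on SharedMemSegment::open
break SharedMemSegment::open

# Once the breakpoint is hit, inspect the arguments and members
# name: the shared memory segment name, e.g. "/fastdds_shm_xxx"
print name
# size: the requested segment size
print size
# fd_: the file descriptor (meaningful only after stepping past shm_open)
print fd_

# List the shared memory objects the system actually created
shell ls -lh /dev/shm/ | grep fastdds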

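4.3 Performance Comparison Debugging

# Run the test once over UDP and once over SHM and compare the latency.
# Timestamps are taken on entry to and exit from DataWriterImpl::write.
# Stopping at a breakpoint costs far more than a few microseconds, so the
# numbers are only useful as a rough relative comparison.
# Requires a GDB built with Python 3.7+ support.

python import time

break DataWriterImpl::write
commands
    silent
    python t_start = time.perf_counter_ns()
    continue
end

# write_return is a placeholder for a breakpoint location at the function's
# return (for example a file:line just before the return statement), or use finish.
break DataWriterImpl::write_return
commands
    silent
    python print("Latency: %d ns" % (time.perf_counter_ns() - t_start))
    continue
end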
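
Because every breakpoint stop adds debugger overhead, timing inside the process usually gives more trustworthy numbers than the GDB approach above. The helper below is a small sketch using `std::chrono::steady_clock`. The name `time_ns` is hypothetical and nothing in it is specific to Fast DDS.

```cpp
#include <chrono>
#include <cstdint>
#include <utility>

// Runs `fn` once and returns the elapsed time in nanoseconds.
// steady_clock is monotonic, so the result is never negative.
template <typename Fn>
std::int64_t time_ns(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

A call such as `time_ns([&] { writer->write(&sample); })` would time one write, where `writer` and `sample` stand for whatever DataWriter and sample object the test application already has. Repeating the measurement many times and comparing medians between the UDP and SHM configurations gives a fairer picture than a single run.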

5. Industrial Applications and Optimization

5.1 When to Use It

graph TB
    A[Choose a transport]
    
    A --> B{Same host?}
    B -->|Yes| C{Latency requirement?}
    B -->|No| D[UDP/TCP]
    
    C -->|< 10μs| E[SHM zero-copy]
    C -->|> 100μs| F[UDP is sufficient]
    
    E --> G{Payload size?}
    G -->|< 64KB| H[Plain SHM]
    G -->|> 64KB| I[SHM with fragmentation/batching]
    
    style E fill:#c8e6c9
    style H fill:#c8e6c9

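5.2 Optimization Techniques

| Optimization | Configuration | Effect |
| --- | --- | --- |
| Huge pages | Align segment_size to 2 MB | Fewer TLB misses |
| CPU affinity | Pin to the same NUMA node | Avoids cross-node memory access |
| Lock-free queue | Replace std::mutex | Lower contention latency |
| Pre-allocation | Increase port_queue_capacity | Avoids allocation at runtime |
| Batching | Accumulate several samples and send them together | Amortizes notification overhead |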
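
To illustrate the lock-free queue item in the table above, here is a minimal single-producer/single-consumer ring of buffer offsets built on `std::atomic`. It is a generic textbook sketch under the assumption of exactly one writer thread and one reader thread. It is not the ring buffer Fast DDS uses, and the class name `SpscOffsetQueue` is hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Single-producer / single-consumer ring of buffer offsets.
// Capacity must be a power of two so that index wrapping is a cheap mask.
template <std::size_t Capacity>
class SpscOffsetQueue {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer thread only. Returns false when the ring is full.
    bool try_push(std::uint64_t offset) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        slots_[head & (Capacity - 1)] = offset;
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer thread only. Returns false when the ring is empty.
    bool try_pop(std::uint64_t& offset) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;
        offset = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // hand the slot back
        return true;
    }

private:
    std::array<std::uint64_t, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // index of the next slot to write
    std::atomic<std::size_t> tail_{0};  // index of the next slot to read
};
```

Neither side ever takes a lock: the producer only writes `head_`, the consumer only writes `tail_`, and the release/acquire pairs make a slot's contents visible before its index is. Placing such a queue in a shared segment would additionally require the atomics to be lock-free on the target platform.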

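5.3 Troubleshooting

# 1. Check whether the shared memory segments exist
ls -lh /dev/shm/ | grep fastdds

# 2. Check permissions
ipcs -m  # lists System V shared memory (only relevant if it is used)

# 3. Check the process mappings
cat /proc/<pid>/maps | grep /dev/shm

# 4. Monitor shared memory usage live
watch -n 1 'ls -lh /dev/shm/ && df -h /dev/shm'

# 5. Clean up leftover segments after a process crash
#    (first make sure no Fast DDS process is still running)
rm /dev/shm/fastdds_*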
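
The first troubleshooting step can also be run from inside a test program, for instance to assert that no segments are left behind after shutdown. The sketch below uses C++17 `std::filesystem`. The function name `list_with_prefix` is hypothetical, and the `fastdds_` prefix is taken from the commands above and may differ between Fast DDS versions.

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Returns the names of entries in `dir` that start with `prefix`, sorted.
// A missing or unreadable directory yields an empty list instead of throwing.
std::vector<std::string> list_with_prefix(const std::string& dir,
                                          const std::string& prefix) {
    namespace fs = std::filesystem;
    std::vector<std::string> names;
    std::error_code ec;
    fs::directory_iterator it(dir, ec);
    if (ec) return names;
    for (const fs::directory_iterator end; it != end; it.increment(ec)) {
        if (ec) break;
        const std::string name = it->path().filename().string();
        // compare() on the first prefix.size() characters is a starts-with test.
        if (name.compare(0, prefix.size(), prefix) == 0) names.push_back(name);
    }
    std::sort(names.begin(), names.end());
    return names;
}
```

A call such as `list_with_prefix("/dev/shm", "fastdds_")` returns whatever matching entries currently exist, so a test can check that the list is empty once all participants have been deleted.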

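6. Day 3 Self-Check Checklist

  • Can explain the difference between zero-copy and traditional transfer (0 vs 4 copies)
  • Can locate where mmap and eventfd are used in the code
  • Can enable the SHM transport through XML configuration
  • Can observe in GDB that the Publisher and Subscriber pointers refer to the same physical memory
  • Can calculate the latency improvement of SHM over UDP (typically 10-50x)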

7. Code Flow

sequenceDiagram
    participant App as Fast-DDS Application
    participant SMTrans as SharedMemTransport
    participant SMManager as SharedMemManager
    participant SMPort as SharedMemManager::Port
    participant SMGlobal as SharedMemGlobal::Port
    participant SMChan as SharedMemChannelResource
    participant SMListener as SharedMemManager::Listener
    participant Receiver as TransportReceiverInterface

    App ->> SMTrans: init()
    SMTrans ->> SMManager: SharedMemManager::create("fastdds")
    SMTrans ->> SMManager: create_segment(segment_size,max_alloc)
    SMTrans ->> SMManager: open_port(port, queue_size, timeout, Mode::Write)

    App ->> SMTrans: OpenInputChannel(locator, receiver)
    SMTrans ->> SMManager: open_port(port, ..., Mode::ReadShared/ReadExclusive)
    SMManager ->> SMPort: create_listener()
    SMTrans ->> SMChan: new SharedMemChannelResource(listener, ...)

    App ->> SMTrans: send(buffers, locators)
    SMTrans ->> SMTrans: copy_to_shared_buffer(buffers)
    SMTrans ->> SMManager: find_port(locator.port)
    SMManager ->> SMPort: try_push(shared_buffer)
    SMPort ->> SMGlobal: push to ring buffer / notify

    SMChan ->> SMListener: pop() (blocking)
    SMListener ->> SMGlobal: wait_pop()
    SMGlobal ->> SMListener: BufferDescriptor
    SMListener ->> SMListener: construct SharedMemManager::SharedMemBuffer
    SMChan ->> Receiver: OnDataReceived(data,size,input,remote)
    SMChan ->> SMListener: stop_processing_buffer()