[MCU Series] Part 1 -- OWT-Server Environment Setup, Debugging, and How VideoMix Works


Introduction:

OWT is short for **Open WebRTC Toolkit**, Intel's open-source **SFU + MCU** video-conferencing system. Quoting the README of the official Git repository:

The media server for OWT provides an efficient video conference and streaming service that is based on WebRTC. It scales a single WebRTC stream out to many endpoints. At the same time, it enables media analytics capabilities for media streams. It features:

  • Distributed, scalable, and reliable SFU + MCU server
  • High performance VP8, VP9, H.264, and HEVC real-time transcoding on Intel® Core™ and Intel® Xeon® processors
  • Wide streaming protocols support including WebRTC, RTSP, RTMP, HLS, MPEG-DASH
  • Efficient mixing of HD video streams to save bandwidth and power on mobile devices
  • Intelligent Quality of Service (QoS) control mechanisms that adapt to different network environments
  • Customer defined media analytics plugins to perform analytics on streams from MCU
  • The usage scenarios for real-time media stream analytics including but not limited to movement/object detection

Environment Setup and Running

Please refer to the OWT-Server User Guide, the OWT Server quick start, and 忘篱's walkthrough of OWT Server environment setup, debugging, and analysis.

Common Issues and Notes

  1. The Node.js version must be v8.15.0.

  2. The current master branch uses webrtc-m79. Syncing and building the code takes a long time and requires a stable VPN, and the full source tree occupies about 17 GB, so make sure you have enough disk space. It is recommended to use and debug the 4.3.x branch directly; the commit I used is ec770323d387c6e63c609712481d9d2b0beebd52.

  3. When packing, confirm that the check step reports no errors. The shared libraries that video_agent, audio_agent, and webrtc_agent depend on must be copied into each agent's lib directory; in particular, if ffmpeg was built with fdk-aac enabled, libfdk-aac.so.1 must be copied into the corresponding lib directory by hand. When a required shared library is missing there is no obvious error at runtime; you have to check the corresponding agent's log.

  4. When packing the 4.3.x branch, you need to manually copy sipLib.so from source/agent/sip/sipIn/build/Release/lib.target/ up to the Release directory.

  5. When running init-all.sh in a virtual machine on a 192.x LAN, starting the rabbitmq server fails with:

    Job for rabbitmq-server.service failed because the control process exited with error code. See "systemctl status rabbitmq-server.service" and "journalctl -xe" for details.
    

    Checking the logs, the error is:

    ERROR: epmd error for host 192: badarg (unknown POSIX error)
    

    Fix:

    vim /etc/rabbitmq/rabbitmq-env.conf

    Add the following line to the file and save it: NODENAME=rabbit@localhost

    (Note: if rabbitmq-env.conf does not exist, it will be created when you open it with vim and save.)

Overall Architecture and Code Analysis

Note: this article focuses on the media units first; signaling and the API will be covered later.

For OWT's program structure, in which Node.js calls into C++ code, refer to 忘篱's CodeNodejs.

For analysis of the OWT code, refer to "Intel owt-server VideoMixer design" and 忘篱's CodeVideo.

How VideoMix Works

The YUV data of each decoded source image is resized to the size required by the layout, then pasted (copied) into the corresponding memory region of the target image to produce the composite output. As shown in the figure below [1]: the left side is the target composite image, and the source image on the right must first be sampled and scaled, i.e. source s1 in the middle of the figure is scaled to the target size on the right.

Then comes the paste step, which works as follows:

The Y plane of the scaled image is memcpy'd directly into the corresponding region of the composite image. The U/V planes require more care: U and V are subsampled by 1/2 both horizontally and vertically, so they are copied one chroma row at a time, w/2 bytes (half the scaled width) per copy. In the composite image the U/V data is likewise subsampled and stored contiguously row by row, so each copy transfers w/2 bytes, then advances to the next U/V row of the composite image and copies again, for a total of h/2 rows (half the scaled height).
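To make the paste step concrete, here is a minimal sketch (assuming an already-scaled I420 source and even offsets/sizes; the function name and raw-pointer interface are hypothetical, this is not the OWT implementation):

```cpp
#include <cstdint>
#include <cstring>

// Paste an already-scaled I420 image (src_w x src_h) into a composite I420
// buffer at (dst_x, dst_y). All offsets and sizes are assumed even so the
// half-resolution U/V planes stay aligned. Hypothetical helper for illustration.
void PasteI420(const uint8_t *src_y, int src_stride_y,
               const uint8_t *src_u, int src_stride_u,
               const uint8_t *src_v, int src_stride_v,
               int src_w, int src_h,
               uint8_t *dst_plane_y, int dst_stride_y,
               uint8_t *dst_plane_u, int dst_stride_u,
               uint8_t *dst_plane_v, int dst_stride_v,
               int dst_x, int dst_y)
{
    // Y plane: one memcpy of src_w bytes per row, src_h rows.
    for (int row = 0; row < src_h; ++row) {
        std::memcpy(dst_plane_y + (dst_y + row) * dst_stride_y + dst_x,
                    src_y + row * src_stride_y, src_w);
    }
    // U/V planes: subsampled by 2 horizontally and vertically, so copy
    // src_w/2 bytes per row for src_h/2 rows, starting at (dst_x/2, dst_y/2).
    for (int row = 0; row < src_h / 2; ++row) {
        std::memcpy(dst_plane_u + (dst_y / 2 + row) * dst_stride_u + dst_x / 2,
                    src_u + row * src_stride_u, src_w / 2);
        std::memcpy(dst_plane_v + (dst_y / 2 + row) * dst_stride_v + dst_x / 2,
                    src_v + row * src_stride_v, src_w / 2);
    }
}
```

In the OWT code shown further below, the scale and paste steps are fused: I420Scale writes the scaled pixels directly into the composite buffer at the region offset, so no separate copy pass is needed.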

In OWT, the whole process above is handled by the I420Scale function from Google's open-source libyuv library. The code is as follows:

```cpp
void SoftFrameGenerator::layout_regions(SoftFrameGenerator *t, rtc::scoped_refptr<webrtc::I420Buffer> compositeBuffer, const LayoutSolution &regions)
{
    uint32_t composite_width = compositeBuffer->width();
    uint32_t composite_height = compositeBuffer->height();

    for (LayoutSolution::const_iterator it = regions.begin(); it != regions.end(); ++it) {
        boost::shared_ptr<webrtc::VideoFrame> inputFrame = t->m_owner->getInputFrame(it->input);
        if (inputFrame == NULL) {
            continue;
        }

        rtc::scoped_refptr<webrtc::VideoFrameBuffer> inputBuffer = inputFrame->video_frame_buffer();

        // Convert the region's rational coordinates into pixel coordinates on the composite canvas.
        Region region = it->region;
        uint32_t dst_x      = (uint64_t)composite_width * region.area.rect.left.numerator / region.area.rect.left.denominator;
        uint32_t dst_y      = (uint64_t)composite_height * region.area.rect.top.numerator / region.area.rect.top.denominator;
        uint32_t dst_width  = (uint64_t)composite_width * region.area.rect.width.numerator / region.area.rect.width.denominator;
        uint32_t dst_height = (uint64_t)composite_height * region.area.rect.height.numerator / region.area.rect.height.denominator;

        // Clamp the region to the composite bounds.
        if (dst_x + dst_width > composite_width)
            dst_width = composite_width - dst_x;

        if (dst_y + dst_height > composite_height)
            dst_height = composite_height - dst_y;

        uint32_t cropped_dst_width;
        uint32_t cropped_dst_height;
        uint32_t src_x;
        uint32_t src_y;
        uint32_t src_width;
        uint32_t src_height;
        if (t->m_crop) {
            // Crop the source to the region's aspect ratio (centered) and fill the whole region.
            src_width   = std::min((uint32_t)inputBuffer->width(), dst_width * inputBuffer->height() / dst_height);
            src_height  = std::min((uint32_t)inputBuffer->height(), dst_height * inputBuffer->width() / dst_width);
            src_x       = (inputBuffer->width() - src_width) / 2;
            src_y       = (inputBuffer->height() - src_height) / 2;
            cropped_dst_width   = dst_width;
            cropped_dst_height  = dst_height;
        } else {
            // Keep the whole source and letterbox it inside the region.
            src_width   = inputBuffer->width();
            src_height  = inputBuffer->height();
            src_x       = 0;
            src_y       = 0;
            cropped_dst_width   = std::min(dst_width, inputBuffer->width() * dst_height / inputBuffer->height());
            cropped_dst_height  = std::min(dst_height, inputBuffer->height() * dst_width / inputBuffer->width());
        }

        // Center the scaled image inside the region.
        dst_x += (dst_width - cropped_dst_width) / 2;
        dst_y += (dst_height - cropped_dst_height) / 2;

        // Force even values so the half-resolution U/V planes stay aligned.
        src_x               &= ~1;
        src_y               &= ~1;
        src_width           &= ~1;
        src_height          &= ~1;
        dst_x               &= ~1;
        dst_y               &= ~1;
        cropped_dst_width   &= ~1;
        cropped_dst_height  &= ~1;

        // Scale the cropped source planes directly into the composite buffer at the region offset.
        int ret = libyuv::I420Scale(
                inputBuffer->DataY() + src_y * inputBuffer->StrideY() + src_x, inputBuffer->StrideY(),
                inputBuffer->DataU() + (src_y * inputBuffer->StrideU() + src_x) / 2, inputBuffer->StrideU(),
                inputBuffer->DataV() + (src_y * inputBuffer->StrideV() + src_x) / 2, inputBuffer->StrideV(),
                src_width, src_height,
                compositeBuffer->MutableDataY() + dst_y * compositeBuffer->StrideY() + dst_x, compositeBuffer->StrideY(),
                compositeBuffer->MutableDataU() + (dst_y * compositeBuffer->StrideU() + dst_x) / 2, compositeBuffer->StrideU(),
                compositeBuffer->MutableDataV() + (dst_y * compositeBuffer->StrideV() + dst_x) / 2, compositeBuffer->StrideV(),
                cropped_dst_width, cropped_dst_height,
                libyuv::kFilterBox);
        if (ret != 0)
            ELOG_ERROR("I420Scale failed, ret %d", ret);
    }
}
```

As you can see, **stride** appears many times in the code above. So **what is a stride** [2]?

When a video image is stored in memory, the buffer may contain extra padding bytes after each row of pixels. The padding affects how the image is laid out in memory, but not how it is displayed. The stride is the number of bytes from the start of one row of pixels to the start of the next row in memory; it is also called the pitch. If padding bytes are present, the stride is larger than the image width, as shown in the figure below:

Two buffers containing images of the same dimensions can have different strides, so the stride must always be taken into account when processing video images.

In addition, images can be arranged in memory in two ways. In a **top-down** image, the first pixel of the top row is stored first in memory. In a **bottom-up** image, the last row of pixels is stored first. The figure below shows the difference:

A bottom-up image has a negative stride, because the stride is defined as the number of bytes you must advance in memory to move from one displayed row of pixels to the next, which in a bottom-up image means moving backwards. YUV images are normally top-down, and any image in a Direct3D surface must be top-down, whereas RGB images in system memory are usually bottom-up. Video conversion in particular has to handle buffers with mismatched strides.
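As a small illustration of negative strides (a sketch assuming 8-bit single-channel planes; the helper name is made up), the same row-copy loop can handle both top-down and bottom-up images if you pass the address of the first displayed row and the appropriate stride:

```cpp
#include <cstdint>
#include <cstring>

// Copy a w x h 8-bit plane row by row, tolerating padding (|stride| >= w)
// and bottom-up sources (negative stride). Hypothetical helper for illustration.
void CopyPlaneRows(const uint8_t *src_first_row, int src_stride,
                   uint8_t *dst_first_row, int dst_stride,
                   int w, int h)
{
    for (int row = 0; row < h; ++row) {
        std::memcpy(dst_first_row, src_first_row, w);  // only w bytes are pixels
        src_first_row += src_stride;  // a negative stride walks a bottom-up image
        dst_first_row += dst_stride;
    }
}

// For a bottom-up image whose rows are `pitch` bytes apart, pass the address
// of the last row in memory (the first displayed row) and a negative stride:
//   CopyPlaneRows(buf + (h - 1) * pitch, -pitch, dst, dst_pitch, w, h);
```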

To summarize, for a video-conferencing MCU, the layout coordinates of the mixing server are generally defined as follows:

For each input stream of the mixing service, sensible coordinates **(left, top)** and **(right, bottom)** are configured relative to the top-left corner of the canvas; the offsets used for the pixel-copy operations described above can then be derived from these coordinates.
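As an example (a minimal sketch; the rational representation mirrors the region.area.rect fields in the OWT code above, but the types and function here are hypothetical), the per-stream coordinates can be converted into a pixel rectangle and plane offsets like this:

```cpp
#include <cstdint>

// A coordinate expressed as a fraction of the canvas size, e.g. 1/2 = halfway.
struct Ratio { uint32_t num; uint32_t den; };
struct LayoutRect { Ratio left, top, right, bottom; };
struct PixelRect { uint32_t x, y, w, h; };

PixelRect ToPixels(const LayoutRect &r, uint32_t canvas_w, uint32_t canvas_h)
{
    uint32_t x0 = (uint64_t)canvas_w * r.left.num   / r.left.den;
    uint32_t y0 = (uint64_t)canvas_h * r.top.num    / r.top.den;
    uint32_t x1 = (uint64_t)canvas_w * r.right.num  / r.right.den;
    uint32_t y1 = (uint64_t)canvas_h * r.bottom.num / r.bottom.den;
    // Round down to even values so the half-resolution U/V planes stay aligned.
    x0 &= ~1u; y0 &= ~1u; x1 &= ~1u; y1 &= ~1u;
    return { x0, y0, x1 - x0, y1 - y0 };
}
// The Y-plane write offset is then y * strideY + x, and the U/V offsets are
// (y / 2) * strideU + x / 2, just like the destination pointers passed to
// I420Scale in the code above.
```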

With the coordinates and layout design above as a foundation, we can predefine commonly used layout templates for users. The layouts and images below are all taken from FreeSWITCH (a sketch of how a grid template can be generated follows the list):

  • 1up_top_left+9
  • 2up_bottom+8
  • 2up_middle+8
  • 2up_top+8
  • 3up+4
  • 3up+9
  • 3x3
  • 4x4
  • 5x5
  • 6x6
  • 8x8
  • overlaps
  • picture-in-picture
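To show how such templates can be expressed with the rational coordinates discussed above, here is a sketch that generates a uniform N x N grid such as the 3x3 or 4x4 layouts (the Ratio/LayoutRect types repeat the hypothetical ones from the earlier sketch; this is not FreeSWITCH's or OWT's actual template format):

```cpp
#include <cstdint>
#include <vector>

// Same hypothetical rational types as in the earlier sketch.
struct Ratio { uint32_t num; uint32_t den; };
struct LayoutRect { Ratio left, top, right, bottom; };

// Build an N x N grid; cell (row, col) spans [col/N, (col+1)/N] horizontally
// and [row/N, (row+1)/N] vertically, as fractions of the canvas.
std::vector<LayoutRect> MakeGridLayout(uint32_t n)
{
    std::vector<LayoutRect> regions;
    regions.reserve(n * n);
    for (uint32_t row = 0; row < n; ++row) {
        for (uint32_t col = 0; col < n; ++col) {
            regions.push_back({ { col,     n },     // left
                                { row,     n },     // top
                                { col + 1, n },     // right
                                { row + 1, n } });  // bottom
        }
    }
    return regions;
}
// MakeGridLayout(3) produces the 3x3 template: the first input fills the
// top-left cell, the second the next cell in the row, and so on.
```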

That concludes the introduction to the basic principles of video stream mixing and composition.

Comments and criticism are welcome; feel free to share the topics you are interested in! You are also welcome to follow my personal WeChat official account!

Reference

[1] YUV image composition principles: https://blog.csdn.net/zwz1984/article/details/50403150#comments

[2] Image Stride: https://docs.microsoft.com/en-us/windows/win32/medfound/image-stride
