[MCU Series] Part 1 -- OWT-Server Environment Setup, Debugging, and How VideoMix Works


Introduction:

OWT is short for **Open WebRTC Toolkit**, Intel's open-source **SFU + MCU** video-conferencing system. Quoting the README of the official Git repository:

The media server for OWT provides an efficient video conference and streaming service that is based on WebRTC. It scales a single WebRTC stream out to many endpoints. At the same time, it enables media analytics capabilities for media streams. It features:

  • Distributed, scalable, and reliable SFU + MCU server
  • High performance VP8, VP9, H.264, and HEVC real-time transcoding on Intel® Core™ and Intel® Xeon® processors
  • Wide streaming protocols support including WebRTC, RTSP, RTMP, HLS, MPEG-DASH
  • Efficient mixing of HD video streams to save bandwidth and power on mobile devices
  • Intelligent Quality of Service (QoS) control mechanisms that adapt to different network environments
  • Customer defined media analytics plugins to perform analytics on streams from MCU
  • The usage scenarios for real-time media stream analytics including but not limited to movement/object detection

Environment Setup and Running

Please refer to the OWT-Server User Guide, the OWT Server quick start, and 忘篱's walkthrough of OWT Server environment setup, debugging, and analysis.

Common Issues and Notes

  1. The Node.js version must be v8.15.0.

  2. The current master branch uses webrtc-m79. Syncing and building the code takes a long time and requires a stable VPN, and the full source tree occupies about 17 GB, so make sure you have enough disk space. It is recommended to use and debug the 4.3.x branch directly; the commit I used is ec770323d387c6e63c609712481d9d2b0beebd52.

  3. When packing, confirm that the check step reports no errors. The shared libraries that video_agent, audio_agent, and webrtc_agent depend on must be copied into each agent's lib directory; in particular, if ffmpeg was built with fdk-aac enabled, libfdk-aac.so.1 must be copied into the corresponding lib directory by hand. When a required shared library is missing there is no obvious error at runtime; you have to check the corresponding agent's log.

  4. When packing the 4.3.x branch, you need to manually copy sipLib.so from source/agent/sip/sipIn/build/Release/lib.target/ up to the Release directory.

  5. When running init-all.sh in a virtual machine on a 192.x LAN, starting the rabbitmq server fails with:

    Job for rabbitmq-server.service failed because the control process exited with error code. See "systemctl status rabbitmq-server.service" and "journalctl -xe" for details.
    

    Checking the logs, the error is:

    ERROR: epmd error for host 192: badarg (unknown POSIX error)
    

    Fix:

    vim /etc/rabbitmq/rabbitmq-env.conf

    Add the following line to the file and save it: NODENAME=rabbit@localhost

    (Note: if rabbitmq-env.conf does not exist, it will be created when you open it with vim and save.)

Overall Architecture and Code Analysis

Note: this article focuses on the media units first; signaling and the API will be covered later.

For OWT's program structure, in which Node.js calls into C++ code, refer to 忘篱's CodeNodejs.

For analysis of the OWT code, refer to "Intel owt-server VideoMixer design" and 忘篱's CodeVideo.

How VideoMix Works

The YUV data of each decoded source image is resized to the size required by the layout, then pasted (copied) into the corresponding memory region of the target image to produce the composite output. As shown in the figure below [1]: the left side is the target composite image, and the source image on the right must first be sampled and scaled, i.e. source s1 in the middle of the figure is scaled to the target size on the right.

Then comes the paste step, which works as follows:

The Y plane of the scaled image is memcpy'd directly into the corresponding region of the composite image. The U/V planes require more care: U and V are subsampled by 1/2 both horizontally and vertically, so they are copied one chroma row at a time, w/2 bytes (half the scaled width) per copy. In the composite image the U/V data is likewise subsampled and stored contiguously row by row, so each copy transfers w/2 bytes, then advances to the next U/V row of the composite image and copies again, for a total of h/2 rows (half the scaled height).
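To make the paste step concrete, here is a minimal sketch (assuming an already-scaled I420 source and even offsets/sizes; the function name and raw-pointer interface are hypothetical, this is not the OWT implementation):

```cpp
#include <cstdint>
#include <cstring>

// Paste an already-scaled I420 image (src_w x src_h) into a composite I420
// buffer at (dst_x, dst_y). All offsets and sizes are assumed even so the
// half-resolution U/V planes stay aligned. Hypothetical helper for illustration.
void PasteI420(const uint8_t *src_y, int src_stride_y,
               const uint8_t *src_u, int src_stride_u,
               const uint8_t *src_v, int src_stride_v,
               int src_w, int src_h,
               uint8_t *dst_plane_y, int dst_stride_y,
               uint8_t *dst_plane_u, int dst_stride_u,
               uint8_t *dst_plane_v, int dst_stride_v,
               int dst_x, int dst_y)
{
    // Y plane: one memcpy of src_w bytes per row, src_h rows.
    for (int row = 0; row < src_h; ++row) {
        std::memcpy(dst_plane_y + (dst_y + row) * dst_stride_y + dst_x,
                    src_y + row * src_stride_y, src_w);
    }
    // U/V planes: subsampled by 2 horizontally and vertically, so copy
    // src_w/2 bytes per row for src_h/2 rows, starting at (dst_x/2, dst_y/2).
    for (int row = 0; row < src_h / 2; ++row) {
        std::memcpy(dst_plane_u + (dst_y / 2 + row) * dst_stride_u + dst_x / 2,
                    src_u + row * src_stride_u, src_w / 2);
        std::memcpy(dst_plane_v + (dst_y / 2 + row) * dst_stride_v + dst_x / 2,
                    src_v + row * src_stride_v, src_w / 2);
    }
}
```

In the OWT code shown further below, the scale and paste steps are fused: I420Scale writes the scaled pixels directly into the composite buffer at the region offset, so no separate copy pass is needed.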

In OWT, the whole process above is handled by the I420Scale function from Google's open-source libyuv library. The code is as follows:

```cpp
void SoftFrameGenerator::layout_regions(SoftFrameGenerator *t, rtc::scoped_refptr<webrtc::I420Buffer> compositeBuffer, const LayoutSolution &regions)
{
    uint32_t composite_width = compositeBuffer->width();
    uint32_t composite_height = compositeBuffer->height();

    for (LayoutSolution::const_iterator it = regions.begin(); it != regions.end(); ++it) {
        boost::shared_ptr<webrtc::VideoFrame> inputFrame = t->m_owner->getInputFrame(it->input);
        if (inputFrame == NULL) {
            continue;
        }

        rtc::scoped_refptr<webrtc::VideoFrameBuffer> inputBuffer = inputFrame->video_frame_buffer();

        // Convert the region's rational coordinates into pixel coordinates on the composite canvas.
        Region region = it->region;
        uint32_t dst_x      = (uint64_t)composite_width * region.area.rect.left.numerator / region.area.rect.left.denominator;
        uint32_t dst_y      = (uint64_t)composite_height * region.area.rect.top.numerator / region.area.rect.top.denominator;
        uint32_t dst_width  = (uint64_t)composite_width * region.area.rect.width.numerator / region.area.rect.width.denominator;
        uint32_t dst_height = (uint64_t)composite_height * region.area.rect.height.numerator / region.area.rect.height.denominator;

        // Clamp the region to the composite bounds.
        if (dst_x + dst_width > composite_width)
            dst_width = composite_width - dst_x;

        if (dst_y + dst_height > composite_height)
            dst_height = composite_height - dst_y;

        uint32_t cropped_dst_width;
        uint32_t cropped_dst_height;
        uint32_t src_x;
        uint32_t src_y;
        uint32_t src_width;
        uint32_t src_height;
        if (t->m_crop) {
            // Crop the source to the region's aspect ratio (centered) and fill the whole region.
            src_width   = std::min((uint32_t)inputBuffer->width(), dst_width * inputBuffer->height() / dst_height);
            src_height  = std::min((uint32_t)inputBuffer->height(), dst_height * inputBuffer->width() / dst_width);
            src_x       = (inputBuffer->width() - src_width) / 2;
            src_y       = (inputBuffer->height() - src_height) / 2;
            cropped_dst_width   = dst_width;
            cropped_dst_height  = dst_height;
        } else {
            // Keep the whole source and letterbox it inside the region.
            src_width   = inputBuffer->width();
            src_height  = inputBuffer->height();
            src_x       = 0;
            src_y       = 0;
            cropped_dst_width   = std::min(dst_width, inputBuffer->width() * dst_height / inputBuffer->height());
            cropped_dst_height  = std::min(dst_height, inputBuffer->height() * dst_width / inputBuffer->width());
        }

        // Center the scaled image inside the region.
        dst_x += (dst_width - cropped_dst_width) / 2;
        dst_y += (dst_height - cropped_dst_height) / 2;

        // Force even values so the half-resolution U/V planes stay aligned.
        src_x               &= ~1;
        src_y               &= ~1;
        src_width           &= ~1;
        src_height          &= ~1;
        dst_x               &= ~1;
        dst_y               &= ~1;
        cropped_dst_width   &= ~1;
        cropped_dst_height  &= ~1;

        // Scale the cropped source planes directly into the composite buffer at the region offset.
        int ret = libyuv::I420Scale(
                inputBuffer->DataY() + src_y * inputBuffer->StrideY() + src_x, inputBuffer->StrideY(),
                inputBuffer->DataU() + (src_y * inputBuffer->StrideU() + src_x) / 2, inputBuffer->StrideU(),
                inputBuffer->DataV() + (src_y * inputBuffer->StrideV() + src_x) / 2, inputBuffer->StrideV(),
                src_width, src_height,
                compositeBuffer->MutableDataY() + dst_y * compositeBuffer->StrideY() + dst_x, compositeBuffer->StrideY(),
                compositeBuffer->MutableDataU() + (dst_y * compositeBuffer->StrideU() + dst_x) / 2, compositeBuffer->StrideU(),
                compositeBuffer->MutableDataV() + (dst_y * compositeBuffer->StrideV() + dst_x) / 2, compositeBuffer->StrideV(),
                cropped_dst_width, cropped_dst_height,
                libyuv::kFilterBox);
        if (ret != 0)
            ELOG_ERROR("I420Scale failed, ret %d", ret);
    }
}
```

As you can see, **stride** appears many times in the code above. So **what is a stride** [2]?

When a video image is stored in memory, the buffer may contain extra padding bytes after each row of pixels. The padding affects how the image is laid out in memory, but not how it is displayed. The stride is the number of bytes from the start of one row of pixels to the start of the next row in memory; it is also called the pitch. If padding bytes are present, the stride is larger than the image width, as shown in the figure below:

Two buffers containing images of the same dimensions can have different strides, so the stride must always be taken into account when processing video images.

In addition, images can be arranged in memory in two ways. In a **top-down** image, the first pixel of the top row is stored first in memory. In a **bottom-up** image, the last row of pixels is stored first. The figure below shows the difference:

A bottom-up image has a negative stride, because the stride is defined as the number of bytes you must advance in memory to move from one displayed row of pixels to the next, which in a bottom-up image means moving backwards. YUV images are normally top-down, and any image in a Direct3D surface must be top-down, whereas RGB images in system memory are usually bottom-up. Video conversion in particular has to handle buffers with mismatched strides.
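As a small illustration of negative strides (a sketch assuming 8-bit single-channel planes; the helper name is made up), the same row-copy loop can handle both top-down and bottom-up images if you pass the address of the first displayed row and the appropriate stride:

```cpp
#include <cstdint>
#include <cstring>

// Copy a w x h 8-bit plane row by row, tolerating padding (|stride| >= w)
// and bottom-up sources (negative stride). Hypothetical helper for illustration.
void CopyPlaneRows(const uint8_t *src_first_row, int src_stride,
                   uint8_t *dst_first_row, int dst_stride,
                   int w, int h)
{
    for (int row = 0; row < h; ++row) {
        std::memcpy(dst_first_row, src_first_row, w);  // only w bytes are pixels
        src_first_row += src_stride;  // a negative stride walks a bottom-up image
        dst_first_row += dst_stride;
    }
}

// For a bottom-up image whose rows are `pitch` bytes apart, pass the address
// of the last row in memory (the first displayed row) and a negative stride:
//   CopyPlaneRows(buf + (h - 1) * pitch, -pitch, dst, dst_pitch, w, h);
```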

To summarize, for a video-conferencing MCU, the layout coordinates of the mixing server are generally defined as follows:

For each input stream of the mixing service, sensible coordinates **(left, top)** and **(right, bottom)** are configured relative to the top-left corner of the canvas; the offsets used for the pixel-copy operations described above can then be derived from these coordinates.
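As an example (a minimal sketch; the rational representation mirrors the region.area.rect fields in the OWT code above, but the types and function here are hypothetical), the per-stream coordinates can be converted into a pixel rectangle and plane offsets like this:

```cpp
#include <cstdint>

// A coordinate expressed as a fraction of the canvas size, e.g. 1/2 = halfway.
struct Ratio { uint32_t num; uint32_t den; };
struct LayoutRect { Ratio left, top, right, bottom; };
struct PixelRect { uint32_t x, y, w, h; };

PixelRect ToPixels(const LayoutRect &r, uint32_t canvas_w, uint32_t canvas_h)
{
    uint32_t x0 = (uint64_t)canvas_w * r.left.num   / r.left.den;
    uint32_t y0 = (uint64_t)canvas_h * r.top.num    / r.top.den;
    uint32_t x1 = (uint64_t)canvas_w * r.right.num  / r.right.den;
    uint32_t y1 = (uint64_t)canvas_h * r.bottom.num / r.bottom.den;
    // Round down to even values so the half-resolution U/V planes stay aligned.
    x0 &= ~1u; y0 &= ~1u; x1 &= ~1u; y1 &= ~1u;
    return { x0, y0, x1 - x0, y1 - y0 };
}
// The Y-plane write offset is then y * strideY + x, and the U/V offsets are
// (y / 2) * strideU + x / 2, just like the destination pointers passed to
// I420Scale in the code above.
```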

With the coordinates and layout design above as a foundation, we can predefine commonly used layout templates for users. The layouts and images below are all taken from FreeSWITCH (a sketch of how a grid template can be generated follows the list):

  • 1up_top_left+9
  • 2up_bottom+8
  • 2up_middle+8
  • 2up_top+8
  • 3up+4
  • 3up+9
  • 3x3
  • 4x4
  • 5x5
  • 6x6
  • 8x8
  • overlaps
  • picture-in-picture
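To show how such templates can be expressed with the rational coordinates discussed above, here is a sketch that generates a uniform N x N grid such as the 3x3 or 4x4 layouts (the Ratio/LayoutRect types repeat the hypothetical ones from the earlier sketch; this is not FreeSWITCH's or OWT's actual template format):

```cpp
#include <cstdint>
#include <vector>

// Same hypothetical rational types as in the earlier sketch.
struct Ratio { uint32_t num; uint32_t den; };
struct LayoutRect { Ratio left, top, right, bottom; };

// Build an N x N grid; cell (row, col) spans [col/N, (col+1)/N] horizontally
// and [row/N, (row+1)/N] vertically, as fractions of the canvas.
std::vector<LayoutRect> MakeGridLayout(uint32_t n)
{
    std::vector<LayoutRect> regions;
    regions.reserve(n * n);
    for (uint32_t row = 0; row < n; ++row) {
        for (uint32_t col = 0; col < n; ++col) {
            regions.push_back({ { col,     n },     // left
                                { row,     n },     // top
                                { col + 1, n },     // right
                                { row + 1, n } });  // bottom
        }
    }
    return regions;
}
// MakeGridLayout(3) produces the 3x3 template: the first input fills the
// top-left cell, the second the next cell in the row, and so on.
```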

That concludes the introduction to the basic principles of video stream mixing and composition.

Comments and criticism are welcome; feel free to share the topics you are interested in! You are also welcome to follow my personal WeChat official account!

Reference

[1] YUV image composition principles: https://blog.csdn.net/zwz1984/article/details/50403150#comments

[2] Image Stride: https://docs.microsoft.com/en-us/windows/win32/medfound/image-stride
