ExoPlayer Deep Dive: Audio/Video Synchronization


Audio/video synchronization is a critical piece of player logic, and it has a major impact on the actual user experience. As explained in the previous article, audio and video are decoded and presented separately, with the two pipelines advancing independently; without timestamp-based synchronization, the audio and video would drift apart during playback. Video decoding and display happen in MediaCodecVideoRenderer; audio decoding and playback happen in MediaCodecAudioRenderer.

Video Synchronization

The entry point for video decoding is MediaCodecRenderer.render, which takes two parameters:

public void render(long positionUs, long elapsedRealtimeUs)

elapsedRealtimeUs is the current time (SystemClock.elapsedRealtime() in microseconds) measured when render was called. render then calls drainOutputBuffer to read a decoded output buffer and prepare it for display, and execution reaches MediaCodecVideoRenderer.processOutputBuffer:

  protected boolean processOutputBuffer(
      long positionUs,
      long elapsedRealtimeUs,
      MediaCodec codec,
      ByteBuffer buffer,
      int bufferIndex,
      int bufferFlags,
      long bufferPresentationTimeUs,
      boolean isDecodeOnlyBuffer,
      boolean isLastBuffer,
      Format format)
  • bufferPresentationTimeUs is the pts of the frame
  • elapsedRealtimeUs is the timestamp at which this render pass started

Inside processOutputBuffer, the frame's early/late offset is computed:
    long earlyUs = bufferPresentationTimeUs - positionUs;

    // Fine-grained adjustment of earlyUs based on the elapsed time since the start of the current
    // iteration of the rendering loop.
    long elapsedSinceStartOfLoopUs = elapsedRealtimeNowUs - elapsedRealtimeUs;
    earlyUs -= elapsedSinceStartOfLoopUs;

    // Compute the buffer's desired release time in nanoseconds.
    long systemTimeNs = System.nanoTime();
    long unadjustedFrameReleaseTimeNs = systemTimeNs + (earlyUs * 1000);

    // Apply a timestamp adjustment, if there is one.
    long adjustedReleaseTimeNs = frameReleaseTimeHelper.adjustReleaseTime(
        bufferPresentationTimeUs, unadjustedFrameReleaseTimeNs);
    earlyUs = (adjustedReleaseTimeNs - systemTimeNs) / 1000;
  • positionUs can be regarded as the audio playback position (the audio clock)
  • bufferPresentationTimeUs can be regarded as the pts of the video frame
  • elapsedSinceStartOfLoopUs is the current time minus the time at which render was called; as I see it, this is just a small correction for time spent inside the current loop iteration
  • earlyUs -= elapsedSinceStartOfLoopUs yields an earlyUs that measures the video frame's pts against the audio position (a small numeric sketch follows below)
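
To make the arithmetic concrete, here is a small numeric sketch; the values are made up purely for illustration and do not come from the article:

    // Illustrative values only: audio clock at 1.000s, the video frame's pts at
    // 1.040s, and 2ms already spent inside this iteration of the render loop.
    long positionUs = 1_000_000;               // audio position (master clock)
    long bufferPresentationTimeUs = 1_040_000; // video frame pts
    long elapsedSinceStartOfLoopUs = 2_000;    // time spent in this loop iteration

    long earlyUs = bufferPresentationTimeUs - positionUs; // 40_000: 40ms ahead of the audio clock
    earlyUs -= elapsedSinceStartOfLoopUs;                 // 38_000: the frame is ~38ms early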

Next, the release time itself needs to be aligned with the display. frameReleaseTimeHelper.adjustReleaseTime does this work; its closestVsync function finds the nearest vsync to target:

    // Find the timestamp of the closest vsync. This is the vsync that we're targeting.
    long snappedTimeNs = closestVsync(adjustedReleaseTimeNs, sampledVsyncTimeNs, vsyncDurationNs);

  private static long closestVsync(long releaseTime, long sampledVsyncTime, long vsyncDuration) {
    long vsyncCount = (releaseTime - sampledVsyncTime) / vsyncDuration;
    long snappedTimeNs = sampledVsyncTime + (vsyncDuration * vsyncCount);
    long snappedBeforeNs;
    long snappedAfterNs;
    if (releaseTime <= snappedTimeNs) {
      snappedBeforeNs = snappedTimeNs - vsyncDuration;
      snappedAfterNs = snappedTimeNs;
    } else {
      snappedBeforeNs = snappedTimeNs;
      snappedAfterNs = snappedTimeNs + vsyncDuration;
    }
    long snappedAfterDiff = snappedAfterNs - releaseTime;
    long snappedBeforeDiff = releaseTime - snappedBeforeNs;
    return snappedAfterDiff < snappedBeforeDiff ? snappedAfterNs : snappedBeforeNs;
  }
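
As a quick usage sketch of the function above, assume a 60Hz display (vsync duration ≈ 16.67ms) and that the last sampled vsync happened at t = 0; the numbers are invented for illustration, and the call goes through a local copy of closestVsync since the real method is private:

    // Assumed values: 60 Hz display, last vsync sampled at t = 0 (nanoseconds).
    long vsyncDurationNs = 16_666_667;
    long sampledVsyncTimeNs = 0;
    long releaseTimeNs = 40_000_000; // unadjusted release time: 40ms

    // vsyncCount = 40_000_000 / 16_666_667 = 2, so snappedTimeNs = 33_333_334.
    // releaseTimeNs falls after that vsync, so the candidates are 33_333_334 (before)
    // and 50_000_001 (after); "before" is closer (6.67ms vs 10ms), so the frame
    // is targeted at the vsync at 33_333_334ns.
    long targetVsyncNs = closestVsync(releaseTimeNs, sampledVsyncTimeNs, vsyncDurationNs);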

VideoFrameReleaseTimeHelper.java defines an inner class for sampling vsync timestamps:

private static final class VSyncSampler implements FrameCallback, Handler.Callback {
    @Override
    public void doFrame(long vsyncTimeNs) {
      // Record the vsync timestamp delivered by the Choreographer, then schedule
      // the next sample.
      sampledVsyncTimeNs = vsyncTimeNs;
      choreographer.postFrameCallbackDelayed(this, CHOREOGRAPHER_SAMPLE_DELAY_MILLIS);
    }

}

The doFrame callback gives an accurate display timestamp, because it is invoked by the system Choreographer exactly at vsync.
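
As a standalone sketch of the same mechanism (not ExoPlayer's own code), an application could sample vsync timestamps itself through Choreographer; the callback has to be registered on a thread that owns a Looper, e.g. the main thread:

    import android.view.Choreographer;

    // Minimal sketch: mirror what VSyncSampler does by recording the vsync
    // timestamp of every display frame.
    final class VsyncClock implements Choreographer.FrameCallback {

      private volatile long lastVsyncTimeNs;

      void start() {
        // Must be called on a Looper thread; Choreographer is per-thread.
        Choreographer.getInstance().postFrameCallback(this);
      }

      @Override
      public void doFrame(long frameTimeNanos) {
        // frameTimeNanos is the vsync timestamp of the current display frame.
        lastVsyncTimeNs = frameTimeNanos;
        // Keep sampling on the next frame as well.
        Choreographer.getInstance().postFrameCallback(this);
      }

      long getLastVsyncTimeNs() {
        return lastVsyncTimeNs;
      }
    }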

After the processing above, the calibrated release time is reasonably accurate.

    boolean treatDroppedBuffersAsSkipped = joiningDeadlineMs != C.TIME_UNSET;
    if (shouldDropBuffersToKeyframe(earlyUs, elapsedRealtimeUs, isLastBuffer)
        && maybeDropBuffersToKeyframe(
            codec, bufferIndex, presentationTimeUs, positionUs, treatDroppedBuffersAsSkipped)) {
      return false;
    } else if (shouldDropOutputBuffer(earlyUs, elapsedRealtimeUs, isLastBuffer)) {
      if (treatDroppedBuffersAsSkipped) {
        skipOutputBuffer(codec, bufferIndex, presentationTimeUs);
      } else {
        dropOutputBuffer(codec, bufferIndex, presentationTimeUs);
      }
      return true;
    }

    if (Util.SDK_INT >= 21) {
      // Let the underlying framework time the release.
      if (earlyUs < 50000) {
        notifyFrameMetadataListener(
            presentationTimeUs, adjustedReleaseTimeNs, format, currentMediaFormat);
        renderOutputBufferV21(codec, bufferIndex, presentationTimeUs, adjustedReleaseTimeNs);
        return true;
      }
    } else {
      // We need to time the release ourselves.
      if (earlyUs < 30000) {
        if (earlyUs > 11000) {
          // We're a little too early to render the frame. Sleep until the frame can be rendered.
          // Note: The 11ms threshold was chosen fairly arbitrarily.
          try {
            // Subtracting 10000 rather than 11000 ensures the sleep time will be at least 1ms.
            Thread.sleep((earlyUs - 10000) / 1000);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
          }
        }
        notifyFrameMetadataListener(
            presentationTimeUs, adjustedReleaseTimeNs, format, currentMediaFormat);
        renderOutputBuffer(codec, bufferIndex, presentationTimeUs);
        return true;
      }
    }

The code above has already produced the calibrated earlyUs; what follows uses earlyUs to decide whether to drop the frame, skip it, or wait for the audio to catch up. If earlyUs is positive, the frame should be displayed after the current system time, in other words the frame has arrived early; if it is negative, the frame should have been displayed before the current system time, in other words the frame is late. If it is late beyond a certain threshold, the frame is dropped and never displayed. If, per the preset threshold, the frame is more than 50ms early, the renderer returns and re-evaluates on the next render-loop iteration roughly 10ms later; otherwise the frame is released for display.
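
The branching can be summarized with a small sketch. The 50000 and 30000 microsecond thresholds come from the snippet above; the -30000 late threshold is an assumption about shouldDropOutputBuffer's default behavior, and all of these values may differ between ExoPlayer versions:

    // Simplified sketch of the per-frame decision, not the actual ExoPlayer logic.
    final class FrameDecision {
      enum Action { DROP, RENDER, WAIT }

      static Action decide(long earlyUs, boolean isApi21OrLater) {
        if (earlyUs < -30_000) {
          // Assumed drop threshold: the frame is far too late, so drop it.
          return Action.DROP;
        }
        if (isApi21OrLater) {
          // API 21+: release up to ~50ms early and let the platform time the display.
          return earlyUs < 50_000 ? Action.RENDER : Action.WAIT;
        }
        // Older APIs: render when close enough, otherwise wait and re-check.
        return earlyUs < 30_000 ? Action.RENDER : Action.WAIT;
      }
    }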

Audio Synchronization

MediaCodecAudioRenderer.getPositionUs returns the audio timestamp (the position of the audio clock):

  @Override
  public long getPositionUs() {
    if (getState() == STATE_STARTED) {
      updateCurrentPosition();
    }
    return currentPositionUs;
  }

  private void updateCurrentPosition() {
    long newCurrentPositionUs = audioSink.getCurrentPositionUs(isEnded());
    if (newCurrentPositionUs != AudioSink.CURRENT_POSITION_NOT_SET) {
      currentPositionUs =
          allowPositionDiscontinuity
              ? newCurrentPositionUs
              : Math.max(currentPositionUs, newCurrentPositionUs);
      allowPositionDiscontinuity = false;
    }
  }

This calls down into DefaultAudioSink.getCurrentPositionUs:

  @Override
  public long getCurrentPositionUs(boolean sourceEnded) {
    if (!isInitialized() || startMediaTimeState == START_NOT_SET) {
      return CURRENT_POSITION_NOT_SET;
    }
    long positionUs = audioTrackPositionTracker.getCurrentPositionUs(sourceEnded);
    positionUs = Math.min(positionUs, configuration.framesToDurationUs(getWrittenFrames()));
    return startMediaTimeUs + applySkipping(applySpeedup(positionUs));
  }

audioTrackPositionTracker.getCurrentPositionUs is implemented in AudioTrackPositionTracker.java, the class that tracks the AudioTrack's playback position:

  private long getPlaybackHeadPositionUs() {
    return framesToDurationUs(getPlaybackHeadPosition());
  }

  private long framesToDurationUs(long frameCount) {
    return (frameCount * C.MICROS_PER_SECOND) / outputSampleRate;
  }

getPlaybackHeadPosition returns how many audio frames have been played so far. Because audio is played out at a fixed sample rate, the frame count can be converted into the current playback position in microseconds.
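
For example, assuming an output sample rate of 48kHz (a value picked only for illustration), the conversion in framesToDurationUs works out like this:

    // Illustrative only: convert played frames into a playback position at 48 kHz.
    long outputSampleRate = 48_000;
    long playedFrames = 96_000; // as reported by getPlaybackHeadPosition()
    long positionUs = (playedFrames * 1_000_000L) / outputSampleRate; // 2_000_000us = 2s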

Summary

There are three common ways to synchronize audio and video:
1. Use the audio as the master clock and make the video follow the audio
2. Use the video as the master clock and make the audio follow the video
3. Use a shared external clock and make both audio and video follow it

ExoPlayer uses the first approach: the audio position serves as the master clock, and video frames are released earlier, later, or dropped so that they stay in step with it.