ISO/IEC 11172-3:1993 - MP3 Encoding Standard Translation (1)



Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s - Part 3: Audio

Contents

Introduction

Section 1: General

1.1 Scope

1.2 Normative references

Section 2: Technical elements

2.1 Definitions

2.2 Symbols and abbreviations

2.3 Method of describing bitstream syntax

2.4 Requirements

Annexes

A Diagrams

B Tables

C The encoding process

D Psychoacoustic models

E Bit sensitivity to errors

F Error concealment

G Joint stereo coding

H List of patent holders

Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.

In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1. Draft International Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as an International Standard requires approval by at least 75% of the national bodies casting a vote.

International Standard ISO/IEC 11172-3 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Sub-committee SC 29, Coded representation of audio, picture, multimedia and hypermedia information.

ISO/IEC 11172 consists of the following parts, under the general title Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s:

  • Part 1: Systems
  • Part 2: Video
  • Part 3: Audio
  • Part 4: Compliance testing

Annexes A and B form an integral part of this part of ISO/IEC 11172. Annexes C, D, E, F, G and H are for information only.

Introduction

Note: Readers interested in an overview of MPEG Audio should read this Introduction and then proceed to annex A (Diagrams) and annex C (The encoding process) before reading the normative clauses 1 and 2.

To aid in the understanding of the specification of the stored compressed bitstream and its decoding, a sequence of encoding, storage and decoding is described.

0.1 Encoding

The encoder processes the digital audio signal and produces the compressed bitstream for storage. The encoder algorithm is not standardized, and may use various means for encoding such as estimation of the auditory masking threshold, quantization, and scaling. However, the encoder output must be such that a decoder conforming to the specifications of clause 2.4 will produce audio suitable for the intended application.

[Figure 1: basic structure of an audio encoder]

Figure 1 illustrates the basic structure of an audio encoder. Input audio samples are fed into the encoder. The mapping creates a filtered and subsampled representation of the input audio stream. The mapped samples may be called either subband samples (as in Layer I or II, see below) or transformed subband samples (as in Layer III). A psychoacoustic model creates a set of data to control the quantizer and coding. These data differ depending on the actual coder implementation. One possibility is to use an estimation of the masking threshold to do this quantizer control. The quantizer and coding block creates a set of coding symbols from the mapped input samples. Again, this block can depend on the encoding system. The block 'frame packing' assembles the actual bitstream from the output data of the other blocks, and adds other information (e.g. error correction) if necessary.

There are four different modes possible:

  • Single channel: a single audio signal coded on its own.
  • Dual channel: two independent audio signals coded within one bitstream.
  • Stereo: left and right signals of a stereo pair coded within one bitstream.
  • Joint Stereo: left and right signals of a stereo pair coded within one bitstream, with the stereo irrelevancy and redundancy exploited.
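The four modes listed above can be modelled as a small enumeration. This is a minimal sketch: the 2-bit codes follow the `mode` field of the audio frame header defined in clause 2.4 of the standard (not in this Introduction), while the class name `Mode` and the `channels` property are illustrative names chosen here.

```python
from enum import Enum


class Mode(Enum):
    """The four coding modes, keyed by the 2-bit `mode` header field."""

    STEREO = 0b00
    JOINT_STEREO = 0b01
    DUAL_CHANNEL = 0b10
    SINGLE_CHANNEL = 0b11

    @property
    def channels(self) -> int:
        # Only single channel carries one audio signal; the other three
        # modes carry two signals in one bitstream.
        return 1 if self is Mode.SINGLE_CHANNEL else 2


if __name__ == "__main__":
    for mode in Mode:
        print(f"{mode.name}: code {mode.value:02b}, {mode.channels} channel(s)")
```

Note that dual channel and stereo both carry two signals; the difference is whether the signals are independent or form a stereo pair, which matters to the encoder and to playback but not to the channel count.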

0.2 Layers

Depending on the application, different layers of the coding system with increasing encoder complexity and performance can be used. An ISO/IEC 11172-3 Audio Layer N decoder is able to decode bitstream data which has been encoded in Layer N and all layers below N (for example, a Layer III decoder also decodes Layer I and Layer II bitstreams).
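The compatibility rule above (a Layer N decoder handles Layer N and every layer below it) can be sketched as a simple comparison. The 2-bit layer codes come from the `layer` field of the frame header defined in clause 2.4, not from this Introduction; the function names `layer_from_code` and `can_decode` are hypothetical helpers for illustration only.

```python
# 2-bit header `layer` field -> layer number. The code 0b00 is reserved.
LAYER_FROM_CODE = {0b11: 1, 0b10: 2, 0b01: 3}


def layer_from_code(code: int) -> int:
    """Translate the 2-bit header code into a layer number (1, 2 or 3)."""
    if code not in LAYER_FROM_CODE:
        raise ValueError(f"reserved or invalid layer code: {code:#04b}")
    return LAYER_FROM_CODE[code]


def can_decode(decoder_layer: int, stream_layer: int) -> bool:
    """A Layer N decoder accepts streams coded in Layer N or any lower layer."""
    if decoder_layer not in (1, 2, 3) or stream_layer not in (1, 2, 3):
        raise ValueError("layers are numbered 1 to 3")
    return stream_layer <= decoder_layer


if __name__ == "__main__":
    print(can_decode(decoder_layer=3, stream_layer=1))  # Layer III decoder, Layer I stream
    print(can_decode(decoder_layer=1, stream_layer=3))  # Layer I decoder, Layer III stream
```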

Layer I

This layer contains the basic mapping of the digital audio input into 32 subbands, fixed segmentation to format the data into blocks, a psychoacoustic model to determine the adaptive bit allocation, and quantization using block companding and formatting. The theoretical minimum encoding/decoding delay for Layer I is about 19 ms.
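To make "quantization using block companding" more concrete, here is a deliberately simplified sketch: one block of 12 subband samples shares a single scale value, and each sample is quantized relative to that scale with a fixed number of bits. The assumptions are significant: the standard selects scalefactors from a table and defines its own quantization formulas, whereas this toy version uses the raw block peak and a plain rounding quantizer. The names `compand_block` and `expand_block` are hypothetical.

```python
def compand_block(samples, bits):
    """Return (scale, codes): one shared scale plus one integer code per sample.

    Assumption: the scale is simply the block peak, not a tabulated scalefactor.
    """
    if bits < 2:
        raise ValueError("need at least 2 bits per sample")
    half = 2 ** (bits - 1) - 1  # codes lie in [-half, +half]
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return 0.0, [0] * len(samples)
    codes = [round(s / peak * half) for s in samples]
    return peak, codes


def expand_block(scale, codes, bits):
    """Decoder side: rebuild approximate samples from the scale and the codes."""
    half = 2 ** (bits - 1) - 1
    return [c / half * scale for c in codes]


if __name__ == "__main__":
    block = [0.02, -0.05, 0.11, 0.08, -0.12, 0.03,
             0.00, 0.07, -0.09, 0.01, 0.04, -0.06]  # 12 samples of one subband
    scale, codes = compand_block(block, bits=4)
    restored = expand_block(scale, codes, bits=4)
    worst = max(abs(a - b) for a, b in zip(block, restored))
    print(scale, codes, worst)
```

The point of the shared scale is that a quiet block and a loud block both use the full code range, so the quantization error tracks the block's level rather than a fixed full-scale step.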

Layer II

This layer provides additional coding of bit allocation, scalefactors and samples. Different framing is used. The theoretical minimum encoding/decoding delay for Layer II is about 35 ms.

Layer III

This layer introduces increased frequency resolution based on a hybrid filterbank. It adds a different (nonuniform) quantizer, adaptive segmentation and entropy coding of the quantized values. The theoretical minimum encoding/decoding delay for Layer III is about 59 ms.

Joint Stereo coding can be added as an additional feature to any of the layers.
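For a sense of scale next to the delay figures quoted for the three layers, the duration of one audio frame can be computed from the frame size and the sampling frequency. The frame sizes (384 samples for Layer I, 1152 for Layers II and III) and the three sampling frequencies are taken from the normative part of the standard rather than from this Introduction. The sketch does not attempt to derive the 19/35/59 ms delays, which also depend on the filterbank and other factors; `frame_duration_ms` is an illustrative helper name.

```python
# Samples per frame and per channel, by layer number.
SAMPLES_PER_FRAME = {1: 384, 2: 1152, 3: 1152}

# Sampling frequencies supported by ISO/IEC 11172-3, in Hz.
SAMPLING_FREQUENCIES_HZ = (32000, 44100, 48000)


def frame_duration_ms(layer: int, sampling_frequency_hz: int) -> float:
    """Duration of one frame of audio in milliseconds."""
    if layer not in SAMPLES_PER_FRAME:
        raise ValueError("layer must be 1, 2 or 3")
    if sampling_frequency_hz not in SAMPLING_FREQUENCIES_HZ:
        raise ValueError("unsupported sampling frequency")
    return 1000 * SAMPLES_PER_FRAME[layer] / sampling_frequency_hz


if __name__ == "__main__":
    for layer in (1, 2, 3):
        for fs in SAMPLING_FREQUENCIES_HZ:
            print(f"Layer {layer} at {fs} Hz: {frame_duration_ms(layer, fs):.2f} ms")
```

At 48 kHz this gives 8 ms per Layer I frame and 24 ms per Layer II or III frame.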

0.3 Storage

Various streams of encoded video, encoded audio, synchronization data, systems data and auxiliary data may be stored together on a storage medium. Editing of the audio will be easier if the edit point is constrained to coincide with an addressable point.

Access to storage may involve remote access over a communication system. Access is assumed to be controlled by a functional unit other than the audio decoder itself. This control unit accepts user commands, reads and interprets database structure information, reads the stored information from the media, demultiplexes non-audio information and passes the stored audio bitstream to the audio decoder at the required rate.

0.4 Decoding

The decoder accepts the compressed audio bitstream in the syntax defined in 2.4.1, decodes the data elements according to 2.4.2, and uses the information to produce digital audio output according to 2.4.3.

[Figure 2: basic structure of an audio decoder]

Figure 2 illustrates the basic structure of an audio decoder. Bitstream data is fed into the decoder. The bitstream unpacking and decoding block does error detection if error-check is applied in the encoder (see 2.4.2.4). The bitstream data are unpacked to recover the various pieces of information. The reconstruction block reconstructs the quantized version of the set of mapped samples. The inverse mapping transforms these mapped samples back into uniform PCM.
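As an illustration of the first step of bitstream unpacking, the sketch below splits the 32-bit frame header into its fields. The field layout (12-bit syncword, ID, layer, protection bit, bitrate index, sampling frequency, padding, private bit, mode, mode extension, copyright, original, emphasis) follows the header syntax in clause 2.4 of the standard; the `FrameHeader` type and `parse_header` function are hypothetical names, only a subset of the fields is kept, and no CRC check or table lookup is performed.

```python
from typing import NamedTuple


class FrameHeader(NamedTuple):
    layer: int                     # 1, 2 or 3
    crc_present: bool              # True when the encoder applied error-check
    bitrate_index: int             # 4-bit index into the bitrate table
    sampling_frequency_index: int  # 2-bit index into the sampling frequency table
    padding: bool
    mode: int                      # 2-bit mode field
    mode_extension: int            # 2-bit field, meaningful in joint stereo


def parse_header(data: bytes) -> FrameHeader:
    """Split the first four bytes of a frame into header fields."""
    if len(data) < 4:
        raise ValueError("a frame header is 4 bytes long")
    word = int.from_bytes(data[:4], "big")
    if word >> 20 != 0xFFF:
        raise ValueError("syncword not found")
    layer_code = (word >> 17) & 0b11
    if layer_code == 0b00:
        raise ValueError("reserved layer code")
    return FrameHeader(
        layer=4 - layer_code,                   # 0b11 -> I, 0b10 -> II, 0b01 -> III
        crc_present=((word >> 16) & 1) == 0,    # protection_bit 0 means CRC follows
        bitrate_index=(word >> 12) & 0xF,
        sampling_frequency_index=(word >> 10) & 0b11,
        padding=bool((word >> 9) & 1),
        mode=(word >> 6) & 0b11,
        mode_extension=(word >> 4) & 0b11,
    )


if __name__ == "__main__":
    print(parse_header(bytes.fromhex("FFFB9064")))
```

The `crc_present` flag is what tells the unpacking block whether error detection can be performed for the frame.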

To be continued...