ISO/IEC 11172-3:1993 - MP3 Coding Standard Translation (4)

2.4.3 The audio decoding process

2.4.3.1 General

The first action is synchronization of the decoder to the incoming bitstream. Just after startup this may be done by searching the bitstream for the 12-bit syncword. In some applications the ID, layer and protection_bit are already known to the decoder. In that case the first 16 bits of the header should be regarded as a 16-bit synchronization code, which makes synchronization more reliable. The positions of consecutive syncwords can be calculated from the information provided by the seven bits after the protection_bit (bitrate_index, sampling_frequency and padding_bit). The bitstream is subdivided into slots, and the distance between the starts of two consecutive syncwords is constant and equal to N slots. The value of N depends on the layer. For Layer I, where a slot is 4 bytes:

$$N = \frac{12 \cdot \text{bitrate}}{\text{sampling\_frequency}}$$

For Layers II and III, where a slot is 1 byte:

$$N = \frac{144 \cdot \text{bitrate}}{\text{sampling\_frequency}}$$

If this calculation does not yield an integer, the result is truncated and 'padding' is required. In this case the number of slots in a frame varies between N and N + 1. The padding bit is set to '0' if the frame contains N slots and to '1' otherwise. Knowing the positions of consecutive syncwords in this way greatly facilitates synchronization.

If bitrate_index equals '0000' (free format), the exact bitrate is not indicated. N can then be determined from the distance between consecutive syncwords and the value of the padding bit.
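The slot arithmetic above can be sketched as a small Python function that computes the distance between two syncwords in bytes. It truncates N and adds one slot when padding_bit is 1. It assumes 4-byte slots for Layer I and 1-byte slots for Layers II and III. The name `frame_length_bytes` is hypothetical, and free-format streams (bitrate_index '0000') are not handled.

```python
def frame_length_bytes(layer: int, bitrate: int, sampling_frequency: int,
                       padding_bit: int) -> int:
    """Distance between two consecutive syncwords in bytes (hypothetical helper).

    bitrate is in bit/s and sampling_frequency in Hz. N is truncated to an
    integer, and a set padding bit adds one extra slot to the frame.
    """
    if layer == 1:
        n_slots = 12 * bitrate // sampling_frequency    # Layer I: 4-byte slots
        slot_bytes = 4
    elif layer in (2, 3):
        n_slots = 144 * bitrate // sampling_frequency   # Layers II/III: 1-byte slots
        slot_bytes = 1
    else:
        raise ValueError("layer must be 1, 2 or 3")
    return (n_slots + padding_bit) * slot_bytes
```

For example, a 128 kbit/s Layer III stream at 44.1 kHz gives N = 417 (144 × 128 000 / 44 100 ≈ 417.96, truncated). Its frames are therefore 417 or 418 bytes long, depending on padding_bit.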

The mode bits in the bitstream shall be read and if their value is '01' the mode_extension bits shall also be read. The mode_extension bits set the 'bound' as shown in 2.4.2.3 and thus indicate which subbands are coded in joint_stereo mode.
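For Layers I and II, the mapping from mode_extension to bound can be sketched as shown below. The sketch assumes the values of 2.4.2.3 (bound = 4, 8, 12 or 16 for mode_extension '00' to '11'). Subbands from bound upward are coded in joint_stereo mode. The function name is hypothetical. Returning 32 for the other modes is this sketch's convention for "every subband coded independently".

```python
def joint_stereo_bound(mode: int, mode_extension: int) -> int:
    """First subband coded in joint_stereo mode (hypothetical helper, Layers I/II).

    mode and mode_extension are the 2-bit header fields as integers.
    Outside joint_stereo ('01') no subband is jointly coded, which this
    sketch expresses as a bound of 32.
    """
    if mode != 0b01:
        return 32
    return 4 * (mode_extension + 1)  # '00'->4, '01'->8, '10'->12, '11'->16
```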

If the protection_bit in the header equals '0', a CRC-check word has been inserted in the bitstream just after the header. The error detection method used is 'CRC-16', whose generator polynomial is:

$$G(X) = X^{16} + X^{15} + X^{2} + 1$$

The method is depicted in figure A.9 "CRC-check diagram". The initial state of the shift register is '1111 1111 1111 1111'. All bits included in the CRC check are then input to the circuit shown in figure A.9 "CRC-check diagram". Each time a bit is input, the shift register is shifted by one bit. After the last shift operation, the outputs b15...b0 constitute a word to be compared with the CRC-check word in the bitstream. If the two words are not identical, a transmission error has occurred in the protected field of the audio stream. To avoid annoying distortions, application of a concealment technique, such as muting of the actual frame or repetition of the previous frame, is recommended.
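A bit-serial sketch of this shift-register procedure is shown below. It assumes the generator polynomial X^16 + X^15 + X^2 + 1 (0x8005 with the X^16 term implied) and the all-ones initial state. The exact set of protected header and side-information bits is defined elsewhere in the standard and is left to the caller. `crc16_bits` and `to_bits` are hypothetical helper names.

```python
from typing import Iterable, List

CRC16_POLY = 0x8005  # X^16 + X^15 + X^2 + 1, with the X^16 term implied


def crc16_bits(bits: Iterable[int]) -> int:
    """Shift each protected bit (0 or 1) through a 16-bit register (hypothetical helper).

    The register starts at 0xFFFF ('1111 1111 1111 1111'). After the last
    shift, its contents b15...b0 are returned for comparison with the
    CRC-check word read from the bitstream.
    """
    register = 0xFFFF
    for bit in bits:
        feedback = ((register >> 15) & 1) ^ (bit & 1)  # b15 XOR incoming bit
        register = (register << 1) & 0xFFFF            # shift by one bit
        if feedback:
            register ^= CRC16_POLY                     # apply the polynomial taps
    return register


def to_bits(value: int, width: int) -> List[int]:
    """MSB-first list of the `width` low bits of `value` (hypothetical helper)."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]
```

Because no final XOR is applied, feeding the protected bits followed by the correct 16-bit check word through the same register leaves it at zero. This gives a convenient self-test for an implementation.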

2.4.3.2 Layer I

After the part of the decoding which is common to all layers (see 2.4.3.1) the bit allocation information has to be read for all subbands, and the scalefactors read for all subbands with a nonzero bit allocation. The decoder flowchart is given in figure A.1 "Layer I and II decoder flow chart".

2.4.3.2.1 Requantization of subband samples

From the bit allocation, the number of bits nb to be read for the samples in each subband is known. The order of the samples is given in 2.4.1.5 for each mode. After the bits for one sample have been gathered from the bitstream, the first bit has to be inverted. The resulting number can be considered a two's complement fractional number s''', where the MSB represents the value -1. The requantized value s'' is obtained by applying a linear formula:

$$s'' = \frac{2^{nb}}{2^{nb} - 1}\left(s''' + 2^{-nb+1}\right)$$

Samples in subbands that are in intensity_stereo mode must be copied to both channels. The requantized value has to be rescaled. The multiplication factor can be found in table B.1 "Layer I, II scalefactors". The rescaled value s' is calculated as:

$$s' = \text{factor} \cdot s''$$
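The Layer I steps (invert the first bit, read a two's complement fraction, apply the linear formula, rescale) can be combined in one small function. This is a minimal sketch. `requantize_layer1` is a hypothetical name, and `factor` is assumed to be the value already looked up in table B.1.

```python
def requantize_layer1(code: int, nb: int, factor: float) -> float:
    """Requantize and rescale one Layer I sample (hypothetical helper).

    code:   the nb bits read from the bitstream, as an unsigned integer
    nb:     number of bits per sample in this subband (from the bit allocation)
    factor: scalefactor from table B.1
    """
    msb = 1 << (nb - 1)
    code ^= msb                       # invert the first bit
    if code & msb:                    # two's complement: the MSB weighs -1
        code -= 1 << nb
    s3 = code / msb                   # fractional number s''' in [-1, 1)
    s2 = (2 ** nb / (2 ** nb - 1)) * (s3 + 2.0 ** (-nb + 1))  # s''
    return factor * s2                # s' = factor * s''
```

With nb = 2 and factor = 1.0, the codes 0, 1 and 2 map to -2/3, 0 and 2/3. This gives a quick check that the levels come out symmetric around zero.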

2.4.3.2.2 Synthesis subband filter

If a subband has no bits allocated to it, the samples in that subband are set to zero. Each time the subband samples for all 32 subbands of one channel have been calculated, they can be applied to the synthesis subband filter, and 32 consecutive audio samples can be calculated. The actions in the flow diagram of figure A.2 "Synthesis subband filter flow chart" show the reconstruction operation. The coefficients Nik for the matrixing operation are given by

$$N_{ik} = \cos\left[\frac{(16 + i)(2k + 1)\pi}{64}\right], \quad i = 0, \dots, 63, \quad k = 0, \dots, 31$$

The coefficients Di for the windowing operation can be found in table B.3 "Coefficients Di of the synthesis window". The coefficients have been derived by numerical optimization. One frame contains 12 × 32 = 384 subband samples, which result, after filtering, in 384 audio samples.
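The flow of figure A.2 can be sketched in Python as a direct, unoptimized transcription. The sketch assumes the 512 window coefficients D of table B.3 are supplied by the caller, since they are not reproduced here. The class and method names are hypothetical.

```python
import math
from typing import List, Sequence


class SynthesisFilter:
    """Polyphase synthesis for one channel (hypothetical helper, after figure A.2).

    D must hold the 512 window coefficients D[i] of table B.3.
    """

    def __init__(self, D: Sequence[float]):
        if len(D) != 512:
            raise ValueError("expected 512 window coefficients")
        self.D = list(D)
        self.V = [0.0] * 1024  # FIFO of matrixed values
        # Matrixing coefficients N[i][k] = cos((16 + i)(2k + 1) pi / 64)
        self.N = [[math.cos((16 + i) * (2 * k + 1) * math.pi / 64) for k in range(32)]
                  for i in range(64)]

    def process(self, S: Sequence[float]) -> List[float]:
        """Turn 32 subband samples S[k] into 32 consecutive audio samples."""
        self.V[64:] = self.V[:960]                       # shift V by 64 positions
        for i in range(64):                              # matrixing
            self.V[i] = sum(self.N[i][k] * S[k] for k in range(32))
        U = [0.0] * 512                                  # build U from V
        for i in range(8):
            for j in range(32):
                U[i * 64 + j] = self.V[i * 128 + j]
                U[i * 64 + 32 + j] = self.V[i * 128 + 96 + j]
        W = [U[i] * self.D[i] for i in range(512)]       # window by D
        return [sum(W[j + 32 * i] for i in range(16)) for j in range(32)]
```

One instance per channel keeps its own V history across calls. A Layer I frame would call `process` 12 times per channel.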

2.4.3.3 Layer II

Layer II is a more efficient but more complex coding scheme than Layer I. The flowchart in figure A.1 "Layer I and II decoder flow chart" applies to both Layers I and II. The first step is to perform the decoding which is common to all three layers (see 2.4.3.1).

2.4.3.3.1 Bit allocation decoding

For different combinations of bitrate and sampling frequency, different bit allocation tables exist (table B.2 "Layer II bit allocation tables"). Note that the bitrates given in the table headers are per channel. If the mode is not single_channel, the bitrate should be divided by two to obtain the bitrate per channel. The bit allocation is decoded in three steps. The first step consists of reading 'nbal' (2, 3 or 4) bits of information for one subband from the bitstream. The value of 'nbal' is given in the second column of the relevant table B.2 "Layer II bit allocation tables". These bits shall be interpreted as an unsigned integer. The second step uses this number and the number of the subband as indices to point to a value in the table. This value represents the number of levels 'nlevels' used to quantize the samples in the subband. In the third step, table B.4 "Layer II classes of quantization" is used to determine the number of bits used to code the quantized samples, the requantization coefficients, and whether the codes for three consecutive subband samples have been grouped into one code. The bit allocation tables show that some of the highest subbands never have bits allocated. The number of the lowest subband that will not have bits allocated to it is assigned to the identifier 'sblimit'.
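The first two steps can be sketched for a single channel as shown below. The `nbal` widths and `levels` rows used in the example are invented placeholders standing in for table B.2, not real table contents. `BitReader` and `read_allocations` are hypothetical helpers. Step three (the table B.4 lookup) is left to the caller.

```python
from typing import List, Sequence


class BitReader:
    """Minimal MSB-first reader over a bytes object (hypothetical helper)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # position in bits

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def read_allocations(reader: BitReader, nbal: Sequence[int],
                     levels: Sequence[Sequence[int]]) -> List[int]:
    """Return nlevels for each subband below sblimit = len(nbal) (hypothetical helper).

    Step 1: read nbal[sb] bits as an unsigned index.
    Step 2: look up (subband, index) in the allocation table.
    A result of 0 means that no bits are allocated to the subband.
    """
    result = []
    for sb, width in enumerate(nbal):
        index = reader.read(width)
        result.append(levels[sb][index])
    return result
```

In a stereo stream the allocations of the two channels are interleaved per subband, so a real decoder would loop over channels inside the subband loop.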

2.4.3.3.2 Scalefactor selection information decoding

The 36 samples in one subband within a frame are divided into three equal parts of 12 subband samples. Each part can have its own scalefactor. The number of scalefactors that has to be read from the bitstream depends on scfsi[sb]. The scalefactor selection information scfsi[sb] is read from the bitstream for the subbands that have a nonzero bit allocation. If scfsi[sb] equals '00', three scalefactors are transmitted, for parts 0, 1 and 2 respectively. If scfsi[sb] equals '01', two scalefactors are transmitted, the first one valid for parts 0 and 1, the second one for part 2. If scfsi[sb] equals '10', one scalefactor is transmitted, valid for all three parts. If scfsi[sb] equals '11', two scalefactors are transmitted, the first one valid for part 0, the second one for parts 1 and 2.
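The four scfsi cases fit naturally into a small lookup table that says which transmitted scalefactor applies to each part. The sketch below is illustrative, and its names are hypothetical.

```python
from typing import List, Sequence

# Hypothetical lookup: for each scfsi value, the index of the transmitted
# scalefactor that applies to parts 0, 1 and 2.
SCFSI_PATTERNS = {
    0b00: (0, 1, 2),  # three scalefactors, one per part
    0b01: (0, 0, 1),  # first for parts 0 and 1, second for part 2
    0b10: (0, 0, 0),  # one scalefactor for all three parts
    0b11: (0, 1, 1),  # first for part 0, second for parts 1 and 2
}


def scalefactors_per_part(scfsi: int, transmitted: Sequence[int]) -> List[int]:
    """Expand transmitted scalefactor indices to one per 12-sample part (hypothetical helper)."""
    pattern = SCFSI_PATTERNS[scfsi]
    expected = max(pattern) + 1
    if len(transmitted) != expected:
        raise ValueError(f"scfsi {scfsi:02b} expects {expected} scalefactors")
    return [transmitted[p] for p in pattern]
```

The number of scalefactors to read is `max(pattern) + 1`, so the same table also tells the decoder how many 6-bit fields to fetch.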

2.4.3.3.3 Scalefactor decoding

For every subband with a nonzero bit allocation the coded scalefactors for that subband are read from the bitstream. The number of coded scalefactors and the part of the subband samples they refer to is defined by scfsi[sb]. The 6 bits of a coded scalefactor should be interpreted as an unsigned integer index to table B.1 "Layer I, II scalefactors". This table gives the scalefactor by which the relevant subband samples should be multiplied after requantization.
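Table B.1 is not reproduced here. This sketch assumes its entries follow the closed form 2^(1 - index/3): 2.0 for index 0, 1.0 for index 3, with each step about 2 dB smaller. The valid index range 0 to 62 is also an assumption. The values should be checked against table B.1 before use, and the function name is hypothetical.

```python
def scalefactor(index: int) -> float:
    """Scalefactor for a 6-bit index (hypothetical helper).

    Assumes the closed form 2 ** (1 - index / 3) and valid indices 0..62.
    """
    if not 0 <= index < 63:
        raise ValueError("scalefactor index out of range")
    return 2.0 ** (1 - index / 3)
```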

2.4.3.3.4 Requantization of subband samples

Next the coded samples are read. As can be seen from 2.4.1.6, the coded samples appear as triplets: each code contains three consecutive samples. Table B.4 "Layer II classes of quantization" specifies, for each subband, how many bits have to be read from the bitstream for one triplet. It also specifies whether this code consists of three consecutive separable codes, one per sample, or of one combined code for the three samples (grouping). In the latter case degrouping must be performed. The combined code has to be regarded as an unsigned integer, called 'c'. The following algorithm supplies the three separate codes s[0], s[1] and s[2].

for (i = 0; i < 3; i++) {
    s[i] = c % nlevels;   /* code of sample i */
    c = c / nlevels;      /* integer division (DIV) */
}

where nlevels is the number of steps (quantization levels), as given in table B.2 "Layer II bit allocation tables".

The first bit of each of the three codes has to be inverted, and the resulting numbers should be regarded as two's complement fractional numbers s''', where the MSB represents the value -1. The requantized values s'' are obtained by applying a linear formula:

$$s'' = C \cdot \left(s''' + D\right)$$

The values of the constants C and D are given in table B.4 "Layer II classes of quantization". The requantized values have to be rescaled. As described above, the multiplication factors can be found in table B.1 "Layer I, II scalefactors". The rescaled value s' is calculated as:

$$s' = \text{factor} \cdot s''$$
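Degrouping, bit inversion and the linear formula can be chained as in the sketch below. The function names are hypothetical, and C, D and factor are passed in rather than taken from tables B.4 and B.1. For a degrouped code, `nbits` is assumed to be the smallest width that holds nlevels - 1 (for example 2 bits for 3 levels and 3 bits for 5 levels).

```python
from typing import List


def degroup(c: int, nlevels: int) -> List[int]:
    """Split a grouped code c into the three sample codes s[0], s[1], s[2] (hypothetical helper)."""
    s = []
    for _ in range(3):
        s.append(c % nlevels)  # code of the next sample
        c //= nlevels          # integer division (DIV)
    return s


def requantize_layer2(code: int, nbits: int, C: float, D: float, factor: float) -> float:
    """Invert the first bit, read a two's complement fraction s''',
    then apply s'' = C * (s''' + D) and s' = factor * s'' (hypothetical helper)."""
    msb = 1 << (nbits - 1)
    code ^= msb               # invert the first bit
    if code & msb:            # the MSB weighs -1
        code -= 1 << nbits
    s3 = code / msb
    return factor * C * (s3 + D)
```

For example, `degroup(5, 3)` returns `[2, 1, 0]`, because 5 = 2 + 1 × 3 + 0 × 9.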

2.4.3.3.5 Synthesis subband filter

If a subband has no bits allocated to it, the samples in that subband are set to zero. Each time the subband samples for all 32 subbands of one channel have been calculated, they can be applied to the synthesis subband filter, and 32 consecutive audio samples can be calculated. For that purpose, the actions in the flow diagram of figure A.2 "Synthesis subband filter flow chart" have to be carried out. The coefficients Nik for the matrixing operation are given by

$$N_{ik} = \cos\left[\frac{(16 + i)(2k + 1)\pi}{64}\right], \quad i = 0, \dots, 63, \quad k = 0, \dots, 31$$

The coefficients Di for the windowing operation can be found in table B.3 "Coefficients Di of the synthesis window". One frame contains 36 × 32 = 1152 subband samples, which result, after filtering, in 1152 audio samples.

2.4.3.4 Layer III

Additional frequency resolution is provided by the use of a hybrid filterbank. Every subband is split into 18 frequency lines by means of an MDCT, whose window length is 36. Adaptive window switching is used to control time artifacts (pre-echoes); see the description in annex C. The frequency above which shorter blocks (better time resolution) are used can be selected. Depending on "mixed_block_flag", the part of the signal below a certain frequency is coded with better frequency resolution, and the part above it with better time resolution.

The frequency components are quantized using a nonuniform quantizer and coded using a Huffman encoder. The Huffman coder uses one of 18 different tables (see table B.7). A buffer is used to help enhance the coding efficiency of the Huffman coder and to help in the case of pre-echo conditions (see the description in annex C). The size of the input buffer is the size of one frame at the bitrate of 160 kbits/s per channel for Layer III. The short term buffer technique used is called "bit reservoir" because it uses short-term variable bitrate with a maximum integral offset from the mean bitrate.

Each frame holds the data from 2 granules. The audio data in a frame is allocated in the following way:

main_data_begin pointer

side info for both granules (scfsi)

side info granule 1

side info granule 2

The header and this part of the audio data constitute the side information stream.

scalefactors and Huffman code data granule 1

scalefactors and Huffman code data granule 2

ancillary data

These data constitute the main data stream. The main_data_begin pointer specifies a negative offset from the position of the first byte of the header.

2.4.3.4.1 Decoding

The first action is the synchronization of the decoder to the incoming bitstream. This is done as in the other layers. The header information (first 32 bits including syncword) is read in just as in the other layers. The information about sampling frequency is used to select the scalefactor_band table (see annex B.8).

2.4.3.4.2 Side information

The side information must be extracted from the bitstream and stored for use during the decoding of the associated frame. The table select information is used to select the Huffman decoder table and the number of ESC-bits (linbits), according to table B.7.

2.4.3.4.3 Start of main_data

The main_data (scalefactors, Huffman coded data and ancillary information) are not necessarily located adjacent to the side information, as shown in figure A.7.a and figure A.7.b. The beginning of the main data part is located by using the main_data_begin pointer of the current frame. The main data are allocated such that all of them are resident in the input buffer when the header of the next frame arrives in the input buffer. The decoder has to skip the header and side information when decoding the main data, and it knows their positions from the bitrate_index and padding_bit. The header is always 4 bytes long. The side information is 17 bytes long in single_channel mode and 32 bytes long in the other modes. Main data can span more than one block of header and side information (see figure A.7.b).
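One common way to realize this in a decoder is to strip the header, the optional CRC word and the side information from each frame, and to append the remaining bytes to a reservoir. The current frame's main data then starts main_data_begin bytes before the end of what was appended so far. This is a minimal sketch: the names are hypothetical, and a 2-byte CRC word is assumed when protection_bit is '0'.

```python
from typing import Optional

MAX_BACKLOG = 511  # largest byte offset a 9-bit main_data_begin can express


def main_data_payload(frame: bytes, single_channel: bool, has_crc: bool) -> bytes:
    """Bytes of one frame after header, optional CRC and side info (hypothetical helper)."""
    side_info = 17 if single_channel else 32
    return frame[4 + (2 if has_crc else 0) + side_info:]


class Reservoir:
    """Concatenates main-data bytes across frames (hypothetical helper)."""

    def __init__(self) -> None:
        self.buf = bytearray()

    def main_data(self, payload: bytes, main_data_begin: int) -> Optional[bytes]:
        """Return the bytes from the start of this frame's main data to the end
        of the current frame, or None if the pointer reaches back past the data
        received so far (for example right after a seek)."""
        available = len(self.buf)
        self.buf.extend(payload)
        result = None
        if main_data_begin <= available:
            result = bytes(self.buf[available - main_data_begin:])
        del self.buf[:-MAX_BACKLOG]  # keep only what a later pointer can reach
        return result
```

The returned bytes may extend into data that belongs to later frames. The decoder reads only as many bits as the side information announces for the current granules.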

2.4.3.4.4 Buffer considerations

The following rule can be used to calculate the maximum number of bits used for one granule. The buffer length is 7 680 bits, and this value is used as the maximum buffer at every bitrate. At the highest possible bitrate of Layer III (320 kbits/s per stereo signal) and a sampling frequency of 48 kHz, the mean frame length is (320 000/48 000) × 1 152 = 7 680 bits. The frames must therefore be of constant length at this bitrate and sampling frequency.

At 64 kbits/s per channel (128 kbits/s stereo) and a sampling frequency of 48 kHz, the mean granule length is (64 000/48 000) × 576 = 768 bits. A frame then carries 2 granules × 2 channels × 768 = 3 072 bits on average, so a maximum deviation (short-time buffer) of 7 680 - 4 × 768 = 4 608 bits is allowed at 64 kbits/s. The actual deviation is given by the number of bytes indicated by the main_data_begin pointer. Since this pointer is a 9-bit field, the actual maximum deviation is 2^9 × 8 bits = 4 096 bits. For intermediate bitrates the delay and buffer length can be calculated accordingly.

The buffer may be exchanged between the left and right channels of a stereo bitstream without restriction. Because of the constraint on the buffer size, main_data_begin is always set to 0 when bitrate_index == 14, i.e. at a data rate of 320 kbits/s per stereo signal. In this case all data are allocated between adjacent header words.

At sampling frequencies lower than 48 kHz the buffer should be constrained such that the same physical buffer size is sufficient as the one calculated for the 48 kHz case above.
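The arithmetic in this subsection can be checked with a few lines of Python. The function names are hypothetical. Capping the result at 4 096 bits reflects the limit on the main_data_begin pointer, as this sketch reads the text.

```python
SAMPLES_PER_FRAME = 1152
BUFFER_BITS = 7680                # maximum buffer used at every bitrate
POINTER_LIMIT_BITS = 2 ** 9 * 8   # 4096 bits reachable through main_data_begin


def mean_frame_bits(total_bitrate: int, sampling_frequency: int) -> float:
    """Mean Layer III frame length in bits for the total (all-channel) bitrate (hypothetical helper)."""
    return total_bitrate * SAMPLES_PER_FRAME / sampling_frequency


def max_deviation_bits(total_bitrate: int, sampling_frequency: int) -> float:
    """Short-time buffer allowed by the 7680-bit rule, capped by the pointer range (hypothetical helper)."""
    allowed = BUFFER_BITS - mean_frame_bits(total_bitrate, sampling_frequency)
    return min(allowed, POINTER_LIMIT_BITS)
```

At 320 kbit/s and 48 kHz the allowed deviation is 0, which matches main_data_begin always being 0 for bitrate_index 14. At 128 kbit/s stereo, the 7 680-bit rule alone would allow 4 608 bits, but the pointer limits this to 4 096.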

2.4.3.4.5 Scalefactors

The scalefactors are decoded according to the slen1 and slen2 which themselves are determined from the values of scalefac_compress. The decoded values can be used as entries into a table or used to calculate the factors for each scalefactor band directly. When decoding the second granule, the scfsi has to be considered. For the bands in which the corresponding scfsi is set to 1, the scalefactors of the first granule are also used for the second granule, therefore they are not transmitted for the second granule.

The number of bits used to encode the scalefactors is called part2_length and is calculated as follows.
For block_type == 0, 1 or 3 (long blocks):
part2_length = 11 × slen1 + 10 × slen2
For block_type == 2 (short blocks) and mixed_block_flag == 0:
part2_length = 18 × slen1 + 18 × slen2
For block_type == 2 (short blocks) and mixed_block_flag == 1:
part2_length = 17 × slen1 + 18 × slen2

These formulas are valid if gr==0 or if gr==1 and scfsi[ch][scfsi_band]==0 for all scfsi_bands, i.e. scalefactor selection information is not used.
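The sketch below computes part2_length and also covers the case gr == 1 with scfsi in use, by leaving out the scfsi_bands reused from the first granule. It relies on two tables that are assumptions here, to be checked against the standard. The first is the slen1/slen2 pairs for scalefac_compress 0 to 15. The second is the grouping of long-block scalefactor bands into scfsi_bands (0-5, 6-10, 11-15, 16-20). The function and table names are hypothetical.

```python
from typing import Sequence

# Assumed (slen1, slen2) pairs for scalefac_compress = 0..15 (hypothetical table).
SLEN = [(0, 0), (0, 1), (0, 2), (0, 3), (3, 0), (1, 1), (1, 2), (1, 3),
        (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3), (4, 2), (4, 3)]
# Assumed (number of scalefactor bands, 0 = slen1 / 1 = slen2) per scfsi_band,
# i.e. bands 0-5, 6-10, 11-15 and 16-20 (hypothetical table).
SCFSI_BANDS = [(6, 0), (5, 0), (5, 1), (5, 1)]


def part2_length(scalefac_compress: int, block_type: int, mixed_block_flag: int,
                 gr: int = 0, scfsi: Sequence[int] = (0, 0, 0, 0)) -> int:
    """Scalefactor bits in one granule of one channel (hypothetical helper)."""
    slen = SLEN[scalefac_compress]
    if block_type == 2:
        # Short blocks: 18 (17 with mixed blocks) slen1 fields and 18 slen2 fields.
        return (17 if mixed_block_flag else 18) * slen[0] + 18 * slen[1]
    if gr == 0:
        scfsi = (0, 0, 0, 0)  # scfsi only takes effect in the second granule
    # Long blocks: 11 * slen1 + 10 * slen2, minus the groups reused from granule 0.
    return sum(count * slen[which]
               for (count, which), reused in zip(SCFSI_BANDS, scfsi) if not reused)
```

With every scfsi bit at 0, the long-block sum reduces to 11 × slen1 + 10 × slen2, the first formula above.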

2.4.3.4.6 Huffman decoding

All necessary information, including the tables that realize the Huffman code trees, can be generated from table B.7. First the big_values data are decoded, using the tables numbered table_select[gr][ch][region]. The frequency lines in region 0, region 1 and region 2 are Huffman decoded in pairs until big_values line pairs have been decoded. The remaining Huffman code bits are decoded using the table selected by count1table_select[gr][ch]. Decoding continues until all Huffman code bits have been decoded or until quantized values representing 576 frequency lines have been decoded, whichever comes first. If there are more Huffman code bits than necessary to decode 576 values, they are regarded as stuffing bits and discarded. The variable count1 is implicitly derived as the number of quadruples of values decoded using count1table_select.
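The control flow of this step can be sketched without the actual tables of table B.7. In the sketch, `decode_pair` and `decode_quad` are hypothetical callbacks standing in for the table_select and count1table_select lookups (including any sign bits and linbits). `part3_end` is assumed to be the bit position where this granule's Huffman code bits end. `BitCursor` is likewise a hypothetical helper.

```python
from typing import Callable, List, Sequence, Tuple


class BitCursor:
    """Hypothetical MSB-first cursor over a sequence of 0/1 bits."""

    def __init__(self, bits: Sequence[int]):
        self.bits = bits
        self.pos = 0

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.bits[self.pos]
            self.pos += 1
        return value


def decode_granule(cursor: BitCursor, part3_end: int, big_values: int,
                   decode_pair: Callable[[BitCursor, int], Tuple[int, int]],
                   decode_quad: Callable[[BitCursor], Tuple[int, int, int, int]]
                   ) -> Tuple[List[int], int]:
    """Return the 576 quantized lines and the implicitly derived count1 (hypothetical helper)."""
    lines = [0] * 576
    i = 0
    # big_values region: pairs; the callback picks the table for the region of line i.
    for _ in range(big_values):
        lines[i], lines[i + 1] = decode_pair(cursor, i)
        i += 2
    # count1 region: quadruples until the code bits or the 576 lines run out.
    count1 = 0
    while cursor.pos < part3_end and i + 4 <= 576:
        lines[i:i + 4] = decode_quad(cursor)
        i += 4
        count1 += 1
    cursor.pos = max(cursor.pos, part3_end)  # remaining bits are stuffing
    return lines, count1
```

Lines beyond the count1 region stay zero, and count1 is never read from the bitstream. It simply falls out of the loop.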

2.4.3.4.7 Requantizer

2.4.3.4.10 Synthesis filterbank

Figure A.4 shows a block diagram including the synthesis filterbank. The frequency lines are preprocessed by the "alias reduction" scheme (see the block diagram in figure A.5, and table B.9 for the coefficients). They are then fed into the IMDCT matrix, with each group of 18 lines forming one transform block. The first half of the output values is added to the overlap values stored from the previous block. These sums are the new output values and serve as input values for the polyphase filterbank. The second half of the output values is stored for overlap with the next data granule. For every second subband of the polyphase filterbank, every second input value is multiplied by -1 to correct for the frequency inversion of the polyphase filterbank.
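The overlap-add and frequency-inversion steps can be sketched as follows. This assumes the IMDCT and windowing have already produced 36 values per subband, as for long blocks. The function name is hypothetical, and "every second input value" is taken to mean the odd-indexed samples of the odd subbands.

```python
from typing import List, Sequence


def overlap_and_invert(imdct_out: Sequence[Sequence[float]],
                       overlap: List[List[float]]) -> List[List[float]]:
    """Overlap-add plus frequency inversion for 32 subbands (hypothetical helper).

    imdct_out[sb]: 36 windowed IMDCT output values of subband sb.
    overlap[sb]:   18 values saved from the previous granule; replaced in
                   place by the second half of the current output.
    Returns 18 time samples per subband for the polyphase filterbank.
    """
    result = []
    for sb in range(32):
        out = imdct_out[sb]
        samples = [out[i] + overlap[sb][i] for i in range(18)]  # first half + stored overlap
        overlap[sb] = list(out[18:36])                          # keep second half
        if sb % 2 == 1:                                          # frequency inversion
            for i in range(1, 18, 2):
                samples[i] = -samples[i]
        result.append(samples)
    return result
```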

2.4.3.4.10.1 Alias reduction

For granules with long blocks (block_type != 2), the input to the synthesis filterbank is processed for alias reduction before the IMDCT. The following pseudo code describes the alias reduction computation:

/* xar[] starts as a copy of xr[]; lines outside the butterflies are unchanged */
for (sb = 1; sb < 32; sb++)
    for (i = 0; i < 8; i++) {
        xar[18*sb - 1 - i] = xr[18*sb - 1 - i] * Cs[i] - xr[18*sb + i] * Ca[i];
        xar[18*sb + i]     = xr[18*sb + i] * Cs[i] + xr[18*sb - 1 - i] * Ca[i];
    }

The indices of the arrays xar[] and xr[] label the frequency lines in a granule, ordered from lowest to highest frequency, with 0 the index of the lowest frequency line and 575 the index of the highest. The coefficients Cs[i] and Ca[i] can be found in table B.9. Figures A.5 and A.6 illustrate the alias reduction computation.

Alias reduction is not applied to granules with block_type == 2 (short blocks).
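For experimentation, the same computation can be written in Python. The ci values below are the ones commonly published for table B.9, with Cs[i] = 1/sqrt(1 + ci²) and Ca[i] = ci/sqrt(1 + ci²). Treat them as an assumption to be checked against the table. The function name is hypothetical.

```python
import math
from typing import List, Sequence

# Assumed ci values of table B.9; Cs and Ca are derived from them.
CI = [-0.6, -0.535, -0.33, -0.185, -0.095, -0.041, -0.0142, -0.0037]
CS = [1 / math.sqrt(1 + c * c) for c in CI]
CA = [c / math.sqrt(1 + c * c) for c in CI]


def alias_reduce(xr: Sequence[float]) -> List[float]:
    """Apply 8 butterflies at each of the 31 subband boundaries (hypothetical helper)."""
    xar = list(xr)  # lines outside the butterflies pass through unchanged
    for sb in range(1, 32):
        for i in range(8):
            lo, hi = 18 * sb - 1 - i, 18 * sb + i
            xar[lo] = xr[lo] * CS[i] - xr[hi] * CA[i]
            xar[hi] = xr[hi] * CS[i] + xr[lo] * CA[i]
    return xar
```

Since Cs[i]² + Ca[i]² = 1, each butterfly is a rotation, so the total energy of the 576 lines is preserved. This makes a simple sanity check.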

2.4.3.4.10.2 IMDCT

In the following, n is the number of windowed samples (n = 12 for short blocks, n = 36 for long blocks). For a block of type "short", each of the three short blocks is transformed separately. The n/2 values X_k are transformed into the n values x_i. The analytical expression of the IMDCT is:

$$x_i = \sum_{k=0}^{n/2-1} X_k \cos\left(\frac{\pi}{2n}\left(2i + 1 + \frac{n}{2}\right)(2k + 1)\right), \quad i = 0, \dots, n-1$$
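A direct O(n²) transcription of this formula can serve as a reference against which a fast implementation could be checked. The function name is hypothetical, and windowing and overlap-add are not included.

```python
import math
from typing import List, Sequence


def imdct(X: Sequence[float]) -> List[float]:
    """Direct IMDCT: n/2 inputs X_k -> n outputs x_i, n = 36 or 12 (hypothetical helper)."""
    n = 2 * len(X)
    return [
        sum(X[k] * math.cos(math.pi / (2 * n) * (2 * i + 1 + n / 2) * (2 * k + 1))
            for k in range(n // 2))
        for i in range(n)
    ]
```

The output shows the usual MDCT symmetries: the first half is antisymmetric about its centre (x[n/2 - 1 - i] = -x[i]) and the second half is symmetric about its centre.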
