ISO/IEC 11172-3:1993 - MP3 Coding Standard Translation (3)


Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s - Part 3: Audio

2.3.2 Mnemonics(助记符)

The following mnemonics are defined to describe the different data types used in the coded bitstream.

以下助记符用于描述编码比特流中使用的不同数据类型。

bslbf: Bit string, left bit first, where "left" is the order in which bit strings are written in ISO/IEC 11172. Bit strings are written as a string of 1s and 0s within single quote marks, e.g. '1000 0001'. Blanks within a bit string are for ease of reading and have no significance.

bslbf(Bit String, Left Bit First):
位字符串,采用左侧(高位)优先的编码顺序。其中“左侧”指 ISO/IEC 11172 标准中定义的位字符串书写顺序。位字符串以 单引号包裹的 1 和 0 组成的字符串形式表示(例如 '1000 0001')。
注意:位字符串内的空格仅为了便于阅读,无实际意义。

ch: Channel. If ch has the value 0, the left channel of a stereo signal or the first of two independent signals is indicated. (Audio)

用于表示声道。当ch的值为0时,表示立体声信号的左声道或两个独立信号中的第一个。

nch: Number of channels; equal to 1 for single-channel mode, 2 in other modes.

表示声道数,等于1时表示单声道模式,2表示其他模式

gr: Granule of 3 * 32 subband samples in audio Layer II, 18 * 32 sub-band samples in audio Layer III. (Audio)

表示颗粒。在音频层II中为3×32子带样本的颗粒,在音频层III中为18×32子带样本的颗粒。

main_data: The main-data portion of the bitstream contains the scalefactors, Huffman encoded data, and ancillary information. (Audio)

表示主数据,比特流的主数据部分包含比例因子、霍夫曼编码数据和附加辅助信息。

main_data_beg: The location in the bitstream of the beginning of the main-data for the frame. The location is equal to the ending location of the previous frame's main-data plus one bit. It is calculated from the main_data_end value of the previous frame. (Audio)

表示比特流中当前帧主数据的起始位置。该位置等于前一帧主数据结束位置加一比特,其值根据前一帧的 maindata-end 值计算得出。

part2_length:The number of main-data bits used for scalefactors. (Audio)

用于比例因子的主比特数。

rpchof:Remainder polynomial coefficients, highest order first. (Audio)

余数多项式系数,按最高阶优先排列。

sb:Subband. (Audio)

子带

sblimit: The number of the lowest sub-band for which no bits are allocated. (Audio)

未分配比特的最低子带的数量

scfsi:Scalefactor selection information. (Audio)

比例因子选择信息

switch_point_l:Number of scalefactor band (long block scalefactor band) from which point on window switching is used. (Audio)

从该点开始使用窗口切换的比例因子带(长块比例因子带)的数量。

switch_point_s:Number of scalefactor band (short block scalefactor band) from which point on window switching is used. (Audio)

从该点开始使用窗口切换的比例因子带(短块比例因子带)的数量。

uimsbf:Unsigned integer, most significant bit first.

无符号整数,最高有效位优先

vlclbf:Variable length code, left bit first, where "left" refers to the order in which the VLC codes are written.

可变长码,左位优先,其中“左”指的是VLC码的书写顺序

window:Number of the actual time slot in case of block_type==2, 0 ≤ window ≤ 2. (Audio)

当block_type == 2时,实际时隙的数量,0 ≤ window ≤ 2。

The byte order of multi-byte words is most significant byte first.

多字节字节的字节顺序是最高有效字节优先

2.4 Requirements

2.4.1 Specification of the coded audio bitstream syntax

2.4.1.1 Audio sequence
audio_sequence()
{
    while (nextbits()==syncword) {
        frame()
    }
}
2.4.1.2 Audio frame
frame()
{
    header()
    error_check()
    audio_data()
    ancillary_data()
}
2.4.1.3 Header
header()
{
    syncword                             12 bit  bslbf
    ID                                    1 bit  bslbf
    layer                                 2 bit  bslbf
    protection_bit                        1 bit  bslbf
    bitrate_index                         4 bit  bslbf
    sampling_frequency                    2 bit  bslbf
    padding_bit                           1 bit  bslbf         
    private_bit                           1 bit  bslbf
    mode                                  2 bit  bslbf
    mode_extension                        2 bit  bslbf
    copyright                             1 bit  bslbf
    original/copy                         1 bit  bslbf
    emphasis                              2 bit  bslbf
    
}
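The header() layout above can be unpacked field by field, most significant bit first. The following is a minimal sketch, not a reference decoder: the function name `parse_header` and the dictionary key `original_copy` (standing in for original/copy) are assumptions made for illustration, and the example bytes are an arbitrary hand-built header.

```python
# Field names and widths in bitstream order, taken from the header() syntax.
HEADER_FIELDS = [
    ("syncword", 12), ("ID", 1), ("layer", 2), ("protection_bit", 1),
    ("bitrate_index", 4), ("sampling_frequency", 2), ("padding_bit", 1),
    ("private_bit", 1), ("mode", 2), ("mode_extension", 2),
    ("copyright", 1), ("original_copy", 1), ("emphasis", 2),
]


def parse_header(header_bytes):
    """Split the 32-bit header into raw field values (hypothetical helper)."""
    if len(header_bytes) < 4:
        raise ValueError("need at least 4 bytes")
    # Multi-byte words are most significant byte first.
    word = int.from_bytes(header_bytes[:4], "big")
    fields = {}
    remaining = 32
    for name, width in HEADER_FIELDS:
        remaining -= width
        fields[name] = (word >> remaining) & ((1 << width) - 1)
    if fields["syncword"] != 0xFFF:
        raise ValueError("syncword '1111 1111 1111' not found")
    return fields


example = parse_header(bytes.fromhex("fffb9064"))
```

The values returned are the raw codes; mapping them to a layer, bitrate or sampling frequency requires the tables given in the header semantics.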
2.4.1.4 Error check

rpchof:Remainder polynomial coefficients, highest order first. (Audio)

通过多项式除法计算的余数系数,按最高阶优先的顺序排列。

error_check()
{
    if (protection_bit == 0)
       crc_check                       16 bit    rpchof
}
2.4.1.5 Audio data, Layer I

uimsbf: Unsigned integer, most significant bit first.

audio_data() {
    for (sb=0; sb < bound; sb++)
      for (ch=0; ch < nch; ch++)
          allocation[ch][sb]                 4 bit  uimsbf
    for (sb=bound; sb < 32; sb++) {
       allocation[0][sb]                     4 bit  uimsbf
       allocation[1][sb]=allocation[0][sb]
    }
    for (sb=0; sb<32; sb++) 
      for (ch=0; ch < nch; ch++)
        if (allocation[ch][sb] != 0)
            scalefactor[ch][sb]              6 bit  uimsbf
            
    for (s=0; s<12; s++) {
        for (sb=0; sb < bound; sb++)
           for (ch=0; ch < nch; ch++)
              if(allocation[ch][sb]!=0)
                sample[ch][sb][s]            2 .. 15 bit  uimsbf
        for (sb=bound; sb < 32; sb++)
           if (allocation[0][sb] != 0)
               sample[0][sb][s]              2 .. 15 bit  uimsbf
    }
}
2.4.1.6 Audio data, Layer II
audio_data() {
  for (sb=0; sb<bound; sb++) 
    for (ch=0; ch<nch; ch++) 
      allocation[ch][sb]                             2..4    uimsbf
      
  for (sb=bound; sb<sblimit; sb++) { 
    allocation[0][sb]                                2..4    uimsbf
    allocation[1][sb]=allocation[0][sb]
  }
    
  for (sb=0; sb<sblimit; sb++) 
    for (ch=0; ch<nch; ch++) 
      if (allocation[ch][sb]!=0) 
        scfsi[ch][sb]                                   2    bslbf
        
  for (sb=0; sb<sblimit; sb++) 
    for (ch=0; ch<nch; ch++) 
      if (allocation[ch][sb]!=0) { 
        if (scfsi[ch][sb]==0) { 
          scalefactor[ch][sb][0]                        6   uimsbf
          scalefactor[ch][sb][1]                        6   uimsbf
          scalefactor[ch][sb][2]                        6   uimsbf
        }
        if ((scfsi[ch][sb]==1) || (scfsi[ch][sb]==3)) {
          scalefactor[ch][sb][0]                        6   uimsbf
          scalefactor[ch][sb][2]                        6   uimsbf
        }
        if (scfsi[ch][sb]==2) 
          scalefactor[ch][sb][0]                        6   uimsbf
      }
   for(gr=0; gr<12; gr++) {
     for(sb=0;sb<bound;sb++)
       for(ch=0;ch<nch;ch++)
         if(allocation[ch][sb]!=0) {
           if(grouping[ch][sb])
             samplecode[ch][sb][gr]                 5..10   uimsbf
           else
             for(s=0;s<3;s++)
               sample[ch][sb][3*gr+s]               3..16   uimsbf
         }
       for(sb=bound;sb<sblimit;sb++)
         if(allocation[0][sb]!=0) {
           if(grouping[0][sb])
             samplecode[0][sb][gr]                  5..10   uimsbf
           else
             for(s=0;s<3;s++)
               sample[0][sb][3*gr+s]                3..16   uimsbf
         }
   }

}
2.4.1.7 Audio data, Layer III
audio_data() {  
  main_data_begin                                    9 uimsbf  
  if (mode==single_channel) {  
    private_bits                                     5 bslbf  
  } else {  
    private_bits                                     3 bslbf  
  }  
  for (ch=0; ch<nch; ch++) {  
    for (scfsi_band=0; scfsi_band<4; scfsi_band++) {  
      scfsi[ch][scfsi_band]                           1 bslbf  
    }  
  }  
  for (gr=0; gr<2; gr++)  
    for (ch=0; ch<nch; ch++) {  
      part2_3_length[gr][ch]                          12 uimsbf  
      big_values[gr][ch]                               9 uimsbf  
      global_gain[gr][ch]                              8 uimsbf  
      scalefac_compress[gr][ch]                        4 bslbf  
      window_switching_flag[gr][ch]                    1 bslbf  
      if (window_switching_flag[gr][ch]) {  
        block_type[gr][ch]                             2 bslbf  
        mixed_block_flag[gr][ch]                       1 uimsbf  
        for (region=0; region<2; region++) {  
          table_select[gr][ch][region]                 5 bslbf  
        }  
        for (window=0; window<3; window++) {  
          subblock_gain[gr][ch][window]                3 uimsbf  
        }  
      } else {  
        for (region=0; region<3; region++) {  
           table_select[gr][ch][region]             5 bslbf  
        }  
        region0_count[gr][ch]                       4 bslbf  
        region1_count[gr][ch]                       3 bslbf  
     }  
     preflag[gr][ch]                                1 bslbf  
     scalefac_scale[gr][ch]                         1 bslbf  
     count1table_select[gr][ch]                     1 bslbf   
   }  
   main_data ()  
}

The main data bitstream is defined below. The main_data field in the audio_data() syntax contains bytes from the main_data bitstream. However, because of the variable nature of Huffman coding used in Layer III, the main data for a frame does not generally follow the header and side information for that frame. The main_data for a frame starts at a location in the bitstream preceding the header of the frame at a negative offset given by the value of main_data_begin. (See definition of main_data_begin and figure A.7.a)

主数据比特流定义如下。audio_data()语法中的main_data字段包含来自主数据比特流的字节。然而,由于在MPEG音频编码的第三层(Layer III)中使用的哈夫曼编码具有可变的特性,一帧的主数据通常并不跟随该帧的帧头和边信息。一帧的主数据起始于比特流中该帧帧头之前的一个位置,该位置由main_data_begin的值给出的负偏移量确定。(参见main_data_begin的定义以及图A.7.a)

main_data()  
{  
  for (gr=0; gr<2; gr++) {  
    for (ch=0; ch<nch; ch++) {  
      if ((window_switching_flag[gr][ch]==1)  
           && (block_type[gr][ch]==2)) {  
         if (mixed_block_flag[gr][ch]) {  
           for (sfb=0; sfb<8; sfb++)  
             scalefac_l[gr][ch][sfb]                   0..4 uimsbf  
           for (sfb=3; sfb<12; sfb++)  
             for (window=0; window<3; window++)  
               scalefac_s[gr][ch][sfb][window]         0..4 uimsbf  
         }  
         else {  
           for (sfb=0; sfb<12; sfb++)  
             for (window=0; window<3; window++)  
               scalefac_s[gr][ch][sfb][window]         0..4 uimsbf  
         }  
       }  
       else {  
         if ((scfsi[ch][0]==0) || (gr == 0))  
           for(sfb=0;sfb<6;sfb++)  
             scalefac_l[gr][ch][sfb]                   0..4 uimsbf  
         if ((scfsi[ch][1]==0) || (gr == 0))  
            for(sfb=6;sfb<11;sfb++)  
               scalefac_l[gr][ch][sfb]                 0..4 uimsbf  
         if ((scfsi[ch][2]==0) || (gr == 0))  
            for(sfb=11;sfb<16;sfb++)  
               scalefac_l[gr][ch][sfb]                 0..3 uimsbf  
         if ((scfsi[ch][3]==0) || (gr == 0))  
            for(sfb=16;sfb<21;sfb++)  
              scalefac_l[gr][ch][sfb]                  0..3 uimsbf
      }
      Huffmancodebits()  
    }  
  }  
  for (b=0; b<no_of_ancillary_bits; b++)  
    ancillary_bit 1 bslbf  
}
Huffmancodebits() {
    for(l = 0; l < big_values*2; l += 2) {
        hcod[|x|][|y|]                         0..19 bslbf
        if (|x|==15 && linbits>0)
            linbitsx                           1..13 uimsbf
        if (x != 0)
            signx                                  1 bslbf
        if (|y|==15 && linbits>0)
            linbitsy                           1..13 uimsbf
        if(y!=0)
            signy                                   1 bslbf
        is[l] = x
        is[l+1] = y
    }
    for (; l < big_values*2 + count1 * 4; l += 4) {
        hcod[|v|][|w|][|x|][|y|]                1..6 bslbf
        if(v!=0)
            signv                                  1 bslbf
        if(w!=0)
            signw                                  1 bslbf
        if(x!=0)
            signx                                  1 bslbf
        if(y!=0)
            signy                                  1 bslbf
        is[l] = v
        is[l+1] = w
        is[l+2] = x
        is[l+3] = y
    }
    for (; l < 576; l++)
        is[l] = 0
}
2.4.1.8 Ancillary data
ancillary_data() {
  if ((layer == 1) || (layer == 2))
    for (b=0; b<no_of_ancillary_bits;b++)
      ancillary_bit                              1 bslbf
}

2.4.2 Semantics for the audio bitstream syntax

音频比特流语法的语义

2.4.2.1 Audio sequence general

frame -- Layer I and Layer II: Part of the bitstream that is decodable by itself. In Layer I it contains information for 384 samples and in Layer II for 1 152 samples. It starts with a syncword, and ends just before the next syncword. It consists of an integer number of slots (four bytes in Layer I, one byte in Layer II).

帧--第I层和第II层: 比特流中可自行解码的部分。在第I层中,它包含384个样本的信息,在第II层中包含1152个样本的信息。它以同步码开头,在下一个同步码之前结束。它由整数个时隙组成(第I层中为四字节,第II层中为一字节)。

-- Layer III: Part of the bitstream that is decodable with the use of previously acquired main information. In Layer III it contains information for 1 152 samples. Although the distance between the start of consecutive syncwords is an integer number of slots (one byte in Layer III), the audio information belonging to one frame is generally not contained between two successive syncwords.

帧 - 第III层: 借助先前获取的主要信息可解码的比特流部分。在第III层中,它包含1152个样本的信息。尽管连续同步码起始之间的距离是整数个时隙(第III层中为一字节),但属于一帧的音频信息通常并不包含在两个连续的同步码之间。

2.4.2.2 Audio frame

header -- Part of the bitstream containing synchronization and state information.

error_check -- Part of the bitstream containing information for error detection.

audio_data -- Part of the bitstream containing information on the audio samples.

ancillary_data -- Part of the bitstream that may be used for ancillary data.

  • 帧头:比特流中包含同步和状态信息的部分。
  • 错误校验:比特流中包含错误检测信息的部分。
  • 音频数据:比特流中包含音频样本信息的部分。
  • 辅助数据:比特流中可用于辅助数据的部分。

2.4.2.3 Header

The first 32 bits (four bytes) are header information which is common to all layers.

前32位(四个字节)是所有层通用的头部信息。

syncword -- The bit string '1111 1111 1111'.

ID -- One bit to indicate the ID of the algorithm. Equals '1' for ISO/IEC 11172-3 audio, '0' is reserved.

Layer -- 2 bits to indicate which layer is used, according to the following.

同步字 -- 位串‘1111 1111 1111’。

ID -- 一位比特,用于指示算法的ID。对于ISO/IEC 11172 - 3音频,其值为‘1’,‘0’保留。

-- 两位比特,用于指示使用的是哪一层,具体如下:

| code | layer |
|---|---|
| '11' | Layer I |
| '10' | Layer II |
| '01' | Layer III |
| '00' | reserved |

To change the layer, a reset of the audio decoder may be required.

要更改层,可能需要重置音频解码器。

protection_bit -- One bit to indicate whether redundancy has been added in the audio bitstream to facilitate error detection and concealment. Equals '1' if no redundancy has been added, '0' if redundancy has been added.

保护比特 -- 一位比特,用于指示在音频比特流中是否添加了冗余以便于错误检测和隐藏。如果未添加冗余,其值为‘1’;如果已添加冗余,其值为‘0’。

bitrate_index -- Indicates the bitrate. The all zero value indicates the 'free format' condition, in which a fixed bitrate which does not need to be in the list can be used. Fixed means that a frame contains either N or N+1 slots, depending on the value of the padding bit. The bitrate_index is an index to a table, which is different for the different layers.

比特率索引 -- 表示比特率。全零值表示“自由格式”情况,在这种情况下可以使用不需要在列表中的固定比特率。固定意味着一帧包含N个或N + 1个时隙,具体取决于填充比特的值。比特率索引是一个索引,指向不同层不同的表

The bitrate_index indicates the total bitrate irrespective of the mode (stereo, joint_stereo, dual_channel, single_channel).

比特率索引表示总比特率,与模式(立体声、联合立体声、双声道、单声道)无关。

[Table: bitrate specified for each bitrate_index value in Layer I, II and III; image not reproduced]
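Because the bitrate table is only present as an image in this copy, a lookup sketch may help. The values below are recalled from the bitrate table of ISO/IEC 11172-3 rather than taken from the text above, so they should be checked against the standard before use. The names `BITRATE_KBPS` and `bitrate_kbps` are hypothetical, and `None` is used here to stand for the free format condition.

```python
# Bitrates in kbit/s, indexed by bitrate_index 0..14 for each layer.
# Assumption: values recalled from the standard's table, not from this text.
BITRATE_KBPS = {
    1: [None, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448],
    2: [None, 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384],
    3: [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320],
}


def bitrate_kbps(layer, bitrate_index):
    """Return the total bitrate in kbit/s, or None for free format."""
    if layer not in BITRATE_KBPS:
        raise ValueError("layer must be 1, 2 or 3")
    if not 0 <= bitrate_index <= 15:
        raise ValueError("bitrate_index is a 4-bit field")
    if bitrate_index == 15:
        # The all-ones code is not a valid bitrate.
        raise ValueError("bitrate_index '1111' is forbidden")
    return BITRATE_KBPS[layer][bitrate_index]
```

The top entry of each list matches the per-layer maximum bitrates quoted in the text (448, 384 and 320 kbit/s).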

In order to provide the smallest possible delay and complexity, the decoder is not required to support a continuously variable bitrate when in Layer I or II. Layer III supports variable bitrate by switching the bitrate_index. The switching of the bitrate_index can be used either to optimize storage requirements on DSM or to interpolate any mean data rate by switching between nearby values in the bitrate table. However, in free format, fixed bitrate is required. The decoder is also not required to support bitrates higher than 448 kbits/s, 384 kbits/s, 320 kbits/s in respect to Layer I, II and III when in free format mode.

为了提供尽可能小的延迟和复杂度,解码器在第I层或第II层时无需支持连续可变的比特率。第III层通过切换比特率索引来支持可变比特率。比特率索引的切换既可用于优化数字存储媒体(DSM)上的存储需求,也可通过在比特率表中附近值之间切换来插值任何平均数据速率。然而,在自由格式下,需要固定比特率。在自由格式模式下,对于第I、II和III层,解码器也无需支持高于448千比特/秒、384千比特/秒、320千比特/秒的比特率。

For Layer II, not all combinations of total bitrate and mode are allowed. See the following table.

对于第II层,并非所有总比特率和模式的组合都是允许的。请参阅以下表格。

[Table: allowed combinations of total bitrate and mode for Layer II; image not reproduced]

sampling_frequency -- Indicates the sampling frequency, according to the following table.

采样率 -- 表示采样率,参考下面的表格:

| sampling_frequency | frequency specified (kHz) |
|---|---|
| '00' | 44.1 |
| '01' | 48 |
| '10' | 32 |
| '11' | reserved |

A reset of the audio decoder may be required to change the sampling rate.

要更改采样率,可能需要重置音频解码器。

padding_bit -- If this bit equals '1', the frame contains an additional slot to adjust the mean bitrate to the sampling frequency, otherwise this bit will be '0'. Padding is necessary with a sampling frequency of 44.1 kHz. Padding may also be required in free format.

填充位(padding_bit) -- 若此位为 '1',表示帧中包含额外的时隙以将平均比特率调整至采样频率;若为 '0' 则不包含。当采样频率为44.1 kHz时必须使用填充。在自由格式(free format)中也可能需要填充。

The padding should be applied to the bitstream such that the accumulated length of the coded frames, after a certain number of audio frames does not deviate more than (+0, -1 slot) from the following computed value:

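$$\text{accumulated frame length} = \sum_{\text{first frame}}^{\text{current frame}} \frac{\text{frame\_size} \times \text{bitrate}}{\text{sampling\_frequency}}$$

where: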

  • frame_size = 384 for Layer I, 1152 for Layer II or III.

The following method can be used to determine whether or not to use padding:

填充应应用于比特流,使得一定数量的音频帧之后编码帧的累积长度与以下计算值的偏差不超过(+0, -1 槽) 使用上面的公式计算累计帧长,其中:

  • 对于Layer I,frame_size = 384;
  • 对于Layer II或III,frame_size = 1152。

for 1st audio frame:
    rest = 0;
    padding = no;

for each subsequent audio frame:
    if (Layer == 1)
        dif = (12 * bitrate) % sampling_frequency;
    else
        dif = (144 * bitrate) % sampling_frequency;
    rest = rest - dif;
    if (rest < 0) {
        padding = yes;
        rest = rest + sampling_frequency;
    }
    else
        padding = no;

可使用以下方法来确定是否使用填充:

对于第一个音频帧:

  • rest = 0;
  • padding = 否;

对于每个后续音频帧:

如果(Layer == 1)
  dif = (12 * bitrate) % sampling_frequency;
否则
  dif = (144 * bitrate) % sampling_frequency;
  
rest = rest - dif;

如果(rest < 0){
   padding = 是;
   rest = rest + sampling_frequency; 
} 否则
  padding = 否;
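The padding method above translates almost line for line into code. The sketch below assumes that bitrate is given in bit/s and sampling_frequency in Hz, so that both are integers; the function name `padding_flags` is hypothetical.

```python
def padding_flags(layer, bitrate, sampling_frequency, n_frames):
    """Return one padding decision (True = padded) per frame.

    Assumes bitrate in bit/s and sampling_frequency in Hz.
    """
    # 12 slots-worth factor for Layer I, 144 for Layer II and III.
    factor = 12 if layer == 1 else 144
    dif = (factor * bitrate) % sampling_frequency
    rest = 0
    flags = []
    for frame_number in range(n_frames):
        if frame_number == 0:
            # First audio frame: rest = 0, padding = no.
            flags.append(False)
            continue
        rest -= dif
        if rest < 0:
            flags.append(True)
            rest += sampling_frequency
        else:
            flags.append(False)
    return flags


# 128 kbit/s at 44.1 kHz, Layer III: most frames need the extra slot.
flags_44k = padding_flags(3, 128000, 44100, 26)
# 128 kbit/s at 48 kHz: the frame length is a whole number of slots.
flags_48k = padding_flags(3, 128000, 48000, 10)
```

At 48 kHz `dif` is zero, so no frame is ever padded, which matches the statement that padding is necessary at 44.1 kHz.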

private_bit -- Bit for private use. This bit will not be used in the future by ISO/IEC.

私有位(private-bit)-- 保留位,供内部使用。此位未来不会被ISO/IEC使用。

mode -- Indicates the mode according to the following table. In Layer I and II the joint_stereo mode is intensity_stereo; in Layer III it is intensity_stereo and/or ms_stereo.

| mode | mode specified |
|---|---|
| '00' | stereo |
| '01' | joint_stereo (intensity_stereo and/or ms_stereo) |
| '10' | dual_channel |
| '11' | single_channel |

In Layer I, in all modes except joint_stereo, the value of bound equals 32. In Layer II, in all modes except joint_stereo, the value of bound equals sblimit. In joint_stereo mode the bound is determined by the mode_extension.

模式 -- 根据下表指示模式。在第一层和第二层中,联合立体声模式为强度立体声;在第三层中,它为强度立体声和/或MS立体声。

| 模式 | 指定的模式 |
|---|---|
| '00' | 立体声 |
| '01' | 联合立体声(强度立体声和/或MS立体声) |
| '10' | 双声道 |
| '11' | 单声道 |

在第一层中,除联合立体声模式外的所有模式下,边界值等于32。在第二层中,除联合立体声模式外的所有模式下,边界值等于sblimit。在联合立体声模式下,边界值由模式扩展决定。

mode_extension -- These bits are used in joint_stereo mode. In Layer I and II they indicate which subbands are in intensity_stereo. All other subbands are coded in stereo.

| mode_extension | subbands in intensity_stereo | bound |
|---|---|---|
| '00' | 4-31 | 4 |
| '01' | 8-31 | 8 |
| '10' | 12-31 | 12 |
| '11' | 16-31 | 16 |

In Layer III they indicate which type of joint stereo coding method is applied. The frequency ranges over which the intensity_stereo and ms_stereo modes are applied are implicit in the algorithm. For more information see 2.4.3.4.

| mode_extension | intensity_stereo | ms_stereo |
|---|---|---|
| '00' | off | off |
| '01' | on | off |
| '10' | off | on |
| '11' | on | on |

Note that the mode "stereo" is used if the mode bits specify stereo, or equivalently if the mode bits specify joint_stereo and the mode_extension specifies intensity_stereo "off" and ms_stereo "off".

mode_extension -- 这些比特位用于联合立体声模式。在层I和层II中,它们指示哪些子带处于强度立体声模式。所有其他子带以立体声模式编码。

mode_extension
'00'强度立体声模式下的子带为4 - 31,边界 = 4
'01'强度立体声模式下的子带为8 - 31,边界 = 8
'10'强度立体声模式下的子带为12 - 31,边界 = 12
'11'强度立体声模式下的子带为16 - 31,边界 = 16

在层III中,它们指示应用了哪种类型的联合立体声编码方法。强度立体声和MS立体声模式应用的频率范围隐含在算法中。更多信息请参见2.4.3.4。

mode_extensionintensity_stereo(强度立体声)ms_stereo(MS立体声)
'00'关闭关闭
'01'开启关闭
'10'关闭开启
'11'开启开启

需要注意的是,当模式位指定为立体声(stereo),或等效地,当模式位指定为联合立体声(joint stereo)且模式扩展中强度立体声(intensity-stereo)设为‘关闭’、MS立体声(ms-stereo)也设为‘关闭’时,系统将使用‘立体声’模式。
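The two mode_extension tables can be combined into a single decoding helper. This is a minimal sketch; the function name `decode_mode_extension` and the shape of the returned dictionaries are assumptions made for illustration, and the helper is only meaningful when mode is joint_stereo.

```python
def decode_mode_extension(layer, mode_extension):
    """Interpret the 2-bit mode_extension field for joint_stereo frames."""
    if layer not in (1, 2, 3):
        raise ValueError("layer must be 1, 2 or 3")
    if not 0 <= mode_extension <= 3:
        raise ValueError("mode_extension is a 2-bit field")
    if layer in (1, 2):
        # '00' -> bound 4, '01' -> 8, '10' -> 12, '11' -> 16.
        bound = 4 * (mode_extension + 1)
        return {"bound": bound}
    # Layer III: the low bit switches intensity_stereo, the high bit ms_stereo.
    return {
        "intensity_stereo": bool(mode_extension & 0b01),
        "ms_stereo": bool(mode_extension & 0b10),
    }
```

For Layer I and II, subbands from `bound` upward are in intensity_stereo and the lower ones are coded in stereo.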

copyright -- If this bit equals '0', there is no copyright on the ISO/IEC 11172-3 bitstream, '1' means copyright protected.

版权 -- 如果此位等于‘0’,则ISO/IEC 11172 - 3比特流没有版权;‘1’表示受版权保护。

original/copy -- This bit equals '0' if the bitstream is a copy, '1' if it is an original.

原版/复制版 -- 如果比特流是复制的,此位等于‘0’;如果是原版的,则等于‘1’。

emphasis -- Indicates the type of de-emphasis that shall be used.

| emphasis | emphasis specified |
|---|---|
| '00' | none |
| '01' | 50/15 microseconds |
| '10' | reserved |
| '11' | CCITT J.17 |

加重 -- 指示应使用的去加重类型。

| 加重 | 加重类型 |
|---|---|
| '00' | 无 |
| '01' | 50/15微秒 |
| '10' | 保留 |
| '11' | CCITT J.17 |

2.4.2.4 Error check

crc_check -- A 16 bit parity-check word is used for optional error detection within the encoded bitstream.

2.4.2.5 Audio data, Layer I

allocation[ch][sb] -- Indicates the number of bits used to code the samples in subband sb of channel ch. For subbands in intensity_stereo mode the bitstream contains only one allocation data element per subband.

| allocation[ch][sb] | bits per sample |
|---|---|
| 0 | 0 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |
| 5 | 6 |
| 6 | 7 |
| 7 | 8 |
| 8 | 9 |
| 9 | 10 |
| 10 | 11 |
| 11 | 12 |
| 12 | 13 |
| 13 | 14 |
| 14 | 15 |
| 15 | forbidden |

Note: For code '0000' no samples are transferred.

scalefactor[ch][sb] -- Indicates the factor of subband sb of channel ch by which the requantized samples of subband sb in channel ch shall be multiplied. The six bits constitute an unsigned integer, index to table B.1 "Layer I, II scalefactors".

sample[ch][sb][s] -- Coded representation of the s-th sample in subband sb of channel ch. For subbands in intensity_stereo mode the coded representation of the sample is valid for both channels.

allocation[ch][sb] -- 表示用于对声道ch中子带sb的样本进行编码的比特数。对于处于强度立体声模式的子带,比特流中每个子带仅包含一个分配数据元素。

allocation[ch][sb]每样本的比特数
00
12
23
34
45
56
67
78
89
910
1011
1112
1213
1314
1415
15禁止

注:对于代码“0000”,不传输样本。
scalefactor[ch][sb] -- 表示声道ch中子带sb的因子,声道ch中子带sb的重量化样本应乘以该因子。这六个比特构成一个无符号整数,是表B.1“第I层、II层比例因子”的索引。
sample[ch][sb][s] -- 声道ch中子带sb的第s个样本的编码表示。对于处于强度立体声模式的子带,样本的编码表示对两个声道都有效。
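The Layer I allocation table follows a simple rule: code 0 means no samples, codes 1 to 14 mean one bit more than the code, and code 15 is forbidden. A small sketch of that rule, with a hypothetical function name:

```python
def layer1_bits_per_sample(allocation):
    """Map a 4-bit Layer I allocation code to the sample width in bits."""
    if not 0 <= allocation <= 15:
        raise ValueError("allocation is a 4-bit field")
    if allocation == 15:
        raise ValueError("allocation code 15 is forbidden")
    if allocation == 0:
        # Code '0000': no samples are transferred for this subband.
        return 0
    return allocation + 1
```

This matches the 2 .. 15 bit range given for sample[ch][sb][s] in the Layer I syntax.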

2.4.2.6 Audio data,Layer II

allocation[ch][sb] -- Contains information concerning the quantizers used for the samples in subband sb in channel ch, whether the information on three consecutive samples has been grouped to one code, and on the number of bits used to code the samples. The meaning and length of this field depends on the number of the subband, the bitrate, and the sampling frequency. The bits in this field form an unsigned integer used as an index to the relevant table in table B.2 "Layer II bit allocation tables", which gives the number of levels used for quantization. For subbands in intensity_stereo mode the bitstream contains only one allocation data element per subband.

allocation[ch][sb] ——包含有关通道ch中子带sb的样本所使用的量化器的信息,三个连续样本的信息是否已被组合为一个代码,以及用于对样本进行编码的比特数。该字段的含义和长度取决于子带的数量、比特率和采样频率。该字段中的比特构成一个无符号整数,用作表B.2“Layer II比特分配表”中相关表的索引,该表给出了用于量化的级别数量。对于处于强度立体声模式的子带,比特流中每个子带仅包含一个分配数据元素。

scfsi[ch][sb] -- Scalefactor selection information. This gives information on the number of scalefactors transferred for subband sb in channel ch and for which parts of the signal in this frame they are valid. The frame is divided into three equal parts of 12 subband samples each per subband.

| scfsi[ch][sb] | meaning |
|---|---|
| '00' | three scalefactors transmitted, for parts 0, 1, 2 respectively. |
| '01' | two scalefactors transmitted, first one valid for parts 0 and 1, second one for part 2. |
| '10' | one scalefactor transmitted, valid for all three parts. |
| '11' | two scalefactors transmitted, first one valid for part 0, the second one for parts 1 and 2. |

scfsi[ch][sb] -- 比例因子选择信息。这提供了有关为通道ch中的子带sb传输的比例因子数量,以及它们在本帧信号中的哪些部分有效的信息。本帧被划分为三个相等的部分,每个子带每部分有12个样本。

scfsi[ch][sb]
'00'传输三个比例因子,分别对应部分0、1、2。
'01'传输两个比例因子,第一个对部分0和1有效,第二个对部分2有效。
'10'传输一个比例因子,对所有三个部分都有效。
'11'传输两个比例因子,第一个对部分0有效,第二个对部分1和2有效。
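The Layer II scfsi table describes how the transmitted scalefactors are spread over the three parts of the frame. The sketch below expands them into one scalefactor index per part; the names `SCFSI_PATTERN` and `expand_scalefactors` are hypothetical.

```python
# For each scfsi code: which transmitted scalefactor applies to parts 0, 1, 2.
SCFSI_PATTERN = {
    0: (0, 1, 2),  # three scalefactors, one per part
    1: (0, 0, 1),  # first for parts 0 and 1, second for part 2
    2: (0, 0, 0),  # one scalefactor for all three parts
    3: (0, 1, 1),  # first for part 0, second for parts 1 and 2
}


def expand_scalefactors(scfsi, transmitted):
    """Return the scalefactor index used for each of the three frame parts."""
    if scfsi not in SCFSI_PATTERN:
        raise ValueError("scfsi is a 2-bit field")
    pattern = SCFSI_PATTERN[scfsi]
    expected = max(pattern) + 1
    if len(transmitted) != expected:
        raise ValueError(f"scfsi {scfsi} expects {expected} scalefactor(s)")
    return [transmitted[i] for i in pattern]
```

The number of values expected for each code (3, 2, 1, 2) is also the number of 6-bit scalefactor fields a decoder reads from the bitstream for that subband.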

scalefactor[ch][sb][p] -- Indicates the factor by which the requantized samples of subband sb in channel ch and of part p of the frame should be multiplied. The six bits constitute an unsigned integer, index to table B.1 "Layer I, II scalefactors".

scalefactor[ch][sb][p] -- 指示通道ch中子带sb和帧的p部分的重新量化样本应乘以的系数。六个比特构成一个无符号整数,是表B.1“层I、II比例因子”的索引。

grouping[ch][sb] -- Is a function that determines, whether grouping is in effect for coding of samples in subband sb of channel ch. Grouping means, that three consecutive samples (a triplet) of the current subband sb in channel ch in the current granule gr are coded and transmitted using one common codeword and not using three separate codewords. Grouping[ch][sb] is true, if in the Bit Allocation table currently in use (see B.2) the value found under the sb (row) and the allocation[sb] (column) is either 3, 5, or 9. Otherwise it is false. For subbands in intensity_stereo mode the grouping is valid for both channels.

grouping[ch][sb] -- 是一个函数,用于确定通道ch中子带sb的样本编码是否采用分组。分组意味着当前颗粒gr中通道ch的当前子带sb的三个连续样本(一个三元组)使用一个公共码字编码和传输,而不是三个单独码字。如果在当前使用的位分配表(见B.2)中,sb(行)和allocation[sb](列)下的值为3、5或9,则grouping[ch][sb]为真。否则为假。在强度立体声模式的子带中,分组对两个通道都有效。

samplecode[ch][sb][gr] -- Coded representation of the three consecutive samples in the granule gr in subband sb of channel ch. For subbands in intensity_stereo mode the coded representation of the samplecode is valid for both channels.

samplecode[ch][sb][gr] -- 通道ch中子带sb的颗粒gr中三个连续样本的编码表示。在强度立体声模式的子带中,样本code的编码表示对两个通道都有效。

sample[ch][sb][s] -- Coded representation of the s-th sample in subband sb of channel ch. For subbands in intensity_stereo mode the coded representation of the sample is valid for both channels.

sample[ch][sb][s] -- 通道ch中子带sb的第s个样本的编码表示。在强度立体声模式的子带中,样本的编码表示对两个通道都有效。

2.4.2.7 Audio data, Layer III

main_data_begin -- The value of main_data_begin is used to determine the location of the first bit of main data of a frame. The main_data_begin value specifies the location as a negative offset in bytes from the first byte of the audio sync word. The number of bytes belonging to the header and side information is not taken into account. For example, if main_data_begin == 0, then main data starts after the side information. Examples are given in figure A.7.a and figure A.7.b.

main_data_begin(主数据起始位置) -- 该值用于确定帧主数据的起始位置。main_data_begin 的值以音频同步字节的第一个字节为起点,用负偏移字节数表示主数据的位置。不计算头部和边信息所占用的字节数。例如,若 main_data_begin == 0,则主数据在边信息之后开始。具体示例可参考图A.7.a和图A.7.b。
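One common way to honour main_data_begin is to keep the main-data bytes of earlier frames in a buffer and reach back into it. The sketch below assumes the caller has already stripped the header, CRC and side information from each frame, so that only main-data bytes are passed in; the function name and the buffer handling are illustrative assumptions, not part of the standard.

```python
MAX_BACK_POINTER = 511  # main_data_begin is a 9-bit unsigned field


def assemble_main_data(reservoir, frame_main_bytes, main_data_begin):
    """Return (bytes starting at this frame's main data, updated reservoir).

    reservoir        -- main-data bytes carried over from previous frames
    frame_main_bytes -- bytes following the side information in this frame
    main_data_begin  -- negative offset in bytes, as read from the side info
    """
    if not 0 <= main_data_begin <= MAX_BACK_POINTER:
        raise ValueError("main_data_begin is a 9-bit field")
    if main_data_begin > len(reservoir):
        raise ValueError("not enough earlier main data buffered")
    start = len(reservoir) - main_data_begin
    main_data = reservoir[start:] + frame_main_bytes
    # Keep only as many bytes as a later frame could still point back to.
    updated = (reservoir + frame_main_bytes)[-MAX_BACK_POINTER:]
    return main_data, updated


data, carry = assemble_main_data(b"abcdef", b"XYZ", 2)
```

With main_data_begin == 0 the result starts directly after the side information of the current frame. The returned bytes may run past the end of this frame's own main data; part2_3_length determines how much of it belongs to each granule.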

private_bits -- Bits for private use. These bits will not be used in the future by ISO/IEC. The number of private_bits depends on the number of channels. The number of bits allocated for private_bits is determined to equalize the total number of bits used for side information.

私有位(private_bits) -- 供专用的比特字段。ISO/IEC未来不会使用这些比特。私有位的数量取决于声道数。分配给私有位的比特数旨在使边信息(side-information)使用的总比特数达到平衡。

scfsi[ch][scfsi_band] -- In Layer III, the scalefactor selection information works similarly to audio Layer II. The main difference is the use of the variable scfsi_band to apply scfsi to groups of scalefactors instead of single scalefactors. The application of scalefactors to granules is controlled by scfsi.

| scfsi[scfsi_band] | meaning |
|---|---|
| '0' | scalefactors are transmitted for each granule |
| '1' | scalefactors transmitted for granule 0 are also valid for granule 1 |

If short windows are switched on, i.e. block_type==2 for one of the granules, then scfsi is always 0 for this frame.

scfsi[ch][scfsi_band] -- 在第III层中,比例因子选择信息的工作原理与音频第II层类似。主要区别在于,通过变量 scfsi_band 将比例因子选择信息(scfsi)应用于比例因子组,而非单个比例因子。比例因子对颗粒(granule)的应用由 scfsi 控制。 如果short windows被打开了, 如:block_type==2为一个颗粒,那么当前帧的scfsi总是0.

scfsi_band controls the use of the scalefactor selection information for groups of scalefactors (scfsi_bands).

| scfsi_band | scalefactor bands (see table B.8) |
|---|---|
| 0 | 0, 1, 2, 3, 4, 5 |
| 1 | 6, 7, 8, 9, 10 |
| 2 | 11 ... 15 |
| 3 | 16 ... 20 |

scfsi_band 控制比例因子组(scfsi_bands)中比例因子选择信息的使用。

part2_3_length[gr][ch] -- This value contains the number of main_data bits used for scalefactors and Huffman code data. Because the length of the side information is always the same, this value can be used to calculate the beginning of the main information for the next granule or the position of the ancillary information (if used). Note that single channel audio frames contain 17 bytes of side information and dual channel audio frames contain 32 bytes of side information (see 2.4.1.7 Audio Data, Layer III - syntax for audio_data()).

part2_3_length[gr][ch] -- 该值表示用于比例因子和霍夫曼码数据的主数据比特数。由于边信息(side information)的长度始终固定,该值可用于计算下一个颗粒(granule)的主信息起始位置或辅助信息(ancillary information,如果使用的话)的位置。需要注意的是,单通道音频帧包含17字节的边信息,而双通道音频帧包含32字节的边信息(详见2.4.1.7音频数据,第III层 - audio_data()语法)。
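The 17-byte and 32-byte figures can be checked by adding up the field widths in the Layer III audio_data() syntax. The sketch below does only that arithmetic; the function name `side_info_bits` is hypothetical.

```python
def side_info_bits(nch):
    """Sum the side information field widths for nch channels (1 or 2)."""
    if nch not in (1, 2):
        raise ValueError("nch must be 1 or 2")
    private_bits = 5 if nch == 1 else 3
    # main_data_begin + private_bits + scfsi (4 bands per channel)
    per_frame = 9 + private_bits + 4 * nch
    # part2_3_length, big_values, global_gain, scalefac_compress,
    # window_switching_flag
    common = 12 + 9 + 8 + 4 + 1
    # block_type, mixed_block_flag, 2 x table_select, 3 x subblock_gain
    switched = 2 + 1 + 2 * 5 + 3 * 3
    # 3 x table_select, region0_count, region1_count
    normal = 3 * 5 + 4 + 3
    # Both branches take the same number of bits, so the length is fixed.
    assert switched == normal
    # preflag, scalefac_scale, count1table_select
    tail = 1 + 1 + 1
    per_granule_channel = common + normal + tail
    return per_frame + 2 * nch * per_granule_channel


mono_bytes = side_info_bits(1) // 8
stereo_bytes = side_info_bits(2) // 8
```

The result is 136 bits (17 bytes) for one channel and 256 bits (32 bytes) for two, which also shows why private_bits is 5 bits in one case and 3 in the other: it rounds the total up to a whole number of bytes.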

big_values[gr][ch] -- The spectral values of each granule are coded with different Huffman code tables. The full frequency range from zero to the Nyquist frequency is divided into several regions, which then are coded using different tables. Partitioning is done according to the maximum quantized values. This is done with the assumption that values at higher frequencies are expected to have lower amplitudes or do not need to be coded at all.

Starting at high frequencies, the pairs of quantized values equal to zero are counted. This number is named "rzero". Then, quadruples of quantized values with absolute value not exceeding 1 (i.e. only 3 possible quantization levels) are counted. This number is named "count1". Again an even number of values remain. Finally, the number of pairs of values in the region of the spectrum which extends down to zero is named "big_values". The maximum absolute value in this range is constrained to 8191.

The following figure shows the partitioning: [figure: partitioning of the spectrum into the big_values, count1 and rzero regions; image not reproduced]

big_values[gr][ch] : 每个颗粒(granule)的频谱值通过不同的霍夫曼码表进行编码。从零到奈奎斯特频率的整个频域范围被划分为多个区域,每个区域使用不同的码表编码。分区依据是最大量化值。该过程基于以下假设:高频区域的量化值幅度通常较低或无需编码。具体步骤如下:

  1. 高频起始统计:从高频开始,统计量化值为零的对数,记为 rzero
  2. 四元组统计:统计绝对值不超过1的量化值四元组(仅3种可能的量化等级),记为 count1
  3. 剩余偶数值对:剩余的偶数量化值对构成 big_values区域。
  4. 幅值限制:该区域内的最大绝对量化值限制为8191。

分区示意图如上图所示。
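The counting procedure described for big_values can be sketched as a scan from the top of the spectrum downward. This is an illustration of the description only: the function name `partition_spectrum` is hypothetical, the input is assumed to be the 576 quantized values of one granule and channel, and a real encoder is free to place the region boundaries differently.

```python
def partition_spectrum(ix):
    """Return (big_values, count1, rzero) for 576 quantized values."""
    if len(ix) != 576:
        raise ValueError("expected 576 quantized values")
    i = 576
    # Count pairs of zeros, starting at the highest frequencies.
    rzero = 0
    while i >= 2 and ix[i - 1] == 0 and ix[i - 2] == 0:
        rzero += 1
        i -= 2
    # Count quadruples whose absolute values do not exceed 1.
    count1 = 0
    while i >= 4 and all(abs(v) <= 1 for v in ix[i - 4:i]):
        count1 += 1
        i -= 4
    # Whatever remains, down to frequency zero, is coded as pairs.
    big_values = i // 2
    if any(abs(v) > 8191 for v in ix[:i]):
        raise ValueError("big_values region exceeds the 8191 limit")
    return big_values, count1, rzero


spectrum = [5, -3, 2, 7] + [1, 0, -1, 1] * 2 + [0] * 564
regions = partition_spectrum(spectrum)
```

The three counts always satisfy big_values * 2 + count1 * 4 + rzero * 2 == 576, which mirrors the three loops in Huffmancodebits().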

global_gain[gr][ch] -- The quantizer step size information is transmitted in the side information variable global_gain. It is logarithmically quantized. For the application of global_gain, refer to the formula in 2.4.3.4 "Formula for re-quantization and all scaling".

global_gain[gr][ch] -- 量化器步长信息通过边信息变量 global_gain 进行传输。它经过对数量化处理。关于 global_gain 的应用,请参考 2.4.3.4 节 “重新量化及所有缩放公式” 中的公式。

scalefac_compress[gr][ch] -- Selects the number of bits used for the transmission of the scalefactors according to the following table:

if block_type is 0, 1, or 3:

  • slen1: length of scalefactors for the scalefactor bands 0 to 10
  • slen2: length of scalefactors for the scalefactor bands 11 to 20

if block_type is 2 and mixed_block_flag is 0:

  • slen1: length of scalefactors for the scalefactor bands 0 to 5
  • slen2: length of scalefactors for the scalefactor bands 6 to 11

if block_type is 2 and mixed_block_flag is 1:

  • slen1: length of scalefactors for the scalefactor bands 0 to 7 (long window scalefactor band) and 3 to 5 (short window scalefactor band)
  • slen2: length of scalefactors for the scalefactor bands 6 to 11

Note: Scalefactor bands 0 - 7 are from the "long window scalefactor band" table, and scalefactor bands 3 to 11 from the "short window scalefactor band" table. This combination of partitions is contiguous and spans the entire frequency spectrum.

[Table: slen1 and slen2 for each scalefac_compress value; image not reproduced]

scalefac_compress[gr][ch] -- 根据下表选择用于传输比例因子的比特数:

如果 block_type 为 0、1 或 3:

  • slen1:比例因子频带 0 至 10 的比例因子长度
  • slen2:比例因子频带 11 至 20 的比例因子长度

如果 block_type 为 2 且 mixed_block_flag 为 0:

  • slen1:比例因子频带 0 至 5 的比例因子长度
  • slen2:比例因子频带 6 至 11 的比例因子长度

如果 block_type 为 2 且 mixed_block_flag 为 1:

  • slen1:比例因子频带 0 至 7(长窗口比例因子频带)以及 3 至 5(短窗口比例因子频带)的比例因子长度。注意:比例因子频带 0 - 7 来自 “长窗口比例因子频带” 表,比例因子频带 3 至 11 来自 “短窗口比例因子频带” 表。这种分区组合是连续的,覆盖整个频谱。
  • slen2:比例因子频带 6 至 11 的比例因子长度
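The band ranges above determine how many scalefactor bits a granule carries. The sketch below computes that count for the three cases. Two caveats apply: the slen1/slen2 values are recalled from the scalefac_compress table of ISO/IEC 11172-3, which this copy only includes as an image, so they should be verified against the standard; and the calculation ignores scfsi, so for granule 1 with shared scalefactors the real count is smaller. The names `SLEN` and `scalefactor_bits` are hypothetical.

```python
# (slen1, slen2) for scalefac_compress 0..15.
# Assumption: values recalled from the standard's table, not from this text.
SLEN = [
    (0, 0), (0, 1), (0, 2), (0, 3), (3, 0), (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3), (4, 2), (4, 3),
]


def scalefactor_bits(scalefac_compress, block_type, mixed_block_flag):
    """Bits used by scalefactors in one granule/channel, ignoring scfsi."""
    slen1, slen2 = SLEN[scalefac_compress]
    if block_type != 2:
        # Long blocks: bands 0..10 use slen1, bands 11..20 use slen2.
        return 11 * slen1 + 10 * slen2
    if mixed_block_flag:
        # 8 long bands plus short bands 3..5 (x3 windows) use slen1;
        # short bands 6..11 (x3 windows) use slen2.
        return (8 + 3 * 3) * slen1 + (6 * 3) * slen2
    # Short blocks: bands 0..5 and 6..11, three windows each.
    return (6 * 3) * slen1 + (6 * 3) * slen2
```

The band counts match the loops in main_data(): 6 + 5 and 5 + 5 long bands, or 12 short bands with three windows each.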

window_switching_flag[gr][ch] -- Signals that the block uses an other than normal (type 0) window.

If window_switching_flag is set, several other variables are set by default:
  • region0_count = 7 (in case of block_type==1 or block_type==3, or block_type==2 and mixed_block_flag)
  • region0_count = 8 (in case of block_type==2 and not mixed_block_flag)
  • region1_count = 36

Thus all remaining values in the big_value region are contained in region 1.

If window_switching_flag is not set, then the value of block_type is zero.

window_switching_flag[gr][ch] -- 用于指示该数据块使用了非普通(0 型)窗口。

如果 window_switching_flag 被置位,那么其他几个变量会被设为默认值:

  • 当 block_type 等于 1 或 3 ,或者 block_type 等于 2 且 mixed_block_flag 被置位时,region0_count 等于 7 ;
  • 当 block_type 等于 2 且 mixed_block_flag 未被置位时,region0_count 等于 8 ;
  • region1_count 等于 36 ,因此 big_value 区域中所有剩余的值都包含在区域 1 中。

如果 window_switching_flag 未被置位,那么 block_type 的值为 0 。
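The default values listed above can be written as a small helper. This is a sketch under the assumption that the caller has already checked that window_switching_flag is set. The function name is hypothetical.

```python
def default_region_counts(block_type, mixed_block_flag):
    """Region counts implied when window_switching_flag is set."""
    if block_type == 0:
        # block_type 0 is listed as reserved when window switching is active.
        raise ValueError("block_type 0 is reserved with window_switching_flag set")
    if block_type == 2 and not mixed_block_flag:
        region0_count = 8
    else:
        region0_count = 7
    region1_count = 36   # large enough that region 1 takes all remaining big_values
    return region0_count, region1_count
```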

block_type[gr][ch] -- Indicates the window type for the granule (see description of the filterbank, Layer III).

block_type[gr]    window type
0                 reserved
1                 start block
2                 short windows
3                 end block

Block_type and mixed_block_flag give the information about assembling of values in the block and about length and window of the transforms (see Figure A.4 for a schematic, annex C for an analytic description). If window_switching_flag==1, then the mixed_block_flag indicates whether low frequency polyphase filter subbands are coded using normal window type. The polyphase filterbank is described in 2.4.3.

In the case of long blocks (block_type not equal to 2 or in the lower subbands of block_type 2 if the mixed_block_flag is set) the IMDCT generates an output of 36 values every 18 input values. The output is windowed depending on the block_type and the first half is overlapped with the second half of the block before. The resulting vector is the input of the synthesis part of the polyphase filterbank of one band.

In the case of short blocks (in the upper subbands of a type 2 block if the mixed_block_flag is set, or in all subbands of a type 2 block if mixed_block_flag is not set), three transforms are performed producing 12 output values each. The three vectors are windowed and overlapped with each other. Concatenating 6 zeros on both ends of the resulting vector gives a vector of length 36, which is processed like the output of a long transform.

block_type[gr][ch] -- 指示颗粒(granule)的窗口类型(参见第三层滤波器组的描述)。 block_type 和 mixed_block_flag 提供有关块内数值组合方式、变换长度及窗口类型的信息(示意图见图 A.4,解析描述见附录 C)。若 window_switching_flag==1,则 mixed_block_flag 指示低频多相滤波器子带是否使用常规窗口类型编码。多相滤波器组的详细说明见 2.4.3 节。

长块处理规则(block_type 不等于 2,或当 block_type=2 且 mixed_block_flag 置位时的低频子带):
IMDCT 每处理 18 个输入值生成 36 个输出值。输出根据 block_type 加窗,其前半部分与前一区块的后半部分重叠。最终向量作为单频带多相滤波器组合成端的输入。

短块处理规则(当 block_type=2 且 mixed_block_flag 置位时的高频子带,或 mixed_block_flag 未置位时的所有子带):
执行三次变换,每次生成 12 个输出值。每个向量单独加窗并重叠。在结果向量两端各补 6 个零后形成长度为 36 的向量,后续处理同长变换输出。
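The short-block assembly described above can be sketched as follows. The inputs are assumed to be the three already-windowed 12-value IMDCT outputs. The hop of 6 values between successive windows is inferred from the text: three overlapped vectors plus 6 zeros on each side must total 36 values, so the overlapped part is 24 values long. The function name is hypothetical.

```python
def assemble_short_block(w0, w1, w2):
    """Overlap three windowed 12-value vectors into one 36-value vector."""
    windows = (w0, w1, w2)
    if any(len(w) != 12 for w in windows):
        raise ValueError("each short transform must supply 12 values")
    out = [0.0] * 36                 # 6 leading and 6 trailing zeros stay untouched
    for k, w in enumerate(windows):
        start = 6 + 6 * k            # windows start at 6, 12 and 18
        for i, value in enumerate(w):
            out[start + i] += value  # overlapping halves are summed
    return out
```

The resulting 36-value vector is then handled like the output of a long transform, including the overlap with the previous block.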

mixed_block_flag[gr][ch] -- Indicates that lower frequencies are transformed with a window type different from the one used at higher frequencies. If mixed_block_flag is zero, all blocks are transformed as indicated by block_type[gr][ch]. If mixed_block_flag is one, the frequency lines of the two lowest polyphase subbands are transformed with the normal window (block_type 0), while the remaining 30 subbands are transformed as indicated by block_type[gr][ch].

mixed_block_flag[gr][ch] -- 指示低频部分使用与高频部分不同的窗口类型进行变换。如果 mixed_block_flag 为 0,则所有块均按照 block_type[gr][ch] 所指示的类型进行变换;如果 mixed_block_flag 为 1,则最低的两个多相滤波器子带对应的频率线使用常规窗口(block_type=0)进行变换,其余 30 个子带按照 block_type[gr][ch] 所指示的类型进行变换。
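As a rough illustration, the window type per polyphase subband could be selected like this. The sketch assumes 32 subbands in total, of which the two lowest take the normal window when mixed_block_flag is set and the other 30 follow block_type. The function name is hypothetical.

```python
def window_type_for_subband(sb, block_type, mixed_block_flag):
    """Return the window type (0..3) applied to polyphase subband sb (0..31)."""
    if not 0 <= sb < 32:
        raise ValueError("subband index must be in 0..31")
    if mixed_block_flag and sb < 2:
        return 0             # two lowest subbands: normal window
    return block_type        # remaining 30 subbands: as signalled
```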

table_select[gr][ch][region] -- Different Huffman code tables are used depending on the maximum quantized value and the local statistics of the signal. There are a total of 32 possible tables given in table B.7.

table_select[gr][ch][region] -- 根据最大量化值和信号的局部统计特性,使用不同的霍夫曼码表。表 B.7 中给出了总共 32 种可能的码表。

subblock_gain[gr][ch][window] -- Indicates the gain offset (quantization: factor 4) from the global gain for one subblock. Used only with block type 2 (short windows). The values of the subblock have to be divided by 4^(subblock_gain[window]) in the decoder. See 2.4.3.4, Formula for requantization and all scaling.

subblock_gain[gr][ch][window] -- 指示一个子块相对于全局增益的增益偏移量(量化因子为 4)。仅在块类型为 2(短窗口)时使用。解码器中,子块的值必须除以 4^(subblock_gain[window])。参见 2.4.3.4 节 “重新量化及所有缩放公式”。
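A minimal sketch of the division by a power of 4 per short window follows. It assumes subblock_gain is a 3-bit field (0 to 7) and that the three windows are passed as separate lists. In the standard the factor is folded into the requantization formula of 2.4.3.4 rather than applied as a separate step. The function name is hypothetical.

```python
def apply_subblock_gain(windows, subblock_gain):
    """Divide each short window's values by 4**subblock_gain[window].

    windows: three lists of values; subblock_gain: three integers.
    """
    out = []
    for values, gain in zip(windows, subblock_gain):
        if not 0 <= gain <= 7:
            raise ValueError("subblock_gain is assumed to be a 3-bit field")
        divisor = 4.0 ** gain
        out.append([v / divisor for v in values])
    return out
```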

region0_count[gr][ch] -- A further partitioning of the spectrum is used to enhance the performance of the Huffman coder. It is a subdivision of the region which is described by big_values. The purpose of this subdivision is to get better bitrate robustness and better coding efficiency. Three regions are used, they are named: region 0, 1 and 2. Each region is coded using a different Huffman code table depending on the maximum quantized_value and the local signal statistics.

The values region0_count and region1_count are used to indicate the boundaries of the regions. The region boundaries are aligned with the partitioning of the spectrum into scale factor bands.

The field region0_count contains one less than the number of scalefactor bands in region 0. In the case of short blocks, each scalefactor band is counted three times, once for each short window, so that a region0_count value of 8 indicates that region 1 begins at scalefactor band number 3.

If block_type==2 and mixed_block_flag==0, the total number of scalefactor bands for the granule is 12*3=36. If block_type==2 and mixed_block_flag==1, the number of scalefactor bands is 8+9*3=35. If block_type!=2, the number of scalefactor bands is 21.

region0_count[gr][ch] -- 为提升霍夫曼编码器的性能,对频谱进行了进一步划分。这是对由 big_values 所描述区域的细分。这种细分的目的是为了获得更好的码率稳健性和编码效率。划分出了三个区域,分别命名为区域 0、区域 1 和区域 2 。根据最大量化值和信号局部统计特性,每个区域使用不同的霍夫曼码表进行编码。

region0_count 和 region1_count 这两个值用于指示这些区域的边界。区域边界与频谱划分为比例因子频带的划分相对齐。

region0_count 字段的值比区域 0 中比例因子频带的数量少 1。在短块的情况下,每个比例因子频带会被统计三次(每个短窗口各统计一次),因此 region0_count 值为 8 表示区域 1 从第 3 号比例因子频带开始。

如果 block_type==2 且 mixed_block_flag==0,此时该颗粒的比例因子频带总数为 12×3=36 个。如果 block_type==2 且 mixed_block_flag==1,比例因子频带数量为 8+9×3=35 个。如果 block_type!=2,比例因子频带数量为 21 个。
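The band counting can be checked with a few lines of code. The sketch assumes that the 21-band case applies to every block_type other than 2, and that region0_count holds one less than the number of counted bands, so a value of 8 covers 9 window-bands, that is, short scalefactor bands 0 to 2. Both function names are hypothetical.

```python
def scalefactor_band_count(block_type, mixed_block_flag):
    """Number of counted scalefactor bands in one granule."""
    if block_type != 2:
        return 21                # long blocks
    if mixed_block_flag:
        return 8 + 9 * 3         # 8 long bands plus 9 short bands, 3 windows each
    return 12 * 3                # 12 short bands, 3 windows each


def region1_first_short_band(region0_count):
    """First short scalefactor band of region 1 for pure short blocks."""
    return (region0_count + 1) // 3
```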

region1_count[gr][ch] -- The region1_count field contains one less than the number of scalefactor bands in region 1. Again, if block_type==2, the scalefactor bands representing different time slots are counted separately.

region1_count[gr][ch] -- region1_count 所计数值比区域 1 中比例因子频带的数量少 1 。同样,当 block_type==2 时,代表不同时隙的比例因子频带会分别计数。
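Taken together, region0_count and region1_count split the counted scalefactor bands into three ranges. The sketch below works purely in units of counted bands. Clamping to the total number of bands is an assumption of this example, and a real decoder would also map band indices to frequency lines via table B.8 and limit the regions to the big_values range. The function name is hypothetical.

```python
def region_band_ranges(region0_count, region1_count, total_bands):
    """Return half-open (start, end) band ranges for regions 0, 1 and 2."""
    r0_end = min(region0_count + 1, total_bands)            # count is "bands - 1"
    r1_end = min(r0_end + region1_count + 1, total_bands)
    return (0, r0_end), (r0_end, r1_end), (r1_end, total_bands)
```

With the window-switching defaults (region0_count 7 or 8, region1_count 36), region 2 comes out empty, which matches the statement that all remaining big_values fall into region 1.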

preflag[gr][ch] -- This is a shortcut for additional high-frequency amplification of the quantized values. If preflag is set, the values of a table are added to the scalefactors (see table B.6). This is equivalent to multiplying the requantized scalefactors by the table values. preflag is never used if block_type==2 (short blocks).

preflag[gr][ch] -- 这是对量化值进行额外高频放大的一种简捷方式。如果 preflag 被置位,会将一个表格中的值加到比例因子上(见表 B.6)。这等同于将重新量化后的比例因子与表格值相乘。当 block_type==2(短块)时,preflag 从不使用。
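The effect of preflag can be sketched as a per-band addition. The `PRETAB` values below are an assumption standing in for table B.6, which is not reproduced in this post, so they should be checked against the standard before use. The function name is hypothetical.

```python
# ASSUMPTION: pre-emphasis values per long scalefactor band (0..20),
# to be verified against table B.6 of ISO/IEC 11172-3.
PRETAB = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          1, 1, 1, 1, 2, 2, 3, 3, 3, 2]


def effective_scalefactors(scalefac_l, preflag, block_type):
    """Add the pre-emphasis table to long-block scalefactors when preflag is set."""
    if preflag and block_type != 2:      # never applied to short blocks
        return [sf + p for sf, p in zip(scalefac_l, PRETAB)]
    return list(scalefac_l)
```

Because the scalefactors act as exponents in the requantization, adding table values to them corresponds to multiplying the requantized factors.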

scalefac_scale[gr][ch] -- The scalefactors are logarithmically quantized with a step size of 2 or √2, depending on scalefac_scale. The following table indicates the scalefactor multiplier used in the requantization equation for each step size.

scalefac_scale[gr]    scalefac_multiplier
0                     0.5
1                     1

scalefac_scale[gr][ch] -- 根据 scalefac_scale 的取值,比例因子(scalefactors )以步长为 2 或根号2进行对数量化。下表给出了每种步长在重新量化公式中使用的比例因子乘数。
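The relation between the multiplier and the step size can be made concrete. The sketch assumes the scalefactor enters the requantization of 2.4.3.4 as a factor 2^(-(scalefac_multiplier * scalefac)). That formula is not shown in this section, so the sign convention here is an assumption. The function name is hypothetical.

```python
def scalefactor_gain(scalefac, scalefac_scale):
    """Linear factor contributed by one scalefactor value."""
    scalefac_multiplier = 1.0 if scalefac_scale else 0.5
    return 2.0 ** (-scalefac_multiplier * scalefac)
```

With scalefac_scale equal to 0, consecutive scalefactor values differ by a factor of √2, and with scalefac_scale equal to 1 they differ by a factor of 2.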

count1table_select[gr][ch] -- This flag selects one of two possible Huffman code tables for the region of quadruples of quantized values with magnitude not exceeding 1.

count1table_select[gr]    Huffman code table
0                         Table B.7 - A
1                         Table B.7 - B

count1table_select[gr][ch] -- 该标志位用于为幅度不超过 1 的量化值四元组所在区域,从两种可能的霍夫曼码表中选择其一。

scalefac_l[gr][ch][sfb], scalefac_s[gr][ch][sfb][window], is_pos[sfb] -- The scalefactors are used to colour the quantization noise. If the quantization noise is coloured with the right shape, it is masked completely. Unlike Layers I and II, the Layer III scalefactors say nothing about the local maximum of the quantized signal. In Layer III, scalefactors are used in the decoder to get division factors for groups of values. In the case of Layer III, the groups stretch over several frequency lines. These groups are called scalefactor bands and are selected to resemble critical bands as closely as possible. The scalefac_compress table shows that the scalefactors 0...10 have a range of 0 to 15 (maximum length 4 bits) and the scalefactors 11...21 have a range of 0 to 7 (maximum length 3 bits).

If intensity_stereo is enabled (modebit_extension) the scalefactors of the "zero_part" of the difference (right) channel are used as intensity_stereo_position, is_pos[sfb] (see 2.4.3.4, MS_stereo mode). is_pos[sfb] is the intensity stereo position for scalefactor band sfb.

The subdivision of the spectrum into scalefactor bands is fixed for every block length and sampling frequency and stored in tables in the coder and decoder (see table B.8). The scale factor for frequency lines above the highest line in the tables is zero, which means that the actual multiplication factor is 1.0.

The scalefactors are logarithmically quantized. The quantization step is set with scalefac_scale.

scalefac_l[gr][ch][sfb], scalefac_s[gr][ch][sfb][window], is_pos[sfb] -- 比例因子用于对量化噪声进行整形。如果量化噪声的整形恰当,它将被完全掩蔽。与第一层(Layer I)和第二层(Layer II)不同,第三层(Layer III)的比例因子与量化信号的最大值并无关联。在第三层解码时,比例因子用于为数值分组获取除因子。在第三层中,这些分组跨越若干频率线。这些分组被称为比例因子频带,其选取尽可能接近临界频带。

scalefac_compress 表表明,比例因子 0 到 10 的取值范围是 0 到 15(最大长度为 4 比特),比例因子 11 到 21 的取值范围是 0 到 7(最大长度为 3 比特)。

如果启用了强度立体声(intensity_stereo ,即模式位扩展modebit_extension ),差异(右)声道 “零部分” 的比例因子将用作强度立体声位置 is_pos[sfb](参见 2.4.3.4 节,MS 立体声模式 )。is_pos[sfb] 是比例因子频带 sfb 的强度立体声位置。

对于每种块长度和采样频率,频谱划分为比例因子频带的方式是固定的,并存储在编码器和解码器的表格中(见表 B.8 )。表格中最高频率线以上的频率线对应的比例因子为零,这意味着实际的乘法因子为 1.0 。

比例因子采用对数量化,量化步长由 scalefac_scale 设置。

huffmancodebits() -- Huffman encoded data.

The syntax for huffmancodebits() shows how quantized values are encoded. Within the big_values partition, pairs of quantized values with an absolute value less than 15 are directly coded using a Huffman code. The codes are selected from Huffman tables 0 through 31 in table B.7. Values are always coded in pairs (x,y). If quantized values of magnitude greater than or equal to 15 are coded, the values are coded with a separate field following the Huffman code. If one or both values of a pair are not zero, one or two sign bits are appended to the code word.

huffmancodebits() -- 霍夫曼编码数据。

huffmancodebits() 的语法展示了量化值是如何编码的。在 big_values 分区内,绝对值小于 15 的量化值对会直接使用霍夫曼码进行编码。这些编码从表 B.7 中的霍夫曼表 0 到 31 里选取。始终是对值对(x, y)进行编码。如果要编码绝对值大于或等于 15 的量化值,这些值会在霍夫曼编码之后用一个单独的字段进行编码。如果值对中的一个或两个值不为零,会在码字后面附加一到两个符号位。

The Huffman tables for the big_values partition are comprised of three parameters:

  • hcod[|x|][|y|] is the Huffman code table entry for values x,y.
  • hlen[|x|][|y|] is the Huffman length table entry for values x,y.
  • linbits is the length of linbitsx or linbitsy when they are coded.

big_values 分区的霍夫曼表由三个参数构成:

  • hcod[|x|][|y|] 是值 x、y 对应的霍夫曼码表项。
  • hlen[|x|][|y|] 是值 x、y 对应的霍夫曼长度表项。
  • linbits 是 linbitsx 或 linbitsy 编码时的长度。

The syntax for huffmancodebits contains the following fields and parameters:

  • signv is the sign of v (0 if positive, 1 if negative).
  • signw is the sign of w (0 if positive, 1 if negative).
  • signx is the sign of x (0 if positive, 1 if negative).
  • signy is the sign of y (0 if positive, 1 if negative).
  • linbitsx is used to encode the value of x if the magnitude of x is greater than or equal to 15. This field is coded only if |x| in hcod is equal to 15. If linbits is zero, so that no bits are actually coded when |x|==15, then the value linbitsx is defined to be zero.
  • linbitsy is the same as linbitsx but for y.
  • is[l] is the quantized value for frequency line number l.

huffmancodebits 的语法包含以下字段和参数:

  • signv 是 v 的符号(0 表示正数,1 表示负数)。
  • signw 是 w 的符号(0 表示正数,1 表示负数)。
  • signx 是 x 的符号(0 表示正数,1 表示负数)。
  • signy 是 y 的符号(0 表示正数,1 表示负数)。
  • linbitsx 用于在 x 的绝对值大于或等于 15 时对 x 的值进行编码。只有当hcod 中 | x | 等于 15 时,才对该字段进行编码。如果linbits 为零,即当 | x| = 15 时实际上没有比特被编码,那么linbitsx 的值被定义为零。
  • linbitsy 与linbitsx 类似,用于对 y 进行编码。
  • is[l] 是第 l 条频率线的量化值。

The linbitsx or linbitsy fields are only used if a value greater than or equal to 15 needs to be encoded. These fields are interpreted as unsigned integers and added to 15 to obtain the encoded value. The linbitsx and linbitsy fields are never used if the selected table is one for blocks with a maximum quantized value less than 15. Note that a value of 15 can still be encoded with a Huffman table for which linbits is zero. In this case, linbitsx or linbitsy fields are not actually coded, since linbits is zero.

Within the count1 partition, quadruples of values with magnitude less than or equal to one are coded. Again, magnitude values are coded using a Huffman code from table A or B in table B.7. Again, for each non-zero value, a sign bit is appended after the Huffman code symbol.

只有当需要编码的值大于或等于 15 时,才会使用linbitsx 或linbitsy 字段。这些字段被解释为无符号整数,并与 15 相加得到编码值。如果所选的表适用于最大量化值小于 15 的块,就永远不会使用linbitsx 和linbitsy 字段。需要注意的是,值为 15 时仍可以使用linbits 为零的霍夫曼表进行编码。在这种情况下,由于linbits 为零,linbitsx 或linbitsy 字段实际上不会被编码。
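The handling of linbits and sign bits after a big_values Huffman code word can be sketched as follows. The Huffman lookup itself is left out: the magnitudes |x| and |y| are assumed to have been decoded from hcod already. The field order (linbitsx, signx, linbitsy, signy) is an assumption based on the order in which the fields are listed above. `BitReader` and `finish_pair` are hypothetical helpers written for this example.

```python
class BitReader:
    """Reads bits left bit first from a string of '0' and '1' characters."""

    def __init__(self, bits):
        self.bits = bits.replace(" ", "")   # blanks are only for readability
        self.pos = 0

    def read(self, n):
        chunk = self.bits[self.pos:self.pos + n]
        if len(chunk) != n:
            raise EOFError("not enough bits")
        self.pos += n
        return int(chunk, 2) if n else 0


def finish_pair(reader, x_mag, y_mag, linbits):
    """Apply linbits and sign bits to magnitudes already decoded via hcod."""
    values = []
    for mag in (x_mag, y_mag):
        if mag == 15 and linbits > 0:
            mag += reader.read(linbits)     # unsigned escape value added to 15
        if mag != 0 and reader.read(1):     # sign bit only for non-zero values
            mag = -mag
        values.append(mag)
    return tuple(values)
```

For example, with linbits equal to 4, magnitudes (15, 2) followed by the bits '0011 1 0' decode to (-18, 2): linbitsx is 3, x is negative, and y is positive.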

count1 分区内,绝对值小于或等于 1 的四元组值会被编码。同样,这些幅值会使用表 B.7 中表 A 或表 B 的霍夫曼码进行编码。而且,对于每个非零值,会在霍夫曼码符号后面附加一个符号位。

The Huffman tables for the count1 partition are comprised of the following parameters:

  • hcod[|v|][|w|][|x|][|y|] is the Huffman code table entry for values v,w,x,y.
  • hlen[|v|][|w|][|x|][|y|] is the Huffman length table entry for values v,w,x,y.

Huffman code table B is not really a 4-dimensional code because it is constructed from the trivial code: 0 is coded with a 1, and 1 is coded with a 0.

Quantized values above the count1 partition are zero, so they are not encoded.

count1 分区的霍夫曼表由以下参数构成:

  • hcod[|v|][|w|][|x|][|y|] 是值 v、w、x、y 对应的霍夫曼码表项。
  • hlen[|v|][|w|][|x|][|y|] 是值 v、w、x、y 对应的霍夫曼长度表项。

霍夫曼码表 B 并非真正的四维码,因为它由简单码构成:0 编码为 1,1 编码为 0 。

count1 分区之上的量化值为零,因此不对其进行编码。
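Since table B maps each magnitude to a single inverted bit, a quadruple coded with it can be decoded without a lookup table. The sketch below assumes the four code bits appear in the order v, w, x, y and that the sign bits of the non-zero values follow in the same order. The function name is hypothetical.

```python
def decode_quad_table_b(bits):
    """Decode one count1 quadruple coded with Huffman table B.

    bits: string of '0' and '1' characters. Returns ((v, w, x, y), bits_used).
    """
    bits = bits.replace(" ", "")
    if len(bits) < 4:
        raise EOFError("need at least 4 bits")
    magnitudes = [1 - int(b) for b in bits[:4]]   # 0 coded as '1', 1 coded as '0'
    pos = 4
    values = []
    for mag in magnitudes:
        if mag:
            sign = int(bits[pos])                 # 0 positive, 1 negative
            pos += 1
            values.append(-1 if sign else 1)
        else:
            values.append(0)
    return tuple(values), pos
```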

For clarity, the parameter "count1" is used in this document to indicate the number of Huffman codes in the count1 region. However, unlike the bigvalues partition, the number of values in the count1 partition is not explicitly coded by a field in the syntax. The end of the count1 partition is known only when all bits for the granule (as specified by part2_3_length), have been exhausted, and the value of count1 is known implicitly after decoding the count1 region.

The order of the Huffman data depends on the block_type of the granule. If block_type is 0, 1 or 3 the Huffman encoded data is ordered in terms of increasing frequency.

If block_type==2 (short blocks) the Huffman encoded data is ordered in the same order as the scalefactor values for that granule. The Huffman encoded data is given for successive scalefactor bands, beginning with scalefactor band 0. Within each scalefactor band, the data is given for successive short windows, beginning with window 0 and ending with window 2. Within each window, the quantized values are then arranged in order of increasing frequency.

为清晰起见,本文档使用参数 “count1” 来表示 count1 区域中霍夫曼码的数量。然而,与 bigvalues 分区不同,count1 分区中的值的数量并未在语法中通过一个字段进行显式编码。只有在颗粒(由 part2_3_length 指定 )的所有比特都被用尽时,才能确定 count1 分区的结束位置,并且在对 count1 区域进行解码后才能隐含地得知 count1 的值。

霍夫曼编码数据的顺序取决于颗粒的 block_type。如果 block_type 为 0、1 或 3 ,霍夫曼编码数据按频率递增的顺序排列。

如果 block_type 等于 2(短块 ),霍夫曼编码数据的排列顺序与该颗粒的比例因子值的顺序相同。霍夫曼编码数据按连续的比例因子频带给出,从比例因子频带 0 开始。在每个比例因子频带内,数据按连续的短窗口给出,从窗口 0 开始到窗口 2 结束。在每个窗口内,量化值再按频率递增的顺序排列。
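The short-block ordering can be illustrated by listing, for each coded position, which window and which frequency line it belongs to. The band widths passed in are hypothetical; the real widths come from the short-window part of table B.8. The function name is also hypothetical.

```python
def short_block_order(sfb_widths):
    """Map coded positions to (window, frequency line within the window).

    sfb_widths: number of frequency lines per window in each short
    scalefactor band, in order of increasing frequency.
    """
    order = []
    line_start = 0
    for width in sfb_widths:              # successive scalefactor bands
        for window in range(3):           # window 0, then 1, then 2
            for i in range(width):        # increasing frequency inside the band
                order.append((window, line_start + i))
        line_start += width
    return order
```

For two bands of widths 2 and 1, the coded order is window 0 lines 0 and 1, window 1 lines 0 and 1, window 2 lines 0 and 1, and then line 2 of windows 0, 1 and 2.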

2.4.2.8 Ancillary data

Ancillary_bit -- User definable.

The number of ancillary bits (no_of_ancillary_bits) equals the available number of bits in an audio frame minus the number of bits actually used for header, error check and audio data. In Layer I and II the no_of_ancillary_bits corresponds to the distance between the end of the audio data and the beginning of the next header. In Layer III the no_of_ancillary_bits corresponds to the distance between the end of Huffman_code_bits and the location in the bitstream where the next frame's main_data_begin pointer points.

辅助位(Ancillary_bit) -- 用户可定义。

辅助位的数量(no_of_ancillary_bits)等于音频帧中可用的比特数减去实际用于头部、错误校验和音频数据的比特数。在第一层(Layer I)和第二层(Layer II)中,no_of_ancillary_bits 对应音频数据末尾与下一个头部起始位置之间的距离。在第三层(Layer III)中,no_of_ancillary_bits 对应霍夫曼编码比特(Huffman_code_bits)末尾与比特流中下一个帧的main_data_begin 指针所指位置之间的距离。
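The bit accounting for Layers I and II amounts to a subtraction. The sketch below uses hypothetical example numbers. The function name and the error handling are assumptions, and the Layer III case, which depends on the next frame's main_data_begin pointer, is not covered.

```python
def no_of_ancillary_bits(frame_bits, header_bits, error_check_bits, audio_bits):
    """Bits left in a frame after header, error check and audio data."""
    remaining = frame_bits - header_bits - error_check_bits - audio_bits
    if remaining < 0:
        raise ValueError("frame contents exceed the available bits")
    return remaining


# Hypothetical frame: 3336 bits in total, 32-bit header, 16-bit error check,
# 3200 bits of audio data.
example = no_of_ancillary_bits(3336, 32, 16, 3200)
```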