
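Formatted input/output functions in cstdio

vfprintf, the function that implements fprintf, contains a large number of macro definitions and helper functions. This part continues walking through their source code.

Function logic analysis---vfprintf

2. Macro definitions / helper functions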

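(14) __printf_fp_spec---print a floating-point value

Parameters
- FILE *fp: the output stream
- const struct printf_info *info: the information parsed from the format placeholder currently being handled
- const void *const *args: the argument list for this placeholder (here, a pointer to the single floating-point argument)

This is one of the more central helpers. Each time vfprintf extracts a floating-point placeholder from the format string, it fills a printf_info struct with what it parsed, then passes that struct together with a pointer to the argument to this function, which prints the value.

Because printing involves a great deal of detail, the work is delegated to further function calls. If the conversion is 'a' or 'A' (hexadecimal floating-point output), __printf_fphex does the printing; otherwise __printf_fp does. The internals of __printf_fphex and __printf_fp are left for a later article; for now it is enough to treat this function as something that prints one floating-point value.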

        struct printf_info info =
          {
        .prec = prec,
        .width = width,
        .spec = spec,
        .is_long_double = is_long_double,
        .is_short = is_short,
        .is_long = is_long,
        .alt = alt,
        .space = space,
        .left = left,
        .showsign = showsign,
        .group = group,
        .pad = pad,
        .extra = 0,
        .i18n = use_outdigits,
        .wide = sizeof (CHAR_T) != 1,
        .is_binary128 = 0
          };

        PARSE_FLOAT_VA_ARG_EXTENDED (info);
        const void *ptr = &the_arg;

        int function_done = __printf_fp_spec (s, &info, &ptr);
        
/* Calls __printf_fp or __printf_fphex based on the value of the
   format specifier INFO->spec.  */
static inline int
__printf_fp_spec (FILE *fp, const struct printf_info *info,
          const void *const *args)
{
  if (info->spec == 'a' || info->spec == 'A') 
    return __printf_fphex (fp, info, args);
  else 
    return __printf_fp (fp, info, args);
}
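To see the same two-way split from the caller's side, here is a minimal sketch that uses only the standard snprintf. The helper name format_double is hypothetical and is not part of glibc; it only mirrors the "hex conversion versus everything else" decision made by __printf_fp_spec.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of glibc): format VALUE according to
   the conversion letter SPEC, mirroring the two-way dispatch above.
   Returns what snprintf returns.  */
int
format_double (char *buf, size_t size, char spec, double value)
{
  assert (buf != NULL && size > 0);
  if (spec == 'a' || spec == 'A')
    /* Hexadecimal floating point: the case __printf_fphex serves.  */
    return snprintf (buf, size, spec == 'a' ? "%a" : "%A", value);
  /* Simplification: every other letter is printed as %f here, while
     the real __printf_fp also covers e/E/g/G.  */
  return snprintf (buf, size, "%f", value);
}
```

With spec 'f' and value 1.5 the buffer holds 1.500000; with 'a' the output starts with 0x, and the exact hexadecimal digits may differ between C libraries.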

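It is also worth looking at the contents of the printf_info struct here.

Recall the layout of a format placeholder described earlier:

%[flags][width][.precision][length]specifier

The printf_info struct holds all the control information needed while printing, covering each field of this layout: flags (alt, space, left, showsign, group), width, precision, length modifiers (is_short, is_long, is_long_double) and the conversion letter (spec). For floating-point conversions, vfprintf parses the placeholder, builds such a struct and hands it to __printf_fp_spec, then moves on to the next placeholder until the format string is exhausted.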

// glibc/stdio-common/printf.h
struct printf_info
{
  int prec;         /* Precision.  */
  int width;            /* Width.  */
  wchar_t spec;         /* Format letter.  */
  unsigned int is_long_double:1;/* L flag.  */
  unsigned int is_short:1;  /* h flag.  */
  unsigned int is_long:1;   /* l flag.  */
  unsigned int alt:1;       /* # flag.  */
  unsigned int space:1;     /* Space flag.  */
  unsigned int left:1;      /* - flag.  */
  unsigned int showsign:1;  /* + flag.  */
  unsigned int group:1;     /* ' flag.  */
  unsigned int extra:1;     /* For special use.  */
  unsigned int is_char:1;   /* hh flag.  */
  unsigned int wide:1;      /* Nonzero for wide character streams.  */
  unsigned int i18n:1;      /* I flag.  */
  unsigned int is_binary128:1;  /* Floating-point argument is ABI-compatible
                   with IEC 60559 binary128.  */
  unsigned int __pad:3;     /* Unused so far.  */
  unsigned short int user;  /* Bits for user-installed modifiers.  */
  wchar_t pad;          /* Padding character.  */
};
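To make the link between the placeholder layout and the struct fields concrete, here is a rough sketch of a parser that fills a much smaller struct. Everything in it (struct mini_info, parse_placeholder) is hypothetical and written for illustration. It is not how glibc parses a format string, and it ignores '*' widths, the ll/hh/L modifiers and wide characters.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical, much smaller cousin of struct printf_info.  */
struct mini_info
{
  int width;                 /* Field width, 0 if absent.  */
  int prec;                  /* Precision, -1 if absent.  */
  char spec;                 /* Conversion letter.  */
  char pad;                  /* Padding character.  */
  unsigned int left:1;       /* - flag.  */
  unsigned int showsign:1;   /* + flag.  */
  unsigned int space:1;      /* Space flag.  */
  unsigned int alt:1;        /* # flag.  */
  unsigned int is_long:1;    /* l modifier.  */
  unsigned int is_short:1;   /* h modifier.  */
};

/* Parse one placeholder that starts at the '%' in S.  Returns the
   number of characters consumed, or 0 if S is not a placeholder.  */
size_t
parse_placeholder (const char *s, struct mini_info *info)
{
  const char *p = s;

  assert (s != NULL && info != NULL);
  memset (info, 0, sizeof *info);
  info->prec = -1;
  info->pad = ' ';
  if (*p++ != '%')
    return 0;
  for (;; ++p)                              /* [flags] */
    {
      if (*p == '-') info->left = 1;
      else if (*p == '+') info->showsign = 1;
      else if (*p == ' ') info->space = 1;
      else if (*p == '#') info->alt = 1;
      else if (*p == '0') info->pad = '0';
      else break;
    }
  while (isdigit ((unsigned char) *p))      /* [width] */
    info->width = info->width * 10 + (*p++ - '0');
  if (*p == '.')                            /* [.precision] */
    {
      info->prec = 0;
      ++p;
      while (isdigit ((unsigned char) *p))
        info->prec = info->prec * 10 + (*p++ - '0');
    }
  if (*p == 'l') { info->is_long = 1; ++p; }        /* [length] */
  else if (*p == 'h') { info->is_short = 1; ++p; }
  if (*p == '\0')
    return 0;                               /* No specifier present.  */
  info->spec = *p++;                        /* specifier */
  return (size_t) (p - s);
}
```

Parsing "%-8.3lf" with this sketch consumes 7 characters and sets left, width 8, prec 3, is_long and spec 'f'.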

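(15) is_longlong and is_long_num---handling platforms with different integer widths

If LONG_MAX == LONG_LONG_MAX, long and long long are effectively the same type (as on typical 64-bit LP64 systems), so is_longlong is defined as 0; otherwise it is an alias for is_long_double.

If INT_MAX == LONG_MAX, int and long are effectively the same type (as on typical 32-bit ILP32 systems), so is_long_num is defined as 0; otherwise it is an alias for is_long.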

/* For handling long_double and longlong we use the same flag.  If
   `long' and `long long' are effectively the same type define it to
   zero.  */
#if LONG_MAX == LONG_LONG_MAX
# define is_longlong 0
#else
# define is_longlong is_long_double
#endif

/* If `long' and `int' is effectively the same type we don't have to
   handle `long separately.  */
#if INT_MAX == LONG_MAX
# define is_long_num    0
#else
# define is_long_num    is_long
#endif
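A quick way to check which branch a given platform takes is to run the same preprocessor tests in a small program. The sketch below uses the standard C99 name LLONG_MAX rather than the GNU spelling LONG_LONG_MAX used in the glibc source; the two helper functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Nonzero when `long' and `long long' have the same range, which is
   the case in which glibc defines is_longlong to 0.  */
int
long_equals_longlong (void)
{
  /* Sanity check: long is never wider than long long.  */
  assert (sizeof (long) <= sizeof (long long));
#if LONG_MAX == LLONG_MAX
  return 1;
#else
  return 0;
#endif
}

/* Nonzero when `int' and `long' have the same range, which is the
   case in which glibc defines is_long_num to 0.  */
int
int_equals_long (void)
{
#if INT_MAX == LONG_MAX
  return 1;
#else
  return 0;
#endif
}
```

On a typical 64-bit Linux system the first function returns 1 and the second returns 0; on a typical 32-bit Linux system it is the other way round.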

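(16) The jump table mechanism---recognizing format placeholders

Recall the two core tasks of vfprintf described above: recognizing a format placeholder and recording what was parsed, and printing the value of that type. The jump table mechanism exists for the first task.

To recognize a string of the form %[flags][width][.precision][length]specifier, the usual approach is a finite state machine: define several states, one per field being recognized, with conditional transitions between them, until the specifier is finally reached. At that point every field has been parsed.

However, the C standard allows many combinations of these five fields, so the number of states and transitions written out by hand would be very large. Is there a simpler way to implement this logic?

There is. C has one statement that is widely criticized: goto, which jumps unconditionally to a label. It is discouraged in most situations, but it fits this one well. Each state of the finite state machine can be a label. The code at a label handles the current character, advances to the next character, then uses a jump table to jump to the label that handles it. (Jumping to a label chosen at run time relies on the GNU C "labels as values" extension, as the goto *ptr in the JUMP macro shows.) The one thing that has to be maintained carefully is the jump table itself: it records which label handles the next character, so it plays the role of the state machine's transition function.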

Let's look at a concrete example in the code.

We take printing a single %c as the example.

First, vfprintf finds the first % in the format string:

  /* Find the first format specifier.  */
  f = lead_str_end = __find_specmb ((const UCHAR_T *) format);

It then outputs all the text before the first % unchanged, since that text needs no parsing:

  /* Write the literal text before the first format.  */
  outstring ((const UCHAR_T *) format,
         lead_str_end - (const UCHAR_T *) format);
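The find-then-output pair above can be imitated with standard library calls. This is only a sketch under simplifying assumptions: literal_prefix_len and write_literal_prefix are hypothetical names, and the real __find_specmb and outstring are glibc internals with their own error handling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for locating the first format specifier:
   return the length of the literal text before the first '%'.  */
size_t
literal_prefix_len (const char *format)
{
  assert (format != NULL);
  /* strcspn stops at the first '%' or at the terminating NUL.  */
  return strcspn (format, "%");
}

/* Write that literal prefix to FP unchanged and return the number of
   bytes written.  */
size_t
write_literal_prefix (FILE *fp, const char *format)
{
  return fwrite (format, 1, literal_prefix_len (format), fp);
}
```

For the format string "abc%d" the prefix length is 3, so only "abc" is written before any placeholder is parsed.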

Define the jump tables:

  /* Process whole format string.  */
  do
    {
      STEP0_3_TABLE;
      STEP4_TABLE;
      ...
  }

Handle the first character after the %:

      /* Get current character in format string.  */
      JUMP (*++f, step0_jumps);

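The JUMP macro

Next, let's see how the JUMP macro dispatches. Note that in our example the character passed in is c.

The core of the macro is to compute ptr (the address of the label that handles the current character c) and then jump there with goto *ptr.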

# define JUMP(ChExpr, table)                              \
      do                                      \
    {                                     \
      int offset;                                 \
      void *ptr;                                  \
      spec = (ChExpr);                            \
      offset = NOT_IN_JUMP_RANGE (spec) ? REF (form_unknown)          \
        : table[CHAR_CLASS (spec)];                       \
      ptr = &&JUMP_TABLE_BASE_LABEL + offset;                 \
      goto *ptr;                                  \
    }                                     \
      while (0)

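The macro first checks whether the input character is within the jump range. The jump table covers every character from L_(' ') to L_('z'). (All printf flags, length modifiers and conversion letters fall inside this one contiguous ASCII range, which keeps the table compact.)

If the character is outside this range, the offset defaults to REF (form_unknown), that is, the address difference between do_form_unknown and do_form_unknown, which is 0.

If it is inside the range, the character is first mapped to a class by jump_table. For 'c' the class is 20, so we read table[20], where table is the step0_jumps array that was passed in.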

/* This table maps a character into a number representing a class.  In
   each step there is a destination label for each class.  */
static const uint8_t jump_table[] =
  {                                                                                                                                                      
    /* ' ' */  1,            0,            0, /* '#' */  4,
           0, /* '%' */ 14,            0, /* '''*/  6,
           0,            0, /* '*' */  7, /* '+' */  2,
           0, /* '-' */  3, /* '.' */  9,            0,
    /* '0' */  5, /* '1' */  8, /* '2' */  8, /* '3' */  8,
    /* '4' */  8, /* '5' */  8, /* '6' */  8, /* '7' */  8,
    /* '8' */  8, /* '9' */  8,            0,            0,
           0,            0,            0,            0,
           0, /* 'A' */ 26, /* 'B' */ 30, /* 'C' */ 25,
           0, /* 'E' */ 19, /* 'F' */ 19, /* 'G' */ 19,
           0, /* 'I' */ 29,            0,            0,
    /* 'L' */ 12,            0,            0,            0,
           0,            0,            0, /* 'S' */ 21,
           0,            0,            0,            0,
    /* 'X' */ 18,            0, /* 'Z' */ 13,            0,
           0,            0,            0,            0,
           0, /* 'a' */ 26, /* 'b' */ 30, /* 'c' */ 20,
    /* 'd' */ 15, /* 'e' */ 19, /* 'f' */ 19, /* 'g' */ 19,
    /* 'h' */ 10, /* 'i' */ 15, /* 'j' */ 28,            0,
    /* 'l' */ 11, /* 'm' */ 24, /* 'n' */ 23, /* 'o' */ 17,
    /* 'p' */ 22, /* 'q' */ 12,            0, /* 's' */ 21,
    /* 't' */ 27, /* 'u' */ 16,            0,            0,
    /* 'x' */ 18,            0, /* 'z' */ 13
  };

#define NOT_IN_JUMP_RANGE(Ch) ((Ch) < L_(' ') || (Ch) > L_('z'))
#define CHAR_CLASS(Ch) (jump_table[(INT_T) (Ch) - L_(' ')])

# define JUMP_TABLE_TYPE const int
# define JUMP_TABLE_BASE_LABEL do_form_unknown
# define REF(Name) &&do_##Name - &&JUMP_TABLE_BASE_LABEL
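The first lookup level (character to class) can be tried out in isolation. The sketch below copies only a handful of entries from the jump_table quoted above and leaves every other slot at 0; mini_jump_table and char_class are hypothetical names, and the function form replaces the NOT_IN_JUMP_RANGE and CHAR_CLASS macros purely for readability.

```c
#include <assert.h>
#include <stdint.h>

/* Sparse copy of a few jump_table entries; every slot that is not
   listed is 0, the "unknown" class.  */
static const uint8_t mini_jump_table['z' - ' ' + 1] =
  {
    [' ' - ' '] = 1,
    ['%' - ' '] = 14,
    ['c' - ' '] = 20,
    ['d' - ' '] = 15,
    ['s' - ' '] = 21,
    ['x' - ' '] = 18,
  };

/* Hypothetical function form of NOT_IN_JUMP_RANGE plus CHAR_CLASS.  */
int
char_class (int ch)
{
  if (ch < ' ' || ch > 'z')
    return 0;   /* Out of range: treated like form_unknown.  */
  assert (ch - ' ' < (int) sizeof mini_jump_table);
  return mini_jump_table[ch - ' '];
}
```

With this table char_class ('c') is 20, which is the index later used to read the step table, while a character such as '~' lies outside the range and maps to 0.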

Looking up step0_jumps[20], we find REF (form_character) at index 20. Expanding REF gives

&&do_form_character - &&do_form_unknown. Next we need to find the do_form_character label to see how the character is handled.

    /* Step 0: at the beginning.  */                          \
    static JUMP_TABLE_TYPE step0_jumps[31] =                      \
    {                                         \
      REF (form_unknown),                             \
      REF (flag_space),     /* for ' ' */                     \
      REF (flag_plus),      /* for '+' */                     \
      REF (flag_minus),     /* for '-' */                     \
      REF (flag_hash),      /* for '<hash>' */                \
      REF (flag_zero),      /* for '0' */                     \
      REF (flag_quote),     /* for ''' */                    \
      REF (width_asterics), /* for '*' */                     \
      REF (width),      /* for '1'...'9' */               \
      REF (precision),      /* for '.' */                     \
      REF (mod_half),       /* for 'h' */                     \
      REF (mod_long),       /* for 'l' */                     \
      REF (mod_longlong),   /* for 'L', 'q' */                \
      REF (mod_size_t),     /* for 'z', 'Z' */                \
      REF (form_percent),   /* for '%' */                     \
      REF (form_integer),   /* for 'd', 'i' */                \
      REF (form_unsigned),  /* for 'u' */                     \
      REF (form_octal),     /* for 'o' */                     \
      REF (form_hexa),      /* for 'X', 'x' */                \
      REF (form_float),     /* for 'E', 'e', 'F', 'f', 'G', 'g' */        \
      REF (form_character), /* for 'c' */                     \
...
}

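Handling logic of LABEL(form_character)

Labels are defined through a dedicated LABEL macro, so we look for LABEL(form_character).

The logic under this label is straightforward:

First, the wide-character case (is_long set, i.e. %lc) jumps to LABEL (form_wcharacter).

Since the character itself occupies one position of the field, width is decremented by one.

If left alignment was not requested, padding spaces are written on the left.

Then outchar outputs the character, which is obtained through process_arg_int (in effect, va_arg fetching an int).

Finally, if left alignment was requested, padding spaces are written on the right.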

#define LABEL(Name) do_##Name

// glibc/stdio-common/vfprintf-process-arg.c
LABEL (form_character):
  /* Character.  */                                                                                                                                      
  if (is_long)
    goto LABEL (form_wcharacter);
  --width;  /* Account for the character itself.  */
  if (!left)
    PAD (L_(' '));
#ifdef COMPILE_WPRINTF
  outchar (__btowc ((unsigned char) process_arg_int ())); /* Promoted. */
#else
  outchar ((unsigned char) process_arg_int ()); /* Promoted.  */
#endif
  if (left)
    PAD (L_(' '));
  break;
  
  #define process_arg_int() va_arg (ap, int)

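The jump table mechanism is fairly intricate, so the %c case above serves only as a small worked example. A later article will cover the mechanism in full, since there is more than one jump table and each has its own transitions. The main thing to take away is the idea of driving a finite state machine through these tables.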

Source code study of cstdio, part 10: formatted input/output function fprintf---macro definitions / helper functions, part 04