WTSC 4.1 Parse: Option parsing


Before we move on to option parsing, we need a few terms: what are a command line, a command, an option, and an argument?

As we know, when we tell a shell on Linux what to do, we type a string of characters and press the Enter key. The shell receives Enter as a newline character, '\n', which marks the end of a line, and it reads input line by line. If you want to spread one input over several lines, type a backslash, '\', right before pressing Enter. The backslash escapes the newline, so the shell drops both characters and keeps reading: the text continues on the next screen line, but it is still the same logical line and does not finish the input. That logical line is the command line: the whole string that, once complete, makes the shell stop reading input and carry out what it says.
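To make the continuation rule concrete, here is a minimal C++ sketch assuming a much simplified shell: it joins physical lines that end in a backslash into one logical command line and stops at the first line that does not. The function name joinContinuedLines is hypothetical, and quoting rules are ignored.

```cpp
// Sketch only: models "\<Enter>" line continuation, nothing else.
#include <cassert>
#include <string>
#include <vector>

std::string joinContinuedLines(const std::vector<std::string>& physicalLines) {
    std::string logical;
    for (const std::string& line : physicalLines) {
        if (!line.empty() && line.back() == '\\') {
            // Backslash + newline is removed; keep reading the same logical line.
            logical += line.substr(0, line.size() - 1);
        } else {
            // A plain newline (Enter) ends the command line.
            logical += line;
            break;
        }
    }
    return logical;
}
```

For example, the two screen lines `swiftc -o helloworld \` and `Helloworld.swift` become the single command line `swiftc -o helloworld Helloworld.swift`.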

After the shell accepts a command line, it splits the string at whitespace (spaces, tabs, etc.) into smaller strings that contain no whitespace, except that a string enclosed in double or single quotes is kept together as one part even if it contains whitespace.
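The splitting rule can be sketched in C++ as follows. This is a simplification: splitWords is a hypothetical helper that only handles whitespace and single/double quotes, while real shells also process backslash escapes, variable and command expansion, and globbing.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Split a command line into words: whitespace separates words, and text
// inside '...' or "..." stays in one word (the quotes themselves are dropped).
std::vector<std::string> splitWords(const std::string& line) {
    std::vector<std::string> words;
    std::string current;
    bool inWord = false;  // true once the current word has started
    char quote = '\0';    // active quote character, or '\0' if none
    for (char c : line) {
        if (quote != '\0') {
            if (c == quote) quote = '\0';  // closing quote
            else current += c;             // quoted text, whitespace included
        } else if (c == '\'' || c == '"') {
            quote = c;                     // opening quote starts/extends a word
            inWord = true;
        } else if (std::isspace(static_cast<unsigned char>(c))) {
            if (inWord) {                  // whitespace ends the current word
                words.push_back(current);
                current.clear();
                inWord = false;
            }
        } else {
            current += c;
            inWord = true;
        }
    }
    if (inWord) words.push_back(current);
    return words;
}
```

With this rule, `swiftc -o 'hello world' Helloworld.swift` yields four arguments, the third being `hello world` with its space preserved.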

The first substring is the command, and all substrings, including the first, are arguments; in other words, the first argument is the string naming the command. A command can be an executable binary, an executable script, a shell builtin, or an alias/symlink to another executable. The shell passes the arguments on the command line to that executable.

An option is an argument that modifies the behavior of a command. As the name suggests, options are usually optional. Options are usually prefixed by '-' or '--'. Some other arguments are values for an option rather than for the command; they usually follow the option. Alternatively, an option can take the form "-option=value", which assigns a value to that option. An option that takes no value is a flag, which is either on (present) or off (absent). In the following example,

swiftc -o helloworld Helloworld.swift

The command line is the whole string.

The command is swiftc, which is a symbolic link to swift-frontend. By the way, the shell looks up the requested command, swiftc, in the directories listed in its PATH variable, a colon-separated list of directories that are searched in order.
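A sketch of that lookup, assuming a POSIX system (it uses access() from <unistd.h>). splitPath and findInPath are hypothetical helpers; a real shell also checks aliases, functions and builtins before scanning PATH, and caches the results.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <unistd.h>  // POSIX access(), X_OK

// Split a PATH string on ':'. An empty entry is the legacy POSIX
// spelling of the current directory.
std::vector<std::string> splitPath(const std::string& path) {
    std::vector<std::string> dirs;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type colon = path.find(':', start);
        // When colon == npos the length is huge and substr stops at the end.
        std::string dir = path.substr(start, colon - start);
        dirs.push_back(dir.empty() ? std::string(".") : dir);
        if (colon == std::string::npos) break;
        start = colon + 1;
    }
    return dirs;
}

// Return the first "<dir>/<command>" that is executable, or "" if none.
// Directories are not filtered out in this sketch.
std::string findInPath(const std::string& command, const std::string& path) {
    if (command.find('/') != std::string::npos) return command;  // no PATH search
    for (const std::string& dir : splitPath(path)) {
        std::string candidate = dir + "/" + command;
        if (access(candidate.c_str(), X_OK) == 0) return candidate;
    }
    return "";
}
```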

The arguments are 'swiftc', '-o', 'helloworld' and 'Helloworld.swift'.

The option is '-o'.

The argument for option '-o' is 'helloworld'.

Besides the option and its argument, the remaining argument, 'Helloworld.swift', is for the command itself.
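The same breakdown can be sketched in C++. The key assumption is explicit in the code: only the command knows that -o consumes the next argument, so the hypothetical classify function receives that knowledge as a set of value-taking options.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

enum class Role { Command, Option, OptionValue, Input };

// Label each element of argv. Any "-..." argument is treated as an option;
// those listed in optionsWithValue consume the next argument as their value.
// A lone "-" is treated as an input (conventionally stdin).
std::vector<Role> classify(const std::vector<std::string>& argv,
                           const std::set<std::string>& optionsWithValue) {
    std::vector<Role> roles;
    for (std::size_t i = 0; i < argv.size(); ++i) {
        const std::string& a = argv[i];
        if (i == 0) {
            roles.push_back(Role::Command);
        } else if (a.size() > 1 && a[0] == '-') {
            roles.push_back(Role::Option);
            if (optionsWithValue.count(a) && i + 1 < argv.size()) {
                roles.push_back(Role::OptionValue);
                ++i;  // the value has been consumed
            }
        } else {
            roles.push_back(Role::Input);
        }
    }
    return roles;
}
```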

Based on the above, the LLVM Option library has several classes that represent these concepts and their relationships, in particular class Arg, class ArgList, class InputArgList, class Option, and class OptTable:

/// A concrete instance of a particular driver option.
///
/// The Arg class encodes just enough information to be able to
/// derive the argument values efficiently.
class Arg

/// ArgList - Ordered collection of driver arguments.
///
/// The ArgList class manages a list of Arg instances as well as
/// auxiliary data and convenience methods to allow Tools to quickly
/// check for the presence of Arg instances for a particular Option
/// and to iterate over groups of arguments.
class ArgList
class InputArgList final : public ArgList

/// Option - Abstract representation for a single form of driver
/// argument.
///
/// An Option class represents a form of option that the driver
/// takes, for example how many arguments the option has and how
/// they can be provided. Individual option instances store
/// additional information about what group the option is a member
/// of (if any), if the option is an alias, and a number of
/// flags. At runtime the driver parses the command line into
/// concrete Arg instances, each of which corresponds to a
/// particular Option instance.
class Option

/// Provide access to the Option info table.
///
/// The OptTable class provides a layer of indirection which allows Option
/// instance to be created lazily. In the common case, only a few options will
/// be needed at runtime; the OptTable class maintains enough information to
/// parse command lines without instantiating Options, while letting other
/// parts of the driver still use Option instances where convenient.
class OptTable {
...
/// Entry for a single option instance in the option data table.
  struct Info
...
private:
  /// The option information table.
  std::vector<Info> OptionInfos;
...
}
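To see how these classes relate, here is a toy C++ model. It is not llvm::opt's API: the names only echo LLVM's, and of the methods only hasArg and getLastArgValue are named after real ArgList methods. An OptionInfo plays the role of one option form (one table entry), an Arg is one concrete occurrence on a command line, and an ArgList keeps the Args in command-line order.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct OptionInfo {
    unsigned id;
    std::string spelling;  // e.g. "-o"
    unsigned numValues;    // how many following arguments it consumes
};

struct Arg {
    const OptionInfo* option;         // the option form this occurrence belongs to
    unsigned index;                   // position in argv
    std::vector<std::string> values;  // values given for this occurrence
};

class ArgList {
public:
    void append(Arg a) { args_.push_back(std::move(a)); }

    bool hasArg(unsigned id) const {
        for (const Arg& a : args_)
            if (a.option->id == id) return true;
        return false;
    }

    // When an option is repeated, the last occurrence wins.
    std::string getLastArgValue(unsigned id, const std::string& def = "") const {
        for (auto it = args_.rbegin(); it != args_.rend(); ++it)
            if (it->option->id == id)
                return it->values.empty() ? def : it->values.front();
        return def;
    }

private:
    std::vector<Arg> args_;  // kept in command-line order
};
```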

However, different commands accept different options and arguments. Defining the full set of options and arguments is up to the command's author, because on the shell they are just a whitespace-separated line of text, and parsing that line into options and arguments is the command's job. To save command authors from duplicating this work, LLVM offers a framework for argument parsing. The classes above are its common part, which does not change when you write a new program on top of the framework.

To specify your own set of options and arguments, you write a td (target description) file that describes them. The LLVM build infrastructure then runs the tablegen program, which parses the td files and converts them into C++ source fragments (.inc files). These are included into your program's source code and compiled together with it into the final program.

We will see what a td file looks like, how tablegen parses and converts it, and how the generated C++ file is integrated into a program.

Several documents on llvm.org explain how tablegen works: TableGen Overview, TableGen Programmer's Reference, and TableGen Manual.

Overall, a td file uses a compact syntax to define records, each collecting the information that describes one instance of a type. tablegen parses these records and uses that information to generate whatever output format or files are required.

Take a car as an example:

/* /wtsc/car/car.td */
class Car <string b, string c, int yyyy> {
      string brand = b;
      string country = c;
      int productionYear = yyyy;
}

def Ford: Car<"Ford", "USA", 1960>;

/wtsc/build/Ninja-ReleaseAssert+swift-DebugAssert/llvm-linux-x86_64/bin/llvm-tblgen -print-records car.td
------------- Classes -----------------
class Car<string Car:b = ?, string Car:c = ?, int Car:yyyy = ?> {
  string brand = Car:b;
  string country = Car:c;
  int productionYear = Car:yyyy;
}
------------- Defs -----------------
def Ford {	// Car
  string brand = "Ford";
  string country = "USA";
  int productionYear = 1960;
}

However, none of llvm-tblgen's backends knows what to generate from records of our Car class, so we cannot generate a C++ file from car.td.
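For illustration, here is roughly what a backend for Car could emit, sketched in plain C++ without TableGen. A real backend is a C++ function that walks the parsed records (for example through Record::getValueAsString and Record::getValueAsInt), as EmitOptParser does later in this article; this sketch assumes the records are already available as a hypothetical CarRecord struct and emits one CAR(...) macro call per def, the same shape -gen-opt-parser-defs uses for options.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed record of class Car.
struct CarRecord {
    std::string defName;
    std::string brand;
    std::string country;
    int productionYear;
};

// Emit one macro call per record, guarded so that each includer decides
// what CAR(...) expands to. No string escaping is done in this sketch.
std::string emitCarDefs(const std::vector<CarRecord>& records) {
    std::ostringstream os;
    os << "#ifdef CAR\n";
    for (const CarRecord& r : records)
        os << "CAR(" << r.defName << ", \"" << r.brand << "\", \"" << r.country
           << "\", " << r.productionYear << ")\n";
    os << "#endif // CAR\n";
    return os.str();
}
```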

So let's look at the Swift compiler's td files for options and see how tablegen converts them into C++ files.

Here are Swift's option files (.td files):

ll /wtsc/swift/include/swift/Option/*.td
 inode Permissions Links Size User Date Modified Name
272809 .rw-rw-r--      1  34k k    13 Oct 18:55  /wtsc/swift/include/swift/Option/FrontendOptions.td
272811 .rw-rw-r--      1  54k k    13 Oct 18:55  /wtsc/swift/include/swift/Option/Options.td

The -driver-print-jobs option, which we use to list the jobs the driver would run, is defined in Options.td:

def driver_print_jobs : Flag<["-"], "driver-print-jobs">, InternalDebugOpt,
  HelpText<"Dump list of jobs to execute">;

We can use llvm-tblgen to transform the file in which it's defined.

/wtsc/build/Ninja-ReleaseAssert+swift-DebugAssert/llvm-linux-x86_64/bin/llvm-tblgen \
 -I /wtsc/llvm-project/llvm/include \
 -I /wtsc/swift/include/swift/Option \
 /wtsc/swift/include/swift/Option/Options.td \
 -gen-opt-parser-defs
...
OPTION(prefix_1,                                    //PREFIX
       &"-driver-print-jobs"[1],                    //NAME
       driver_print_jobs,                           //ID
       Flag,                                        //KIND
       internal_debug_Group,                        //GROUP
       INVALID,                                     //ALIAS
       nullptr,                                     //ALIASARGS
       HelpHidden | DoesNotAffectIncrementalBuild,  //FLAGS
       0,                                           //PARAM
       "Dump list of jobs to execute",              //HELPTEXT
       nullptr,                                     //METAVAR
       nullptr)                                     //VALUES
...

The parameter list of the OPTION macro in Options.h tells us what each field means.

/* /wtsc/swift/include/swift/Option/Options.h */
...
#define OPTION(PREFIX, NAME, ID, KIND, GROUP, ALIAS, ALIASARGS, FLAGS, PARAM,  \
               HELPTEXT, METAVAR, VALUES)                                      \
...

To be completely sure, you can read the llvm-tblgen source file that emits these fields.

/* /wtsc/llvm-project/llvm/utils/TableGen/OptParserEmitter.cpp */
...
/// OptParserEmitter - This tablegen backend takes an input .td file
/// describing a list of options and emits a data structure for parsing and
/// working with those options when given an input command line.
namespace llvm {
void EmitOptParser(RecordKeeper &Records, raw_ostream &OS) {
...
  OS << "//////////\n";
  OS << "// Options\n\n";
  auto WriteOptRecordFields = [&](raw_ostream &OS, const Record &R) {
    // The option prefix;
    std::vector<StringRef> prf = R.getValueAsListOfStrings("Prefixes");
    OS << Prefixes[PrefixKeyT(prf.begin(), prf.end())] << ", ";
    // The option string.
    emitNameUsingSpelling(OS, R);
    // The option identifier name.
    OS << ", " << getOptionName(R);
    // The option kind.
    OS << ", " << R.getValueAsDef("Kind")->getValueAsString("Name");
    // The containing option group (if any).
    OS << ", ";
    const ListInit *GroupFlags = nullptr;
    if (const DefInit *DI = dyn_cast<DefInit>(R.getValueInit("Group"))) {
      GroupFlags = DI->getDef()->getValueAsListInit("Flags");
      OS << getOptionName(*DI->getDef());
    } else
      OS << "INVALID";
    // The option alias (if any).
    OS << ", ";
    if (const DefInit *DI = dyn_cast<DefInit>(R.getValueInit("Alias")))
      OS << getOptionName(*DI->getDef());
    else
      OS << "INVALID";
    // The option alias arguments (if any).
    // Emitted as a \0 separated list in a string, e.g. ["foo", "bar"]
    // would become "foo\0bar\0". Note that the compiler adds an implicit
    // terminating \0 at the end.
    OS << ", ";
    std::vector<StringRef> AliasArgs = R.getValueAsListOfStrings("AliasArgs");
    if (AliasArgs.size() == 0) {
      OS << "nullptr";
    } else {
      OS << "\"";
      for (size_t i = 0, e = AliasArgs.size(); i != e; ++i)
        OS << AliasArgs[i] << "\\0";
      OS << "\"";
    }
    // The option flags.
    OS << ", ";
    int NumFlags = 0;
    const ListInit *LI = R.getValueAsListInit("Flags");
    for (Init *I : *LI)
      OS << (NumFlags++ ? " | " : "") << cast<DefInit>(I)->getDef()->getName();
    if (GroupFlags) {
      for (Init *I : *GroupFlags)
        OS << (NumFlags++ ? " | " : "")
           << cast<DefInit>(I)->getDef()->getName();
    }
    if (NumFlags == 0)
      OS << '0';
    // The option parameter field.
    OS << ", " << R.getValueAsInt("NumArgs");
    // The option help text.
    if (!isa<UnsetInit>(R.getValueInit("HelpText"))) {
      OS << ",\n";
      OS << "       ";
      write_cstring(OS, R.getValueAsString("HelpText"));
    } else
      OS << ", nullptr";
    // The option meta-variable name.
    OS << ", ";
    if (!isa<UnsetInit>(R.getValueInit("MetaVarName")))
      write_cstring(OS, R.getValueAsString("MetaVarName"));
    else
      OS << "nullptr";
    // The option Values. Used for shell autocompletion.
    OS << ", ";
    if (!isa<UnsetInit>(R.getValueInit("Values")))
      write_cstring(OS, R.getValueAsString("Values"));
    else
      OS << "nullptr";
  };
  std::vector<std::unique_ptr<MarshallingKindInfo>> OptsWithMarshalling;
  for (unsigned I = 0, E = Opts.size(); I != E; ++I) {
    const Record &R = *Opts[I];
    // Start a single option entry.
    OS << "OPTION(";
    WriteOptRecordFields(OS, R);
    OS << ")\n";
    if (!isa<UnsetInit>(R.getValueInit("MarshallingKind")))
      OptsWithMarshalling.push_back(MarshallingKindInfo::create(R));
  }
  OS << "#endif // OPTION\n";
...
}
...

In short, this backend takes every option record (every record of class Option) in Options.td and emits it as a call to a C macro named OPTION.
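The formatting rules for the GROUP, FLAGS, PARAM and HELPTEXT fields can be mimicked in a few lines. This is a simplified sketch, not the real emitter: OptRecord is a hypothetical stand-in for a TableGen Record, and fields such as PREFIX, NAME, KIND, ALIAS and METAVAR are left out.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

struct OptRecord {
    std::string id;                       // e.g. "driver_print_jobs"
    std::string group;                    // "" means no group
    std::vector<std::string> flags;       // the option's own Flags
    std::vector<std::string> groupFlags;  // Flags inherited from its group
    int numArgs;
    std::optional<std::string> helpText;  // unset -> nullptr
};

std::string emitFields(const OptRecord& r) {
    std::ostringstream os;
    os << r.id << ", " << (r.group.empty() ? "INVALID" : r.group) << ", ";
    int numFlags = 0;
    // As in the emitter: " | " before every flag except the first,
    // the option's own flags first, then its group's flags.
    for (const auto* list : {&r.flags, &r.groupFlags})
        for (const std::string& f : *list)
            os << (numFlags++ ? " | " : "") << f;
    if (numFlags == 0) os << '0';
    os << ", " << r.numArgs << ", ";
    if (r.helpText) os << '"' << *r.helpText << '"';  // no escaping here
    else os << "nullptr";
    return os.str();
}
```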

Take -driver-print-jobs as an example: it is defined as an instance of Flag, which is a subclass of Option.

class Flag<list<string> prefixes, string name>
  : Option<prefixes, name, KIND_FLAG>;

If we pass -print-records to llvm-tblgen instead of -gen-opt-parser-defs, we can see that the record driver_print_jobs derives from the classes Option, Flag, Group, and so on, which are listed after the double slash, '//'.

/wtsc/build/Ninja-ReleaseAssert+swift-DebugAssert/llvm-linux-x86_64/bin/llvm-tblgen \
 -I /wtsc/llvm-project/llvm/include \
 -I /wtsc/swift/include/swift/Option \
 /wtsc/swift/include/swift/Option/Options.td \
 -print-records
...
def driver_print_jobs {	// Option Flag Group Flags InternalDebugOpt HelpText
  string EnumName = ?;
  list<string> Prefixes = ["-"];
  string Name = "driver-print-jobs";
  OptionKind Kind = KIND_FLAG;
  int NumArgs = 0;
  string HelpText = "Dump list of jobs to execute";
  string MetaVarName = ?;
  string Values = ?;
  code ValuesCode = ?;
  list<OptionFlag> Flags = [HelpHidden, DoesNotAffectIncrementalBuild];
  OptionGroup Group = internal_debug_Group;
  Option Alias = ?;
  list<string> AliasArgs = [];
  string MarshallingKind = ?;
  code KeyPath = ?;
  code DefaultValue = ?;
  bit ShouldAlwaysEmit = 0;
  bit IsPositive = 1;
  code NormalizerRetTy = ?;
  code NormalizedValuesScope = [{}];
  code Normalizer = [{}];
  code Denormalizer = [{}];
  list<code> NormalizedValues = ?;
}
...

The OPTION macro call generated for -driver-print-jobs was shown above.

Next, let's see how Swift uses this mechanism to describe, in td files, the options and arguments it accepts.

In the swift driver executable, the main function calls run_driver, which instantiates a swift::Driver from the libswiftDriver library and calls its swift::Driver::parseArgStrings method to parse the argument strings into an llvm::opt::InputArgList. parseArgStrings in turn calls llvm::opt::OptTable::ParseArgs on the Driver's data member opts, a std::unique_ptr<llvm::opt::OptTable>; OptTable comes from the libLLVMOption library. opts is created in the swift::Driver constructor by createSwiftOptTable(), which comes from the libswiftOption library.

So let's see how libswiftOption is built, in its CMakeLists.txt file.

/* /wtsc/swift/lib/Option/CMakeLists.txt */
add_swift_host_library(swiftOption STATIC
  Options.cpp
  SanitizerOptions.cpp)
add_dependencies(swiftOption
  SwiftOptions)
target_link_libraries(swiftOption PRIVATE
  swiftBasic)

This shows that libswiftOption depends on SwiftOptions, so let's use git grep to find out what SwiftOptions is.

git -C /wtsc/swift grep -n "(SwiftOptions"
include/swift/Option/CMakeLists.txt:3:swift_add_public_tablegen_target(SwiftOptions)

/* /wtsc/swift/include/swift/Option/CMakeLists.txt */
set(LLVM_TARGET_DEFINITIONS Options.td)
swift_tablegen(Options.inc -gen-opt-parser-defs)
swift_add_public_tablegen_target(SwiftOptions)

So we arrive at Options.td, the file that defines all of Swift's options and arguments.

Now let's find out what these three CMake commands do. Looking up the definitions of swift_tablegen and swift_add_public_tablegen_target with git grep leads to LLVM's tablegen and add_public_tablegen_target functions; the following snippet shows the parts relevant to us.

# /wtsc/llvm-project/llvm/cmake/modules/TableGen.cmake 
...
# ofn means output file name
function(tablegen project ofn)
...
  add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${ofn}
    COMMAND ${${project}_TABLEGEN_EXE} ${ARGN} -I ${CMAKE_CURRENT_SOURCE_DIR}
    ${tblgen_includes}
    ${LLVM_TABLEGEN_FLAGS}
    ${LLVM_TARGET_DEFINITIONS_ABSOLUTE}
    ${tblgen_change_flag}
    ${additional_cmdline}
    # The file in LLVM_TARGET_DEFINITIONS may be not in the current
    # directory and local_tds may not contain it, so we must
    # explicitly list it here:
    DEPENDS ${${project}_TABLEGEN_TARGET} ${${project}_TABLEGEN_EXE}
      ${local_tds} ${global_tds}
    ${LLVM_TARGET_DEFINITIONS_ABSOLUTE}
    COMMENT "Building ${ofn}..."
    )

  # `make clean' must remove all those generated files:
  set_property(DIRECTORY APPEND PROPERTY ADDITIONAL_MAKE_CLEAN_FILES ${ofn})

  set(TABLEGEN_OUTPUT ${TABLEGEN_OUTPUT} ${CMAKE_CURRENT_BINARY_DIR}/${ofn} PARENT_SCOPE)
  set_source_files_properties(${CMAKE_CURRENT_BINARY_DIR}/${ofn} PROPERTIES
    GENERATED 1)
endfunction()

# Creates a target for publicly exporting tablegen dependencies.
function(add_public_tablegen_target target)
...
  add_custom_target(${target}
    DEPENDS ${TABLEGEN_OUTPUT})
  if(LLVM_COMMON_DEPENDS)
    add_dependencies(${target} ${LLVM_COMMON_DEPENDS})
  endif()
  set_target_properties(${target} PROPERTIES FOLDER "Tablegenning")
  set(LLVM_COMMON_DEPENDS ${LLVM_COMMON_DEPENDS} ${target} PARENT_SCOPE)
endfunction()
...

In short, the snippet creates a custom command that generates the output file (here Options.inc, a file of C++ code) and a custom target that other targets can depend on. When target A depends on target B, B is built before A, and A can use B's output, which is exactly what we need. Our libswiftOption target depends on the SwiftOptions custom target, which depends on TABLEGEN_OUTPUT, the files produced by the tablegen custom command. So libswiftOption transitively depends on the custom command that generates C++ code from our td files.
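The ordering rule can be illustrated with a depth-first topological sort. The graph is a hand-written simplification of the targets discussed here (it leaves out swiftBasic, for instance), and buildOrder is a hypothetical helper, not how CMake or Ninja are implemented; there is no cycle detection.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Return a build order in which every node comes after all of its
// dependencies (depth-first post-order from the goal).
std::vector<std::string> buildOrder(
        const std::map<std::string, std::vector<std::string>>& deps,
        const std::string& goal) {
    std::vector<std::string> order;
    std::map<std::string, bool> visited;
    std::function<void(const std::string&)> visit = [&](const std::string& node) {
        if (visited[node]) return;
        visited[node] = true;
        auto it = deps.find(node);
        if (it != deps.end())
            for (const std::string& dep : it->second) visit(dep);  // dependencies first
        order.push_back(node);                                      // then the node itself
    };
    visit(goal);
    return order;
}
```

For the graph swiftOption → {SwiftOptions, Options.cpp}, SwiftOptions → {Options.inc}, Options.inc → {llvm-tblgen, Options.td}, buildOrder(deps, "swiftOption") yields llvm-tblgen, Options.td, Options.inc, SwiftOptions, Options.cpp, swiftOption: the generated Options.inc exists before swiftOption compiles Options.cpp, which includes it.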

By the way, we can also build the SwiftOptions target directly with cmake; it then runs tablegen with -gen-opt-parser-defs just as we did by hand above.

cmake --build \
    /wtsc/build/Ninja-ReleaseAssert+swift-DebugAssert/swift-linux-x86_64 \
    --target SwiftOptions

As a result, building libswiftOption first builds SwiftOptions: llvm-tblgen runs with -gen-opt-parser-defs on /wtsc/swift/include/swift/Option/Options.td and produces /wtsc/build/Ninja-ReleaseAssert+swift-DebugAssert/swift-linux-x86_64/include/swift/Option/Options.inc, which contains one call to the OPTION macro per option. That file is then compiled into the libswiftOption library.

Options.inc is included in two files, three times in total, as git grep shows.

git -C /wtsc/swift grep -n '#include "swift/Option/Options.inc"'
include/swift/Option/Options.h:47:#include "swift/Option/Options.inc"
lib/Option/Options.cpp:23:#include "swift/Option/Options.inc"
lib/Option/Options.cpp:31:#include "swift/Option/Options.inc"

The include in Options.h generates the enum swift::options::ID, one OPT_ enumerator per option.

/* /wtsc/swift/include/swift/Option/Options.h */
...
namespace swift {
namespace options {
...
  enum ID {
    OPT_INVALID = 0, // This is not an option ID.
#define OPTION(PREFIX, NAME, ID, KIND, GROUP, ALIAS, ALIASARGS, FLAGS, PARAM,  \
               HELPTEXT, METAVAR, VALUES)                                      \
  OPT_##ID,
#include "swift/Option/Options.inc"
    LastOption
#undef OPTION
  };
} //end namespace options
...
} // end namespace swift

The two includes in Options.cpp generate the prefix arrays and InfoTable. InfoTable is passed to SwiftOptTable, the class that swift::createSwiftOptTable, mentioned above, instantiates.

/* /wtsc/swift/lib/Option/Options.cpp */
...
#define PREFIX(NAME, VALUE) static const char *const NAME[] = VALUE;
#include "swift/Option/Options.inc"
#undef PREFIX
static const OptTable::Info InfoTable[] = {
#define OPTION(PREFIX, NAME, ID, KIND, GROUP, ALIAS, ALIASARGS, FLAGS, PARAM,  \
               HELPTEXT, METAVAR, VALUES)                                      \
  {PREFIX, NAME,  HELPTEXT,    METAVAR,     OPT_##ID,  Option::KIND##Class,    \
   PARAM,  FLAGS, OPT_##GROUP, OPT_##ALIAS, ALIASARGS, VALUES},
#include "swift/Option/Options.inc"
#undef OPTION
};
namespace {
class SwiftOptTable : public OptTable {
public:
  SwiftOptTable() : OptTable(InfoTable) {}
};
} // end anonymous namespace
std::unique_ptr<OptTable> swift::createSwiftOptTable() {
  return std::unique_ptr<OptTable>(new SwiftOptTable());
}
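Including one generated file several times, with OPTION (or PREFIX) defined differently each time, is the classic X-macro technique. Here is a self-contained sketch of it: a list macro, MY_OPTIONS, stands in for Options.inc, and its two entries, like all names in the block, are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Plays the role of Options.inc: one X(...) entry per option.
#define MY_OPTIONS(X)                               \
    X("driver-print-jobs", driver_print_jobs, 0)    \
    X("o", o, 1)

namespace demo {

// First expansion: one enumerator per option, like swift::options::ID.
enum ID {
    OPT_INVALID = 0,  // not an option
#define MAKE_ENUM(NAME, ID, PARAM) OPT_##ID,
    MY_OPTIONS(MAKE_ENUM)
#undef MAKE_ENUM
    LastOption
};

// Second expansion: one table row per option, like InfoTable.
struct Info {
    const char* name;
    unsigned id;
    unsigned char param;
};

const Info InfoTable[] = {
#define MAKE_INFO(NAME, ID, PARAM) {NAME, OPT_##ID, PARAM},
    MY_OPTIONS(MAKE_INFO)
#undef MAKE_INFO
};

// Linear lookup by name (LLVM's OptTable searches a sorted table instead).
inline const Info* findByName(const char* name) {
    for (const Info& info : InfoTable)
        if (std::strcmp(info.name, name) == 0) return &info;
    return nullptr;
}

}  // namespace demo
```

Because both expansions come from the same list, the enum and the table cannot drift apart, which is why Swift regenerates a single Options.inc rather than maintaining the two by hand.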


/* /wtsc/llvm-project/llvm/include/llvm/Option/OptTable.h */
...
/// Provide access to the Option info table.
///
/// The OptTable class provides a layer of indirection which allows Option
/// instance to be created lazily. In the common case, only a few options will
/// be needed at runtime; the OptTable class maintains enough information to
/// parse command lines without instantiating Options, while letting other
/// parts of the driver still use Option instances where convenient.
class OptTable {
public:
  /// Entry for a single option instance in the option data table.
  struct Info {
    /// A null terminated array of prefix strings to apply to name while
    /// matching.
    const char *const *Prefixes;
    const char *Name;
    const char *HelpText;
    const char *MetaVar;
    unsigned ID;
    unsigned char Kind;
    unsigned char Param;
    unsigned short Flags;
    unsigned short GroupID;
    unsigned short AliasID;
    const char *AliasArgs;
    const char *Values;
  };
protected:
  OptTable(ArrayRef<Info> OptionInfos, bool IgnoreCase = false);
...
public:
...
  /// Parse an list of arguments into an InputArgList.
  ///
  /// The resulting InputArgList will reference the strings in [\p ArgBegin,
  /// \p ArgEnd), and their lifetime should extend past that of the returned
  /// InputArgList.
  ///
  /// The only error that can occur in this routine is if an argument is
  /// missing values; in this case \p MissingArgCount will be non-zero.
  ///
  /// \param MissingArgIndex - On error, the index of the option which could
  /// not be parsed.
  /// \param MissingArgCount - On error, the number of missing options.
  /// \param FlagsToInclude - Only parse options with any of these flags.
  /// Zero is the default which includes all flags.
  /// \param FlagsToExclude - Don't parse options with this flag.  Zero
  /// is the default and means exclude nothing.
  /// \return An InputArgList; on error this will contain all the options
  /// which could be parsed.
  InputArgList ParseArgs(ArrayRef<const char *> Args, unsigned &MissingArgIndex,
                         unsigned &MissingArgCount, unsigned FlagsToInclude = 0,
                         unsigned FlagsToExclude = 0) const;
...
}
...

The swift driver eventually calls OptTable::ParseArgs to parse its arguments, and that is where the whole logic of the driver's argument parsing lives.
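To summarize the idea, here is a much simplified, hypothetical re-implementation of what a ParseArgs-style function does; it is not LLVM's code. It models three option kinds, picks the longest matching spelling, skips empty arguments, and, in the spirit of the MissingArgIndex/MissingArgCount parameters documented above, reports a separate option whose value is missing and stops there, returning what it could parse.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Flag: "-v" (no value); Separate: "-o out" (value is the next element);
// Joined: "-DNAME" (value is the rest of the same element).
enum class Kind { Flag, Separate, Joined };

struct OptInfo {
    std::string spelling;  // prefix + name, e.g. "-o"
    Kind kind;
    int id;
};

struct ParsedArg {
    int id;             // option id; 0 = input, -1 = unknown option
    std::string value;  // option value, or the text itself for inputs/unknowns
};

std::vector<ParsedArg> parseArgs(const std::vector<OptInfo>& table,
                                 const std::vector<std::string>& args,
                                 unsigned& missingArgIndex,
                                 unsigned& missingArgCount) {
    std::vector<ParsedArg> out;
    missingArgIndex = missingArgCount = 0;
    for (unsigned i = 0; i < args.size(); ++i) {
        const std::string& a = args[i];
        if (a.empty()) continue;                 // empty arguments are skipped
        if (a.size() < 2 || a[0] != '-') {       // positional input ("-" = stdin)
            out.push_back({0, a});
            continue;
        }
        // Longest table spelling that matches: exactly for Flag/Separate,
        // as a prefix for Joined.
        const OptInfo* best = nullptr;
        for (const OptInfo& o : table) {
            bool match = o.kind == Kind::Joined
                             ? a.compare(0, o.spelling.size(), o.spelling) == 0
                             : a == o.spelling;
            if (match && (!best || o.spelling.size() > best->spelling.size()))
                best = &o;
        }
        if (!best) { out.push_back({-1, a}); continue; }
        switch (best->kind) {
        case Kind::Flag:
            out.push_back({best->id, ""});
            break;
        case Kind::Joined:
            out.push_back({best->id, a.substr(best->spelling.size())});
            break;
        case Kind::Separate:
            if (i + 1 >= args.size()) {          // value missing: report and stop
                missingArgIndex = i;
                missingArgCount = 1;
                return out;
            }
            out.push_back({best->id, args[++i]});
            break;
        }
    }
    return out;
}
```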