WTSC 5: Lexer

With the -driver-print-jobs option to swiftc, we can see every job that must be done before we get the final executable binary for our hello world example.

PATH=$PATH:/wtsc/usr/bin
/wtsc/usr/bin/swift-frontend \
 -frontend -c \
 -primary-file /wtsc/helloworld/Helloworld.swift \
 -target x86_64-unknown-linux-gnu \
 -disable-objc-interop \
 -color-diagnostics \
 -module-name helloworld \
 -o /tmp/Helloworld-7ada2e.o
/wtsc/usr/bin/swift-autolink-extract /tmp/Helloworld-7ada2e.o \
 -o /tmp/Helloworld-f438c7.autolink
/wtsc/usr/bin/clang \
 -target x86_64-unknown-linux-gnu \
 -fuse-ld=gold \
 -pie \
 -Xlinker \
 -rpath \
 -Xlinker /wtsc/usr/lib/swift/linux /wtsc/usr/lib/swift/linux/x86_64/swiftrt.o /tmp/Helloworld-7ada2e.o @/tmp/Helloworld-f438c7.autolink \
 -L /wtsc/usr/lib/swift/linux \
 -lswiftCore \
 --target=x86_64-unknown-linux-gnu \
 -o /wtsc/helloworld/helloworld

Moreover, we can run those three commands manually ourselves and get the same result. That gives us an opportunity to use lldb to trace how swift-frontend executes with the -frontend and -c options.

lldb -- \
     /wtsc/usr/bin/swift-frontend \
     -frontend \
     -c \
     -primary-file /wtsc/helloworld/Helloworld.swift \
     -target x86_64-unknown-linux-gnu \
     -disable-objc-interop \
     -color-diagnostics \
     -module-name helloworld \
     -o /tmp/Helloworld-7ada2e.o

Afterward, we set a breakpoint on the first statement of the run_driver function and run the program.

(lldb) b driver.cpp:132
(lldb) run

Stepping forward from there, we get into the performFrontend function, which does the actual work of compiling the source code into an object file.

The main idea of swift::performFrontend is to construct a CompilerInstance and then call performCompile with that instance (plus a few other arguments). performCompile in turn calls performAction, which does the actual work according to the requested ActionType. In our case the ActionType is FrontendOptions::ActionType::EmitObject, so performAction invokes withSemanticAnalysis. withSemanticAnalysis takes a continuation function as its last argument: it first calls Instance.performSema, which does lexing, parsing, type checking and so on, and right before it finishes, it calls the continuation and returns that call's result. In our case the continuation calls performCompileStepsPostSema, which lowers the AST to SIL, optimizes the SIL, generates LLVM IR, and runs the LLVM backend.

/* /wtsc/swift/lib/FrontendTool/FrontendTool.cpp */
...
static bool
withSemanticAnalysis(CompilerInstance &Instance, FrontendObserver *observer,
                     llvm::function_ref<bool(CompilerInstance &)> cont) {
  ...
  Instance.performSema();
  ...
  return cont(Instance);
}
...
static bool performAction(CompilerInstance &Instance,
                          int &ReturnValue,
                          FrontendObserver *observer) {
...
switch (Instance.getInvocation().getFrontendOptions().RequestedAction) {
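...
case FrontendOptions::ActionType::EmitObject:  // our case: -c
...  // the other SIL-generating actions fall through here as well
case FrontendOptions::ActionType::DumpTypeInfo: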
    return withSemanticAnalysis(
        Instance, observer, [&](CompilerInstance &Instance) {
          assert(FrontendOptions::doesActionGenerateSIL(opts.RequestedAction) &&
                 "All actions not requiring SILGen must have been handled!");
          return performCompileStepsPostSema(Instance, ReturnValue, observer);
        });
...
}
...
}
...
/// Performs the compile requested by the user.
/// \param Instance Will be reset after performIRGeneration when the verifier
///                 mode is NoVerify and there were no errors.
/// \returns true on error
static bool performCompile(CompilerInstance &Instance,
                           int &ReturnValue,
                           FrontendObserver *observer) {
...
  bool hadError = performAction(Instance, ReturnValue, observer);
...
}
...
int swift::performFrontend(ArrayRef<const char *> Args,
                           const char *Argv0, void *MainAddr,
                           FrontendObserver *observer)  {
...
int ReturnValue = 0;
  bool HadError = performCompile(*Instance, ReturnValue, observer);
...
}
...
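The continuation-passing shape of withSemanticAnalysis is easy to miss in the excerpt above. The following is a minimal, self-contained C++ sketch of that control flow, not the real frontend code: ToyInstance, performPostSemaSteps and the two-value ActionType enum are hypothetical stand-ins, and std::function replaces llvm::function_ref so that no LLVM headers are needed.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for swift::CompilerInstance: it only records which
// pipeline stages have run, so the control flow is easy to observe.
struct ToyInstance {
  std::vector<std::string> log;
  void performSema() { log.push_back("sema"); }
};

// Hypothetical subset of FrontendOptions::ActionType.
enum class ActionType { Parse, EmitObject };

// Same shape as withSemanticAnalysis: run semantic analysis first,
// then hand the instance to the continuation and return its result.
static bool withSemanticAnalysis(ToyInstance &instance,
                                 const std::function<bool(ToyInstance &)> &cont) {
  instance.performSema();
  return cont(instance);
}

// Stand-in for performCompileStepsPostSema (SILGen, SIL optimization, IRGen...).
static bool performPostSemaSteps(ToyInstance &instance) {
  instance.log.push_back("post-sema");
  return false;  // false means "no error", as in the frontend
}

// Same shape as performAction: switch on the requested action.
static bool performAction(ToyInstance &instance, ActionType action) {
  switch (action) {
  case ActionType::Parse:
    instance.log.push_back("parse-only");
    return false;
  case ActionType::EmitObject:
    return withSemanticAnalysis(instance, performPostSemaSteps);
  }
  assert(false && "covered switch");
  return true;
}
```

For EmitObject the log ends up as "sema" followed by "post-sema": the continuation only runs after semantic analysis has finished.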

In this post we will look at some details of the Lexer, which takes a stream of characters and turns it into a stream of tokens to be consumed by the Parser. So we now focus on the swift::CompilerInstance::performSema method, which invokes CompilerInstance::performParseAndResolveImportsOnly; that method in turn calls swift::performImportResolution from libSema for each source file.

/* /wtsc/swift/lib/Frontend/Frontend.cpp */
...
bool CompilerInstance::performParseAndResolveImportsOnly() {
  // Resolve imports for all the source files.
  auto *mainModule = getMainModule();
  for (auto *file : mainModule->getFiles()) {
    if (auto *SF = dyn_cast<SourceFile>(file))
      performImportResolution(*SF);
  }
...
}
...
void CompilerInstance::performSema() {
  performParseAndResolveImportsOnly();
  FrontendStatsTracer tracer(getStatsReporter(), "perform-sema");
  forEachFileToTypeCheck([&](SourceFile &SF) {
    performTypeChecking(SF);
  });
  finishTypeChecking();
}
...

swift::performImportResolution walks the AST to resolve imports, so it has to parse the file first in order to know what is imported.

/* /wtsc/swift/lib/Sema/ImportResolution.cpp */
...
/// performImportResolution - This walks the AST to resolve imports.
///
/// Before we can type-check a source file, we need to make declarations
/// imported from other modules available. This is done by processing top-level
/// \c ImportDecl nodes, along with related validation.
///
/// Import resolution operates on a parsed but otherwise unvalidated AST.
void swift::performImportResolution(SourceFile &SF) {
  ...
  // Resolve each import declaration.
  for (auto D : SF.getTopLevelDecls())
    resolver.visit(D);
  ...
}
...

The swift::SourceFile::getTopLevelDecls method is the one that triggers parsing of the file it represents into an AST (Abstract Syntax Tree).

/* /wtsc/swift/lib/AST/Module.cpp */
...
ArrayRef<Decl *> SourceFile::getTopLevelDecls() const {
  auto &ctx = getASTContext();
  auto *mutableThis = const_cast<SourceFile *>(this);
  return evaluateOrDefault(ctx.evaluator, ParseSourceFileRequest{mutableThis},
                           {}).TopLevelDecls;
}
...

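Simplified, the mysterious evaluateOrDefault looks like this: it asks the evaluator to evaluate the request, which eventually calls the request's evaluate method, and returns the default value if a request cycle is detected.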

/* /wtsc/swift/include/swift/AST/Evaluator.h */
...
/// Evaluates a given request or returns a default value if a cycle is detected.
template <typename Request>
typename Request::OutputType
evaluateOrDefault(
  Evaluator &eval, Request req, typename Request::OutputType def) {
  auto result = eval(req);
  if (auto err = result.takeError()) {
    llvm::handleAllErrors(std::move(err),
      [](const CyclicalRequestError<Request> &E) {
        // cycle detected
      });
    return def;
  }
  return *result;
}
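To make the request/evaluator idea more concrete, here is a toy C++ sketch of what evaluateOrDefault relies on: results are cached per request, and a request that re-enters itself is reported as a cycle. ToyEvaluator and its string keys are assumptions made for illustration; swift::Evaluator works on typed request objects, and its real caching and cycle diagnostics are more involved.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <set>
#include <string>

// Toy request evaluator: caches results per key and detects re-entrant
// evaluation of a key that is still being computed (a cycle).
class ToyEvaluator {
public:
  using Compute = std::function<int(ToyEvaluator &)>;

  // Returns the (possibly cached) result, or std::nullopt on a cycle.
  std::optional<int> evaluate(const std::string &key, const Compute &compute) {
    if (auto it = cache_.find(key); it != cache_.end())
      return it->second;              // cached: computed only once
    if (!active_.insert(key).second)
      return std::nullopt;            // already in progress: a cycle
    int result = compute(*this);
    active_.erase(key);
    cache_[key] = result;
    return result;
  }

private:
  std::map<std::string, int> cache_;
  std::set<std::string> active_;
};

// Counterpart of evaluateOrDefault: fall back to `def` when a cycle is found.
int evaluateOrDefault(ToyEvaluator &eval, const std::string &key,
                      const ToyEvaluator::Compute &compute, int def) {
  std::optional<int> result = eval.evaluate(key, compute);
  return result ? *result : def;
}
```

Caching is what lets getTopLevelDecls be called from many places while the file is parsed only once; the default value is what a caller sees if the request turns out to depend on itself.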
...

So we need to look at what is inside ParseSourceFileRequest, especially its evaluate method.

/* /wtsc/swift/include/swift/AST/ParseRequests.h */
...
/// Parse the top-level decls of a SourceFile.
class ParseSourceFileRequest
    : public SimpleRequest<
          ParseSourceFileRequest, SourceFileParsingResult(SourceFile *),
          RequestFlags::SeparatelyCached | RequestFlags::DependencySource> {
...
private:
...
// Evaluation.
  SourceFileParsingResult evaluate(Evaluator &evaluator, SourceFile *SF) const;
...
};
...

/* /wtsc/swift/lib/Parse/ParseRequests.cpp */
...
SourceFileParsingResult ParseSourceFileRequest::evaluate(Evaluator &evaluator,
                                                         SourceFile *SF) const {
...
  Parser parser(*bufferID, *SF, /*SIL*/ nullptr, state, sTreeCreator);
  ...
  parser.parseTopLevel(decls);
...
}
...

Eventually, we arrive at swift::Parser and its entry point, the swift::Parser::parseTopLevel method.

So let us look at swift::Parser's constructor and at swift::Parser::parseTopLevel.

/* /wtsc/swift/lib/Parse/Parser.cpp */
...
Parser::Parser(unsigned BufferID, SourceFile &SF, DiagnosticEngine* LexerDiags,
               SILParserStateBase *SIL,
               PersistentParserState *PersistentState,
               std::shared_ptr<SyntaxParseActions> SPActions)
    : Parser(
          std::unique_ptr<Lexer>(new Lexer(
              SF.getASTContext().LangOpts, SF.getASTContext().SourceMgr,
              BufferID, LexerDiags,
              sourceFileKindToLexerMode(SF.Kind),
              SF.Kind == SourceFileKind::Main
                  ? HashbangMode::Allowed
                  : HashbangMode::Disallowed,
              SF.getASTContext().LangOpts.AttachCommentsToDecls
                  ? CommentRetentionMode::AttachToNextToken
                  : CommentRetentionMode::None,
              SF.shouldBuildSyntaxTree()
                  ? TriviaRetentionMode::WithTrivia
                  : TriviaRetentionMode::WithoutTrivia)),
          SF, SIL, PersistentState, std::move(SPActions)) {}
...

/* /wtsc/swift/lib/Parse/ParseDecl.cpp */
...
/// Main entrypoint for the parser.
///
/// \verbatim
///   top-level:
///     stmt-brace-item*
///     decl-sil       [[only in SIL mode]
///     decl-sil-stage [[only in SIL mode]
/// \endverbatim
void Parser::parseTopLevel(SmallVectorImpl<Decl *> &decls) {
...
parseBraceItems(items, allowTopLevelCode()
                               ? BraceItemListKind::TopLevelCode
                               : BraceItemListKind::TopLevelLibrary);
...
}
...

From the constructor, we can see that the parser creates a lexer to turn the source file into tokens. Its various parseXXX methods then parse the different language constructs by consuming the tokens supplied by that lexer data member. We will talk about those parseXXX methods in the next post.

The gist of the lexer is that it owns a data member named NextToken of type swift::Token, which holds the token most recently recognized by the lexer. When the lexer is asked for a token, it makes a copy of NextToken, recognizes the following token and stores it back into NextToken, and then returns the copy.

That is why it is called NextToken rather than CurrentToken: the current token is the one the Parser is working with, while the next token is kept in the Lexer for the Parser's next request.
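The NextToken scheme amounts to one token of lookahead. Below is a minimal C++ sketch under simplified assumptions: ToyLexer, TokKind and the tiny tokenizing rules (identifiers and single-character punctuation only) are hypothetical, and only the NextToken bookkeeping is meant to resemble swift::Lexer.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

enum class TokKind { Identifier, Punct, Eof };

struct Token {
  TokKind kind = TokKind::Eof;
  std::string text;
};

class ToyLexer {
public:
  explicit ToyLexer(std::string source) : src_(std::move(source)) {
    lexImpl();  // prime NextToken so it is valid before the first lex()
  }

  // Hand out a copy of NextToken, then recognize the token after it.
  void lex(Token &result) {
    result = NextToken;
    if (result.kind != TokKind::Eof)
      lexImpl();
  }

  // Lookahead: inspect the upcoming token without consuming it.
  const Token &peekNextToken() const { return NextToken; }

private:
  // Recognize exactly one token starting at pos_ and store it in NextToken.
  void lexImpl() {
    while (pos_ < src_.size() &&
           std::isspace(static_cast<unsigned char>(src_[pos_])))
      ++pos_;
    if (pos_ == src_.size()) {
      NextToken = {TokKind::Eof, ""};
      return;
    }
    const std::size_t start = pos_;
    if (std::isalpha(static_cast<unsigned char>(src_[pos_])) || src_[pos_] == '_') {
      while (pos_ < src_.size() &&
             (std::isalnum(static_cast<unsigned char>(src_[pos_])) || src_[pos_] == '_'))
        ++pos_;
      NextToken = {TokKind::Identifier, src_.substr(start, pos_ - start)};
    } else {
      ++pos_;
      NextToken = {TokKind::Punct, src_.substr(start, 1)};
    }
  }

  std::string src_;
  std::size_t pos_ = 0;
  Token NextToken;  // named after swift::Lexer's member
};
```

Because NextToken is always one step ahead, a parser can peek at the upcoming token without consuming it, which is the kind of decision a recursive-descent parser makes constantly.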

Conceptually, the lexer mimics a finite state automaton (FSA): each token kind, such as keyword, identifier or operator, can be described by a regular expression, and the lexer recognizes one token matching such a pattern and returns it. The Swift lexer is hand-written rather than generated, and most of that logic lives in the swift::Lexer::lexImpl method, which dispatches on the first character of the token.

/* /wtsc/swift/lib/Parse/Lexer.cpp */
//===----------------------------------------------------------------------===//
// Main Lexer Loop
//===----------------------------------------------------------------------===//

void Lexer::lexImpl() {
...
// Remember the start of the token so we can form the text range.
  const char *TokStart = CurPtr;
switch ((signed char)*CurPtr++) {
...
  case '@': return formToken(tok::at_sign, TokStart);
  case '{': return formToken(tok::l_brace, TokStart);
...
  case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G':
  case 'H': case 'I': case 'J': case 'K': case 'L': case 'M': case 'N':
  case 'O': case 'P': case 'Q': case 'R': case 'S': case 'T': case 'U':
  case 'V': case 'W': case 'X': case 'Y': case 'Z':
  case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g':
  case 'h': case 'i': case 'j': case 'k': case 'l': case 'm': case 'n':
  case 'o': case 'p': case 'q': case 'r': case 's': case 't': case 'u':
  case 'v': case 'w': case 'x': case 'y': case 'z':
  case '_':
    return lexIdentifier();
...
}
}
...
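The switch in lexImpl looks at the first character and then lets a helper such as lexIdentifier consume the rest. The sketch below imitates that structure for the hello world input; lexAll and ToyToken are hypothetical, the kind names only mirror Swift's tok:: spellings, and string literals are simplified (no escapes or interpolation).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A toy token: a kind name spelled like Swift's tok:: kinds, plus its text.
struct ToyToken {
  std::string kind;
  std::string text;
};

// Same shape as Lexer::lexImpl: remember where the token starts, switch on
// its first character, and let the matching branch consume the rest.
std::vector<ToyToken> lexAll(const std::string &src) {
  std::vector<ToyToken> tokens;
  std::size_t cur = 0;
  while (true) {
    // Skip whitespace; the real lexer treats it as trivia.
    while (cur < src.size() &&
           (src[cur] == ' ' || src[cur] == '\t' || src[cur] == '\n'))
      ++cur;
    if (cur == src.size()) {
      tokens.push_back({"eof", ""});
      return tokens;
    }
    const std::size_t tokStart = cur;  // like TokStart in lexImpl
    const char c = src[cur++];
    switch (c) {
    case '@': tokens.push_back({"at_sign", "@"}); break;
    case '{': tokens.push_back({"l_brace", "{"}); break;
    case '(': tokens.push_back({"l_paren", "("}); break;
    case ')': tokens.push_back({"r_paren", ")"}); break;
    case '"':
      while (cur < src.size() && src[cur] != '"')
        ++cur;  // scan to the closing quote
      if (cur < src.size())
        ++cur;  // include the closing quote
      tokens.push_back({"string_literal", src.substr(tokStart, cur - tokStart)});
      break;
    default:
      // The real lexer spells out 'A'..'Z', 'a'..'z' and '_' as case labels.
      if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_') {
        while (cur < src.size() &&
               ((src[cur] >= 'A' && src[cur] <= 'Z') ||
                (src[cur] >= 'a' && src[cur] <= 'z') ||
                (src[cur] >= '0' && src[cur] <= '9') || src[cur] == '_'))
          ++cur;
        tokens.push_back({"identifier", src.substr(tokStart, cur - tokStart)});
      } else {
        tokens.push_back({"unknown", std::string(1, c)});
      }
      break;
    }
  }
}
```

Run on `print("Hello world!")` followed by a newline, this yields identifier, l_paren, string_literal, r_paren and eof, the same sequence the -dump-token option below aims to print.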

In addition, you can consult the documentation for the pattern of each kind of token.

One more thing.

It is a pity that although there is a -dump-ast option, which makes the compiler print the type-checked AST of the source code to standard error, and a -dump-parse option, which does the same without type checking, there is no -dump-token option that lists all the tokens of the source code.

swiftc -dump-ast Helloworld.swift
(source_file "Helloworld.swift"
  (top_level_code_decl range=[Helloworld.swift:1:1 - line:1:21]
    (brace_stmt implicit range=[Helloworld.swift:1:1 - line:1:21]
      (call_expr type='()' location=Helloworld.swift:1:1 range=[Helloworld.swift:1:1 - line:1:21] nothrow arg_labels=_:
        (declref_expr type='(Any..., String, String) -> ()' location=Helloworld.swift:1:1 range=[Helloworld.swift:1:1 - line:1:1] decl=Swift.(file).print(_:separator:terminator:) function_ref=single)
        (tuple_expr type='(Any..., separator: String, terminator: String)' location=Helloworld.swift:1:6 range=[Helloworld.swift:1:6 - line:1:21] names='',separator,terminator
          (vararg_expansion_expr implicit type='[Any]' location=Helloworld.swift:1:7 range=[Helloworld.swift:1:7 - line:1:7]
            (array_expr implicit type='[Any]' location=Helloworld.swift:1:7 range=[Helloworld.swift:1:7 - line:1:7] initializer=**NULL**
              (erasure_expr implicit type='Any' location=Helloworld.swift:1:7 range=[Helloworld.swift:1:7 - line:1:7]
                (string_literal_expr type='String' location=Helloworld.swift:1:7 range=[Helloworld.swift:1:7 - line:1:7] encoding=utf8 value="Hello world!" builtin_initializer=Swift.(file).String extension.init(_builtinStringLiteral:utf8CodeUnitCount:isASCII:) initializer=**NULL**))))
          (default_argument_expr implicit type='String' location=Helloworld.swift:1:6 range=[Helloworld.swift:1:6 - line:1:6] default_args_owner=Swift.(file).print(_:separator:terminator:) param=1)
          (default_argument_expr implicit type='String' location=Helloworld.swift:1:6 range=[Helloworld.swift:1:6 - line:1:6] default_args_owner=Swift.(file).print(_:separator:terminator:) param=2))))))

swiftc -dump-parse Helloworld.swift
(source_file "Helloworld.swift"
  (top_level_code_decl range=[Helloworld.swift:1:1 - line:1:21]
    (brace_stmt implicit range=[Helloworld.swift:1:1 - line:1:21]
      (call_expr type='<null>' arg_labels=_:
        (unresolved_decl_ref_expr type='<null>' name=print function_ref=unapplied)
        (paren_expr type='<null>'
          (string_literal_expr type='<null>' encoding=utf8 value="Hello world!" builtin_initializer=**NULL** initializer=**NULL**))))))

So we are going to add this feature to the Swift compiler, aiming for the following output.

swiftc -dump-token Helloworld.swift
identifier'print'		line:1:1 [StartOfLine]
l_paren'('			line:1:6
string_literal'"Hello world!"'	line:1:7
r_paren')'			line:1:21
eof''			        line:2:1 [StartOfLine]

Nine files need to be modified to add the -dump-token option:

modified:   include/swift/Option/Options.td
modified:   lib/Frontend/ArgsToFrontendOptionsConverter.cpp
modified:   include/swift/Frontend/Frontend.h
modified:   lib/Frontend/Frontend.cpp
modified:   include/swift/Frontend/FrontendOptions.h
modified:   lib/Frontend/FrontendOptions.cpp
modified:   include/swift/Parse/Lexer.h
modified:   lib/Driver/Driver.cpp
modified:   lib/FrontendTool/FrontendTool.cpp

As mentioned above, the performAction function in FrontendTool.cpp checks FrontendOptions::ActionType to decide what it should do. Therefore, we need to add one more ActionType, FrontendOptions::ActionType::DumpToken, and a corresponding case in the switch, as follows.

...
static bool performAction(CompilerInstance &Instance,
                          int &ReturnValue,
                          FrontendObserver *observer) {
...
switch (Instance.getInvocation().getFrontendOptions().RequestedAction) {
...
case FrontendOptions::ActionType::DumpToken: {
    auto *mainModule = Instance.getMainModule();
    for (auto *file : mainModule->getFiles()) {
      if (auto *SF = dyn_cast<SourceFile>(file)) {
        // A fresh vector per file, so the tokens of earlier files are not
        // printed again when several input files are given.
        SmallVector<Token, 16> tokens;
        Instance.performLexingOnly(tokens, SF);
        for (auto &tok : tokens) {
          // Token kind, its spelling in quotes, then line:column.
          swift::dumpTokenKind(llvm::errs(), tok.getKind());
          llvm::errs() << "'" << tok.getText() << "'\t\t\t";
          tok.getLoc().printLineAndColumn(llvm::errs(), SF->getASTContext().SourceMgr);
          if (tok.isAtStartOfLine())
            llvm::errs() << " [StartOfLine]";
          llvm::errs() << '\n';
        }
      }
    }
    return false;  // false means "no error"
  }
...
}
...
}
...

Before we talk about how to add that ActionType, let me finish explaining how we call the lexer to lex all the tokens of a source file. This is implemented in a new CompilerInstance::performLexingOnly method, shown below, which mimics how swift::Parser uses swift::Lexer.

/* /wtsc/swift/include/swift/Frontend/Frontend.h */
...
class CompilerInstance {
...
public:
...
void performLexingOnly(
      SmallVectorImpl<Token> &tokens,
      SourceFile *SF);
...
};
...

/* /wtsc/swift/lib/Frontend/Frontend.cpp */
...
// Same mapping as the helper used by the Parser constructor in lib/Parse/Parser.cpp.
static LexerMode sourceFileKindToLexerMode(SourceFileKind kind) {
  switch (kind) {
  case swift::SourceFileKind::Interface:
    return LexerMode::SwiftInterface;
  case swift::SourceFileKind::SIL:
    return LexerMode::SIL;
  case swift::SourceFileKind::Library:
  case swift::SourceFileKind::Main:
    return LexerMode::Swift;
  }
  llvm_unreachable("covered switch");
}
...
void CompilerInstance::performLexingOnly(SmallVectorImpl<Token> &tokens,
                                         SourceFile *SF) {
  Lexer lexer(SF->getASTContext().LangOpts, SF->getASTContext().SourceMgr,
              *SF->getBufferID(), &SF->getASTContext().Diags,
              sourceFileKindToLexerMode(SF->Kind),
              SF->Kind == SourceFileKind::Main ? HashbangMode::Allowed
                                               : HashbangMode::Disallowed,
              SF->getASTContext().LangOpts.AttachCommentsToDecls
                  ? CommentRetentionMode::AttachToNextToken
                  : CommentRetentionMode::None,
              SF->shouldBuildSyntaxTree() ? TriviaRetentionMode::WithTrivia
                                          : TriviaRetentionMode::WithoutTrivia);
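  // Lex until the eof token has been produced. The eof token itself is
  // stored too, so it shows up at the end of the dump.
  Token Tok;
  ParsedTrivia LeadingTrivia;
  ParsedTrivia TrailingTrivia;
  while (Tok.isNot(tok::eof)) {
    lexer.lex(Tok, LeadingTrivia, TrailingTrivia);
    tokens.push_back(Tok);
  }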
}
...

Now let's get back to FrontendOptions::ActionType.

FrontendOptions::ActionType is an enum class, so we simply add one more enumerator to it.

/* /wtsc/swift/include/swift/Frontend/FrontendOptions.h */
...
enum class ActionType {
    ...
    DumpAST,           ///< Parse, type-check, and dump AST
    DumpToken,         ///< Lex, and dump Tokens
    ...
  };
...

Next, we need to find out where Instance.getInvocation().getFrontendOptions().RequestedAction gets its value. Tracing it back to its definition, we see that RequestedAction is a data member of FrontendOptions, which in turn is a data member of CompilerInvocation.

/* /wtsc/swift/include/swift/Frontend/Frontend.h */
...
/// The abstract configuration of the compiler, including:
///   - options for all stages of translation,
///   - information about the build environment,
///   - information about the job being performed, and
///   - lists of inputs and outputs.
///
/// A CompilerInvocation can be built from a frontend command line
/// using parseArgs.  It can then be used to build a CompilerInstance,
/// which manages the actual compiler execution.
class CompilerInvocation {
...
  FrontendOptions FrontendOpts;
...
public:
...
  FrontendOptions &getFrontendOptions() { return FrontendOpts; }
  const FrontendOptions &getFrontendOptions() const { return FrontendOpts; }
...
};
...

So we need to find out how the CompilerInvocation gets its value, which also tells us how the CompilerInstance is configured, because the CompilerInvocation is a data member of CompilerInstance as well. We can start from the swift::performFrontend function, since it is the entry point of the Swift compiler frontend.

/* /wtsc/swift/lib/FrontendTool/FrontendTool.cpp */
...
int swift::performFrontend(ArrayRef<const char *> Args, const char *Argv0,
                           void *MainAddr, FrontendObserver *observer) {
...
  std::unique_ptr<CompilerInstance> Instance =
      std::make_unique<CompilerInstance>();
...
  CompilerInvocation Invocation;
  SmallString<128> workingDirectory;
  llvm::sys::fs::current_path(workingDirectory);
  std::string MainExecutablePath =
      llvm::sys::fs::getMainExecutable(Argv0, MainAddr);
  // Parse arguments.
  SmallVector<std::unique_ptr<llvm::MemoryBuffer>, 4> configurationFileBuffers;
  if (Invocation.parseArgs(Args, Instance->getDiags(),
                           &configurationFileBuffers, workingDirectory,
                           MainExecutablePath)) {
    return finishDiagProcessing(1, /*verifierEnabled*/ false);
  }
...
  if (Instance->setup(Invocation)) {
    return finishDiagProcessing(1, /*verifierEnabled*/ false);
  }
...
  int ReturnValue = 0;
  bool HadError = performCompile(*Instance, ReturnValue, observer);
...
}
...

Here we see that after creating a CompilerInstance, Instance, performFrontend creates a CompilerInvocation, Invocation, as well, and calls Invocation.parseArgs to fill in its options. Later on, it uses Invocation to set up the CompilerInstance before performCompile is invoked.

The significant part sits inside the Invocation.parseArgs method, so let us check it out.

/* /wtsc/swift/lib/Frontend/CompilerInvocation.cpp */
...
static bool ParseFrontendArgs(
    FrontendOptions &opts, ArgList &args, DiagnosticEngine &diags,
    SmallVectorImpl<std::unique_ptr<llvm::MemoryBuffer>> *buffers) {
  ArgsToFrontendOptionsConverter converter(diags, args, opts);
  return converter.convert(buffers);
}
...
bool CompilerInvocation::parseArgs(
    ArrayRef<const char *> Args, DiagnosticEngine &Diags,
    SmallVectorImpl<std::unique_ptr<llvm::MemoryBuffer>>
        *ConfigurationFileBuffers,
    StringRef workingDirectory, StringRef mainExecutablePath) {
...
   if (ParseFrontendArgs(FrontendOpts, ParsedArgs, Diags,
                        ConfigurationFileBuffers)) {
    return true;
  }
...
}
...

parseArgs in turn calls the ParseFrontendArgs function, which uses an ArgsToFrontendOptionsConverter to convert the arguments into options.

/* /wtsc/swift/lib/Frontend/ArgsToFrontendOptionsConverter.cpp */
...
#include "swift/Option/Options.h"
...
bool ArgsToFrontendOptionsConverter::convert(
    SmallVectorImpl<std::unique_ptr<llvm::MemoryBuffer>> *buffers) {
...
  if (Opts.RequestedAction == FrontendOptions::ActionType::NoneAction) {
    Opts.RequestedAction = determineRequestedAction(Args);
  }
...
}
...
FrontendOptions::ActionType
ArgsToFrontendOptionsConverter::determineRequestedAction(const ArgList &args) {
...
  if (Opt.matches(OPT_dump_ast))
    return FrontendOptions::ActionType::DumpAST;
  if (Opt.matches(OPT_dump_token))
    return FrontendOptions::ActionType::DumpToken;
...
}
...

In ArgsToFrontendOptionsConverter::convert, we can see that if Opts.RequestedAction has not been set yet (its initial value is FrontendOptions::ActionType::NoneAction), it is set to the result of ArgsToFrontendOptionsConverter::determineRequestedAction. This is the value that performAction later switches on.

Therefore, we need to add an if statement to determineRequestedAction that recognizes our -dump-token option and returns the FrontendOptions::ActionType::DumpToken we have already added.

The OPT_XXX naming pattern hints that these IDs are generated by LLVM's TableGen tool from a TableGen description file (.td), as shown in /wtsc/swift/include/swift/Option/Options.h, which is included by /wtsc/swift/lib/Frontend/ArgsToFrontendOptionsConverter.cpp.

/* /wtsc/swift/include/swift/Option/Options.h */
...
  enum ID {
    OPT_INVALID = 0, // This is not an option ID.
#define OPTION(PREFIX, NAME, ID, KIND, GROUP, ALIAS, ALIASARGS, FLAGS, PARAM,  \
               HELPTEXT, METAVAR, VALUES)                                      \
  OPT_##ID,
#include "swift/Option/Options.inc"
    LastOption
#undef OPTION
  };
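Options.h uses the classic X-macro pattern: the same list of OPTION(...) entries is expanded with whatever definition of OPTION is active at that point. Below is a self-contained sketch in which a hand-written TOY_OPTIONS_INC macro plays the role of the TableGen-generated Options.inc; the two-argument OPTION and the lookupOption helper are simplifications for illustration, not Swift's actual signature or its option parsing (which goes through llvm::opt).

```cpp
#include <cassert>
#include <cstring>

// Stand-in for Options.inc: one OPTION(...) entry per option. OPTION is not
// defined yet; each expansion site below defines it differently.
#define TOY_OPTIONS_INC                  \
  OPTION("-dump-parse", dump_parse)      \
  OPTION("-dump-ast", dump_ast)          \
  OPTION("-dump-token", dump_token)

// Expansion 1: the ID enum, just like `OPT_##ID,` in Options.h.
enum ToyOptID {
  OPT_INVALID = 0,  // not an option ID
#define OPTION(NAME, ID) OPT_##ID,
  TOY_OPTIONS_INC
#undef OPTION
  LastOption
};

// Expansion 2: a spelling table generated from the same list, so the enum
// and the table can never get out of sync.
static const char *const ToyOptNames[] = {
    "<invalid>",
#define OPTION(NAME, ID) NAME,
    TOY_OPTIONS_INC
#undef OPTION
};

// Map a command-line spelling to its ID (OPT_INVALID if unknown).
ToyOptID lookupOption(const char *spelling) {
  for (int id = OPT_INVALID + 1; id < LastOption; ++id)
    if (std::strcmp(ToyOptNames[id], spelling) == 0)
      return static_cast<ToyOptID>(id);
  return OPT_INVALID;
}
```

Adding one line to the list is enough to get both a new OPT_dump_token enumerator and its spelling, which is why a single def in Options.td makes OPT_dump_token available in C++.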
...

Options.inc is generated from Options.td by TableGen and lists all the options the Swift compiler accepts. Therefore, we add our -dump-token option to Options.td.

// /wtsc/swift/include/swift/Option/Options.td
def dump_token : Flag<["-"], "dump-token">,
  HelpText<"Lexes input file(s) and dumps tokens">, ModeOpt,
  Flags<[FrontendOption, NoInteractiveOption, DoesNotAffectIncrementalBuild]>;

Now it seems all the work has been done. We added our -dump-token option to Options.td so the compiler recognizes it. We modified ArgsToFrontendOptionsConverter to map -dump-token to ActionType::DumpToken. We added a new case, FrontendOptions::ActionType::DumpToken, that prints the lexed tokens of the source files to standard error (llvm::errs()). And we implemented the lexing procedure itself.

So we are ready to rebuild the whole project and enjoy the fruit of our work.

/wtsc/swift/utils/build-script \
 --skip-build-benchmarks \
 --cmake-c-launcher="$(which sccache)" --cmake-cxx-launcher="$(which sccache)" \
 --release \
 --debug-swift \
 --llvm-targets-to-build="X86" \
 --install-destdir=/wtsc \
 --install-all

Unfortunately, the build reports errors saying that ActionType::DumpToken is not handled in some switch statements in the file

/wtsc/swift/lib/Frontend/FrontendOptions.cpp

So we go to each of those switch statements and handle ActionType::DumpToken the same way as ActionType::Parse, because -dump-token is like parsing except that it only lexes, which is the very first step of parsing.
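The complaint comes from switch statements that cover every ActionType enumerator without a default case, so the compiler flags any newly added enumerator. Here is a minimal sketch of the fix using a hypothetical miniature enum and a hypothetical shouldOnlyParse helper (the real functions in FrontendOptions.cpp are different): DumpToken is simply added as another case label next to Parse.

```cpp
#include <cassert>

// Hypothetical miniature of FrontendOptions::ActionType.
enum class ActionType { NoneAction, Parse, DumpParse, DumpAST, DumpToken, EmitObject };

// Hypothetical helper in the style of FrontendOptions.cpp: no `default`, so
// -Wswitch reports any enumerator that is missing from the switch.
static bool shouldOnlyParse(ActionType action) {
  switch (action) {
  case ActionType::NoneAction:
  case ActionType::DumpAST:
  case ActionType::EmitObject:
    return false;
  case ActionType::Parse:
  case ActionType::DumpParse:
  case ActionType::DumpToken:  // new: handled exactly like Parse
    return true;
  }
  assert(false && "covered switch");
  return false;
}
```

Leaving out `default` is deliberate: it turns "forgot to handle the new action" into a compile-time diagnostic instead of a silent fallback.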

One more change concerns the output file type. Since our new option does not produce an output file, its output file type should be TY_Nothing, so in Driver::buildOutputInfo we set OI.CompilerOutputType to file_types::TY_Nothing for OPT_dump_token.

/* /wtsc/swift/lib/Driver/Driver.cpp */
...
void Driver::buildOutputInfo(const ToolChain &TC, const DerivedArgList &Args,
                             const bool BatchMode, const InputFileList &Inputs,
                             OutputInfo &OI) const {
...
  switch (OutputModeArg->getOption().getID()) {
...
    case options::OPT_dump_token:
      OI.CompilerOutputType = file_types::TY_Nothing;
      break;
...
}
...
}
...

After those modifications, we rebuild again. This time the compilation succeeds, and we get the following result.

swiftc -dump-token Helloworld.swift
identifier'print'			line:1:1 [StartOfLine]
l_paren'('			line:1:6
string_literal'"Hello world!"'			line:1:7
r_paren')'			line:1:21
eof''			line:2:1 [StartOfLine]

The format is somewhat odd, since a fixed number of tabs does not line the columns up, so feel free to tweak the output to make it prettier.
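One way to tidy it up is to pad the first column to a fixed width instead of emitting a fixed number of tabs. The helper below is a hypothetical, standalone sketch using std::setw; inside the frontend, where the output goes to llvm::errs(), llvm::left_justify from llvm/Support/Format.h should give a similar effect, but treat that as a suggestion to verify.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: render one dump-token line with a fixed-width first
// column. `kind`/`text` are the token kind name and spelling, `line`/`col`
// its source position.
std::string formatTokenLine(const std::string &kind, const std::string &text,
                            unsigned line, unsigned col, bool startOfLine,
                            int width = 32) {
  std::ostringstream os;
  // Left-justify "kind'text'" and pad it with spaces up to `width`.
  os << std::left << std::setw(width) << (kind + "'" + text + "'")
     << "line:" << line << ':' << col;
  if (startOfLine)
    os << " [StartOfLine]";
  return os.str();
}
```

With a width wider than the longest "kind'text'" entry (for example 32, which fits string_literal'"Hello world!"'), every line:column column starts at the same position.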

It is quite simple, and fun, to add a new option and make the compiler do something new by taking advantage of the Swift compiler's infrastructure. In the next post, we will take a look at the Parser.