2020-09-11 Java Grammar Rules

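1 Overview

This article applies to Java 8 and earlier, referred to below simply as Java. The rules are written in ANTLR 4 notation.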

2 Java Lexer Rules (lexer grammar)

I introduce the lexer rules (lexer grammar) bottom-up.

2.1 Keywords

The lexer rules for Java's keywords and reserved words are:

ABSTRACT: 'abstract';    ASSERT:       'assert';          BOOLEAN:   'boolean';      BREAK:      'break';
BYTE:     'byte';        CASE:         'case';            CATCH:     'catch';        CHAR:       'char';
CLASS:    'class';       CONST:        'const';           CONTINUE:  'continue';     DEFAULT:    'default';
DO:       'do';          DOUBLE:       'double';          ELSE:      'else';         ENUM:       'enum';
EXTENDS:  'extends';     FINAL:        'final';           FINALLY:   'finally';      FLOAT:      'float';
FOR:      'for';         IF:           'if';              GOTO:      'goto';         IMPLEMENTS: 'implements';
IMPORT:   'import';      INSTANCEOF:   'instanceof';      INT:       'int';          INTERFACE:  'interface';
LONG:     'long';        NATIVE:       'native';          NEW:       'new';          PACKAGE:    'package';
PRIVATE:  'private';     PROTECTED:    'protected';       PUBLIC:    'public';       RETURN:     'return';
SHORT:    'short';       STATIC:       'static';          STRICTFP:  'strictfp';     SUPER:      'super';
SWITCH:   'switch';      SYNCHRONIZED: 'synchronized';    THIS:      'this';         THROW:      'throw';
THROWS:   'throws';      TRANSIENT:    'transient';       TRY:       'try';          VOID:       'void';
VOLATILE: 'volatile';    WHILE:        'while';

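2.2 Fragment Rules

A fragment rule only serves as a building block for other lexer rules. It never produces a token of its own, so parser rules cannot reference it.

Digits (Digits) and the exponent part (ExponentPart):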

fragment Digits
    : [0-9] ([0-9_]* [0-9])?
    ;
fragment ExponentPart
    : [eE] [+-]? Digits
    ;
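
To make the underscore handling concrete, here is a minimal sketch that restates the two fragments above as `java.util.regex` patterns. The class and method names are my own and purely illustrative; the point is only that a digit run may contain underscores in the middle but must start and end with a digit.

```java
import java.util.regex.Pattern;

final class DigitsFragmentDemo {
    // Mirrors: fragment Digits : [0-9] ([0-9_]* [0-9])? ;
    static final String DIGITS = "[0-9](?:[0-9_]*[0-9])?";

    // Mirrors: fragment ExponentPart : [eE] [+-]? Digits ;
    static final String EXPONENT_PART = "[eE][+-]?" + DIGITS;

    // True when the whole text is a valid Digits run.
    static boolean isDigits(String text) {
        return Pattern.matches(DIGITS, text);
    }

    // True when the whole text is a valid exponent part.
    static boolean isExponentPart(String text) {
        return Pattern.matches(EXPONENT_PART, text);
    }
}
```

Under this sketch `1_000` is accepted, while `1_` and `_1` are rejected because an underscore may not sit at either end.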

Letters (Letter) and letters or digits (LetterOrDigit):

fragment Letter
    : [a-zA-Z$_]
    | ~[\u0000-\u007F\uD800-\uDBFF]
    | [\uD800-\uDBFF] [\uDC00-\uDFFF]
    ;
fragment LetterOrDigit
    : Letter
    | [0-9]
    ;

Hexadecimal digits (HexDigits, HexDigit) and escape sequences (EscapeSequence):

fragment HexDigit
    : [0-9a-fA-F]
    ;
fragment HexDigits
    : HexDigit ((HexDigit | '_')* HexDigit)?
    ;
fragment EscapeSequence
    : '\\' [btnfr"'\\]
    | '\\' ([0-3]? [0-7])? [0-7]
    | '\\' 'u'+ HexDigit HexDigit HexDigit HexDigit
    ;
    ;
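
The three alternatives of EscapeSequence correspond to three ways of writing a character in Java source. The sketch below shows one of each; the class and method names are assumptions made up for this illustration. Note that the Java compiler itself translates Unicode escapes before tokenizing, whereas this grammar recognizes them inside the literal, so take the mapping as approximate.

```java
final class EscapeSequenceDemo {
    // First alternative: a backslash followed by one of b t n f r " ' or a backslash.
    static char simple() {
        return '\t';
    }

    // Second alternative: a backslash followed by up to three octal digits.
    // Octal 101 is decimal 65, the code of 'A'.
    static char octal() {
        return '\101';
    }

    // Third alternative: a Unicode escape with four hex digits (0041 is 'A').
    static char unicode() {
        return '\u0041';
    }

    // The 'u' may be repeated, which is what the 'u'+ in the rule allows.
    static char repeatedU() {
        return '\uu0041';
    }
}
```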

2.3 Literals

Decimal literal:

DECIMAL_LITERAL: ('0' | [1-9] (Digits? | '_'+ Digits)) [lL]?;

Hexadecimal literal:

HEX_LITERAL: '0' [xX] [0-9a-fA-F] ([0-9a-fA-F_]* [0-9a-fA-F])? [lL]?;

Octal literal:

OCT_LITERAL: '0' '_'* [0-7] ([0-7_]* [0-7])? [lL]?;

Binary literal:

BINARY_LITERAL: '0' [bB] [01] ([01_]* [01])? [lL]?;

Floating-point literal:

FLOAT_LITERAL: (Digits '.' Digits? | '.' Digits) ExponentPart? [fFdD]?
             | Digits (ExponentPart [fFdD]? | [fFdD])
             ;

Hexadecimal floating-point literal:

HEX_FLOAT_LITERAL: '0' [xX] (HexDigits '.'? | HexDigits? '.' HexDigits) [pP] [+-]? Digits [fFdD]?;

Boolean literal:

BOOL_LITERAL: 'true'
            | 'false'
            ;

Character literal:

CHAR_LITERAL: '\'' (~['\\\r\n] | EscapeSequence) '\'';

String literal:

STRING_LITERAL: '"' (~["\\\r\n] | EscapeSequence)* '"';

Null literal:

NULL_LITERAL: 'null';
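
As a quick check of the numeric literal rules above, the following sketch writes one Java constant per rule. The class and field names are hypothetical; the values in the comments follow from ordinary base conversion.

```java
final class LiteralDemo {
    static final int DECIMAL = 1_000_000;      // DECIMAL_LITERAL with underscores
    static final int HEX = 0xFF_FF;            // HEX_LITERAL, 65535
    static final int OCTAL = 0_17;             // OCT_LITERAL, underscore allowed after the leading 0, value 15
    static final int BINARY = 0b1010_1010;     // BINARY_LITERAL, 170
    static final long LONG_SUFFIX = 42L;       // the optional [lL] suffix
    static final double LEADING_DOT = .5;      // FLOAT_LITERAL, the '.' Digits form
    static final double EXPONENT = 1e3;        // FLOAT_LITERAL, Digits ExponentPart
    static final float FLOAT_SUFFIX = 1f;      // FLOAT_LITERAL, Digits [fFdD]
    static final double HEX_FLOAT = 0x1.8p1;   // HEX_FLOAT_LITERAL, 1.5 * 2^1 = 3.0
}
```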

2.4 Separators

The separators are defined as follows:

LPAREN: '(';    RPAREN: ')';
LBRACE: '{';    RBRACE: '}';
LBRACK: '[';    RBRACK: ']';
SEMI: ';';      COMMA: ',';
DOT: '.';

2.5 Operators

The operators are defined as follows:

ASSIGN: '=';            LT: '<';                GT: '>';                    BANG: '!';
EQUAL: '==';            LE: '<=';               GE: '>=';                   NOTEQUAL: '!=';
AND: '&&';              OR: '||';               BITAND: '&';                BITOR: '|';
ADD: '+';               SUB: '-';               MUL: '*';                   DIV: '/';
INC: '++';              DEC: '--';              CARET: '^';                 MOD: '%';
TILDE: '~';             QUESTION: '?';          COLON: ':';

// Compound assignment operators (assign)
ADD_ASSIGN: '+=';       SUB_ASSIGN: '-=';       MUL_ASSIGN: '*=';           DIV_ASSIGN: '/=';
AND_ASSIGN: '&=';       OR_ASSIGN: '|=';        XOR_ASSIGN: '^=';           MOD_ASSIGN: '%=';
LSHIFT_ASSIGN: '<<=';   RSHIFT_ASSIGN: '>>=';   URSHIFT_ASSIGN: '>>>=';

// Tokens added in Java 8
ARROW: '->';            COLONCOLON: '::';

// Additional symbols not defined in the lexical specification
AT: '@';                ELLIPSIS: '...';

// Whitespace and comments
WS:             [ \t\r\n\u000C]+ -> channel(HIDDEN);
COMMENT:        '/*' .*? '*/'    -> channel(HIDDEN);
LINE_COMMENT:   '//' ~[\r\n]*    -> channel(HIDDEN);

// Identifiers
IDENTIFIER:     Letter LetterOrDigit*;
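
IDENTIFIER is defined last for a reason: an ANTLR 4 lexer prefers the longest match and, when two rules match text of the same length, the rule defined first wins. So `class` becomes the CLASS token while `classes` becomes an IDENTIFIER. The sketch below imitates that decision for a single, already isolated lexeme. It is a simplification under my own assumptions: the names are hypothetical and only the ASCII alternative of Letter is covered.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

final class TokenKindDemo {
    // The keywords from section 2.1.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
        "abstract", "assert", "boolean", "break", "byte", "case", "catch", "char",
        "class", "const", "continue", "default", "do", "double", "else", "enum",
        "extends", "final", "finally", "float", "for", "if", "goto", "implements",
        "import", "instanceof", "int", "interface", "long", "native", "new", "package",
        "private", "protected", "public", "return", "short", "static", "strictfp", "super",
        "switch", "synchronized", "this", "throw", "throws", "transient", "try", "void",
        "volatile", "while"));

    // ASCII-only approximation of: Letter LetterOrDigit*
    private static final Pattern ASCII_IDENTIFIER =
        Pattern.compile("[a-zA-Z$_][a-zA-Z$_0-9]*");

    // Rules defined earlier in the grammar are checked first, as the lexer would do on a tie.
    static String kindOf(String lexeme) {
        if (KEYWORDS.contains(lexeme)) return "KEYWORD";
        if (lexeme.equals("true") || lexeme.equals("false")) return "BOOL_LITERAL";
        if (lexeme.equals("null")) return "NULL_LITERAL";
        if (ASCII_IDENTIFIER.matcher(lexeme).matches()) return "IDENTIFIER";
        return "OTHER";
    }
}
```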

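Those are the lexer rules of Java. The parser rules that follow build on them.

3 Java Parser Rules (parser grammar)

I list the parser rules (parser grammar) in breadth-first order, starting from the entry rule compilationUnit. Sorted this way, the rules fall into 11 layers; the rules in each layer are: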

     1 compilationUnit
     2 packageDeclaration importDeclaration typeDeclaration
     3 annotation qualifiedName classOrInterfaceModifier classDeclaration enumDeclaration interfaceDeclaration
       annotationTypeDeclaration
     4 altAnnotationQualifiedName elementValuePairs elementValue typeParameters typeType typeList classBody
       enumConstants enumBodyDeclarations interfaceBody annotationTypeBody
     5 elementValuePair expression elementValueArrayInitializer typeParameter classOrInterfaceType primitiveType
       classBodyDeclaration enumConstant interfaceBodyDeclaration annotationTypeElementDeclaration
     6 elementValue primary methodCall nonWildcardTypeArguments innerCreator superSuffix explicitGenericInvocation
       creator lambdaExpression typeArguments classType typeBound block modifier memberDeclaration arguments
       interfaceMemberDeclaration annotationTypeElementRest
     7 literal typeTypeOrVoid explicitGenericInvocationSuffix expressionList nonWildcardTypeArgumentsOrDiamond
       createdName classCreatorRest arrayCreatorRest lambdaParameters lambdaBody typeArgument blockStatement
       methodDeclaration genericMethodDeclaration fieldDeclaration constructorDeclaration genericConstructorDeclaration
       constDeclaration interfaceMethodDeclaration genericInterfaceMethodDeclaration annotationMethodOrConstantRest
     8 integerLiteral floatLiteral typeArgumentsOrDiamond arrayInitializer formalParameterList localVariableDeclaration
       statement localTypeDeclaration formalParameters qualifiedNameList methodBody variableDeclarators
       constantDeclarator interfaceMethodModifier annotationMethodRest annotationConstantRest
     9 formalParameter lastFormalParameter variableModifier parExpression catchClause forControl finallyBlock
       resourceSpecification switchBlockStatementGroup switchLabel variableDeclarator variableInitializer defaultValue
     10 variableDeclaratorId catchType enhancedForControl forInit resources
     11 resource
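
The layering can be reproduced mechanically. Here is a minimal sketch, assuming a hand-written map from each rule to the rules it references (only a small excerpt of the grammar is included, and all names are my own): a breadth-first search from the entry rule assigns every rule the layer at which it is first reached.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RuleLayerDemo {
    // Breadth-first search: a rule's layer is 1 + the layer of the rule that first references it.
    static Map<String, Integer> layers(String start, Map<String, List<String>> references) {
        Map<String, Integer> layerOf = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        layerOf.put(start, 1);
        queue.add(start);
        while (!queue.isEmpty()) {
            String rule = queue.remove();
            List<String> referenced = references.getOrDefault(rule, Collections.<String>emptyList());
            for (String next : referenced) {
                if (!layerOf.containsKey(next)) {
                    layerOf.put(next, layerOf.get(rule) + 1);
                    queue.add(next);
                }
            }
        }
        return layerOf;
    }

    // Excerpt of the rule references, copied by hand from the first three layers.
    static Map<String, List<String>> sampleReferences() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("compilationUnit",
            Arrays.asList("packageDeclaration", "importDeclaration", "typeDeclaration"));
        refs.put("packageDeclaration", Arrays.asList("annotation", "qualifiedName"));
        refs.put("importDeclaration", Arrays.asList("qualifiedName"));
        refs.put("typeDeclaration", Arrays.asList("classOrInterfaceModifier", "classDeclaration",
            "enumDeclaration", "interfaceDeclaration", "annotationTypeDeclaration"));
        refs.put("annotation", Arrays.asList("qualifiedName", "altAnnotationQualifiedName",
            "elementValuePairs", "elementValue"));
        return refs;
    }
}
```

With this excerpt, `layers("compilationUnit", sampleReferences())` places qualifiedName in layer 3 and altAnnotationQualifiedName in layer 4, matching the list above.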

3.1 Layer 1

/*
     1 compilationUnit
*/
compilationUnit
    : packageDeclaration? importDeclaration* typeDeclaration* EOF
    ;

3.2 Layer 2

/*
     2 packageDeclaration importDeclaration typeDeclaration
*/
packageDeclaration
    : annotation* PACKAGE qualifiedName ';'
    ;
importDeclaration
    : IMPORT STATIC? qualifiedName ('.' '*')? ';'
    ;
typeDeclaration
    : classOrInterfaceModifier*
      (classDeclaration | enumDeclaration | interfaceDeclaration | annotationTypeDeclaration)
    | ';'
    ;

3.3 Layer 3

/*
     3 annotation qualifiedName classOrInterfaceModifier classDeclaration enumDeclaration interfaceDeclaration
       annotationTypeDeclaration
*/
annotation
    : ('@' qualifiedName | altAnnotationQualifiedName) ('(' ( elementValuePairs | elementValue )? ')')?
    ;
qualifiedName
    : IDENTIFIER ('.' IDENTIFIER)*
    ;
classOrInterfaceModifier
    : annotation
    | PUBLIC
    | PROTECTED
    | PRIVATE
    | STATIC
    | ABSTRACT
    | FINAL
    | STRICTFP
    ;
classDeclaration
    : CLASS IDENTIFIER typeParameters?
      (EXTENDS typeType)?
      (IMPLEMENTS typeList)?
      classBody
    ;
enumDeclaration
    : ENUM IDENTIFIER (IMPLEMENTS typeList)? '{' enumConstants? ','? enumBodyDeclarations? '}'
    ;
interfaceDeclaration
    : INTERFACE IDENTIFIER typeParameters? (EXTENDS typeList)? interfaceBody
    ;
annotationTypeDeclaration
    : '@' INTERFACE IDENTIFIER annotationTypeBody
    ;

3.4 Layer 4

/*
     4 altAnnotationQualifiedName elementValuePairs elementValue typeParameters typeType typeList classBody
       enumConstants enumBodyDeclarations interfaceBody annotationTypeBody
*/
altAnnotationQualifiedName
    : (IDENTIFIER DOT)* '@' IDENTIFIER
    ;
elementValuePairs
    : elementValuePair (',' elementValuePair)*
    ;
typeParameters
    : '<' typeParameter (',' typeParameter)* '>'
    ;
typeType
    : annotation? (classOrInterfaceType | primitiveType) ('[' ']')*
    ;
typeList
    : typeType (',' typeType)*
    ;
classBody
    : '{' classBodyDeclaration* '}'
    ;
enumConstants
    : enumConstant (',' enumConstant)*
    ;
enumBodyDeclarations
    : ';' classBodyDeclaration*
    ;
interfaceBody
    : '{' interfaceBodyDeclaration* '}'
    ;
annotationTypeBody
    : '{' (annotationTypeElementDeclaration)* '}'
    ;

3.5 Layer 5

/*
     5 elementValuePair expression elementValueArrayInitializer typeParameter classOrInterfaceType primitiveType
       classBodyDeclaration enumConstant interfaceBodyDeclaration annotationTypeElementDeclaration
*/
elementValuePair
    : IDENTIFIER '=' elementValue
    ;
expression
    : primary
    | expression bop='.'
      ( IDENTIFIER
      | methodCall
      | THIS
      | NEW nonWildcardTypeArguments? innerCreator
      | SUPER superSuffix
      | explicitGenericInvocation
      )
    | expression '[' expression ']'
    | methodCall
    | NEW creator
    | '(' typeType ')' expression
    | expression postfix=('++' | '--')
    | prefix=('+'|'-'|'++'|'--') expression
    | prefix=('~'|'!') expression
    | expression bop=('*'|'/'|'%') expression
    | expression bop=('+'|'-') expression
    | expression ('<' '<' | '>' '>' '>' | '>' '>') expression
    | expression bop=('<=' | '>=' | '>' | '<') expression
    | expression bop=INSTANCEOF typeType
    | expression bop=('==' | '!=') expression
    | expression bop='&' expression
    | expression bop='^' expression
    | expression bop='|' expression
    | expression bop='&&' expression
    | expression bop='||' expression
    | <assoc=right> expression bop='?' expression ':' expression
    | <assoc=right> expression
      bop=('=' | '+=' | '-=' | '*=' | '/=' | '&=' | '|=' | '^=' | '>>=' | '>>>=' | '<<=' | '%=')
      expression
    | lambdaExpression // Java8
    // Java 8 methodReference
    | expression '::' typeArguments? IDENTIFIER
    | typeType '::' (typeArguments? IDENTIFIER | NEW)
    | classType '::' typeArguments? NEW
    ;
elementValueArrayInitializer
    : '{' (elementValue (',' elementValue)*)? (',')? '}'
    ;
typeParameter
    : annotation* IDENTIFIER (EXTENDS typeBound)?
    ;
classOrInterfaceType
    : IDENTIFIER typeArguments? ('.' IDENTIFIER typeArguments?)*
    ;
primitiveType
    : BOOLEAN
    | CHAR
    | BYTE
    | SHORT
    | INT
    | LONG
    | FLOAT
    | DOUBLE
    ;
classBodyDeclaration
    : ';'
    | STATIC? block
    | modifier* memberDeclaration
    ;
enumConstant
    : annotation* IDENTIFIER arguments? classBody?
    ;
interfaceBodyDeclaration
    : modifier* interfaceMemberDeclaration
    | ';'
    ;
annotationTypeElementDeclaration
    : modifier* annotationTypeElementRest
    | ';'
    ;
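
In the expression rule above, the order of the alternatives matters: for a left-recursive rule, ANTLR 4 gives alternatives listed earlier a higher precedence. The following sketch checks a few consequences in plain Java. The class and method names are illustrative assumptions; the results are simply what Java's operator precedence and associativity produce.

```java
final class ExpressionPrecedenceDemo {
    // '*' is listed before '+', so this parses as 2 + (3 * 4).
    static int multiplicativeBeforeAdditive() {
        return 2 + 3 * 4;
    }

    // Additive operators are listed before shifts, so this parses as 1 << (2 + 1).
    static int additiveBeforeShift() {
        return 1 << 2 + 1;
    }

    // Binary operators associate to the left: (10 - 4) - 3.
    static int leftAssociativeSubtraction() {
        return 10 - 4 - 3;
    }

    // Assignment associates to the right: a = (b = 5).
    static int rightAssociativeAssignment() {
        int a;
        int b;
        a = b = 5;
        return a + b;
    }

    // The conditional operator also associates to the right:
    // x > 0 ? 1 : (x < 0 ? -1 : 0).
    static int sign(int x) {
        return x > 0 ? 1 : x < 0 ? -1 : 0;
    }
}
```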

3.6 Layer 6

/*
     6 elementValue primary methodCall nonWildcardTypeArguments innerCreator superSuffix explicitGenericInvocation
       creator lambdaExpression typeArguments classType typeBound block modifier memberDeclaration arguments
       interfaceMemberDeclaration annotationTypeElementRest
*/
elementValue
    : expression
    | annotation
    | elementValueArrayInitializer
    ;
primary
    : '(' expression ')'
    | THIS
    | SUPER
    | literal
    | IDENTIFIER
    | typeTypeOrVoid '.' CLASS
    | nonWildcardTypeArguments (explicitGenericInvocationSuffix | THIS arguments)
    ;
methodCall
    : IDENTIFIER '(' expressionList? ')'
    | THIS '(' expressionList? ')'
    | SUPER '(' expressionList? ')'
    ;
nonWildcardTypeArguments
    : '<' typeList '>'
    ;
innerCreator
    : IDENTIFIER nonWildcardTypeArgumentsOrDiamond? classCreatorRest
    ;
superSuffix
    : arguments
    | '.' IDENTIFIER arguments?
    ;
explicitGenericInvocation
    : nonWildcardTypeArguments explicitGenericInvocationSuffix
    ;
creator
    : nonWildcardTypeArguments createdName classCreatorRest
    | createdName (arrayCreatorRest | classCreatorRest)
    ;
// Java8
lambdaExpression
    : lambdaParameters '->' lambdaBody
    ;
typeArguments
    : '<' typeArgument (',' typeArgument)* '>'
    ;
classType
    : (classOrInterfaceType '.')? annotation* IDENTIFIER typeArguments?
    ;
typeBound
    : typeType ('&' typeType)*
    ;
block
    : '{' blockStatement* '}'
    ;
modifier
    : classOrInterfaceModifier
    | NATIVE
    | SYNCHRONIZED
    | TRANSIENT
    | VOLATILE
    ;
memberDeclaration
    : methodDeclaration
    | genericMethodDeclaration
    | fieldDeclaration
    | constructorDeclaration
    | genericConstructorDeclaration
    | interfaceDeclaration
    | annotationTypeDeclaration
    | classDeclaration
    | enumDeclaration
    ;
arguments
    : '(' expressionList? ')'
    ;
interfaceMemberDeclaration
    : constDeclaration
    | interfaceMethodDeclaration
    | genericInterfaceMethodDeclaration
    | interfaceDeclaration
    | annotationTypeDeclaration
    | classDeclaration
    | enumDeclaration
    ;
annotationTypeElementRest
    : typeType annotationMethodOrConstantRest ';'
    | classDeclaration ';'?
    | interfaceDeclaration ';'?
    | enumDeclaration ';'?
    | annotationTypeDeclaration ';'?
    ;

3.7 Layer 7

/*
     7 literal typeTypeOrVoid explicitGenericInvocationSuffix expressionList nonWildcardTypeArgumentsOrDiamond
       createdName classCreatorRest arrayCreatorRest lambdaParameters lambdaBody typeArgument blockStatement
       methodDeclaration genericMethodDeclaration fieldDeclaration constructorDeclaration genericConstructorDeclaration
       constDeclaration interfaceMethodDeclaration genericInterfaceMethodDeclaration annotationMethodOrConstantRest
*/
literal
    : integerLiteral
    | floatLiteral
    | CHAR_LITERAL
    | STRING_LITERAL
    | BOOL_LITERAL
    | NULL_LITERAL
    ;
typeTypeOrVoid
    : typeType
    | VOID
    ;
explicitGenericInvocationSuffix
    : SUPER superSuffix
    | IDENTIFIER arguments
    ;
expressionList
    : expression (',' expression)*
    ;
nonWildcardTypeArgumentsOrDiamond
    : '<' '>'
    | nonWildcardTypeArguments
    ;
createdName
    : IDENTIFIER typeArgumentsOrDiamond? ('.' IDENTIFIER typeArgumentsOrDiamond?)*
    | primitiveType
    ;
classCreatorRest
    : arguments classBody?
    ;
arrayCreatorRest
    : '[' (']' ('[' ']')* arrayInitializer | expression ']' ('[' expression ']')* ('[' ']')*)
    ;
// Java8
lambdaParameters
    : IDENTIFIER
    | '(' formalParameterList? ')'
    | '(' IDENTIFIER (',' IDENTIFIER)* ')'
    ;
// Java8
lambdaBody
    : expression
    | block
    ;
typeArgument
    : typeType
    | '?' ((EXTENDS | SUPER) typeType)?
    ;
blockStatement
    : localVariableDeclaration ';'
    | statement
    | localTypeDeclaration
    ;
methodDeclaration
    : typeTypeOrVoid IDENTIFIER formalParameters ('[' ']')*
      (THROWS qualifiedNameList)?
      methodBody
    ;
genericMethodDeclaration
    : typeParameters methodDeclaration
    ;
fieldDeclaration
    : typeType variableDeclarators ';'
    ;
constructorDeclaration
    : IDENTIFIER formalParameters (THROWS qualifiedNameList)? constructorBody=block
    ;
genericConstructorDeclaration
    : typeParameters constructorDeclaration
    ;
constDeclaration
    : typeType constantDeclarator (',' constantDeclarator)* ';'
    ;
interfaceMethodDeclaration
    : interfaceMethodModifier* (typeTypeOrVoid | typeParameters annotation* typeTypeOrVoid)
      IDENTIFIER formalParameters ('[' ']')* (THROWS qualifiedNameList)? methodBody
    ;
genericInterfaceMethodDeclaration
    : typeParameters interfaceMethodDeclaration
    ;
annotationMethodOrConstantRest
    : annotationMethodRest
    | annotationConstantRest
    ;
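
The lambdaParameters rule in this layer has three alternatives and lambdaBody has two. A minimal sketch with one lambda per form follows; the class and constant names are hypothetical and exist only for this illustration.

```java
import java.util.function.BinaryOperator;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

final class LambdaFormsDemo {
    // lambdaParameters: IDENTIFIER, lambdaBody: expression
    static final UnaryOperator<Integer> INCREMENT = x -> x + 1;

    // lambdaParameters: '(' formalParameterList? ')' with explicit types
    static final BinaryOperator<Integer> TYPED_SUM = (Integer a, Integer b) -> a + b;

    // lambdaParameters: '(' IDENTIFIER (',' IDENTIFIER)* ')' with inferred types
    static final BinaryOperator<Integer> INFERRED_SUM = (a, b) -> a + b;

    // Empty parameter list (formalParameterList omitted), lambdaBody: block
    static final Supplier<String> GREETING = () -> {
        return "hi";
    };
}
```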

3.8 Layer 8

/*
     8 integerLiteral floatLiteral typeArgumentsOrDiamond arrayInitializer formalParameterList localVariableDeclaration
       statement localTypeDeclaration formalParameters qualifiedNameList methodBody variableDeclarators
       constantDeclarator interfaceMethodModifier annotationMethodRest annotationConstantRest
*/
integerLiteral
    : DECIMAL_LITERAL
    | HEX_LITERAL
    | OCT_LITERAL
    | BINARY_LITERAL
    ;
floatLiteral
    : FLOAT_LITERAL
    | HEX_FLOAT_LITERAL
    ;
typeArgumentsOrDiamond
    : '<' '>'
    | typeArguments
    ;
arrayInitializer
    : '{' (variableInitializer (',' variableInitializer)* (',')? )? '}'
    ;
formalParameterList
    : formalParameter (',' formalParameter)* (',' lastFormalParameter)?
    | lastFormalParameter
    ;
localVariableDeclaration
    : variableModifier* typeType variableDeclarators
    ;
statement
    : blockLabel=block
    | ASSERT expression (':' expression)? ';'
    | IF parExpression statement (ELSE statement)?
    | WHILE parExpression statement
    | DO statement WHILE parExpression ';'
    | FOR '(' forControl ')' statement
    | TRY block (catchClause+ finallyBlock? | finallyBlock)
    | TRY resourceSpecification block catchClause* finallyBlock?
    | SWITCH parExpression '{' switchBlockStatementGroup* switchLabel* '}'
    | SYNCHRONIZED parExpression block
    | RETURN expression? ';'
    | THROW expression ';'
    | BREAK IDENTIFIER? ';'
    | CONTINUE IDENTIFIER? ';'
    | SEMI
    | statementExpression=expression ';'
    | identifierLabel=IDENTIFIER ':' statement
    ;
localTypeDeclaration
    : classOrInterfaceModifier*
      (classDeclaration | interfaceDeclaration)
    | ';'
    ;
formalParameters
    : '(' formalParameterList? ')'
    ;
qualifiedNameList
    : qualifiedName (',' qualifiedName)*
    ;
methodBody
    : block
    | ';'
    ;
variableDeclarators
    : variableDeclarator (',' variableDeclarator)*
    ;
constantDeclarator
    : IDENTIFIER ('[' ']')* '=' variableInitializer
    ;
// Java8
interfaceMethodModifier
    : annotation
    | PUBLIC
    | ABSTRACT
    | DEFAULT
    | STATIC
    | STRICTFP
    ;
annotationMethodRest
    : IDENTIFIER '(' ')' defaultValue?
    ;
annotationConstantRest
    : variableDeclarators
    ;
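
One detail of the SWITCH alternative in the statement rule is easy to miss: after the groups of labels and statements, the body may end with labels that have no statements at all (the trailing `switchLabel*`). The sketch below exercises both that case and several labels sharing one group; the class and method names are my own.

```java
final class SwitchShapeDemo {
    static String describe(int n) {
        String result = "other";
        switch (n) {
            case 1:
            case 2: // two labels in front of one group of statements
                result = "small";
                break;
            case 3:
                result = "three";
                // no break: control falls through to the label below
            default: // a trailing label with no statements after it
        }
        return result;
    }
}
```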

3.9 Layer 9

/*
     9 formalParameter lastFormalParameter variableModifier parExpression catchClause forControl finallyBlock
       resourceSpecification switchBlockStatementGroup switchLabel variableDeclarator variableInitializer defaultValue
*/
formalParameter
    : variableModifier* typeType variableDeclaratorId
    ;
lastFormalParameter
    : variableModifier* typeType '...' variableDeclaratorId
    ;
variableModifier
    : annotation
    | FINAL
    ;
parExpression
    : '(' expression ')'
    ;
catchClause
    : CATCH '(' variableModifier* catchType IDENTIFIER ')' block
    ;
forControl
    : enhancedForControl
    | forInit? ';' expression? ';' forUpdate=expressionList?
    ;
finallyBlock
    : FINALLY block
    ;
resourceSpecification
    : '(' resources ';'? ')'
    ;
switchBlockStatementGroup
    : switchLabel+ blockStatement+
    ;
switchLabel
    : CASE (constantExpression=expression | enumConstantName=IDENTIFIER) ':'
    | DEFAULT ':'
    ;
variableDeclarator
    : variableDeclaratorId ('=' variableInitializer)?
    ;
variableInitializer
    : arrayInitializer
    | expression
    ;
defaultValue
    : DEFAULT elementValue
    ;
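
The split between formalParameter and lastFormalParameter exists because only the final parameter may be variadic (the `'...'` in lastFormalParameter). A short sketch, with illustrative names of my own choosing:

```java
final class FormalParameterDemo {
    // One formalParameter (with the variableModifier 'final') followed by a lastFormalParameter.
    static int sum(final int first, int... rest) {
        int total = first;
        for (int value : rest) {
            total += value;
        }
        return total;
    }

    // A parameter list consisting of a lastFormalParameter only.
    static int count(String... names) {
        return names.length;
    }
}
```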

3.10 Layer 10

/*
     10 variableDeclaratorId catchType enhancedForControl forInit resources
*/
variableDeclaratorId
    : IDENTIFIER ('[' ']')*
    ;
catchType
    : qualifiedName ('|' qualifiedName)*
    ;
enhancedForControl
    : variableModifier* typeType variableDeclaratorId ':' expression
    ;
forInit
    : localVariableDeclaration
    | expressionList
    ;
resources
    : resource (';' resource)*
    ;
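
Two rules in this layer map directly onto familiar Java syntax: catchType, with its `'|'` separator, is the multi-catch clause, and enhancedForControl is the for-each header. A minimal sketch follows, with hypothetical class and method names.

```java
final class CatchAndForDemo {
    // catchType: qualifiedName ('|' qualifiedName)*
    static String parseOrDivide(String text, int divisor) {
        try {
            return String.valueOf(Integer.parseInt(text) / divisor);
        } catch (final NumberFormatException | ArithmeticException e) {
            return e.getClass().getSimpleName();
        }
    }

    // enhancedForControl: variableModifier* typeType variableDeclaratorId ':' expression
    static int total(int[] values) {
        int sum = 0;
        for (final int value : values) {
            sum += value;
        }
        return sum;
    }
}
```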

3.11 Layer 11

/*
     11 resource
*/
resource
    : variableModifier* classOrInterfaceType variableDeclaratorId '=' expression
    ;
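
The resource rule, together with resources and resourceSpecification from the layers above it, describes the header of a try-with-resources statement. The sketch below declares two resources, uses a variableModifier on one, and keeps the optional trailing semicolon. The `Tracked` helper is a made-up class that only records when it is opened and closed.

```java
import java.util.ArrayList;
import java.util.List;

final class ResourceDemo {
    // Hypothetical resource that logs its own lifecycle.
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        // resourceSpecification: '(' resources ';'? ')'
        try (Tracked a = new Tracked("a", log);
             final Tracked b = new Tracked("b", log);) {
            log.add("body");
        }
        return log;
    }
}
```

Java closes the resources in the reverse of their declaration order, so `run()` returns `[open a, open b, body, close b, close a]`.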