1 Overview
This article applies to Java 8 and earlier versions, referred to below simply as Java.
2 Java lexer rules (lexer grammar)
I introduce the lexer rules (lexer grammar) in bottom-up order.
2.1 Keywords
The lexer rules for the Java keywords and reserved words are as follows:
ABSTRACT: 'abstract'; ASSERT: 'assert'; BOOLEAN: 'boolean'; BREAK: 'break';
BYTE: 'byte'; CASE: 'case'; CATCH: 'catch'; CHAR: 'char';
CLASS: 'class'; CONST: 'const'; CONTINUE: 'continue'; DEFAULT: 'default';
DO: 'do'; DOUBLE: 'double'; ELSE: 'else'; ENUM: 'enum';
EXTENDS: 'extends'; FINAL: 'final'; FINALLY: 'finally'; FLOAT: 'float';
FOR: 'for'; IF: 'if'; GOTO: 'goto'; IMPLEMENTS: 'implements';
IMPORT: 'import'; INSTANCEOF: 'instanceof'; INT: 'int'; INTERFACE: 'interface';
LONG: 'long'; NATIVE: 'native'; NEW: 'new'; PACKAGE: 'package';
PRIVATE: 'private'; PROTECTED: 'protected'; PUBLIC: 'public'; RETURN: 'return';
SHORT: 'short'; STATIC: 'static'; STRICTFP: 'strictfp'; SUPER: 'super';
SWITCH: 'switch'; SYNCHRONIZED: 'synchronized'; THIS: 'this'; THROW: 'throw';
THROWS: 'throws'; TRANSIENT: 'transient'; TRY: 'try'; VOID: 'void';
VOLATILE: 'volatile'; WHILE: 'while';
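2.2 Fragment rules
A fragment rule only serves as a building block for other lexer rules. It never produces a token of its own, so it takes no part in the parser rules.
Digits (Digits) and exponent part (ExponentPart):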
fragment Digits
: [0-9] ([0-9_]* [0-9])?
;
fragment ExponentPart
: [eE] [+-]? Digits
;
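To make the underscore handling concrete, the two fragments above can be transcribed into java.util.regex patterns. This is only an illustrative sketch: the class and method names (FragmentRegexSketch, isDigits, isExponentPart) are made up for this example and are not part of the grammar.

```java
import java.util.regex.Pattern;

// Hypothetical helper: regex transcriptions of the Digits and ExponentPart fragments.
class FragmentRegexSketch {
    // fragment Digits : [0-9] ([0-9_]* [0-9])? ;
    static final String DIGITS = "[0-9](?:[0-9_]*[0-9])?";

    // fragment ExponentPart : [eE] [+-]? Digits ;
    static final String EXPONENT_PART = "[eE][+-]?" + DIGITS;

    // True when the whole string matches the Digits fragment.
    static boolean isDigits(String text) {
        return Pattern.matches(DIGITS, text);
    }

    // True when the whole string matches the ExponentPart fragment.
    static boolean isExponentPart(String text) {
        return Pattern.matches(EXPONENT_PART, text);
    }
}
```

Under this transcription an underscore may only appear between two digits, so "1_000" is accepted while "1_" and "_1" are rejected.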
Letters (Letter) and letters or digits (LetterOrDigit):
fragment Letter
: [a-zA-Z$_]
| ~[\u0000-\u007F\uD800-\uDBFF]
| [\uD800-\uDBFF] [\uDC00-\uDFFF]
;
fragment LetterOrDigit
: Letter
| [0-9]
;
Hexadecimal digits (HexDigits, HexDigit) and escape sequences (EscapeSequence):
fragment HexDigit
: [0-9a-fA-F]
;
fragment HexDigits
: HexDigit ((HexDigit | '_')* HexDigit)?
;
fragment EscapeSequence
: '\\' [btnfr"'\\]
| '\\' ([0-3]? [0-7])? [0-7]
| '\\' 'u'+ HexDigit HexDigit HexDigit HexDigit
;
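The three alternatives of EscapeSequence (a single-character escape, an octal escape of up to three digits, and a Unicode escape with one or more u characters) can be approximated with a regular expression. The sketch below is an assumption-laden illustration, not part of the grammar; EscapeSequenceSketch and isEscapeSequence are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical helper: a regex approximation of the EscapeSequence fragment.
class EscapeSequenceSketch {
    static final String ESCAPE_SEQUENCE =
            // backslash followed by one of: b t n f r " ' backslash
            "\\\\[btnfr\"'\\\\]"
            // backslash followed by one to three octal digits (at most 377)
            + "|\\\\(?:[0-3]?[0-7])?[0-7]"
            // backslash, one or more 'u', then exactly four hex digits
            + "|\\\\u+[0-9a-fA-F]{4}";

    // True when the whole string is a single escape sequence.
    static boolean isEscapeSequence(String text) {
        return Pattern.matches(ESCAPE_SEQUENCE, text);
    }
}
```

Because the first octal digit of a three-digit escape is limited to 0 through 3, a backslash followed by 377 is accepted as one escape while a backslash followed by 477 is not.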
2.3 Literals
Decimal literal:
DECIMAL_LITERAL: ('0' | [1-9] (Digits? | '_'+ Digits)) [lL]?;
Hexadecimal literal:
HEX_LITERAL: '0' [xX] [0-9a-fA-F] ([0-9a-fA-F_]* [0-9a-fA-F])? [lL]?;
Octal literal:
OCT_LITERAL: '0' '_'* [0-7] ([0-7_]* [0-7])? [lL]?;
Binary literal:
BINARY_LITERAL: '0' [bB] [01] ([01_]* [01])? [lL]?;
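As a quick illustration of the four integer literal rules, the following Java snippet writes the same value once per rule. The class name IntegerLiteralForms is made up for this example.

```java
// Hypothetical demo class: one constant per integer literal rule.
class IntegerLiteralForms {
    static final int DECIMAL = 1_000;             // DECIMAL_LITERAL
    static final int HEX = 0x3E8;                 // HEX_LITERAL
    static final int OCTAL = 01750;               // OCT_LITERAL
    static final int BINARY = 0b11_1110_1000;     // BINARY_LITERAL
    static final int OCTAL_UNDERSCORE = 0_17;     // OCT_LITERAL: '0' '_'* [0-7] ...
    static final long WITH_SUFFIX = 1_000L;       // optional [lL] suffix makes it a long
}
```

All of DECIMAL, HEX, OCTAL and BINARY denote the value 1000. A leading 0 followed by digits selects the octal rule, so 0_17 is 15, not 17.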
Floating-point literal:
FLOAT_LITERAL: (Digits '.' Digits? | '.' Digits) ExponentPart? [fFdD]?
| Digits (ExponentPart [fFdD]? | [fFdD])
;
Hexadecimal floating-point literal:
HEX_FLOAT_LITERAL: '0' [xX] (HexDigits '.'? | HexDigits? '.' HexDigits) [pP] [+-]? Digits [fFdD]?;
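The alternatives of FLOAT_LITERAL and HEX_FLOAT_LITERAL are easier to read next to concrete literals. The snippet below is a small illustrative sketch; the class name FloatLiteralForms is an assumption of this example.

```java
// Hypothetical demo class: one constant per floating-point literal shape.
class FloatLiteralForms {
    static final double WITH_FRACTION = 10.5;        // Digits '.' Digits
    static final double TRAILING_DOT = 10.;          // Digits '.' with no fraction digits
    static final double LEADING_DOT = .5;            // '.' Digits
    static final double WITH_EXPONENT = 1e3;         // Digits ExponentPart
    static final float FLOAT_SUFFIX = 5f;            // Digits [fFdD]
    static final double HEX_FLOAT = 0x1.8p1;         // hex mantissa 1.5 times 2^1
    static final double HEX_FLOAT_NEG_EXP = 0x1p-2;  // 1 times 2^-2
}
```

In a hexadecimal floating-point literal the [pP] exponent is mandatory and is a power of two written in decimal, which is why 0x1.8p1 equals 3.0.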
Boolean literal:
BOOL_LITERAL: 'true'
| 'false'
;
Character literal:
CHAR_LITERAL: '\'' (~['\\\r\n] | EscapeSequence) '\'';
String literal:
STRING_LITERAL: '"' (~["\\\r\n] | EscapeSequence)* '"';
null literal:
NULL_LITERAL: 'null';
2.4 Separators
The separators are defined as follows:
LPAREN: '('; RPAREN: ')';
LBRACE: '{'; RBRACE: '}';
LBRACK: '['; RBRACK: ']';
SEMI: ';'; COMMA: ',';
DOT: '.';
2.5 Operators
The operators are defined as follows:
ASSIGN: '='; LT: '<'; GT: '>'; BANG: '!';
EQUAL: '=='; LE: '<='; GE: '>='; NOTEQUAL: '!=';
AND: '&&'; OR: '||'; BITAND: '&'; BITOR: '|';
ADD: '+'; SUB: '-'; MUL: '*'; DIV: '/';
INC: '++'; DEC: '--'; CARET: '^'; MOD: '%';
TILDE: '~'; QUESTION: '?'; COLON: ':';
// Assignment operators
ADD_ASSIGN: '+='; SUB_ASSIGN: '-='; MUL_ASSIGN: '*='; DIV_ASSIGN: '/=';
AND_ASSIGN: '&='; OR_ASSIGN: '|='; XOR_ASSIGN: '^='; MOD_ASSIGN: '%=';
LSHIFT_ASSIGN: '<<='; RSHIFT_ASSIGN: '>>='; URSHIFT_ASSIGN: '>>>=';
// Java 8 tokens
ARROW: '->'; COLONCOLON: '::';
// Additional symbols not defined in the lexical specification
AT: '@'; ELLIPSIS: '...';
// Whitespace and comments
WS: [ \t\r\n\u000C]+ -> channel(HIDDEN);
COMMENT: '/*' .*? '*/' -> channel(HIDDEN);
LINE_COMMENT: '//' ~[\r\n]* -> channel(HIDDEN);
// Identifiers
IDENTIFIER: Letter LetterOrDigit*;
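The IDENTIFIER rule, together with the Letter and LetterOrDigit fragments, can be mirrored by a small hand-written check over Unicode code points. This is a sketch under assumptions: IdentifierSketch, isLetter and isIdentifier are hypothetical names, and the check ignores keywords. In ANTLR 4, when two lexer rules match the same text, the rule defined first wins, which is why the keyword rules placed before IDENTIFIER take priority over it.

```java
// Hypothetical helper mirroring: IDENTIFIER: Letter LetterOrDigit*;
class IdentifierSketch {
    // Letter: [a-zA-Z$_], or any character above 0x7F that is not an unpaired
    // high surrogate, or a surrogate pair (seen here as one code point).
    static boolean isLetter(int codePoint) {
        boolean asciiLetter = (codePoint >= 'a' && codePoint <= 'z')
                || (codePoint >= 'A' && codePoint <= 'Z')
                || codePoint == '$' || codePoint == '_';
        boolean unpairedHighSurrogate = codePoint >= 0xD800 && codePoint <= 0xDBFF;
        return asciiLetter || (codePoint > 0x7F && !unpairedHighSurrogate);
    }

    // LetterOrDigit: Letter | [0-9]
    static boolean isLetterOrDigit(int codePoint) {
        return isLetter(codePoint) || (codePoint >= '0' && codePoint <= '9');
    }

    static boolean isIdentifier(String text) {
        if (text.isEmpty()) {
            return false;
        }
        int first = text.codePointAt(0);
        if (!isLetter(first)) {
            return false;
        }
        // Walk the rest of the string one code point at a time.
        for (int i = Character.charCount(first); i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (!isLetterOrDigit(codePoint)) {
                return false;
            }
            i += Character.charCount(codePoint);
        }
        return true;
    }
}
```

Note that this rule is more permissive than the Java Language Specification for characters above 0x7F: it accepts any such character, not only those that Java classifies as identifier characters.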
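These are the lexer rules of Java. The parser rules, which build on them, come next.
3 Java parser rules (parser grammar)
I list the parser rules (parser grammar) in breadth-first order. After sorting, there are 11 levels in total. The rules at each level are as follows: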
1 compilationUnit
2 packageDeclaration importDeclaration typeDeclaration
3 annotation qualifiedName classOrInterfaceModifier classDeclaration enumDeclaration interfaceDeclaration
annotationTypeDeclaration
4 altAnnotationQualifiedName elementValuePairs elementValue typeParameters typeType typeList classBody
enumConstants enumBodyDeclarations interfaceBody annotationTypeBody
5 elementValuePair expression elementValueArrayInitializer typeParameter classOrInterfaceType primitiveType
classBodyDeclaration enumConstant interfaceBodyDeclaration annotationTypeElementDeclaration
6 elementValue primary methodCall nonWildcardTypeArguments innerCreator superSuffix explicitGenericInvocation
creator lambdaExpression typeArguments classType typeBound block modifier memberDeclaration arguments
interfaceMemberDeclaration annotationTypeElementRest
7 literal typeTypeOrVoid explicitGenericInvocationSuffix expressionList nonWildcardTypeArgumentsOrDiamond
createdName classCreatorRest arrayCreatorRest lambdaParameters lambdaBody typeArgument blockStatement
methodDeclaration genericMethodDeclaration fieldDeclaration constructorDeclaration genericConstructorDeclaration
constDeclaration interfaceMethodDeclaration genericInterfaceMethodDeclaration annotationMethodOrConstantRest
8 integerLiteral floatLiteral typeArgumentsOrDiamond arrayInitializer formalParameterList localVariableDeclaration
statement localTypeDeclaration formalParameters qualifiedNameList methodBody variableDeclarators
constantDeclarator interfaceMethodModifier annotationMethodRest annotationConstantRest
9 formalParameter lastFormalParameter variableModifier parExpression catchClause forControl finallyBlock
resourceSpecification switchBlockStatementGroup switchLabel variableDeclarator variableInitializer defaultValue
10 variableDeclaratorId catchType enhancedForControl forInit resources
11 resource
3.1 Level 1
/*
1 compilationUnit
*/
compilationUnit
: packageDeclaration? importDeclaration* typeDeclaration* EOF
;
3.2 Level 2
/*
2 packageDeclaration importDeclaration typeDeclaration
*/
packageDeclaration
: annotation* PACKAGE qualifiedName ';'
;
importDeclaration
: IMPORT STATIC? qualifiedName ('.' '*')? ';'
;
typeDeclaration
: classOrInterfaceModifier*
(classDeclaration | enumDeclaration | interfaceDeclaration | annotationTypeDeclaration)
| ';'
;
3.3 Level 3
/*
3 annotation qualifiedName classOrInterfaceModifier classDeclaration enumDeclaration interfaceDeclaration
annotationTypeDeclaration
*/
annotation
: ('@' qualifiedName | altAnnotationQualifiedName) ('(' ( elementValuePairs | elementValue )? ')')?
;
qualifiedName
: IDENTIFIER ('.' IDENTIFIER)*
;
classOrInterfaceModifier
: annotation
| PUBLIC
| PROTECTED
| PRIVATE
| STATIC
| ABSTRACT
| FINAL
| STRICTFP
;
classDeclaration
: CLASS IDENTIFIER typeParameters?
(EXTENDS typeType)?
(IMPLEMENTS typeList)?
classBody
;
enumDeclaration
: ENUM IDENTIFIER (IMPLEMENTS typeList)? '{' enumConstants? ','? enumBodyDeclarations? '}'
;
interfaceDeclaration
: INTERFACE IDENTIFIER typeParameters? (EXTENDS typeList)? interfaceBody
;
annotationTypeDeclaration
: '@' INTERFACE IDENTIFIER annotationTypeBody
;
3.4 Level 4
/*
4 altAnnotationQualifiedName elementValuePairs elementValue typeParameters typeType typeList classBody
enumConstants enumBodyDeclarations interfaceBody annotationTypeBody
*/
altAnnotationQualifiedName
: (IDENTIFIER DOT)* '@' IDENTIFIER
;
elementValuePairs
: elementValuePair (',' elementValuePair)*
;
typeParameters
: '<' typeParameter (',' typeParameter)* '>'
;
typeType
: annotation? (classOrInterfaceType | primitiveType) ('[' ']')*
;
typeList
: typeType (',' typeType)*
;
classBody
: '{' classBodyDeclaration* '}'
;
enumConstants
: enumConstant (',' enumConstant)*
;
enumBodyDeclarations
: ';' classBodyDeclaration*
;
interfaceBody
: '{' interfaceBodyDeclaration* '}'
;
annotationTypeBody
: '{' (annotationTypeElementDeclaration)* '}'
;
3.5 Level 5
/*
5 elementValuePair expression elementValueArrayInitializer typeParameter classOrInterfaceType primitiveType
classBodyDeclaration enumConstant interfaceBodyDeclaration annotationTypeElementDeclaration
*/
elementValuePair
: IDENTIFIER '=' elementValue
;
expression
: primary
| expression bop='.'
( IDENTIFIER
| methodCall
| THIS
| NEW nonWildcardTypeArguments? innerCreator
| SUPER superSuffix
| explicitGenericInvocation
)
| expression '[' expression ']'
| methodCall
| NEW creator
| '(' typeType ')' expression
| expression postfix=('++' | '--')
| prefix=('+'|'-'|'++'|'--') expression
| prefix=('~'|'!') expression
| expression bop=('*'|'/'|'%') expression
| expression bop=('+'|'-') expression
| expression ('<' '<' | '>' '>' '>' | '>' '>') expression
| expression bop=('<=' | '>=' | '>' | '<') expression
| expression bop=INSTANCEOF typeType
| expression bop=('==' | '!=') expression
| expression bop='&' expression
| expression bop='^' expression
| expression bop='|' expression
| expression bop='&&' expression
| expression bop='||' expression
| <assoc=right> expression bop='?' expression ':' expression
| <assoc=right> expression
bop=('=' | '+=' | '-=' | '*=' | '/=' | '&=' | '|=' | '^=' | '>>=' | '>>>=' | '<<=' | '%=')
expression
| lambdaExpression // Java8
// Java 8 methodReference
| expression '::' typeArguments? IDENTIFIER
| typeType '::' (typeArguments? IDENTIFIER | NEW)
| classType '::' typeArguments? NEW
;
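In an ANTLR 4 left-recursive rule such as expression, alternatives listed earlier bind more tightly than those listed later, so the order of the binary-operator alternatives encodes Java's operator precedence. The short Java sketch below shows the resulting behaviour on a few expressions; the class and method names are invented for this illustration.

```java
// Hypothetical demo class: how Java groups operators, matching the alternative order.
class ExpressionPrecedenceDemo {
    // '*' is listed before '+', so this groups as 1 + (2 * 3).
    static int multiplyBeforeAdd() {
        return 1 + 2 * 3;
    }

    // '+' is listed before the shift alternative, so this groups as 1 << (2 + 1).
    static int addBeforeShift() {
        return 1 << 2 + 1;
    }

    // '&' is listed before '^', which is listed before '|': 1 | (2 ^ (3 & 4)).
    static int bitwiseOrder() {
        return 1 | 2 ^ 3 & 4;
    }

    // The conditional operator groups to the right: a ? 1 : (b ? 2 : 3).
    static int nestedConditional(boolean a, boolean b) {
        return a ? 1 : b ? 2 : 3;
    }

    // Assignment groups to the right: a = (b = 5).
    static int chainedAssignment() {
        int a;
        int b;
        a = b = 5;
        return a + b;
    }
}
```

Java treats the conditional and assignment operators as right-associative, which is the one case where the alternative order alone is not enough and the grammar needs an explicit associativity annotation.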
elementValueArrayInitializer
: '{' (elementValue (',' elementValue)*)? (',')? '}'
;
typeParameter
: annotation* IDENTIFIER (EXTENDS typeBound)?
;
classOrInterfaceType
: IDENTIFIER typeArguments? ('.' IDENTIFIER typeArguments?)*
;
primitiveType
: BOOLEAN
| CHAR
| BYTE
| SHORT
| INT
| LONG
| FLOAT
| DOUBLE
;
classBodyDeclaration
: ';'
| STATIC? block
| modifier* memberDeclaration
;
enumConstant
: annotation* IDENTIFIER arguments? classBody?
;
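The enumDeclaration and enumConstant rules together allow constants with arguments, constants with their own class body, an optional trailing comma, and a semicolon that opens the enum body declarations. A minimal Java sketch exercising all of these follows; EnumConstantDemo and Op are hypothetical names.

```java
// Hypothetical demo class for the enumDeclaration and enumConstant rules.
class EnumConstantDemo {
    enum Op {
        // enumConstant: IDENTIFIER arguments? classBody?
        ADD("+") {
            @Override
            int apply(int a, int b) {
                return a + b;
            }
        },
        MUL("*") {
            @Override
            int apply(int a, int b) {
                return a * b;
            }
        },  // the optional ',' after enumConstants
        ;   // enumBodyDeclarations starts with ';'

        final String symbol;

        Op(String symbol) {
            this.symbol = symbol;
        }

        abstract int apply(int a, int b);
    }
}
```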
interfaceBodyDeclaration
: modifier* interfaceMemberDeclaration
| ';'
;
annotationTypeElementDeclaration
: modifier* annotationTypeElementRest
| ';'
;
3.6 Level 6
/*
6 elementValue primary methodCall nonWildcardTypeArguments innerCreator superSuffix explicitGenericInvocation
creator lambdaExpression typeArguments classType typeBound block modifier memberDeclaration arguments
interfaceMemberDeclaration annotationTypeElementRest
*/
elementValue
: expression
| annotation
| elementValueArrayInitializer
;
primary
: '(' expression ')'
| THIS
| SUPER
| literal
| IDENTIFIER
| typeTypeOrVoid '.' CLASS
| nonWildcardTypeArguments (explicitGenericInvocationSuffix | THIS arguments)
;
methodCall
: IDENTIFIER '(' expressionList? ')'
| THIS '(' expressionList? ')'
| SUPER '(' expressionList? ')'
;
nonWildcardTypeArguments
: '<' typeList '>'
;
innerCreator
: IDENTIFIER nonWildcardTypeArgumentsOrDiamond? classCreatorRest
;
superSuffix
: arguments
| '.' IDENTIFIER arguments?
;
explicitGenericInvocation
: nonWildcardTypeArguments explicitGenericInvocationSuffix
;
creator
: nonWildcardTypeArguments createdName classCreatorRest
| createdName (arrayCreatorRest | classCreatorRest)
;
// Java8
lambdaExpression
: lambdaParameters '->' lambdaBody
;
typeArguments
: '<' typeArgument (',' typeArgument)* '>'
;
classType
: (classOrInterfaceType '.')? annotation* IDENTIFIER typeArguments?
;
typeBound
: typeType ('&' typeType)*
;
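The typeBound rule lets a type parameter list several bounds joined by '&', and typeArguments (whose typeArgument rule is defined at level 7) also admits wildcards. The following Java sketch shows one use of each; TypeBoundDemo, max and sum are names assumed for this example.

```java
import java.util.List;

// Hypothetical demo class for typeParameter, typeBound and wildcard type arguments.
class TypeBoundDemo {
    // typeParameter: IDENTIFIER EXTENDS typeBound, with typeBound = typeType '&' typeType
    static <T extends Number & Comparable<T>> T max(T a, T b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    // typeArgument: '?' EXTENDS typeType
    static double sum(List<? extends Number> values) {
        double total = 0;
        for (Number value : values) {
            total += value.doubleValue();
        }
        return total;
    }
}
```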
block
: '{' blockStatement* '}'
;
modifier
: classOrInterfaceModifier
| NATIVE
| SYNCHRONIZED
| TRANSIENT
| VOLATILE
;
memberDeclaration
: methodDeclaration
| genericMethodDeclaration
| fieldDeclaration
| constructorDeclaration
| genericConstructorDeclaration
| interfaceDeclaration
| annotationTypeDeclaration
| classDeclaration
| enumDeclaration
;
arguments
: '(' expressionList? ')'
;
interfaceMemberDeclaration
: constDeclaration
| interfaceMethodDeclaration
| genericInterfaceMethodDeclaration
| interfaceDeclaration
| annotationTypeDeclaration
| classDeclaration
| enumDeclaration
;
annotationTypeElementRest
: typeType annotationMethodOrConstantRest ';'
| classDeclaration ';'?
| interfaceDeclaration ';'?
| enumDeclaration ';'?
| annotationTypeDeclaration ';'?
;
3.7 Level 7
/*
7 literal typeTypeOrVoid explicitGenericInvocationSuffix expressionList nonWildcardTypeArgumentsOrDiamond
createdName classCreatorRest arrayCreatorRest lambdaParameters lambdaBody typeArgument blockStatement
methodDeclaration genericMethodDeclaration fieldDeclaration constructorDeclaration genericConstructorDeclaration
constDeclaration interfaceMethodDeclaration genericInterfaceMethodDeclaration annotationMethodOrConstantRest
*/
literal
: integerLiteral
| floatLiteral
| CHAR_LITERAL
| STRING_LITERAL
| BOOL_LITERAL
| NULL_LITERAL
;
typeTypeOrVoid
: typeType
| VOID
;
explicitGenericInvocationSuffix
: SUPER superSuffix
| IDENTIFIER arguments
;
expressionList
: expression (',' expression)*
;
nonWildcardTypeArgumentsOrDiamond
: '<' '>'
| nonWildcardTypeArguments
;
createdName
: IDENTIFIER typeArgumentsOrDiamond? ('.' IDENTIFIER typeArgumentsOrDiamond?)*
| primitiveType
;
classCreatorRest
: arguments classBody?
;
arrayCreatorRest
: '[' (']' ('[' ']')* arrayInitializer | expression ']' ('[' expression ']')* ('[' ']')*)
;
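The arrayCreatorRest rule has two branches: empty brackets followed by an initializer, or one or more sized dimensions optionally followed by unsized ones. A brief Java sketch of each shape is given here; the class and method names are invented for illustration.

```java
// Hypothetical demo class for the two branches of arrayCreatorRest.
class ArrayCreatorDemo {
    // '[' ']' arrayInitializer; the initializer may end with a trailing comma.
    static int[] withInitializer() {
        return new int[] {1, 2, 3,};
    }

    // '[' expression ']' '[' expression ']': every dimension is sized.
    static int[][] fullySized() {
        return new int[2][3];
    }

    // '[' expression ']' '[' ']': only the outer dimension is sized.
    static int[][] partiallySized() {
        return new int[2][];
    }
}
```

With a partially sized creation the inner arrays are not allocated, so each element of the outer array starts out as null.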
// Java8
lambdaParameters
: IDENTIFIER
| '(' formalParameterList? ')'
| '(' IDENTIFIER (',' IDENTIFIER)* ')'
;
// Java8
lambdaBody
: expression
| block
;
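The three alternatives of lambdaParameters and the two alternatives of lambdaBody correspond to the lambda shapes a Java 8 programmer writes every day. The sketch below pairs each shape with its rule; LambdaFormsDemo and its field names are assumptions of this example.

```java
import java.util.function.BinaryOperator;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

// Hypothetical demo class: one lambda per lambdaParameters / lambdaBody shape.
class LambdaFormsDemo {
    // lambdaParameters: IDENTIFIER, lambdaBody: expression
    static final UnaryOperator<Integer> SINGLE_IDENTIFIER = x -> x + 1;

    // lambdaParameters: '(' formalParameterList? ')' with declared types
    static final BinaryOperator<Integer> TYPED = (Integer x, Integer y) -> x + y;

    // lambdaParameters: '(' IDENTIFIER (',' IDENTIFIER)* ')'
    static final BinaryOperator<Integer> INFERRED = (x, y) -> x * y;

    // lambdaParameters: '(' ')' with an empty list, lambdaBody: block
    static final Supplier<String> BLOCK_BODY = () -> {
        return "block";
    };
}
```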
typeArgument
: typeType
| '?' ((EXTENDS | SUPER) typeType)?
;
blockStatement
: localVariableDeclaration ';'
| statement
| localTypeDeclaration
;
methodDeclaration
: typeTypeOrVoid IDENTIFIER formalParameters ('[' ']')*
(THROWS qualifiedNameList)?
methodBody
;
genericMethodDeclaration
: typeParameters methodDeclaration
;
fieldDeclaration
: typeType variableDeclarators ';'
;
constructorDeclaration
: IDENTIFIER formalParameters (THROWS qualifiedNameList)? constructorBody=block
;
genericConstructorDeclaration
: typeParameters constructorDeclaration
;
constDeclaration
: typeType constantDeclarator (',' constantDeclarator)* ';'
;
interfaceMethodDeclaration
: interfaceMethodModifier* (typeTypeOrVoid | typeParameters annotation* typeTypeOrVoid)
IDENTIFIER formalParameters ('[' ']')* (THROWS qualifiedNameList)? methodBody
;
genericInterfaceMethodDeclaration
: typeParameters interfaceMethodDeclaration
;
annotationMethodOrConstantRest
: annotationMethodRest
| annotationConstantRest
;
3.8 Level 8
/*
8 integerLiteral floatLiteral typeArgumentsOrDiamond arrayInitializer formalParameterList localVariableDeclaration
statement localTypeDeclaration formalParameters qualifiedNameList methodBody variableDeclarators
constantDeclarator interfaceMethodModifier annotationMethodRest annotationConstantRest
*/
integerLiteral
: DECIMAL_LITERAL
| HEX_LITERAL
| OCT_LITERAL
| BINARY_LITERAL
;
floatLiteral
: FLOAT_LITERAL
| HEX_FLOAT_LITERAL
;
typeArgumentsOrDiamond
: '<' '>'
| typeArguments
;
arrayInitializer
: '{' (variableInitializer (',' variableInitializer)* (',')? )? '}'
;
formalParameterList
: formalParameter (',' formalParameter)* (',' lastFormalParameter)?
| lastFormalParameter
;
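The formalParameterList rule allows ordinary parameters followed by at most one lastFormalParameter, or a lastFormalParameter alone; the latter is the variable-arity ('...') parameter defined at level 9. A small Java sketch of both alternatives follows, with class and method names chosen only for this example.

```java
// Hypothetical demo class for the two alternatives of formalParameterList.
class VarargsDemo {
    // formalParameter (',' lastFormalParameter)?
    static int sum(final int first, int... rest) {
        int total = first;
        for (int value : rest) {
            total += value;
        }
        return total;
    }

    // lastFormalParameter on its own
    static int count(String... items) {
        return items.length;
    }
}
```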
localVariableDeclaration
: variableModifier* typeType variableDeclarators
;
statement
: blockLabel=block
| ASSERT expression (':' expression)? ';'
| IF parExpression statement (ELSE statement)?
| WHILE parExpression statement
| DO statement WHILE parExpression ';'
| FOR '(' forControl ')' statement
| TRY block (catchClause+ finallyBlock? | finallyBlock)
| TRY resourceSpecification block catchClause* finallyBlock?
| SWITCH parExpression '{' switchBlockStatementGroup* switchLabel* '}'
| SYNCHRONIZED parExpression block
| RETURN expression? ';'
| THROW expression ';'
| BREAK IDENTIFIER? ';'
| CONTINUE IDENTIFIER? ';'
| SEMI
| statementExpression=expression ';'
| identifierLabel=IDENTIFIER ':' statement
;
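Several alternatives of the statement rule are easiest to recognise in running code. The Java sketch below annotates a labeled statement, both for forms, a labeled break and a switch with stacked labels; StatementFormsDemo and its methods are hypothetical names used only here.

```java
// Hypothetical demo class touching several alternatives of the statement rule.
class StatementFormsDemo {
    // Returns the index of the first row containing a negative value, or -1.
    static int firstRowWithNegative(int[][] grid) {
        int found = -1;
        outer:                                      // identifierLabel=IDENTIFIER ':' statement
        for (int row = 0; row < grid.length; row++) {   // forInit? ';' expression? ';' forUpdate
            for (int value : grid[row]) {           // enhancedForControl
                if (value < 0) {                    // IF parExpression statement
                    found = row;
                    break outer;                    // BREAK IDENTIFIER? ';'
                }
            }
        }
        return found;                               // RETURN expression? ';'
    }

    static String describe(int n) {
        switch (n) {                                // SWITCH parExpression '{' ... '}'
            case 0:
                return "zero";
            case 1:                                 // switchLabel+ : two labels, one group
            case 2:
                return "small";
            default:
                return "other";
        }
    }
}
```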
localTypeDeclaration
: classOrInterfaceModifier*
(classDeclaration | interfaceDeclaration)
| ';'
;
formalParameters
: '(' formalParameterList? ')'
;
qualifiedNameList
: qualifiedName (',' qualifiedName)*
;
methodBody
: block
| ';'
;
variableDeclarators
: variableDeclarator (',' variableDeclarator)*
;
constantDeclarator
: IDENTIFIER ('[' ']')* '=' variableInitializer
;
// Java8
interfaceMethodModifier
: annotation
| PUBLIC
| ABSTRACT
| DEFAULT
| STATIC
| STRICTFP
;
annotationMethodRest
: IDENTIFIER '(' ')' defaultValue?
;
annotationConstantRest
: variableDeclarators
;
3.9 Level 9
/*
9 formalParameter lastFormalParameter variableModifier parExpression catchClause forControl finallyBlock
resourceSpecification switchBlockStatementGroup switchLabel variableDeclarator variableInitializer defaultValue
*/
formalParameter
: variableModifier* typeType variableDeclaratorId
;
lastFormalParameter
: variableModifier* typeType '...' variableDeclaratorId
;
variableModifier
: annotation
| FINAL
;
parExpression
: '(' expression ')'
;
catchClause
: CATCH '(' variableModifier* catchType IDENTIFIER ')' block
;
forControl
: enhancedForControl
| forInit? ';' expression? ';' forUpdate=expressionList?
;
finallyBlock
: FINALLY block
;
resourceSpecification
: '(' resources ';'? ')'
;
switchBlockStatementGroup
: switchLabel+ blockStatement+
;
switchLabel
: CASE (constantExpression=expression | enumConstantName=IDENTIFIER) ':'
| DEFAULT ':'
;
variableDeclarator
: variableDeclaratorId ('=' variableInitializer)?
;
variableInitializer
: arrayInitializer
| expression
;
defaultValue
: DEFAULT elementValue
;
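The defaultValue rule, together with annotationMethodRest and elementValueArrayInitializer, covers annotation elements with defaults and array-valued elements. The following Java sketch declares such an annotation and reads it back through reflection; AnnotationDemo, Info and the helper methods are names assumed for this example.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical demo class for annotation type elements and their default values.
class AnnotationDemo {
    @Retention(RetentionPolicy.RUNTIME)     // kept at runtime so reflection can see it
    @interface Info {                       // '@' INTERFACE IDENTIFIER annotationTypeBody
        String name() default "unnamed";    // annotationMethodRest with defaultValue
        int[] tags() default {};            // default given as an elementValueArrayInitializer
    }

    @Info(name = "demo", tags = {1, 2,})    // elementValuePairs; trailing ',' is allowed
    static class Annotated {
    }

    @Info                                   // annotation without parentheses
    static class Defaulted {
    }

    static String nameOf(Class<?> type) {
        return type.getAnnotation(Info.class).name();
    }

    static int tagCount(Class<?> type) {
        return type.getAnnotation(Info.class).tags().length;
    }
}
```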
3.10 Level 10
/*
10 variableDeclaratorId catchType enhancedForControl forInit resources
*/
variableDeclaratorId
: IDENTIFIER ('[' ']')*
;
catchType
: qualifiedName ('|' qualifiedName)*
;
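The catchType rule is what makes Java 7 multi-catch possible: several exception types separated by '|' share one handler. A minimal Java sketch follows; MultiCatchDemo and classify are hypothetical names.

```java
// Hypothetical demo class for catchClause with a multi-type catchType.
class MultiCatchDemo {
    static String classify(String text) {
        try {
            return "value " + Integer.parseInt(text.trim());
        } catch (final NumberFormatException | NullPointerException e) {
            // CATCH '(' variableModifier* catchType IDENTIFIER ')' block
            return "invalid";
        }
    }
}
```

A null argument fails in text.trim() with a NullPointerException, and a non-numeric argument fails in Integer.parseInt with a NumberFormatException; both end up in the same handler.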
enhancedForControl
: variableModifier* typeType variableDeclaratorId ':' expression
;
forInit
: localVariableDeclaration
| expressionList
;
resources
: resource (';' resource)*
;
3.11 Level 11
/*
11 resource
*/
resource
: variableModifier* classOrInterfaceType variableDeclaratorId '=' expression
;
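The resource rule, reached through resourceSpecification and resources, describes the declarations inside a try-with-resources header. The Java sketch below declares two resources, one of them with a variableModifier, and records the order in which they are closed; ResourceDemo and Tracked are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class for resourceSpecification, resources and resource.
class ResourceDemo {
    // A trivial AutoCloseable that logs when it is closed.
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        // '(' resources ';'? ')': two resources and the optional trailing ';'
        try (final Tracked a = new Tracked("a", log);
             Tracked b = new Tracked("b", log);) {
            log.add("body");
        }
        return log;
    }
}
```

Java closes resources in the reverse of their declaration order, so run() records "body", then "close b", then "close a".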