Java Language Specification, Chapter 3: Lexical Structure (Part 2)

3.6. White Space

White space is defined as the ASCII space character, horizontal tab character, form feed character, and line terminator characters (§3.4).

WhiteSpace:

the ASCII SP character, also known as "space"
the ASCII HT character, also known as "horizontal tab"
the ASCII FF character, also known as "form feed"
LineTerminator

Summary

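  • White space is defined broadly: the ASCII space, horizontal tab, and form feed characters, as well as the line terminator characters, all count as white space.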
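The JLS set is small, and in particular narrower than what the JDK method Character.isWhitespace accepts. A minimal sketch, assuming a hypothetical helper class WhiteSpaceCheck, that encodes the §3.6 definition and contrasts it with Character.isWhitespace on the vertical tab (U+000B), which the JDK method accepts but §3.6 does not list:

```java
class WhiteSpaceCheck {
    // White space as listed in JLS 3.6: SP, HT, FF, and the line terminators LF and CR (JLS 3.4).
    static boolean isJlsWhiteSpace(char c) {
        return c == ' ' || c == '\t' || c == '\f' || c == '\n' || c == '\r';
    }

    public static void main(String[] args) {
        char verticalTab = 0x0B;
        System.out.println(isJlsWhiteSpace('\t'));               // true
        System.out.println(isJlsWhiteSpace(verticalTab));        // false: not listed in JLS 3.6
        System.out.println(Character.isWhitespace(verticalTab)); // true: the JDK method is broader
    }
}
```

So a tool that splits Java source on Character.isWhitespace would accept characters the Java lexer itself treats as errors outside comments and literals.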

3.7. Comments

There are two kinds of comments:

  • /* text */

    A traditional comment: all the text from the ASCII characters /* to the ASCII characters */ is ignored (as in C and C++).

  • // text

    An end-of-line comment: all the text from the ASCII characters // to the end of the line is ignored (as in C++).

Comment:

TraditionalComment
EndOfLineComment

TraditionalComment:

/ * CommentTail

CommentTail:

* CommentTailStar
NotStar CommentTail

CommentTailStar:

/
* CommentTailStar
NotStarNotSlash CommentTail

NotStar:

InputCharacter but not *
LineTerminator

NotStarNotSlash:

InputCharacter but not * or /
LineTerminator

EndOfLineComment:

// {InputCharacter}

These productions imply all of the following properties:

  • Comments do not nest.
  • /* and */ have no special meaning in comments that begin with //.
  • // has no special meaning in comments that begin with /* or /**.

As a result, the following text is a single complete comment:

/* this comment /* // /** ends here: */

The lexical grammar implies that comments do not occur within character literals, string literals, or text blocks (§3.10.4, §3.10.5, §3.10.6).
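To make these properties concrete, here is a small comment stripper. It is a sketch under simplifying assumptions (no Unicode escapes, no text blocks, and a hypothetical class name CommentStripper), not the tokenizer of any real compiler. It shows that the first */ always closes a traditional comment (so comments do not nest), that // inside a traditional comment is just comment text, and that comment markers inside string or character literals are left alone:

```java
class CommentStripper {
    // Removes traditional comments (/* ... */) and end-of-line comments (// ...).
    // Simplifying assumptions: the input contains no Unicode escapes and no text blocks.
    static String strip(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == '/' && i + 1 < n && src.charAt(i + 1) == '*') {
                // The first */ after the opening /* ends the comment, so comments never nest
                // and any //, /* or /** inside is ordinary comment text.
                int end = src.indexOf("*/", i + 2);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated comment");
                }
                out.append(' ');        // keep the tokens on either side separated
                i = end + 2;
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                // Skip up to the line terminator, which itself is kept.
                while (i < n && src.charAt(i) != '\n' && src.charAt(i) != '\r') {
                    i++;
                }
            } else if (c == '"' || c == '\'') {
                // Copy a string or character literal verbatim, so comment markers inside survive.
                int j = i + 1;
                while (j < n && src.charAt(j) != c) {
                    if (src.charAt(j) == '\\') {
                        j++;            // skip the character after a backslash
                    }
                    j++;
                }
                j = Math.min(j + 1, n); // include the closing quote, if present
                out.append(src, i, j);
                i = j;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("int x = 1; /* a /* b */ int y = 2; // tail */"));
        System.out.println(strip("String s = \"/* kept */\";"));
    }
}
```

Replacing a traditional comment with a single space mirrors the fact that comments, like white space, can separate tokens (§3.5).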

Summary

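  • There are two kinds of comments: multi-line (traditional) comments and single-line (end-of-line) comments
    • Multi-line comment: all text from /* up to the closing */ is a comment, and it may span several lines
    • Single-line comment: all text after // is a comment, ending at the line terminator
  • Comment rules
    • Comments do not nest
    • /* and */ appearing inside a single-line comment are treated as ordinary text
    • // appearing inside a multi-line comment is treated as ordinary text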

3.8. Identifiers

An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter.

Identifier:

IdentifierChars but not a ReservedKeyword or BooleanLiteral or NullLiteral

IdentifierChars:

JavaLetter {JavaLetterOrDigit}

JavaLetter:

any Unicode character that is a "Java letter"

JavaLetterOrDigit:

any Unicode character that is a "Java letter-or-digit"

A "Java letter" is a character for which the method Character.isJavaIdentifierStart(int) returns true.

A "Java letter-or-digit" is a character for which the method Character.isJavaIdentifierPart(int) returns true.

The "Java letters" include uppercase and lowercase ASCII Latin letters A-Z (\u0041-\u005a), and a-z (\u0061-\u007a), and, for historical reasons, the ASCII dollar sign ($, or \u0024) and underscore (_, or \u005f). The dollar sign should be used only in mechanically generated source code or, rarely, to access pre-existing names on legacy systems. The underscore may be used in identifiers formed of two or more characters, but it cannot be used as a one-character identifier due to being a keyword.

The "Java digits" include the ASCII digits 0-9 (\u0030-\u0039).

Letters and digits may be drawn from the entire Unicode character set, which supports most writing scripts in use in the world today, including the large sets for Chinese, Japanese, and Korean. This allows programmers to use identifiers in their programs that are written in their native languages.

Two identifiers are the same only if, after ignoring characters that are ignorable, the identifiers have the same Unicode character for each letter or digit. An ignorable character is a character for which the method Character.isIdentifierIgnorable(int) returns true. Identifiers that have the same external appearance may yet be different.

For example, the identifiers consisting of the single letters LATIN CAPITAL LETTER A (A, \u0041), LATIN SMALL LETTER A (a, \u0061), GREEK CAPITAL LETTER ALPHA (Α, \u0391), CYRILLIC SMALL LETTER A (а, \u0430) and MATHEMATICAL BOLD ITALIC SMALL A (𝒂, \ud835\udc82) are all different.

Unicode composite characters are different from their canonical equivalent decomposed characters. For example, a LATIN CAPITAL LETTER A ACUTE (Á, \u00c1) is different from a LATIN CAPITAL LETTER A (A, \u0041) immediately followed by a NON-SPACING ACUTE (´, \u0301) in identifiers. See The Unicode Standard, Section 3.11 "Normalization Forms".
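A sketch of this identity rule using only JDK methods (Character.isIdentifierIgnorable and java.text.Normalizer). The class and method names are hypothetical. The sketch drops ignorable characters and then compares code points exactly, without any Unicode normalization:

```java
import java.text.Normalizer;

class IdentifierEquality {
    // Removes every code point for which Character.isIdentifierIgnorable returns true.
    static String withoutIgnorables(String id) {
        StringBuilder sb = new StringBuilder();
        id.codePoints()
          .filter(cp -> !Character.isIdentifierIgnorable(cp))
          .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    // Same identifier: identical code points once ignorable characters are dropped.
    static boolean sameIdentifier(String a, String b) {
        return withoutIgnorables(a).equals(withoutIgnorables(b));
    }

    public static void main(String[] args) {
        System.out.println(sameIdentifier("A", "\u0391"));       // false: Latin A vs Greek Alpha
        System.out.println(sameIdentifier("\u00c1", "A\u0301")); // false: composed vs decomposed
        System.out.println(sameIdentifier("ab", "a\u00adb"));    // true: the soft hyphen is ignorable
        // NFC normalization would merge the composed and decomposed forms,
        // but identifier comparison does not normalize.
        System.out.println(Normalizer.normalize("A\u0301", Normalizer.Form.NFC).equals("\u00c1")); // true
    }
}
```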

Examples of identifiers are:

  • String
  • i3
  • αρετη
  • MAX_VALUE
  • isLetterOrDigit

An identifier never has the same spelling (Unicode character sequence) as a reserved keyword (§3.9), a boolean literal (§3.10.3) or the null literal (§3.10.8), due to the rules of tokenization (§3.5). However, an identifier may have the same spelling as a contextual keyword, because the tokenization of a sequence of input characters as an identifier or a contextual keyword depends on where the sequence appears in the program.
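The purely lexical part of these rules can be approximated with Character.isJavaIdentifierStart(int) and Character.isJavaIdentifierPart(int), plus a lookup against the reserved keywords of §3.9 and the literals true, false, and null. This is a sketch with a hypothetical class name. It deliberately ignores the context-dependent subsets such as TypeIdentifier, so a contextual keyword like var is accepted:

```java
import java.util.Set;

class IdentifierCheck {
    // Reserved keywords from JLS 3.9, plus the boolean literals and the null literal.
    private static final Set<String> NOT_ALLOWED = Set.of(
        "abstract", "continue", "for", "new", "switch",
        "assert", "default", "if", "package", "synchronized",
        "boolean", "do", "goto", "private", "this",
        "break", "double", "implements", "protected", "throw",
        "byte", "else", "import", "public", "throws",
        "case", "enum", "instanceof", "return", "transient",
        "catch", "extends", "int", "short", "try",
        "char", "final", "interface", "static", "void",
        "class", "finally", "long", "strictfp", "volatile",
        "const", "float", "native", "super", "while",
        "_", "true", "false", "null");

    // IdentifierChars: a Java letter followed by any number of Java letters-or-digits.
    static boolean isIdentifierChars(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.codePointAt(0))) {
            return false;
        }
        return s.codePoints().skip(1).allMatch(Character::isJavaIdentifierPart);
    }

    // Identifier: IdentifierChars but not a ReservedKeyword, BooleanLiteral, or NullLiteral.
    static boolean isIdentifier(String s) {
        return isIdentifierChars(s) && !NOT_ALLOWED.contains(s);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"MAX_VALUE", "i3", "3i", "_", "__", "goto", "null", "var"}) {
            System.out.println(s + " -> " + isIdentifier(s));
        }
    }
}
```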

To facilitate the recognition of contextual keywords, the syntactic grammar (§2.3) sometimes disallows certain identifiers by defining a production to accept only a subset of identifiers. The subsets are as follows:

TypeIdentifier:

Identifier but not permits, record, sealed, var, or yield

UnqualifiedMethodIdentifier:

Identifier but not yield

TypeIdentifier is used in the declaration of classes, interfaces, and type parameters (§8.1, §9.1, §4.4), and when referring to types (§6.5). For example, the name of a class must be a TypeIdentifier, so it is illegal to declare a class named permits, record, sealed, var, or yield.

UnqualifiedMethodIdentifier is used when a method invocation expression refers to a method by its simple name (§6.5.7.1). Since the term yield is excluded from UnqualifiedMethodIdentifier, any invocation of a method named yield must be qualified, thus distinguishing the invocation from a yield statement (§14.21).

Summary

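  • An identifier is an unlimited-length sequence of Java letters and Java digits whose first character is a Java letter

  • An identifier cannot have the same spelling as a reserved keyword, a boolean literal, or the null literal

  • An identifier may have the same spelling as a contextual keyword; the grammar defines subsets of identifiers (TypeIdentifier, UnqualifiedMethodIdentifier) to exclude names that must not appear in particular contexts

  • When a method's name clashes with a contextual keyword (yield), invocations of that method must be qualified so they are not mistaken for the corresponding statement

  • The ASCII Java letters are a-z, A-Z, $ (intended for mechanically generated code), and _ (which is a keyword, so it cannot be used on its own as an identifier)

  • The Java digits are 0-9

  • Identifiers may also be written with Unicode characters (letters that look alike are not necessarily the same letter; only identical Unicode code points make them the same)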

3.9. Keywords

51 character sequences, formed from ASCII characters, are reserved for use as keywords and cannot be used as identifiers (§3.8). Another 17 character sequences, also formed from ASCII characters, may be interpreted as keywords or as other tokens, depending on the context in which they appear.

Keyword:

ReservedKeyword
ContextualKeyword

ReservedKeyword:

(one of)

abstract   continue   for          new         switch
assert     default    if           package     synchronized
boolean    do         goto         private     this
break      double     implements   protected   throw
byte       else       import       public      throws
case       enum       instanceof   return      transient
catch      extends    int          short       try
char       final      interface    static      void
class      finally    long         strictfp    volatile
const      float      native       super       while
_ (underscore)

ContextualKeyword:

(one of)

exports      opens      requires     uses   yield
module       permits    sealed       var         
non-sealed   provides   to           when        
open         record     transitive   with        

The keywords const and goto are reserved, even though they are not currently used. This may allow a Java compiler to produce better error messages if these C++ keywords incorrectly appear in programs.

The keyword strictfp is obsolete and should not be used in new code.

The keyword _ (underscore) may be used in certain declarations in place of an identifier (§6.1).

true and false are not keywords, but rather boolean literals (§3.10.3).

null is not a keyword, but rather the null literal (§3.10.8).

During the reduction of input characters to input elements (§3.5), a sequence of input characters that notionally matches a contextual keyword is reduced to a contextual keyword if and only if both of the following conditions hold:

  1. The sequence is recognized as a terminal specified in a suitable context of the syntactic grammar (§2.3), as follows:

    • For module and open, when recognized as a terminal in a ModuleDeclaration (§7.7).

    • For exports, opens, provides, requires, to, uses, and with, when recognized as a terminal in a ModuleDirective.

    • For transitive, when recognized as a terminal in a RequiresModifier.

      For example, recognizing the sequence requires transitive ; does not make use of RequiresModifier, so the term transitive is reduced here to an identifier and not a contextual keyword.

    • For var, when recognized as a terminal in a LocalVariableType (§14.4) or a LambdaParameterType (§15.27.1).

      In other contexts, attempting to use var as an identifier will cause an error, because var is not a TypeIdentifier (§3.8).

    • For yield, when recognized as a terminal in a YieldStatement (§14.21).

      In other contexts, attempting to use yield as an identifier will cause an error, because yield is neither a TypeIdentifier nor an UnqualifiedMethodIdentifier.

    • For record, when recognized as a terminal in a RecordDeclaration (§8.10).

    • For non-sealed, permits, and sealed, when recognized as a terminal in a NormalClassDeclaration (§8.1) or a NormalInterfaceDeclaration (§9.1).

    • For when, when recognized as a terminal in a Guard (§14.11.1).

  2. The sequence is not immediately preceded or immediately followed by an input character that matches JavaLetterOrDigit.

In general, accidentally omitting white space in source code will cause a sequence of input characters to be tokenized as an identifier, due to the "longest possible translation" rule (§3.2). For example, the sequence of twelve input characters p u b l i c s t a t i c is always tokenized as the identifier publicstatic, rather than as the reserved keywords public and static. If two tokens are intended, they must be separated by white space or a comment.

The rule above works in tandem with the "longest possible translation" rule to produce an intuitive result in contexts where contextual keywords may appear. For example, the sequence of eleven input characters v a r f i l e n a m e is usually tokenized as the identifier varfilename, but in a local variable declaration, the first three input characters are tentatively recognized as the contextual keyword var by the first condition of the rule above. However, it would be confusing to overlook the lack of white space in the sequence by recognizing the next eight input characters as the identifier filename. (This would mean that the sequence undergoes different tokenization in different contexts: an identifier in most contexts, but a contextual keyword and an identifier in local variable declarations.) Accordingly, the second condition prevents recognition of the contextual keyword var on the grounds that the immediately following input character f is a JavaLetterOrDigit. The sequence v a r f i l e n a m e is therefore tokenized as the identifier varfilename in a local variable declaration.

As another example of the careful recognition of contextual keywords, consider the sequence of 15 input characters n o n - s e a l e d c l a s s. This sequence is usually translated to three tokens (the identifier non, the operator -, and the identifier sealedclass), but in a normal class declaration, where the first condition holds, the first ten input characters are tentatively recognized as the contextual keyword non-sealed. To avoid translating the sequence to two keyword tokens (non-sealed and class) rather than three non-keyword tokens, and to avoid rewarding the programmer for omitting white space before class, the second condition prevents recognition of the contextual keyword. The sequence n o n - s e a l e d c l a s s is therefore tokenized as three tokens in a class declaration.
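The same rules can be observed in ordinary code. In this sketch (hypothetical class name), publicstatic is a single identifier; non-sealed inside an expression is the three tokens non, -, and sealed, because no class declaration is involved; and var is recognized as the contextual keyword only because white space separates it from the variable name:

```java
class TokenizationDemo {
    static int demo() {
        int publicstatic = 1;    // one identifier, not the keywords public and static
        int non = 10;
        int sealed = 4;
        int diff = non-sealed;   // not a class declaration: identifier non, operator -, identifier sealed
        var varfilename = diff;  // var followed by white space is the contextual keyword;
                                 // varfilename is a single identifier
        return varfilename + publicstatic;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```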

In the rule above, the first condition depends on details of the syntactic grammar, but a compiler for the Java programming language can implement the rule without fully parsing the input program. For example, a heuristic could be used to track the contextual state of the tokenizer, as long as the heuristic guarantees that valid uses of contextual keywords are tokenized as keywords, and valid uses of identifiers are tokenized as identifiers. Alternatively, a compiler could always tokenize a contextual keyword as an identifier, leaving it to a later phase to recognize special uses of these identifiers.

Summary

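  • Keyword categories:

    • Reserved keywords (51), which cannot be used as identifiers
    • Contextual keywords (17), which can be used as identifiers in most places; whether they act as keywords depends on the context
  • true, false, and null are literals, not keywords

  • A character sequence is tokenized as a contextual keyword only when both of the following hold:

    • The sequence appears in a specific syntactic context
    • The characters immediately before and after the sequence are not JavaLetterOrDigit
  • Within such a context, the second condition stops the tokenizer from splitting what should be a single identifier into a contextual keyword followed by an ordinary identifier

  • A Java compiler can use heuristics to tokenize contextual keywords correctly without fully parsing the program. Alternatively, it can first treat contextual keywords as ordinary identifiers and decide in a later phase whether they have a special meaning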

3.10. Literals

A literal is the source code representation of a value of a primitive type (§4.2), the String type (§4.3.3), or the null type (§4.1).

Literal:

IntegerLiteral
FloatingPointLiteral
BooleanLiteral
CharacterLiteral
StringLiteral
TextBlock
NullLiteral

3.10.1. Integer Literals

An integer literal may be expressed in decimal (base 10), hexadecimal (base 16), octal (base 8), or binary (base 2).

IntegerLiteral:

DecimalIntegerLiteral
HexIntegerLiteral
OctalIntegerLiteral
BinaryIntegerLiteral

DecimalIntegerLiteral:

DecimalNumeral [IntegerTypeSuffix]

HexIntegerLiteral:

HexNumeral [IntegerTypeSuffix]

OctalIntegerLiteral:

OctalNumeral [IntegerTypeSuffix]

BinaryIntegerLiteral:

BinaryNumeral [IntegerTypeSuffix]

IntegerTypeSuffix:

(one of) l L

An integer literal is of type long if it is suffixed with an ASCII letter L or l (ell); otherwise it is of type int (§4.2.1).

The suffix L is preferred, because the letter l (ell) is often hard to distinguish from the digit 1 (one).
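The type fixed by the suffix matters before any widening happens. A small sketch (hypothetical names): without the suffix, the addition is performed in int arithmetic and overflows before the result is widened to long:

```java
class LongSuffixDemo {
    // 2147483647 is an int literal, so the addition overflows in int arithmetic
    // and only the already wrapped result is widened to long.
    static long withoutSuffix() {
        return 2147483647 + 1;
    }

    // The L suffix makes the left operand a long, so the addition is done in long arithmetic.
    static long withSuffix() {
        return 2147483647L + 1;
    }

    public static void main(String[] args) {
        System.out.println(withoutSuffix() + " " + withSuffix());
    }
}
```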

Underscores are allowed as separators between digits that denote the integer.

In a hexadecimal or binary literal, the integer is only denoted by the digits after the 0x or 0b characters and before any type suffix. Therefore, underscores may not appear immediately after 0x or 0b, or after the last digit in the numeral.

In a decimal or octal literal, the integer is denoted by all the digits in the literal before any type suffix. Therefore, underscores may not appear before the first digit or after the last digit in the numeral. Underscores may appear after the initial 0 in an octal numeral (since 0 is a digit that denotes part of the integer) and after the initial non-zero digit in a non-zero decimal literal.
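Some valid and invalid underscore placements, following the rules above. The class name is hypothetical, and the invalid lines are commented out so that the class compiles:

```java
class UnderscoreLiterals {
    static final int MILLION = 1_000_000;
    static final int HEX = 0xFF_FF;                  // between hexadecimal digits
    static final int OCTAL = 0_17;                   // right after the leading 0 of an octal numeral
    static final int BINARY = 0b1010_1010;
    static final long CARD = 1234_5678_9012_3456L;

    // Each of the following would fail to compile:
    // int a = _100;    // _100 is tokenized as an identifier, not as a literal
    // int b = 100_;    // underscore after the last digit
    // int c = 0x_FF;   // underscore immediately after 0x
    // long d = 100_L;  // underscore before the type suffix
}
```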

A decimal numeral is either the single ASCII digit 0, representing the integer zero, or consists of an ASCII digit from 1 to 9 optionally followed by one or more ASCII digits from 0 to 9 interspersed with underscores, representing a positive integer.

DecimalNumeral:

0
NonZeroDigit [Digits]
NonZeroDigit Underscores Digits

NonZeroDigit:

(one of) 1 2 3 4 5 6 7 8 9

Digits:

Digit
Digit [DigitsAndUnderscores] Digit

Digit:

0
NonZeroDigit

DigitsAndUnderscores:

DigitOrUnderscore {DigitOrUnderscore}

DigitOrUnderscore:

Digit
_

Underscores:

_ {_}

A hexadecimal numeral consists of the leading ASCII characters 0x or 0X followed by one or more ASCII hexadecimal digits interspersed with underscores, and can represent a positive, zero, or negative integer.

Hexadecimal digits with values 10 through 15 are represented by the ASCII letters a through f or A through F, respectively; each letter used as a hexadecimal digit may be uppercase or lowercase.

HexNumeral:

0 x HexDigits
0 X HexDigits

HexDigits:

HexDigit
HexDigit [HexDigitsAndUnderscores] HexDigit

HexDigit:

(one of) 0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F

HexDigitsAndUnderscores:

HexDigitOrUnderscore {HexDigitOrUnderscore}

HexDigitOrUnderscore:

HexDigit
_

The HexDigit production above comes from §3.3.

An octal numeral consists of an ASCII digit 0 followed by one or more of the ASCII digits 0 through 7 interspersed with underscores, and can represent a positive, zero, or negative integer.

OctalNumeral:

0 OctalDigits
0 Underscores OctalDigits

OctalDigits:

OctalDigit
OctalDigit [OctalDigitsAndUnderscores] OctalDigit

OctalDigit:

(one of) 0 1 2 3 4 5 6 7

OctalDigitsAndUnderscores:

OctalDigitOrUnderscore {OctalDigitOrUnderscore}

OctalDigitOrUnderscore:

OctalDigit
_

Note that octal numerals always consist of two or more digits, as 0 alone is always considered to be a decimal numeral; not that it matters much in practice, for the numerals 0, 00, and 0x0 all represent exactly the same integer value.
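Because a leading 0 selects octal, a literal such as 010 is a common pitfall. A short sketch (hypothetical class name; the invalid line is commented out):

```java
class OctalLiterals {
    static final int DECIMAL_TEN = 10;
    static final int OCTAL_TEN = 010;   // the leading 0 makes this octal, so its value is 8
    static final int ZERO_DECIMAL = 0;  // a lone 0 is a decimal numeral
    static final int ZERO_OCTAL = 00;
    static final int ZERO_HEX = 0x0;
    // static final int BAD = 09;       // compile-time error: 9 is not an octal digit
}
```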

A binary numeral consists of the leading ASCII characters 0b or 0B followed by one or more of the ASCII digits 0 or 1 interspersed with underscores, and can represent a positive, zero, or negative integer.

BinaryNumeral:

0 b BinaryDigits
0 B BinaryDigits

BinaryDigits:

BinaryDigit
BinaryDigit [BinaryDigitsAndUnderscores] BinaryDigit

BinaryDigit:

(one of) 0 1

BinaryDigitsAndUnderscores:

BinaryDigitOrUnderscore {BinaryDigitOrUnderscore}

BinaryDigitOrUnderscore:

BinaryDigit
_

The largest decimal literal of type int is 2147483648 (2^31).

All decimal literals from 0 to 2147483647 may appear anywhere an int literal may appear. The decimal literal 2147483648 may appear only as the operand of the unary minus operator - (§15.15.4).

It is a compile-time error if the decimal literal 2147483648 appears anywhere other than as the operand of the unary minus operator; or if a decimal literal of type int is larger than 2147483648 (2^31).

The largest positive hexadecimal, octal, and binary literals of type int, each of which represents the decimal value 2147483647 (2^31-1), are respectively:

  • 0x7fff_ffff,
  • 0177_7777_7777, and
  • 0b0111_1111_1111_1111_1111_1111_1111_1111

The most negative hexadecimal, octal, and binary literals of type int, each of which represents the decimal value -2147483648 (-2^31), are respectively:

  • 0x8000_0000,
  • 0200_0000_0000, and
  • 0b1000_0000_0000_0000_0000_0000_0000_0000

The following hexadecimal, octal, and binary literals represent the decimal value -1:

  • 0xffff_ffff,
  • 0377_7777_7777, and
  • 0b1111_1111_1111_1111_1111_1111_1111_1111

It is a compile-time error if a hexadecimal, octal, or binary int literal does not fit in 32 bits.
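The boundary cases above, written as constants so they can be checked against Integer.MIN_VALUE and Integer.MAX_VALUE. The class name is hypothetical, and the commented-out lines are the compile-time errors just described:

```java
class IntLiteralRange {
    static final int MIN_DECIMAL = -2147483648;   // 2147483648 is allowed only as the operand of unary minus
    static final int MAX_HEX = 0x7fff_ffff;
    static final int MAX_OCTAL = 0177_7777_7777;
    static final int MIN_HEX = 0x8000_0000;       // no minus sign: the 32-bit pattern is already negative
    static final int MINUS_ONE_OCTAL = 0377_7777_7777;
    static final int MINUS_ONE_BINARY = 0b1111_1111_1111_1111_1111_1111_1111_1111;

    // static final int TOO_BIG = 2147483648;     // compile-time error: not the operand of unary minus
    // static final int TOO_WIDE = 0x1_0000_0000; // compile-time error: does not fit in 32 bits
}
```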

The largest decimal literal of type long is 9223372036854775808L (2^63).

All decimal literals from 0L to 9223372036854775807L may appear anywhere a long literal may appear. The decimal literal 9223372036854775808L may appear only as the operand of the unary minus operator - (§15.15.4).

It is a compile-time error if the decimal literal 9223372036854775808L appears anywhere other than as the operand of the unary minus operator; or if a decimal literal of type long is larger than 9223372036854775808L (2^63).

The largest positive hexadecimal, octal, and binary literals of type long, each of which represents the decimal value 9223372036854775807L (2^63-1), are respectively:

  • 0x7fff_ffff_ffff_ffffL,
  • 07_7777_7777_7777_7777_7777L, and
  • 0b0111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111L

The most negative hexadecimal, octal, and binary literals of type long, each of which represents the decimal value -9223372036854775808L (-2^63), are respectively:

  • 0x8000_0000_0000_0000L,
  • 010_0000_0000_0000_0000_0000L, and
  • 0b1000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000L

The following hexadecimal, octal, and binary literals represent the decimal value -1L:

  • 0xffff_ffff_ffff_ffffL,
  • 017_7777_7777_7777_7777_7777L, and
  • 0b1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111L

It is a compile-time error if a hexadecimal, octal, or binary long literal does not fit in 64 bits.
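The corresponding long cases, checked against Long.MIN_VALUE and Long.MAX_VALUE (hypothetical class name). A hexadecimal literal whose value needs more than 32 bits must carry the L suffix:

```java
class LongLiteralRange {
    static final long MIN_DECIMAL = -9223372036854775808L;  // allowed only after unary minus
    static final long MAX_HEX = 0x7fff_ffff_ffff_ffffL;
    static final long MAX_OCTAL = 07_7777_7777_7777_7777_7777L;
    static final long MIN_HEX = 0x8000_0000_0000_0000L;
    static final long MINUS_ONE_HEX = 0xffff_ffff_ffff_ffffL;
    static final long TWO_TO_THE_32 = 0x1_0000_0000L;       // without L this would not fit in an int

    // static final long TOO_WIDE = 0x1_0000_0000_0000_0000L; // compile-time error: more than 64 bits
}
```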

Examples of int literals:

0    2    0372    0xDada_Cafe    1996    0x00_FF__00_FF

Examples of long literals:

0l    0777L    0x100000000L    2_147_483_648L    0xC0B0L

Summary

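  • Integer literals can be written in decimal, hexadecimal (0x, 0X), octal (leading 0), or binary (0b, 0B)
  • Underscores (_) may separate digits
  • An l or L suffix makes the literal a long; without it the literal is an int
  • A literal's value must not exceed the range of int or long (otherwise it is a compile-time error)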

3.10.2. Floating-Point Literals

A floating-point literal has the following parts: a whole-number part, a decimal or hexadecimal point (represented by an ASCII period character), a fraction part, an exponent, and a type suffix.

A floating-point literal may be expressed in decimal (base 10) or hexadecimal (base 16).

For decimal floating-point literals, at least one digit (in either the whole number or the fraction part) and either a decimal point, an exponent, or a float type suffix are required. All other parts are optional. The exponent, if present, is indicated by the ASCII letter e or E followed by an optionally signed integer.

For hexadecimal floating-point literals, at least one digit is required (in either the whole number or the fraction part), and the exponent is mandatory, and the float type suffix is optional. The exponent is indicated by the ASCII letter p or P followed by an optionally signed integer.

Underscores are allowed as separators between digits that denote the whole-number part, and between digits that denote the fraction part, and between digits that denote the exponent.

FloatingPointLiteral:

DecimalFloatingPointLiteral
HexadecimalFloatingPointLiteral

DecimalFloatingPointLiteral:

Digits . [Digits] [ExponentPart] [FloatTypeSuffix]
. Digits [ExponentPart] [FloatTypeSuffix]
Digits ExponentPart [FloatTypeSuffix]
Digits [ExponentPart] FloatTypeSuffix

ExponentPart:

ExponentIndicator SignedInteger

ExponentIndicator:

(one of) e E

SignedInteger:

[Sign] Digits

Sign:

(one of) + -

FloatTypeSuffix:

(one of) f F d D

HexadecimalFloatingPointLiteral:

HexSignificand BinaryExponent [FloatTypeSuffix]

HexSignificand:

HexNumeral [.]
0 x [HexDigits] . HexDigits
0 X [HexDigits] . HexDigits

BinaryExponent:

BinaryExponentIndicator SignedInteger

BinaryExponentIndicator:

(one of) p P

A floating-point literal is of type float if it is suffixed with an ASCII letter F or f; otherwise its type is double and it can optionally be suffixed with an ASCII letter D or d.
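The optional parts combine in several ways. This sketch (hypothetical class name) gives one example of each decimal shape in the grammar above, plus hexadecimal significands with a binary exponent. The commented-out lines would not compile:

```java
class FloatLiteralForms {
    static final double POINT_AND_FRACTION = 3.14;    // Digits . Digits
    static final double WHOLE_AND_POINT = 2.;         // Digits . with no fraction digits
    static final double FRACTION_ONLY = .5;           // . Digits
    static final double EXPONENT_ONLY = 1e3;          // Digits ExponentPart
    static final float SUFFIX_ONLY = 3f;              // Digits FloatTypeSuffix: a float
    static final double EXPLICIT_DOUBLE = 3d;         // d marks a double explicitly
    static final double HEX_WHOLE = 0x1p3;            // 1 times 2^3
    static final double HEX_FRACTION = 0x1.8p1;       // 0x1.8 is 1.5, times 2^1
    static final double WITH_UNDERSCORES = 1_000.000_5;

    // static final double BAD_HEX = 0x1.8;           // compile-time error: the p exponent is mandatory
    // static final float BAD_TYPE = 1.5;             // compile-time error: 1.5 is a double, not a float
}
```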

The elements of the types float and double are those values that can be represented using the IEEE 754 binary32 and IEEE 754 binary64 floating-point formats, respectively (§4.2.3).

The details of proper input conversion from a Unicode string representation of a floating-point number to the internal IEEE 754 binary floating-point representation are described for the methods valueOf of class Float and class Double of the package java.lang.

The largest and smallest positive literals of type float are as follows:

  • The largest positive finite float value is numerically equal to (2 - 2^-23) ⋅ 2^127.

    The shortest decimal literal which rounds to this value is 3.4028235e38f.

    A hexadecimal literal for this value is 0x1.fffffeP+127f.

  • The smallest positive finite non-zero float value is numerically equal to 2^-149.

    The shortest decimal literal which rounds to this value is 1.4e-45f.

    Two hexadecimal literals for this value are 0x0.000002P-126f and 0x1.0P-149f.

The largest and smallest positive literals of type double are as follows:

  • The largest positive finite double value is numerically equal to (2 - 2^-52) ⋅ 2^1023.

    The shortest decimal literal which rounds to this value is 1.7976931348623157e308.

    A hexadecimal literal for this value is 0x1.f_ffff_ffff_ffffP+1023.

  • The smallest positive finite non-zero double value is numerically equal to 2^-1074.

    The shortest decimal literal which rounds to this value is 4.9e-324.

    Two hexadecimal literals for this value are 0x0.0_0000_0000_0001P-1022 and 0x1.0P-1074.

It is a compile-time error if a non-zero floating-point literal is too large, so that on rounded conversion to its internal representation, it becomes an IEEE 754 infinity.

A program can represent infinities without producing a compile-time error by using constant expressions such as 1f/0f or -1d/0d or by using the predefined constants POSITIVE_INFINITY and NEGATIVE_INFINITY of the classes Float and Double.

It is a compile-time error if a non-zero floating-point literal is too small, so that, on rounded conversion to its internal representation, it becomes a zero.

A compile-time error does not occur if a non-zero floating-point literal has a small value that, on rounded conversion to its internal representation, becomes a non-zero subnormal number.

Predefined constants representing Not-a-Number values are defined in the classes Float and Double as Float.NaN and Double.NaN.

Examples of float literals:

1e1f    2.f    .3f    0f    3.14f    6.022137e+23f

Examples of double literals:

1e1    2.    .3    0.0    3.14    1e-9d    1e137

Summary

  • 浮点字面量有十进制和十六进制(0x、0X)表示形式
  • 指数部分十进制(e,E)、十六进制(p、P)
  • 默认是double,后接f、F为float,后接d、D为double
  • 非零浮点文字太大,则为编译时错误,因此在四舍五入转换为其内部表示形式时,它将成为 IEEE 754 无穷大
  • 使用 1f/0f-1d/0d 等常量表达式或使用 FloatDouble 类的预定义常量 POSITIVE_INFINITYNEGATIVE_INFINITY ,程序可以表示无穷大而不会产生编译时错误
  • 如果非零浮点文字太小,则为编译时错误,因此,在四舍五入转换为其内部表示形式时,它变为零
  • 表示 Not-a-Number 值的预定义常量在类 FloatDouble 中定义为 Float.NaNDouble.NaN

3.10.3. Boolean Literals

The boolean type has two values, represented by the boolean literals true and false, formed from ASCII letters.

BooleanLiteral:

(one of) true false

A boolean literal is always of type boolean (§4.2.5).

Summary

  • The boolean literals are true and false, and they always have type boolean.

3.10.4. Character Literals

A character literal is expressed as a character or an escape sequence (§3.10.7), enclosed in ASCII single quotes. (The single-quote, or apostrophe, character is \u0027.)

CharacterLiteral:

' SingleCharacter '
' EscapeSequence '

SingleCharacter:

InputCharacter but not ' or \

A character literal is always of type char (§4.2.1).

The content of a character literal is the SingleCharacter or the EscapeSequence which follows the opening '.

It is a compile-time error for the character following the content to be other than a '.

It is a compile-time error for a line terminator (§3.4) to appear after the opening ' and before the closing '.

The characters CR and LF are never an InputCharacter; each is recognized as constituting a LineTerminator, so may not appear in a character literal, even in the escape sequence \ LineTerminator.

The character represented by a character literal is the content of the character literal with any escape sequence interpreted, as if by execution of String.translateEscapes on the content.
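`String.translateEscapes` (Java 15 and later) can be called directly to watch this interpretation happen. The sketch below builds a string that still contains the two source characters backslash and `t`, then translates it. The class name is illustrative.

```java
class TranslateEscapesDemo {
    // Two characters, a backslash and a 't': the first pair in the literal is an escaped backslash.
    static final String RAW = "\\t";

    public static void main(String[] args) {
        String translated = RAW.translateEscapes();       // Java 15+
        System.out.println(RAW.length());                  // 2
        System.out.println(translated.length());           // 1
        System.out.println(translated.charAt(0) == '\t');  // true: same char as the literal
        System.out.println("\\101".translateEscapes());    // A: octal escapes are translated too
    }
}
```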

Character literals can only represent UTF-16 code units (§3.1), i.e., they are limited to values from \u0000 to \uffff. Supplementary characters must be represented either as a surrogate pair within a char sequence, or as an integer, depending on the API they are used with.
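Both representations can be seen side by side in the sketch below, which uses U+1F600 as an arbitrary supplementary character. The class name is illustrative. The code point becomes two `char` values (a surrogate pair) in a `String`, while int-based APIs such as `codePointAt` handle it as a single value.

```java
class SupplementaryDemo {
    // U+1F600 lies outside the Basic Multilingual Plane, so no single char can hold it.
    static final int CODE_POINT = 0x1F600;

    public static void main(String[] args) {
        char[] pair = Character.toChars(CODE_POINT);
        System.out.println(pair.length);                        // 2: a surrogate pair
        System.out.println(Character.isHighSurrogate(pair[0])); // true
        System.out.println(Character.isLowSurrogate(pair[1]));  // true

        // The same pair written as two char-sized escapes inside a string literal.
        String viaEscapes = "\uD83D\uDE00";
        String viaApi = new String(pair);
        System.out.println(viaEscapes.equals(viaApi));                 // true
        System.out.println(viaApi.length());                           // 2 UTF-16 code units
        System.out.println(viaApi.codePointCount(0, viaApi.length())); // 1 code point
        System.out.println(viaApi.codePointAt(0) == CODE_POINT);       // true: int-based API
    }
}
```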

The following are examples of char literals:

  • 'a'
  • '%'
  • '\t'
  • '\\'
  • '\''
  • '\u03a9'
  • '\uFFFF'
  • '\177'
  • '™'
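The sketch below collects the numeric value of most of the examples above; a char widens to int in the array initializer. The class name is illustrative. The '™' example is left out so that the source stays ASCII-only and does not depend on the compiler's source encoding.

```java
import java.util.Arrays;

class CharLiteralValues {
    // Numeric value of each example literal (a char widens to int here).
    static int[] values() {
        return new int[] {
            'a',      // 97
            '%',      // 37
            '\t',     // 9, horizontal tab
            '\\',     // 92, backslash
            '\'',     // 39, apostrophe
            '\u03a9', // 937, Greek capital omega
            '\uFFFF', // 65535, the largest char value
            '\177'    // 127, octal escape
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(values()));
        // [97, 37, 9, 92, 39, 937, 65535, 127]
    }
}
```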

Because Unicode escapes are processed very early, it is not correct to write '\u000a' for a character literal whose value is linefeed (LF); the Unicode escape \u000a is transformed into an actual linefeed in translation step 1 (§3.3) and the linefeed becomes a LineTerminator in step 2 (§3.4), so the character literal is not valid in step 3. Instead, one should use the escape sequence '\n'. Similarly, it is not correct to write '\u000d' for a character literal whose value is carriage return (CR). Instead, use '\r'. Finally, it is not possible to write '\u0027' for a character literal containing an apostrophe (').

In C and C++, a character literal may contain representations of more than one character, but the value of such a character literal is implementation-defined. In the Java programming language, a character literal always represents exactly one character.

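Summary

  • A character literal is a single character or an escape sequence (the content) enclosed in single quotes.
  • An apostrophe must itself be escaped inside a character literal: '\''.
  • A character literal holds exactly one UTF-16 code unit, from \u0000 to \uffff. A supplementary character cannot be written as a single char literal; it must be represented as an int code point or as a surrogate pair in a char sequence, depending on the API.
  • A raw LF or CR in the content is a compile-time error; use \n or \r instead. The Unicode escapes \u000a and \u000d do not work either: the first lexical translation step turns them into real LF and CR characters, which are then recognized as line terminators.
  • In Java a character literal always represents exactly one character, unlike in C and C++.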

3.10.5. String Literals

A string literal consists of zero or more characters enclosed in double quotes. Characters such as newlines may be represented by escape sequences (§3.10.7).

StringLiteral:

" {StringCharacter} "

StringCharacter:

InputCharacter but not " or \
EscapeSequence

A string literal is always of type String (§4.3.3).

The content of a string literal is the sequence of characters that begins immediately after the opening " and ends immediately before the matching closing ".

It is a compile-time error for a line terminator (§3.4) to appear after the opening " and before the matching closing ".

The characters CR and LF are never an InputCharacter; each is recognized as constituting a LineTerminator, so may not appear in a string literal, even in the escape sequence \ LineTerminator.

The string represented by a string literal is the content of the string literal with every escape sequence interpreted, as if by execution of String.translateEscapes on the content.

The following are examples of string literals:

""                    // the empty string
"\""                  // a string containing " alone
"This is a string"    // a string containing 16 characters
"This is a " +        // actually a string-valued constant expression,
    "two-line string"    // formed from two string literals

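The comments in these examples can be checked directly. The sketch below (illustrative class name) measures the literals. It also confirms that the two-line form is a single constant, folded by the compiler into an interned string equal to the one-line literal.

```java
class StringLiteralExamples {
    // A string-valued constant expression built from two literals.
    static final String TWO_LINE = "This is a " +
            "two-line string";

    public static void main(String[] args) {
        System.out.println("".length());                 // 0
        System.out.println("\"".length());               // 1
        System.out.println("This is a string".length()); // 16
        System.out.println(TWO_LINE);                    // This is a two-line string
        System.out.println(TWO_LINE == "This is a two-line string"); // true: folded and interned
    }
}
```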
Because Unicode escapes are processed very early, it is not correct to write "\u000a" for a string literal containing a single linefeed (LF); the Unicode escape \u000a is transformed into an actual linefeed in translation step 1 (§3.3) and the linefeed becomes a LineTerminator in step 2 (§3.4), so the string literal is not valid in step 3. Instead, one should use the escape sequence "\n". Similarly, it is not correct to write "\u000d" for a string literal containing a single carriage return (CR). Instead, use "\r". Finally, it is not possible to write "\u0022" for a string literal containing a double quotation mark (").

A long string literal can always be broken up into shorter pieces and written as a (possibly parenthesized) expression using the string concatenation operator + (§15.18.1).

At run time, a string literal is a reference to an instance of class String (§4.3.3) that denotes the string represented by the string literal.

Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.29) - are "interned" so as to share unique instances, as if by execution of the method String.intern (§12.5).

Example 3.10.5-1. String Literals

The program consisting of the compilation unit (§7.3):

package testPackage;
class Test {
    public static void main(String[] args) {
        String hello = "Hello", lo = "lo";
        System.out.println(hello == "Hello");
        System.out.println(Other.hello == hello);
        System.out.println(other.Other.hello == hello);
        System.out.println(hello == ("Hel"+"lo"));
        System.out.println(hello == ("Hel"+lo));
        System.out.println(hello == ("Hel"+lo).intern());
    }
}
class Other { static String hello = "Hello"; }

and the compilation unit:

package other;
public class Other { public static String hello = "Hello"; }

produces the output:

true
true
true
true
false
true

This example illustrates six points:

  • String literals in the same class and package represent references to the same String object (§4.3.1).
  • String literals in different classes in the same package represent references to the same String object.
  • String literals in different classes in different packages likewise represent references to the same String object.
  • Strings concatenated from constant expressions (§15.29) are computed at compile time and then treated as if they were literals.
  • Strings computed by concatenation at run time are newly created and therefore distinct.
  • The result of explicitly interning a computed string is the same String object as any pre-existing string literal with the same contents.
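Whether a concatenation counts as a constant expression depends on how its operands are declared. A local variable declared `final` and initialized with a constant is a constant variable (§4.12.4), so concatenating it with a literal is folded at compile time and interned. The non-final `lo` of Example 3.10.5-1 is not. The sketch below contrasts the two cases; the class and method names are illustrative.

```java
class ConstantConcatenation {
    static final String HELLO = "Hello";

    // Declared final with a constant initializer, so loConst is a constant variable.
    static boolean constantVariableFolds() {
        final String loConst = "lo";
        return HELLO == ("Hel" + loConst);   // constant expression: folded and interned
    }

    // Not declared final, so the concatenation happens at run time.
    static boolean runtimeConcatIsNew() {
        String loPlain = "lo";
        return HELLO != ("Hel" + loPlain);   // a newly created, distinct object
    }

    // Explicit interning maps the computed string back to the pooled instance.
    static boolean internRestoresIdentity() {
        String loPlain = "lo";
        return HELLO == ("Hel" + loPlain).intern();
    }

    public static void main(String[] args) {
        System.out.println(constantVariableFolds());  // true
        System.out.println(runtimeConcatIsNew());      // true
        System.out.println(internRestoresIdentity());  // true
    }
}
```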

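Summary

  • A string literal is zero or more characters (the content) enclosed in double quotes; characters such as newline can be written with escape sequences.
  • A raw CR, LF, or double quote in the content is a compile-time error; use the escape sequences \r, \n, and \" instead (and \\ for a backslash).
  • A string literal always has type String, and a long literal can be split into pieces joined with the + operator.
  • String literals with the same content refer to the same String instance wherever they appear (the string pool).
  • Strings that are values of constant expressions are computed at compile time and interned, so references to them compare equal with ==; strings computed by concatenation at run time are new objects and are not interned automatically.
  • String.intern returns the pooled instance with the same contents, adding the string to the pool if it is not already there.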

3.10.6. Text Blocks

A text block consists of zero or more characters enclosed by opening and closing delimiters. Characters may be represented by escape sequences (§3.10.7), but the newline and double quote characters that must be represented with escape sequences in a string literal (§3.10.5) may be represented directly in a text block.

TextBlock:

""" {TextBlockWhiteSpace} LineTerminator {TextBlockCharacter} """

TextBlockWhiteSpace:

WhiteSpace but not LineTerminator

TextBlockCharacter:

InputCharacter but not \
EscapeSequence
LineTerminator

The following productions from §3.3, §3.4, and §3.6 are shown here for convenience:

WhiteSpace:

the ASCII SP character, also known as "space"
the ASCII HT character, also known as "horizontal tab"
the ASCII FF character, also known as "form feed"
LineTerminator

LineTerminator:

the ASCII LF character, also known as "newline"
the ASCII CR character, also known as "return"
the ASCII CR character followed by the ASCII LF character

InputCharacter:

UnicodeInputCharacter but not CR or LF

UnicodeInputCharacter:

UnicodeEscape
RawInputCharacter

UnicodeEscape:

\ UnicodeMarker HexDigit HexDigit HexDigit HexDigit

RawInputCharacter:

any Unicode character

A text block is always of type String (§4.3.3).

The opening delimiter is a sequence that starts with three double quote characters ("""), continues with zero or more space, tab, and form feed characters, and concludes with a line terminator.

The closing delimiter is a sequence of three double quote characters.

The content of a text block is the sequence of characters that begins immediately after the line terminator of the opening delimiter, and ends immediately before the first double quote of the closing delimiter.

Unlike in a string literal (§3.10.5), it is not a compile-time error for a line terminator to appear in the content of a text block.

Example 3.10.6-1. Text Blocks

When multi-line strings are desired, a text block is usually more readable than a concatenation of string literals. For example, compare these alternative representations of a snippet of HTML:

String html = "<html>\n" +
              "    <body>\n" +
              "        <p>Hello, world</p>\n" +
              "    </body>\n" +
              "</html>\n";

String html = """
              <html>
                  <body>
                      <p>Hello, world</p>
                  </body>
              </html>
              """;

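The sketch below checks that the two representations of the HTML snippet denote the same string. It needs Java 15 or later, and the class name is illustrative. The closing delimiter sits at the same indentation as `<html>`, so that column becomes the left margin and the deeper indents survive.

```java
class HtmlTextBlock {
    static final String CONCATENATED = "<html>\n" +
            "    <body>\n" +
            "        <p>Hello, world</p>\n" +
            "    </body>\n" +
            "</html>\n";

    // The closing delimiter's column defines the incidental indentation to strip.
    static final String TEXT_BLOCK = """
            <html>
                <body>
                    <p>Hello, world</p>
                </body>
            </html>
            """;

    public static void main(String[] args) {
        System.out.println(CONCATENATED.equals(TEXT_BLOCK)); // true
    }
}
```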
The following are examples of text blocks:

class Test {
    public static void main(String[] args) {
        // The six characters w i n t e r
        String season = """
                        winter""";

        // The seven characters w i n t e r LF
        String period = """
                        winter
                        """;

        // The ten characters H i , SP " B o b " LF
        String greeting = """
                          Hi, "Bob"
                          """;

        // The eleven characters H i , LF SP " B o b " LF
        String salutation = """
                            Hi,
                             "Bob"
                            """;        

        // The empty string (zero length)
        String empty = """
                       """;      

        // The two characters " LF
        String quote = """
                       "
                       """; 

        // The two characters \ LF
        String backslash = """
                           \\
                           """;  
    }
}
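The character counts in the comments above can be verified mechanically. The sketch below repeats the examples as static fields (Java 15+; illustrative class name) and compares each one with an equivalent string literal.

```java
class TextBlockExamples {
    static final String SEASON = """
            winter""";        // w i n t e r

    static final String PERIOD = """
            winter
            """;              // w i n t e r LF

    static final String GREETING = """
            Hi, "Bob"
            """;              // H i , SP " B o b " LF

    static final String SALUTATION = """
            Hi,
             "Bob"
            """;              // H i , LF SP " B o b " LF

    static final String EMPTY = """
            """;              // zero characters

    static final String QUOTE = """
            "
            """;              // " LF

    static final String BACKSLASH = """
            \\
            """;              // backslash LF

    public static void main(String[] args) {
        System.out.println(SEASON.equals("winter"));              // true
        System.out.println(PERIOD.equals("winter\n"));            // true
        System.out.println(GREETING.equals("Hi, \"Bob\"\n"));     // true
        System.out.println(SALUTATION.equals("Hi,\n \"Bob\"\n")); // true
        System.out.println(EMPTY.isEmpty());                      // true
        System.out.println(QUOTE.equals("\"\n"));                 // true
        System.out.println(BACKSLASH.equals("\\\n"));             // true
    }
}
```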

Using the escape sequences \n and \" to represent a newline character and a double quote character, respectively, is permitted in a text block, though not usually necessary. The exception is where three consecutive double quote characters appear that are not intended to be the closing delimiter """ - in this case, it is necessary to escape at least one of the double quote characters in order to avoid mimicking the closing delimiter.

Example 3.10.6-2. Escape sequences in text blocks

In the following program, the value of the story variable would be less readable if individual double quote characters were escaped:

class Story1 {
    public static void main(String[] args) {
        String story = """
            "When I use a word," Humpty Dumpty said,
            in rather a scornful tone, "it means just what I
            choose it to mean - neither more nor less."
            "The question is," said Alice, "whether you
            can make words mean so many different things."
            "The question is," said Humpty Dumpty,
            "which is to be master - that's all."
        """;
    }
}

If the program is modified to place the closing delimiter on the last line of the content, then an error occurs because the first three consecutive double quote characters on the last line are translated (§3.2) into the closing delimiter """ and thus a stray double quote character remains:

class Story2 {
    public static void main(String[] args) {
        String story = """
            "When I use a word," Humpty Dumpty said,
            in rather a scornful tone, "it means just what I
            choose it to mean - neither more nor less."
            "The question is," said Alice, "whether you
            can make words mean so many different things."
            "The question is," said Humpty Dumpty,
            "which is to be master - that's all."""";  // error
    }
}

The error can be avoided by escaping the final double quote character in the content:

class Story3 {
    public static void main(String[] args) {
        String story = """
            "When I use a word," Humpty Dumpty said,
            in rather a scornful tone, "it means just what I
            choose it to mean - neither more nor less."
            "The question is," said Alice, "whether you
            can make words mean so many different things."
            "The question is," said Humpty Dumpty,
            "which is to be master - that's all.\"""";  // OK
    }
}

If a text block is intended to denote another text block, then it is recommended to escape the first double quote character of the embedded opening and closing delimiters:

class Code {
    public static void main(String[] args) {
        String text = """
            The quick brown fox jumps over the lazy dog
        """;

        String code =
            """
            String text = \"""
                The quick brown fox jumps over the lazy dog
            \""";
            """;
    }
}

The string represented by a text block is not the literal sequence of characters in the content. Instead, the string represented by a text block is the result of applying the following transformations to the content, in order:

  1. Line terminators are normalized to the ASCII LF character, as follows:
    • An ASCII CR character followed by an ASCII LF character is translated to an ASCII LF character.
    • An ASCII CR character is translated to an ASCII LF character.
  2. Incidental white space is removed, as if by execution of String.stripIndent on the characters resulting from step 1.
  3. Escape sequences are interpreted, as if by execution of String.translateEscapes on the characters resulting from step 2.
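The three steps can be replayed by hand with the String methods they name (Java 15+). In the sketch below, the class name and the sample `RAW` content are hypothetical. `RAW` stands for raw text-block content with CR LF line endings, four columns of incidental indentation, and an escape written as backslash plus `t`. Running it through the pipeline gives the same string as the text block written in source.

```java
class TextBlockPipeline {
    // Applies the three transformations, in order, to raw content.
    static String process(String rawContent) {
        // Step 1: normalize CR LF, then any lone CR, to LF.
        String step1 = rawContent.replace("\r\n", "\n").replace('\r', '\n');
        // Step 2: remove incidental white space (the last, blank line counts toward the margin).
        String step2 = step1.stripIndent();
        // Step 3: interpret escape sequences; the backslash-t only becomes a tab here.
        return step2.translateEscapes();
    }

    // Hypothetical raw content, as it would sit between the delimiters.
    static final String RAW = "    Hi,\r\n      \\tthere\r\n    ";

    static final String TEXT_BLOCK = """
            Hi,
              \tthere
            """;

    public static void main(String[] args) {
        System.out.println(process(RAW).equals(TEXT_BLOCK));         // true
        System.out.println(process(RAW).equals("Hi,\n  \tthere\n")); // true
    }
}
```

Because escapes are interpreted last, the tab produced by backslash-t is not treated as indentation in step 2.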

When this specification says that a text block contains a particular character or sequence of characters, or that a particular character or sequence of characters is in a text block, it means that the string represented by the text block (as opposed to the literal sequence of characters in the content) contains the character or sequence of characters.

Example 3.10.6-3. Order of transformations on text block content

Interpreting escape sequences last allows programmers to use \n, \f, and \r for vertical formatting of a string without affecting the normalization of line terminators, and to use \b and \t for horizontal formatting of a string without affecting the removal of incidental white space. For example, consider this text block that mentions the escape sequence \r (CR):

String html = """
              <html>\r
                  <body>\r
                      <p>Hello, world</p>\r
                  </body>\r
              </html>\r
              """;

The \r escape sequences are not interpreted until after the line terminators have been normalized to LF. Using Unicode escapes to visualize LF (\u000A) and CR (\u000D), and using | to visualize the left margin, the string represented by the text block is:

|<html>\u000D\u000A
|    <body>\u000D\u000A
|        <p>Hello, world</p>\u000D\u000A
|    </body>\u000D\u000A
|</html>\u000D\u000A
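The same result can be confirmed in code (Java 15+; illustrative class name). The `\r` escapes stay inert through normalization and indent stripping, so every line of the result ends in CR followed by the normalized LF.

```java
class CrLfTextBlock {
    static final String HTML = """
            <html>\r
                <body>\r
                    <p>Hello, world</p>\r
                </body>\r
            </html>\r
            """;

    static final String EXPECTED = "<html>\r\n" +
            "    <body>\r\n" +
            "        <p>Hello, world</p>\r\n" +
            "    </body>\r\n" +
            "</html>\r\n";

    public static void main(String[] args) {
        System.out.println(HTML.equals(EXPECTED)); // true
    }
}
```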

At run time, a text block is a reference to an instance of class String that denotes the string represented by the text block.

Moreover, a text block always refers to the same instance of class String. This is because the strings represented by text blocks - or, more generally, strings that are the values of constant expressions (§15.29) - are "interned" so as to share unique instances, as if by execution of the method String.intern (§12.5).

Example 3.10.6-4. Text blocks evaluate to String

Text blocks can be used wherever an expression of type String is allowed, such as in string concatenation (§15.18.1), in the invocation of methods on instances of String, and in annotations with String elements:

System.out.println("ab" + """
                          cde
                          """);

String cde = """
             abcde""".substring(2);

String math = """
              1+1 equals \
              """ + String.valueOf(2);

@Preconditions("""
    rate > 0 &&
    rate <= MAX_REFRESH_RATE
""")
public void setRefreshRate(int rate) { ... }
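The fragments above are not a complete program: `@Preconditions` is not a standard annotation, and the method body is elided. The runnable sketch below (Java 15+) declares a hypothetical `Preconditions` annotation with runtime retention, so that its text-block value can be read back reflectively. It also evaluates the three expressions from the example. All type and method names here are illustrative.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

class TextBlockExpressions {
    static final String CONCAT = "ab" + """
            cde
            """;                        // "abcde" followed by LF

    static final String CDE = """
            abcde""".substring(2);      // "cde"

    static final String MATH = """
            1+1 equals \
            """ + String.valueOf(2);    // line continuation: "1+1 equals 2"

    @Preconditions("""
            rate > 0 &&
            rate <= MAX_REFRESH_RATE
            """)
    public void setRefreshRate(int rate) {
        // Body omitted in the original example; intentionally empty here.
    }

    // Reads the annotation value back at run time.
    static String preconditionText() {
        try {
            return TextBlockExpressions.class
                    .getMethod("setRefreshRate", int.class)
                    .getAnnotation(Preconditions.class)
                    .value();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(CONCAT.equals("abcde\n"));     // true
        System.out.println(CDE.equals("cde"));            // true
        System.out.println(MATH.equals("1+1 equals 2"));  // true
        System.out.println(preconditionText().equals("rate > 0 &&\nrate <= MAX_REFRESH_RATE\n")); // true
    }
}

// Hypothetical annotation standing in for the one used in the example.
@Retention(RetentionPolicy.RUNTIME)
@interface Preconditions {
    String value();
}
```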

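Summary

  • A text block is zero or more characters enclosed by an opening delimiter (""" followed by a line terminator) and a closing delimiter ("""); it always evaluates to a String.
  • Unlike a string literal, it may contain line terminators and double quotes directly, which makes multi-line text easier to write and to read.
  • Three consecutive double quotes in the content that are not meant as the closing delimiter need at least one of them escaped.
  • A text block can be used anywhere an expression of type String is allowed.
  • The string a text block represents is not the raw content; the content is transformed in this order:
    • 1. Line terminators are normalized to LF.
    • 2. Incidental white space (the indentation shared by the lines) is removed, as if by String.stripIndent.
    • 3. Escape sequences are interpreted, as if by String.translateEscapes.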

3.10.7. Escape Sequences

In character literals, string literals, and text blocks (§3.10.4, §3.10.5, §3.10.6), the escape sequences allow for the representation of some nongraphic characters without using Unicode escapes (§3.3), as well as the single quote, double quote, and backslash characters.

EscapeSequence:

\ b (backspace BS, Unicode \u0008)
\ s (space SP, Unicode \u0020)
\ t (horizontal tab HT, Unicode \u0009)
\ n (linefeed LF, Unicode \u000a)
\ f (form feed FF, Unicode \u000c)
\ r (carriage return CR, Unicode \u000d)
\ LineTerminator (line continuation, no Unicode representation)
\ " (double quote ", Unicode \u0022)
\ ' (single quote ', Unicode \u0027)
\ \ (backslash \, Unicode \u005c)
OctalEscape (octal value, Unicode \u0000 to \u00ff)

OctalEscape:

\ OctalDigit
\ OctalDigit OctalDigit
\ ZeroToThree OctalDigit OctalDigit

OctalDigit:

(one of) 0 1 2 3 4 5 6 7

ZeroToThree:

(one of) 0 1 2 3

The OctalDigit production above comes from §3.10.1. Octal escapes are provided for compatibility with C, but can express only Unicode values \u0000 through \u00FF, so Unicode escapes are usually preferred.
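The OctalEscape grammar has one subtlety: a third octal digit is only consumed when the first digit is 0 to 3 (ZeroToThree), so the escape never exceeds \377 (255). The sketch below (illustrative class name) shows both cases. "\1234" is the escape \123 ('S') followed by a plain '4', and "\477" is \47 (the apostrophe) followed by a plain '7'.

```java
class OctalEscapes {
    static final char NUL = '\0';          // octal 0
    static final char UPPER_A = '\101';    // octal 101 = 65 = 'A'
    static final char MAX_OCTAL = '\377';  // octal 377 = 255, the largest octal escape

    // A leading digit 0-3 allows three octal digits; 4-7 allows only two.
    static final String THREE_DIGITS = "\1234"; // escape 123 is 'S', then a plain '4'
    static final String TWO_DIGITS = "\477";    // escape 47 is the apostrophe, then a plain '7'

    public static void main(String[] args) {
        System.out.println((int) NUL);                 // 0
        System.out.println(UPPER_A);                   // A
        System.out.println((int) MAX_OCTAL);           // 255
        System.out.println(THREE_DIGITS.equals("S4")); // true
        System.out.println(TWO_DIGITS.equals("'7"));   // true
    }
}
```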

It is a compile-time error if the character following a backslash in an escape sequence is not a LineTerminator or an ASCII b, s, t, n, f, r, ", ', \, 0, 1, 2, 3, 4, 5, 6, or 7.

An escape sequence in the content of a character literal, string literal, or text block is interpreted by replacing its \ and trailing character(s) with the single character denoted by the Unicode escape in the EscapeSequence grammar. The line continuation escape sequence has no corresponding Unicode escape, so is interpreted by replacing it with nothing.

The line continuation escape sequence can appear in a text block, but cannot appear in a character literal or a string literal because each disallows a LineTerminator.
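Two escapes from the table are mainly useful in text blocks. The sketch below needs Java 15 or later, and its class name is illustrative. It joins two source lines with the line continuation escape and uses \s to keep trailing spaces that incidental white space removal would otherwise strip. The last line shows that \s is also accepted in an ordinary string literal.

```java
class SpaceAndContinuation {
    // Backslash + line terminator is replaced with nothing, joining the lines.
    static final String JOINED = """
            Lorem ipsum \
            dolor sit amet""";

    // Trailing spaces before an escape survive stripping; the escape adds one more space.
    static final String PADDED = """
            red  \s
            green\s
            """;

    public static void main(String[] args) {
        System.out.println(JOINED.equals("Lorem ipsum dolor sit amet")); // true
        System.out.println(PADDED.equals("red   \ngreen \n"));           // true
        System.out.println("a\sb".equals("a b"));                        // true
    }
}
```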

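Summary

  • Escape sequences can be used in character literals, string literals, and text blocks to represent some nongraphic characters, as well as the single quote, double quote, and backslash.
  • A backslash that is not followed by a valid escape character (b, s, t, n, f, r, ", ', \, an octal digit, or a line terminator) is a compile-time error.
  • Octal escapes exist for C compatibility but only cover \u0000 to \u00ff, so Unicode escapes are usually preferred over them.
  • The line continuation escape (\ followed by a line terminator) is allowed in text blocks but not in character or string literals.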

3.10.8. The Null Literal

The null type has one value, the null reference, represented by the null literal null, which is formed from ASCII characters.

NullLiteral:

null

A null literal is always of the null type (§4.1).

3.11. Separators

Twelve tokens, formed from ASCII characters, are the separators (punctuators).

Separator:

(one of)

(   )   {   }   [   ]   ;   ,   .   ...   @   ::

3.12. Operators

38 tokens, formed from ASCII characters, are the operators.

Operator:

(one of)

=   >   <   !   ~   ?   :   ->
==  >=  <=  !=  &&  ||  ++  --
+   -   *   /   &   |   ^   %   <<   >>   >>>
+=  -=  *=  /=  &=  |=  ^=  %=  <<=  >>=  >>>=
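To see a few of the less common tokens in context, the sketch below uses the separators `...` and `::`, the arrow `->`, a compound assignment, and the two right-shift operators. The class and member names are illustrative. Note the difference between `>>`, which copies the sign bit, and `>>>`, which shifts in zeros.

```java
import java.util.function.Function;

class SeparatorsAndOperators {
    // The "..." separator declares a varargs parameter.
    static int sum(int... values) {
        int total = 0;
        for (int v : values) {
            total += v;           // "+=" compound assignment operator
        }
        return total;
    }

    // The "::" separator forms a method reference; "->" introduces a lambda body.
    static final Function<String, Integer> BY_REF = String::length;
    static final Function<String, Integer> BY_LAMBDA = s -> s.length();

    public static void main(String[] args) {
        System.out.println(sum(1, 2, 3));          // 6
        System.out.println(BY_REF.apply("java"));  // 4
        System.out.println(BY_LAMBDA.apply("java")); // 4
        System.out.println(-8 >> 1);               // -4: sign bit copied in
        System.out.println(-8 >>> 28);             // 15: zeros shifted in
    }
}
```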