String

Class diagram

Member variables

    /**
     * Stores the characters; declared final, so the reference cannot be reassigned
     */
    private final char value[];

    /**
     * Caches the hash code of the String
     */
    private int hash; // Default to 0

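These two fields are the main member variables of String in the JDK 8 source analyzed here (since JDK 9, `value` is a `byte[]` accompanied by a `coder` field).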

Common constructors

`public String()`

    public String() {
        this.value = "".value;
    }

Here `value` is assigned directly from `"".value`, yet `value` is a private field of the `""` String object. Why can it be accessed?

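Because Java's access modifiers are checked per class, not per object: code inside a class may access the private members of any instance of that same class.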
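A minimal illustration of class-level access control, using a hypothetical `Point` class (not part of the JDK): `sameX` reads the private field of another `Point` instance, which compiles because the code lives inside `Point` itself.

```java
// Hypothetical class, used only to show that private access is checked per class, not per object.
class Point {
    private final int x;

    Point(int x) {
        this.x = x;
    }

    // Reads the private field x of a *different* Point instance; this compiles
    // because the method is declared inside the Point class.
    boolean sameX(Point other) {
        return this.x == other.x;
    }

    public static void main(String[] args) {
        System.out.println(new Point(1).sameX(new Point(1))); // true
        System.out.println(new Point(1).sameX(new Point(2))); // false
    }
}
```

`String()` relies on the same rule when it reads `"".value`.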

`public String(String original)`

    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

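The resulting String is a new, distinct object, but the characters are not copied: its `value` field refers to the very same array as `original.value`, and the cached `hash` is reused. Since Strings are immutable, sharing the array is safe, so unless an explicit copy of `original` is needed, there is no reason to create strings this way.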
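The shared array is not visible through the public API, so the sketch below only shows the observable part: `new String(original)` yields a different object (`==` is false) with equal contents (`equals` is true). `CopyConstructorDemo` and `copyOf` are hypothetical names.

```java
class CopyConstructorDemo {
    // Returns a new String built with the String(String) constructor.
    static String copyOf(String original) {
        return new String(original);
    }

    public static void main(String[] args) {
        String original = "Hello";
        String copy = copyOf(original);
        System.out.println(copy == original);      // false: two distinct String objects
        System.out.println(copy.equals(original)); // true: same characters
    }
}
```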

`public String(char value[])`

    public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }

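Creates a string from a character array. `Arrays.copyOf` makes a private copy, so later changes to the caller's `value` array do not affect the `value` array inside the new string.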
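The effect of this defensive copy can be checked from outside the class: mutate the source array after construction and the string keeps its original contents. `DefensiveCopyDemo` and `buildThenMutate` are illustrative names.

```java
class DefensiveCopyDemo {
    // Builds a String from a char array, then mutates the array afterwards.
    static String buildThenMutate() {
        char[] chars = {'a', 'b', 'c'};
        String s = new String(chars); // the constructor copies the array
        chars[0] = 'z';               // changes only the caller's array
        return s;
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate()); // abc
    }
}
```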

`public String(char value[], int offset, int count)`

    public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= value.length) {
                this.value = "".value;
                return;
            }
        }
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }

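This constructor is similar to the previous one, but it first validates `offset` and `count`: a negative offset or count, or a range that runs past the end of the array, throws `StringIndexOutOfBoundsException`, while a count of 0 (with a valid offset) yields the empty string. It then uses `Arrays.copyOfRange` to copy only the requested range into `value`.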
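A short sketch of those bounds checks from the caller's side. `slice` and `rejects` are hypothetical helpers; `rejects` simply reports whether the constructor threw `StringIndexOutOfBoundsException`.

```java
class RangeConstructorDemo {
    static String slice(char[] chars, int offset, int count) {
        return new String(chars, offset, count);
    }

    // Returns true if the constructor throws StringIndexOutOfBoundsException.
    static boolean rejects(char[] chars, int offset, int count) {
        try {
            slice(chars, offset, count);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        char[] chars = "Hello".toCharArray();
        System.out.println(slice(chars, 1, 3));           // ell
        System.out.println(slice(chars, 5, 0).isEmpty()); // true: count 0 with offset <= length
        System.out.println(rejects(chars, -1, 2));        // true: negative offset
        System.out.println(rejects(chars, 3, 5));         // true: offset > length - count
    }
}
```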

Common methods

`public String substring(int beginIndex, int endIndex)`

    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > value.length) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        int subLen = endIndex - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }

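This method returns a substring covering the half-open range [beginIndex, endIndex): the character at beginIndex is included, the one at endIndex is not. If the range covers the whole string, `this` is returned without creating a new object.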

Otherwise, the new string is built with the `public String(char value[], int offset, int count)` constructor, so the selected characters are copied.
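A few calls that make the half-open range concrete. The identity check in the last line reflects the `return this` branch of the source shown above; treat it as an implementation detail rather than a guarantee.

```java
class SubstringDemo {
    public static void main(String[] args) {
        String s = "abcdef";
        System.out.println(s.substring(1, 4));               // bcd: index 1 included, index 4 excluded
        System.out.println(s.substring(2, 2).isEmpty());     // true: beginIndex == endIndex
        System.out.println(s.substring(0, s.length()) == s); // true with the source above (returns this)
    }
}
```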

`public boolean equals(Object anObject)`

    public boolean equals(Object anObject) {
        // Compare references (memory addresses) directly
        if (this == anObject) {
            return true;
        }
        // Check whether anObject is a String
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = value.length;
            // Compare the lengths
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                // Compare the characters one by one
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }

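String's `equals` method is a classic override of `Object.equals`. It works in four steps:

1. Check whether the two references point to the same object (this is exactly what `Object.equals` does).
2. Check whether the argument is a String.
3. Check whether the lengths are equal.
4. Loop over both arrays and compare the characters at each index.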
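Step 2 has a consequence worth seeing: a `StringBuilder` holding the same characters is not equal to the String, because it is not a String. `contentEquals(CharSequence)` is the JDK method for comparing characters regardless of type.

```java
class EqualsDemo {
    public static void main(String[] args) {
        String a = "abc";
        String b = new String("abc");
        StringBuilder sb = new StringBuilder("abc");

        System.out.println(a.equals(a));         // true: step 1, same reference
        System.out.println(a.equals(b));         // true: different objects, same length and characters
        System.out.println(a.equals(sb));        // false: step 2, sb is not a String
        System.out.println(a.contentEquals(sb)); // true: compares the characters of any CharSequence
        System.out.println(a.equals("abcd"));    // false: step 3, lengths differ
    }
}
```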

`public String replace(char oldChar, char newChar)`

    public String replace(char oldChar, char newChar) {
        if (oldChar != newChar) {
            int len = value.length;
            int i = -1;
            char[] val = value; /* avoid getfield opcode */

            // Scan for the first occurrence of oldChar
            while (++i < len) {
                if (val[i] == oldChar) {
                    break;
                }
            }
            // Only build a new string if oldChar was found
            if (i < len) {
                // Allocate a new array buf and copy the characters already scanned (none of them is oldChar)
                char buf[] = new char[len];
                for (int j = 0; j < i; j++) {
                    buf[j] = val[j];
                }
                // Process the remaining characters, writing newChar wherever oldChar appears
                while (i < len) {
                    char c = val[i];
                    buf[i] = (c == oldChar) ? newChar : c;
                    i++;
                }
                return new String(buf, true);
            }
        }
        return this;
    }

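The method replaces characters as follows:

1. The first while loop checks whether the original string contains oldChar at all.
2. If it does not, the original string itself is returned; otherwise a new string is built.
3. A new array buf[] is allocated, and the characters before the first oldChar (none of which equals oldChar) are copied into it.
4. The remaining characters are scanned; each one equal to oldChar is written as newChar, all others are copied unchanged.
5. A new String object is returned from the buf[] array.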
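The same five steps can be sketched outside the JDK. `replaceChar` is a hypothetical helper, not the JDK method: it has to call `toCharArray()` (which copies) because it cannot reach the private `value` array, and it uses the public `String(char[])` constructor instead of the package-private `String(char[], boolean)`.

```java
class ReplaceSketch {
    // Hypothetical helper mirroring the steps above; not the JDK implementation.
    static String replaceChar(String s, char oldChar, char newChar) {
        if (oldChar == newChar) {
            return s;
        }
        char[] val = s.toCharArray(); // a copy, since String's internal array is private
        int len = val.length;
        int i = 0;
        // Step 1: find the first occurrence of oldChar.
        while (i < len && val[i] != oldChar) {
            i++;
        }
        // Step 2: nothing to replace, so return the original instance.
        if (i == len) {
            return s;
        }
        // Step 3: copy the untouched prefix [0, i) into a new buffer.
        char[] buf = new char[len];
        System.arraycopy(val, 0, buf, 0, i);
        // Step 4: copy the rest, swapping oldChar for newChar.
        for (; i < len; i++) {
            char c = val[i];
            buf[i] = (c == oldChar) ? newChar : c;
        }
        // Step 5: build the result from the new buffer.
        return new String(buf);
    }

    public static void main(String[] args) {
        System.out.println(replaceChar("banana", 'a', 'o')); // bonono
        String s = "xyz";
        System.out.println(replaceChar(s, 'a', 'o') == s);   // true: no 'a', same instance returned
    }
}
```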

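Only the `char` overload is covered here. In the JDK 8 source, `replace(CharSequence target, CharSequence replacement)` is implemented on top of the regular-expression engine, but with the `Pattern.LITERAL` flag, so `target` is matched literally; `replaceAll` and `replaceFirst`, by contrast, interpret their first argument as a regular expression.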
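The practical difference between literal and regex matching shows up as soon as the pattern contains a metacharacter such as `.`, which in a regex matches any character. `Pattern.quote` turns a string into a regex that matches it literally.

```java
import java.util.regex.Pattern;

class ReplaceStringDemo {
    public static void main(String[] args) {
        String s = "a.b.c";
        // replace(CharSequence, CharSequence): target is matched literally.
        System.out.println(s.replace(".", "-"));                   // a-b-c
        // replaceAll(String, String): "." is a regex matching every character.
        System.out.println(s.replaceAll(".", "-"));                // -----
        // Quoting the pattern restores literal matching.
        System.out.println(s.replaceAll(Pattern.quote("."), "-")); // a-b-c
    }
}
```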

`public String[] split(String regex)`

    public String[] split(String regex) {
        return split(regex, 0);
    }
    
    public String[] split(String regex, int limit) {
        /* fastpath if the regex is a
         (1)one-char String and this character is not one of the
            RegEx's meta characters ".$|()[{^?*+\\", or
         (2)two-char String and the first char is the backslash and
            the second is not the ascii digit or ascii letter.
         */
        char ch = 0;
        if ((
                // Length 1: the character must not be a regex metacharacter
                (regex.value.length == 1 && ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
                // Length 2: the first character is a backslash and the second is neither an ASCII digit nor an ASCII letter
                (regex.length() == 2 && regex.charAt(0) == '\\' && (((ch = regex.charAt(1)) - '0') | ('9' - ch)) < 0
                        && ((ch - 'a') | ('z' - ch)) < 0 && ((ch - 'A') | ('Z' - ch)) < 0))
                // ch must not be a UTF-16 surrogate code unit
                && (ch < Character.MIN_HIGH_SURROGATE || ch > Character.MAX_LOW_SURROGATE)) {
            int off = 0;
            int next = 0;
            boolean limited = limit > 0;
            ArrayList<String> list = new ArrayList<>();
            // Walk the String and add each piece between delimiters to list
            while ((next = indexOf(ch, off)) != -1) {
                if (!limited || list.size() < limit - 1) {
                    list.add(substring(off, next));
                    off = next + 1;
                } else {
                    list.add(substring(off, value.length));
                    off = value.length;
                    break;
                }
            }
            // No delimiter found: the original string is the only element
            if (off == 0) {
                return new String[]{this};
            }

            // Add the remaining part after the last delimiter
            if (!limited || list.size() < limit) {
                list.add(substring(off, value.length));
            }

            // Build the result
            int resultSize = list.size();
            if (limit == 0) {
                // Remove trailing empty strings
                while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                    resultSize--;
                }
            }
            String[] result = new String[resultSize];
            return list.subList(0, resultSize).toArray(result);
        }
        // All other cases: fall back to the regular-expression engine
        return Pattern.compile(regex).split(this, limit);
    }

The steps are annotated in the comments above. The part worth a closer look is the loop that walks the String:

            while ((next = indexOf(ch, off)) != -1) {
                if (!limited || list.size() < limit - 1) {
                    list.add(substring(off, next));
                    off = next + 1;
                } else {
                    list.add(substring(off, value.length));
                    off = value.length;
                    break;
                }
            }

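When regex is a single character, off is the index where the next piece starts and next is the index of the next delimiter. After each match, off = next + 1. If two delimiters are adjacent, the following match is found exactly at the previous next + 1, so next == off and substring(off, next) is the empty string.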
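To watch `next == off` produce empty strings, here is a stripped-down re-implementation of the single-character fast path for `limit == 0`. `splitOnChar` is a hypothetical helper written for this walkthrough; it omits the `limit > 0` branch of the real method.

```java
import java.util.ArrayList;
import java.util.List;

class FastSplitSketch {
    // Hypothetical re-implementation of the single-char fast path with limit == 0.
    static List<String> splitOnChar(String s, char ch) {
        List<String> list = new ArrayList<>();
        int off = 0;
        int next;
        while ((next = s.indexOf(ch, off)) != -1) {
            // For adjacent delimiters next == off, so this adds "".
            list.add(s.substring(off, next));
            off = next + 1;
        }
        if (off == 0) {
            // No delimiter found: the whole string is the only element.
            list.add(s);
            return list;
        }
        list.add(s.substring(off));
        // limit == 0: drop trailing empty strings.
        int size = list.size();
        while (size > 0 && list.get(size - 1).isEmpty()) {
            size--;
        }
        return new ArrayList<>(list.subList(0, size));
    }

    public static void main(String[] args) {
        System.out.println(splitOnChar("a,,,b", ',')); // [a, , , b]
        System.out.println(splitOnChar("a,b,,", ',')); // [a, b]
    }
}
```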

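So if a single-character delimiter appears N times in a row, each of the N - 1 delimiters after the first produces an empty string, giving N - 1 empty strings in the result (trailing empty strings are then removed when limit == 0).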
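The real `String.split` behaves as described. The calls below also show how `limit` changes the handling of trailing empty strings, and that a leading delimiter yields a leading empty string.

```java
import java.util.Arrays;

class SplitEmptyDemo {
    public static void main(String[] args) {
        // Three consecutive commas (N = 3) yield N - 1 = 2 empty strings.
        System.out.println(Arrays.toString("a,,,b".split(",")));  // [a, , , b]
        // With limit 0 (the default), trailing empty strings are removed...
        System.out.println("a,b,,".split(",").length);            // 2
        // ...but a negative limit keeps them.
        System.out.println("a,b,,".split(",", -1).length);        // 4
        // A leading delimiter produces a leading empty string.
        System.out.println(Arrays.toString(",a".split(",")));     // [, a]
    }
}
```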

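When the fast path does not apply (for example, a multi-character regex that is not an escaped single character), `Pattern.split` also turns adjacent matches into empty strings.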
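For example, `"--"` is two characters but does not start with a backslash, so it takes the regex path and adjacent matches still leave an empty piece. A quantifier such as `,+` is one common way to treat a run of delimiters as a single match.

```java
import java.util.Arrays;

class RegexSplitDemo {
    public static void main(String[] args) {
        // "--" skips the fast path; the two adjacent matches in "----" give an empty piece.
        System.out.println(Arrays.toString("a--b----c".split("--"))); // [a, b, , c]
        // ",+" matches a whole run of commas at once, so no empty strings appear.
        System.out.println(Arrays.toString("a,,,b".split(",+")));     // [a, b]
    }
}
```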

Other methods

`public native String intern()`

    public native String intern();

`intern` is rarely called directly in everyday code, but it comes up often when analyzing how strings are stored and compared.

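The JDK Javadoc explains what `intern` does quite clearly:

If a string equal to this one (according to `equals`) is already in the string constant pool, a reference to that pooled string is returned;

otherwise, this string is added to the pool and a reference to it is returned.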

A short piece of code illustrates this:

    String s1 = "Hello";                // ①
    String s2 = "Hello";                // ②
    String s3 = new String("Hello");    // ③
    System.out.println(s1 == s2);       // ④ true
    System.out.println(s1 == s3);       // ⑤ false
    s3 = s3.intern();                   // ⑥
    System.out.println(s1 == s3);       // ⑦ true

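Step ① declares a local variable s1 on the stack, and the literal "Hello" is added to the string constant pool.

Step ② declares s2 on the stack; it points to the same "Hello" that is already in the pool.

Step ③ creates a new String object on the heap. Through the `String(String original)` constructor it shares its character array with the pooled "Hello", but s3 on the stack refers to this new heap object, not to the pooled string.

Hence s1 == s2 is true, while s1 == s3 is false.

Step ⑥ calls s3 = s3.intern(), which returns the reference to the pooled "Hello" and makes s3 point to it.

Therefore step ⑦ prints true for s1 == s3.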
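The same effect applies to strings assembled at run time, which are never pooled automatically. `InternDemo` and `buildHello` are illustrative names; building the value with a `StringBuilder` guarantees a fresh heap object rather than the literal.

```java
class InternDemo {
    // Builds "Hello" at run time so the result is a separate heap object, not the pooled literal.
    static String buildHello() {
        return new StringBuilder("Hel").append("lo").toString();
    }

    public static void main(String[] args) {
        String literal = "Hello";
        String built = buildHello();
        System.out.println(built == literal);          // false: a different object
        System.out.println(built.equals(literal));     // true: same characters
        System.out.println(built.intern() == literal); // true: intern() returns the pooled instance
    }
}
```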

Immutability of String

    public final class String
            implements java.io.Serializable, Comparable<String>, CharSequence {

        /**
         * Stores the characters; declared final, so the reference cannot be reassigned
         */
        private final char value[];

        // ... rest of the class omitted
    }

String is declared final, so the class cannot be subclassed.

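The characters are stored in the char array `value`. Because `value` is final, once a String is constructed its `value` field can never point to a different array. The array's elements could in principle be changed, but `value` is private, String exposes no method that modifies it, and the constructors that accept a char array copy it, so the contents of `value` never change after construction either.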

Together, these two points make String immutable.
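Two quick checks of that guarantee from outside the class: `toCharArray()` hands out a newly allocated copy, so writing to it cannot reach `value`, and "modifying" methods such as `concat` return a new String while leaving the original untouched. `tryToMutate` is a hypothetical helper.

```java
class ImmutabilityDemo {
    // Tries to "modify" a String through the array returned by toCharArray().
    static String tryToMutate(String s) {
        char[] chars = s.toCharArray(); // toCharArray() returns a fresh copy
        if (chars.length > 0) {
            chars[0] = '#';
        }
        return s; // the original String is unaffected
    }

    public static void main(String[] args) {
        System.out.println(tryToMutate("abc")); // abc

        String s = "abc";
        String t = s.concat("d"); // returns a new String
        System.out.println(s);    // abc
        System.out.println(t);    // abcd
    }
}
```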

Summary

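String is simple to use in everyday development, but algorithm problems rely on many of its methods, so it pays to know the details, such as the half-open range of substring and how split handles consecutive delimiters.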

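Some String methods return results that still need post-processing, such as the empty strings produced by split. Guava's string utilities (for example `Splitter`) can help turn them into exactly the result we need.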
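If adding Guava is not an option, a JDK-only stream pipeline can do similar cleanup; in Guava the comparable call would be along the lines of `Splitter.on(',').trimResults().omitEmptyStrings()`. The sketch below is a swapped-in alternative, not Guava's implementation, and `splitNonEmpty` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SplitCleanupDemo {
    // JDK-only alternative (not Guava): split, trim each piece, and drop empty pieces.
    static List<String> splitNonEmpty(String s, String regex) {
        return Arrays.stream(s.split(regex))
                .map(String::trim)
                .filter(part -> !part.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(splitNonEmpty(",a,, b ,,c,", ",")); // [a, b, c]
    }
}
```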

Java Common Classes Source Code Analysis: String
