Notes on a String.substring() Problem

Introduction

Let's start with a piece of code:

public class Main {
    public static void main(String[] args) {
        String modelOne = "hello";
        System.out.println("modelOne:" + modelOne);

        String modelTwo = "hello👋";
        System.out.println("modelTwo:" + modelTwo);

        // Truncate: drop the last char (one UTF-16 code unit)
        String substring = modelTwo.substring(0, modelTwo.length()-1);
        System.out.println("substring:" + substring);
    }
}

Now look at the result of the truncation. The program prints:

modelOne:hello
modelTwo:hello👋
substring:hello?

So why does this happen?

Analysis

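First, look at the source of String.length(). The "length" is simply the length of the char array that holds the String's value (a String's contents are stored in this char[] value field), so it counts UTF-16 code units, not characters. (The source below is from JDK 8; since JDK 9, compact strings store value as a byte[], but length() still returns the number of UTF-16 code units.)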

/**
 * Returns the length of this string.
 * The length is equal to the number of <a href="Character.html#unicode">Unicode
 * code units</a> in the string.
 *
 * @return  the length of the sequence of characters represented by this
 *          object.
 */
public int length() {
    return value.length;
}
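To see the difference between "number of chars" and "number of characters" directly, here is a small sketch that compares length() with codePointCount() on the string from the introduction. The class and method names are illustrative only; the emoji is written as the escape \uD83D\uDC4B so the file compiles regardless of source encoding.

```java
public class LengthDemo {
    // Number of UTF-16 code units: this is what String.length() returns
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points: a surrogate pair counts as one
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String modelTwo = "hello\uD83D\uDC4B"; // "hello👋"
        System.out.println("length():         " + charCount(modelTwo));      // 7
        System.out.println("codePointCount(): " + codePointCount(modelTwo)); // 6
    }
}
```

The gap of 1 between the two numbers is exactly the extra char that 👋 occupies.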

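Inspecting value in a debugger shows that the special character 👋 actually occupies two char elements, each a 4-digit hexadecimal value (\uD83D and \uDC4B, a UTF-16 surrogate pair). So when we truncate a String with substring, we may cut such a character in half. The leftover lone surrogate is not a valid character on its own, which is why the output ends in a garbled "?".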
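If a debugger is not at hand, the same inspection can be done in code. The sketch below (illustrative class and method names) prints every char of the string as a 4-digit hex value and checks that the last two chars form a high/low surrogate pair, using the standard Character.isHighSurrogate and Character.isLowSurrogate methods.

```java
public class SurrogateDemo {
    // Returns each char of s as a 4-digit uppercase hex string, e.g. "0068" for 'h'
    static String[] charsAsHex(String s) {
        String[] out = new String[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = String.format("%04X", (int) s.charAt(i));
        }
        return out;
    }

    public static void main(String[] args) {
        String modelTwo = "hello\uD83D\uDC4B"; // "hello👋"
        System.out.println(String.join(" ", charsAsHex(modelTwo)));
        // 0068 0065 006C 006C 006F D83D DC4B

        char beforeLast = modelTwo.charAt(modelTwo.length() - 2);
        char last = modelTwo.charAt(modelTwo.length() - 1);
        System.out.println(Character.isHighSurrogate(beforeLast)); // true
        System.out.println(Character.isLowSurrogate(last));        // true
    }
}
```

substring(0, length() - 1) keeps D83D and drops DC4B, leaving a high surrogate with no partner.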

How to fix it?

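String provides an offsetByCodePoints(int index, int codePointOffset) method for working with Unicode code points. It returns the char index that lies codePointOffset code points away from index, so a boundary computed with it always falls between whole code points and never splits a "special character" like 👋 in the middle.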

public int offsetByCodePoints(int index, int codePointOffset) {
    if (index < 0 || index > value.length) {
        throw new IndexOutOfBoundsException();
    }
    return Character.offsetByCodePointsImpl(value, 0, value.length,
            index, codePointOffset);
}
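Applied to the example from the introduction, a minimal sketch (the helper name dropLastCodePoint is hypothetical) removes the last code point instead of the last char. Stepping back one code point from length() lands before the whole surrogate pair, so 👋 is dropped in one piece.

```java
public class OffsetDemo {
    // Drops the last code point (not just the last char) of s
    static String dropLastCodePoint(String s) {
        if (s.isEmpty()) {
            return s;
        }
        // Move back one code point from the end; skips both halves of a surrogate pair
        int end = s.offsetByCodePoints(s.length(), -1);
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        String modelTwo = "hello\uD83D\uDC4B"; // "hello👋"
        System.out.println(modelTwo.offsetByCodePoints(0, 5)); // 5
        System.out.println(modelTwo.offsetByCodePoints(0, 6)); // 7
        System.out.println("substring:" + dropLastCodePoint(modelTwo)); // substring:hello
    }
}
```

Note that the 6th code point ends at char index 7, not 6, which is exactly the offset a plain substring would get wrong.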

In practice

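Note: here limit counts code points, while String.length() counts char values, so the returned String's length() may be larger than limit (each emoji such as 👋 counts once toward limit but adds 2 to length()).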

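/**
 * @param model the string to truncate
 * @param limit maximum number of Unicode code points to keep
 * @return the truncated string
 */
private String safeSubstring(String model, int limit) {
    if (model == null || limit <= 0) {
        return "";
    }
    // Strings with at most limit code points need no truncation; without this
    // check, offsetByCodePoints throws IndexOutOfBoundsException for short input
    if (model.codePointCount(0, model.length()) <= limit) {
        return model;
    }
    // char index just past the first `limit` code points
    int end = model.offsetByCodePoints(0, limit);
    return model.substring(0, end);
}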
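If the caller needs the opposite guarantee, namely that length() never exceeds the limit (for example, a database column sized in chars), a different approach is needed. The sketch below is an assumption-labeled alternative, not the method above: the hypothetical truncateByChars cuts at maxChars UTF-16 code units and backs off by one when the cut would split a surrogate pair. For comparison it also includes a static variant of safeSubstring with a guard for strings shorter than limit.

```java
public class SafeTruncateDemo {
    // Static variant of safeSubstring: keeps at most `limit` code points,
    // so length() of the result may exceed `limit`
    static String safeSubstring(String model, int limit) {
        if (model == null || limit <= 0) {
            return "";
        }
        if (model.codePointCount(0, model.length()) <= limit) {
            return model;
        }
        return model.substring(0, model.offsetByCodePoints(0, limit));
    }

    // Hypothetical alternative: keeps at most maxChars UTF-16 code units,
    // so length() <= maxChars always holds, without splitting a surrogate pair
    static String truncateByChars(String model, int maxChars) {
        if (model == null || maxChars <= 0) {
            return "";
        }
        if (model.length() <= maxChars) {
            return model;
        }
        int end = maxChars;
        // If the cut falls between a high and a low surrogate, drop the high one too
        if (Character.isHighSurrogate(model.charAt(end - 1))
                && Character.isLowSurrogate(model.charAt(end))) {
            end--;
        }
        return model.substring(0, end);
    }

    public static void main(String[] args) {
        String modelTwo = "hello\uD83D\uDC4B"; // "hello👋"
        System.out.println(safeSubstring(modelTwo, 5));          // hello
        System.out.println(safeSubstring(modelTwo, 6).length()); // 7, larger than the limit
        System.out.println(truncateByChars(modelTwo, 6));        // hello
        System.out.println(truncateByChars(modelTwo, 7).equals(modelTwo)); // true
    }
}
```

Which variant fits depends on what the limit means downstream: visible characters (code points) or storage units (chars).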