[smali] String/StringBuilder string concatenation

Related demo source code;

Based on: macOS 10.13 / AS 3.3.2 / Android build-tools 28.0.0 / JDK 1.8

1. Motivation

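I have been reading smali these past few days and happened to notice that the String concatenation in a log statement had been rewritten to use StringBuilder. The code is as follows;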

// MainActivity.java
public class MainActivity extends AppCompatActivity implements View.OnClickListener {
    private static final String TAG = "MainActivity";
    private void methodBoolean(boolean showLog) {
        Log.d(TAG, "methodBoolean: " + showLog);
    }
}
# Corresponding smali code
.method private methodBoolean(Z)V
    .locals 3
    .param p1, "showLog"    # Z

    .line 51
    const-string v0, "MainActivity" # value of the TAG constant
    new-instance v1, Ljava/lang/StringBuilder; # create a StringBuilder
    invoke-direct {v1}, Ljava/lang/StringBuilder;-><init>()V

    # string literal that forms the first part of the Log msg argument
    const-string v2, "methodBoolean: "

    # append both parts, build the String, and store it in register v1
    invoke-virtual {v1, v2}, Ljava/lang/StringBuilder;->append(Ljava/lang/String;)Ljava/lang/StringBuilder;
    invoke-virtual {v1, p1}, Ljava/lang/StringBuilder;->append(Z)Ljava/lang/StringBuilder;
    invoke-virtual {v1}, Ljava/lang/StringBuilder;->toString()Ljava/lang/String;
    move-result-object v1

    # call the Log method to print the message
    invoke-static {v0, v1}, Landroid/util/Log;->d(Ljava/lang/String;Ljava/lang/String;)I
    .line 52
    return-void
.end method
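
To make the rewrite concrete, here is a minimal plain-Java sketch of the two forms that, judging by the smali above, end up as the same call sequence with this toolchain. The class and method names are hypothetical and not part of the demo project, and it has no Android dependencies so it can run on a desktop JVM.

```java
// Hypothetical demo class; not part of the original project.
class ConcatEquivalence {
    // What the source code says: String + boolean.
    static String viaPlus(boolean showLog) {
        return "methodBoolean: " + showLog;
    }

    // What the smali above does, written out by hand.
    static String viaBuilder(boolean showLog) {
        StringBuilder sb = new StringBuilder();
        sb.append("methodBoolean: ");
        sb.append(showLog); // the append(Z) overload seen in the smali
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(viaPlus(true));
        System.out.println(viaBuilder(true));
    }
}
```

Both methods return the same text; the only difference is who writes the StringBuilder calls, the programmer or the compiler.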

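This reminded me of the deeply ingrained claim that "StringBuilder performs better than String when concatenating a large number of strings", and I became curious whether that is really true, and whether it holds in every scenario. So I decided to dig into it. To keep things simple, the source is written in Java rather than Kotlin;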

2. Test

If String concatenation is rewritten to StringBuilder under the hood, is there still a performance gap between the two? Let's test it.

public class MainActivity extends AppCompatActivity implements View.OnClickListener {
    /**
     * Loop concatenation test using String
     *
     * @param loop number of iterations
     * @param base the string appended on each iteration
     * @return elapsed time in ms
     */
    private long methodForStr(int loop, String base) {
        long startTs = System.currentTimeMillis();
        String result = "";
        for (int i = 0; i < loop; i++) {
            result += base;
        }
        return System.currentTimeMillis() - startTs;
    }

    /**
     * Loop concatenation test using StringBuilder
     */
    @Keep
    private long methodForSb(int loop, String base) {
        long startTs = System.currentTimeMillis();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < loop; i++) {
            sb.append(base);
        }
        String result = sb.toString();
        return System.currentTimeMillis() - startTs;
    }
}

On a Samsung S8+, appending the string "smali" 5000 times in a loop took roughly 460 ms with String versus 1 ms with StringBuilder, a clear gap;
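
For readers who want to repeat the measurement outside Android, the following is a rough stand-alone sketch. The class name and the use of System.nanoTime() are my own assumptions; absolute numbers on a desktop JVM will differ from the device figures above, and a single un-warmed run like this is only a coarse indication.

```java
// Hypothetical stand-alone harness; names are illustrative only.
class ConcatBench {
    static String concatWithString(int loop, String base) {
        String result = "";
        for (int i = 0; i < loop; i++) {
            result += base;
        }
        return result;
    }

    static String concatWithBuilder(int loop, String base) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < loop; i++) {
            sb.append(base);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int loop = 5000;
        String base = "smali";

        long t0 = System.nanoTime();
        String a = concatWithString(loop, base);
        long t1 = System.nanoTime();
        String b = concatWithBuilder(loop, base);
        long t2 = System.nanoTime();

        // Returning and comparing the results keeps the work observable.
        System.out.println("same result   : " + a.equals(b));
        System.out.println("String +=     : " + (t1 - t0) / 1_000_000.0 + " ms");
        System.out.println("StringBuilder : " + (t2 - t1) / 1_000_000.0 + " ms");
    }
}
```

Unlike the Android methods above, these helpers return the built string, so the caller can check that both approaches produce identical output before comparing their timings.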

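3. Analysis of the smali for loop concatenation

Since String concatenation is converted to StringBuilder, in theory the gap should be small, yet in practice it is large. My guess was that the for loop has something to do with it, so let's look at the smali code of methodForStr(int loop, String base):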

.method private methodForStr(ILjava/lang/String;)J
    .locals 5
    .param p1, "loop"    # I; p1 is the parameter loop
    .param p2, "base"    # Ljava/lang/String;

    .line 73
    invoke-static {}, Ljava/lang/System;->currentTimeMillis()J # read the timestamp at the start of the loop

    move-result-wide v0

    .line 74
    .local v0, "startTs":J # v0 holds the local variable startTs, of type long
    const-string v2, ""

    .line 75
    .local v2, "result":Ljava/lang/String; # v2 holds the local variable result
    const/4 v3, 0x0 # initialize the for-loop variable i

    .local v3, "i":I
    :goto_0  # start of the for loop (condition check)
    if-ge v3, p1, :cond_0  # if i >= loop, jump to label cond_0 and exit the loop; otherwise continue with the code below

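    # The for-loop body does the following:
    # 1. create a StringBuilder object
    # 2. append result and base, then call toString() to get the concatenated string
    # 3. assign that string back to the result variable
    # 4. move on to the next iteration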
    .line 76
    new-instance v4, Ljava/lang/StringBuilder;
    invoke-direct {v4}, Ljava/lang/StringBuilder;-><init>()V

    invoke-virtual {v4, v2}, Ljava/lang/StringBuilder;->append(Ljava/lang/String;)Ljava/lang/StringBuilder;
    invoke-virtual {v4, p2}, Ljava/lang/StringBuilder;->append(Ljava/lang/String;)Ljava/lang/StringBuilder;
    invoke-virtual {v4}, Ljava/lang/StringBuilder;->toString()Ljava/lang/String;

    move-result-object v2

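    # increment the loop variable i by 1, then start the next iteration
    .line 75
    add-int/lit8 v3, v3, 0x1 # add 0x1 to the value in the second register (v3) and store it in the first register (v3), i.e. i++

    goto :goto_0 # jump to label goto_0: re-evaluate the loop condition and run the body again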

    .line 78
    .end local v3    # "i":I
    :cond_0 # label cond_0

    # after the loop ends, read the current timestamp and compute the elapsed time
    invoke-static {}, Ljava/lang/System;->currentTimeMillis()J
    move-result-wide v3
    sub-long/2addr v3, v0

    return-wide v3
.end method

Working backwards from the smali above, the equivalent source code would be:

private long methodForStr(int loop, String base) {
    long startTs = System.currentTimeMillis();
    String result = "";
    for (int i = 0; i < loop; i++) {
        // On every iteration the String concatenation is turned into a new StringBuilder
        // Does this count as a pessimization?
        StringBuilder sb = new StringBuilder();
        sb.append(result);
        sb.append(base);
        result = sb.toString();
    }
    return System.currentTimeMillis() - startTs;
}
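
The reconstructed loop suggests why the gap is so large: every iteration copies the whole accumulated string again. The sketch below is a simplified cost model of my own, not a measurement. It only counts characters copied by append() and toString(), ignores buffer growth inside StringBuilder, and uses hypothetical names.

```java
// Hypothetical cost model; it counts characters copied, not time.
class CopyCountModel {
    // One fresh StringBuilder per iteration, as in the decompiled loop.
    static long charsCopiedByStringLoop(int loop, int baseLen) {
        long copied = 0;
        long resultLen = 0;
        for (int i = 0; i < loop; i++) {
            copied += resultLen;   // sb.append(result)
            copied += baseLen;     // sb.append(base)
            resultLen += baseLen;
            copied += resultLen;   // sb.toString() copies into the new String
        }
        return copied;
    }

    // A single StringBuilder reused across all iterations.
    static long charsCopiedByBuilderLoop(int loop, int baseLen) {
        long copied = 0;
        for (int i = 0; i < loop; i++) {
            copied += baseLen;     // sb.append(base)
        }
        copied += (long) loop * baseLen; // the final sb.toString()
        return copied;
    }

    public static void main(String[] args) {
        System.out.println(charsCopiedByStringLoop(5000, 5));
        System.out.println(charsCopiedByBuilderLoop(5000, 5));
    }
}
```

Under this model, 5000 iterations with a 5-character base copy 125,025,000 characters in the String loop versus 50,000 with a single StringBuilder: the first grows quadratically with the loop count, the second linearly.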

4. Source code analysis

4.1 String.java

/*
 * Strings are constant; their values cannot be changed after they
 * are created. String buffers support mutable strings.
 * Because String objects are immutable they can be shared
 * */
public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
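    // String is backed by a char array. The field is private final, so the reference cannot be reassigned or reached from outside
    // (other measures, such as exposing no mutating methods, also help guarantee immutability)
    private final char value[];
}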

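The class comment describes String as immutable. Every literal is an object, and "modifying" a string does not change the original memory in place; the variable is instead pointed at a new object: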

String str = "a"; // String object "a"
str = "a" + "a"; // String object "aa"

Every + operation produces a new String object:

[Figure: String append]
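
A small sketch can make this visible. It keeps a second reference to the original string and checks that it is untouched after the concatenation; the class and variable names are hypothetical.

```java
// Hypothetical demo of String immutability.
class ImmutabilityDemo {
    // Returns { value seen through the old reference, value after concatenation }.
    static String[] run() {
        String original = "a";
        String alias = original;     // second reference to the same object
        String base = "a";
        original = original + base;  // runtime concatenation creates a new String
        return new String[] { alias, original };
    }

    public static void main(String[] args) {
        String[] r = run();
        System.out.println("alias    = " + r[0]);
        System.out.println("original = " + r[1]);
        System.out.println("same object: " + (r[0] == r[1]));
    }
}
```

The alias still reads "a" because the + operation did not modify the existing object; it built a new one and rebound the variable.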

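// Combined with the smali analysis in section 3, we can see that:
// each iteration of the for loop creates a `StringBuilder` object and a `String` object holding the concatenated result;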
private long methodForStr(int loop, String base) {
    long startTs = System.currentTimeMillis();
    String result = "";
    for (int i = 0; i < loop; i++) {
        result += base;
    }
    return System.currentTimeMillis() - startTs;
}

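Creating objects this frequently inside the loop body also leaves a large number of discarded objects behind, which triggers GC; the frequent stop-the-world pauses naturally add to the concatenation time, as the figure below shows:

[Figure: GC activity during String concatenation]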

4.2 StringBuilder.java

/**
 * A mutable sequence of characters.  This class provides an API compatible
 * with {@code StringBuffer}, but with no guarantee of synchronization.
 * */
public final class StringBuilder
    extends AbstractStringBuilder
    implements java.io.Serializable, CharSequence{}

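// The class comment of StringBuilder says it is a mutable sequence of characters (backed by a char array); the core logic actually lives in AbstractStringBuilder
// Let's look at how stringBuilder.append("str") is implemented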
abstract class AbstractStringBuilder implements Appendable, CharSequence {
    char[] value; // holds the character sequence of the string
    int count; // number of characters currently stored

    AbstractStringBuilder() {
    }

    // Passing a reasonable initial capacity helps reduce the number of expansions and improves efficiency
    AbstractStringBuilder(int capacity) {
        value = new char[capacity];
    }

    @Override
    public AbstractStringBuilder append(CharSequence s) {
        if (s == null)
            return appendNull();
        if (s instanceof String)
            return this.append((String)s);
        if (s instanceof AbstractStringBuilder)
            return this.append((AbstractStringBuilder)s);

        return this.append(s, 0, s.length());
    }

    public AbstractStringBuilder append(String str) {
        if (str == null)
            return appendNull();
        int len = str.length();
        ensureCapacityInternal(count + len); // make sure the value array has room for all characters of str
        str.getChars(0, len, value, count); // copy all characters of str to the end of the value array
        count += len;
        return this;
    }

    // If the value array is too small, expand automatically: allocate a new array and copy the existing data over
    private void ensureCapacityInternal(int minimumCapacity) {
        if (minimumCapacity - value.length > 0) {
            value = Arrays.copyOf(value,
                    newCapacity(minimumCapacity));
        }
    }
}

// String.java
public final class String {
    // Copy the characters in the given range of this string into array dst, starting at index dstBegin
    public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
        // 省略部分判断代码
        getCharsNoCheck(srcBegin, srcEnd, dst, dstBegin);
    }

    @FastNative
    native void getCharsNoCheck(int start, int end, char[] buffer, int index);
}

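The source above shows that each time StringBuilder appends a string it operates on the same char[] array (as long as no expansion is needed), so no new object is created;

[Figure: StringBuilder array operations]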
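
To see how rarely the backing array is actually replaced, the sketch below watches the public capacity() value while appending. The class and method names are hypothetical. The exact growth steps depend on the runtime's implementation, so only the documented default capacity of 16 is treated as fixed here.

```java
// Hypothetical demo: count how often StringBuilder's capacity changes.
class CapacityGrowthDemo {
    // Counts how often the backing buffer's capacity changes during the loop.
    static int countReallocations(int loop, String base, int initialCapacity) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        int reallocations = 0;
        int lastCapacity = sb.capacity();
        for (int i = 0; i < loop; i++) {
            sb.append(base);
            if (sb.capacity() != lastCapacity) {
                reallocations++;
                lastCapacity = sb.capacity();
            }
        }
        return reallocations;
    }

    public static void main(String[] args) {
        System.out.println("default capacity: " + new StringBuilder().capacity());
        System.out.println("expansions, default capacity: "
                + countReallocations(5000, "smali", 16));
        System.out.println("expansions, presized to 25000: "
                + countReallocations(5000, "smali", 25000));
    }
}
```

With a capacity chosen up front to fit the final length (5000 x 5 = 25000 characters), the loop never needs to expand, which is the point of the AbstractStringBuilder(int capacity) constructor shown above.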

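5. Should StringBuilder be the first choice for every string concatenation scenario?

Not necessarily. When the operands are compile-time constants, plain String concatenation is enough; if you use StringBuilder anyway, AS suggests switching to String, since the builder would just be wasted work;

For string concatenation outside a loop, it makes little difference whether the source uses String or StringBuilder, since the bytecode ends up using StringBuilder either way;

[Figure: IDE suggestion to replace StringBuilder with String]

    // Compile-time constant test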
    private String methodFixStr() {
        return "a" + "a" + "a" + "a" + "a" + "a";
    }

    private String methodFixSb() {
        StringBuilder sb = new StringBuilder();
        sb.append("a");
        sb.append("a");
        sb.append("a");
        sb.append("a");
        sb.append("a");
        return sb.toString();
    }

The corresponding smali code:

.method private methodFixStr()Ljava/lang/String;
    .locals 1

    .line 100
    const-string v0, "aaaaaa" # the compiler folded the expression into the final result

    return-object v0
.end method

# the StringBuilder version is not optimized; it still appends step by step
# which is probably why the IDE suggests using String
.method private methodFixSb()Ljava/lang/String;
    .locals 2

    .line 108
    new-instance v0, Ljava/lang/StringBuilder;
    invoke-direct {v0}, Ljava/lang/StringBuilder;-><init>()V

    .line 109
    .local v0, "sb":Ljava/lang/StringBuilder;
    const-string v1, "a"

    invoke-virtual {v0, v1}, Ljava/lang/StringBuilder;->append(Ljava/lang/String;)Ljava/lang/StringBuilder;

    .line 110
    const-string v1, "a"
    invoke-virtual {v0, v1}, Ljava/lang/StringBuilder;->append(Ljava/lang/String;)Ljava/lang/StringBuilder;

    .line 111
    const-string v1, "a"
    invoke-virtual {v0, v1}, Ljava/lang/StringBuilder;->append(Ljava/lang/String;)Ljava/lang/StringBuilder;

    .line 112
    const-string v1, "a"
    invoke-virtual {v0, v1}, Ljava/lang/StringBuilder;->append(Ljava/lang/String;)Ljava/lang/StringBuilder;

    .line 113
    const-string v1, "a"
    invoke-virtual {v0, v1}, Ljava/lang/StringBuilder;->append(Ljava/lang/String;)Ljava/lang/StringBuilder;

    .line 114
    invoke-virtual {v0}, Ljava/lang/StringBuilder;->toString()Ljava/lang/String;

    move-result-object v1
    return-object v1
.end method
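
The constant folding can also be observed from plain Java without reading smali. In the sketch below (hypothetical class and method names), the folded expression is the very same interned object as the literal "aaaaaa", because the Java language requires constant string expressions to be interned, while the StringBuilder result is an equal but separate object created at run time.

```java
// Hypothetical demo of compile-time constant folding.
class ConstantFoldingDemo {
    // A constant expression: the compiler emits the single literal "aaaaaa".
    static String folded() {
        return "a" + "a" + "a" + "a" + "a" + "a";
    }

    // Built at run time: toString() allocates a new String on every call.
    static String built() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 6; i++) {
            sb.append("a");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(folded() == "aaaaaa");      // true: same interned constant
        System.out.println(built().equals(folded()));  // true: same characters
        System.out.println(built() == folded());       // false: different objects
    }
}
```

The reference comparison with == is used here only to expose object identity; ordinary code should keep comparing strings with equals().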