Golang: String, rune and byte


Iterating over a string

A string can be iterated in two ways: an index-based for loop and a for-range loop.

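package main

import "fmt"

func main() {
	s := "hello"
	// range decodes the string rune by rune; r has type rune (int32).
	for index, r := range s {
		fmt.Printf("On Index %d, Value Type is %T, Value is %v\n", index, r, r)
	}
	fmt.Println("---------------------------------")
	// Indexing yields a single byte; s[i] has type byte (uint8).
	for i := 0; i < len(s); i++ {
		fmt.Printf("On Index %d, Value Type is %T, Value is %v\n", i, s[i], s[i])
	}
}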

Output:

On Index 0, Value Type is int32, Value is 104
On Index 1, Value Type is int32, Value is 101
On Index 2, Value Type is int32, Value is 108
On Index 3, Value Type is int32, Value is 108
On Index 4, Value Type is int32, Value is 111
---------------------------------
On Index 0, Value Type is uint8, Value is 104
On Index 1, Value Type is uint8, Value is 101
On Index 2, Value Type is uint8, Value is 108
On Index 3, Value Type is uint8, Value is 108
On Index 4, Value Type is uint8, Value is 111

Go has no dedicated character type. Characters are represented with byte and rune: byte is an alias for uint8, and rune is an alias for int32.
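The alias relationship can be checked directly with the %T verb. The following is a minimal sketch; the helper name describe is made up for this illustration and is not part of any library.

```go
package main

import "fmt"

// describe is a hypothetical helper that reports the types
// fmt sees for a byte value and a rune value.
func describe() (string, string) {
	var b byte = 'h' // byte is an alias for uint8
	var r rune = '你' // rune is an alias for int32
	return fmt.Sprintf("%T", b), fmt.Sprintf("%T", r)
}

func main() {
	byteType, runeType := describe()
	fmt.Println(byteType, runeType) // uint8 int32
}
```

Because these are aliases rather than distinct types, fmt prints the underlying names uint8 and int32, never "byte" or "rune".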

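In the "hello" example, the value type is uint8 in the for loop but int32 in the range loop. Why?

In memory, a string is stored as a read-only slice of bytes, so accessing s[i] in a for loop yields a uint8 (byte is an alias for uint8). When ranging over a string, range decodes one rune per iteration together with its offset in the string. The first value is the index of the rune's first byte, and the second is the rune itself. A rune represents a Unicode code point.

Offset? That sounds confusing =.=

Consider the next example: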

package main

import "fmt"

func main() {
	s := "你好"
	s1 := "Hello"
	// %#U prints the code point with its character; %x prints the rune value in hex.
	for index, r := range s {
		fmt.Printf("On Index %d, Value Type is %T,%#U, Value is %x\n", index, r, r, r)
	}
	fmt.Println("---------------------------------")
	for index, r := range s1 {
		fmt.Printf("On Index %d, Value Type is %T,%#U, Value is %x\n", index, r, r, r)
	}
}

Output:

On Index 0, Value Type is int32,U+4F60 '你', Value is 4f60
On Index 3, Value Type is int32,U+597D '好', Value is 597d
---------------------------------
On Index 0, Value Type is int32,U+0048 'H', Value is 48
On Index 1, Value Type is int32,U+0065 'e', Value is 65
On Index 2, Value Type is int32,U+006C 'l', Value is 6c
On Index 3, Value Type is int32,U+006C 'l', Value is 6c
On Index 4, Value Type is int32,U+006F 'o', Value is 6f

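Go string literals are UTF-8 encoded text. Each of the English characters H, e, l, l, o fits in one byte, while '你' and '好' each take 3 bytes. The offset mentioned here is the number of bytes to skip in order to decode the next character, which is why the range index for "你好" goes from 0 straight to 3.

Additional notes:

Converting between string, byte, and rune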
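To make the offsets concrete, here is a sketch that decodes a string by hand with utf8.DecodeRuneInString from the standard unicode/utf8 package, advancing by each rune's encoded width. The function name runeOffsets is an assumption introduced for this illustration; it approximates what range does for valid UTF-8 rather than reproducing the runtime's actual implementation.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets (hypothetical helper) returns the starting byte index of
// every rune in s, decoding one rune at a time and skipping its width.
func runeOffsets(s string) []int {
	var offsets []int
	for i := 0; i < len(s); {
		_, width := utf8.DecodeRuneInString(s[i:])
		offsets = append(offsets, i)
		i += width // jump over the bytes this rune occupies
	}
	return offsets
}

func main() {
	s := "你好"
	fmt.Println(len(s))                    // 6 (bytes)
	fmt.Println(utf8.RuneCountInString(s)) // 2 (runes)
	fmt.Println(runeOffsets(s))            // [0 3]
	fmt.Printf("% x\n", s)                 // e4 bd a0 e5 a5 bd
}
```

The offsets [0 3] match the indexes printed by the range loop, and len(s) counts bytes, not characters.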

string to byte

  • for loop
  • []byte(str)

byte to string

  • string([]byte)

string to rune

  • range
  • []rune(str)

rune to string

  • string([]rune)
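The conversions listed above can be combined into a short round trip. This is only a sketch; roundTrip is a name invented for the example.

```go
package main

import "fmt"

// roundTrip (hypothetical helper) converts s to a byte slice and a rune
// slice, then converts both back to strings.
func roundTrip(s string) (nBytes, nRunes int, fromBytes, fromRunes string) {
	b := []byte(s) // copy of the UTF-8 bytes
	r := []rune(s) // decoded Unicode code points
	return len(b), len(r), string(b), string(r)
}

func main() {
	nBytes, nRunes, fromBytes, fromRunes := roundTrip("你好")
	fmt.Println(nBytes, nRunes, fromBytes, fromRunes) // 6 2 你好 你好
}
```

Both conversions copy the data, so modifying the resulting slice does not change the original string.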