GCC Inline Assembly

Preface


An introduction to GCC inline assembly, reposted from blog.csdn.net/lhf_tiger/a…

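For ARM-based RISC processors, the GNU C compiler lets you embed assembly code in C source. This very cool feature gives you things plain C cannot, such as hand-optimizing critical parts of the software or using specific processor instructions. This article assumes you are already comfortable writing ARM assembly; it is not an ARM assembly manual, nor a C language manual. It assumes GCC 4, but most of it applies to earlier versions as well.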

I. The GCC asm statement

A small example

Let's start with a simple example. Like any other C statement, the following can appear in your code.

/* NOP example */
asm("mov r0,r0");

==This statement moves r0 into r0. In other words, it does nothing. It is the typical NOP instruction, used for a short delay.==

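Keep reading, though, because this statement does not behave quite like the other C statements you know. Inside inline assembly you write instructions just as you would in a pure assembly program. You can put several instructions into one asm statement, but for readability it is best to put each instruction on its own line.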

asm(
"mov r0, r0\n\t"
"mov r0, r0\n\t"
"mov r0, r0\n\t"
"mov r0, r0"
);

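The newline and tab characters make the instruction list look tidy. It may look odd the first time, but this is exactly how the compiler lays out the assembly it generates from C statements. So far the instructions are no different from what you would write in a pure assembly program. Compared with other C statements, however, asm handles constants and registers differently. The general template for inline assembly is: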

asm(code : output operand list : input operand list : clobber list);

==The link between the assembly and the C code is made through the optional output operand list and input operand list of the asm statement above.== The clobber list is covered later.

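The next example passes a C integer variable to the assembly code, rotates it right by one bit, and hands the result back to another C integer variable.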

/* Rotating bits example */
asm("mov %[result], %[value], ror #1" : [result] "=r" (y) : [value] "r" (x));

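Each asm statement is divided by colons (:) into up to four parts. The assembly instructions go in the first part, between the double quotes.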

"mov %[result], %[value], ror #1"

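==After the first colon comes the optional output operand list. Each entry consists of a symbolic name in square brackets, followed by a constraint string, followed by a C expression in parentheses. This example has just one entry.==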

[result] "=r" (y)

After the next colon comes the input operand list, which uses the same syntax as the output operand list.

[value] "r" (x)

The clobber list is not used in this example.

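==As the NOP example showed, any unused parts at the tail of the asm statement can be left out. Note, however, that if a later part is used, an unused part before it cannot be dropped: it must be left empty with its colon kept.==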

The following example sets the CPSR register of an ARM SoC. It has an input operand but no output operand.

asm("msr cpsr,%[ps]" : : [ps]"r"(status));

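==Even when there is no assembly code at all, the code part must remain, as an empty string. The next example uses a special clobber to tell the compiler that memory has been modified.== This clobber is explained in the optimization section below.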

asm("":::"memory");

To make the code more readable, you can use line breaks, spaces, and C-style comments.

asm("mov %[result], %[value], ror #1"
           : [result]"=r" (y) /* Rotation result. */
           : [value]"r" (x) /* Rotated value. */
           : /* No clobbers */
);

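In the code part, a % followed by a name in square brackets refers to the operand list entry with the same symbolic name: %[result] refers to the C variable y in the second part, and %[value] to the C variable x in the third part. Symbolic operand names live in their own namespace, separate from every other symbol table. Put simply, you do not have to care whether a name is already used in your C code. In older code, the rotation example had to be written like this: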

asm("mov %0, %1, ror #1" : "=r" (result) : "r" (value));

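Here operands are referenced by a % followed by a digit: %0 is the first operand, %1 the second, and so on. The latest compilers still support this form, but it makes code harder to maintain. Imagine you have written a long run of assembly instructions and want to insert an operand: you would have to renumber every operand reference.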
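
As a minimal sketch of the named-operand form in use, the rotation can be wrapped in a small function. The function name `rotate_right_1` and the plain-C fallback branch are assumptions added for illustration, so that the snippet also builds on non-ARM hosts; they are not part of the original example.

```c
#include <stdint.h>

/* Hypothetical helper: rotate a 32-bit value right by one bit. */
static inline uint32_t rotate_right_1(uint32_t x)
{
    uint32_t y;
#if defined(__arm__) && !defined(__thumb__)
    /* ARM state: the instruction from the text, with named operands. */
    __asm__("mov %[result], %[value], ror #1"
            : [result] "=r" (y)    /* output: rotation result */
            : [value]  "r"  (x));  /* input: value to rotate  */
#else
    /* Assumed fallback for other targets: plain C with the same result. */
    y = (x >> 1) | (x << 31);
#endif
    return y;
}
```

Because the operands are named, adding another operand later does not require renumbering anything inside the template string.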

II. C code optimization

There are two situations in which you have to use assembly.

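C keeps you from getting close to the hardware. For example, there is no C statement that directly modifies the processor status register (PSR).
You want more highly optimized code. The GNU C optimizer no doubt does a good job, but its output can still be far from hand-written assembly.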

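One point in this section is important and is the one most often overlooked: assembly that we add to C code through inline asm statements is also processed by the C compiler's optimizer. Let's run an experiment. Here is the sample output for the rotation example.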

bigtree@just:~/embedded/basic-C$ arm-linux-gcc -c test.c
bigtree@just:~/embedded/basic-C$ arm-linux-objdump -D test.o
00309DE5    ldr   r3, [sp, #0]    @ x, x
E330A0E1    mov   r3, r3, ror #1  @ tmp, x
04308DE5    str   r3, [sp, #4]    @ tmp, y

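The compiler chose r3 for the rotation. It could equally have assigned a separate register to each C variable. The load and store around the instruction were not written by us explicitly; the compiler added them. Here is the result from a different compiler: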

E420A0E1 mov r2, r4, ror #1 @ y, x

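This compiler picked a separate register for each operand: it uses the value of x already cached in r4 and leaves the result in r2. Does that make sense so far?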

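Sometimes it gets worse: the compiler may throw away your inline assembly altogether. Whether it does depends on the optimizer's strategy and on the context the asm statement sits in. If an inline asm statement has no output operands, the optimizer may well delete it entirely. Take the NOP example: we might use it as a delay, but to the compiler it only slows the program down and serves no purpose.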

There is a fix: the volatile keyword, which tells the optimizer to leave the statement alone. The revised NOP example:

/* NOP example, revised */
asm volatile("mov r0, r0");
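
To make the delay idea concrete, here is a small sketch of a busy-wait built from a volatile NOP. The names `NOP` and `spin_delay` are hypothetical, and the empty-template branch is an assumption added so the snippet compiles on non-ARM hosts too.

```c
/* Hypothetical macro: one do-nothing statement the optimizer must keep. */
#if defined(__arm__)
#define NOP() __asm__ __volatile__("mov r0, r0")
#else
#define NOP() __asm__ __volatile__("")   /* assumed fallback: empty template */
#endif

/* Executes n NOP statements and returns how many were issued. */
static unsigned spin_delay(unsigned n)
{
    unsigned done = 0;
    while (done < n) {
        NOP();      /* volatile: kept even though it has no outputs */
        done++;
    }
    return done;
}
```

The actual delay per iteration depends on the CPU and compiler options, so a loop like this is only suitable for rough, short waits.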

More trouble awaits. A sophisticated optimizer may also rearrange code. Consider:

i++;
if (j == 1)
    x += 3;
i++;

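The optimizer will certainly reorganize this: the two i++ statements have no effect on the if condition, and adding 2 to i takes a single ARM instruction. So the code is rearranged as follows: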

if (j == 1)
    x += 3;
i += 2;

This saves one ARM instruction. The consequence: there is no guarantee that statements execute in the order you wrote them.

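This can have a big impact on your code, as the following shows. The code below multiplies c by b, where one or both of them may be modified by an interrupt handler. Interrupts are disabled before the multiplication and re-enabled after it.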

asm volatile("mrs r12, cpsr\n\t"
    "orr r12, r12, #0xC0\n\t"
    "msr cpsr_c, r12\n\t" ::: "r12", "cc");
c *= b; /* This may fail. */
asm volatile("mrs r12, cpsr\n"
    "bic r12, r12, #0xC0\n"
    "msr cpsr_c, r12" ::: "r12", "cc");

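Unfortunately, the optimizer may decide to perform the multiplication first and then run the two asm statements, or the other way round, which makes our code pointless.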

The clobber list can help. The clobber list in the example above is:

"r12", "cc"

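This clobber list tells the compiler that the code modifies r12 and the flags of the program status register. By the way, naming a register explicitly may prevent the best optimization result. Usually it is better to pass a variable and let the compiler pick a suitable register. Besides register names and cc (the condition flags of the status register), memory is also a valid keyword in the clobber list. It tells the compiler that the inline assembly changes values in memory. This forces the compiler to store all cached values before the asm code runs and to reload them afterwards. It also preserves execution order, because after an asm statement with a memory clobber the contents of all variables are unpredictable.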

asm volatile("mrs r12, cpsr\n\t"
    "orr r12, r12, #0xC0\n\t"
    "msr cpsr_c, r12\n\t" :: : "r12", "cc", "memory");
c *= b; /* This is safe. */
asm volatile("mrs r12, cpsr\n"
    "bic r12, r12, #0xC0\n"
    "msr cpsr_c, r12" ::: "r12", "cc", "memory");

Invalidating every cached value is suboptimal. Alternatively, you can add dummy operands to create artificial dependencies.

asm volatile("mrs r12, cpsr\n\t"
    "orr r12, r12, #0xC0\n\t"
    "msr cpsr_c, r12\n\t" : "=X" (b) :: "r12", "cc");
c *= b; /* This is safe. */
asm volatile("mrs r12, cpsr\n"
    "bic r12, r12, #0xC0\n"
    "msr cpsr_c, r12" :: "X" (c) : "r12", "cc");

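The first asm statement pretends to modify b, and the second pretends to use c. This preserves the order of the three statements without invalidating other cached variables. Understanding how the optimizer affects inline assembly is important. If it is still hazy at this point, read this section again a few times before moving on ^_^.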

III. Input and output operands

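As we learned above, each input or output operand consists of a symbolic name in square brackets, a constraint string, and a C expression in parentheses. What constraints are there, and why do we need them? Each assembly instruction accepts only specific kinds of operands. For example, the branch instruction expects a target address to jump to, but not every memory address is valid, because the final opcode only holds a 24-bit offset. By contrast, the branch-and-exchange instruction expects a register that contains a 32-bit target address. In both cases the operand passed from C may be the same function pointer. So for every constant, pointer, or variable passed to inline assembly, the compiler has to know how it should be represented in the assembly code.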

For ARM processors, GCC 4 provides the following constraints.

Constraint	Usage in ARM state	Usage in Thumb state
f	Floating point registers f0 .. f7	Not available
h	Not available	Registers r8..r15
G	Immediate floating point constant	Not available
H	Same as G, but negated	Not available
I	Immediate value in data processing instructions e.g. ORR R0, R0, #operand	Constant in the range 0 .. 255 e.g. SWI operand
J	Indexing constants -4095 .. 4095 e.g. LDR R1, [PC, #operand]	Constant in the range -255 .. -1 e.g. SUB R0, R0, #operand
K	Same as I, but inverted	Same as I, but shifted
L	Same as I, but negated	Constant in the range -7 .. 7 e.g. SUB R0, R1, #operand
l	Same as r	Registers r0..r7 e.g. PUSH operand
M	Constant in the range of 0 .. 32 or a power of 2 e.g. MOV R2, R1, ROR #operand	Constant that is a multiple of 4 in the range of 0 .. 1020 e.g. ADD R0, SP, #operand
m	Any valid memory address
N	Not available	Constant in the range of 0 .. 31 e.g. LSL R0, R1, #operand
O	Not available	Constant that is a multiple of 4 in the range of -508 .. 508 e.g. ADD SP, #operand
r	General register r0 .. r15 e.g. SUB operand1, operand2, operand3	Not available
w	Vector floating point registers s0 .. s31	Not available
X	Any operand

A constraint character may be preceded by a single modifier. Operands without a modifier are read-only.

Modifier	Specifies
=	Write-only operand, usually used for all output operands
+	Read-write operand, must be listed as an output operand
&	A register that should be used for output only

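Output operands must be write-only, and the corresponding C expression must be an lvalue. Input operands must be read-only. The C compiler cannot check this, so the strict rule is: never write to an input operand. If you want to use the same operand as both input and output, the + modifier does the job: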

asm("mov %[value], %[value], ror #1" : [value] "+r" (y));

Unlike the earlier example, the result is stored back in the input variable.

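The + modifier may not be supported by earlier compiler versions. Fortunately there is another solution, which still works with the latest compilers. An input operand's constraint string may consist of a single digit n, which tells the compiler to use the same register as operand n, counting from 0. Here is an example: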

asm("mov %0, %0, ror #1" : "=r" (value) : "0" (value));

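The constraint "0" tells the compiler to use the same register for this input as for the first output operand. Note that the reverse is not implied: even if you do not ask for it, the compiler may still choose the same register for an input and an output. In the first example it chose r3 for both.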

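In most cases this does not matter, but it is fatal if the output is written before the input has been used. When input and output must be in different registers, add the & modifier to the output operand. Here is an example: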

asm volatile("ldr %0, [%1]" "\n\t"
             "str %2, [%1, #4]" "\n\t"
             : "=&r" (rdv)
             : "r" (&table), "r" (wdv)
             : "memory");

It reads a value from a table and then writes another value to a different location in the same table.
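
The same read-then-write pattern can be sketched as a complete function with named operands. The function name `read_then_write` and the plain-C fallback are assumptions for illustration only; the ARM branch keeps the early-clobber "=&r" output so the result register can never coincide with the table pointer or the value still waiting to be stored.

```c
#include <stdint.h>

/* Hypothetical helper: returns table[0] and stores wdv into table[1]. */
static inline uint32_t read_then_write(uint32_t *table, uint32_t wdv)
{
    uint32_t rdv;
#if defined(__arm__)
    __asm__ __volatile__(
        "ldr %[rdv], [%[tab]]\n\t"
        "str %[wdv], [%[tab], #4]"
        : [rdv] "=&r" (rdv)                   /* written before inputs are done */
        : [tab] "r" (table), [wdv] "r" (wdv)
        : "memory");
#else
    /* Assumed fallback for non-ARM hosts. */
    rdv = table[0];
    table[1] = wdv;
#endif
    return rdv;
}
```

Without the & modifier the compiler would be free to place rdv in the same register as table or wdv, and the ldr would then destroy a value the str still needs.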

IV. More

Inline assembly as preprocessor macros

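If you reuse a piece of assembly often, the best approach is to define it as a macro in a header file. Using such a header in strict ANSI mode produces warnings. To avoid them, write __asm__ instead of asm and __volatile__ instead of volatile; these are equivalent aliases. Here is an example: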

#define BYTESWAP(val) \
    __asm__ __volatile__ ( \
        "eor r3, %1, %1, ror #16\n\t" \
        "bic r3, r3, #0x00FF0000\n\t" \
        "mov %0, %1, ror #8\n\t" \
        "eor %0, %0, r3, lsr #8" \
        : "=r" (val) \
        : "0"(val) \
        : "r3", "cc" \
    );

C stub functions

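A macro inserts the same code everywhere it is used, which is not acceptable for larger routines. In that case it is better to define a stub function.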

unsigned long ByteSwap(unsigned long val)
{
    asm volatile (
        "eor r3, %1, %1, ror #16\n\t"
        "bic r3, r3, #0x00FF0000\n\t"
        "mov %0, %1, ror #8\n\t"
        "eor %0, %0, r3, lsr #8"
        : "=r" (val)
        : "0"(val)
        : "r3"
    );
    return val;
}

Replacing the symbol name of a C variable

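By default, GCC uses the same symbol name in assembly as the function or variable has in C. With an asm declaration you can specify a different symbol name for the assembly code.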

unsigned long value asm("clock") = 3686400;

This declaration tells the compiler to use the symbol name clock rather than value.

Replacing the symbol name of a C function

To change a function's name you need a prototype declaration, because the compiler does not accept the asm keyword in a function definition.

extern long Calc(void) asm ("CALCULATE");

Calling the function Calc() generates assembly instructions that call the function CALCULATE.
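
Both kinds of renaming can be tried together in one small file. This is only a sketch: the symbol names `demo_clock_hz` and `DEMO_CALCULATE` are hypothetical, chosen so they do not collide with anything in the C library, and the body of Calc is invented for the demonstration.

```c
/* The variable is called value in C but demo_clock_hz in the object file. */
unsigned long value __asm__("demo_clock_hz") = 3686400UL;

/* The prototype carries the assembler name; the definition follows it. */
long Calc(void) __asm__("DEMO_CALCULATE");

long Calc(void)
{
    /* C code keeps using the C names; only the emitted symbols differ. */
    return (long)(value / 1000UL);
}
```

Inspecting the compiled object with a symbol listing tool such as nm should show the assembler names rather than value and Calc.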

Forcing the use of a specific register

A local variable may be held in a register. You can use inline assembly to specify a particular register for that variable.

void Count(void) {
    register unsigned char counter asm("r3");

    /* ... some code ... */
    asm volatile("eor r3, r3, r3");
    /* ... more code ... */
}

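The instruction "eor r3, r3, r3" clears r3. Warning: this example is flawed in most situations because it conflicts with the optimizer. GCC does not fully reserve the register: if the optimizer decides the variable is not referenced for a while, the register may be reused. The compiler also cannot check whether your choice conflicts with registers it has predefined for other purposes. And if you pin too many registers this way, the compiler may run out of registers during code generation.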

Using registers temporarily

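If you use a register that is not passed as an input or output operand, you must tell the compiler. The following example uses r3 as a scratch register and lists r3 in the clobber list so the compiler knows. Because the ands instruction updates the status register flags, cc is listed as well.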

asm volatile(
    "ands r3, %1, #3" "\n\t"
    "eor %0, %0, r3" "\n\t"
    "addne %0, #4"
    : "=r" (len)
    : "0" (len)
    : "cc", "r3"
  );

The best approach is to write a stub function and use a local variable for the temporary value.
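
A sketch of that advice: the fixed r3 scratch register is replaced by a local variable that the compiler assigns to a register of its own choosing. The function name `round_up_4`, the use of a "+r" operand for len, and the plain-C fallback are assumptions made for this illustration.

```c
/* Hypothetical stub: round len up to the next multiple of 4. */
static inline unsigned round_up_4(unsigned len)
{
#if defined(__arm__) && !defined(__thumb__)
    unsigned tmp;   /* scratch value; the compiler picks the register */
    __asm__(
        "ands  %[tmp], %[len], #3\n\t"       /* tmp = len & 3, sets flags */
        "eor   %[len], %[len], %[tmp]\n\t"   /* clear the low two bits    */
        "addne %[len], %[len], #4"           /* add 4 if tmp was non-zero */
        : [len] "+r" (len), [tmp] "=&r" (tmp)
        :
        : "cc");
#else
    /* Assumed fallback for other targets, same arithmetic in C. */
    unsigned tmp = len & 3u;
    len ^= tmp;
    if (tmp != 0)
        len += 4;
#endif
    return len;
}
```

Only cc remains in the clobber list, because no register is named directly any more.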

Using constants

The mov instruction can load an immediate constant of the form n * 2^x, where n is in the range 0 to 255 and x is an even number in the range 0 to 24. Because of rotation, x may also be set to 26, 28 or 30, in which case bits 37 to 32 are folded to bits 5 to 0 respectively. Last but not least, the binary complement of these values may be given when using mvn instead of mov.

Sometimes you need to jump to a fixed memory address, which may be defined by a preprocessor macro. You can use the following assembly code:

ldr  r3, =JMPADDR
bx   r3

This will work with any legal address value. If the constant fits (for example 0x20000000), then the smart assembler will convert this to

mov  r3, #0x20000000
bx   r3

If it doesn't fit (for example 0x00F000F0), then the assembler will load the value from the literal pool.

ldr  r3, .L1
bx   r3
...
.L1: .word 0x00F000F0

With inline assembly it works in the same way. But instead of using ldr, you can simply provide a constant as a register value:

asm volatile("bx %0" : : "r" (JMPADDR));

Depending on the actual value of the constant, either mov, ldr or any of its variants is used. If JMPADDR is defined as 0xFFFFFF00, then the resulting code will be similar to

mvn  r3, #0xFF
bx   r3

The real world is more complicated. We may need to load a specific register with a constant. Let's assume that we want to call a subroutine but return to a different address than the one following our branch. This can be useful when embedded firmware returns from main. In this case we need to load the link register. Here is the assembly code:

ldr  lr, =JMPADDR
ldr  r3, main
bx   r3

Any idea how to implement this in inline assembly? Here is a solution:

asm volatile(
    "mov lr, %1\n\t"
    "bx %0\n\t"
    : : "r" (main), "I" (JMPADDR));

But there is still a problem. We use mov here, and this will work as long as the value of JMPADDR fits. The resulting code will be the same as what we get in pure assembly code. If it doesn't fit, then we need ldr instead. But unfortunately there is no way to express

ldr  lr, =JMPADDR

in inline assembly. Instead, we must write

asm volatile(
    "mov lr, %1\n\t"
    "bx %0\n\t"
    : : "r" (main), "r" (JMPADDR));

Compared to the pure assembly code, we end up with an additional statement, using an additional register.

  ldr     r3, .L1
  ldr     r2, .L2
  mov     lr, r2
  bx      r3

Register usage

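A good habit is to study the assembly listing after compilation and learn from the code the C compiler generates. The table below shows how the compiler typically uses the ARM core registers; knowing this helps when reading the code.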

Register	Alt. Name	Usage
r0	a1	First function argument; integer function result; scratch register
r1	a2	Second function argument; scratch register
r2	a3	Third function argument; scratch register
r3	a4	Fourth function argument; scratch register
r4	v1	Register variable
r5	v2	Register variable
r6	v3	Register variable
r7	v4	Register variable
r8	v5	Register variable
r9	v6, rfp	Register variable; real frame pointer
r10	sl	Stack limit
r11	fp	Argument pointer
r12	ip	Temporary workspace
r13	sp	Stack pointer
r14	lr	Link register; workspace
r15	pc	Program counter

Common pitfalls

Instruction sequence

Developers often expect that a sequence of instructions remains in the final code as specified in the source code. This assumption is wrong and often introduces hard-to-find bugs. In fact, asm statements are processed by the optimizer in the same way as other C statements. They may be rearranged if dependencies allow this.

The chapter "C code optimization" discusses the details and offers solutions.

Defining a variable as a specific register

Even if a variable has been forcibly assigned to a specific register, the resulting code may not work as expected. Consider the following snippet:

int foo(int n1, int n2) {
  register int n3 asm("r7") = n2;
  asm("mov r7, #4");
  return n3;
}

The compiler is instructed to use r7 for the local variable n3, which is initialized by parameter n2. Then the inline assembly statement sets r7 to 4, which should finally be returned. However, this may go completely wrong. Remember that the compiler cannot see what is happening inside the inline assembly. But the optimizer is smart about the C code, generating the following assembly code.

foo:
  mov r7, #4
  mov r0, r1
  bx  lr

Instead of returning r7, the value of n2 is returned, which was passed to our function in r1. What happened here? Well, while the final code still contains our inline assembly statement, the C code optimizer decided that n3 is not required. It directly returns parameter n2 instead.

Just assigning a variable to a fixed register does not mean that the C compiler will use that variable. We still have to tell the compiler that a variable is modified inside the inline assembly operation. For the given example, we need to extend the asm statement with an output operand:

asm("mov %0, #4" : "=l" (n3));

Now the C compiler is aware that n3 is modified and will generate the expected result:

foo:
  push {r7, lr}
  mov  r7, #4
  mov  r0, r7
  pop  {r7, pc}

Executing in Thumb state

Be aware that, depending on the given compile options, the compiler may switch to Thumb state. Using inline assembly with instructions that are not available in Thumb state will result in cryptic compile errors.

Assembly code size

In most cases the compiler will correctly determine the size of the assembler instruction, but it may become confused by assembler macros. Better avoid them. In case you are confused: this is about assembly language macros, not C preprocessor macros. It is fine to use the latter.

Labels

Within the assembler instruction you can use labels as jump targets. However, you must not jump from one assembler instruction into another. The optimizer knows nothing about those branches and may generate bad code.

Preprocessor macros

Inline assembly instructions cannot contain preprocessor macros, because for the preprocessor these instructions are nothing but string constants. If your assembly code must refer to values that are defined by macros, see the chapter about "Using constants" above.

V. The clobber list in detail

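Clobber descriptors tell the compiler which registers or memory we have used. They are comma-separated strings, each describing one case, usually a register name; besides registers there is also "memory". For example: "%eax", "%ebx", "memory".

"memory" is special and may be the hardest part of inline assembly to understand. To explain it, we first look at some compiler optimization background, then at the C keyword volatile, and finally at the descriptor itself.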

1. Compiler optimization

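Memory access is far slower than the CPU, so to improve overall performance the hardware adds a cache that speeds up memory access. In addition, a modern CPU does not necessarily execute instructions strictly in order: independent instructions may be executed out of order to keep the pipeline full. Those are hardware-level optimizations. At the software level, optimization is done either by the programmer when writing the code or by the compiler. Common compiler techniques are caching memory variables in registers and reordering instructions, most often loads and stores, to make better use of the pipeline. For ordinary memory these optimizations are transparent and very effective.

Problems caused by compiler optimization or hardware reordering are solved by placing a memory barrier between operations that must appear, from the point of view of the hardware (or of another processor), to execute in a specific order. Linux provides a macro that addresses the compiler side of the ordering problem: void barrier(void). It tells the compiler to insert a memory barrier but has no effect on the hardware. The compiled code writes all modified values currently held in CPU registers back to memory and reads them from memory again when they are next needed.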

2. The C keyword volatile

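The C keyword volatile (note that here it qualifies a variable, unlike the __volatile__ described above) indicates that a variable's value may be changed from outside. Accesses to such a variable must not be cached in a register; the value has to be read or written again on every use. The keyword is often seen in multithreaded code, where several threads may modify the same variable and the program uses that variable to synchronize them. For example: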

DWORD __stdcall threadFunc(LPVOID signal)
{
    int* intSignal = reinterpret_cast<int*>(signal);
    *intSignal = 2;
    while (*intSignal != 1)
        sleep(1000);
    return 0;
}

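When this thread starts it sets intSignal to 2 and then loops until intSignal becomes 1. Obviously the value has to be changed from outside, or the thread will never exit. In practice, though, the thread does not exit even after the value is set to 1 from outside. The corresponding pseudo-assembly shows why: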

mov ax,signal
label:
if(ax!=1)
goto label

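The C compiler does not know that another thread will modify this value, so it naturally caches it in a register. Remember, the C compiler has no notion of threads! This is where volatile comes in. Its meaning is: this value may be changed from outside the current thread. So we add the volatile keyword to intSignal in threadFunc. The compiler then knows the value can change externally and re-reads it on every access, and the loop becomes the following pseudo-code: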

label:
mov ax,signal
if(ax!=1)
goto label
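
The effect of the qualifier can be sketched in plain C with a flag that is polled a bounded number of times. The names `signal_flag` and `wait_for_signal` are hypothetical, and the bounded loop is an assumption so the example always terminates. Keep in mind that volatile only stops the compiler from caching the variable; it is not a complete thread-synchronization mechanism by itself.

```c
/* Hypothetical flag that something outside this function may change. */
static volatile int signal_flag = 2;

/* Polls the flag up to max_polls times.
 * Returns 1 as soon as the flag reads 1, otherwise 0. */
static int wait_for_signal(unsigned max_polls)
{
    unsigned i;
    for (i = 0; i < max_polls; i++) {
        if (signal_flag == 1)   /* volatile: loaded from memory on every pass */
            return 1;
    }
    return 0;
}
```

Without the volatile qualifier the compiler would be allowed to read signal_flag once before the loop and reuse that value on every iteration.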

3. Memory

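With that background the memory clobber is easy to understand. It tells GCC: (1) do not reorder this inline assembly with the preceding instructions, that is, everything before it must have completed before the asm code runs; (2) do not keep variables cached in registers across it, because the code may use memory variables that change in unpredictable ways. GCC therefore inserts the code needed to write register-cached values back to memory first, and any later access to those variables goes to memory again. If the assembly modifies memory that GCC cannot know about because it is not described in the output part, add "memory" to the clobber list. GCC will then emit the instructions needed before the asm statement to write back values it had cached in registers, and will reload those variables if they are used again. ==Declaring the variables "volatile" achieves the same thing, but adding that keyword to every variable is less convenient than using "memory".==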
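
A minimal sketch of a compiler barrier built from an empty asm statement with a "memory" clobber. The names `compiler_barrier`, `payload`, `ready`, `publish` and `try_consume` are hypothetical. The barrier constrains only the compiler; it does not order accesses at the hardware level.

```c
/* Hypothetical name: empty template plus "memory" clobber. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

static int payload;   /* data being handed over    */
static int ready;     /* set once payload is valid */

/* Store the data first, then raise the flag. */
static void publish(int value)
{
    payload = value;
    compiler_barrier();   /* the store to payload may not be moved below this */
    ready = 1;
}

/* Returns 1 and fills *out if data has been published, else 0. */
static int try_consume(int *out)
{
    if (!ready)
        return 0;
    compiler_barrier();   /* payload is read from memory after the flag check */
    *out = payload;
    return 1;
}
```

Neither variable needs to be declared volatile here, which is the convenience the paragraph above refers to.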