[Notes] Understanding glibc malloc

1. Heap memory management

ptmalloc2 – glibc

gcc -o mthread mthread.c -lpthread

cat /proc/[pid]/maps

/* Per thread arena example. */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>

void* threadFunc(void* arg) {
        printf("Before malloc in thread 1\n");
        getchar();
        char* addr = (char*) malloc(1000);
        printf("After malloc and before free in thread 1\n");
        getchar();
        free(addr);
        printf("After free in thread 1\n");
        getchar();
        return NULL; /* a thread function declared void* must return a value */
}

int main() {
        pthread_t t1;
        void* s;
        int ret;
        char* addr;

        printf("Welcome to per thread arena example::%d\n",getpid());
        printf("Before malloc in main thread\n");
        getchar();
        addr = (char*) malloc(1000);
        printf("After malloc and before free in main thread\n");
        getchar();
        free(addr);
        printf("After free in main thread\n");
        getchar();
        ret = pthread_create(&t1, NULL, threadFunc, NULL);
        if(ret)
        {
                printf("Thread creation error\n");
                return -1;
        }
        ret = pthread_join(t1, &s);
        if(ret)
        {
                printf("Thread join error\n");
                return -1;
        }
        return 0;
}

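1. Before malloc in main thread:

There is no heap segment yet.

2. After malloc in main thread:

The heap is created with the brk system call and sits directly above the data segment.

3. After free in main thread:

The heap memory is not returned to the operating system. It stays under malloc's control: the freed chunk is added to one of the main arena's bins. On later requests glibc malloc first tries to find a suitable chunk in the bins, and only asks the operating system for more heap space when none fits.

4. Before malloc in thread1:

Before thread1 calls malloc it has no heap segment, but its stack has already been allocated.

5. After malloc in thread1:

thread1's heap segment now exists. Its start address shows that it was obtained with mmap rather than brk: the region b7500000-b7600000 (1 MB in total) is not adjacent to the program's data segment. The 1 MB is also split in two by permissions: 0xb7500000-0xb7521000 (132 KB) is readable and writable, and the rest has no read or write permission. Only the 132 KB read-write part is thread1's heap, that is, the thread1 arena.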
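
The layout seen for thread1 (a small read-write region at the start of a larger inaccessible mapping) can be reproduced with plain mmap and mprotect. The sketch below is not glibc's code: the helper name reserve_heap and the sizes chosen by the caller are made up for illustration, and it assumes Linux.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS and MAP_NORESERVE on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: reserve `reserve` bytes of address space with no
   access rights, then make only the first `commit` bytes read-write.
   Returns NULL on failure. */
static void *reserve_heap(size_t reserve, size_t commit)
{
    assert(commit <= reserve);
    void *base = mmap(NULL, reserve, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    if (mprotect(base, commit, PROT_READ | PROT_WRITE) != 0) {
        munmap(base, reserve);
        return NULL;
    }
    return base;
}
```

Calling reserve_heap(1024 * 1024, 132 * 1024) and then reading /proc/[pid]/maps should show two adjacent regions, one rw-p and one ---p, much like the thread1 mapping described above.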

2. Understanding arenas

1. Limit on the number of arenas

For 32-bit systems: number of arenas = 2 * number of cores. For 64-bit systems: number of arenas = 8 * number of cores.
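
The rule above is simple enough to write down directly. In this small sketch the function name arena_limit and its parameters are illustrative only; the width of long is passed in so that both cases can be tried on any machine.

```c
#include <assert.h>
#include <stddef.h>

/* Arena limit from the rule above. `long_size` is sizeof(long) on the
   target: 4 on 32-bit systems, 8 on 64-bit ones. */
static size_t arena_limit(size_t n_cores, size_t long_size)
{
    assert(n_cores > 0);
    return n_cores * (long_size == 4 ? 2 : 8);
}
```

For example, arena_limit(1, 4) gives 2 for a single-core 32-bit machine and arena_limit(4, 8) gives 32 for a four-core 64-bit machine.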

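2. Managing multiple arenas

Take a single-core PC running a 32-bit operating system and a multithreaded application with 4 threads (1 main thread + 3 user threads).

The number of threads exceeds the number of arenas, so glibc malloc makes the 4 threads share 3 arenas (2 * cores + 1 = 3, counting the main arena on top of the two thread arenas).

When the main thread calls malloc for the first time, glibc malloc simply hands it the main arena, with no further conditions.

When user1 and user2 call malloc for the first time, glibc malloc creates a new thread arena for each of them. At this point threads and arenas map one to one.

When user3 calls malloc for the first time there is a problem: the number of arenas glibc malloc can maintain has reached its limit, so it cannot create another arena for user3 and has to reuse an existing one.

1) glibc malloc loops over the existing arenas and tries to lock each one; the first arena it locks successfully is returned.

2) If no arena can be locked, user3's malloc call blocks until one becomes available.

3) When user3 calls malloc again, it first tries the arena it used most recently: if that arena is free it is used directly, otherwise the thread blocks.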
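
The reuse steps above amount to a try-lock loop with a blocking fallback. The following is a toy model, not glibc's implementation: struct toy_arena, pick_arena and release_arena are hypothetical names, and a C11 atomic_flag with a busy-wait stands in for the real mutex.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy arena: only the lock matters for this sketch. */
struct toy_arena {
    atomic_flag busy;   /* set while some thread is using the arena */
};

/* Try each arena once, starting at `start` (for example the arena this
   thread used last time). Returns the index of the arena that was locked.
   If every arena is busy, wait on the starting one until it is released. */
static size_t pick_arena(struct toy_arena *arenas, size_t n, size_t start)
{
    assert(n > 0 && start < n);
    for (size_t i = 0; i < n; i++) {
        size_t idx = (start + i) % n;
        if (!atomic_flag_test_and_set(&arenas[idx].busy))
            return idx;             /* try-lock succeeded */
    }
    while (atomic_flag_test_and_set(&arenas[start].busy))
        ;                           /* stand-in for a blocking lock */
    return start;
}

/* Unlock an arena so that other threads can pick it. */
static void release_arena(struct toy_arena *arenas, size_t idx)
{
    atomic_flag_clear(&arenas[idx].busy);
}
```

Starting the scan at the most recently used arena is what makes step 3 cheap in the common case: if that arena is free, the loop returns on its first iteration.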

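3. Heap management with multiple threads

1. Three data structures:

heap_info: Heap Header

A thread arena (this does not apply to the main arena) can contain several heaps, and each heap has its own header. When a heap runs out of space, malloc obtains a new heap with mmap and adds it to the current thread arena.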

typedef struct _heap_info
{
  mstate ar_ptr; /* Arena for this heap. */
  struct _heap_info *prev; /* Previous heap. */
  size_t size;   /* Current size in bytes. */
  size_t mprotect_size; /* Size in bytes that has been mprotected
                           PROT_READ|PROT_WRITE.  */
  /* Make sure the following data is properly aligned, particularly
     that sizeof (heap_info) + 2 * SIZE_SZ is a multiple of
     MALLOC_ALIGNMENT. */
  char pad[-6 * SIZE_SZ & MALLOC_ALIGN_MASK];
} heap_info;

malloc_state: Arena Header

Each arena has exactly one arena header, even a thread arena with several heaps. It holds the bins, the top chunk, the last remainder chunk, and so on.

struct malloc_state
{
  /* Serialize access.  */
  mutex_t mutex;

  /* Flags (formerly in max_fast).  */
  int flags;

  /* Fastbins */
  mfastbinptr fastbinsY[NFASTBINS];

  /* Base of the topmost chunk -- not otherwise kept in a bin */
  mchunkptr top;

  /* The remainder from the most recent split of a small request */
  mchunkptr last_remainder;

  /* Normal bins packed as described above */
  mchunkptr bins[NBINS * 2 - 2];

  /* Bitmap of bins */
  unsigned int binmap[BINMAPSIZE];

  /* Linked list */
  struct malloc_state *next;

  /* Linked list for free arenas.  */
  struct malloc_state *next_free;

  /* Memory allocated from the system in this arena.  */
  INTERNAL_SIZE_T system_mem;
  INTERNAL_SIZE_T max_system_mem;
};

malloc_chunk: Chunk Header

A heap is divided into many chunks. The size of each chunk is determined by the size argument the user passes to malloc.

struct malloc_chunk {

  INTERNAL_SIZE_T      prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};

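Unlike a thread arena, the main arena's arena header is not part of the sbrk heap segment. It is a global variable, so it lives in the data segment of libc.so.

2. Relationship between heap segments and arenas

Memory layout of a main arena and a thread arena that each have a single heap segment:

A thread arena that contains several heap segments:

Here the thread arena has a single malloc_state but two heap_info structures. Because heap segments are obtained with mmap, the two are not adjacent in memory. To keep track of them, libc malloc makes the prev member of the second heap_info point to the first heap_info (that is, to its ar_ptr member, which sits at the start of the struct), and the ar_ptr of the first heap_info points to the malloc_state, which forms a singly linked list.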
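
The linking just described can be mimicked with two reduced structures. This is only a sketch: struct toy_heap keeps just the two links discussed here, and the newest field on struct toy_arena is a hypothetical shortcut added for the example, not a member of the real malloc_state.

```c
#include <assert.h>
#include <stddef.h>

struct toy_arena;                 /* stands in for malloc_state */

/* Reduced heap_info: only the two links discussed above. */
struct toy_heap {
    struct toy_arena *ar_ptr;     /* arena that owns this heap */
    struct toy_heap  *prev;       /* previously created heap, or NULL */
};

struct toy_arena {
    struct toy_heap *newest;      /* hypothetical field: most recent heap */
};

/* Count the heaps of an arena by following the prev links. */
static size_t count_heaps(const struct toy_arena *a)
{
    size_t n = 0;
    for (const struct toy_heap *h = a->newest; h != NULL; h = h->prev) {
        assert(h->ar_ptr == a);   /* every heap points back to its arena */
        n++;
    }
    return n;
}
```

With two heaps linked as in the description (the second heap's prev set to the first, the first heap's prev set to NULL), count_heaps returns 2.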

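Understanding chunks

glibc malloc divides the whole heap into contiguous chunks of varying size, so the chunk is the smallest unit it operates on. There are four kinds of chunk in total: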

Allocated chunk
Free chunk
Top chunk
Last Remainder chunk

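Put simply there are only two kinds: chunks that have been handed to the user, and chunks that are not in use.

The two are told apart by flag bits stored at specific positions inside the chunk.

The core goal is to allocate and reclaim chunks efficiently, which is why several different schemes exist.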

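Implicit list:

Boundary information (marking where each block starts and ends, and whether it is allocated or free) is embedded in the chunk itself as part of the chunk.

Every chunk size must be a multiple of 8 (16 on typical 64-bit builds), so the lowest 3 bits of the size field carry no size information. To avoid wasting them, the heap manager uses these 3 bits as chunk flags; bit 0, for example, marks whether the chunk has been allocated.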
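
To see why the low bits of the size field are always zero, it helps to round a request up by hand. The sketch below is modelled on the rounding glibc performs, but the function name chunk_size_for and the size_sz parameter are illustrative, and the alignment (2 * size_sz) and minimum chunk size (4 * size_sz) are assumptions that match common builds rather than every configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Round a user request up to a chunk size. `size_sz` is the width of the
   size field: 4 on 32-bit systems, 8 on 64-bit ones. Assumes an alignment
   of 2 * size_sz and a minimum chunk size of 4 * size_sz. */
static size_t chunk_size_for(size_t request, size_t size_sz)
{
    assert(size_sz == 4 || size_sz == 8);
    size_t align_mask = 2 * size_sz - 1;
    size_t min_size = 4 * size_sz;
    size_t padded = request + size_sz + align_mask;  /* header + round up */
    return padded < min_size ? min_size : (padded & ~align_mask);
}
```

Under these assumptions malloc(1000) from the example program maps to a 1008-byte chunk, and every result has its lowest 3 bits clear, which is what leaves room for the flags.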

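The padding in an allocated chunk is mainly there for memory alignment.

Organizing the whole heap as one contiguous sequence of allocated and free chunks gives an implicit list, with the following memory layout:

The list is linked implicitly through each chunk's size field. To allocate, the manager walks every chunk in the heap and inspects its size field until it finds a suitable one. The drawback is that reclaiming memory is inefficient: there is no good way to merge several adjacent free chunks. Splitting without merging leads to fragmentation, so the design evolved into chunk coalescing with boundary tags.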
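
A first-fit walk over an implicit list can be shown on a tiny array. This is a toy model with made-up conventions, not glibc's layout: the heap is an array of 32-bit words, each block starts with one header word, sizes are counted in words (header included), and the names first_fit and ALLOCATED are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALLOCATED 0x1u   /* bit 0 of the header marks an allocated block */

/* Each block starts with one header word holding
   (block size in words | ALLOCATED flag). Sizes are even, so bit 0 is
   free to carry the flag. Returns the index of the first free block of
   at least `want` words, or -1 if there is none. */
static ptrdiff_t first_fit(const uint32_t *heap, size_t n_words, uint32_t want)
{
    size_t pos = 0;
    while (pos < n_words) {
        uint32_t header = heap[pos];
        uint32_t size = header & ~ALLOCATED;
        assert(size > 0);                   /* a zero size would loop forever */
        if (!(header & ALLOCATED) && size >= want)
            return (ptrdiff_t)pos;          /* first free block big enough */
        pos += size;                        /* the size field is the "link" */
    }
    return -1;
}
```

Nothing in this layout lets the walk go backwards, which is the weakness the boundary-tag scheme addresses.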

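Evolution: coalescing with boundary tags

A footer, which is a copy of the chunk's header, is added at the end of every chunk. Each chunk's footer sits in the 4 bytes just before the header of the next chunk, so from any chunk the manager can read the previous chunk's footer, find where the previous chunk starts and whether it is allocated, and merge the two easily.

However, if every chunk carries both a header and a footer, an application that frequently allocates and frees small blocks pays a lot of overhead. The footer is also only needed when merging free chunks, not for allocated ones. This allows an optimization: store the allocated/free flag of the previous chunk in one of the low bits of the current chunk's size field, so that the current chunk's size field alone tells whether the previous chunk is free.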
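
Backward merging with a footer and a "previous chunk in use" bit can be sketched on a small word array. Everything here is a simplified assumption for illustration: sizes are counted in 32-bit words, a free chunk keeps its size in its own last word, and free_and_merge_back is a hypothetical name rather than a glibc function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PREV_INUSE 0x1u  /* bit 0 of a header: the previous chunk is in use */

/* Free the chunk whose header is at `pos`, merging it with the previous
   chunk when that one is free. Returns the index of the resulting free
   chunk. Layout assumption: header word = (size in words | PREV_INUSE),
   and a free chunk stores its size again in its last word (the footer). */
static size_t free_and_merge_back(uint32_t *heap, size_t n_words, size_t pos)
{
    assert(pos < n_words);
    uint32_t size = heap[pos] & ~PREV_INUSE;
    size_t start = pos;
    if (!(heap[pos] & PREV_INUSE)) {
        uint32_t prev_size = heap[pos - 1];   /* footer of the previous chunk */
        start = pos - prev_size;              /* jump straight to its header */
        size += prev_size;
    }
    heap[start] = size | (heap[start] & PREV_INUSE);
    heap[start + size - 1] = size;            /* footer of the merged chunk */
    if (start + size < n_words)
        heap[start + size] &= ~PREV_INUSE;    /* next chunk: previous is free */
    return start;
}
```

The footer is read only when the PREV_INUSE bit is clear, which mirrors the optimization described above: allocated chunks never need one.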

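Further evolution: multithreading support

New flag bits are needed to record whether the current chunk belongs to a thread arena, and whether it came from mmap or from brk.

PREV_INUSE(P): whether the previous chunk is allocated.

IS_MMAPPED(M): whether the current chunk was obtained through the mmap system call.

NON_MAIN_ARENA(N): whether the current chunk belongs to a thread arena.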
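
The three flags share the size field with the size itself, so reading either one is a matter of masking. The bit values below follow glibc's definitions (0x1, 0x2, 0x4); the small helper functions are illustrative wrappers written for this note, not glibc's own macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PREV_INUSE      0x1
#define IS_MMAPPED      0x2
#define NON_MAIN_ARENA  0x4
#define SIZE_BITS       (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)

/* Real chunk size with the three flag bits masked off. */
static size_t chunk_size(size_t size_field)
{
    return size_field & ~(size_t)SIZE_BITS;
}

/* P: the previous chunk is allocated. */
static bool prev_inuse(size_t size_field)
{
    return (size_field & PREV_INUSE) != 0;
}

/* M: this chunk was obtained with mmap. */
static bool is_mmapped(size_t size_field)
{
    return (size_field & IS_MMAPPED) != 0;
}

/* N: this chunk belongs to a thread arena. */
static bool non_main_arena(size_t size_field)
{
    return (size_field & NON_MAIN_ARENA) != 0;
}
```

A size field of 1013 (1008 | PREV_INUSE | NON_MAIN_ARENA), for instance, decodes to a 1008-byte chunk in a thread arena whose previous chunk is in use.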

Top Chunk

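The chunk at the highest address of an arena is called the top chunk.

It does not belong to any bin.

The top chunk is handed to the user only when none of the free chunks currently in the bins can satisfy the requested size.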

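if (top chunk size > requested size) {

split the top chunk in two:

1) a chunk of the requested size, returned to the user

2) the rest, which becomes the new top chunk

} else {

grow the heap:

1) in the main arena, extend the heap with sbrk

2) in a thread arena, extend the heap with mmap

}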
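
The split branch of the pseudocode above can be turned into a few lines of real code. This is a toy model under simplifying assumptions: the top chunk is tracked as an offset and a size rather than as a malloc_chunk, the names struct toy_top and split_top are hypothetical, and growing the heap is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy top chunk: an offset into the heap plus the bytes left above it. */
struct toy_top {
    size_t start;
    size_t size;
};

/* Carve `need` bytes off the bottom of the top chunk. On success the
   allocated offset is stored in *out and the rest stays as the new top.
   Returns false when the top chunk is too small, which is the point
   where a real allocator would grow the heap (sbrk or mmap). */
static bool split_top(struct toy_top *top, size_t need, size_t *out)
{
    assert(top != NULL && out != NULL);
    if (top->size <= need)
        return false;
    *out = top->start;
    top->start += need;
    top->size -= need;
    return true;
}
```

Two successive 1008-byte requests against a 4096-byte top chunk land at offsets 0 and 1008 and leave a 2080-byte top chunk; a 4096-byte request then fails and would trigger the else branch.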

Last Remainder Chunk

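When the user requests a small chunk and the request cannot be satisfied by the small bins or the unsorted bin, malloc walks the bins via the binmaps to find the best-fitting chunk. If that chunk has space left over after the split, the leftover becomes a new chunk that is added to the unsorted bin, and it also becomes the new last remainder chunk.

This kind of chunk exists to speed up consecutive malloc(small chunk) calls, mainly by improving the locality of allocations.

When the user requests a small chunk that the small bins cannot satisfy, the request is passed on to the unsorted bin. If the unsorted bin holds exactly one chunk and that chunk is the last remainder chunk, it is split in two: the first part goes to the user, and the rest is put back in the unsorted bin and becomes the new last remainder chunk. As a result, the small chunks returned by consecutive malloc(small chunk) calls are adjacent in memory, which improves the locality of allocations.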
