Memory Alignment

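1. Why is memory alignment needed? Roughly speaking, a 64-bit machine reads at most 8 bytes per memory access, and a 32-bit machine at most 4. As the figure above illustrates, consider data stored without alignment. Suppose the machine can only read 4 bytes at a time, on 4-byte boundaries, and needs the bytes at addresses 1 through 4. It first reads [0, 4) and discards byte 0, then reads [4, 8) and discards bytes 5 through 7, and finally merges the two pieces. That is 5 steps for the CPU to fetch a single value. Why not read bytes 1 through 4 directly? Direct access would require the CPU to work out the size and position of each piece of data at access time, which costs even more. Memory alignment rules exist to reduce this cost of accessing data.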
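The cost of a misaligned read can be counted with a few lines of C. This is only a sketch of the simplified model described above (fixed-size reads on word boundaries); `aligned_reads` is a hypothetical helper, not something a real CPU or library exposes.

```c
#include <assert.h>
#include <stddef.h>

/* Number of aligned `word`-byte reads needed to fetch `len` bytes
 * starting at byte address `addr`, in the simplified model where the
 * machine can only read whole words on word boundaries. */
size_t aligned_reads(size_t addr, size_t len, size_t word)
{
    assert(word > 0 && len > 0);
    size_t first = addr / word;              /* word holding the first byte */
    size_t last  = (addr + len - 1) / word;  /* word holding the last byte  */
    return last - first + 1;
}
```

With a 4-byte word, `aligned_reads(1, 4, 4)` gives 2 (the two reads from the example), while the aligned case `aligned_reads(4, 4, 4)` gives 1.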

2. What are the rules of memory alignment? Let's go through them using struct types.

#import <Foundation/Foundation.h>
#import "LGPerson.h"
#import <objc/runtime.h>
#import <malloc/malloc.h>

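struct LGStruct1 {
    double a;   // 8 bytes   offsets 0-7
    char b;     // 1 byte    offset 8
    int c;      // 4 bytes   offsets 12-15 (9-11 are padding)
    short d;    // 2 bytes   offsets 16-17
}struct1;  // 18 rounded up to 24

struct LGStruct2 {
    double a;   // 8 bytes   offsets 0-7
    int b;      // 4 bytes   offsets 8-11
    char c;     // 1 byte    offset 12
    short d;    // 2 bytes   offsets 14-15 (13 is padding)
}struct2; // 16

struct LGStruct3 {
    double a;   // 8 bytes   offsets 0-7
    int b;      // 4 bytes   offsets 8-11
    char c;     // 1 byte    offset 12
    short d;    // 2 bytes   offsets 14-15 (13 is padding)
    int e;      // 4 bytes   offsets 16-19
    struct LGStruct1 str;   // 24 bytes  offsets 24-47 (20-23 are padding)
}struct3; // 48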


int main(int argc, const char * argv[]) {
    @autoreleasepool {
        // insert code here...
        NSLog(@"Hello, World!");
        LGPerson * p1 = [LGPerson alloc];
//        p1.name = @"time";
//        p1.age = 10;
      
        NSLog(@"%lu-%lu-%lu", sizeof(struct1), sizeof(struct2), sizeof(struct3));
        
        NSLog(@"%@ - %lu - %lu - %lu", p1, sizeof(p1), class_getInstanceSize([LGPerson class]), malloc_size((__bridge const void *)p1));
        
    }
    return 0;
}

Output
KCObjcBuild[33595:715910] 24-16-48
KCObjcBuild[33595:715910] <LGPerson: 0x101147ae0> - 8 - 24 - 32

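Take the size of the largest data type among the struct's members as the alignment base. (Nested structs follow the same idea: if a nested struct contains a type larger than any in the outer struct, the size of that type becomes the alignment base.)
Each member must be stored at an offset that is a multiple of its own size. (In LGStruct1 above, when int c is stored the next free offset is 9, which is not a multiple of 4, so offsets 9, 10 and 11 are left as padding and c starts at 12.) A nested struct member is placed at a multiple of its own alignment base, which is why str in LGStruct3 starts at 24 rather than 20.
The total size of the struct must be a multiple of the alignment base. (LGStruct1 needs 18 bytes once its members are aligned; 18 is not a multiple of 8, so the size is rounded up to 24 bytes.)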
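The three rules can be written down as a small layout calculator. The sketch below assumes every member is a scalar whose alignment equals its size (true for the types used here on common 64-bit targets); `round_up` and `struct_size` are names made up for this illustration, and nested structs are not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Round `n` up to the next multiple of `align` (align must be nonzero). */
size_t round_up(size_t n, size_t align)
{
    assert(align > 0);
    return (n + align - 1) / align * align;
}

/* Size of a struct whose members are scalars with the given nonzero sizes,
 * assuming each scalar's alignment equals its size. If `offsets` is not
 * NULL, it receives the byte offset of every member. */
size_t struct_size(const size_t *sizes, size_t count, size_t *offsets)
{
    size_t offset = 0;
    size_t base = 1;
    for (size_t i = 0; i < count; i++) {
        offset = round_up(offset, sizes[i]);   /* rule 2: member offset  */
        if (offsets != NULL) {
            offsets[i] = offset;
        }
        offset += sizes[i];
        if (sizes[i] > base) {
            base = sizes[i];                   /* rule 1: alignment base */
        }
    }
    return round_up(offset, base);             /* rule 3: total size     */
}
```

Feeding in the member sizes of LGStruct1 (8, 1, 4, 2) yields offsets 0, 8, 12, 16 and a total of 24; the sizes of LGStruct2 (8, 4, 1, 2) yield a total of 16.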

After running the code, the first line of output matches the struct alignment rules above.
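The same check can be handed to the compiler with offsetof and sizeof. The block below is a sketch that assumes a common 64-bit ABI (8-byte double, 4-byte int, 2-byte short, each aligned to its size) and a C11 compiler; on other targets the assertions may not hold.

```c
#include <assert.h>
#include <stddef.h>

struct LGStruct1 { double a; char b; int c; short d; };
struct LGStruct2 { double a; int b; char c; short d; };
struct LGStruct3 {
    double a; int b; char c; short d; int e;
    struct LGStruct1 str;
};

/* Compile-time checks of the hand-computed layout. */
static_assert(offsetof(struct LGStruct1, c) == 12, "c skips padding at 9-11");
static_assert(offsetof(struct LGStruct1, d) == 16, "d follows c");
static_assert(offsetof(struct LGStruct2, d) == 14, "d skips padding at 13");
static_assert(offsetof(struct LGStruct3, e) == 16, "e follows d");
static_assert(offsetof(struct LGStruct3, str) == 24, "str is 8-byte aligned");
static_assert(sizeof(struct LGStruct1) == 24, "18 rounded up to 24");
static_assert(sizeof(struct LGStruct2) == 16, "exactly 16");
static_assert(sizeof(struct LGStruct3) == 48, "24 + 24");
```

If any offset or size differed from the comments in the structs above, compilation would fail with the corresponding message.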

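The second line prints, in order: the object p1 points to, the size of the p1 pointer, the instance size of LGPerson, and the size of the memory actually allocated for p1.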

@interface LGPerson : NSObject
//isa 8 0-7
@property(nonatomic)NSString * name; //8   8 9 10 11 12 13 14 15
@property(nonatomic, assign)int age; //4   16-19
@end

Oddly, by the struct alignment rules above, an LGPerson instance (every instance starts with an isa pointer) should need 24 bytes. Why were 32 bytes actually allocated?
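For reference, the 24 comes from 8 (isa) + 8 (name) + 4 (age) = 20 bytes, rounded up to a multiple of 8. A minimal sketch of that arithmetic, assuming a 64-bit target; `align8` and `lgperson_raw_size` are illustrative names, not runtime functions.

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to a multiple of 8 using a bit mask
 * (this works because 8 is a power of two). */
size_t align8(size_t size)
{
    return (size + 7) & ~(size_t)7;
}

/* Unaligned total of the LGPerson ivars: isa pointer, name pointer, int age. */
size_t lgperson_raw_size(void)
{
    assert(sizeof(void *) == 8);   /* this sketch assumes 64-bit pointers */
    return sizeof(void *) + sizeof(void *) + sizeof(int);
}
```

Here `align8(lgperson_raw_size())` evaluates to 24, which matches the value class_getInstanceSize printed.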

static ALWAYS_INLINE id
_class_createInstanceFromZone(Class cls, size_t extraBytes, void *zone,
                              int construct_flags = OBJECT_CONSTRUCT_NONE,
                              bool cxxConstruct = true,
                              size_t *outAllocatedSize = nil)
{
    ASSERT(cls->isRealized());

    // Read class's info bits all at once for performance
    bool hasCxxCtor = cxxConstruct && cls->hasCxxCtor();
    bool hasCxxDtor = cls->hasCxxDtor();
    bool fast = cls->canAllocNonpointer();
    size_t size;

    size = cls->instanceSize(extraBytes);
    if (outAllocatedSize) *outAllocatedSize = size;

    id obj;
    if (zone) {
        obj = (id)malloc_zone_calloc((malloc_zone_t *)zone, 1, size);
    } else {
        obj = (id)calloc(1, size);
    }
    if (slowpath(!obj)) {
        if (construct_flags & OBJECT_CONSTRUCT_CALL_BADALLOC) {
            return _objc_callBadAllocHandler(cls);
        }
        return nil;
    }

    if (!zone && fast) {
        obj->initInstanceIsa(cls, hasCxxDtor);
    } else {
        // Use raw pointer isa on the assumption that they might be
        // doing something weird with the zone or RR.
        obj->initIsa(cls);
    }

    if (fastpath(!hasCxxCtor)) {
        return obj;
    }

    construct_flags |= OBJECT_CONSTRUCT_FREE_ONFAILURE;
    return object_cxxConstructFromClass(obj, cls, construct_flags);
}

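As we know, the alloc flow for an OC object creates the object in 3 steps.

Compute the memory size the object needs: size = cls->instanceSize(extraBytes);
Allocate the memory: obj = (id)calloc(1, size);
Bind the object to its class: obj->initIsa(cls);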
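The three steps can be mimicked in plain C. This is a toy model only: `toy_class`, `toy_object` and `toy_create_instance` are hypothetical stand-ins and do not reflect the runtime's real data structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a class: just a name and an instance size. */
typedef struct toy_class {
    const char *name;
    size_t instance_size;   /* already aligned, e.g. 24 for LGPerson */
} toy_class;

/* Hypothetical stand-in for an object: the first word points to its class. */
typedef struct toy_object {
    const toy_class *isa;
} toy_object;

toy_object *toy_create_instance(const toy_class *cls)
{
    assert(cls != NULL && cls->instance_size >= sizeof(toy_object));
    size_t size = cls->instance_size;    /* 1. compute the size       */
    toy_object *obj = calloc(1, size);   /* 2. allocate zeroed memory */
    if (obj == NULL) {
        return NULL;
    }
    obj->isa = cls;                      /* 3. bind object to class   */
    return obj;
}
```

The caller owns the returned memory and releases it with free.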

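The size derived from the struct alignment rules differs from the size actually allocated, so the place to look is the code that allocates memory, namely (id)calloc(1, size). It may adjust size internally.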

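Clicking through to calloc shows that it is a function in a macOS 11.1 system library. Problems like this are worth digging into. Apple publishes the source on Apple Open Source, so we can read the implementation.

We download and build libmalloc.

Step into calloc and follow the call chain.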

calloc

void *
calloc(size_t num_items, size_t size)
{
	return _malloc_zone_calloc(default_zone, num_items, size, MZ_POSIX);
}

_malloc_zone_calloc

static void *
_malloc_zone_calloc(malloc_zone_t *zone, size_t num_items, size_t size,
		malloc_zone_options_t mzo)
{
	MALLOC_TRACE(TRACE_calloc | DBG_FUNC_START, (uintptr_t)zone, num_items, size, 0);

	void *ptr;
	if (malloc_check_start) {
		internal_check();
	}
	ptr = zone->calloc(zone, num_items, size);

	if (os_unlikely(malloc_logger)) {
		malloc_logger(MALLOC_LOG_TYPE_ALLOCATE | MALLOC_LOG_TYPE_HAS_ZONE | MALLOC_LOG_TYPE_CLEARED, (uintptr_t)zone,
				(uintptr_t)(num_items * size), 0, (uintptr_t)ptr, 0);
	}

	MALLOC_TRACE(TRACE_calloc | DBG_FUNC_END, (uintptr_t)zone, num_items, size, (uintptr_t)ptr);
	if (os_unlikely(ptr == NULL)) {
		malloc_set_errno_fast(mzo, ENOMEM);
	}
	return ptr;
}

What matters here is the ptr pointer, which is returned by ptr = zone->calloc(zone, num_items, size);. Clicking through zone->calloc shows that it is a function pointer.

void 	*(* MALLOC_ZONE_FN_PTR(calloc))(struct _malloc_zone_t *zone, size_t num_items, size_t size); /* same as malloc, but block returned is set to zero */
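Because calloc is stored in the zone as a function pointer, the function that actually runs is only known at run time, which is why the source alone does not tell us where the call goes. The miniature below illustrates the pattern; `toy_zone`, `toy_nano_calloc`, `toy_default_zone` and `toy_zone_calloc` are invented for this sketch and are not libmalloc APIs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of a zone: the allocator is selected at run time
 * by storing a function pointer in the struct. */
typedef struct toy_zone {
    void *(*calloc)(struct toy_zone *zone, size_t num_items, size_t size);
    const char *impl_name;   /* for debugging: which implementation is set */
} toy_zone;

/* One possible implementation; here it simply forwards to the C library. */
static void *toy_nano_calloc(toy_zone *zone, size_t num_items, size_t size)
{
    (void)zone;   /* unused in this toy implementation */
    return calloc(num_items, size);
}

/* Returns the zone whose function pointer has been filled in. */
toy_zone *toy_default_zone(void)
{
    static toy_zone zone = { toy_nano_calloc, "toy_nano_calloc" };
    return &zone;
}

/* Dispatch through the pointer, as ptr = zone->calloc(...) does above. */
void *toy_zone_calloc(toy_zone *zone, size_t num_items, size_t size)
{
    assert(zone != NULL && zone->calloc != NULL);
    return zone->calloc(zone, num_items, size);
}
```

Reading `toy_zone_calloc` alone does not reveal the target; inspecting `zone->calloc` (or `impl_name`) in a debugger does.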

In lldb we can evaluate this function pointer directly.

The console output shows that it actually points to default_zone_calloc, so we search for that function.

static void *
default_zone_calloc(malloc_zone_t *zone, size_t num_items, size_t size)
{
	zone = runtime_default_zone();
	
	return zone->calloc(zone, num_items, size);
}

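Setting a breakpoint and running shows another call to zone->calloc, which is again a function pointer. In the same way, we evaluate it in the console to see which function is actually called.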

Next, search for nano_calloc.

static void *
nano_calloc(nanozone_t *nanozone, size_t num_items, size_t size)
{
	size_t total_bytes;

	if (calloc_get_size(num_items, size, 0, &total_bytes)) {
		return NULL;
	}

	if (total_bytes <= NANO_MAX_SIZE) {
		void *p = _nano_malloc_check_clear(nanozone, total_bytes, 1);
		if (p) {
			return p;
		} else {
			/* FALLTHROUGH to helper zone */
		}
	}
	malloc_zone_t *zone = (malloc_zone_t *)(nanozone->helper_zone);
	return zone->calloc(zone, 1, total_bytes);
}

Stepping on from the breakpoint reaches _nano_malloc_check_clear, which returns the pointer. What we care about is where the byte size changes, so we keep looking.

The size we passed in is handed to segregated_size_to_fit:

static MALLOC_INLINE size_t
segregated_size_to_fit(nanozone_t *nanozone, size_t size, size_t *pKey)
{
	size_t k, slot_bytes;

	if (0 == size) {
		size = NANO_REGIME_QUANTA_SIZE; // Historical behavior
	}
	k = (size + NANO_REGIME_QUANTA_SIZE - 1) >> SHIFT_NANO_QUANTUM; // round up and shift for number of quanta
	slot_bytes = k << SHIFT_NANO_QUANTUM;							// multiply by power of two quanta size
	*pKey = k - 1;													// Zero-based!

	return slot_bytes;
}

#define NANO_REGIME_QUANTA_SIZE (1 << SHIFT_NANO_QUANTUM) // 16

#define SHIFT_NANO_QUANTUM 4

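The function first checks whether size is 0, which is clearly not the case here. It then computes k = (size + 16 - 1) >> 4 and shifts k left by 4 again. This is where size changes: it is rounded up to a multiple of 16, so a request for 24 bytes becomes 32.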
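The rounding can be tried in isolation. The sketch below reproduces only the arithmetic of the function above, with the same shift value of 4; `round_to_quantum` and the two macros are names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUANTUM_SHIFT 4                              /* same value as SHIFT_NANO_QUANTUM */
#define QUANTUM_SIZE  ((size_t)1 << QUANTUM_SHIFT)   /* 16 bytes */

/* Round `size` up to a multiple of 16; a request for 0 bytes is
 * treated as a request for one quantum. */
size_t round_to_quantum(size_t size)
{
    assert(size <= SIZE_MAX - QUANTUM_SIZE);   /* keep the addition from overflowing */
    if (size == 0) {
        size = QUANTUM_SIZE;
    }
    size_t k = (size + QUANTUM_SIZE - 1) >> QUANTUM_SHIFT;   /* number of quanta */
    return k << QUANTUM_SHIFT;                               /* back to bytes    */
}
```

For the LGPerson case: 24 + 15 = 39, 39 >> 4 = 2, and 2 << 4 = 32, which is the value malloc_size reported.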

Summary:

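1. Memory alignment reduces the CPU's work when accessing data. It trades space for time to speed up access.

2. Struct alignment rules on iOS: the alignment base is the size of the largest data type among the struct's members; each member starts at an offset that is a multiple of its own size; the total size of the struct is a multiple of the alignment base.

3. When an OC class allocates memory, the size the object needs is computed with its members aligned to 8 bytes, and that size is then rounded up to a multiple of 16 bytes for the actual allocation.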
