Hand-Writing a C/C++ Memory Checker

The full implementation is included at the end of this article.

Background

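When writing C/C++ code we often run into memory problems. They tend to be intermittent and unstable, and usually require a memory checker to track down. Mature tools exist today, such as valgrind and the CRT on Windows. Even so, at work I ran into situations where I had to implement a C memory checker by hand:

  1. The intranet environment is tightly controlled, so external tools cannot be brought in freely.
  2. Existing tools are not convenient enough: they cannot check memory for just one module. Problems caused by another module are sometimes not triggered right away and end up surfacing in compute-heavy modules, which makes analysis inefficient.
  3. Existing tools usually check everything, instrumenting all memory operations, so they run slowly.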

How it works

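  1. Implement enhanced allocation and free functions that wrap each memory block and record the file name and line number of the allocation/free, for use in the later check report.
  2. When allocating, the enhanced function requests extra memory before and after the target region and writes a magic number into each end. Checking later whether the magic numbers were modified reveals out-of-bounds writes.
  3. When freeing, the enhanced function records how many times the block has been freed. Checking this counter before a free detects double frees; checking it before the program exits detects memory leaks.
  4. Store all allocated blocks in linked lists: mallocList holds blocks not yet freed and freedList holds blocks already freed, which makes a full memory check easy.
  5. Implement the enhanced class as a per-thread singleton and use macros to replace the native allocation interface, so it is easy to adopt in a project.
  6. As a last resort, wrap memory reads and writes directly and check for out-of-bounds access before each one.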
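
Points 1 and 5 above can be illustrated with a small sketch: a per-thread singleton plus a macro that forwards __FILE__ and __LINE__ from the call site. The names CallSiteTracker, AllocRecord and TRACKED_MALLOC are hypothetical and exist only for this illustration; the article's actual class is EnhancedMemoryManager, shown in the sections that follow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical record type for this sketch: where and how much was allocated.
struct AllocRecord {
    void* ptr;
    std::size_t size;
    const char* file;
    int line;
};

class CallSiteTracker {
public:
    // One instance per thread, created on first use.
    static CallSiteTracker& instance() {
        static thread_local CallSiteTracker tracker;
        return tracker;
    }

    // Allocates memory and remembers the call site passed in by the macro.
    void* alloc(std::size_t size, const char* file, int line) {
        assert(file != nullptr);
        void* p = std::malloc(size);
        if (p != nullptr) {
            records_.push_back({p, size, file, line});
        }
        return p;
    }

    const std::vector<AllocRecord>& records() const { return records_; }

private:
    CallSiteTracker() = default;
    CallSiteTracker(const CallSiteTracker&) = delete;
    CallSiteTracker& operator=(const CallSiteTracker&) = delete;

    std::vector<AllocRecord> records_;
};

// The macro expands at the call site, so __FILE__ and __LINE__ describe the caller.
#define TRACKED_MALLOC(size) CallSiteTracker::instance().alloc((size), __FILE__, __LINE__)
```

Because the instance is thread_local, each thread keeps its own records and no locking is needed, at the cost that a block allocated on one thread cannot be looked up from another.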

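Step-by-step implementation

Detecting out-of-bounds writes with magic numbers

For out-of-bounds writes, I use magic numbers for detection. The steps are:

  1. Define two magic numbers, one to fill the head and one to fill the tail.
  2. Implement an enhanced allocation function that requests 8 more bytes than the requested size: 4 bytes at the head and 4 bytes at the tail hold the magic numbers. The enhanced function therefore actually allocates size + 8 bytes.
  3. Because the head magic number occupies the first 4 bytes, ptr + 4 is returned as the address of the allocated memory.
  4. Implement a check function: verify whether the head and tail magic numbers of each allocated block have been tampered with, which reveals out-of-bounds writes. The resulting layout is [4-byte head magic][size bytes of user data][4-byte tail magic].
    The core code follows. I use a macro to simplify the call, so EnhancedMalloc can directly replace malloc later: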
#define MAGIC_HEAD_NUMBER 0xDEADBEEF
#define MAGIC_TAIL_NUMBER 0xD8675309
#define EnhancedMalloc(size) EnhancedMemoryManager::getInstance().enhancedMalloc(size, __FILE__, __LINE__)

void *EnhancedMemoryManager::enhancedMalloc(size_t size, const char* file_name, int line_number) {
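    void* data = malloc(size + 8);
    if (data == nullptr) {
        printf("Alloc failed.\n");
        return nullptr;
    }
    // Zero the block only after the allocation is known to have succeeded
    memset(data, 0, size + 8);
    // Write the head and tail magic numbers byte by byte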
    unsigned char* magic_ptr = (unsigned char*)data;
    magic_ptr[0] = (unsigned char)MAGIC_HEAD_NUMBER;
    magic_ptr[1] = (unsigned char)(MAGIC_HEAD_NUMBER >> 8);
    magic_ptr[2] = (unsigned char)(MAGIC_HEAD_NUMBER >> 16);
    magic_ptr[3] = (unsigned char)(MAGIC_HEAD_NUMBER >> 24);
    magic_ptr[size + 4] = (unsigned char)MAGIC_TAIL_NUMBER;
    magic_ptr[size + 4 + 1] = (unsigned char)(MAGIC_TAIL_NUMBER >> 8);
    magic_ptr[size + 4 + 2] = (unsigned char)(MAGIC_TAIL_NUMBER >> 16);
    magic_ptr[size + 4 + 3] = (unsigned char)(MAGIC_TAIL_NUMBER >> 24);

    MemoryBlock *block = new MemoryBlock(file_name, line_number);
    block->size = size;
    block->data = data;

    this->mallocList.push_back(block);
    return (void*) &magic_ptr[4];
}
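
To make the detection step concrete, here is a minimal standalone sketch of the same guard layout together with a check. It is an illustration under assumptions, not the article's code: guardedAlloc, checkGuards, guardedFree and the GuardStatus flags are hypothetical names, and the magic numbers are copied with memcpy in native byte order rather than byte by byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

constexpr std::uint32_t kHeadMagic = 0xDEADBEEF;
constexpr std::uint32_t kTailMagic = 0xD8675309;
constexpr std::size_t kGuard = sizeof(std::uint32_t);

enum GuardStatus { kIntact = 0, kHeadClobbered = 1, kTailClobbered = 2 };

// Allocates size + 8 bytes, writes both guards, returns the user region.
unsigned char* guardedAlloc(std::size_t size) {
    unsigned char* raw = static_cast<unsigned char*>(std::malloc(size + 2 * kGuard));
    if (raw == nullptr) {
        return nullptr;
    }
    std::memcpy(raw, &kHeadMagic, kGuard);                  // head guard
    std::memset(raw + kGuard, 0, size);                     // user data
    std::memcpy(raw + kGuard + size, &kTailMagic, kGuard);  // tail guard
    return raw + kGuard;
}

// Returns a bitmask of GuardStatus flags for the block that starts at user.
int checkGuards(const unsigned char* user, std::size_t size) {
    assert(user != nullptr);
    int status = kIntact;
    if (std::memcmp(user - kGuard, &kHeadMagic, kGuard) != 0) {
        status |= kHeadClobbered;
    }
    if (std::memcmp(user + size, &kTailMagic, kGuard) != 0) {
        status |= kTailClobbered;
    }
    return status;
}

// Releases the raw allocation that backs the user region.
void guardedFree(unsigned char* user) {
    if (user != nullptr) {
        std::free(user - kGuard);
    }
}
```

Writing to user[size] or user[-1] lands on a guard byte inside the same allocation, so checkGuards reports it. One caveat of returning raw + 4: the user pointer is only guaranteed to be 4-byte aligned, which may matter for types that need stricter alignment.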

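Tracking frees with a freed counter

For double-free errors, I detect them by counting. The principle is very simple.

  1. In the MemoryManager, store every memory block in a linked list and record how many times each one has been freed.
  2. Implement an enhanced free function that looks up the block's record before freeing and checks whether its free counter is greater than or equal to 1.
  3. If so, this is a double free: print a warning and skip the free. Otherwise, free normally and increment the counter.
  4. At program exit, inspect the counters to detect memory leaks.

The core code: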

#define EnhancedFree(ptr) EnhancedMemoryManager::getInstance().enhancedFree(ptr)

void EnhancedMemoryManager::enhancedFree(void * ptr) {
    auto iterator = std::find_if(this->mallocList.begin(), this->mallocList.end(), [ptr](const MemoryBlock *block) {
        return block->data == ((unsigned char *) ptr - 4);
    });

    if (iterator == this->mallocList.end()) {
        // Not a live block: check whether it has already been freed
        iterator = std::find_if(this->freedList.begin(), this->freedList.end(), [ptr](const MemoryBlock *block) {
            return block->data == ((unsigned char*)ptr - 4);
        });
        if (iterator == this->freedList.end()) {
            // Not allocated through this manager; nothing to do
            return;
        }
    }

    MemoryBlock *targetBlock = *iterator;
    // Mark as freed
    targetBlock->freed += 1;
    if (targetBlock->freed > 1) {
        printf("%sDouble free detected.\n", targetBlock->getAllocPos());
        return;
    }
    // Check for out-of-bounds writes before freeing.
    check(targetBlock);

    this->mallocList.remove(targetBlock);
    this->freedList.push_back(targetBlock);
    free(targetBlock->data);
}
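
The counting idea can also be shown in isolation. The sketch below is a simplified illustration, not the article's implementation: FreeCounter and FreeResult are hypothetical names, and a std::map keyed by pointer stands in for the two linked lists.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>

enum class FreeResult { Ok, DoubleFree, Unknown };

class FreeCounter {
public:
    void* alloc(std::size_t size) {
        // malloc(0) is implementation-defined; this sketch rules it out.
        assert(size > 0);
        void* p = std::malloc(size);
        if (p != nullptr) {
            freed_[p] = 0;  // a reused address starts with a fresh counter
        }
        return p;
    }

    // Frees p at most once and reports what happened.
    FreeResult release(void* p) {
        auto it = freed_.find(p);
        if (it == freed_.end()) {
            return FreeResult::Unknown;     // never allocated through alloc()
        }
        if (it->second >= 1) {
            ++it->second;
            return FreeResult::DoubleFree;  // warn, but do not free again
        }
        it->second = 1;
        std::free(p);
        return FreeResult::Ok;
    }

    // Blocks whose counter is still 0 were never freed.
    std::size_t leakCount() const {
        std::size_t n = 0;
        for (const auto& entry : freed_) {
            if (entry.second == 0) {
                ++n;
            }
        }
        return n;
    }

private:
    std::map<void*, int> freed_;
};
```

Calling leakCount() just before the program exits gives the number of leaked blocks, which mirrors step 4 in the list above.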

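Storing blocks in linked lists for full checks

This part is simple and blunt: I store the blocks directly in STL containers, so I won't go into detail. Here is the core code: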

class EnhancedMemoryManager {
private:
    // Hide the constructor and forbid copying
    EnhancedMemoryManager() {};
    EnhancedMemoryManager(const EnhancedMemoryManager&) = delete;
    EnhancedMemoryManager& operator = (const EnhancedMemoryManager&) = delete;
public:
    std::list<MemoryBlock*> mallocList; // blocks not yet freed
    std::list<MemoryBlock*> freedList; // blocks already freed
    void* enhancedMalloc(size_t, const char*, int);
    void enhancedFree(void*);
    void check(MemoryBlock*);
    void checkAll();
    void clear();
    static EnhancedMemoryManager& getInstance() {
        // Per-thread singleton
        static thread_local EnhancedMemoryManager enhancedMemoryManager;
        return enhancedMemoryManager;
    }
    void traverse();
};

void EnhancedMemoryManager::checkAll() {
    if (this->mallocList.empty() && this->freedList.empty()) {
        printf("Memory OK: No malloc or free ever been called.");
        return;
    }
    for (std::list<MemoryBlock*>::iterator it = mallocList.begin(); it != mallocList.end() ; ++it) {
        check(*it);
    }
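    for (std::list<MemoryBlock*>::iterator it = freedList.begin(); it != freedList.end() ; ++it) {
        // Freed blocks no longer own their memory, so only report double frees
        // instead of reading magic numbers from released memory.
        if ((*it)->freed > 1) {
            check(*it);
        }
    }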
}
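
The same traversal can produce a leak report at program exit. The following is a rough sketch under assumptions: BlockInfo is a hypothetical stand-in for the fields of MemoryBlock that matter here, and leakReport is not part of the article's code.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical, trimmed-down view of a tracked block.
struct BlockInfo {
    const char* file;
    int line;
    std::size_t size;
    int freed;  // how many times the block has been freed
};

// Builds one report line per block that was never freed.
std::vector<std::string> leakReport(const std::list<BlockInfo>& blocks) {
    std::vector<std::string> lines;
    for (const BlockInfo& b : blocks) {
        assert(b.file != nullptr);
        if (b.freed == 0) {
            lines.push_back("[Allocated at " + std::string(b.file) + ":" +
                            std::to_string(b.line) + "] " +
                            std::to_string(b.size) + " bytes never freed");
        }
    }
    return lines;
}
```

Returning strings instead of printing keeps the report easy to test and lets the caller decide where it goes.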

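Wrapping memory access to prevent out-of-bounds access directly

Checking for out-of-bounds writes with magic numbers is cheap, but when the stride of the overrun exceeds the range covered by the head and tail magic numbers, the overrun can go undetected.
If you suspect such a "large-stride" overrun, there is a more expensive approach: record the size of every block, access the block only through a function, and check the bounds inside that function. It is suited to cases where nothing else works.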
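
The blind spot can be demonstrated safely with a toy arena, so that the simulated overrun never leaves memory the program owns. This is a sketch for illustration only: GuardedArena and its 0xAA/0xBB fill bytes are hypothetical and not part of EnhancedMemoryManager.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Layout inside bytes: [head guard][user bytes][tail guard][neighbor bytes]
struct GuardedArena {
    static constexpr std::size_t kGuard = 4;
    std::size_t userSize;
    std::vector<unsigned char> bytes;

    GuardedArena(std::size_t size, std::size_t neighborSize)
        : userSize(size), bytes(kGuard + size + kGuard + neighborSize, 0) {
        std::memset(bytes.data(), 0xAA, kGuard);                  // head guard
        std::memset(bytes.data() + kGuard + size, 0xBB, kGuard);  // tail guard
    }

    // Writes relative to the start of the user region. Offsets past userSize
    // simulate an overrun but stay inside the arena.
    void write(std::size_t offset, unsigned char value) {
        assert(kGuard + offset < bytes.size());
        bytes[kGuard + offset] = value;
    }

    bool tailIntact() const {
        for (std::size_t i = 0; i < kGuard; ++i) {
            if (bytes[kGuard + userSize + i] != 0xBB) {
                return false;
            }
        }
        return true;
    }

    // Reads the memory that sits right after the tail guard.
    unsigned char neighborByte(std::size_t i) const {
        return bytes.at(kGuard + userSize + kGuard + i);
    }
};
```

With a 10-byte user region, write(14, ...) jumps over the 4-byte tail guard and corrupts the neighbor while tailIntact() still returns true, whereas write(10, ...) is caught.
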
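Since my project is C++, I wrote a simple C++ implementation of this access-wrapping idea. The bounds check itself does not rely on the MemoryManager's magic numbers, although the buffer is still allocated through it. The core code: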

template<typename T>
class SimBuffer {
public:
    int size{};
    T* data;
    char const* file_name{};
    int line_number = 0;

    SimBuffer(int size, const char* file_name, int line_number);
    // Bounds-checked element access
    T& operator[](int index);
    ~SimBuffer();
};

template<typename T>
SimBuffer<T>::SimBuffer(int size, const char* file_name, int line_number) {
    this->file_name = file_name;
    this->line_number = line_number;
    this->data = (T*)EnhancedMemoryManager::getInstance().enhancedMalloc(sizeof(T) * size, file_name, line_number);
    this->size = size;
}

template<typename T>
T& SimBuffer<T>::operator[](int index) {
    printf("DEBUG: using override [].\n");
    if (index < 0 || index >= this->size) {
        fprintf(stderr, "[Allocated at %s:%d]index out of bound, index=%d, size=%d.\n", this->file_name, this->line_number, index, this->size);
        throw std::runtime_error("i am an exception");
    }
    return data[index];
}

template<typename T>
SimBuffer<T>::~SimBuffer() {
    printf("DEBUG: releasing %s:%d.\n", this->file_name, this->line_number);
    EnhancedMemoryManager::getInstance().enhancedFree(this->data);
    file_name = nullptr;
}

Test code:

int main(int argc, char *argv[]) {
    SimBuffer<unsigned short> *simP5, *simP6 = nullptr;
    simP5 = new SimBuffer<unsigned short>(10, __FILE__, __LINE__);
    simP6 = new SimBuffer<unsigned short>(10, __FILE__, __LINE__);

    (*simP5)[10] = 10;
    (*simP6)[-1] = 9;

    delete simP5;
    delete simP6;
}

// Output:
// [Allocated at E:\repo\CPlayground\Main.cpp:203]index out of bound, index=10, size=10.
// terminate called after throwing an instance of 'std::runtime_error'

For a C project, which has no operator overloading, a macro can emulate the checked index operation.

#include <stdio.h>
#include <assert.h>

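// A struct that emulates an array
typedef struct {
    int *array;
    size_t size;
} Array;

// A macro that emulates the array index operation.
// With the explicit cast to size_t, a negative index wraps to a huge value and fails the check.
#define ARRAY_INDEX(arr, index) (assert((size_t)(index) < (arr).size), (arr).array[(index)])

int main() {
    int data[] = {10, 20, 30, 40, 50};
    Array arr = {data, sizeof(data) / sizeof(data[0])};

    // Access an element through the macro
    printf("Element at index 2: %d\n", ARRAY_INDEX(arr, 2));

    // Accessing an out-of-bounds element triggers the assertion
    // printf("Element at index 10: %d\n", ARRAY_INDEX(arr, 10));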

    return 0;
}

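Both approaches, however, require "major surgery" on an existing project, which is not very friendly. I have not thought of a better way yet; if you have a better idea, feel free to get in touch.

Summary

In this article I implemented a simple C/C++ memory checker, EnhancedMemoryManager, modeled on the core ideas of memory checking in the Windows CRT library.

  1. It detects three kinds of memory problems (out-of-bounds writes, double frees, and memory leaks) and reports exactly which file and line allocated the offending block.
  2. Because magic-number checking cannot detect out-of-bounds reads and can miss some writes, I also proposed a more expensive method that checks bounds directly. EnhancedMemoryManager can be combined with it.
  3. The direct bounds-checking method is highly invasive, and I have not found a better alternative yet.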

Full implementation

Header file

#ifndef CPLAYGROUND_MEMORYMANAGER_H
#define CPLAYGROUND_MEMORYMANAGER_H

#include <stddef.h>
#include <thread>
#include <string>
#include <sstream>
#include <iostream>
#include <list>

#define EnhancedMalloc(size) EnhancedMemoryManager::getInstance().enhancedMalloc(size, __FILE__, __LINE__)
#define EnhancedFree(ptr) EnhancedMemoryManager::getInstance().enhancedFree(ptr)

class MemoryBlock {
public:
    char const* file_name;
    int line_number = 0;
    int freed = 0;
    size_t size;
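    // 1. Magic numbers are inserted before and after the user data.
    // 2. On free, all of realData is released and freed is recorded.
    // 3. All MemoryBlocks are released only on clearAlloc.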
    void* data;

    MemoryBlock(char const*, int);
    ~MemoryBlock();
    const char* getAllocPos();
    bool operator == (const MemoryBlock &other) const;
    bool operator == (const void* ptr) const;
};


class EnhancedMemoryManager {
private:
    // Hide the constructor and forbid copying
    EnhancedMemoryManager() {};
    EnhancedMemoryManager(const EnhancedMemoryManager&) = delete;
    EnhancedMemoryManager& operator = (const EnhancedMemoryManager&) = delete;
public:
    std::list<MemoryBlock*> mallocList;
    std::list<MemoryBlock*> freedList;
    void* enhancedMalloc(size_t, const char*, int);
    void enhancedFree(void*);
    void check(MemoryBlock*);
    void checkAll();
    void clear();
    static EnhancedMemoryManager& getInstance() {
        // Per-thread singleton
        static thread_local EnhancedMemoryManager enhancedMemoryManager;
        return enhancedMemoryManager;
    }
    void traverse();
};

template<typename T>
class SimBuffer {
public:
    int size{};
    T* data;
    char const* file_name{};
    int line_number = 0;

    SimBuffer(int size, const char* file_name, int line_number);
    T& operator[](int index);
    ~SimBuffer();
};

#endif //CPLAYGROUND_MEMORYMANAGER_H

cpp file

#include "MemoryManager.h"
#include <iostream>
#include <sstream>
#include <malloc.h>
#ifdef __linux__
#include <execinfo.h>
#endif
#include <stdio.h>
#include <stdlib.h>
#include <memory.h>
#include <algorithm>
#include <stdexcept>

#define MAGIC_HEAD_NUMBER 0xDEADBEEF
#define MAGIC_TAIL_NUMBER 0xD8675309

#ifdef __linux__
void print_stacktrace() {
    const int max_frames = 128;
    void* frame[max_frames];
    int frame_count = backtrace(frame, max_frames);

    char** symbols = backtrace_symbols(frame, frame_count);
    if (symbols) {
        // Start at 1 to skip print_stacktrace itself
        for (int i = 1; i < frame_count; ++i) {
            printf("[bt] #%d %s\n", i, symbols[i]);

            /* Find the first occurrence of '(' or ' ' in symbols[i] and assume
             * everything before it is the file name. Stop at the string
             * terminator. */
            size_t p = 0;
            while (symbols[i][p] != '(' && symbols[i][p] != ' '
                   && symbols[i][p] != 0)
                ++p;

            char syscom[256];
            // The last argument is the file name of the symbol
            snprintf(syscom, sizeof(syscom), "addr2line %p -e %.*s", frame[i], (int)p, symbols[i]);
            system(syscom);
        }
        free(symbols);
    }
}
#endif


const char* MemoryBlock::getAllocPos() {
    std::stringstream ss;
    ss << "[Allocated at " << this->file_name << ":" << this->line_number << "] ";
    std::string result = ss.str();
    char *str = new char[result.size() + 1];
    std::copy(result.begin(), result.end(), str);
    str[result.size()] = '\0';
//        print_stacktrace();
    return str;
}


MemoryBlock::MemoryBlock(const char * file_name, int line_number) {
    this->file_name = file_name;
    this->line_number = line_number;
}

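MemoryBlock::~MemoryBlock() {
    printf("~MemoryBlock been called.\n");
    if (this->freed < 1) {
        // data is the raw malloc pointer (including the head magic number),
        // so release it directly rather than through enhancedFree, which
        // expects the user pointer (data + 4).
        free(this->data);
    }
}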

bool MemoryBlock::operator==(const MemoryBlock &other) const {
    if (this->data == other.data) {
        return true;
    }
    return false;
}

bool MemoryBlock::operator==(const void *ptr) const {
    if (this->data == ptr) {
        return true;
    }
    return false;
}

void EnhancedMemoryManager::check(MemoryBlock *block) {
    if (block->freed > 1) {
        printf("%sDouble free detected.\n", block->getAllocPos());
//        print_stacktrace();
        return;
    }
    // Check all 4 bytes of each magic number
    unsigned char* magic_ptr = (unsigned char*)(block->data);
    bool beforeFlag = false, afterFlag = false;
    for (int i = 0; i < 4; ++i) {
        if (!beforeFlag) {
            unsigned char tmpChar = MAGIC_HEAD_NUMBER >> 8 * i;
            if (magic_ptr[i] != tmpChar) {
                beforeFlag = true;
            }
        }
        if (!afterFlag) {
            unsigned char tmpChar = MAGIC_TAIL_NUMBER >> 8 * i;
            if (magic_ptr[block->size + 4 + i] != tmpChar) {
                afterFlag = true;
            }
        }
    }

    if (beforeFlag) {
        printf("%sMemory clobbered before allocated block.\n", block->getAllocPos());
    }
    if (afterFlag) {
        printf("%sMemory clobbered after allocated block.\n", block->getAllocPos());
    }
}

void *EnhancedMemoryManager::enhancedMalloc(size_t size, const char* file_name, int line_number) {
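    void* data = malloc(size + 8);
    if (data == nullptr) {
        printf("Alloc failed.\n");
        return nullptr;
    }
    // Zero the block only after the allocation is known to have succeeded
    memset(data, 0, size + 8);
    // Write the head and tail magic numbers byte by byte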
    unsigned char* magic_ptr = (unsigned char*)data;
    magic_ptr[0] = (unsigned char)MAGIC_HEAD_NUMBER;
    magic_ptr[1] = (unsigned char)(MAGIC_HEAD_NUMBER >> 8);
    magic_ptr[2] = (unsigned char)(MAGIC_HEAD_NUMBER >> 16);
    magic_ptr[3] = (unsigned char)(MAGIC_HEAD_NUMBER >> 24);
    magic_ptr[size + 4] = (unsigned char)MAGIC_TAIL_NUMBER;
    magic_ptr[size + 4 + 1] = (unsigned char)(MAGIC_TAIL_NUMBER >> 8);
    magic_ptr[size + 4 + 2] = (unsigned char)(MAGIC_TAIL_NUMBER >> 16);
    magic_ptr[size + 4 + 3] = (unsigned char)(MAGIC_TAIL_NUMBER >> 24);

    MemoryBlock *block = new MemoryBlock(file_name, line_number);
    block->size = size;
    block->data = data;

    this->mallocList.push_back(block);
    return (void*) &magic_ptr[4];
}

void EnhancedMemoryManager::enhancedFree(void * ptr) {
    auto iterator = std::find_if(this->mallocList.begin(), this->mallocList.end(), [ptr](const MemoryBlock *block) {
        return block->data == ((unsigned char *) ptr - 4);
    });

    if (iterator == this->mallocList.end()) {
        // Not a live block: check whether it has already been freed
        iterator = std::find_if(this->freedList.begin(), this->freedList.end(), [ptr](const MemoryBlock *block) {
            return block->data == ((unsigned char*)ptr - 4);
        });
        if (iterator == this->freedList.end()) {
            // Not allocated through this manager; nothing to do
            return;
        }
    }

    MemoryBlock *targetBlock = *iterator;
    // Mark as freed
    targetBlock->freed += 1;
    if (targetBlock->freed > 1) {
        printf("%sDouble free detected.\n", targetBlock->getAllocPos());
        return;
    }
    // Check for out-of-bounds writes before freeing.
    check(targetBlock);

    this->mallocList.remove(targetBlock);
    this->freedList.push_back(targetBlock);
    free(targetBlock->data);
}

void EnhancedMemoryManager::checkAll() {
    if (this->mallocList.empty() && this->freedList.empty()) {
        printf("Memory OK: No malloc or free ever been called.");
        return;
    }
    for (std::list<MemoryBlock*>::iterator it = mallocList.begin(); it != mallocList.end() ; ++it) {
        check(*it);
    }
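    for (std::list<MemoryBlock*>::iterator it = freedList.begin(); it != freedList.end() ; ++it) {
        // Freed blocks no longer own their memory, so only report double frees
        // instead of reading magic numbers from released memory.
        if ((*it)->freed > 1) {
            check(*it);
        }
    }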
}

void EnhancedMemoryManager::clear() {
    this->mallocList.clear();
    this->freedList.clear();
}

void EnhancedMemoryManager::traverse() {
    printf("=> Allocations have NOT been freed.\n");
    for (auto it = this->mallocList.begin(); it != this->mallocList.end() ; ++it) {
        printf("%s \n", (*it)->getAllocPos());
    }
    printf("=> Allocations have been FREED.\n");
    for (auto it = this->freedList.begin(); it != this->freedList.end() ; ++it) {
        printf("%s \n", (*it)->getAllocPos());
    }
}

template<typename T>
SimBuffer<T>::SimBuffer(int size, const char* file_name, int line_number) {
    this->file_name = file_name;
    this->line_number = line_number;
    this->data = (T*)EnhancedMemoryManager::getInstance().enhancedMalloc(sizeof(T) * size, file_name, line_number);
    this->size = size;
}

template<typename T>
T& SimBuffer<T>::operator[](int index) {
    printf("DEBUG: using override [].\n");
    if (index < 0 || index >= this->size) {
        fprintf(stderr, "[Allocated at %s:%d]index out of bound, index=%d, size=%d.\n", this->file_name, this->line_number, index, this->size);
        throw std::runtime_error("i am an exception");
    }
    return data[index];
}

template<typename T>
SimBuffer<T>::~SimBuffer() {
    printf("DEBUG: releasing %s:%d.\n", this->file_name, this->line_number);
    EnhancedMemoryManager::getInstance().enhancedFree(this->data);
    file_name = nullptr;
}

template class SimBuffer<unsigned short>;

Test code

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <thread>
#include "MemoryManager.h"
#include "hook_cxa_throw-lys.h"

#define VARIABLE_NAME(var) #var
#define IS_IN(lib)	 (IN_MODULE == MODULE_##lib)

int main(int argc, char *argv[]) {
    TestStruct *p1, *p2, *tmpp = NULL;
    setvbuf(stdout, NULL, _IONBF, 0);
    EnhancedMemoryManager &memoryManager = EnhancedMemoryManager::getInstance();
    p1 = (TestStruct*)EnhancedMalloc(sizeof(TestStruct) * 10);
    // p1 = (TestStruct*)memoryManager.enhancedMalloc(sizeof(TestStruct) * 10, __FILE__, __LINE__);
    p2 = (TestStruct*)memoryManager.enhancedMalloc(sizeof(TestStruct) * 10, __FILE__, __LINE__);

    tmpp = p1 + 10;
    tmpp[0].id = 10; // overrun past the end of the block

    tmpp = p2 - 1;
    tmpp[0].id = 10; // overrun before the start of the block

    EnhancedFree(p1);
    memoryManager.enhancedFree(p1);
    memoryManager.enhancedFree(p2);

    p1 = (TestStruct*)memoryManager.enhancedMalloc(sizeof(TestStruct) * 10, __FILE__, __LINE__);
    p2 = (TestStruct*)memoryManager.enhancedMalloc(sizeof(TestStruct) * 10, __FILE__, __LINE__);
    memoryManager.traverse();

    TestStruct *p3, *p4;
    p3 = (TestStruct*)memoryManager.enhancedMalloc(sizeof(TestStruct) * 10, __FILE__, __LINE__);
    p4 = (TestStruct*)memoryManager.enhancedMalloc(sizeof(TestStruct) * 10, __FILE__, __LINE__);

    memoryManager.enhancedFree(p3);
    memoryManager.enhancedFree(p1);
    memoryManager.enhancedFree(p4);
    memoryManager.traverse();

    p3 = (TestStruct*)memoryManager.enhancedMalloc(sizeof(TestStruct) * 10, __FILE__, __LINE__);
    p4 = (TestStruct*)memoryManager.enhancedMalloc(sizeof(TestStruct) * 10, __FILE__, __LINE__);

    // Check for out-of-bounds writes
    printf("checkAllPtr starts...\n");
    memoryManager.checkAll();
    printf("checkAllPtr ends...\n");

    printf("================================ new ==============================.\n");

    SimBuffer<unsigned short> *simP5 = new SimBuffer<unsigned short>(10, __FILE__, __LINE__);
    SimBuffer<unsigned short> *simP6 = new SimBuffer<unsigned short>(10, __FILE__, __LINE__);

    (*simP5)[10] = 10;
    (*simP6)[-1] = 9;

    delete simP5;
    delete simP6;


    return 0;
}