AI Vision in Practice 1: Real-Time Face Detection



1. Background

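The most common AI applications in computer vision are face detection, face recognition, liveness detection, human body and behavior analysis, image recognition, and image enhancement. All of these are fairly mature technologies, and both commercial PaaS platforms and open-source models are easy to find. In general, AI development goes through the following steps: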

  1. Feature analysis
  2. Data collection
  3. Data annotation
  4. Model training
  5. Model inference

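Inference can run either in the cloud or on the client, and each side has its own use cases. For example, face detection usually runs on the client, while face recognition runs in the cloud. This series focuses on the engineering practice of model inference for vision tasks.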

2. Project Overview

We deploy and run models from Google's open-source project MediaPipe on the client for inference. MediaPipe provides the following capabilities:

  1. Face Detection
  2. Face Mesh (3D face mesh model)
  3. Iris
  4. Hands
  5. Pose
  6. Holistic (full-body pose)
  7. Hair Segmentation
  8. Object Detection
  9. Box Tracking
  10. Instant Motion Tracking
  11. Objectron
  12. KNIFT
  13. ...

MediaPipe's own demos can be built and run directly, for example with bazel build -c opt --config=android_arm64 mediapipe/examples/android/src/java/com/google/mediapipe/apps/handtrackinggpu:handtrackinggpu. For our mobile framework we build on the open-source project github.com/terryky/and…, which runs TensorFlow Lite through the NDK and measures the performance of the TensorFlow Lite GPU Delegate. It is based on the NativeActivity framework: after capturing frames from the camera, it renders both the preview image and the performance data. In this article we get a real-time face detection model running.

3. Understanding NativeActivity

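NativeActivity is the base class Android provides for apps written purely in C/C++. Even a pure C++ Android app still needs a Java shell: the Android framework already ships an intermediary class written in Java, and our C++ native library can run only because that class calls into it via JNI. That intermediary class is NativeActivity. Its core job is to invoke callback functions in our native library when specific events occur. For example, in the familiar lifecycle method NativeActivity.onStart, it calls the native method onStartNative, which forwards the event to the onStart callback registered by our native library: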

protected void onStart() {
        super.onStart();
        onStartNative(mNativeHandle);
}

At the native layer, Android provides two interfaces:

  1. native_activity.h
  2. android_native_app_glue.h

android_native_app_glue.h wraps native_activity.h, so with it we only need to implement void android_main(struct android_app* state), which the glue code runs on its own thread.

For more details on NativeActivity, see the official Android documentation: GameActivity | Android Developers

4. Running the Model

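The model we use: storage.googleapis.com/mediapipe-a…

The pipeline has four steps:

  1. Load the model.
  2. Convert the camera preview texture to RGBA.
  3. Feed the image data into the inference engine.
  4. Parse and render the results.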

4.1 Loading the Model

First we read the model file into memory. The model file sits in the assets directory of the Android project and is loaded into std::vector<uint8_t> m_tflite_model_buf:

#include <android/asset_manager.h>
#include <cstdint>
#include <vector>

// Read a whole file from the APK's assets directory into buf.
bool
asset_read_file (AAssetManager *assetMgr, const char *fname, std::vector<uint8_t> &buf)
{
    AAsset *assetDescriptor = AAssetManager_open (assetMgr, fname, AASSET_MODE_BUFFER);
    if (assetDescriptor == NULL)
    {
        return false;
    }

    off_t fileLength = AAsset_getLength (assetDescriptor);

    buf.resize (fileLength);
    int readSize = AAsset_read (assetDescriptor, buf.data (), buf.size ());

    AAsset_close (assetDescriptor);

    // AAsset_read returns a negative value on error.
    return (readSize >= 0) && ((size_t)readSize == buf.size ());
}


asset_read_file (m_app->activity->assetManager,
                 BLAZEFACE_MODEL_PATH, m_tflite_model_buf);
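AAssetManager only exists on Android. When testing the rest of the pipeline off-device (for example in a desktop build), the same std::vector<uint8_t> can be filled from an ordinary file. A minimal standard-C++ sketch; read_file_to_buffer is a hypothetical name, not part of the project:

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical desktop counterpart of asset_read_file: read a whole file
// (e.g. a .tflite model) into buf. Returns false if the file cannot be opened.
bool read_file_to_buffer(const std::string &path, std::vector<uint8_t> &buf)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false;

    // Copy every byte until end-of-file.
    buf.assign(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());

    // Reaching EOF is expected; only a hard I/O error counts as failure.
    return !in.bad();
}
```

The resulting buffer can be handed to FlatBufferModel::BuildFromBuffer exactly like the asset-based one.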

TFLite provides FlatBufferModel::BuildFromBuffer to load a model from memory; it returns a std::unique_ptr to a tflite::FlatBufferModel:

// model_buf is caller-owned and must outlive the returned model.
std::unique_ptr<tflite::FlatBufferModel> model = tflite::FlatBufferModel::BuildFromBuffer ((const char *)model_buf, model_size);

Once the model is loaded, we create the inference interpreter, tflite::Interpreter, from it. TFLite provides the InterpreterBuilder helper to construct a tflite::Interpreter:

class InterpreterBuilder {
 public:
  InterpreterBuilder(const FlatBufferModel& model,
                     const OpResolver& op_resolver);
  // ... (excerpt)
};

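It takes the model and an OpResolver. OpResolver is an abstract interface that returns the TFLite registration for a given builtin opcode or custom op name; it is the mechanism by which the operations referenced in the flatbuffer model are mapped to executable function pointers (TfLiteRegistration). InterpreterBuilder overloads the function-call operator: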

TfLiteStatus operator()(std::unique_ptr<Interpreter>* interpreter);
TfLiteStatus operator()(std::unique_ptr<Interpreter>* interpreter,
                          int num_threads);

With an InterpreterBuilder in hand, create the tflite::Interpreter:

std::unique_ptr<tflite::FlatBufferModel> model;      // loaded with BuildFromBuffer
std::unique_ptr<tflite::Interpreter>     interpreter;
tflite::ops::builtin::BuiltinOpResolver  resolver;   // resolves all builtin ops
if (tflite::InterpreterBuilder (*model, resolver)(&interpreter) != kTfLiteOk)
    return -1;

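InterpreterBuilder has two call-operator overloads, and the second one also takes a thread count. Alternatively, the thread count can be set afterwards with tflite::Interpreter::SetNumThreads. The code below defaults to the number of hardware threads and lets the FORCE_TFLITE_NUM_THREADS environment variable override it: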

    int num_threads = std::thread::hardware_concurrency();
    char *env_tflite_num_threads = getenv ("FORCE_TFLITE_NUM_THREADS");
    if (env_tflite_num_threads)
    {
        num_threads = atoi (env_tflite_num_threads);
        DBG_LOGI ("@@@@@@ FORCE_TFLITE_NUM_THREADS=%d\n", num_threads);
    }
    DBG_LOG ("@@@@@@ TFLITE_NUM_THREADS=%d\n", num_threads);
    interpreter->SetNumThreads(num_threads);
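The snippet above passes the result of atoi straight to SetNumThreads. std::thread::hardware_concurrency() may return 0 when the value cannot be determined, and atoi returns 0 for non-numeric text, so a slightly more defensive helper can be useful. A minimal sketch; resolve_num_threads and the fallback to 1 thread are our own assumptions, not part of TFLite:

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <thread>

// Hypothetical helper that picks the TFLite thread count.
// env_value is the content of FORCE_TFLITE_NUM_THREADS (may be nullptr).
// Falls back to the hardware concurrency, and to 1 if that is unknown (0).
int resolve_num_threads(const char *env_value)
{
    if (env_value != nullptr)
    {
        char *end = nullptr;
        errno = 0;
        long n = std::strtol(env_value, &end, 10);
        // Accept only a complete, positive integer that fits in an int.
        if (errno == 0 && end != env_value && *end == '\0' && n > 0 && n <= INT_MAX)
            return static_cast<int>(n);
    }
    unsigned int hw = std::thread::hardware_concurrency();
    return hw > 0 ? static_cast<int>(hw) : 1;
}
```

Usage would be interpreter->SetNumThreads(resolve_num_threads(getenv("FORCE_TFLITE_NUM_THREADS"))).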

Next, allocate the tensor buffers by calling AllocateTensors(), which TFLite documents as follows:

  // Update allocations for all tensors. This will redim dependent tensors
  // using the input tensor dimensionality as given. This is relatively
  // expensive. This *must be* called after the interpreter has been created
  // and before running inference (and accessing tensor buffers), and *must be*
  // called again if (and only if) an input tensor is resized. Returns status of
  // success or failure.
  TfLiteStatus AllocateTensors();

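Next, query the interpreter for the model configuration, mainly the input and output tensors. The helper below looks up a tensor by name (io = 0 for inputs, io = 1 for outputs) and records its index, type, data pointer, quantization parameters, and shape: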

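int
tflite_get_tensor_by_name (std::unique_ptr<tflite::Interpreter> *p_interpreter, int io, const char *name, tflite_tensor_t *ptensor)
{
    /* a std::unique_ptr cannot be passed by value, so take a pointer to it */
    tflite::Interpreter *interpreter = p_interpreter->get ();
    memset (ptensor, 0, sizeof (*ptensor));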

    int tensor_idx;
    int io_idx = -1;
    int num_tensor = (io == 0) ? interpreter->inputs ().size() :
                                 interpreter->outputs().size();

    for (int i = 0; i < num_tensor; i ++)
    {
        tensor_idx = (io == 0) ? interpreter->inputs ()[i] :
                                 interpreter->outputs()[i];

        const char *tensor_name = interpreter->tensor(tensor_idx)->name;
        if (strcmp (tensor_name, name) == 0)
        {
            io_idx = i;
            break;
        }
    }

    if (io_idx < 0)
    {
        DBG_LOGE ("can't find tensor: \"%s\"\n", name);
        return -1;
    }

    void *ptr = NULL;
    TfLiteTensor *tensor = interpreter->tensor(tensor_idx);
    switch (tensor->type)
    {
    case kTfLiteUInt8:
        ptr = (io == 0) ? interpreter->typed_input_tensor <uint8_t>(io_idx) :
                          interpreter->typed_output_tensor<uint8_t>(io_idx);
        break;
    case kTfLiteFloat32:
        ptr = (io == 0) ? interpreter->typed_input_tensor <float>(io_idx) :
                          interpreter->typed_output_tensor<float>(io_idx);
        break;
    case kTfLiteInt64:
        ptr = (io == 0) ? interpreter->typed_input_tensor <int64_t>(io_idx) :
                          interpreter->typed_output_tensor<int64_t>(io_idx);
        break;
    default:
        DBG_LOGE ("ERR: %s(%d)\n", __FILE__, __LINE__);
        return -1;
    }

    ptensor->idx    = tensor_idx;
    ptensor->io     = io;
    ptensor->io_idx = io_idx;
    ptensor->type   = tensor->type;
    ptensor->ptr    = ptr;
    ptensor->quant_scale = tensor->params.scale;
    ptensor->quant_zerop = tensor->params.zero_point;

    for (int i = 0; (i < 4) && (i < tensor->dims->size); i ++)
    {
        ptensor->dims[i] = tensor->dims->data[i];
    }

    return 0;
}
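tflite_get_tensor_by_name also records quant_scale and quant_zerop. For a quantized kTfLiteUInt8 tensor, TFLite's affine quantization maps a stored value q to real = scale * (q - zero_point). The model in this article is fed and decoded as float32, so this step is not needed here, but a uint8 model's outputs would have to be dequantized before decoding. A minimal sketch (dequantize_u8 is a hypothetical name):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Affine dequantization used by TFLite quantized tensors:
//   real_value = scale * (quantized_value - zero_point)
// scale / zero_point correspond to quant_scale / quant_zerop in tflite_tensor_t.
std::vector<float> dequantize_u8(const uint8_t *data, size_t count,
                                 float scale, int32_t zero_point)
{
    std::vector<float> out(count);
    for (size_t i = 0; i < count; i++)
        out[i] = scale * (static_cast<int32_t>(data[i]) - zero_point);
    return out;
}
```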

static tflite_tensor_t      s_detect_tensor_input;
static tflite_tensor_t      s_detect_tensor_scores;
static tflite_tensor_t      s_detect_tensor_bboxes;

tflite_get_tensor_by_name (&s_detect_interpreter, 0, "input",          &s_detect_tensor_input);
tflite_get_tensor_by_name (&s_detect_interpreter, 1, "regressors",     &s_detect_tensor_bboxes);
tflite_get_tensor_by_name (&s_detect_interpreter, 1, "classificators", &s_detect_tensor_scores);

From the model configuration we can read the input image width and height that the model accepts:

int det_input_w = s_detect_tensor_input.dims[2];
int det_input_h = s_detect_tensor_input.dims[1];
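These indices assume the input tensor is laid out as NHWC ([batch, height, width, channels]), which is why dims[1] is the height and dims[2] the width. In such a buffer the flat offset of element (n, y, x, c) is ((n * H + y) * W + x) * C + c, which is also the order in which the normalization loop in 4.3 writes R, G, B values. A small illustrative helper (nhwc_offset is a hypothetical name):

```cpp
#include <cstddef>

// Flat offset of element (n, y, x, c) in an NHWC tensor with
// height H, width W and C channels (dims = {N, H, W, C}).
size_t nhwc_offset(size_t n, size_t y, size_t x, size_t c,
                   size_t H, size_t W, size_t C)
{
    return ((n * H + y) * W + x) * C + c;
}
```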

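4.2 Converting the Camera Preview Texture to RGBA

The model can only consume the camera texture after it has been converted to RGBA pixel data, so we read the texture back into memory: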

    unsigned char *buf_ui8 = NULL;
    static unsigned char *pui8 = NULL;

    if (pui8 == NULL)
        pui8 = (unsigned char *)malloc(w * h * 4);

    buf_ui8 = pui8;

    draw_2d_texture_ex (srctex, 0, win_h - h, w, h, RENDER2D_FLIP_V);

    glPixelStorei (GL_PACK_ALIGNMENT, 4);
    glReadPixels (0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, buf_ui8);

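The camera texture is first drawn into the framebuffer, and glReadPixels then copies the pixels into a buffer in CPU memory.

Note: glReadPixels is expensive, because the CPU has to wait until the GPU has finished rendering before the pixels can be copied.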
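glReadPixels returns rows starting from the bottom-left corner of the framebuffer, so without a flip the image would be upside down relative to the usual top-to-bottom row order; this is presumably why the texture is drawn with RENDER2D_FLIP_V before reading. If the flip were done on the CPU instead, it could look like the sketch below. flip_rows_rgba is a hypothetical name, and it assumes tightly packed RGBA rows, which holds here because each row is w * 4 bytes and GL_PACK_ALIGNMENT is 4:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Flip a tightly packed RGBA image (w * h pixels, 4 bytes each) vertically
// in place by swapping row y with row (h - 1 - y).
void flip_rows_rgba(uint8_t *pixels, int w, int h)
{
    const size_t stride = static_cast<size_t>(w) * 4;
    for (int y = 0; y < h / 2; y++)
    {
        uint8_t *top    = pixels + static_cast<size_t>(y) * stride;
        uint8_t *bottom = pixels + static_cast<size_t>(h - 1 - y) * stride;
        std::swap_ranges(top, top + stride, bottom);
    }
}
```

Flipping during the draw, as the article does, avoids this extra pass over the buffer on the CPU.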

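4.3 Feeding the Image Data to the Model for Inference

First, use the input tensor s_detect_tensor_input obtained above to get the input buffer allocated by the interpreter: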

void *
get_blazeface_input_buf (int *w, int *h)
{
    *w = s_detect_tensor_input.dims[2];
    *h = s_detect_tensor_input.dims[1];
    return s_detect_tensor_input.ptr;
}

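Then convert the pixels obtained above to float, normalizing each channel from [0, 255] to roughly [-1, 1], and write them into the input tensor (the alpha channel is dropped):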

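    /* buf_fp32: the float input tensor; w and h come from get_blazeface_input_buf,
     * so buf_ui8 must hold the w x h RGBA pixels read back at that size. */
    float *buf_fp32 = (float *)get_blazeface_input_buf (&w, &h);
    float mean = 128.0f;
    float std  = 128.0f;
    for (int y = 0; y < h; y ++)
    {
        for (int x = 0; x < w; x ++)
        {
            int r = *buf_ui8 ++;
            int g = *buf_ui8 ++;
            int b = *buf_ui8 ++;
            buf_ui8 ++;          /* skip alpha */
            *buf_fp32 ++ = (float)(r - mean) / std;
            *buf_fp32 ++ = (float)(g - mean) / std;
            *buf_fp32 ++ = (float)(b - mean) / std;
        }
    }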
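The loop maps each 8-bit channel v to (v - 128) / 128, i.e. into [-1, 0.9921875], drops alpha, and writes RGB triplets in HWC order. The same conversion as a self-contained function that can be unit-tested off-device (rgba_to_normalized_rgb is a hypothetical name):

```cpp
#include <cstddef>
#include <cstdint>

// Convert w * h RGBA8 pixels into w * h * 3 floats, dropping alpha:
//   out = (v - mean) / std_dev, with mean = std_dev = 128.
void rgba_to_normalized_rgb(const uint8_t *rgba, float *out, int w, int h)
{
    const float mean = 128.0f;
    const float std_dev = 128.0f;
    const size_t num_pixels = static_cast<size_t>(w) * h;
    for (size_t i = 0; i < num_pixels; i++)
    {
        const uint8_t *px = rgba + i * 4;
        for (int c = 0; c < 3; c++)          // R, G, B; px[3] (alpha) is skipped
            out[i * 3 + c] = (px[c] - mean) / std_dev;
    }
}
```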

4.4 Parsing and Rendering the Results

Next, call the interpreter's Invoke() method to run inference:

    if (interpreter->Invoke() != kTfLiteOk)
    {
        DBG_LOGE ("ERR: %s(%d)\n", __FILE__, __LINE__);
        return -1;
    }

Then decode the detection results:

static int
decode_bounds (std::list<face_t> &face_list, float score_thresh, int input_img_w, int input_img_h)
{
    face_t face_item;
    float  *scores_ptr = (float *)s_detect_tensor_scores.ptr;

    int i = 0;
    for (auto itr = s_anchors.begin(); itr != s_anchors.end(); i ++, itr ++)
    {
        fvec2 anchor = *itr;
        float score0 = scores_ptr[i];
        /* sigmoid: map the raw classificator logit to a probability in (0, 1) */
        float score = 1.0f / (1.0f + exp(-score0));

        if (score > score_thresh)
        {
            /* row i of the regressors tensor: 4 box values followed by
             * kFaceKeyNum (x, y) keypoint pairs */
            float *p = get_bbox_ptr (i);

            /* boundary box */
            float sx = p[0];
            float sy = p[1];
            float w  = p[2];
            float h  = p[3];

            float cx = sx + anchor.x;
            float cy = sy + anchor.y;

            cx /= (float)input_img_w;
            cy /= (float)input_img_h;
            w  /= (float)input_img_w;
            h  /= (float)input_img_h;

            fvec2 topleft, btmright;
            topleft.x  = cx - w * 0.5f;
            topleft.y  = cy - h * 0.5f;
            btmright.x = cx + w * 0.5f;
            btmright.y = cy + h * 0.5f;

            face_item.score    = score;
            face_item.topleft  = topleft;
            face_item.btmright = btmright;

            /* landmark positions (6 keys) */
            for (int j = 0; j < kFaceKeyNum; j ++)
            {
                float lx = p[4 + (2 * j) + 0];
                float ly = p[4 + (2 * j) + 1];
                lx += anchor.x;
                ly += anchor.y;
                lx /= (float)input_img_w;
                ly /= (float)input_img_h;

                face_item.keys[j].x = lx;
                face_item.keys[j].y = ly;
            }

            face_list.push_back (face_item);
        }
    }
    return 0;
}
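decode_bounds relies on s_anchors, whose construction is not shown. The regressor outputs are offsets from an anchor center in input-pixel units (the code adds anchor.x before dividing by input_img_w). Below is a minimal sketch of one way to generate such anchors. It assumes the layout used by MediaPipe's short-range (front-camera) BlazeFace model: a 128x128 input, a 16x16 grid with 2 anchors per cell (stride 8) and an 8x8 grid with 6 anchors per cell (stride 16), 896 anchors in total, each centered at (cell + 0.5) * stride. These parameters, the function generate_anchors and the struct fvec2_t are our own assumptions; check them against the model you actually load.

```cpp
#include <vector>

struct fvec2_t { float x, y; };   // hypothetical stand-in for the project's fvec2

// Anchor centers in input-pixel units for an SSD-style detector.
// Each (stride, anchors_per_cell) pair describes one square feature-map grid.
std::vector<fvec2_t> generate_anchors(int input_size)
{
    struct Layer { int stride; int anchors_per_cell; };
    // Assumed BlazeFace short-range layout: stride 8 -> 2, stride 16 -> 6.
    const Layer layers[] = { {8, 2}, {16, 6} };

    std::vector<fvec2_t> anchors;
    for (const Layer &layer : layers)
    {
        const int grid = input_size / layer.stride;   // cells per side
        for (int y = 0; y < grid; y++)
            for (int x = 0; x < grid; x++)
                for (int k = 0; k < layer.anchors_per_cell; k++)
                    anchors.push_back({ (x + 0.5f) * layer.stride,
                                        (y + 0.5f) * layer.stride });
    }
    return anchors;
}
```

The order of the generated anchors must match the order of the model's output rows, since decode_bounds pairs anchor i with scores_ptr[i] and get_bbox_ptr(i).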

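face_t holds one detection result: the score, the top-left and bottom-right corners of the box, and the keypoints, all normalized to the model input size: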

typedef struct _face_t
{
    float score;
    fvec2 topleft;
    fvec2 btmright;
    fvec2 keys[kFaceKeyNum];
} face_t;

With these coordinates we can draw a box around each detected face.

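decode_bounds stores the corners normalized to [0, 1] relative to the model input, so before drawing they must be scaled to the on-screen region the model saw. A minimal sketch, assuming the preview is drawn into a view_w x view_h rectangle at (view_x, view_y) with the same aspect ratio as the model input (letterboxing or cropping would need an extra correction); rect_px and to_view_rect are hypothetical names:

```cpp
// Hypothetical pixel-space rectangle for drawing.
struct rect_px { float x, y, w, h; };

// Map normalized [0, 1] corners (as stored in face_t) to a pixel rectangle
// inside a view placed at (view_x, view_y) with size view_w x view_h.
rect_px to_view_rect(float tl_x, float tl_y, float br_x, float br_y,
                     float view_x, float view_y, float view_w, float view_h)
{
    rect_px r;
    r.x = view_x + tl_x * view_w;
    r.y = view_y + tl_y * view_h;
    r.w = (br_x - tl_x) * view_w;
    r.h = (br_y - tl_y) * view_h;
    return r;
}
```

The same scaling applies to the keypoints in face_t.keys.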

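5. Summary

This article introduced the typical AI development steps and common AI vision applications. Using face detection as the example, we covered the commonly used TensorFlow Lite interfaces for loading a model, feeding input data, running inference, and retrieving the results.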
