MediaPipe Gesture Recognition: A Web Implementation
Official links:
Google MediaPipe: ai.google.dev/edge/mediap…
Gesture recognition model: ai.google.dev/edge/mediap…
Official web demo: codepen.io/mediapipe-p…
Reproducing the Official Example Locally
Copy all of the code from the official example; I have verified that it runs locally.
I created a style tag and pasted the CSS into it, then created a script tag and pasted the JS into it. Note that the script tag needs type="module". Also, a few variables in the official JS carry TypeScript type annotations; remove them or the script will not run as plain JavaScript. Finally, you may need a proxy, otherwise the CDN and model downloads are very slow.
The HTML file I verified is included as Appendix 1 at the end of this article; take it if you need it.
Anyone with a front-end background should find Google's sample JS easy to follow, so this article focuses on the webcam-interaction part.
Setting most of the boilerplate aside, the piece that matters most to developers building their own logic is the results variable in this snippet (around line 244 of the official JS):
if (video.currentTime !== lastVideoTime) {
lastVideoTime = video.currentTime;
results = gestureRecognizer.recognizeForVideo(video, nowInMs);
}
It holds the recognition result returned by the model, and all of your interactions are built on top of it. Print it to the console to inspect its structure; the way the official code processes results afterwards is also a useful reference.
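For example, a minimal sketch of driving your own logic from results (handleGesture and the 0.6 score threshold are illustrative choices of mine, not part of the official sample):
if (results && results.gestures.length > 0) {
  const categoryName = results.gestures[0][0].categoryName; // e.g. "Thumb_Up"
  const score = results.gestures[0][0].score;               // confidence, 0 to 1
  if (score > 0.6) {
    handleGesture(categoryName); // hypothetical helper: map the label to your own action
  }
}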
Customizing the Model
Note this function in the JS example:
const createGestureRecognizer = async () => {
const vision = await FilesetResolver.forVisionTasks(
"https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.3/wasm"
);
gestureRecognizer = await GestureRecognizer.createFromOptions(vision, {
baseOptions: {
modelAssetPath:
"https://storage.googleapis.com/mediapipe-models/gesture_recognizer/gesture_recognizer/float16/1/gesture_recognizer.task",
delegate: "GPU"
},
runningMode: runningMode
});
demosSection.classList.remove("invisible");
};
createGestureRecognizer();
The modelAssetPath here is the path of the gesture-recognition model. You can Ctrl-click (Cmd-click on Mac) the URL in your IDE to open and download it; it is a binary .task file.
By default the official model recognizes only the following 7 gestures (category 0, Unknown, is returned for anything it cannot classify):
0 - Unrecognized gesture, label: Unknown
1 - Closed fist, label: Closed_Fist
2 - Open palm, label: Open_Palm
3 - Pointing up, label: Pointing_Up
4 - Thumbs down, label: Thumb_Down
5 - Thumbs up, label: Thumb_Up
6 - Victory, label: Victory
7 - Love, label: ILoveYou
Source: ai.google.dev/edge/mediap…
If you need to recognize more gestures, you have to train your own model.
- Training in the cloud
Open the link below (if Google has moved it elsewhere, please search for it yourself). Customize your own model: ai.google.dev/edge/mediap…
The model is trained with Colab.
- Preparing the training images
The dataset for gesture recognition in Model Maker must use the following format:
<dataset_path>/<label_name>/<img_name>.*. In addition, one of the label names (label_names) must be none. The none label represents any gesture that is not classified as one of the other gestures.
The above is the description of the data format copied from the official site. In other words, we need to follow this file structure:
data
├── label_name1
├── label_name2
├── label_name3
└── none
The name of each subfolder under data is the name of your gesture, and the subfolder holds the photos of that gesture. In addition, there must be a none folder containing a variety of miscellaneous hand gestures. Hands must actually appear in those images, and the gestures must be random ones; they cannot be arbitrary pictures, otherwise the model will also report the none gesture when there is no hand in the frame! ⚠️
Preparing training data for AI is always tedious. Appendix 2 offers one solution: a Python script that uses OpenCV to drive your local webcam and capture photos of your own gestures in real time as training data. It is fairly simple, so I will not go through it in detail; readers can look it up themselves or ask a large language model.
- Uploading the image files
After opening the Colab notebook, you will see a page containing many Python code cells. Click the folder icon on the left to open the file panel; this is where we upload our image files. Right-click to upload.
Note that the requirements on the training data are fairly strict. macOS, which I use, creates .DS_Store files everywhere, and the training program treats them as invalid files. After uploading, check that your folders contain nothing other than image files; otherwise the training program will raise an error and the training will not finish.
- Modifying the code
Back in the notebook, scroll down to the code cell that sets up the training data. The official document uses an example dataset downloaded from the internet; you can copy that URL into a browser and download it locally to take a look. It contains rock-paper-scissors gesture images.
You can reuse the none folder of that dataset as the none folder of your own training data.
Since we are training our own gestures, that cell of course has to be changed. Replace the line that sets the dataset path with the following:
dataset_path = "[path]"
Here [path] must be replaced with the path of the folder you uploaded. You can get it by right-clicking the uploaded folder in the file panel on the left and choosing Copy path.
Then, in the top menu, choose Runtime > Run all to execute all of the cells.
- Getting the model. When the run finishes, the browser downloads the binary model; you can also see the trained model among the files in Colab. The corresponding Python code is:
files.download('exported_model/gesture_recognizer.task')
Copy it into your project folder and change the model path that the JS loads.
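For instance, assuming the downloaded gesture_recognizer.task is placed next to the HTML file (the relative path below is just an assumption; adjust it to your project layout), only the modelAssetPath in createFromOptions needs to change:
gestureRecognizer = await GestureRecognizer.createFromOptions(vision, {
  baseOptions: {
    // assumed relative path to the custom model trained in Colab
    modelAssetPath: "./gesture_recognizer.task",
    delegate: "GPU"
  },
  runningMode: runningMode
});
If you open the page straight from the file system, the browser may refuse to fetch a local model file, so serving the folder with a simple local web server is the safer option.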
My Project
Gitee link: gitee.com/tan-xuan0/t… It is an interactive game that uses gesture recognition to control the movement of Tetris pieces. It is also built on Electron and can be packaged into a cross-platform application. The game logic borrows from this author's code: juejin.cn/post/706079… If this infringes on your rights, you can leave me a message or contact me by email at 1917443804@qq.com.
Appendix 1
<!-- Copyright 2023 The MediaPipe Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. -->
<link href="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.css" rel="stylesheet">
<script src="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.js"></script>
<h1>Recognize hand gestures using the MediaPipe HandGestureRecognizer task</h1>
<section id="demos" class="invisible">
<h2>Demo: Recognize gestures</h2>
<p><em>Click on an image below</em> to identify the gestures in the image.</p>
<div class="detectOnClick">
<img src="https://assets.codepen.io/9177687/idea-gcbe74dc69_1920.jpg" crossorigin="anonymous"
title="Click to get recognize!" />
<p class="classification removed">
</div>
<div class="detectOnClick">
<img src="https://assets.codepen.io/9177687/thumbs-up-ga409ddbd6_1.png" crossorigin="anonymous"
title="Click to get recognize!" />
<p class="classification removed">
</div>
<h2><br>Demo: Webcam continuous hand gesture detection</h2>
<p>Use your hand to make gestures in front of the camera to get gesture classification. </br>Click <b>enable
webcam</b> below and grant access to the webcam if prompted.</p>
<div id="liveView" class="videoView">
<button id="webcamButton" class="mdc-button mdc-button--raised">
<span class="mdc-button__ripple"></span>
<span class="mdc-button__label">ENABLE WEBCAM</span>
</button>
<div style="position: relative;">
<video id="webcam" autoplay playsinline></video>
<canvas class="output_canvas" id="output_canvas" width="1280" height="720"
style="position: absolute; left: 0px; top: 0px;"></canvas>
<p id='gesture_output' class="output">
</div>
</div>
</section>
<script type="module">// Copyright 2023 The MediaPipe Authors.
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
// http://www.apache.org/licenses/LICENSE-2.0
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
import {
GestureRecognizer,
FilesetResolver,
DrawingUtils
} from "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.3";
const demosSection = document.getElementById("demos");
let gestureRecognizer;
let runningMode = "IMAGE";
let enableWebcamButton;
let webcamRunning = false;
const videoHeight = "360px";
const videoWidth = "480px";
// Before we can use HandLandmarker class we must wait for it to finish
// loading. Machine Learning models can be large and take a moment to
// get everything needed to run.
const createGestureRecognizer = async () => {
const vision = await FilesetResolver.forVisionTasks(
"https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.3/wasm"
);
gestureRecognizer = await GestureRecognizer.createFromOptions(vision, {
baseOptions: {
modelAssetPath:
"https://storage.googleapis.com/mediapipe-models/gesture_recognizer/gesture_recognizer/float16/1/gesture_recognizer.task",
delegate: "GPU"
},
runningMode: runningMode
});
demosSection.classList.remove("invisible");
};
createGestureRecognizer();
/********************************************************************
// Demo 1: Detect hand gestures in images
********************************************************************/
const imageContainers = document.getElementsByClassName("detectOnClick");
for (let i = 0; i < imageContainers.length; i++) {
imageContainers[i].children[0].addEventListener("click", handleClick);
}
async function handleClick(event) {
if (!gestureRecognizer) {
alert("Please wait for gestureRecognizer to load");
return;
}
if (runningMode === "VIDEO") {
runningMode = "IMAGE";
await gestureRecognizer.setOptions({ runningMode: "IMAGE" });
}
// Remove all previous landmarks
const allCanvas = event.target.parentNode.getElementsByClassName("canvas");
for (var i = allCanvas.length - 1; i >= 0; i--) {
const n = allCanvas[i];
n.parentNode.removeChild(n);
}
const results = gestureRecognizer.recognize(event.target);
// View results in the console to see their format
console.log(results);
if (results.gestures.length > 0) {
const p = event.target.parentNode.childNodes[3];
p.setAttribute("class", "info");
const categoryName = results.gestures[0][0].categoryName;
const categoryScore = parseFloat(
results.gestures[0][0].score * 100
).toFixed(2);
const handedness = results.handednesses[0][0].displayName;
p.innerText = `GestureRecognizer: ${categoryName}\n Confidence: ${categoryScore}%\n Handedness: ${handedness}`;
p.style =
"left: 0px;" +
"top: " +
event.target.height +
"px; " +
"width: " +
(event.target.width - 10) +
"px;";
const canvas = document.createElement("canvas");
canvas.setAttribute("class", "canvas");
canvas.setAttribute("width", event.target.naturalWidth + "px");
canvas.setAttribute("height", event.target.naturalHeight + "px");
canvas.style =
"left: 0px;" +
"top: 0px;" +
"width: " +
event.target.width +
"px;" +
"height: " +
event.target.height +
"px;";
event.target.parentNode.appendChild(canvas);
const canvasCtx = canvas.getContext("2d");
const drawingUtils = new DrawingUtils(canvasCtx);
for (const landmarks of results.landmarks) {
drawingUtils.drawConnectors(
landmarks,
GestureRecognizer.HAND_CONNECTIONS,
{
color: "#00FF00",
lineWidth: 5
}
);
drawingUtils.drawLandmarks(landmarks, {
color: "#FF0000",
lineWidth: 1
});
}
}
}
/********************************************************************
// Demo 2: Continuously grab image from webcam stream and detect it.
********************************************************************/
const video = document.getElementById("webcam");
const canvasElement = document.getElementById("output_canvas");
const canvasCtx = canvasElement.getContext("2d");
const gestureOutput = document.getElementById("gesture_output");
// Check if webcam access is supported.
function hasGetUserMedia() {
return !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia);
}
// If webcam supported, add event listener to button for when user
// wants to activate it.
if (hasGetUserMedia()) {
enableWebcamButton = document.getElementById("webcamButton");
enableWebcamButton.addEventListener("click", enableCam);
} else {
console.warn("getUserMedia() is not supported by your browser");
}
// Enable the live webcam view and start detection.
function enableCam(event) {
if (!gestureRecognizer) {
alert("Please wait for gestureRecognizer to load");
return;
}
if (webcamRunning === true) {
webcamRunning = false;
enableWebcamButton.innerText = "ENABLE PREDICTIONS";
} else {
webcamRunning = true;
enableWebcamButton.innerText = "DISABLE PREDICTIONS";
}
// getUsermedia parameters.
const constraints = {
video: true
};
// Activate the webcam stream.
navigator.mediaDevices.getUserMedia(constraints).then(function (stream) {
video.srcObject = stream;
video.addEventListener("loadeddata", predictWebcam);
});
}
let lastVideoTime = -1;
let results = undefined;
async function predictWebcam() {
const webcamElement = document.getElementById("webcam");
// Now let's start detecting the stream.
if (runningMode === "IMAGE") {
runningMode = "VIDEO";
await gestureRecognizer.setOptions({ runningMode: "VIDEO" });
}
let nowInMs = Date.now();
if (video.currentTime !== lastVideoTime) {
lastVideoTime = video.currentTime;
results = gestureRecognizer.recognizeForVideo(video, nowInMs);
}
canvasCtx.save();
canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
const drawingUtils = new DrawingUtils(canvasCtx);
canvasElement.style.height = videoHeight;
webcamElement.style.height = videoHeight;
canvasElement.style.width = videoWidth;
webcamElement.style.width = videoWidth;
if (results.landmarks) {
for (const landmarks of results.landmarks) {
drawingUtils.drawConnectors(
landmarks,
GestureRecognizer.HAND_CONNECTIONS,
{
color: "#00FF00",
lineWidth: 5
}
);
drawingUtils.drawLandmarks(landmarks, {
color: "#FF0000",
lineWidth: 2
});
}
}
canvasCtx.restore();
if (results.gestures.length > 0) {
gestureOutput.style.display = "block";
gestureOutput.style.width = videoWidth;
const categoryName = results.gestures[0][0].categoryName;
const categoryScore = parseFloat(
results.gestures[0][0].score * 100
).toFixed(2);
const handedness = results.handednesses[0][0].displayName;
gestureOutput.innerText = `GestureRecognizer: ${categoryName}\n Confidence: ${categoryScore} %\n Handedness: ${handedness}`;
} else {
gestureOutput.style.display = "none";
}
// Call this function again to keep predicting when the browser is ready.
if (webcamRunning === true) {
window.requestAnimationFrame(predictWebcam);
}
}
</script>
<style>
/* Copyright 2023 The MediaPipe Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
@use "@material";
body {
font-family: roboto;
margin: 2em;
color: #3d3d3d;
--mdc-theme-primary: #007f8b;
--mdc-theme-on-primary: #f1f3f4;
}
h1 {
color: #007f8b;
}
h2 {
clear: both;
}
video {
clear: both;
display: block;
transform: rotateY(180deg);
-webkit-transform: rotateY(180deg);
-moz-transform: rotateY(180deg);
height: 280px;
}
section {
opacity: 1;
transition: opacity 500ms ease-in-out;
}
.removed {
display: none;
}
.invisible {
opacity: 0.2;
}
.detectOnClick {
position: relative;
float: left;
width: 48%;
margin: 2% 1%;
cursor: pointer;
}
.videoView {
position: absolute;
float: left;
width: 48%;
margin: 2% 1%;
cursor: pointer;
min-height: 500px;
}
.videoView p,
.detectOnClick p {
padding-top: 5px;
padding-bottom: 5px;
background-color: #007f8b;
color: #fff;
border: 1px dashed rgba(255, 255, 255, 0.7);
z-index: 2;
margin: 0;
}
.highlighter {
background: rgba(0, 255, 0, 0.25);
border: 1px dashed #fff;
z-index: 1;
position: absolute;
}
.canvas {
z-index: 1;
position: absolute;
pointer-events: none;
}
.output_canvas {
transform: rotateY(180deg);
-webkit-transform: rotateY(180deg);
-moz-transform: rotateY(180deg);
}
.detectOnClick {
z-index: 0;
font-size: calc(8px + 1.2vw);
}
.detectOnClick img {
width: 45vw;
}
.output {
display: none;
width: 100%;
font-size: calc(8px + 1.2vw);
}
</style>
Appendix 2
import cv2
import os

# Folder where the captured gesture images will be saved.
# Use the gesture's label name as the folder name (e.g. 'Left').
output_folder = 'Left'

# Make sure the output folder exists.
if not os.path.exists(output_folder):
    os.makedirs(output_folder)

# Open the default webcam (device 0).
cap = cv2.VideoCapture(0)

begin = 30        # starting index used in the file names
num_frames = 30   # how many frames to capture in one run
frame_count = begin
saved_count = 0

while True:
    # Read one frame from the webcam.
    success, frame = cap.read()
    if not success:
        break  # stop if no frame could be read

    # Build the output file name, e.g. Left/frame_0030.jpg
    output_filename = os.path.join(output_folder, f'frame_{frame_count:04d}.jpg')

    # Mirror the image horizontally so it matches the mirrored webcam view.
    frame = cv2.flip(frame, 1)

    # Save the frame as an image.
    cv2.imwrite(output_filename, frame)
    saved_count += 1
    print('saved', output_filename)

    # Show a preview window: first argument is the window name, second is the frame.
    cv2.imshow('frame', frame)

    # Wait 250 ms between captures so you have time to adjust your gesture.
    cv2.waitKey(250)

    frame_count += 1
    if frame_count > begin + num_frames:
        break

# Release the webcam and close the preview window.
cap.release()
cv2.destroyAllWindows()
print(f'Saved {saved_count} images in total.')