19 - A Walkthrough of Apple's Official Demo Combining ARKit, Vision, and Core ML


Notes

ARKit article index

This article is a walkthrough of the UsingVisionInRealTimeWithARKit demo that Apple released after WWDC 2018. In it, Apple shows how to use Vision and Core ML together with ARKit, and calls out a few things to watch for.

Main Content

The gist of the demo: grab video frames from ARKit, hand each image to an mlmodel through a Vision ML request, receive the processed results back through Vision, and then display them by placing SKNode nodes in the AR scene.

The core of it is only two or three methods:

    // MARK: - ARSessionDelegate
    
    // Pass camera frames received from ARKit to Vision (when not already processing one)
    /// - Tag: ConsumeARFrames
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        // Do not enqueue other buffers for processing while another Vision task is still running.
        // The camera stream has only a finite amount of buffers available; holding too many buffers for analysis would starve the camera.
        guard currentBuffer == nil, case .normal = frame.camera.trackingState else {
            return
        }
        
        // Retain the image buffer for Vision processing.
        self.currentBuffer = frame.capturedImage
        classifyCurrentImage()
    }

    // Run the Vision+ML classifier on the current image buffer.
    /// - Tag: ClassifyCurrentImage
    private func classifyCurrentImage() {
        // Most computer vision tasks are not rotation agnostic so it is important to pass in the orientation of the image with respect to device.
        let orientation = CGImagePropertyOrientation(UIDevice.current.orientation)
        
        let requestHandler = VNImageRequestHandler(cvPixelBuffer: currentBuffer!, orientation: orientation)
        visionQueue.async {
            do {
                // Release the pixel buffer when done, allowing the next buffer to be processed.
                defer { self.currentBuffer = nil }
                try requestHandler.perform([self.classificationRequest])
            } catch {
                print("Error: Vision request failed with error \"\(error)\"")
            }
        }
    }
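
For reference, the two methods above depend on a currentBuffer property and a serial visionQueue that aren't shown. In the demo they look roughly like this (the queue label here is illustrative):

    // The pixel buffer currently being analyzed; non-nil while a Vision request is in flight.
    private var currentBuffer: CVPixelBuffer?
    
    // Serial queue that keeps Vision work off the main thread and processes
    // one request at a time (the label is illustrative).
    private let visionQueue = DispatchQueue(label: "com.example.serialVisionQueue")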

The important points here are:

  1. Only one buffer is processed at a time;
  2. Vision needs the image's orientation passed in when processing it (more on this below).
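
A note on the CGImagePropertyOrientation(UIDevice.current.orientation) call in point 2: that initializer isn't part of the SDK. It's a small convenience extension defined in the demo that maps the device's physical orientation to the EXIF orientation of the back camera's image. A sketch of what it looks like:

    import UIKit
    import ImageIO
    
    extension CGImagePropertyOrientation {
        /// Maps the device's physical orientation to the EXIF orientation
        /// of the back camera's captured image (sketch based on the demo).
        init(_ deviceOrientation: UIDeviceOrientation) {
            switch deviceOrientation {
            case .portraitUpsideDown: self = .left
            case .landscapeLeft:      self = .up
            case .landscapeRight:     self = .down
            default:                  self = .right
            }
        }
    }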

Also, if you aren't familiar with the Vision framework, you may not know that it has a dedicated class for running Core ML models. This greatly widens what Vision can do: on top of its built-in face detection, QR code detection, rectangle detection, and so on, it can drive any ML model trained by you or anyone else. It's a killer feature!

If that's new to you, here is the relevant code; it's very simple:

    // Vision classification request and model
    /// - Tag: ClassificationRequest
    private lazy var classificationRequest: VNCoreMLRequest = {
        do {
            // Instantiate the model from its generated Swift class.
            let model = try VNCoreMLModel(for: Inceptionv3().model)
            let request = VNCoreMLRequest(model: model, completionHandler: { [weak self] request, error in
                self?.processClassifications(for: request, error: error)
            })
            
            // Crop input images to square area at center, matching the way the ML model was trained.
            request.imageCropAndScaleOption = .centerCrop
            
            // Use CPU for Vision processing to ensure that there are adequate GPU resources for rendering.
            request.usesCPUOnly = true
            
            return request
        } catch {
            fatalError("Failed to load Vision ML model: \(error)")
        }
    }()



    // Handle completion of the Vision request and choose results to display.
    /// - Tag: ProcessClassifications
    func processClassifications(for request: VNRequest, error: Error?) {
        guard let results = request.results else {
            print("Unable to classify image.\n\(error!.localizedDescription)")
            return
        }
        // The `results` will always be `VNClassificationObservation`s, as specified by the Core ML model in this project.
        let classifications = results as! [VNClassificationObservation]
        
        // Show a label for the highest-confidence result (but only above a minimum confidence threshold).
        if let bestResult = classifications.first(where: { result in result.confidence > 0.5 }),
            let label = bestResult.identifier.split(separator: ",").first {
            identifierString = String(label)
            confidence = bestResult.confidence
        } else {
            identifierString = ""
            confidence = 0
        }
        
        DispatchQueue.main.async { [weak self] in
            self?.displayClassifierResults()
        }
    }
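
The snippets above stop short of the final step mentioned at the beginning: putting the result into the AR scene. In the demo, displayClassifierResults() shows the label text on screen, and a tap gesture adds an ARAnchor whose SpriteKit node receives the classification text. A condensed sketch, assuming an ARSKView property named sceneView (names follow the demo, but treat this as illustrative rather than verbatim):

    // Maps anchor IDs to the label text captured when the anchor was placed.
    private var anchorLabels = [UUID: String]()
    
    // On tap: hit-test into the scene and add an anchor at the result,
    // remembering the current classification for that anchor.
    @IBAction func placeLabelAtLocation(sender: UITapGestureRecognizer) {
        let hitLocation = sender.location(in: sceneView)
        if let result = sceneView.hitTest(hitLocation, types: [.featurePoint, .estimatedHorizontalPlane]).first {
            let anchor = ARAnchor(transform: result.worldTransform)
            anchorLabels[anchor.identifier] = identifierString
            sceneView.session.add(anchor: anchor)
        }
    }
    
    // ARSKViewDelegate: when ARKit creates the SKNode for the new anchor,
    // attach a text label showing the classification.
    func view(_ view: ARSKView, didAdd node: SKNode, for anchor: ARAnchor) {
        guard let labelText = anchorLabels[anchor.identifier] else { return }
        node.addChild(SKLabelNode(text: labelText))
    }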

For the complete code, see github.com/XanderXu/AR… and the accompanying ReadMe file.