Overview
This article is a walkthrough of the UsingVisionInRealTimeWithARKit demo that Apple released after WWDC 2018. In it, Apple shows how to use Vision and Core ML together with ARKit, and points out a few things to watch out for.
Main Content
The main flow of the demo is: grab camera frames from ARKit, hand the images to an mlmodel through a Vision ML request, get the processed results back through the Vision framework, and finally place an SKNode in the AR scene to display them.
The core code amounts to only two or three methods:
```swift
// MARK: - ARSessionDelegate

// Pass camera frames received from ARKit to Vision (when not already processing one).
/// - Tag: ConsumeARFrames
func session(_ session: ARSession, didUpdate frame: ARFrame) {
    // Do not enqueue other buffers for processing while another Vision task is still running.
    // The camera stream has only a finite number of buffers available; holding too many buffers for analysis would starve the camera.
    guard currentBuffer == nil, case .normal = frame.camera.trackingState else {
        return
    }

    // Retain the image buffer for Vision processing.
    self.currentBuffer = frame.capturedImage
    classifyCurrentImage()
}

// Run the Vision+ML classifier on the current image buffer.
/// - Tag: ClassifyCurrentImage
private func classifyCurrentImage() {
    // Most computer vision tasks are not rotation agnostic, so it is important to pass in the orientation of the image with respect to the device.
    let orientation = CGImagePropertyOrientation(UIDevice.current.orientation)

    let requestHandler = VNImageRequestHandler(cvPixelBuffer: currentBuffer!, orientation: orientation)
    visionQueue.async {
        do {
            // Release the pixel buffer when done, allowing the next buffer to be processed.
            defer { self.currentBuffer = nil }
            try requestHandler.perform([self.classificationRequest])
        } catch {
            print("Error: Vision request failed with error \"\(error)\"")
        }
    }
}
```
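The snippet above relies on two stored properties that are declared elsewhere in the sample. A minimal sketch of how they could be declared (the property names come from the code above; the queue label is a placeholder of mine, not the sample's exact string):

```swift
// The pixel buffer currently being analyzed; nil means Vision is free to take the next frame.
private var currentBuffer: CVPixelBuffer?

// A serial queue, so at most one Vision request runs at a time and the request handler
// stays off the main thread.
private let visionQueue = DispatchQueue(label: "com.example.serialVisionQueue")
```

Because the queue is serial and `currentBuffer` is only cleared in the `defer` block, at most one camera buffer is ever held for analysis.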
The most important points here are:
- Only one buffer is processed at a time;
- Vision needs to be told the image orientation when processing a frame (see the sketch below).
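Note that `CGImagePropertyOrientation(UIDevice.current.orientation)` in `classifyCurrentImage()` is not a system initializer; the sample adds a small extension that maps the device orientation to the orientation of images from the back camera. A sketch of what such a conversion could look like (the exact mapping reflects the back camera's landscape-mounted sensor; double-check it against the sample's utilities):

```swift
import UIKit
import ImageIO

extension CGImagePropertyOrientation {
    /// Maps a UIDeviceOrientation to the orientation of frames coming from the back camera,
    /// whose sensor is mounted in landscape.
    init(_ deviceOrientation: UIDeviceOrientation) {
        switch deviceOrientation {
        case .portraitUpsideDown: self = .left
        case .landscapeLeft:      self = .up
        case .landscapeRight:     self = .down
        default:                  self = .right   // portrait, faceUp, faceDown, unknown
        }
    }
}
```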
Also, if you are not familiar with the Vision framework, you may not know that it has a dedicated class for working with Core ML models. This greatly expands what Vision can do: beyond the built-in face detection, QR code detection, rectangle detection, and so on, it can run any ML model that you or someone else has trained. It's a fantastic tool!
In case you haven't seen it, the relevant code is shown below and is very simple:
```swift
// Vision classification request and model.
/// - Tag: ClassificationRequest
private lazy var classificationRequest: VNCoreMLRequest = {
    do {
        // Instantiate the model from its generated Swift class.
        let model = try VNCoreMLModel(for: Inceptionv3().model)
        let request = VNCoreMLRequest(model: model, completionHandler: { [weak self] request, error in
            self?.processClassifications(for: request, error: error)
        })

        // Crop input images to a square area at the center, matching the way the ML model was trained.
        request.imageCropAndScaleOption = .centerCrop

        // Use CPU for Vision processing to ensure that there are adequate GPU resources for rendering.
        request.usesCPUOnly = true

        return request
    } catch {
        fatalError("Failed to load Vision ML model: \(error)")
    }
}()
```
```swift
// Handle completion of the Vision request and choose results to display.
/// - Tag: ProcessClassifications
func processClassifications(for request: VNRequest, error: Error?) {
    guard let results = request.results else {
        print("Unable to classify image.\n\(error!.localizedDescription)")
        return
    }
    // The `results` will always be `VNClassificationObservation`s, as specified by the Core ML model in this project.
    let classifications = results as! [VNClassificationObservation]

    // Show a label for the highest-confidence result (but only above a minimum confidence threshold).
    if let bestResult = classifications.first(where: { result in result.confidence > 0.5 }),
       let label = bestResult.identifier.split(separator: ",").first {
        identifierString = String(label)
        confidence = bestResult.confidence
    } else {
        identifierString = ""
        confidence = 0
    }

    DispatchQueue.main.async { [weak self] in
        self?.displayClassifierResults()
    }
}
```
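`displayClassifierResults()` itself is not shown here; in the demo the classification string eventually ends up as an SKNode in the AR scene. As a rough sketch of that last step (the anchor handling and exact node styling are assumptions, not the demo's code), a view controller acting as the `ARSKViewDelegate` could return a label node for each placed anchor:

```swift
// MARK: - ARSKViewDelegate (sketch, inside the same view controller)

// When an ARAnchor has been added (for example after a tap hit-test),
// provide a SpriteKit node showing the most recent classification result.
func view(_ view: ARSKView, nodeFor anchor: ARAnchor) -> SKNode? {
    guard !identifierString.isEmpty else { return nil }
    let labelNode = SKLabelNode(text: String(format: "%@ (%.0f%%)", identifierString, confidence * 100))
    labelNode.fontSize = 20
    labelNode.horizontalAlignmentMode = .center
    labelNode.verticalAlignmentMode = .center
    return labelNode
}
```

In the actual sample the node is only created after the user taps to place an anchor; the sketch above simply illustrates where the Vision output meets the SpriteKit scene.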
For the complete code, see github.com/XanderXu/AR… and its ReadMe file.