Paper review: dynamic indoor perception


Spatial perception and 3D environment understanding are key enablers for executing high-level tasks in the real world. To carry out a high-level instruction such as "search for survivors on the second floor of a tall building", a robot needs to ground semantic concepts (survivor, floor, building) into a spatial representation (i.e., a metric map), producing a metric-semantic spatial representation that goes beyond the map models typically built by SLAM and visual-inertial odometry (VIO) [15] pipelines. Moreover, bridging low-level obstacle avoidance and motion planning with high-level task planning requires a world model that captures reality at different levels of abstraction. For example, while task planning may effectively describe a sequence of actions (e.g., reach the building entrance, take the stairs, enter each room), motion planning typically relies on fine-grained map representations (e.g., mesh or volumetric models). Ideally, spatial perception should build a hierarchy of consistent abstractions that can serve both motion and task planning. The problem becomes even more challenging when autonomous systems are deployed in crowded environments. From self-driving cars to collaborative robots on factory floors, merely recognizing obstacles is not enough for safe and effective navigation/action; reasoning about the dynamic entities in the scene (in particular, humans) and predicting their behavior or intentions [24] becomes crucial.
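To make the idea of a metric-semantic representation concrete, here is a minimal sketch of a volumetric map whose occupied voxels each carry a semantic class label, in contrast to a purely geometric occupancy grid. The class names and the 0.1 m resolution are illustrative assumptions, not details from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class MetricSemanticVoxelMap:
    """A toy metric-semantic volumetric map: each occupied voxel
    stores a semantic class label in addition to its occupancy."""
    resolution: float = 0.1  # voxel edge length in meters (assumed)
    voxels: dict = field(default_factory=dict)  # (i, j, k) -> class label

    def world_to_voxel(self, x, y, z):
        # Quantize a metric 3D point to integer voxel indices.
        r = self.resolution
        return (int(x // r), int(y // r), int(z // r))

    def insert(self, x, y, z, label):
        # Mark the voxel containing (x, y, z) as occupied by `label`.
        self.voxels[self.world_to_voxel(x, y, z)] = label

    def label_at(self, x, y, z):
        # Return the semantic label at a point, or None if free/unknown.
        return self.voxels.get(self.world_to_voxel(x, y, z))

m = MetricSemanticVoxelMap()
m.insert(1.23, 0.05, 0.03, "floor")
m.insert(1.23, 0.05, 1.53, "survivor")
print(m.label_at(1.26, 0.07, 1.57))  # a nearby query hits the same voxel: survivor
```

A purely metric map would answer only "occupied or free" at each query point; attaching labels is what lets a planner connect the symbol "survivor" in a task plan to a location in the metric map.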
Existing approaches do not address these issues (metric-semantic understanding, actionable hierarchical abstractions, modeling of dynamic entities) simultaneously. Early work on map representations in robotics (e.g., [16,28,50,51,103,113]) investigated hierarchical representations, but mostly in 2D and under the assumption of static environments; moreover, these works predate the "deep learning revolution" and therefore could not provide advanced semantic understanding. On the other hand, the fast-growing literature on metric-semantic mapping (e.g., [8,12,30,68,88,96,100]) mostly focuses on "flat" representations (object constellations, metric-semantic meshes, or volumetric models) that are not hierarchical in nature. Recent work [5,41] attempts to fill this gap by designing richer representations, namely 3D scene graphs. A scene graph is a data structure commonly used in computer graphics and gaming applications; it is a graph model in which nodes represent entities in the scene and edges represent spatial or logical relationships between nodes. While [5,41] pioneered the use of 3D scene graphs in robotics and vision (earlier work in vision focused on 2D scene graphs defined in image space [17,33,35,116]), they have important drawbacks. Kim et al. [41] only capture objects and miss multiple levels of abstraction. Armeni et al. [5] provide a hierarchical model that is useful for visualization and knowledge organization, but does not capture actionable information such as traversability, which is key for robot navigation. Finally, neither [41] nor [5] accounts for, or models, dynamic entities in the environment.
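To make the node/edge structure of a layered 3D scene graph concrete, here is a minimal sketch in Python. The layer names (building, room, object, agent) and relation labels are illustrative assumptions inspired by the description above, not the actual data structures of [5] or [41].

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    layer: str      # abstraction level, e.g. "building", "room", "object", "agent"
    label: str      # semantic class, e.g. "kitchen", "chair", "human"
    position: tuple # 3D centroid (x, y, z) in the metric map

@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: list = field(default_factory=list)  # (src_id, relation, dst_id)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, src, relation, dst):
        # Edges encode spatial/logical relations, including
        # cross-layer inclusion that builds the hierarchy.
        self.edges.append((src, relation, dst))

    def children(self, node_id, relation="contains"):
        # Traverse down the hierarchy: entities one abstraction level below.
        return [self.nodes[d] for s, r, d in self.edges
                if s == node_id and r == relation]

g = SceneGraph()
g.add_node(Node("B0", "building", "office building", (0.0, 0.0, 0.0)))
g.add_node(Node("R1", "room", "kitchen", (3.0, 2.0, 0.0)))
g.add_node(Node("O1", "object", "chair", (3.2, 2.5, 0.0)))
g.add_node(Node("A1", "agent", "human", (3.5, 1.8, 0.0)))
g.add_edge("B0", "contains", "R1")  # hierarchy: building -> room
g.add_edge("R1", "contains", "O1")  # hierarchy: room -> object
g.add_edge("A1", "inside", "R1")    # logical relation for a dynamic entity

print([n.label for n in g.children("R1")])  # objects in the kitchen: ['chair']
```

A task planner can operate on the upper layers (buildings, rooms) while a motion planner consumes the metric positions of the leaf nodes; dynamic entities such as the "human" agent node are what [5] and [41], per the discussion above, do not model.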

References

[5] Armeni I, He Z Y, Gwak J Y, et al. 3D scene graph: A structure for unified semantics, 3D space, and camera[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 5664-5673.

[8] Behley J, Garbade M, Milioto A, et al. SemanticKITTI: A dataset for semantic scene understanding of LiDAR sequences[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 9297-9307.

[12] Bowman S L, Atanasov N, Daniilidis K, et al. Probabilistic data association for semantic SLAM[C]//2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017: 1722-1729.

[15] Cadena C, Carlone L, Carrillo H, et al. Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age[J]. IEEE Transactions on Robotics, 2016, 32(6): 1309-1332.

[16] Chatila R, Laumond J P. Position referencing and consistent world modeling for mobile robots[C]//Proceedings. 1985 IEEE International Conference on Robotics and Automation. IEEE, 1985, 2: 138-145.

[17] Choi W, Chao Y W, Pantofaru C, et al. Understanding indoor scenes using 3D geometric phrases[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013: 33-40.

[24] Everett M, Chen Y F, How J P. Motion planning among dynamic, decision-making agents with deep reinforcement learning[C]//2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018: 3052-3059.

[28] Galindo C, Saffiotti A, Coradeschi S, et al. Multi-hierarchical semantic maps for mobile robotics[C]//2005 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2005: 2278-2283.

[30] Grinvald M, Furrer F, Novkovic T, et al. Volumetric instance-aware semantic mapping and 3D object discovery[J]. IEEE Robotics and Automation Letters, 2019, 4(3): 3037-3044.

[33] Huang S, Qi S, Zhu Y, et al. Holistic 3D scene parsing and reconstruction from a single RGB image[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 187-203.

[35] Jiang C, Qi S, Zhu Y, et al. Configurable 3D scene synthesis and 2D image rendering with per-pixel ground truth using stochastic grammars[J]. International Journal of Computer Vision, 2018, 126(9): 920-941.

[41] Kim U H, Park J M, Song T J, et al. 3-D scene graph: A sparse and semantic representation of physical environments for intelligent agents[J]. IEEE Transactions on Cybernetics, 2019, 50(12): 4921-4933.

[50] Kuipers B. Modeling spatial knowledge[J]. Cognitive Science, 1978, 2(2): 129-153.

[51] Kuipers B. The spatial semantic hierarchy[J]. Artificial Intelligence, 2000, 119(1-2): 191-233.

[68] McCormac J, Handa A, Davison A, et al. SemanticFusion: Dense 3D semantic mapping with convolutional neural networks[C]//2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017: 4628-4635.

[88] Rosinol A, Abate M, Chang Y, et al. Kimera: An open-source library for real-time metric-semantic localization and mapping[C]//2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020: 1689-1696.

[96] Salas-Moreno R F, Newcombe R A, Strasdat H, et al. SLAM++: Simultaneous localisation and mapping at the level of objects[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013: 1352-1359.

[100] Tateno K, Tombari F, Navab N. Real-time and scalable incremental segmentation on dense SLAM[C]//2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2015: 4465-4472.

[103] Vasudevan S, Gächter S, Nguyen V, et al. Cognitive maps for mobile robots—an object based approach[J]. Robotics and Autonomous Systems, 2007, 55(5): 359-371.

[113] Zender H, Mozos O M, Jensfelt P, et al. Conceptual spatial representations for indoor mobile robots[J]. Robotics and Autonomous Systems, 2008, 56(6): 493-502.

[116] Zhao Y, Zhu S C. Scene parsing by integrating function, geometry and appearance models[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013: 3119-3126.