MuJoCo Learning Notes


MuJoCo

The fastest way to get started with MuJoCo is to skim the components below, then work through the Colab example notebook once; that is basically enough to understand the whole pipeline. The links are below.

Resources

Tutorial: mujoco.readthedocs.io/en/latest/o…

Colab starter code: colab.research.google.com/github/goog…

Advanced Colab starter code: colab.research.google.com/github/goog…

MuJoCo is often used as a benchmark environment for reinforcement-learning algorithms in continuous action spaces. It is a collection of environments (20 sub-environments in total); commonly used ones include Ant, HalfCheetah, Hopper, Humanoid, Walker2d, and so on. The figure below shows the Hopper task.

The MuJoCo codebase is organized into subdirectories corresponding to its major functional areas:

Components

Engine

The simulator (or physics engine) is written in C. It is responsible for all runtime computation.

Parser

The XML parser is written in C++. It can parse MJCF models and URDF models, converting them into an internal mjCModel C++ object that is not directly exposed to the user.

Compiler

The compiler is written in C++. It takes the mjCModel C++ object constructed by the parser and converts it into the mjModel C struct used at runtime.

Abstract visualizer

The abstract visualizer is written in C. It generates a list of abstract geometric entities representing the simulation state, with all the information needed for actual rendering. It also provides abstract mouse hooks for camera and perturbation control.

The difference between 3D visualization and 3D rendering is clear: a 3D visualization carries more information than a 3D rendering, including (but not limited to) the shape, size, volume, and weight of parts. Common file formats include STL, DWG, OBJ, and 3DS.

Beyond that, the two use different specialized tools: 3D visualization software includes SketchUp, AutoCAD, SolidWorks, Blender, and Rhino3D, while 3D rendering tools include V-Ray, Blender, 3Delight, and RenderMan. Visualization happens first, then rendering.


OpenGL renderer

The renderer is written in C, based on fixed-function OpenGL. It does not have all the features of a state-of-the-art rendering engine (one can be substituted if needed), but it nevertheless provides efficient and informative 3D rendering.

The renderer is the core part of a 3D engine; it performs the task of drawing 3D objects to the screen. Renderers come in two kinds: hardware renderers and software renderers.

Rendering is the process of converting a 3D model or scene into a 2D image. It is a central concept in computer graphics: during rendering, the computer simulates lighting, shadows, materials, textures, and other visual effects and applies them to the scene or model to produce realistic images or animation.

UI framework

The UI framework (new in MuJoCo 2.0) is written in C, and UI elements are rendered with OpenGL. It has its own event mechanism and abstract hooks for keyboard and mouse input. The code samples use it together with GLFW, but it can also be used with other windowing libraries.

Model files

Robots are described by XML model files.

mujoco.readthedocs.io/en/latest/X…

xml

compiler: This element is used to set options for the built-in parser and compiler

visual:

option

asset

worldbody

tendon

Grouping element for tendon definitions. The attributes of fixed tendons are a subset of the attributes of spatial tendons, thus we document them only once under spatial tendons. Tendons can be used to impose length limits, simulate spring, damping and dry friction forces, as well as attach actuators to them.

sensor

actuator

position: This element creates a position servo

general: This element creates a general actuator, providing full access to all actuator components and allowing the user to specify them independently

general/gear: This attribute scales the length (and consequently moment arms, velocity and force) of the actuator

body

geom: This element creates a geom, and attaches it rigidly to the body within which the geom is defined. Multiple geoms can be attached to the same body. At runtime they determine the appearance and collision properties of the body

site: This element creates a site, which is a simplified and restricted kind of geom. A small subset of the geom attributes are available here

Example usage

MjModel

MuJoCo's mjModel, contains the model description, i.e., all quantities which do not change over time. The complete description of mjModel can be found at the end of the header file mjmodel.h. Note that the header files contain short, useful inline comments, describing each field.

Examples of quantities that can be found in mjModel are ngeom, the number of geoms in the scene and geom_rgba, their respective colors:

model = mujoco.MjModel.from_xml_string(xml) 

The from_xml_string() method invokes the model compiler, which creates a binary mjModel instance. There are also from_xml_path() and from_binary_path().

model.geom('body_name').id

Note that the 0th body is always the world. It cannot be renamed.

MjData

mjData contains the state and quantities that depend on it. The state is made up of time, generalized positions and generalized velocities. These are respectively data.time, data.qpos and data.qvel. In order to make a new mjData, all we need is our mjModel

data = mujoco.MjData(model)

data.geom_xpos is all zeros at first; only after forward kinematics has been run are the positions and orientations of each part correct:

mujoco.mj_kinematics(model, data)

Wait, why are both of our geoms at the origin? Didn't we offset the green sphere? The answer is that derived quantities in mjData need to be explicitly propagated (see below). In our case, the minimal required function is mj_kinematics, which computes global Cartesian poses for all objects (excluding cameras and lights).


Render

In order to render we'll need to instantiate a Renderer object and call its render method.

renderer = mujoco.Renderer(model)
media.show_image(renderer.render())
mujoco.mj_forward(model, data)
renderer.update_scene(data)
media.show_image(renderer.render())

mmm, why the black pixels?

Answer: For the same reason as above, we first need to propagate the values in mjData. This time we'll call mj_forward, which invokes the entire pipeline up to the computation of accelerations i.e., it computes x˙=f(x), where x is the state. This function does more than we actually need, but unless we care about saving computation time, it's good practice to call mj_forward since then we know we are not missing anything.

We also need to update the mjvScene which is an object held by the renderer describing the visual scene. We'll later see that the scene can include visual objects which are not part of the physical model

Simulation

model.opt.gravity = (0, 0, 10)  # inverted gravity

Or define it in the XML file:

<mujoco>
  <option gravity="0 0 10"/>
  ...
</mujoco>

MuJoCo uses a representation known as the "Lagrangian", "generalized" or "additive" representation, whereby objects have no degrees of freedom unless explicitly added using joints

Set the state and control variables to make the robot move:

mujoco.mj_resetData(model, data)
data.joint('root').qvel = 10           # set the root joint's velocity to 10
data.ctrl = np.random.randn(model.nu)  # random control signals for the actuators

LQR

The core of the LQR algorithm is a linear system plus a quadratic cost function.

The linear system is x_{t+h} = A x_t + B u_t.

Since most systems are clearly not linear, linearization is needed: for dynamics y = f(x, u), take A = ∂f/∂x and B = ∂f/∂u at a reference point.

The cost function is J(x, u) = x^T Q x + u^T R u. The sizes of Q and R determine how strongly each quantity is penalized: if the Q entry corresponding to some state is large, that state is expected to stay small. Q is positive semi-definite and R is positive definite.

The total cost is V(x_0) = Σ_t J(x_t, π(x_t)).

The resulting π(x) is a sequence of control magnitudes u. Concretely, the optimal solution is

π*(x) = argmin_π V^π(x) = -Fx

F = (R + B^T P B)^{-1} B^T P A

where P is the solution of the discrete-time algebraic Riccati equation:

P = A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A + Q
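As a numeric sanity check, the gain can be computed for a discrete double integrator (an illustrative system, not from the tutorial):

```python
import numpy as np
import scipy.linalg

# Discrete-time double integrator: state (position, velocity), force input.
h = 0.1
A = np.array([[1.0, h], [0.0, 1.0]])
B = np.array([[0.5 * h * h], [h]])
Q = np.eye(2)  # penalize state deviation
R = np.eye(1)  # penalize control effort

# Solve the discrete algebraic Riccati equation for P, then form F.
P = scipy.linalg.solve_discrete_are(A, B, Q, R)
F = np.linalg.inv(R + B.T @ P @ B) @ B.T @ P @ A

# The closed-loop dynamics x_{t+h} = (A - B F) x must be stable:
eigvals = np.linalg.eigvals(A - B @ F)
print(np.abs(eigvals))  # all magnitudes < 1
```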

Of course, this is only a simplified version of LQR and skips the mathematical details, but it is enough for learning MuJoCo; keep the goal in mind and don't get sidetracked.

The initial state alone does not control the robot's motion. Using LQR as an example, here is how to do state feedback control.

As mentioned above, the system must be linearized, which requires a reference point (setpoint). The state reference x is easy to pick: just use the initial state. But how do we determine the reference control u? This is where inverse dynamics comes in.

First call forward to compute the robot's desired state; since we want the initial state to be stationary, the acceleration should be zero, so set data.qacc to 0:

mujoco.mj_resetDataKeyframe(model, data, 1)
mujoco.mj_forward(model, data)
data.qacc = 0  # Assert that there is no acceleration.
mujoco.mj_inverse(model, data)

There is a problem, though: the generalized force this produces on the third degree of freedom (the vertical component of the free joint) is large. We want the stationary pose to be held by the actuated joints rather than by force on the unactuated root, so we give the robot a small initial vertical offset best_offset:

mujoco.mj_resetDataKeyframe(model, data, 1)
mujoco.mj_forward(model, data)
data.qacc = 0
data.qpos[2] += best_offset
qpos0 = data.qpos.copy()  # Save the position setpoint.
mujoco.mj_inverse(model, data)
qfrc0 = data.qfrc_inverse.copy()

What we have computed are generalized forces, but what we can actually set are the actuators, so the forces must be converted into actuator quantities: multiply the force by the pseudoinverse of the moment matrix, then assign the result to data:

ctrl0 = np.atleast_2d(qfrc0) @ np.linalg.pinv(data.actuator_moment)
ctrl0 = ctrl0.flatten()  # Save the ctrl setpoint.
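The pseudoinverse step can be illustrated with plain NumPy. The moment matrix below is made up; in MuJoCo, unit-gain actuator forces map to generalized forces via the transpose of actuator_moment:

```python
import numpy as np

# Illustrative actuator moment matrix: nu=2 actuators, nv=3 DoFs.
actuator_moment = np.array([[1.0, 0.0, 0.5],
                            [0.0, 1.0, 0.2]])
ctrl_true = np.array([0.3, -0.7])
qfrc = ctrl_true @ actuator_moment  # generalized forces these controls produce

# Recover the controls from the forces, as in the snippet above:
ctrl = np.atleast_2d(qfrc) @ np.linalg.pinv(actuator_moment)
print(ctrl.flatten())  # recovers [0.3, -0.7]
```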

Next comes the classic step of choosing the LQR parameters, mainly Q and R. Some guidelines used when choosing them:

  • The free joint will get a coefficient of 0, as that is already taken care of by the CoM cost term.
  • The joints required for balancing on the left leg, i.e. the left leg joints and the horizontal abdominal joints, should stay quite close to their initial values.
  • All the other joints should have a smaller coefficient, so that the humanoid will, for example, be able to flail its arms in order to balance.

Once they are chosen, a library routine solves for P and K:

P = scipy.linalg.solve_discrete_are(A, B, Q, R)

Compute the feedback gain matrix K.

K = np.linalg.inv(R + B.T @ P @ B) @ B.T @ P @ A

LQR control law: data.ctrl = ctrl0 - K @ dx
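Here dx is the deviation of the current state (qpos, qvel) from the setpoint (qpos0 and zero velocity). The control law itself can be sketched with a made-up linear system standing in for the linearized humanoid:

```python
import numpy as np
import scipy.linalg

# Made-up discrete linear system (slightly unstable), h is the timestep.
h = 0.01
A = np.array([[1.0, h], [0.2 * h, 1.0]])
B = np.array([[0.0], [h]])
Q, R = np.eye(2), np.eye(1)

P = scipy.linalg.solve_discrete_are(A, B, Q, R)
K = np.linalg.inv(R + B.T @ P @ B) @ B.T @ P @ A

x0 = np.array([0.0, 0.0])  # setpoint (plays the role of qpos0 / zero qvel)
ctrl0 = np.zeros(1)        # feedforward control at the setpoint
x = np.array([0.5, 0.0])   # start away from the setpoint

for _ in range(2000):
    dx = x - x0        # deviation from the setpoint
    u = ctrl0 - K @ dx  # the LQR control law
    x = A @ x + B @ u

print(np.linalg.norm(x - x0))  # driven close to the setpoint
```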