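Introduction
In the previous post, "Building a LeRobot Robotic Arm from Scratch, Step by Step", we assembled and calibrated the arm and collected data.
I suspect many readers (Mac users especially) arrived at step 8 full of enthusiasm, only to have reality throw cold water on them: local training.
- Brute-force it on the CPU? A single training run takes forever, and the enthusiasm runs out first.
- Buy an RTX 4090? At well over ten thousand yuan, that is a daunting price for a hobby.
The seemingly perfect answer is to rent a cloud server with an NVIDIA RTX 4090: a few yuan per hour, fast, and affordable.
In practice, though, cloud training has its own barrier to entry. Between configuring the Linux environment, installing dependencies, uploading the dataset, and downloading the model, the wall of console commands makes it easy to give up.
Recently I found that the 算力自由 (GPUFree) platform removes many of these pain points: it ships an official image with lerobot plus act/pi0.5/smolvla already installed! That means the most hair-pulling step, environment setup, is handled by the platform.
In this tutorial I will walk you through using GPUFree's prebuilt environment together with about 500 lines of web code I wrote, so that for just a few yuan you can get LeRobot remote training done with ease!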
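Prerequisites
Before you start, make sure you have completed the "data collection" step from the previous tutorial and have a dataset folder on your local machine (for example ~/.cache/huggingface/lerobot/mytest/so100_test).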
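Note: on the cloud instance, ~/.cache/ sits on the system disk, which is 30 GB and cannot be expanded. The data disk is 50 GB and can be expanded, so it is better to keep data there.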
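To see how much room each disk actually has before deciding where the data should live, a quick check like the one below may help. It is a minimal sketch using only the standard library. The data-disk mount point `/root/data` is an assumption based on the paths that appear later in this tutorial, and `free_gib` and `pick_roomiest` are hypothetical helper names.

```python
import os
import shutil


def free_gib(path: str) -> float:
    """Return the free space, in GiB, of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / 1024 ** 3


def pick_roomiest(paths):
    """Return the existing path with the most free space, or None if none exist."""
    existing = [p for p in paths if os.path.exists(p)]
    if not existing:
        return None
    return max(existing, key=free_gib)
```

On the instance you could call `pick_roomiest([os.path.expanduser("~/.cache"), "/root/data"])` and print `free_gib(...)` for each candidate before uploading anything large.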
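Remote Training
1. Rent a cloud GPU
Pick any cloud GPU service that works out of the box, such as 算力自由 (gpufree.cn) or another provider. This walkthrough uses 算力自由 as the example.
- Choose a GPU: filter the marketplace for RTX 4090.
- Configure the image:
○ Operating system: Ubuntu 20.04 or 22.04 both work.
○ Python version: preferably pick an image with Miniconda or PyTorch preinstalled (this saves the time of installing conda).
- Create the instance: click 立即租赁 (Rent Now). Topping up 10 yuan first is a good idea; that is enough for several training runs.
- Start the instance: the platform offers two start modes, with a GPU and without a GPU. Start with the GPU for training and without it at all other times.
2. Command-line workflow (more tedious)
You can follow this document; it really is a lot simpler than on other compute platforms.
3. Training with the web UI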
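The lerobot image provided by the platform comes with the full environment preinstalled, which removes the tedious setup and lets you focus on uploading data, starting training, and downloading the model.
In practice, however, dataset upload, long training runs, and model download all take a while, and repeatedly logging in to the server to run commands gets tiresome. Worse, if you forget to shut the server down after training, billing continues, and you may wake up to an empty account balance.
So I vibe-coded a simple web tool with Cursor that lets me do every key step from a single page, which makes the whole workflow far more efficient and pleasant.
Web tool highlights:
- One-click upload: select a local dataset folder and upload it straight to the designated location on the server.
- Training configuration and launch: choose an uploaded dataset directory, pick a training algorithm (ACT, Pi0.5, SmolVLA, and so on), and start training.
- Automatic shutdown after training: the server instance can shut itself down when training finishes, which avoids idle billing and saves money.
- Easy model download: once training is done, pick the generated model file on the page, download it over an HTTP link, and unpack it into your model directory with a single command for quick deployment.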
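3.1 Start the web site
Open JupyterLab from the control panel, create a file named remote_train.ipynb in the root directory, paste in the code below, and click Run. Alternatively, SSH into the server, create a .py file, and run that; it is a matter of preference. The program listens on port 7001 because the platform maps port 7001 to a publicly reachable address. Note that this address changes every time the container is restarted.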
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
LeRobot remote training platform.
Supports dataset upload, model training, and model download.
"""
import os
import time
import subprocess
import threading
import shutil
import re
from flask import Flask, request, jsonify, send_file, render_template_string
from werkzeug.utils import secure_filename

app = Flask(__name__)

# Training process bookkeeping, keyed by task_id
training_processes = {}
training_logs = {}
training_configs = {}

# Directory configuration (relative to the working directory)
UPLOAD_FOLDER = './upload_temp'
DOWNLOAD_FOLDER = './output'
DOWNLOAD_TEMP_FOLDER = './download_temp'
MAX_CONTENT_LENGTH = 10 * 1024 * 1024 * 1024  # 10 GiB upload limit
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
app.config['DOWNLOAD_FOLDER'] = DOWNLOAD_FOLDER
app.config['MAX_CONTENT_LENGTH'] = MAX_CONTENT_LENGTH


def ensure_directories():
    """Make sure the working directories exist."""
    for folder in [UPLOAD_FOLDER, DOWNLOAD_FOLDER, DOWNLOAD_TEMP_FOLDER]:
        os.makedirs(folder, exist_ok=True)

@app.route('/list_datasets', methods=['GET'])
def list_datasets():
    """List uploaded datasets, newest first."""
    ensure_directories()
    try:
        datasets = []
        abs_path = os.path.abspath(UPLOAD_FOLDER)
        if os.path.exists(abs_path):
            for name in os.listdir(abs_path):
                folder = os.path.join(abs_path, name)
                if os.path.isdir(folder):
                    # A dataset counts as valid when meta/, data/ and videos/ are all present
                    is_valid = all(os.path.isdir(os.path.join(folder, d)) for d in ['meta', 'data', 'videos'])
                    datasets.append({
                        'name': name, 'path': folder, 'is_valid': is_valid,
                        'modified_time': time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(os.path.getmtime(folder))),
                        'timestamp': os.path.getmtime(folder)
                    })
        datasets.sort(key=lambda x: x['timestamp'], reverse=True)
        return jsonify({'success': True, 'datasets': datasets, 'total': len(datasets)})
    except Exception as e:
        return jsonify({'error': str(e)}), 500

def extract_output_dir(command):
    """Pull the --output_dir value out of a training command, or return None."""
    match = re.search(r'--output_dir[=\s]+([^\s\\]+)', command)
    return match.group(1) if match else None


def pack_model(output_dir, task_id):
    """Zip the training output into DOWNLOAD_TEMP_FOLDER; return (zip_name, error)."""
    ensure_directories()
    if not output_dir or not os.path.exists(output_dir):
        return None, f"Directory not found: {output_dir}"
    try:
        folder_name = os.path.basename(output_dir.rstrip('/'))
        zip_name = f"{folder_name}_{time.strftime('%Y%m%d_%H%M%S')}"
        zip_path = os.path.join(DOWNLOAD_TEMP_FOLDER, zip_name)
        shutil.make_archive(zip_path, 'zip', output_dir)
        return f"{zip_name}.zip", None
    except Exception as e:
        return None, str(e)

@app.route('/start_training', methods=['POST'])
def start_training():
    """Start a training run and stream its output into training_logs."""
    try:
        data = request.get_json()
        if not data or 'command' not in data:
            return jsonify({'error': 'Missing training command'}), 400
        command = data['command']
        task_id = data.get('task_id', f"train_{int(time.time())}")
        shutdown_after = data.get('shutdown_after', False)
        output_dir = extract_output_dir(command)
        # If a task with this id is still running, stop it first
        if task_id in training_processes and training_processes[task_id].poll() is None:
            training_processes[task_id].terminate()
        # 'dropped' counts log lines trimmed from the front so that log indices stay absolute
        training_configs[task_id] = {'output_dir': output_dir, 'shutdown_after': shutdown_after, 'dropped': 0}
        training_logs[task_id] = []
        process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                                   text=True, bufsize=1, env={**os.environ, 'PYTHONUNBUFFERED': '1'})
        training_processes[task_id] = process

        def log(message):
            training_logs[task_id].append({'time': time.strftime('%H:%M:%S'), 'message': message})

        def collect_logs():
            try:
                for line in iter(process.stdout.readline, ''):
                    if line:
                        log(line.rstrip())
                        excess = len(training_logs[task_id]) - 1000
                        if excess > 0:
                            # Keep only the latest 1000 lines and remember how many were dropped
                            del training_logs[task_id][:excess]
                            training_configs[task_id]['dropped'] += excess
                process.wait()
                config = training_configs.get(task_id, {})
                if process.returncode == 0 and config.get('output_dir'):
                    log('[system] Packaging model...')
                    zip_name, err = pack_model(config['output_dir'], task_id)
                    log(f'[system] Packaging done: {zip_name}' if zip_name else f'[system] Packaging failed: {err}')
                if config.get('shutdown_after'):
                    log('[system] Shutting down shortly...')
                    time.sleep(3)
                    os.system('shutdown -h now')
            except Exception as e:
                log(f'[error] {e}')

        threading.Thread(target=collect_logs, daemon=True).start()
        return jsonify({'success': True, 'task_id': task_id, 'pid': process.pid})
    except Exception as e:
        return jsonify({'error': str(e)}), 500


@app.route('/training_status', methods=['GET'])
def training_status():
    """Return the run state plus any log lines after `last_index`."""
    task_id = request.args.get('task_id')
    if not task_id:
        return jsonify({'error': 'Missing task_id'}), 400
    try:
        last_index = int(request.args.get('last_index', 0))
    except ValueError:
        return jsonify({'error': 'last_index must be an integer'}), 400
    is_running = task_id in training_processes and training_processes[task_id].poll() is None
    exit_code = None if is_running else (training_processes[task_id].returncode if task_id in training_processes else None)
    logs = training_logs.get(task_id, [])
    dropped = training_configs.get(task_id, {}).get('dropped', 0)
    # Indices count from the first line ever logged, so trimming does not stall the client
    start = max(last_index - dropped, 0)
    return jsonify({
        'success': True, 'task_id': task_id, 'is_running': is_running,
        'exit_code': exit_code, 'logs': logs[start:], 'last_index': dropped + len(logs)
    })

@app.route('/stop_training', methods=['POST'])
def stop_training():
    """Stop a running training task."""
    try:
        task_id = request.get_json().get('task_id')
        if not task_id or task_id not in training_processes:
            return jsonify({'error': 'Task not found'}), 404
        process = training_processes[task_id]
        if process.poll() is None:
            process.terminate()
            try:
                process.wait(timeout=5)
            except subprocess.TimeoutExpired:
                # The process ignored SIGTERM, so force it to exit
                process.kill()
            training_logs[task_id].append({'time': time.strftime('%H:%M:%S'), 'message': '[system] Training stopped'})
        return jsonify({'success': True})
    except Exception as e:
        return jsonify({'error': str(e)}), 500

@app.route('/list_models', methods=['GET'])
def list_models():
    """List packaged model archives, newest first."""
    ensure_directories()
    try:
        models = []
        abs_path = os.path.abspath(DOWNLOAD_TEMP_FOLDER)
        if os.path.exists(abs_path):
            for name in os.listdir(abs_path):
                fp = os.path.join(abs_path, name)
                if os.path.isfile(fp) and name.endswith('.zip'):
                    size = os.path.getsize(fp)
                    models.append({
                        'name': name, 'size_mb': f"{size/(1024*1024):.2f} MB",
                        'modified_time': time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(os.path.getmtime(fp))),
                        'timestamp': os.path.getmtime(fp)
                    })
        models.sort(key=lambda x: x['timestamp'], reverse=True)
        return jsonify({'success': True, 'models': models})
    except Exception as e:
        return jsonify({'error': str(e)}), 500

@app.route('/download_model', methods=['GET'])
def download_model():
    """Send a packaged model archive to the browser."""
    filename = request.args.get('filename')
    if not filename:
        return jsonify({'error': 'Missing filename'}), 400
    # secure_filename strips path separators, so only files directly in the folder are reachable
    fp = os.path.join(DOWNLOAD_TEMP_FOLDER, secure_filename(filename))
    if not os.path.exists(fp):
        return jsonify({'error': 'File not found'}), 404
    return send_file(fp, as_attachment=True, download_name=filename, mimetype='application/zip')

@app.route('/upload_folder', methods=['POST'])
def upload_folder():
    """Receive a folder upload, preserving each file's relative path."""
    ensure_directories()
    if 'files' not in request.files:
        return jsonify({'error': 'No files in request'}), 400
    files = request.files.getlist('files')
    paths = request.form.getlist('paths')
    if not files:
        return jsonify({'error': 'No files selected'}), 400
    # Rename the top-level folder with a timestamp so repeated uploads do not collide
    original_root = paths[0].split('/')[0] if paths and '/' in paths[0] else ''
    stamp = time.strftime('%Y%m%d_%H%M%S')
    root_folder = f"{original_root}_{stamp}" if original_root else f"upload_{stamp}"
    upload_root = os.path.abspath(UPLOAD_FOLDER)
    total_size, uploaded, failed = 0, [], 0
    start = time.time()
    for i, file in enumerate(files):
        if file.filename == '':
            continue
        try:
            rel_path = paths[i] if i < len(paths) else file.filename
            if original_root and rel_path.startswith(original_root + '/'):
                rel_path = root_folder + rel_path[len(original_root):]
            target = os.path.abspath(os.path.join(upload_root, rel_path))
            # Refuse paths that would escape the upload directory (for example via "..")
            if not target.startswith(upload_root + os.sep):
                failed += 1
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            file.save(target)
            size = os.path.getsize(target)
            total_size += size
            uploaded.append({'path': rel_path, 'size': size})
        except Exception:
            # Count the failure instead of silently ignoring it
            failed += 1
    duration = max(time.time() - start, 0.001)
    size_mb = total_size / (1024 * 1024)
    return jsonify({
        'success': True, 'root_folder': root_folder,
        'upload_path': os.path.abspath(os.path.join(UPLOAD_FOLDER, root_folder)),
        'total_files': len(uploaded), 'failed_files': failed, 'total_size': f"{size_mb:.2f} MB",
        'speed': f"{size_mb/duration:.2f} MB/s", 'duration': f"{duration:.2f}s"
    })

# HTML template (Tailwind CSS)
UPLOAD_PAGE_HTML = '''
<!DOCTYPE html>
<html lang="en" class="dark">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>LeRobot Training Platform</title>
<script src="https://cdn.tailwindcss.com"></script>
<script>tailwind.config={darkMode:'class',theme:{extend:{colors:{d1:'#0d1117',d2:'#161b22',d3:'#21262d',border:'#30363d'}}}}</script>
<style>body{font-family:ui-monospace,monospace}input[type=file]{display:none}::-webkit-scrollbar{width:6px}::-webkit-scrollbar-thumb{background:#30363d;border-radius:3px}</style>
</head>
<body class="bg-d1 text-gray-300 min-h-screen p-5">
<div class="grid grid-cols-1 lg:grid-cols-2 xl:grid-cols-3 gap-5 max-w-[1800px] mx-auto">
<!-- Upload panel -->
<div class="bg-d2 border border-border rounded-xl p-6 shadow-2xl">
<h1 class="text-2xl font-bold mb-2 bg-gradient-to-r from-blue-400 to-purple-500 bg-clip-text text-transparent">📁 Dataset Upload</h1>
<p class="text-gray-500 text-sm mb-5">Select a local training-data folder to upload</p>
<div id="uploadZone" class="border-2 border-dashed border-border rounded-lg p-8 text-center cursor-pointer bg-d3 hover:border-blue-500 hover:bg-blue-500/5 transition-all mb-4">
<div class="text-4xl mb-3">📂</div>
<div class="text-lg mb-1">Click to select a folder</div>
<div class="text-gray-500 text-sm">Must contain meta/, data/, videos/</div>
</div>
<div class="flex items-center gap-2 bg-yellow-500/10 border border-yellow-500 rounded-lg p-3 mb-4 text-sm">
<span>💡</span><span>Press <kbd class="bg-d3 px-2 py-0.5 rounded text-blue-400">Ctrl+H</kbd> to show hidden folders</span>
</div>
<input type="file" id="folderInput" webkitdirectory directory multiple />
<div id="selectedInfo" class="hidden bg-d3 border border-border rounded-lg p-4 mb-4">
<h3 class="text-green-400 text-sm mb-2">✓ Files selected</h3>
<div id="fileList" class="max-h-48 overflow-y-auto text-sm text-gray-500"></div>
</div>
<div id="progressContainer" class="hidden my-4">
<div class="h-2 bg-d3 rounded overflow-hidden"><div id="progressFill" class="h-full bg-gradient-to-r from-blue-500 to-purple-500 w-0 transition-all"></div></div>
<div class="flex justify-between text-xs text-gray-500 mt-2"><span id="progressPercent">0%</span><span id="progressDetail">Preparing...</span></div>
</div>
<div class="flex gap-3 flex-wrap">
<button id="uploadBtn" disabled class="px-5 py-2.5 rounded-lg font-medium bg-gradient-to-r from-blue-500 to-purple-500 text-white disabled:opacity-50 disabled:cursor-not-allowed hover:shadow-lg hover:shadow-blue-500/30 transition-all">🚀 Start Upload</button>
<button id="clearBtn" class="hidden px-4 py-2 rounded-lg bg-d3 border border-border hover:bg-border transition-all">🗑️ Clear</button>
</div>
<div id="result" class="hidden mt-4 p-4 rounded-lg"></div>
<div class="mt-6 pt-4 border-t border-border text-sm text-gray-500">
<p>📦 Local data: <code class="bg-d3 px-1.5 rounded text-yellow-500">~/.cache/huggingface/lerobot/</code></p>
</div>
</div>
<!-- Training panel -->
<div class="bg-d2 border border-border rounded-xl p-6 shadow-2xl">
<h1 class="text-2xl font-bold mb-2 bg-gradient-to-r from-green-400 to-blue-500 bg-clip-text text-transparent">🚀 Model Training</h1>
<p class="text-gray-500 text-sm mb-5">Pick a dataset and an algorithm to start training</p>
<div class="mb-4">
<label class="block text-sm text-gray-500 mb-1.5">Dataset</label>
<div class="flex gap-2">
<select id="datasetSelect" class="flex-1 bg-d3 border border-border rounded-lg px-3 py-2.5 text-sm focus:border-blue-500 outline-none"><option>Loading...</option></select>
<button id="refreshDatasets" class="px-3 py-2 bg-d3 border border-border rounded-lg hover:bg-border">🔄</button>
</div>
</div>
<div class="mb-4">
<label class="block text-sm text-gray-500 mb-1.5">Algorithm</label>
<select id="algorithmSelect" class="w-full bg-d3 border border-border rounded-lg px-3 py-2.5 text-sm focus:border-blue-500 outline-none">
<option value="act">ACT</option><option value="diffusion">Diffusion</option><option value="smolvla">SmolVLA</option>
<option value="pi05">Pi0.5 (single GPU)</option><option value="pi05_multi">Pi0.5 (multi-GPU)</option>
</select>
</div>
<div class="bg-d1 border border-border rounded-lg p-3 mb-4">
<label class="block text-xs text-gray-500 mb-2">Training command (editable)</label>
<textarea id="commandTextarea" class="w-full h-48 bg-d3 border border-border rounded-lg p-3 text-xs resize-y focus:border-blue-500 outline-none" placeholder="Generated automatically once a dataset and an algorithm are selected..."></textarea>
</div>
<div class="flex items-center gap-3 flex-wrap">
<button id="startTrainingBtn" disabled class="px-5 py-2.5 rounded-lg font-medium bg-gradient-to-r from-blue-500 to-purple-500 text-white disabled:opacity-50 disabled:cursor-not-allowed">▶️ Start Training</button>
<button id="stopTrainingBtn" class="hidden px-4 py-2 rounded-lg bg-red-500 text-white hover:bg-red-600">⏹️ Stop</button>
<label class="ml-auto flex items-center gap-1.5 text-sm text-yellow-500 cursor-pointer"><input type="checkbox" id="shutdownAfter" class="w-4 h-4"><span>⚡ Shut down after training</span></label>
</div>
<div class="bg-d1 border border-border rounded-lg mt-4">
<div class="flex justify-between items-center px-3 py-2 border-b border-border">
<span class="text-sm">📋 Training Log</span>
<span id="trainingStatus" class="text-xs px-2.5 py-1 rounded-full bg-d3 text-gray-500">Idle</span>
</div>
<div id="logContent" class="h-72 overflow-y-auto p-3 text-xs font-mono"><div class="text-gray-500">Waiting for training to start...</div></div>
</div>
</div>
<!-- Download panel -->
<div class="bg-d2 border border-border rounded-xl p-6 shadow-2xl">
<h1 class="text-2xl font-bold mb-2 bg-gradient-to-r from-yellow-500 to-red-500 bg-clip-text text-transparent">📦 Model Download</h1>
<p class="text-gray-500 text-sm mb-5">Download trained models</p>
<div class="flex justify-between items-center mb-4">
<span class="text-sm text-gray-500">Available models</span>
<button id="refreshModels" class="px-3 py-1.5 text-sm bg-d3 border border-border rounded-lg hover:bg-border">🔄 Refresh</button>
</div>
<div id="modelList" class="max-h-[500px] overflow-y-auto"><div class="text-center text-gray-500 py-10">Loading...</div></div>
</div>
</div>
<script>
const $ = id => document.getElementById(id);
const uploadZone=$('uploadZone'),folderInput=$('folderInput'),selectedInfo=$('selectedInfo'),fileList=$('fileList'),
uploadBtn=$('uploadBtn'),clearBtn=$('clearBtn'),progressContainer=$('progressContainer'),progressFill=$('progressFill'),
progressPercent=$('progressPercent'),progressDetail=$('progressDetail'),result=$('result'),
datasetSelect=$('datasetSelect'),algorithmSelect=$('algorithmSelect'),commandTextarea=$('commandTextarea'),
startTrainingBtn=$('startTrainingBtn'),stopTrainingBtn=$('stopTrainingBtn'),shutdownAfter=$('shutdownAfter'),
trainingStatus=$('trainingStatus'),logContent=$('logContent'),modelList=$('modelList');
let selectedFiles=[],relativePaths=[],lastUploadedDataset=null,currentTaskId=null,logPollingInterval=null,lastLogIndex=0;
// Command template for each algorithm
const templates={
act:(n,p)=>`lerobot-train --dataset.repo_id=mylerobot --dataset.root=${p} --policy.type=act --output_dir=~/data/output/act_${n}_model --job_name=${n}_job --policy.device=cuda --wandb.enable=false --steps=1000 --batch_size=16 --save_freq=10000 --policy.push_to_hub=false`,
diffusion:(n,p)=>`lerobot-train --dataset.repo_id=mylerobot --dataset.root=${p} --policy.type=diffusion --output_dir=~/data/output/diffusion_${n}_model --job_name=${n}_job --policy.device=cuda --wandb.enable=false --steps=1000 --batch_size=16 --save_freq=10000 --policy.push_to_hub=false`,
smolvla:(n,p)=>`lerobot-train --dataset.repo_id=mylerobot --dataset.root=${p} --policy.type=smolvla --output_dir=~/data/output/smolvla_${n}_model --job_name=${n}_job --policy.device=cuda --wandb.enable=false --steps=1000 --batch_size=16 --save_freq=10000 --policy.push_to_hub=false`,
pi05:(n,p)=>`lerobot-train --dataset.repo_id=mylerobot --dataset.root=${p} --policy.type=pi05 --output_dir=~/data/output/pi05_${n}_model --job_name=${n}_job --policy.device=cuda --wandb.enable=false --steps=1000 --batch_size=16 --save_freq=10000 --policy.pretrained_path=~/data/models/pi05_base --policy.gradient_checkpointing=true --policy.dtype=bfloat16 --policy.push_to_hub=false`,
pi05_multi:(n,p)=>`accelerate launch --multi_gpu --num_processes=2 --mixed_precision=bf16 $(which lerobot-train) --dataset.repo_id=mylerobot --dataset.root=${p} --policy.type=pi05 --output_dir=~/data/output/pi05_${n}_model --job_name=${n}_job --policy.device=cuda --wandb.enable=false --steps=1000 --batch_size=16 --save_freq=10000 --policy.pretrained_path=/root/data/models/pi05_base --policy.gradient_checkpointing=true --policy.dtype=bfloat16 --policy.push_to_hub=false`
};
// Upload
uploadZone.onclick=()=>folderInput.click();
folderInput.onchange=e=>{selectedFiles=Array.from(e.target.files);relativePaths=selectedFiles.map(f=>f.webkitRelativePath);updateFileList()};
function updateFileList(){
if(!selectedFiles.length){selectedInfo.classList.add('hidden');uploadBtn.disabled=true;clearBtn.classList.add('hidden');return}
selectedInfo.classList.remove('hidden');clearBtn.classList.remove('hidden');
const size=(selectedFiles.reduce((s,f)=>s+f.size,0)/1024/1024).toFixed(2);
const hasMeta=relativePaths.some(p=>p.split('/')[1]==='meta');
const hasData=relativePaths.some(p=>p.split('/')[1]==='data');
const hasVideos=relativePaths.some(p=>p.split('/')[1]==='videos');
const valid=hasMeta&&hasData&&hasVideos;
uploadBtn.disabled=!valid;
let html=valid?'<div class="text-green-400 p-2 bg-green-500/10 rounded mb-2">✓ Directory structure looks correct</div>':
`<div class="text-red-400 p-2 bg-red-500/10 rounded mb-2">⚠ Missing: ${[!hasMeta&&'meta',!hasData&&'data',!hasVideos&&'videos'].filter(Boolean).join(', ')}</div>`;
html+=`<div class="mb-2">${selectedFiles.length} files, ${size} MB</div>`;
relativePaths.slice(0,15).forEach(p=>html+=`<div class="py-0.5 border-b border-border/50 truncate">${p}</div>`);
if(relativePaths.length>15)html+=`<div class="text-yellow-500 py-1">...and ${relativePaths.length-15} more</div>`;
fileList.innerHTML=html;
}
clearBtn.onclick=()=>{selectedFiles=[];relativePaths=[];folderInput.value='';updateFileList();result.classList.add('hidden')};
uploadBtn.onclick=async()=>{
if(!selectedFiles.length)return;
uploadBtn.disabled=true;progressContainer.classList.remove('hidden');result.classList.add('hidden');
const start=Date.now(),total=selectedFiles.reduce((s,f)=>s+f.size,0);
const form=new FormData();
selectedFiles.forEach((f,i)=>{form.append('files',f);form.append('paths',relativePaths[i])});
const xhr=new XMLHttpRequest();
xhr.upload.onprogress=e=>{if(e.lengthComputable){
const pct=Math.round(e.loaded/e.total*100);progressFill.style.width=pct+'%';progressPercent.textContent=pct+'%';
const spd=((e.loaded/1024/1024)/((Date.now()-start)/1000)).toFixed(2);
progressDetail.textContent=`${(e.loaded/1024/1024).toFixed(1)}/${(e.total/1024/1024).toFixed(1)} MB (${spd} MB/s)`;
}};
xhr.onload=()=>{
const dur=(Date.now()-start)/1000,spd=(total/1024/1024/dur).toFixed(2);
if(xhr.status===200){
const r=JSON.parse(xhr.responseText);
result.className='block mt-4 p-4 rounded-lg bg-green-500/10 border border-green-500 text-green-400';
result.innerHTML=`<b>✓ Upload complete!</b><br>Path: <code class="bg-d3 px-1 rounded text-xs">${r.upload_path}</code><br>Files: ${r.total_files} | Size: ${r.total_size} | Speed: ${spd} MB/s`;
lastUploadedDataset=r.root_folder;loadDatasets();
}else{result.className='block mt-4 p-4 rounded-lg bg-red-500/10 border border-red-500 text-red-400';result.innerHTML='Upload failed'}
uploadBtn.disabled=false;
};
xhr.open('POST','/upload_folder');xhr.send(form);
};
// Training
async function loadDatasets(){
try{
const r=await(await fetch('/list_datasets')).json();
datasetSelect.innerHTML=r.datasets.length?'':'<option>No datasets yet</option>';
r.datasets.forEach(d=>{const o=document.createElement('option');o.value=d.path;o.dataset.name=d.name;o.textContent=d.name+(d.is_valid?' ✓':' ⚠');datasetSelect.appendChild(o)});
if(lastUploadedDataset){for(let i=0;i<datasetSelect.options.length;i++)if(datasetSelect.options[i].dataset.name===lastUploadedDataset){datasetSelect.selectedIndex=i;break}lastUploadedDataset=null}
updateCommand();
}catch(e){datasetSelect.innerHTML='<option>Failed to load</option>'}
}
function updateCommand(){
const path=datasetSelect.value,name=datasetSelect.options[datasetSelect.selectedIndex]?.dataset?.name||'',algo=algorithmSelect.value;
if(!path){commandTextarea.value='';startTrainingBtn.disabled=true;return}
commandTextarea.value=templates[algo]?.(name,path)||'';startTrainingBtn.disabled=false;
}
datasetSelect.onchange=updateCommand;algorithmSelect.onchange=updateCommand;
$('refreshDatasets').onclick=loadDatasets;
startTrainingBtn.onclick=async()=>{
const cmd=commandTextarea.value.trim();if(!cmd)return;
if(shutdownAfter.checked&&!confirm('Shut the instance down after training?'))return;
startTrainingBtn.disabled=true;currentTaskId='train_'+Date.now();lastLogIndex=0;
try{
const r=await(await fetch('/start_training',{method:'POST',headers:{'Content-Type':'application/json'},body:JSON.stringify({command:cmd,task_id:currentTaskId,shutdown_after:shutdownAfter.checked})})).json();
if(r.success){
startTrainingBtn.classList.add('hidden');stopTrainingBtn.classList.remove('hidden');shutdownAfter.disabled=true;
trainingStatus.className='text-xs px-2.5 py-1 rounded-full bg-blue-500/20 text-blue-400';
trainingStatus.textContent=shutdownAfter.checked?'Training (will shut down when done)':'Training...';
logContent.innerHTML='<div>Training started...</div>';startLogPolling();
}else{alert(r.error);startTrainingBtn.disabled=false}
}catch(e){alert(e);startTrainingBtn.disabled=false}
};
stopTrainingBtn.onclick=async()=>{
if(!currentTaskId)return;
await fetch('/stop_training',{method:'POST',headers:{'Content-Type':'application/json'},body:JSON.stringify({task_id:currentTaskId})});
stopLogPolling();trainingStatus.className='text-xs px-2.5 py-1 rounded-full bg-red-500/20 text-red-400';trainingStatus.textContent='Stopped';
startTrainingBtn.classList.remove('hidden');startTrainingBtn.disabled=false;stopTrainingBtn.classList.add('hidden');shutdownAfter.disabled=false;
};
function startLogPolling(){
if(logPollingInterval)clearInterval(logPollingInterval);
logPollingInterval=setInterval(async()=>{
if(!currentTaskId)return;
try{
const r=await(await fetch(`/training_status?task_id=${currentTaskId}&last_index=${lastLogIndex}`)).json();
if(r.logs?.length){r.logs.forEach(l=>{const d=document.createElement('div');d.innerHTML=`<span class="text-blue-400">[${l.time}]</span> ${l.message.replace(/&/g,'&amp;').replace(/</g,'&lt;')}`;logContent.appendChild(d)});logContent.scrollTop=logContent.scrollHeight;lastLogIndex=r.last_index}
if(!r.is_running){
stopLogPolling();
trainingStatus.className=`text-xs px-2.5 py-1 rounded-full ${r.exit_code===0?'bg-green-500/20 text-green-400':'bg-red-500/20 text-red-400'}`;
trainingStatus.textContent=r.exit_code===0?'Done':`Exit: ${r.exit_code}`;
startTrainingBtn.classList.remove('hidden');startTrainingBtn.disabled=false;stopTrainingBtn.classList.add('hidden');shutdownAfter.disabled=false;loadModels();
}
}catch(e){}
},1000);
}
function stopLogPolling(){if(logPollingInterval){clearInterval(logPollingInterval);logPollingInterval=null}}
// Download
async function loadModels(){
try{
const r=await(await fetch('/list_models')).json();
modelList.innerHTML=r.models?.length?'':'<div class="text-center text-gray-500 py-10">📭 No models yet</div>';
r.models?.forEach(m=>{
const d=document.createElement('div');
d.className='flex justify-between items-center p-3 bg-d3 border border-border rounded-lg mb-2 hover:border-blue-500 transition-all';
d.innerHTML=`<div class="min-w-0 flex-1"><div class="truncate">📦 ${m.name}</div><div class="text-xs text-gray-500">${m.size_mb} | ${m.modified_time}</div></div><button onclick="location.href='/download_model?filename=${encodeURIComponent(m.name)}'" class="ml-3 px-3 py-1.5 bg-green-500 text-white text-sm rounded-lg hover:bg-green-600">⬇️</button>`;
modelList.appendChild(d);
});
}catch(e){modelList.innerHTML='<div class="text-center text-red-400 py-10">Failed to load</div>'}
}
$('refreshModels').onclick=loadModels;
loadDatasets();loadModels();
</script>
</body>
</html>
'''
@app.route('/', methods=['GET'])
def index():
    """Home page."""
    return render_template_string(UPLOAD_PAGE_HTML, upload_folder=UPLOAD_FOLDER, download_folder=DOWNLOAD_FOLDER)


if __name__ == '__main__':
    ensure_directories()
    print("Starting the LeRobot training platform...")
    print(f"Dataset directory: {UPLOAD_FOLDER}")
    print(f"Model directory: {DOWNLOAD_TEMP_FOLDER}")
    print("Listening on http://0.0.0.0:7001")
    # Debug mode stays off because this port is mapped to a public address
    app.run(host='0.0.0.0', port=7001, debug=False, use_reloader=False)
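Once the web service is running, go back to the control page and click "自定义服务" (Custom Service) -> port 7001 to get the current address.
Open the link in a browser; if you see the page below, everything is working.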
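If the page does not load, it can help to probe the service from a script rather than the browser. The sketch below relies only on the `/list_datasets` route defined in the server code above. The base URL is whatever address the platform shows for port 7001, and `endpoint_url` and `list_remote_datasets` are illustrative helper names, not part of any library.

```python
import json
import urllib.request


def endpoint_url(base_url: str, route: str) -> str:
    """Join the public base URL and a route name without doubling slashes."""
    return base_url.rstrip("/") + "/" + route.lstrip("/")


def list_remote_datasets(base_url: str, timeout: float = 10.0) -> dict:
    """GET <base_url>/list_datasets and return the decoded JSON body."""
    url = endpoint_url(base_url, "list_datasets")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

A reply containing `"success": true` means the Flask process is reachable; a connection error usually means the instance is stopped or the address changed after a restart.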
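3.2 Upload the collected data
Locally collected data lives under ~/.cache/huggingface/lerobot/, which is a hidden folder. On Linux, press Ctrl+H so the browser's file dialog shows hidden folders (in the macOS file dialog the shortcut is Cmd+Shift+.). Only directories that match the required dataset structure, with meta/, data/, and videos/ subfolders, can be uploaded.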
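Since the page refuses folders that lack the expected subdirectories, checking locally first can save a round trip. This small sketch mirrors the `is_valid` test in the server code above (meta/, data/, videos/); the helper name `missing_subdirs` is my own.

```python
import os

# Subfolders the upload page and the server both look for
REQUIRED_SUBDIRS = ("meta", "data", "videos")


def missing_subdirs(dataset_dir: str) -> list:
    """Return the required subfolder names that are absent from dataset_dir."""
    return [
        name for name in REQUIRED_SUBDIRS
        if not os.path.isdir(os.path.join(dataset_dir, name))
    ]
```

An empty list means the folder should pass the page's structure check; anything else names what still needs to be there.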
Click Start Upload. Upload speed is typically around 3 MB/s.
Wait a few minutes. Once the upload finishes, the files are saved under /root/data/upload_temp.
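To get a feel for how long "a few minutes" will be for your own dataset, you can divide its size by the upload speed. The sketch below is only a rough estimate; the 3 MB/s default is the figure mentioned above and will vary with your connection, and both helper names are hypothetical.

```python
import os


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all files under root, recursively."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def estimate_upload_seconds(size_bytes: int, mb_per_s: float = 3.0) -> float:
    """Estimate upload time in seconds at a given speed in MB/s (1 MB = 1024 * 1024 bytes)."""
    return size_bytes / (1024 * 1024) / mb_per_s
```

For example, a 900 MB dataset at 3 MB/s works out to 300 seconds, or about five minutes.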
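3.3 Training
Select the dataset you just uploaded (or an earlier one), pick a suitable algorithm, adjust the parameters as needed, and click Start Training.
When training finishes, the model is compressed into a zip archive and stored under /root/data/download_temp.
To avoid wasting money, you can enable automatic shutdown after training. Next time, start the container in no-GPU mode to download the model.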
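One detail is worth knowing if you edit the command by hand: the server decides what to zip by pulling `--output_dir` out of the command text with a regular expression. The snippet below repeats that function from the server code so you can see which commands it recognizes; the sample commands in the usage note are made up for illustration.

```python
import re


def extract_output_dir(command):
    """Return the value of --output_dir in a command string, or None if absent."""
    # Accepts both "--output_dir=/path" and "--output_dir /path"
    match = re.search(r'--output_dir[=\s]+([^\s\\]+)', command)
    return match.group(1) if match else None
```

With a command such as `lerobot-train --output_dir=/root/data/output/act_demo --steps=1000` the function returns `/root/data/output/act_demo`. If the flag is deleted from the command it returns `None`, and the server skips the packaging step, so nothing new will show up in the download panel.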
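3.4 Download the model
The rightmost panel of the page lists the zipped model files; pick a model and download it to your machine.
Finally, unzip the model into your local output directory and point the test script at the extracted directory, and it should run normally.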
lerobot-record \
--robot.type=so100_follower \
--robot.port=/dev/ttyACM1 \
--robot.cameras="{ up: {type: opencv, index_or_path: /dev/video10, width: 640, height: 480, fps: 30}, side: {type: intelrealsense, serial_number_or_name: 233522074606, width: 640, height: 480, fps: 30}}" \
--robot.id=my_awesome_follower_arm \
--display_data=false \
--dataset.repo_id=${HF_USER}/eval_so100 \
--dataset.single_task="Put lego brick into the transparent box" \
--policy.path={your_extracted_directory}
# Teleop is optional. To teleoperate between episodes, also pass these flags:
# --teleop.type=so100_leader --teleop.port=/dev/ttyACM0 --teleop.id=my_awesome_leader_arm
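Before running the command above, the downloaded archive has to be unpacked. If you prefer to stay in Python rather than use an unzip command, a sketch like this does the job with the standard library; `extract_model` is a hypothetical helper and the paths are placeholders. Note that the server zips the contents of the output directory without a top-level folder, so it is worth extracting into a dedicated directory.

```python
import os
import zipfile


def extract_model(zip_path: str, dest_dir: str) -> str:
    """Unpack a downloaded model archive into dest_dir and return its absolute path."""
    dest_dir = os.path.abspath(os.path.expanduser(dest_dir))
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    return dest_dir
```

Where exactly the weights sit inside the archive depends on the LeRobot version (often a `pretrained_model` folder somewhere under `checkpoints/`), so inspect the extracted tree before setting `--policy.path`.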
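Summary
After tinkering for most of a day, from environment setup and data transfer to model training, the arm's training is finally running in the cloud! Sitting at the computer with a coffee and watching the training log scroll by feels wonderful: the training that was crawling along locally a few hours ago is now running flat out on a 4090.
What strikes me most is that the whole web tool took only a few hours from idea to implementation. I started out thinking I would have to write a complex server management tool and ended up with about 500 lines of code covering upload, training, and download. AI-assisted programming really has lowered the barrier to development to a hard-to-believe degree.
Rent a cloud GPU, click a few times, and the instance shuts itself down when training finishes. The whole process is so smooth that it hardly feels like a traditional deep learning project.
A few yuan buys back time and energy, plus a model that actually runs. That feeling is probably the freedom that technology brings.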