OpenAI released the Procgen Benchmark


When machine learning models are trained, there's always a risk of overfitting, that is, fitting a particular set of data too closely. In fact, popular machine learning benchmarks like the Arcade Learning Environment may well encourage overfitting, because they place little emphasis on generalization.

That's why OpenAI, the San Francisco-based research firm cofounded by CTO Greg Brockman, chief scientist Ilya Sutskever, and others, today released the Procgen Benchmark. It is a set of 16 procedurally generated environments (CoinRun, StarPilot, CaveFlyer, Dodgeball, FruitBot, Chaser, Miner, Jumper, Leaper, Maze, BigFish, Heist, Climber, Plunder, Ninja, and BossFight) that measure how quickly a model learns generalizable skills. It builds on the startup's CoinRun toolset, which used procedural generation to construct separate sets of training and test levels.
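To make the idea of procedurally generated training and test levels concrete, here is a minimal toy sketch in Python. It is not Procgen's code: the `Level` type, `generate_level`, and `split_seeds` are hypothetical names, and the "level" is just a random platform layout. It shows the core trick: each level is fully determined by an integer seed, so training and test sets can be kept disjoint simply by drawing them from non-overlapping seed ranges.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """Hypothetical toy level: a width plus (x, height) platform positions."""
    seed: int
    width: int
    platforms: tuple


def generate_level(seed: int) -> Level:
    # A private RNG seeded per level makes generation deterministic:
    # the same seed always produces the same layout.
    rng = random.Random(seed)
    width = rng.randint(10, 30)
    step = rng.randint(2, 4)
    platforms = tuple((x, rng.randint(0, 5)) for x in range(0, width, step))
    return Level(seed, width, platforms)


def split_seeds(num_train: int, num_test: int, start: int = 0):
    # Training levels use seeds [start, start + num_train); test levels use
    # the next num_test seeds, so the two sets never overlap.
    train = list(range(start, start + num_train))
    test = list(range(start + num_train, start + num_train + num_test))
    return train, test


train_seeds, test_seeds = split_seeds(num_train=200, num_test=50)
train_levels = [generate_level(s) for s in train_seeds]
test_levels = [generate_level(s) for s in test_seeds]
```

An agent trained only on `train_levels` and then evaluated on `test_levels` faces layouts it has never seen, so a high test score reflects generalization rather than memorization. The real `procgen` package exposes comparable knobs (such as `num_levels` and `start_level`) for choosing which level seeds an environment draws from.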

“We want the best of both worlds: a benchmark comprised of many diverse environments, each of which fundamentally requires generalization,” wrote OpenAI in a blog post. “To fulfill this need, we have created Procgen Benchmark … [which strives] for all of the following: experimental convenience, high diversity within environments, and high diversity across environments … CoinRun now serves as the inaugural environment in Procgen Benchmark, contributing its diversity to a greater whole.”

According to OpenAI, Procgen environments were designed with a great deal of creative freedom (subject to basic design constraints) so as to present AI-driven agents with "meaningful" generalization challenges. They were also calibrated so that baseline agents make significant progress after training for 200 million time steps, and so that each environment runs thousands of steps per second on as little as a single processor core.

Additionally, Procgen environments support two "well-calibrated" difficulty settings: easy and hard. (The easy setting targets users with limited access to compute, as it requires roughly an eighth of the resources to train.) In keeping with precedent, the environments also mimic the style of a number of Atari and Gym Retro games.