OpenAI gym specsheet

Homework 6 specsheet 
-Extra Credit (replaces lowest HW)- 

In this homework, we apply an RL framework to environments available in the OpenAI gym. 

Mission command approach: As per §4.5 of the Sittyba, we will tell you what to do, not how to do it. 
That is up to you. However, we want you to: 
a) Do this homework yourself. Do not copy answers or code from someone else. 
b) Restrict your methods (for now) to what was covered in the lecture/lab (in other words, basic 
reinforcement learning involving Q-learning, policy gradients, multi-armed bandits, etc.) 

Here is what we would like you to do: 

  1. Go to gymnasium.farama.org/index.html 
  2. Pick one of the available environments – we recommend one of the classic Atari 2600 games: 
    gymnasium.farama.org/environment… [Make sure to pick one we did not already 
    cover in lecture or lab, but you can pick any environment that is not an Atari game too] 
  3. Train an agent to achieve a reasonable level of performance in this environment. 
  4. Write a brief statement as to how you trained the agent, how you managed the explore / exploit 
    tradeoff, and explaining any other choices you might have made. 
  5. Also make sure to comment on how the training went: what was challenging for the agent, and 
    what made training feasible? Explanations of what you couldn't do and why are encouraged, with 
    emphasis on the “why”. 
  6. Document the performance of the agent by plotting total rewards as a function of training 
    episodes. 
  7. Make sure to include your code as a separate file. 
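
As an illustration of steps 3 to 6, here is a minimal sketch of tabular Q-learning with a decaying epsilon-greedy policy that records the total reward of every training episode. It makes several assumptions that are not part of the assignment: `CorridorEnv` is a hypothetical toy environment that only mimics the Gymnasium `reset`/`step` return values so the example runs without extra packages, and the hyperparameters (`alpha`, `gamma`, the epsilon schedule) are arbitrary starting points. A lookup table only suits small discrete state spaces; an Atari environment would need some form of function approximation.

```python
import random
from collections import defaultdict


class CorridorEnv:
    """Hypothetical toy environment (NOT part of Gymnasium); it only mimics the API.

    The agent starts in cell 0 and must reach the last cell of a corridor.
    Actions: 0 = left, 1 = right. Every step costs -1; reaching the goal pays +10.
    """

    def __init__(self, length=6, max_steps=50):
        self.length = length
        self.max_steps = max_steps

    def reset(self, seed=None, options=None):
        self.pos = 0
        self.steps = 0
        return self.pos, {}

    def step(self, action):
        self.steps += 1
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.length - 1)
        terminated = self.pos == self.length - 1
        truncated = (not terminated) and self.steps >= self.max_steps
        reward = 10.0 if terminated else -1.0
        return self.pos, reward, terminated, truncated, {}


def train_q_learning(env, n_actions, episodes=300, alpha=0.1, gamma=0.99,
                     eps_start=1.0, eps_end=0.05, eps_decay=0.98, seed=0):
    """Tabular Q-learning; returns the Q-table and the total reward per episode."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * n_actions)
    epsilon = eps_start
    episode_rewards = []
    for _ in range(episodes):
        state, _info = env.reset()
        total, done = 0.0, False
        while not done:
            # Explore with probability epsilon, otherwise exploit the current estimate.
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[state][a])
            next_state, reward, terminated, truncated, _info = env.step(action)
            # Do not bootstrap from a terminal state.
            target = reward if terminated else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            total += reward
            done = terminated or truncated
        episode_rewards.append(total)
        # Shift gradually from exploration to exploitation.
        epsilon = max(eps_end, epsilon * eps_decay)
    return q, episode_rewards


def moving_average(values, window=20):
    """Trailing mean over up to `window` values, to smooth a noisy reward curve."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


q_table, rewards = train_q_learning(CorridorEnv(), n_actions=2)
smoothed = moving_average(rewards)
```

The `rewards` list is what step 6 asks you to plot against the episode index (for example with matplotlib's `plt.plot(rewards)`), and overlaying `smoothed` usually makes the trend easier to read. The epsilon schedule is one possible way to handle the explore / exploit tradeoff; the written statement should explain whichever choice you actually make.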

Suggestions and recommendations: 
  1. Picking a more complex environment will merit more grade points. To check complexity, go to 
    github.com/openai/gym/… and look at Observation Space and 
    Action Space. We recommend choosing an environment that has a Discrete action space. We 
    want to keep grading criteria (in terms of points) flexible to see what students can actually do, 
    but as a broad heuristic, something with the complexity of “LunarLander-v2” would be ok, 
    something with the complexity of “BipedalWalker-v2” would be good, and something with the 
    complexity of “AirRaid-ram-v0” would be excellent. But don’t necessarily pick those specific 
    environments. Pick something that sparks joy, for you personally. It will shine through. 
  2. Try implementing an algorithm on your own instead of using Stable-Baselines3 (SB3). If you use 
    SB3, explain what you did to optimize the model. Try checking how far your model can go by 
    trying more complex environments and finding the breaking point. 
  3. You can also use the library NEAT-Python: neat-python.readthedocs.io/en/latest/ If you 
    decide to use NEAT, experiment on how far NEAT can go and note your observations. 
  4. So either a) implement your own algorithm, or b) use SB3 (and note what you did to optimize 
    the model), or c) use NEAT-Python, find the most complex env you are able to solve with NEAT, 
    and note what leads to better NEAT implementations. 
  5. Whichever environment you pick, make sure your RL bot is learning the environment reasonably well (as evinced by the plot of total reward over episodes of training). 
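
To make the complexity check from suggestion 1 concrete, here is a small sketch that summarizes an environment's observation and action spaces. It assumes only that the environment exposes `observation_space` and `action_space` attributes, as Gymnasium environments do. The helper name `describe_spaces` and the stand-in `Discrete` and `Box` classes are hypothetical and exist only so the example runs without Gymnasium installed.

```python
from types import SimpleNamespace


def describe_spaces(env):
    """Summarize the observation and action spaces of an environment.

    Works with any object that has `observation_space` and `action_space`
    attributes; the spaces are inspected by class name, `shape`, and `n`.
    """
    summary = {}
    for name in ("observation_space", "action_space"):
        space = getattr(env, name)
        summary[name] = {
            "type": type(space).__name__,
            "shape": getattr(space, "shape", None),
            "n": getattr(space, "n", None),  # number of choices for a Discrete space
        }
    summary["discrete_actions"] = summary["action_space"]["type"] == "Discrete"
    return summary


# Hypothetical stand-ins that reuse the attribute names of Gymnasium's spaces.
class Discrete:
    def __init__(self, n):
        self.n = n
        self.shape = ()


class Box:
    def __init__(self, shape):
        self.shape = shape


fake_env = SimpleNamespace(observation_space=Box((8,)), action_space=Discrete(4))
info = describe_spaces(fake_env)
print(info)
```

With Gymnasium installed, passing a real environment such as `gym.make("CartPole-v1")` (after `import gymnasium as gym`) should work the same way. A larger observation shape or a continuous action space is a rough signal of a harder task, not a precise measure.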