ARS Reinforcement Learning

ARS - Coursework Guide – 24/25

Version History
Version  Date      Changes
1.0      29/09/24  First version.
1.1      12/11/24  Fleshed out marking criteria for task 2 report.

Summary
Title: Reinforcement Learning using Gymnasium environments
Hand-in: Programs AND a written report will need to be submitted online via Moodle. Check the module’s Moodle page for the precise deadline.
Late policy: The coursework deadlines (task 1 and task 2) are absolute. Late submissions are subject to a 5% deduction of the overall coursework mark per day.

Informal Description
The coursework consists of two tasks as described below. Your aim is to build several reinforcement learning agents and to design, implement and run several basic research-based experiments. You will hand in software and a report that discusses your work on these tasks. Briefly, task 1 is about implementing some basic RL prototypes (with noise injection and basic modularity) for your chosen environment(s) and identifying key literature, gaps, and research questions, whereas task 2 is about designing, developing and running experiments based on the research questions identified in task 1.
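The task 1 prototypes combine an RL core with noise injection and some modularity (e.g. an RL component and a denoising component). One possible way to keep those pieces separate is sketched here in plain Python, so it runs without Gymnasium installed. It is a minimal sketch under stated assumptions: Gaussian noise on the observations (inputs), a moving-average filter as the denoiser, and tabular Q-learning over discretized observations. The class names and the `discretize` helper are hypothetical; they are not part of Gymnasium or of the module's example code.

```python
import random
from collections import deque


class GaussianSensorNoise:
    """Noise-injection module (assumed design): adds Gaussian noise to each observation component."""

    def __init__(self, sigma, seed=None):
        self.sigma = sigma
        self.rng = random.Random(seed)  # private RNG so each run's noise is reproducible

    def __call__(self, obs):
        return [x + self.rng.gauss(0.0, self.sigma) for x in obs]


class MovingAverageDenoiser:
    """Denoising module (assumed design): component-wise mean of the last `window` observations."""

    def __init__(self, window):
        self.history = deque(maxlen=window)

    def reset(self):
        # Call at the start of every episode so old observations do not leak in
        self.history.clear()

    def __call__(self, obs):
        self.history.append(list(obs))
        n = len(self.history)
        return [sum(col) / n for col in zip(*self.history)]


class TabularQLearner:
    """RL module: epsilon-greedy tabular Q-learning over hashable (e.g. discretized) states."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.99, epsilon=0.1, seed=None):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = {}  # state -> list of action values, created lazily
        self.rng = random.Random(seed)

    def values(self, state):
        return self.q.setdefault(state, [0.0] * self.n_actions)

    def act(self, state):
        # Explore with probability epsilon, otherwise pick the greedy action
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        vals = self.values(state)
        return vals.index(max(vals))

    def update(self, state, action, reward, next_state, terminal):
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)); no bootstrapping at terminal states
        target = reward if terminal else reward + self.gamma * max(self.values(next_state))
        vals = self.values(state)
        vals[action] += self.alpha * (target - vals[action])
        return vals[action]


def discretize(obs, bin_width=0.5):
    """Hypothetical helper: map a continuous observation to a hashable grid cell."""
    return tuple(int(x // bin_width) for x in obs)
```

In a Gymnasium episode loop, each observation returned by `env.reset()` (which returns `(observation, info)`) or `env.step(action)` (which returns `(observation, reward, terminated, truncated, info)`) would pass through noise, then the denoiser, then `discretize`, before reaching `act` and `update`; passing `terminated` (not `truncated`) as the terminal flag keeps bootstrapping correct for time-limit cut-offs. Keeping the three components separate lets an experiment swap or remove the denoiser without touching the RL code.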
Aims and Outcomes
• If you take the labs seriously, at the end of the semester you should be:
  o comfortable with implementing and modifying reinforcement learning agents,
  o capable of adapting your RL solutions to different kinds of robotic problems with well-defined states, actions and rewards,
  o comfortable with neural network approaches for mapping complex, high-dimensional states to actions (if you choose to use neural-network-based RL solutions),
  o comfortable with setting up experiments pertaining to noise, and with studying and mitigating its impact,
  o comfortable with designing modular AI solutions,
  o capable of scanning the literature in order to understand modern RL techniques, and of incorporating/extending these in your own solutions,
  o capable of identifying gaps and/or weaknesses/limitations in state-of-the-art research, and of using these to define research questions that guide your research,
  o capable of studying and evaluating algorithm performance objectively,
  o capable of designing innovative algorithms and experiments, and of reporting their results in a clear and well-structured manner.

Rough Timetable
Week  Main lab       Main activities
1     01/10/24       Getting started. Familiarization with Gymnasium
2     08/10/24       Task 1
3     15/10/24       Task 1
4     22/10/24       Task 1
5     (28|29)/10/24  Task 1. Demos for task 1 – we may need both Mon. & Tue. slots
6     05/11/24       Task 2
7     12/11/24       Task 2
8     19/11/24       Task 2
9     26/11/24       Task 2
10    (02|03)/12/24  Task 2. Demos for task 2 – we may need both Mon. & Tue. slots

Laboratory notes
• You will work individually.
• We need to start working hard from the very first day to make the most of the lab sessions. In the first week you will learn the basics of Gymnasium, experiment with several environments, and even try some small heuristics on simple control problems (e.g. cart pole).
• Rough time estimation:
  o Total hours: 20 credits ≈ 200 hours
  o Subtract lectures (22 hours) and labs (20 hours): 200 – 42 = 158 hours
  o Divide the remainder by 12 weeks: 158 / 12 ≈ 13 hours per week for everything else, e.g. studying, researching, reading, thinking, coding, testing, analyzing and writing.

Getting Started
Preliminary steps
• Check the following three main Gymnasium resources:
  o Farama’s general documentation page for Gymnasium.
  o The Basic Usage page in the above documentation.
  o The Gymnasium GitHub page, which includes installation instructions.
• Install Gymnasium.
• For the purpose of the coursework it is sufficient to work with the “classic control” set of environments; however, do feel free to install and use other categories of environments (e.g. MuJoCo and Atari) if you wish.
• Go through the Basic Usage page.
• You can install Gymnasium on your own machine, in your local directory on UNM’s HPC, or you can use Google Colaboratory. Please note that in the past there were ways to render environments properly in Colab (e.g. have a look at this tutorial); however, this may change from time to time. For an example of a Jupyter notebook for the cart pole example, refer to the module’s Moodle page. I suggest not bothering with rendering, except for some debugging exercises, since performance metrics are the key concern.
• As mentioned, if you want to use any of the MuJoCo environments you can. DeepMind acquired MuJoCo and made it open source, which means there are no more licensing issues. You are not required to use MuJoCo, but if you really want to, you are free to install it and set up the environments.
• To see what environments are available, use:
    import gymnasium as gym
    print(gym.envs.registry.keys())
• To better understand some Gymnasium environments, consult this Wiki or scroll to “environments” in the Gymnasium GitHub page, and search for your environment.
  For example, for the cart pole environment, have a look at this page.

Try to come up with some heuristic solutions for Cart Pole
• Try to come up with some simple heuristics to keep the pole up, based on your understanding of the environment. You can start from and modify the (failing) heuristic example provided on the Moodle page (i.e. sol-H1-cart-pole-v0).
• Difficult? Let's see whether reinforcement learning helps.

Have a look at a Q-learning solution
• Example: s1cart-pole-v0-sol1.
• Try to run the code.
• Read the code. Try to understand it as much as possible, although note that it will only fully make sense once we have covered Q-learning in the lectures.

Task Description
• Requirements for Task 1:
  o Title. Prototypes, literature, gaps, and research questions.
  o Prototypes:
    ▪ Environment selection. Select two environments to work on throughout the whole assignment: one environment from within the control category (e.g. CartPole-v1) and one environment from any category (including the control one). Please recall that different environments may impose significant changes on your reinforcement learning algorithm since, for example, they may involve continuous action spaces or other representational differences. To simplify matters you might want to constrain yourself to environments with discrete action spaces.
    ▪ Core method required: reinforcement learning. If you want to use other methods for other integrated modules, that is fine.
    ▪ Additional requirements: (1) noise injection at the inputs and/or outputs, (2) some modularity (e.g. an RL component and a denoising component).
    ▪ Aim: for each environment, develop at least one viable proof of concept based on RL.
  o Literature:
    ▪ Steps:
      • Explore the recent RL literature in relation to the topic of noise and/or modularity.
      • Select 1-3 good papers from the date range 2022-2023 and highlight their gaps (i.e. limitations and/or open questions/problems).
        Note that although these 1-3 papers will be your “core/seed” papers, you should still study the literature more broadly (i.e. your report should cite other papers apart from the core papers).
      • Select your gaps for further investigation. Justify your choices.
      • Design at least 2 research questions based on your selected gaps.
    ▪ Aim: clearly outline the 1-3 selected papers, the overall gaps, the selected gaps, and the research questions. Note that it is crucial for the papers, gaps and research questions to be 100% credible, i.e.: (1) the papers must be recent and good, (2) the gaps must be genuine open problems, and (3) the research questions must sit squarely in the gaps and must point in useful directions.
    ▪ Constraint 1: Every student must have a different set of core papers and/or a different set of gaps and/or a different set of research questions (RQs). Once a student has defined their selected papers, gaps, and RQs, they must email them to me so that I can check and approve them. Please note that this process operates on a “first come, first served” basis. Please also note that if two students share the same papers, they can still differ in terms of the chosen gaps or RQs; however, it is preferable for all elements to be distinct.
    ▪ Constraint 2: The selected research questions must include, or focus on, (1) noise, (2) modularity, or (3) both.
• Requirements for Task 2:
  o Title. Research questions and experiments.
  o Environment selection. You must use the same two environments you selected for task 1.
  o Core method required: reinforcement learning. As before, if you want to use other methods for other integrated modules, that is fine.
  o Goals. Keywords: novel experiments and insights. The aim of this task is for you to design, develop, run, and analyze experiments that address the research questions you listed in task 1.
    The main tasks are: (1) design experiments that address the research questions, (2) implement the experiments, (3) debug and fine-tune your code, (4) run the experiments and collect results, (5) analyze the results and assess whether they answer the research questions, (6) either return to step 1 with adjustments to the experiments/solutions, or proceed with additional experiments (depending on time and completion status). Document your findings.
• Requirements for all tasks (i.e. tasks 1 and 2):
  o Performance. Define one or more valid performance measures in addition to the default/compulsory one, i.e. the average number of episodes needed before learning a problem (see below for more information).
  o Evaluation. Run your experiments and report your results consistently for both of your chosen environments.
  o Four I’s. Try to maximize your work along the following dimensions: (1) informedness (i.e. it is based on a solid understanding of the literature), (2) innovativeness (i.e. it is novel), (3) inventiveness (i.e. it is not technically trivial), (4) impactfulness (e.g. it generates new knowledge).
  o Core themes. The core themes for both tasks are: (1) reinforcement learning, (2) noise, (3) modularity. Please note that the research questions can be exclusively about noise, or modularity, or both; however, the models must always include elements of noise and modularity.
• Demo. Show and explain the performance of your solutions and the results of your experiments.

Performance Evaluation
• Since you will be injecting noise into your sensor data and/or actions, your results are not directly comparable to solutions on external leaderboards (e.g.: github.com/openai/gym/…). Your focus will be on internal comparisons (i.e. your own experimental conditions) and innovation.
• One key performance measure that you should recall is the number of episodes required before solving the problem. In other words, here you are interested in the speed of learning.
  Care must be taken to be explicit and consistent about what constitutes having solved the problem.

Assessment – Overall (component, marks out of 100, description, main criteria)
• Task 1 - demo (5 marks). Demo of work so far. Main criteria: evidence of understanding of the base code; evidence of solid understanding of literature, gaps, questions, and innovation.
• Task 1 - report (20 marks). Report (1-2 pages) summarizing task 1. Main criteria: Are the core papers (1-3) well explained? Are the overall gaps well identified and explained? Are the selected gaps justified properly? Are the research questions grounded in the gaps, and are they clear, concrete, and heading in the right direction?
• Task 2 - demo (5 marks). Demo of work so far. Main criteria: evidence of understanding of the base code; good explanation of gaps, questions, experimental design, results, analyses, and conclusions; solid argumentation vis-à-vis the 4 I’s; strong justifications and arguments; clear communication.
• Task 2 - paper (50 marks). Mini-conference paper (4 pages) summarizing all of the work done on both tasks. Main criteria: Are the structure, grammar and argumentation of the paper/report good? Are the introduction, background, methods, results and analyses clear, comprehensive and insightful? Does the paper show critical and creative thinking?
• Task 2 - software (20 marks). Multiple files organized with a clear structure. Main criteria: Is the code complete? Is the code well-designed, clean, elegant, and well commented? Is the code complex/challenging enough?

Assessment Criteria for the Report (task 1) and Paper (task 2)
• 1st: an excellent, well-written report/paper demonstrating extensive understanding and good insight.
• 2:1: a comprehensive, well-written report/paper demonstrating thorough understanding and some insight.
• 2:2: a competent report/paper demonstrating good understanding of the implementation.
• 3rd: an adequate report/paper covering all specified topics at a basic level of understanding.
• F: an inadequate report/paper failing to cover the specified topics.
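The Getting Started section asks for simple Cart Pole heuristics before turning to Q-learning. As one illustration (not the module's sol-H1-cart-pole-v0 example, which is not reproduced here), a heuristic can push the cart towards the side the pole is falling to. This sketch assumes Gymnasium's CartPole observation layout `[cart position, cart velocity, pole angle, pole angular velocity]`, with action 0 pushing the cart left and action 1 pushing it right; the function name and the gain `k_angular` are hypothetical choices.

```python
def lean_heuristic(obs, k_angular=0.5):
    """Hypothetical Cart Pole heuristic: push towards the side the pole leans/falls to.

    Assumes obs = [cart position, cart velocity, pole angle, pole angular velocity]
    and actions 0 = push left, 1 = push right (Gymnasium CartPole convention).
    """
    _, _, angle, angular_velocity = obs
    # A positive angle is taken to mean the pole leans right; adding a scaled
    # angular velocity lets the rule react to where the pole is heading.
    lean = angle + k_angular * angular_velocity
    return 1 if lean > 0 else 0
```

With Gymnasium installed, it would slot into the usual loop: `obs, info = env.reset(seed=0)`, then repeatedly `obs, reward, terminated, truncated, info = env.step(lean_heuristic(obs))` until `terminated or truncated`. It is not guaranteed to solve the task, particularly once noise is injected into `obs`, which is exactly where a learned, denoised controller can be compared against it.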
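The Performance Evaluation notes make the number of episodes needed before solving the problem the compulsory measure, and ask for an explicit, consistent definition of “solved”. One possible definition, sketched here as an assumption rather than the module's official one, is the first episode at which the mean return over the last `window` episodes reaches a threshold. For CartPole-v1, Gymnasium registers a reward threshold of 475, so `threshold=475, window=100` is one natural choice. The function names are hypothetical.

```python
from collections import deque
from statistics import mean


def episodes_to_solve(episode_returns, threshold, window=100):
    """Return the 1-based episode at which the mean of the last `window` returns
    first reaches `threshold`, or None if that never happens.

    The (threshold, window) pair is the explicit 'solved' criterion and should be
    identical across all experimental conditions being compared.
    """
    recent = deque(maxlen=window)
    for episode, ret in enumerate(episode_returns, start=1):
        recent.append(ret)
        if len(recent) == window and mean(recent) >= threshold:
            return episode
    return None


def average_episodes_to_solve(runs, threshold, window=100):
    """Average episodes-to-solve over independent runs (e.g. different seeds).

    Returns (average over solved runs or None, number of unsolved runs), since
    silently dropping unsolved runs would make the average look better than it is.
    """
    results = [episodes_to_solve(r, threshold, window) for r in runs]
    solved = [e for e in results if e is not None]
    avg = mean(solved) if solved else None
    return avg, len(results) - len(solved)
```

Reporting the unsolved-run count alongside the average matters most at high noise levels, where some runs may never reach the threshold within the episode budget. Whether to report the last episode of the window (as here) or the first is itself a convention; whichever is chosen should be stated in the paper.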
Report guide (task 1)
• The report for task 1 has no fixed format, as long as it is well structured and well organized. The only constraint is that it should be 1-2 pages long. No appendices are allowed, and to be fair to all, no material on page 3 onwards (if you exceed 2 pages) will be included in the assessment. The font size of the main text should not be smaller than 11.
• This report will focus exclusively on: (1) a very brief summary of your prototypes, (2) brief summaries of your selected core papers, and why they were chosen, (3) lengthier explanations of the weaknesses/gaps of the papers, (4) an explanation and justification of your selected gaps, and (5) an explanation and justification of your research questions, and how they are grounded in the gaps.

Paper Guide (task 2)
You should design your final report as a conference paper. The paper should contain:
• [8 marks] Introduction (about 1 page). Brief explanation of the motivation and main concepts, a problem statement, an extremely brief overview of the key papers and their gaps, the research questions, and a brief summary of your main contributions. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation, (5) Insightfulness, (6) Critical and creative thinking.
• [8 marks] Background (about 0.5 pages). Brief overview of the field and the key papers closely related to your work (this will include the core 1-3 papers and other relevant papers). The core selected papers, their gaps, and why they were selected must be clearly explained. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation, (5) Insightfulness, (6) Critical and creative thinking.
• [8 marks] Methods (about 1 page). A detailed and concise description of how you implemented task 2 (e.g. algorithms and experimental design). Key marking criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation.
• [10 marks] Results (about 1 page). An overview of your key results, encompassing performance measures and other results leading to insights about the problem and/or your solutions. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation, (5) Insightfulness.
• [10 marks] Discussion (about 0.5 pages). Your interpretation of the results, your conclusions, and proposed future work. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation, (5) Insightfulness, (6) Critical and creative thinking.
• [6 marks] References & Appendices (not included in the page count). Key marking criteria: (1) Consistency of references, (2) Comprehensiveness of references, (3) Structure and clarity of appendices, (4) Insightfulness of appendices.

Note: Writing a concise report/paper is a core part of the assignment. The main sections of your paper (i.e. excluding references and appendices) cannot exceed 4 pages, with a minimum page margin of 2.5cm on each side, single line spacing, a two-column format, and a minimum font size of 11.