Seminar on Reinforcement Learning
with Large Language Models
Saarland University — Winter Semester 2023
The course provides an overview of recent research on reinforcement learning (RL) with large language models (LLMs). The course material will familiarize participants with state-of-the-art techniques that bridge RL and LLMs. The course consists of three main components: (i) research papers, (ii) a project, and (iii) a final presentation. Each component carries one-third of the final score. Further details about the course structure and logistics are provided below.

Organizers

Timeline and updates

  • Until 24 October 2023: Register for the seminar course at https://seminars.cs.uni-saarland.de.
  • 27 October 2023: We have a new mailing list for the course. To reach the organizers or tutors, send an email to rl-llms-w23-tutors@mpi-sws.org instead of writing to individual email addresses.
  • 30 October 2023: Paper assignments for reading and writing reports are finalized by this date. Each student is assigned 6 papers for which they will be writing reports. The list of papers is provided below, and each student has the same set of papers.
  • Until 24 November 2023: After being allocated a slot in the seminar, you need to register for the seminar course examination in the LSF at Saarland University. This date is also the deadline to withdraw, which you do by emailing us.
  • 17 November 2023: Reports for the first three papers (#1, #2, #3) are due.
  • 8 December 2023: Reports for the next three papers (#4, #5, #6) are due.
  • 15 December 2023: Initial project proposals are due.
  • Until 22 December 2023: Project details will be finalized by this date after discussions with tutors.
  • Until 5 January 2024: Project proposals with full details are due by this date.
  • 16 February 2024: Project report and code are due.
  • Between 26 February and 22 March 2024: Final presentations will take place. The exact dates will be finalized in discussion with enrolled students. Slides will be due before the presentation date.

Course structure

The course consists of three main components: (i) research papers, (ii) a project, and (iii) a final presentation. Each component carries one-third of the final score. There will be no weekly classes. You can reach out to us anytime by sending an email to rl-llms-w23-tutors@mpi-sws.org. When needed, the tutors will arrange specific meeting times during the semester; further information will be communicated to students by email as the semester progresses.

Reading research papers and writing reports

  • Each student has to write reports for a total of 6 papers. The list of papers is provided below.
  • For each paper, you will have to write a two-page report. The timeline for report submissions is listed above.
  • Each report should be submitted as a PDF file by sending an email to rl-llms-w23-tutors@mpi-sws.org. You should name your PDF files as lastname_paper#.pdf (e.g., lastname_paper1.pdf, lastname_paper2.pdf, lastname_paper3.pdf, and so on).
  • Reports should be written in LaTeX using the NeurIPS style files.
  • Structure the report into three sections as follows:
    • Write down a review of the paper, including (a) a short summary of the paper, (b) a discussion on how the paper extends state of the art, and (c) the main strengths of the paper.
    • Write down the main weaknesses of the paper and discuss how this paper could be improved.
    • Write down your ideas on how you would like to extend the techniques and results in the paper. If you wish, you could also use these ideas to pursue as part of your project.
  • These reports will correspond to one-third of the final score.

Project

  • The project will be centered around techniques that bridge RL and LLMs. Students will have the freedom to pursue a project of their choice. To begin, each student will submit project proposals; the timeline for proposal submissions is listed above. The project could be related to the seminar papers, or it could explore new directions you are excited about.
  • Based on your project proposals and discussions with tutors, a concrete project will be picked.
  • You will have to submit a report along with implementation and executable code for the project. Each student will work on the project separately (no teams).
  • The project will correspond to one-third of the final score.

Presentations

  • You will have to prepare a 20-minute presentation. Your presentation will be based on the project, along with any relevant papers related to your project.
  • At the end of the semester, you will give a final presentation. We will block about 10 hours of time for the presentations. The exact dates will be finalized in discussion with enrolled students.
  • The slides and presentation will correspond to one-third of the final score.
  • Attendance at the final presentations will be mandatory.

List of research papers

  1. Deep Reinforcement Learning from Human Preferences
    by Christiano et al. (NeurIPS 2017).
  2. Training Language Models to Follow Instructions with Human Feedback
    by Ouyang et al. (NeurIPS 2022).
  3. Direct Preference Optimization: Your Language Model is Secretly a Reward Model
    by Rafailov et al. (NeurIPS 2023).
  4. Guiding Pretraining in Reinforcement Learning with Large Language Models
    by Du et al. (ICML 2023).
  5. Reward Design with Language Models
    by Kwon et al. (ICLR 2023).
  6. Pre-Trained Language Models for Interactive Decision-Making
    by Li et al. (NeurIPS 2022).
Please download the PDF files from the specific links provided above to avoid confusion with different versions that could be available online.

