Video Playlist 13: Deep RL reward function design for lane-free autonomous driving

The videos included in this playlist are part of the results discussed in the paper titled "Deep RL reward function design for lane-free autonomous driving". In this work, a lane-free highway environment is formulated as a Deep Reinforcement Learning problem. This constitutes an entirely different problem domain for autonomous driving, compared to lane-based traffic, as vehicles consider the two-dimensional road space available, and their decision-making needs to adapt accordingly. In addition, each vehicle wishes to maintain a (different) desired speed, which creates many situations where vehicles need to overtake or react appropriately to the behavior of others. As such, in this work, a Reinforcement Learning agent is designed for the problem at hand, considering different reward function components that reflect the environment at various levels of information. The results are obtained using the Deep Deterministic Policy Gradient (DDPG) algorithm.
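As a rough illustration of how such reward components can be combined, the sketch below shows a composite per-step reward mixing a desired-speed term with collision-avoidance and road-boundary penalties. The weights, names, and functional forms here are illustrative assumptions for a lane-free setting, not the paper's published 'All-Components RF' definitions.

```python
import numpy as np

# Hypothetical weights; the actual 'All-Components RF' combines several
# such components, but these values and formulas are only illustrative.
W_SPEED = 1.0      # weight of the desired-speed component
W_COLLISION = 5.0  # weight of the collision-avoidance component
W_BOUNDARY = 1.0   # weight of the road-boundary component

def reward(v_x, v_des, min_gap, road_margin):
    """Composite reward for one lane-free driving step (illustrative).

    v_x         : current longitudinal speed (m/s)
    v_des       : the agent's desired speed, e.g. 20 m/s
    min_gap     : normalized gap to the closest surrounding vehicle
    road_margin : normalized lateral distance to the nearest road boundary
    """
    # Reward staying close to the desired speed (maximal at v_x == v_des).
    r_speed = -W_SPEED * abs(v_x - v_des) / v_des

    # Penalize small gaps to neighbors to discourage near-collisions.
    r_collision = -W_COLLISION * np.exp(-min_gap)

    # Penalize driving too close to the road boundaries.
    r_boundary = -W_BOUNDARY * np.exp(-road_margin)

    return r_speed + r_collision + r_boundary
```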

Trained RL agent at 70 veh/km density

In this video, we showcase how a trained agent with the most prominent reward function drives in a traffic density of 70 veh/km, on a highway of 10.2 m width (corresponding to a conventional 3-lane highway). The vehicle in focus was trained using the 'All-Components RF', and we observe that the learned policy adjusts the agent’s lateral position appropriately to avoid collisions, while also attempting to maintain its desired speed. The surrounding vehicles have desired speeds within the range [18, 22] m/s, while our agent's desired speed is set to 20 m/s.

[Video: Trained RL agent at 70 veh/km density, TrafficFluid DSSL TUC]

Trained RL agent at 90 veh/km density

In this video, we can observe that the trained agent is able to handle slightly denser surrounding traffic: it overtakes vehicles downstream whenever there is available space, while moving more cautiously and lowering its longitudinal speed when downstream traffic blocks its path.

[Video: Trained RL agent at 90 veh/km density, TrafficFluid DSSL TUC]

Trained RL agent at 120 veh/km density

In this video, we showcase the same approach in a more demanding scenario, as the agent is placed on a highway with even higher vehicle density. As such, it generally moves more cautiously and maintains a slightly lower speed in order to avoid collisions. One can observe a dangerous overtake around the midpoint of the video, which requires a proper reaction from the other vehicle. This occasional aggressive behavior should be addressed in future work, so that the trained agent’s policy is regulated by hard constraints that focus on safety, e.g., pre-specified safety rules.

[Video: Trained RL agent at 120 veh/km density, TrafficFluid DSSL TUC]
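To give a sense of how the hard safety rules mentioned above could regulate a trained policy, the minimal sketch below overrides the policy's proposed action whenever a time-gap rule is violated. It assumes an action space of longitudinal and lateral accelerations, as is common for lane-free controllers; the time-gap rule, thresholds, and function names are hypothetical, not part of the paper.

```python
def safety_filter(a_long, a_lat, gap_front, v_x,
                  min_time_gap=1.0, a_brake=-4.0):
    """Override the policy's action when a hard time-gap rule is
    violated. All thresholds here are illustrative assumptions.

    a_long, a_lat : accelerations proposed by the trained policy (m/s^2)
    gap_front     : gap to the closest downstream vehicle (m)
    v_x           : current longitudinal speed (m/s)
    """
    time_gap = gap_front / max(v_x, 0.1)  # guard against division by zero
    if time_gap < min_time_gap:
        # Rule violated: force braking and suppress lateral maneuvers.
        return a_brake, 0.0
    # Rule satisfied: pass the policy's action through unchanged.
    return a_long, a_lat
```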