By Anisha Bhatnagar and Ashwin Kumar Udayakumar
All of the code for this project is available at https://github.com/ashwin2k/tmrl-rnn
The presentation for this project is available at https://docs.google.com/presentation/d/1o_5flckV6MKzOfUubradi3Pf11HMKu0q5wjviKZX-I8/edit?usp=sharing
The demo videos are available here https://drive.google.com/drive/folders/1uhuof75dGtL4r3zqn8PkkF2Jo3kGyMk3?usp=drive_link
Trackmania(2020) is a renowned racing game that offers a blend of high-speed driving and intricate track design. The application of Reinforcement Learning (RL) to Trackmania allows for the exploration of AI-driven optimization in a complex, interactive environment. Using the TMRL framework, developers can implement deep reinforcement learning strategies to train agents that navigate the game’s challenging tracks. This framework supports real-time learning, which is critical for adapting to the dynamic conditions of Trackmania’s gameplay.
Observations and Actions:
Reward Function:
Inspired by behaviorism, the environment provides a reward at each time-step as a measure of performance. Instead of merely rewarding speed, the reward function assesses how effectively the car covers the track. A demonstration trajectory is recorded and divided into points, with rewards given based on the number of points passed since the last time-step. This measures the car’s effectiveness in covering significant portions of the track efficiently.
Environments For this project, we have used three different customizable gym environments -
I. LIDAR Environment:
Offers simplified input with 19-beam LIDAR measurements, ideal for MLP models. Optimized for tracks with clear demarcations, it uses a front camera setup with the car model hidden, focusing on environmental data. Features include:
II. LIDAR with Track Progress Environment:
This enhanced LIDAR environment includes additional track completion data, enhancing predictive capabilities similar to experienced human players. It is structured as follows:
III. Hybrid Environment: Using pure vision-based models in autonomous racing like TrackMania presents significant challenges, especially in detecting and understanding crashes. These models often struggle because the TrackMania API does not provide specific data when a car crashes against walls, leading to scenarios where the car frequently hugs the wall without proper corrective actions.
Conversely, operating without vision-based systems also has its limitations. For example, non-visual systems fail to capture detailed nuances about the track, which are crucial in complex environments such as those with a street racing theme. These environments require detailed information about the track and obstacles, which pure non-visual models cannot provide adequately. Thus, both pure vision and non-vision approaches have critical gaps that can affect performance in dynamic racing scenarios.
To bridge these gaps, we created a special environment that provides extensive data inputs like screenshots and telemetry data (speed, gear, RPM) and LIDAR measurements, suitable for training with CNNs. It supports all Trackmania tracks and various camera settings, offering a flexible and robust platform. Features include:
The Goal of this environment is to imbue more data to the computer vision models. These models are not able to successfully detect crashes, therefore often restart the lap during training and lead to lower rewards. Adding LIDAR information will aid the model in knowing the exact distances from the track edges thereby reducing the collisions. However, this environment is extremely complex owing to the high number of observations being extracted from each state.
The Soft Actor-Critic (SAC) algorithm is highly effective for environments requiring continuous action decisions, such as the dynamic and complex world of F-1 racing games like Trackmania. As a model-free, off-policy algorithm, SAC optimizes a policy that balances the dual objectives of maximizing expected return and entropy. This entropy augmentation promotes exploration by valuing uncertain states, which helps prevent premature convergence on suboptimal strategies.
Key Features of SAC:
Architectural Changes and Setup:
In the development of autonomous driving models for racing scenarios like TrackMania, integrating standard LIDAR inputs has proven crucial. These inputs provide precise distance measurements, which have significantly reduced crash occurrences by enhancing the model’s understanding of its surroundings. However, these measurements are not enough as the model may not be able to remember collisions from previous time steps at the time of decision making. Therefore, we have used these measurements with a policy network made of Gated Recurrent Units (GRU) (2 layers) with Multilayer Perceptrons (MLP) (2 layers). GRU was added to augment the model’s temporal processing capabilities. This enhancement enables the model to effectively recall and utilize past driving decisions and outcomes during decision-making processes. As a result, the model can quickly recover from crashes, showcasing improved resilience and efficiency in handling the dynamic challenges presented by racing environments.
Performance and Outcomes:
This experiment highlighted the effectiveness of RNNs in managing temporal dynamics within racing games. By integrating RNNs with LIDAR inputs, we streamlined the training process and significantly enhanced performance, indicating a robust approach for tasks that require both memory and real-time processing capabilities. The integration led to a notable reduction in crashes. Additionally, unlike the EffNet_v2 model, this setup allowed the agent to effectively navigate and recover from the crashes and successfully complete the lap. However, despite the good results, the model struggled with understanding the tradeoff between collisions and speed. In certain scenarios, the model would speed up but suffer from higher collisions. At the same time, in another scenario, the model had zero collisions but was extremely slow. Another problem with this setup was that the model was not able to predict ahead of time which direction it would need to turn. In these cases it would collide with the track edge before correcting its trajectory.
Architectural Setup and Environmental Changes:
Continuing from previous successes, Experiment 3 used the same RNN architecture but added the LIDAR with Track Progress environment. This enhancement included a track completion metric, providing the model with information about the upcoming turns and track sections in advance thereby, helping the model adjust its actions based on impending track conditions.
Performance and Outcomes:
This experiment highlighted the benefits of adding contextual and predictive data to AI training in racing games. By simulating foresight with track progress data, the model adopted a strategic and anticipatory driving style similar to experienced human players. These results indicate that such enhancements can significantly improve training efficiency and effectiveness, setting the stage for advanced AI capabilities in simulated environments.
Architecture and Training:
This experiment utilized EfficientNet_v2 followed by two MLP layers with ReLU activation. This architecture was tasked with encoding images from Trackmania and integrating these with other sensory data and LIDAR measurements to control the vehicle. The combined inputs processed through the MLP layers aimed to generate appropriate action outputs.
Performance Observations:
The model showed gradual learning improvements, indicated by reducing actor and critic losses. However, challenges included:
Despite these issues, the experiment revealed potential for significant improvements with extended training. This model is currently still in training. Right now, the model is only able to move forward by hugging the walls.
Experiment 1 (RNN with LIDAR): Addressed temporal issues with RNN integration, reducing collisions and enhancing track navigation. The shift to a LIDAR environment decreased computational demands and improved training speed, resulting in the model completing the track with a lap time of about 45 seconds.
Experiment 2 (LIDAR with Track Progress): Achieved the best results, with a lap time of 35 seconds. Incorporating track progress data enabled the model to anticipate the track layout, significantly boosting performance early in training.
Experiment 3 (EffNet in Full Environment): Demonstrated efficientnet_v2’s potential with complex visuals but was limited by slow training and the model’s tendency to hug track edges.
Each experiment provided valuable insights into AI-driven racing strategies, culminating in the successful use of track progress data with an RNN to achieve a lap time of 30 seconds. Furthermore, RNN greatly reduced the training time. RNN with LIDAR track progress trained in 1/3rd the time of RNN with just LIDAR. This approach demonstrated the potential of incorporating contextual awareness into AI systems, significantly enhancing decision-making capabilities..The outcomes indicate promising strategies for refining autonomous systems, particularly through the development of predictive navigation capabilities essential in autonomous vehicles. Further research could explore more sophisticated neural network architectures, such as Transformers, to capture spatial-temporal dynamics more effectively. Hybrid models combining CNNs and RNNs might also improve performance. The training process would also benefit from a full-scale hyperparameter search and extensive training to explore AI’s capabilities in high-speed racing environments more deeply.