Kirsten Odendaal

Fast Lap? Going for Pole using RL


A light-hearted tour of my AWS DeepRacer reinforcement-learning project


Warm-up Lap — Why race toy cars with RL?

If you can teach a 1/18-scale racer to hug corners, dodge cones and out-smart rival bots, you’ve basically distilled every juicy challenge in autonomous driving into a bite-size sandbox. AWS DeepRacer gives us that sandbox: simulated tracks, stereo cameras, a LIDAR ring and just enough physics to make you fist-pump when the car finishes a lap cleanly. My mission? Train one brain that blasts around three tracks and survives three very different race modes—solo time-trials, static-obstacle slaloms and chaotic head-to-head heats—without rage-quitting into a wall.


Meet the Pit-Crew Algorithms

Proximal Policy Optimization (PPO)—the steady race engineer

I picked PPO (clipped) because it updates the policy in baby steps—like tightening wheel-nuts a quarter-turn at a time instead of yanking them off. The clipped objective keeps the new policy from wandering too far from the safe baseline, which is gold when every bad update sends your car lawn-mowing.
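For the curious, that clipping really is only a few lines. Here is a minimal PyTorch-style sketch of the clipped surrogate loss; the function name and tensor handling are mine, not the project's exact training code:

```python
import torch

def ppo_clipped_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss: keep the new policy close to the old one."""
    # Probability ratio r_t(theta) = pi_new(a|s) / pi_old(a|s)
    ratio = torch.exp(log_probs_new - log_probs_old)
    # Unclipped vs. clipped surrogate objectives
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic minimum, negated so gradient descent maximizes it
    return -torch.min(unclipped, clipped).mean()
```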

A camera-LIDAR dream team

The agent sees the world through a 3-layer CNN that chews on twin 160 × 120 grayscale images, while a mini-MLP digests a 64-ray LIDAR scan. Mash those together, pass through a 512-unit “combiner,” and you have a fused ego-view that feeds both actor (which button to press) and critic (was that smart?).
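Here is roughly what that fusion looks like in PyTorch. Only the twin 160 × 120 grayscale inputs, the 64-ray LIDAR, the 3-layer CNN and the 512-unit combiner come from the description above; kernel sizes and everything else are illustrative guesses, not the project's actual network:

```python
import torch
import torch.nn as nn

class FusedPolicyNet(nn.Module):
    """Sketch of the camera + LIDAR fusion network (layer details are illustrative)."""
    def __init__(self, n_actions, lidar_rays=64):
        super().__init__()
        # 3-layer CNN over the twin 160x120 grayscale frames, stacked as 2 channels
        self.cnn = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # Mini-MLP that digests the 64-ray LIDAR scan
        self.lidar_mlp = nn.Sequential(nn.Linear(lidar_rays, 128), nn.ReLU())
        # Work out the CNN output size with a dummy forward pass
        with torch.no_grad():
            cnn_out = self.cnn(torch.zeros(1, 2, 120, 160)).shape[1]
        # 512-unit "combiner" that fuses both streams into one ego-view
        self.combiner = nn.Sequential(nn.Linear(cnn_out + 128, 512), nn.ReLU())
        # Actor picks an action, critic scores the state
        self.actor = nn.Linear(512, n_actions)
        self.critic = nn.Linear(512, 1)

    def forward(self, images, lidar):
        # images: (batch, 2, 120, 160), lidar: (batch, 64)
        fused = self.combiner(torch.cat([self.cnn(images), self.lidar_mlp(lidar)], dim=1))
        return self.actor(fused), self.critic(fused)
```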


Track Walk & Scenarios

| Track nickname | Real name | Length | Width | Personality |
| --- | --- | --- | --- | --- |
| A-Z Speedway | reInvent2019 Wide | 16.6 m | 1.07 m | Friendly oval for first dates |
| Smile Speedway | reInvent2019 Track | 23.1 m | 1.07 m | Twisty grin-shaped sprint |
| Empire City | New York Track | 21.9 m | 0.76 m | Skinny, skyscraper-tight corners |

Each track is tackled in three flavors:

  1. Time-Trial – no traffic, just vibes
  2. Obstacle-Avoidance – six static barrels begging for bumper kisses
  3. Head-to-Bot – three rambunctious opponent cars that never heard of personal space

Reward Shaping — bribing the driver

Think of rewards as snacks you toss at the car to encourage good manners: a treat for hugging the center line, a treat for making progress around the lap, and a telling-off for getting too cosy with obstacles.

Weights? One-third each for the centering, progress and obstacle terms, because democracy.
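To make that concrete, here is a toy version of such a reward. Only the three ingredients and the one-third weighting come from the project; the exact shape of each term and the 0.5 m obstacle threshold are made up for illustration:

```python
def reward(distance_from_center, track_width, progress_delta, min_obstacle_distance):
    """Illustrative reward: equal-weight mix of centering, progress, and an obstacle penalty."""
    # Centering term: 1.0 on the center line, fading to 0.0 at the track edge
    centering = max(0.0, 1.0 - distance_from_center / (0.5 * track_width))
    # Progress term: fraction of the lap covered since the last step
    progress_term = progress_delta
    # Obstacle term: scold the car for being within 0.5 m of the nearest barrel or bot
    obstacle_penalty = -1.0 if min_obstacle_distance < 0.5 else 0.0
    # Democracy: one-third weight each
    return (centering + progress_term + obstacle_penalty) / 3.0
```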


Training Regimen — from baby steps to Nürburgring

  1. Fixed start on A-Z – learn to drive in a straight line without crying.
  2. Random starts – spawn the agent anywhere on the track so it can’t memorise scenery.
  3. Random direction + harder tracks – reverse laps and graduate to the slim-fit Empire City (sketched below).
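
As a rough sketch, that curriculum boils down to a list of stage configs plus a reset helper. The field names and the env interface below are hypothetical, purely to show the idea:

```python
import random

# Hypothetical curriculum schedule (stage fields and track handling are illustrative)
STAGES = [
    {"tracks": ["A-Z Speedway"], "random_start": False, "random_direction": False},
    {"tracks": ["A-Z Speedway"], "random_start": True,  "random_direction": False},
    {"tracks": ["A-Z Speedway", "Smile Speedway", "Empire City"],
     "random_start": True, "random_direction": True},
]

def reset_for_stage(env, stage_idx):
    """Reset a (hypothetical) simulator env according to the current curriculum stage."""
    cfg = STAGES[stage_idx]
    return env.reset(
        track=random.choice(cfg["tracks"]),
        random_start=cfg["random_start"],
        reverse=cfg["random_direction"] and random.random() < 0.5,
    )
```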

Twenty thousand simulation steps (~2 hours on a plain laptop) with PPO hyper-params straight from the OpenAI cookbook (γ = 0.99, ε = 0.2, 5 epochs per 512-step rollout).
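
The project's own PPO implementation isn't reproduced here, but for a sense of scale, the same hyper-parameters would look like this in a stable-baselines3 run, assuming a Gym-style DeepRacer environment (the deepracer_env module is a made-up stand-in):

```python
from stable_baselines3 import PPO

# Hypothetical Gym-style wrapper around the DeepRacer simulator (not a real package)
from deepracer_env import make_deepracer_env

env = make_deepracer_env(track="A-Z Speedway")

# Hyper-params from the post: gamma=0.99, clip epsilon=0.2, 5 epochs per 512-step rollout
model = PPO(
    "MultiInputPolicy",   # camera + LIDAR arrive as a dict observation
    env,
    gamma=0.99,
    clip_range=0.2,
    n_epochs=5,
    n_steps=512,
    verbose=1,
)

model.learn(total_timesteps=20_000)   # ~2 hours on a plain laptop, per the post
```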


Race-Day Results

Time-Trials: Flawless victory

Obstacle Slalom: Traffic-cone trauma

Head-to-Head: Bumper-car chaos

Moral: Our solo-racing prodigy panics when the playground gets crowded.


Lessons from the Crash Logs


Next Upgrades

  1. Multi-task curriculum – sprinkle obstacles and bots during training; maybe even self-play.
  2. Lean rewards – axe the micromanagement; reward progress & penalize collisions, let the network figure out the jazz steps.
  3. Opponent modelling – give the car a crystal ball (recurrent net) to guess rival moves.
  4. More track time & hyper-tuning – because GPUs don’t need sleep.

Checkered Flag

We turned a timid toy into a track-lapping champ—as long as nobody else shows up. The project proves that curriculum learning plus PPO can nail geometry generalization, yet true robustness demands training that mirrors real-world mayhem. Next season, the car’s getting street-smarts, thicker skin and maybe a flamenco horn for overtakes. Stay tuned! 🏁

