Turtlebot3 Navigation Using Deep Reinforcement Learning

C++, ROS2, Open Robotics Turtlebot3 Burger, 2D LiDar, OpenAI Gymnasium, Deep Reinforcement Learning

Description

In this project, I designed and implemeted a OpenAI gymnasium environment for training a model for controlling the turtlebot3 to navigate through a hallway.

Demo Video

The result of this project cam be shown in the video below:

Odometry Model

For this project, since the focus is the control of a turtlebot3, not localization, so I assume the perfect odometry for this project.

To get the odometry from the wheel angle and velocity, it applied kinematics model from Modern Robotics to conduct the Forward Kinematics and Inverse Kinematics of the robot.

The geometry model of the robot can be present using $H$ matrix.

\[\begin{align*} H(\theta) &= \begin{bmatrix} h_1(\theta) \\ h_2(\theta) \end{bmatrix} \\ h_i(\theta) &= \frac{1}{r_i \cos{\gamma_i}} \begin{bmatrix} x_i \sin{\left (\beta_i + \gamma_i \right )} - y_i \cos{\left (\beta_i + \gamma_i \right)} \\ \cos{\left ( \beta_i + \gamma_i + \theta \right )} \\ \sin{\left ( \beta_i + \gamma_i + \theta \right )} \end{bmatrix}^\intercal \end{align*}\]

For the Turtlebot3, which is a diff-drive mobile robot. The parameters are defined as

\[\begin{align*} x_1 &= 0 \\ x_2 &= 0 \\ y_1 &= d \\ y_2 &= -d \\ \beta_1 &= 0 \\ \beta_2 &= 0 \\ \gamma_1 &= -\pi/2 \\ \gamma_2 &= -\pi/2 \\ r_1 &= r \\ r_2 &= r \end{align*}\]

So that $H(0)$ can be calculated as

\[H(0)=\begin{bmatrix} -d & 1 & -\infty \\ d & 1 & -\infty \end{bmatrix}\]

Where $d$ is the one-half of the distance between the wheels and the $r$ is the radius of the wheel.

Hence $F$ matrix can be calculated by

\[F=H(0)^\dagger = r \begin{bmatrix} -\frac{1}{2d} & \frac{1}{2d} \\ \frac{1}{2} & \frac{1}{2} \\ 0 & 0 \end{bmatrix}\]

Forward Kinematics

The forward kinematics is to find the tranform from current state to next state $T_{bb'}$ given the change in wheel positions $\left( \Delta\phi_l, \Delta\phi_r\right)$.

Find the body twist ${\cal V}_b$ from the body frame of the current state to next state:

\[{\cal V}_b = F \Delta \phi = r \begin{bmatrix} -\frac{1}{2d} & \frac{1}{2d} \\ \frac{1}{2} & \frac{1}{2} \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \Delta \phi_l \\ \Delta \phi_r \end{bmatrix}\]

Construct the 3d twist by

\[{\cal V}_{b6} = \begin{bmatrix} 0 \\ 0 \\ {\cal V}_b \\ 0 \end{bmatrix}\]

Then, the transform from current state to next state $T_{bb'}$ can be calculated by

\[T_{bb'}=e^{[{\cal V}_{b6}]}\]

Inverse Kinematics

Inverse kinematics is to find the wheel velocities $\left(\dot{\phi_l}, \dot{\phi_r} \right)$ given the body twist ${\cal V}_b$.

The wheel velocities can be calculated as

\[\begin{bmatrix} \dot{\phi}_l \\ \dot{\phi}_r \end{bmatrix} = H(0){\cal V}_b = \frac{1}{r}\begin{bmatrix} -d & 1 & -\infty \\ d & 1 & -\infty \end{bmatrix} \begin{bmatrix} \omega_z \\ v_x \\ v_y \end{bmatrix}\]

Since this robot is non-holonomic so that it cannot move sideways. Hence $v_y=0$, then the wheel velocities $\left(\dot{\phi_l}, \dot{\phi_r} \right)$ are:

\[\begin{align*} \dot{\phi}_l &= \frac{1}{r} \left ( -d \omega_z + v_x\right) \\ \dot{\phi}_r &= \frac{1}{r} \left ( d \omega_z + v_x\right) \\ \end{align*}\]

OpenAI Gymnasium

To start the training pipeline of the turtlebot3, I wrapped the ROS2 turtlebot3 simulation within the OpenAI gymnasium environment.

Observation Space

The observation space for the environment can be defined as

\[\begin{align*} x &= -1 \rightarrow 4 \\ y &= -0.5 \rightarrow 4.5 \\ \theta &= -\pi \rightarrow \pi \end{align*}\]

Action Space

The action space can be defined as

\[\begin{align*} \dot{x} &= 0.0 \rightarrow 0.2 \\ \dot{\theta} &= -0.5 \rightarrow 0.5 \end{align*}\]

Reward

The reward is defined as -1 when it hits a wall and 1 when it reaches the goal.

Traning

The training process is shown in the figure below, in which every robot will be reset to its original position whenever it hits a wall.

Packages

nuturtle_description: Visualizing the pose of the turtlebot3 burger
turtlelib: Call customized library functions for kinematics and slam calculations.
nusim: Simulate the turtlebot in the real world.
nuturtle_control: Performing the turtlebot kinematics calculation and parsing the command sent to the turtlebot3.
nuslam: Performs the SLAM algorithm for correcting the error odometry.
nuturtle_interfaces: All message interfaces for communication between the nodes
nuturtle_slam: Integrate slam_toolbox for setting up learning environment
nuturtle_deeprl: Wrapped with OpenAI Gymnasium for Deep Reinforcement Learning traing environment

10 Jun 2024

« AI-Powered Autunomous Tangram Puzzle Solver

Jingkun (Allen) Liu