Polyglot Translator Robot
OCR, YOLO, Python, ROS2, MoveIt2, Franka Emika Robot Arm
Authors: Allen Liu, Damien Koh, Kassidy Shedd, Henry Brown, Megan Black
GitHub: View this project on GitHub
Project Description
This project builds a system that processes input either as text written on a whiteboard or as audio received through a microphone. The pipeline spans multiple stages: text and speech recognition, language identification, translation into the target language, and, most distinctively, controlling a robot arm to physically write the translated text back onto the whiteboard. Accurate translation requires robust text and speech recognition combined with language detection, and the robot arm demands seamless communication between the language-processing components and the robotic control system. The work draws on natural language processing, machine learning, computer vision, and robotics, with an emphasis on usability, accuracy, and real-time responsiveness, refined through regular testing and iterative improvement. Bringing these diverse components together into a cohesive, functional system required close coordination across the whole team.
Architecture
This project consists of 5 subsystems, each owned by one group member:

- `writer`: Allen was in charge of this subsystem, which uses the Cartesian path planner from the `moveit2` package to find and execute the path for the robot to write specific characters on the whiteboard, calibrating the whiteboard's relative location with AprilTags.
- `translation`: Damien was in charge of this subsystem, which calls the `google-translation` API to translate from the source language to the target language (a minimal sketch follows this list).
- `computer_vision`: Megan was in charge of this subsystem, which uses the `YOLO` and `ocr` Python packages for text recognition and human detection (a sketch follows this list).
- `string2waypoints`: Kassidy was in charge of this subsystem, which uses the `matplotlib` Python package to generate the waypoints for the robot to travel through (a sketch follows this list).
- `apriltags`: Henry was in charge of this subsystem, which uses the `apriltag_ros` package to detect the location and orientation of each AprilTag, used to pinpoint the location and orientation of the whiteboard.
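As an illustration of the translation step, here is a minimal sketch assuming the `google-cloud-translate` Python client (v2 API) with credentials already configured in the environment; the exact client library the team used is not specified above.

```python
# Minimal sketch of the translation call, assuming the
# google-cloud-translate client (v2 API) and valid credentials.
from google.cloud import translate_v2 as translate

def translate_text(text: str, target: str = "en") -> str:
    """Translate `text` into the `target` language code."""
    client = translate.Client()
    # The API auto-detects the source language when none is given.
    result = client.translate(text, target_language=target)
    return result["translatedText"]

print(translate_text("你好，世界", target="en"))  # e.g. "Hello, World"
```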
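The OCR step could look like the sketch below; `easyocr` is used here as a stand-in, since the exact OCR package is not named above.

```python
# Hypothetical OCR sketch using easyocr; the subsystem's actual OCR
# package is not specified, so treat this as an assumption.
import easyocr

reader = easyocr.Reader(["en"])              # load a recognition model once
results = reader.readtext("whiteboard.jpg")  # -> [(bbox, text, confidence), ...]
for bbox, text, conf in results:
    print(f"{text!r} (confidence {conf:.2f})")
```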
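One plausible way to use `matplotlib` for waypoint generation is its `TextPath` class, which exposes glyph outlines as vertex sequences; this sketch is an assumption about the approach, not the subsystem's actual code.

```python
# Sketch: convert a string into 2-D waypoints with matplotlib's TextPath.
from matplotlib.font_manager import FontProperties
from matplotlib.textpath import TextPath

def string_to_waypoints(text: str, size: float = 0.05):
    """Return (x, y) vertices tracing the outlines of `text`."""
    path = TextPath((0, 0), text, size=size,
                    prop=FontProperties(family="sans-serif"))
    # path.vertices is an (N, 2) array of points along the outlines;
    # path.codes distinguishes pen-up (MOVETO) from pen-down segments.
    return [(float(x), float(y)) for x, y in path.vertices]

waypoints = string_to_waypoints("Hi")
print(len(waypoints), waypoints[:3])
```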
Features
Translate from Chinese to English
Translate from German to French
Translate from Spanish to Korean
Translate from Simplified Chinese to Traditional Chinese
Hindi Voice to English
Spanish Voice to English
Challenges
- Cartesian Path Planner: When we first incorporated the find-Cartesian-path functionality through the `MoveIt` API, `RViz` indicated that the robot had found a path but could not execute it. To address this, we examined the code responsible for calling the `ComputeCartesianPath` service. Comparing our implementation with the official `MoveIt` documentation, we found a crucial missing parameter, `cartesian_speed_limited_link`, which had not been specified in our code. Once we supplied it, the robot executed the intended movements successfully (a sketch of the service call follows this list).
- TF tree when integrating `apriltags`: When we first ran `apriltags` on the robot, the robot occasionally failed to move as intended and collided with its surroundings at certain orientations and positions. Debugging involved a thorough examination of the robot's TF tree, sending commands that moved the robot in all possible directions. The analysis revealed that introducing `apriltags` had shifted the root frame of the TF tree from `panda_link0`, the robot's base frame, to `camera_link`. Consequently, our commands were being interpreted relative to the `camera_link` frame rather than the base frame. Once we expressed the commands in the correct base frame, the robot moved flawlessly (see the TF lookup sketch after this list).
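For reference, a sketch of the `ComputeCartesianPath` service call from `rclpy`, with the speed-limit link filled in; the group, link, and frame names are illustrative, and in current `moveit_msgs` the request field is spelled `cartesian_speed_limited_link`.

```python
# Sketch of calling move_group's compute_cartesian_path service,
# including the speed-limit link whose omission blocked execution.
# Group/link/frame names below are illustrative, not the team's exact ones.
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import Pose
from moveit_msgs.srv import GetCartesianPath

class CartesianClient(Node):
    def __init__(self):
        super().__init__("cartesian_client")
        self.client = self.create_client(GetCartesianPath,
                                         "compute_cartesian_path")

    def plan(self, waypoints: list):
        req = GetCartesianPath.Request()
        req.header.frame_id = "panda_link0"      # plan in the base frame
        req.group_name = "panda_arm"
        req.link_name = "panda_hand_tcp"         # end-effector link
        req.waypoints = waypoints                # list of geometry_msgs Pose
        req.max_step = 0.01                      # 1 cm interpolation step
        req.avoid_collisions = True
        # The parameter whose omission broke execution:
        req.cartesian_speed_limited_link = "panda_hand_tcp"
        req.max_cartesian_speed = 0.1            # m/s
        return self.client.call_async(req)       # spin the node to resolve
```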
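And a sketch of the TF fix: ask `tf2` for poses expressed in `panda_link0` so that commands are always relative to the robot's base frame (the tag frame name `tag_0` is an assumption).

```python
# Sketch: resolve an AprilTag pose in the robot's base frame, avoiding
# the root-frame pitfall described above. "tag_0" is an assumed frame name.
from rclpy.node import Node
from rclpy.time import Time
from tf2_ros import Buffer, TransformListener

class TagLocator(Node):
    def __init__(self):
        super().__init__("tag_locator")
        self.tf_buffer = Buffer()
        self.tf_listener = TransformListener(self.tf_buffer, self)

    def tag_in_base_frame(self):
        # Target frame first, source frame second: the result expresses
        # the tag's pose relative to panda_link0, not camera_link.
        return self.tf_buffer.lookup_transform("panda_link0", "tag_0", Time())
```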
Possible Improvements
- Some script languages still fail to be detected: to resolve this, we could refine the language model by training it on more data in those scripts, making them easier to detect.
- Sometimes the camera source gets dropped and everything must be re-launched: this happens when the `realsense` camera package fails to detect the camera and throws an error. To address this, we could wrap camera startup in error handling that catches the exception and retries (a sketch follows this list).
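A sketch of the proposed fix, with `start_camera` as a hypothetical stand-in for the `realsense` initialization call:

```python
# Sketch: retry camera startup instead of crashing on a missed detection.
# `start_camera` is a hypothetical stand-in for the realsense init call.
import time

def start_camera_with_retry(start_camera, retries: int = 5, delay: float = 2.0):
    """Call `start_camera`, retrying on failure instead of aborting."""
    for attempt in range(1, retries + 1):
        try:
            return start_camera()
        except RuntimeError as err:
            print(f"Camera init failed (attempt {attempt}/{retries}): {err}")
            time.sleep(delay)
    raise RuntimeError("Camera not detected after all retries")
```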