Chapter 18: Voice ROS 2 Integration

Overview

This chapter explores integrating Whisper-based speech recognition with ROS 2 for robot control. You'll learn how to map voice commands to robot actions, creating intuitive voice-controlled robotics systems.

Learning Objectives

By the end of this chapter, you will be able to:

Integrate Whisper with ROS 2 control systems
Parse natural language commands into robot actions
Implement command vocabularies for robot control
Handle ambiguous commands and error cases

Natural Language Understanding

Convert recognized text into structured commands by extracting action verbs (move, pick, place, etc.), identifying objects and locations, parsing numerical parameters, and handling complex multi-step commands.

Command Parsing

import re

class CommandParser:
    def __init__(self):
        self.move_patterns = [
            r'move to (.+)',
            r'go to (.+)',
            r'navigate to (.+)'
        ]

        self.pick_patterns = [
            r'pick up the (.+)',
            r'grab the (.+)',
            r'take the (.+)'
        ]

    def parse_command(self, text):
        text = text.lower().strip()

        for pattern in self.move_patterns:
            match = re.search(pattern, text)
            if match:
                return {'action': 'move', 'target': match.group(1)}

        for pattern in self.pick_patterns:
            match = re.search(pattern, text)
            if match:
                return {'action': 'pick', 'object': match.group(1)}

        return {'action': 'unknown', 'raw': text}

Integration with ROS 2

ROS 2 Node Structure

import rclpy
from rclpy.node import Node
from std_msgs.msg import String
from geometry_msgs.msg import Pose
from sensor_msgs.msg import AudioData

class VoiceCommandNode(Node):
    def __init__(self):
        super().__init__('voice_command_node')

        self.move_pub = self.create_publisher(Pose, 'move_command', 10)
        self.action_pub = self.create_publisher(String, 'action_command', 10)

        self.audio_sub = self.create_subscription(
            AudioData, 'audio_input', self.audio_callback, 10)

        self.timer = self.create_timer(0.1, self.process_audio)

        self.whisper_model = whisper.load_model("small")

    def audio_callback(self, msg):
        audio_array = np.frombuffer(msg.data, dtype=np.int16)
        result = self.whisper_model.transcribe(audio_array)
        command_text = result["text"]

        self.execute_command(command_text)

    def execute_command(self, command_text):
        parser = CommandParser()
        parsed_command = parser.parse_command(command_text)

        if parsed_command['action'] == 'move':
            self.send_move_command(parsed_command['target'])
        elif parsed_command['action'] == 'pick':
            self.send_pick_command(parsed_command['object'])

Voice Command Vocabulary

"Go to the kitchen"
"Move to the table"
"Navigate to the charging station"
"Return to base"

Manipulation Commands

"Pick up the red cup"
"Place the book on the shelf"
"Open the door"
"Close the drawer"

System Commands

"Stop" or "Halt"
"Pause"
"Resume"
"Status"

Summary

Integrating Whisper with ROS 2 enables voice-controlled robotics systems. Command parsing translates natural language into structured robot actions, while ROS 2 integration provides the communication infrastructure for executing commands on physical robots.

Key Takeaways

Command parsing extracts structured actions from natural language
ROS 2 integration enables seamless voice-to-action pipelines
Well-defined command vocabularies improve recognition accuracy
Error handling and confirmation improve system reliability

What's Next

In the next chapter, we'll explore cognitive planning with LLMs, learning how to use large language models for high-level robot task planning and reasoning.

Overview​

Learning Objectives​

Natural Language Understanding​

Command Parsing​

Integration with ROS 2​

ROS 2 Node Structure​

Voice Command Vocabulary​

Basic Navigation Commands​

Manipulation Commands​

System Commands​

Summary​

Key Takeaways​

What's Next​