Chapter 18: Voice ROS 2 Integration

Overview

This chapter explores integrating Whisper-based speech recognition with ROS 2 for robot control. You'll learn how to map voice commands to robot actions, creating intuitive voice-controlled robotics systems.

Learning Objectives

By the end of this chapter, you will be able to:

  • Integrate Whisper with ROS 2 control systems
  • Parse natural language commands into robot actions
  • Implement command vocabularies for robot control
  • Handle ambiguous commands and error cases

Natural Language Understanding

Convert recognized text into structured commands by extracting action verbs (move, pick, place, etc.), identifying objects and locations, parsing numerical parameters, and handling complex multi-step commands.
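Numerical parameters can be pulled out with the same regex approach. The sketch below is a minimal, hypothetical helper (the phrasing and unit list are assumptions, not part of a standard API) that extracts a distance from commands like "move forward 2.5 meters":

```python
import re

def parse_distance(text):
    """Extract a numeric distance from a command such as 'move forward 2.5 meters'.

    Returns the distance as a float, or None if no distance is present.
    """
    match = re.search(r'(?:move|go)\s+forward\s+([\d.]+)\s*(meters?|m)\b', text.lower())
    if match:
        return float(match.group(1))
    return None
```

The same pattern extends to angles ("turn left 90 degrees") or counts by swapping the verb and unit alternatives in the regex.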

Command Parsing

import re

class CommandParser:
    def __init__(self):
        self.move_patterns = [
            r'move to (.+)',
            r'go to (.+)',
            r'navigate to (.+)'
        ]

        self.pick_patterns = [
            r'pick up the (.+)',
            r'grab the (.+)',
            r'take the (.+)'
        ]

    def parse_command(self, text):
        text = text.lower().strip()

        for pattern in self.move_patterns:
            match = re.search(pattern, text)
            if match:
                return {'action': 'move', 'target': match.group(1)}

        for pattern in self.pick_patterns:
            match = re.search(pattern, text)
            if match:
                return {'action': 'pick', 'object': match.group(1)}

        return {'action': 'unknown', 'raw': text}
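Speech recognition output is noisy: Whisper may transcribe "kitchen" as "kitchin". One way to handle such near-misses is fuzzy matching against the known vocabulary. This is a minimal sketch using the standard-library `difflib`; the location list and cutoff value are illustrative assumptions:

```python
import difflib

# Hypothetical vocabulary of locations the robot knows about
KNOWN_LOCATIONS = ["kitchen", "table", "charging station", "base"]

def resolve_location(spoken, known=KNOWN_LOCATIONS, cutoff=0.6):
    """Fuzzy-match a transcribed location against the known vocabulary.

    Returns the best match above the similarity cutoff, or None if the
    utterance is too far from anything known (an ambiguous/error case
    the caller should handle, e.g. by asking for confirmation).
    """
    matches = difflib.get_close_matches(spoken.lower(), known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Returning None rather than guessing lets the node ask the user to repeat the command instead of navigating somewhere wrong.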

Integration with ROS 2

ROS 2 Node Structure

import numpy as np
import whisper

import rclpy
from rclpy.node import Node
from std_msgs.msg import String
from geometry_msgs.msg import Pose
# AudioData is provided by the audio_common package, not sensor_msgs
from audio_common_msgs.msg import AudioData

class VoiceCommandNode(Node):
    def __init__(self):
        super().__init__('voice_command_node')

        self.move_pub = self.create_publisher(Pose, 'move_command', 10)
        self.action_pub = self.create_publisher(String, 'action_command', 10)

        self.audio_sub = self.create_subscription(
            AudioData, 'audio_input', self.audio_callback, 10)

        self.parser = CommandParser()
        self.whisper_model = whisper.load_model("small")

    def audio_callback(self, msg):
        # Whisper expects float32 samples normalized to [-1, 1]
        audio_array = np.frombuffer(msg.data, dtype=np.int16).astype(np.float32) / 32768.0
        result = self.whisper_model.transcribe(audio_array)
        command_text = result["text"]

        self.execute_command(command_text)

    def execute_command(self, command_text):
        parsed_command = self.parser.parse_command(command_text)

        if parsed_command['action'] == 'move':
            self.send_move_command(parsed_command['target'])
        elif parsed_command['action'] == 'pick':
            self.send_pick_command(parsed_command['object'])
        else:
            self.get_logger().warn(f"Unrecognized command: {command_text}")

    def send_move_command(self, target):
        # Resolving a spoken location name to coordinates is application-specific;
        # in practice this Pose would come from a location lookup table.
        pose = Pose()
        self.move_pub.publish(pose)

    def send_pick_command(self, obj):
        msg = String()
        msg.data = f"pick {obj}"
        self.action_pub.publish(msg)
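The int16-to-float32 conversion inside `audio_callback` is worth isolating, since getting it wrong silently degrades transcription quality. A standalone sketch of that step (the function name is ours, not part of the Whisper API):

```python
import numpy as np

def int16_to_whisper_input(raw_bytes):
    """Convert raw 16-bit PCM bytes into the float32 array Whisper expects.

    Whisper's transcribe() accepts a NumPy float32 array with samples
    normalized to [-1, 1] (at 16 kHz); ROS audio messages typically carry
    raw int16 PCM, so we divide by 32768 to rescale.
    """
    samples = np.frombuffer(raw_bytes, dtype=np.int16)
    return samples.astype(np.float32) / 32768.0
```

If the microphone publishes at a different sample rate, the audio must also be resampled to 16 kHz before transcription.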

Voice Command Vocabulary

Basic Navigation Commands

  • "Go to the kitchen"
  • "Move to the table"
  • "Navigate to the charging station"
  • "Return to base"

Manipulation Commands

  • "Pick up the red cup"
  • "Place the book on the shelf"
  • "Open the door"
  • "Close the drawer"

System Commands

  • "Stop" or "Halt"
  • "Pause"
  • "Resume"
  • "Status"

Summary

Integrating Whisper with ROS 2 enables voice-controlled robotics systems. Command parsing translates natural language into structured robot actions, while ROS 2 integration provides the communication infrastructure for executing commands on physical robots.

Key Takeaways

  • Command parsing extracts structured actions from natural language
  • ROS 2 integration enables seamless voice-to-action pipelines
  • Well-defined command vocabularies improve recognition accuracy
  • Error handling and confirmation improve system reliability

What's Next

In the next chapter, we'll explore cognitive planning with LLMs, learning how to use large language models for high-level robot task planning and reasoning.
