🔥 Universal Phone Agent Control

Quick Start (v2.0)

Setup

# Configure your device first
cp config.example.yaml ~/.hermes-phone-agent/config.yaml
# Edit the config file for your device type

# Test connection
python3 scripts/test_device.py

Python API (New in v2.0)

from phone_agent import PhoneAgent

# Auto-detects device from config
agent = PhoneAgent()

# Camera
agent.capture_camera()  # Captures to configured path

# Audio
agent.record_audio(duration=5, output_path="recording.wav")
agent.play_audio("response.wav")

# Screen control
agent.wake_screen()
agent.unlock_screen(pin="4658")
agent.type_text("Hello world")

# Check device status
print(agent.device.is_online())

Vision Feedback Loop (v1 Compatible)

ALWAYS follow this pattern for UI automation:

Step 1: Screenshot

bash(cmd="adb exec-out screencap -p > ./assets/screen.png")

Step 2: Analyze with Vision

bash(cmd="python3 ./scripts/vision_helper.py ./assets/screen.png \"Find all buttons and their coordinates (x,y)\"")

Step 3: Act with Exact Coordinates

# Tap at coordinates from vision analysis
bash(cmd="adb shell input tap <x> <y>")

# Swipe
bash(cmd="adb shell input swipe <x1> <y1> <x2> <y2> 300")

# Type text
bash(cmd="adb shell input text 'your text'")

Step 4: Verify

bash(cmd="adb exec-out screencap -p > ./assets/screen_after.png")
bash(cmd="python3 ./scripts/vision_helper.py ./assets/screen_after.png \"Confirm action succeeded\"")

Device Types Supported

Termux (Best Performance)

Physical Android phones with Termux installed
Direct camera access via termux-camera-photo
Audio recording via termux-microphone-record
Playback via termux-media-player
SSH + SCP for file transfers

ADB Only (Standard Android)

Any Android device with ADB enabled
Camera via screencap (screen capture)
Audio via screenrecord --audio (Android 10+)
Push/pull files via ADB

Emulator (No Hardware Needed)

Android Studio emulator
Genymotion
Waydroid (Linux container)
Outputs audio to host speakers

Available Commands

Basic Actions

# Tap at coordinates
bash(cmd="adb shell input tap <x> <y>")

# Swipe with duration
bash(cmd="adb shell input swipe <x1> <y1> <x2> <y2> <duration_ms>")

# Type text (escapes special chars)
bash(cmd="adb shell input text 'your text here'")

# Key events
bash(cmd="adb shell input keyevent KEYCODE_HOME")
bash(cmd="adb shell input keyevent KEYCODE_BACK")
bash(cmd="adb shell input keyevent KEYCODE_ENTER")
bash(cmd="adb shell input keyevent KEYCODE_WAKEUP")

App Control

# Launch app by package
bash(cmd="adb shell am start -n com.package.name/.MainActivity")

# Force stop app
bash(cmd="adb shell am force-stop com.package.name")

# Open URL
bash(cmd="adb shell am start -a android.intent.action.VIEW -d 'https://example.com'")

Screen Management

# Wake screen
bash(cmd="adb shell input keyevent KEYCODE_WAKEUP")

# Take screenshot
bash(cmd="adb exec-out screencap -p > ./assets/screen.png")

# Record screen with audio (Android 10+)
bash(cmd="adb shell screenrecord --audio /sdcard/recording.mp4")

Configuration Examples

Samsung Galaxy S10 with Termux

name: "Samsung Galaxy S10"
device_type: "termux"
ip_address: "100.93.96.90"
ssh_port: 8022
ssh_key: "~/.ssh/phone_access"
screen_pin: "4658"

Surface Duo 2

name: "Surface Duo 2"
device_type: "termux"
ip_address: "100.79.15.54"
ssh_port: 8022
ssh_key: "~/.ssh/phone_access"

Standard Android (No Termux)

name: "Pixel Phone"
device_type: "adb"
ip_address: "100.x.x.x"
screen_pin: "1234"

Emulator

name: "Android Studio Emulator"
device_type: "emulator"
ip_address: "localhost"
adb_port: 5555
screen_pin: null

Advanced Usage

Multi-Device Management

from phone_agent import PhoneAgent
from config.device_config import load_config

# Load different configs
config1 = load_config("~/.hermes-phone-agent/phone1.yaml")
config2 = load_config("~/.hermes-phone-agent/phone2.yaml")

# Create agents for each
agent1 = PhoneAgent(config1)
agent2 = PhoneAgent(config2)

# Control both devices
agent1.type_text("Hello from phone 1")
agent2.type_text("Hello from phone 2")

Vision-Driven Automation

from phone_agent import PhoneAgent
import requests

agent = PhoneAgent()

# Screenshot
agent.device._adb_shell("screencap -p /sdcard/screen.jpg")
agent.device._adb_pull("/sdcard/screen.jpg", "./assets/screen.png")

# Send to vision API (using Senter Server)
response = requests.post(
    "http://localhost:8081/v1/chat/completions",
    json={
        "model": "qwen2.5-omni:3b",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Find the Settings button coordinates"},
                {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
            ]
        }]
    }
)

# Parse coordinates and tap
# (implement coordinate parsing from vision response)
x, y = parse_coordinates(response.json())
agent.device._adb_shell(f"input tap {x} {y}")

Rules & Best Practices

ALWAYS screenshot before acting - Never guess coordinates
ALWAYS use vision_helper.py to get precise element locations
Use coordinates EXACTLY as provided by vision analysis
Verify after each action - Screenshot again to confirm
Handle device state - Wake and unlock screen before operations
Respect timeouts - Give devices time to respond
Test on emulators first - Safer than physical devices

Troubleshooting

"Device not found"

# Check ADB connection
adb devices

# For Termux, verify SSH
ssh -i ~/.ssh/phone_access -p 8022 droid@<IP>

# Reconnect if needed
adb connect <IP>:5555

"Permission denied"

# On physical device, enable USB debugging and authorize computer
# For wireless ADB, enable "Wireless debugging" in Developer Options

"Camera not working"

Termux devices: Install termux-camera-photo via pkg
ADB-only: Uses screencap (shows current screen, not direct camera)
Emulators: Virtual camera may need configuration

Changelog

v2.0 (Current)

✅ Universal device abstraction layer
✅ Termux, ADB, and Emulator backends
✅ Configuration-driven setup
✅ Python API for integration
✅ Multi-device management
✅ Backward compatible with v1

v1.0

✅ Vision feedback loop
✅ ADB control commands
✅ Openskills format support

Senter - AI phone companion
Hermes Agent - AI framework

burner-phone

Safety Notice

Copy this and send it to your AI assistant to learn

🔥 Universal Phone Agent Control

Quick Start (v2.0)

Setup

Python API (New in v2.0)

Vision Feedback Loop (v1 Compatible)

Step 1: Screenshot

Step 2: Analyze with Vision

Step 3: Act with Exact Coordinates

Step 4: Verify

Device Types Supported

Termux (Best Performance)

ADB Only (Standard Android)

Emulator (No Hardware Needed)

Available Commands

Basic Actions

App Control

Screen Management

Configuration Examples

Samsung Galaxy S10 with Termux

Surface Duo 2

Standard Android (No Termux)

Emulator

Advanced Usage

Multi-Device Management

Vision-Driven Automation

Rules & Best Practices

Troubleshooting

"Device not found"

"Permission denied"

"Camera not working"

Changelog

v2.0 (Current)

v1.0

Related

Source Transparency

Related Skills

OpenClaw Mobile Gateway Installer

Agent Stack Picker

Github Actions Gen

Fontpick