Building an LLM Robot with My Son — EP 6. Connecting the Robot to the LLM Server over LAN


The robot needed to talk to the LLM server.

Until now the robot ran standalone — HC-SR04 measuring distance, motors responding to code. That works for basic behavior. But the whole point of this project is an LLM that makes decisions. The robot sends camera frames and sensor data to the LLM server, the LLM decides what to do, the command comes back. That communication layer had to be built.

This episode covers how the robot (edge) and the LLM server (Mac) get connected.


Three Options

WebSocket: bidirectional real-time communication. Simple to implement, HTTP-based so firewall issues are minimal. Works well for a setup where the robot streams data and the server streams commands back.

gRPC: Google's RPC framework. Protocol Buffers serialization means smaller payloads than WebSocket. Type safety and streaming support are both there. But setup is heavier — Protobuf schemas need to be maintained on both client and server.

ROS2 over LAN: robot-native middleware. DDS-based pub/sub topology, with native integration into the robot framework. But the robot currently runs on Arduino, which can't run ROS2 on-device — this option only becomes relevant after the Pi migration.

The current edge device is an Arduino, which has no network stack of its own, so it can't run an HTTP client directly. Attaching a WiFi shield or a serial-to-WiFi bridge is possible, but adds significant complexity.

So I structured it differently.


What We Actually Built

Arduino connects to a laptop over USB serial. A thin Python bridge script on the laptop reads sensor data from the serial port and forwards it to the Mac LLM server over WebSocket. When the LLM server sends back a command, the bridge relays it to Arduino over serial.

[Arduino] ←serial→ [Python bridge] ←WebSocket/LAN→ [Mac LLM server]

Arduino doesn't need to handle WiFi at all. The laptop handles networking. When we migrate to Pi, the bridge moves inside the Pi — no more separate laptop.


Bridge Code

The Python bridge is about 70 lines:

import asyncio
import json
import serial
import websockets

SERIAL_PORT = '/dev/cu.usbmodem14201'
BAUD_RATE = 9600
LLM_SERVER = 'ws://192.168.1.100:8765/robot'

async def bridge():
    ser = serial.Serial(SERIAL_PORT, BAUD_RATE, timeout=1)

    async with websockets.connect(LLM_SERVER) as ws:
        print(f"Connected to LLM server: {LLM_SERVER}")

        async def read_serial():
            while True:
                # ser.readline() blocks, so run it in a worker thread
                line = await asyncio.get_running_loop().run_in_executor(
                    None, ser.readline
                )
                if line:
                    data = line.decode('utf-8').strip()
                    # format: "dist:23,cam:1"
                    await ws.send(json.dumps({"sensor": data}))

        async def read_commands():
            async for message in ws:
                cmd = json.loads(message)
                # format: {"action": "forward", "speed": 150}
                command_str = f"{cmd['action']},{cmd.get('speed', 0)}\n"
                ser.write(command_str.encode())

        await asyncio.gather(read_serial(), read_commands())

asyncio.run(bridge())

On the server side, llama.cpp doesn't expose a WebSocket endpoint directly, so I wrapped it with FastAPI:

from fastapi import FastAPI, WebSocket
import httpx
import json

app = FastAPI()
LLAMA_URL = "http://localhost:8080/completion"

SYSTEM_PROMPT = """
You are a robot control AI. When given sensor data, return only one of the following in JSON:
{"action": "forward", "speed": 150}
{"action": "backward", "speed": 100}
{"action": "left", "speed": 120}
{"action": "right", "speed": 120}
{"action": "stop", "speed": 0}

If obstacle distance is 20cm or less, always return stop.
"""

@app.websocket("/robot")
async def robot_ws(websocket: WebSocket):
    await websocket.accept()
    async for data in websocket.iter_text():
        sensor = json.loads(data)
        prompt = f"Sensor data: {sensor['sensor']}"

        async with httpx.AsyncClient() as client:
            resp = await client.post(LLAMA_URL, json={
                "prompt": f"{SYSTEM_PROMPT}\n\n{prompt}",
                "n_predict": 50,
                "temperature": 0.1
            })

        result = resp.json()
        command = result['content'].strip()
        await websocket.send_text(command)

RTT Measurements

How fast is LAN-only communication?

Measurement: time from bridge sending sensor data to LLM server, until the command returns.

Path                                      Average RTT   Max RTT
Bridge → LLM server (network only)        1.2 ms        4.8 ms
Full round trip including LLM inference   430 ms        680 ms
Including serial round trip               445 ms        700 ms

The LAN itself is 1–5ms. The bottleneck is LLM inference — Qwen2.5-7B generating a short command takes 400–650ms on M4 Pro.
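The full-round-trip numbers can be reproduced by timing request/response pairs over the same WebSocket — a sketch, assuming the server address and message format from the bridge above; `summarize` is a helper name of my own:

```python
import asyncio
import json
import time

LLM_SERVER = 'ws://192.168.1.100:8765/robot'  # same endpoint the bridge talks to

def summarize(rtts_ms):
    """Average and worst-case of a list of RTT samples in milliseconds."""
    return sum(rtts_ms) / len(rtts_ms), max(rtts_ms)

async def measure(n=20):
    import websockets  # imported here so summarize() works without the dependency
    rtts = []
    async with websockets.connect(LLM_SERVER) as ws:
        for _ in range(n):
            start = time.perf_counter()
            await ws.send(json.dumps({"sensor": "dist:45,cam:0"}))
            await ws.recv()  # blocks until the LLM's command comes back
            rtts.append((time.perf_counter() - start) * 1000)
    avg, worst = summarize(rtts)
    print(f"avg {avg:.1f} ms, max {worst:.1f} ms")

if __name__ == "__main__":
    asyncio.run(measure())
```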

If we were using a cloud API, network latency alone would add 80–200ms on top. With a local LAN server, that layer is nearly gone. Since LLM inference is the dominant time cost, removing network overhead is a meaningful win.

Is 0.5 seconds acceptable? Depends on the use case. Our robot moves slowly, so 0.5 seconds is fine. Time-critical collision avoidance is handled at the Arduino level directly — obstacle under 15cm triggers immediate stop without waiting for LLM. The LLM handles high-level judgment only.


When the Full Pipeline First Connected

I remember the first time the whole chain worked.

Bridge running, LLM server running, Arduino connected. Terminal logs started printing:

Connected to LLM server: ws://192.168.1.100:8765/robot
Sensor: dist:45,cam:0
LLM response: {"action": "forward", "speed": 150}
Command sent: forward,150
Sensor: dist:38,cam:0
LLM response: {"action": "forward", "speed": 150}
Sensor: dist:22,cam:0
LLM response: {"action": "forward", "speed": 100}
Sensor: dist:17,cam:0
LLM response: {"action": "stop", "speed": 0}
Command sent: stop,0

The robot moved forward, slowed as distance decreased, and stopped at 17cm.

The LLM made that decision. Not hardcoded logic in the Arduino — the LLM read the sensor data and issued "stop."

My son was watching the log. "Is the AI reading it?" Yes.


What's Still Unstable

Connection works. Stability is another question.

LLM responses occasionally come back as prose instead of JSON. "An obstacle has been detected. It would be advisable to stop." Parsing fails, command isn't delivered. Tightening the system prompt helps but doesn't eliminate it. Even at temperature=0.1, it still happens occasionally.

Temporary fix: regex extraction of JSON from the response as a fallback.
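A minimal version of that fallback — the function name is mine, and it assumes the JSON object, when present, is the first {...} span in the reply:

```python
import json
import re

def extract_command(text):
    """Parse the LLM reply as JSON; fall back to regex-extracting the
    first {...} span if the model wrapped the command in prose."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        match = re.search(r'\{.*?\}', text, re.DOTALL)
        if match:
            try:
                return json.loads(match.group())
            except json.JSONDecodeError:
                pass
    # Fail safe: an unparseable reply becomes a stop, never a stale move
    return {"action": "stop", "speed": 0}

# Clean JSON passes through; prose-wrapped JSON gets extracted.
print(extract_command('{"action": "forward", "speed": 150}'))
print(extract_command('Obstacle detected. {"action": "stop", "speed": 0} Halting.'))
```

Falling back to stop (rather than dropping the message) keeps a malformed reply from leaving the robot running on its previous command.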

The other issue: if the bridge-server connection drops, the robot doesn't stop — it keeps executing the last command. A heartbeat mechanism is needed. If no command arrives within a threshold, Arduino stops automatically. That's next.
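The bridge-side half of that mechanism could look like the sketch below — hypothetical, not built yet: track when the last command arrived and force a stop once it goes stale. (The Arduino-side timeout is still the real safety net, since this doesn't help if the bridge process itself dies.)

```python
import time

COMMAND_TIMEOUT = 1.0  # seconds without a command before forcing a stop

class Watchdog:
    """Tracks the last command time; reports when a stop should be forced."""

    def __init__(self, timeout=COMMAND_TIMEOUT):
        self.timeout = timeout
        self.last_command = time.monotonic()
        self.stopped = False

    def feed(self):
        # Call whenever a command arrives from the LLM server.
        self.last_command = time.monotonic()
        self.stopped = False

    def check(self):
        # Returns True exactly once when the timeout first expires,
        # so the stop command is sent a single time, not in a loop.
        if not self.stopped and time.monotonic() - self.last_command > self.timeout:
            self.stopped = True
            return True
        return False
```

In the bridge, read_commands would call feed() on each message, and a small periodic task would call check() and write "stop,0\n" to the serial port whenever it returns True.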

When we migrate to Pi, the bridge moves inside the Pi and the separate laptop disappears. The structure gets cleaner then.
