Building an LLM Robot with My Son — EP 3. Local LLM Speed Compared Across Mac M1, M4, and M4 Pro
The first time I ran a local LLM on the Mac mini M1, I watched Qwen2.5-7B output tokens one character at a time and paused for a second. About 8 tokens per second. Not slow, exactly. But whether that's fast enough for real-time robot control is a different question: how long does it take from the robot sending a camera frame to receiving a command back? That needed a measurement, not a guess.

I had three Macs already: a Mac mini M1 16GB, a Mac mini M4 24GB, and a 14" MacBook Pro M4 Pro 24GB. Same prompt, same model, three machines. The comparison made itself.

Test Setup

Model: Qwen2.5-7B-Instruct, Q4_K_M quantization. Both mlx-lm and the llama.cpp Metal backend, measured separately.

Metrics:
- tok/s : tokens generated per second
- TTFT : time to first token
- Memory usage : at 32K and 128K context
- Thermals : CPU/GPU temperature after 5 minutes of sustained load

Three prompt types:...
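As a rough sketch of how the first two metrics can be measured, here is a small Python helper that times TTFT and tok/s over any streaming token iterator. The function names (`measure_stream`, `fake_model`) and the simulated generator are my own illustration, not the exact harness used in the tests; in practice you would feed it the streaming output of mlx-lm or a llama.cpp server instead of the fake generator.

```python
import time

def measure_stream(token_iter):
    """Measure TTFT and tok/s over a stream of generated tokens.

    token_iter: any iterable yielding tokens one at a time
    (e.g. a streaming response from mlx-lm or llama.cpp).
    """
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in token_iter:
        now = time.perf_counter()
        if ttft is None:
            ttft = now - start  # time to first token (includes prefill)
        count += 1
    total = time.perf_counter() - start
    # tok/s over the whole run, prefill wait included;
    # subtract ttft from total to report pure generation speed instead
    tok_s = count / total if total > 0 else 0.0
    return ttft, tok_s, count

def fake_model(n_tokens=40, prefill=0.05, per_token=0.01):
    # Simulated stand-in for a real model stream: one prefill
    # delay, then a fixed per-token delay
    time.sleep(prefill)
    for _ in range(n_tokens):
        time.sleep(per_token)
        yield "tok"

if __name__ == "__main__":
    ttft, tok_s, n = measure_stream(fake_model())
    print(f"TTFT: {ttft*1000:.0f} ms, {tok_s:.1f} tok/s over {n} tokens")
```

The same wrapper works for the robot round-trip question: wrap the full camera-frame-to-command call and the measured TTFT plus generation time is the latency budget the robot actually sees.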