On a constrained device, inference runs up against memory, energy and latency limits. State-space models process continuous signals at constant memory cost, a setting where Transformers break down.
On an embedded device, inference is never free. Three walls stand in the way: available memory, energy budget and acceptable latency. A model that ignores these limits never leaves the test bench.
Why the Transformer breaks down
The Transformer pays for its attention. Each new token attends to every previous one, so the key/value cache it must keep grows linearly with context length, and the total attention cost over a sequence grows quadratically. On a continuous signal that never stops, the context grows without end. Memory explodes, and latency follows. The architecture is ill-suited to a permanent stream.
On the edge, this growing cost is a deal-breaker. You cannot reserve memory that depends on an unbounded history. You need a bounded, predictable cost, independent of the signal's duration.
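To put numbers on that growth, here is a rough estimator of the key/value cache a decoder-only Transformer keeps during streaming inference. It is a sketch under stated assumptions: the model configuration (12 layers, 8 key/value heads of dimension 64, 16-bit values) is hypothetical, and only the cache is counted, not weights or activations. Under this configuration each token adds 24 KiB, so the cache grows in proportion to how long the signal has been running.

```python
def kv_cache_bytes(n_tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes held by a decoder's key/value cache after n_tokens.

    Each layer stores one key vector and one value vector per head per token,
    hence the leading factor of 2.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens


# Hypothetical small model: 12 layers, 8 KV heads of dimension 64, fp16 values.
CONFIG = dict(n_layers=12, n_kv_heads=8, head_dim=64, bytes_per_value=2)

if __name__ == "__main__":
    # The longer the stream runs, the more memory the cache needs.
    for n in (1_000, 10_000, 100_000, 1_000_000):
        mib = kv_cache_bytes(n, **CONFIG) / 2**20
        print(f"{n:>9} tokens -> {mib:10.1f} MiB")
```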
The constant cost of SSMs
State-space models process the sequence through a fixed-size recurrent state, updated once per input sample. Memory cost stays constant regardless of sequence length, while the state still carries information from far back in the signal. That is exactly the profile a continuous signal on a constrained device calls for.
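The constant-memory property follows directly from the recurrence. Below is a minimal sketch of a diagonal linear state-space layer in its discretized, streaming form, assuming a scalar input signal and hand-picked parameters `a`, `b`, `c` (all hypothetical; practical SSM layers learn these and may make them depend on the input). The class name `DiagonalSSM` is illustrative, not a library API.

```python
from dataclasses import dataclass, field


@dataclass
class DiagonalSSM:
    """Streaming diagonal linear SSM for a scalar input signal.

    Discretized recurrence, applied once per incoming sample:
        h_t = a * h_{t-1} + b * x_t   (elementwise over the N state channels)
        y_t = sum(c * h_t)
    The state h always holds exactly N floats, however many samples have passed.
    """
    a: list[float]  # per-channel decay; |a_i| < 1 keeps the state stable
    b: list[float]  # input projection
    c: list[float]  # output projection
    h: list[float] = field(init=False)

    def __post_init__(self) -> None:
        if not (len(self.a) == len(self.b) == len(self.c)):
            raise ValueError("a, b and c must have the same length")
        self.h = [0.0] * len(self.a)

    def step(self, x: float) -> float:
        # Replace the fixed-size state with its update, then read out one value.
        self.h = [ai * hi + bi * x for ai, hi, bi in zip(self.a, self.h, self.b)]
        return sum(ci * hi for ci, hi in zip(self.c, self.h))


if __name__ == "__main__":
    ssm = DiagonalSSM(a=[0.9, 0.5], b=[1.0, 1.0], c=[0.5, 0.5])
    for t in range(100_000):  # a long stream: one impulse, then silence
        y = ssm.step(1.0 if t == 0 else 0.0)
    print(len(ssm.h))  # state size is still 2
```

Each call to `step` costs O(N) work and O(N) memory for a state of N channels, independent of how many samples came before.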
The challenge then shifts to integration: weight quantization, state-window management and synchronization with the acquisition chain. The principle to keep: for a continuous signal on the edge, a fixed-size state beats attention that grows without limit.
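Of the integration steps above, weight quantization is the easiest to illustrate. The sketch below uses symmetric per-tensor int8 quantization, one common scheme chosen here as an assumption; the article does not say which scheme it has in mind. The function names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w ~ q * scale, q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 codes and the shared scale."""
    return [qi * scale for qi in q]


if __name__ == "__main__":
    w = [0.42, -1.27, 0.003, 0.9]
    q, scale = quantize_int8(w)
    w_hat = dequantize_int8(q, scale)
    # On the device each q would be stored as an int8 (1 byte) instead of a
    # float32 (4 bytes); the rounding error per weight is at most scale / 2.
    print(q, max(abs(a - b) for a, b in zip(w, w_hat)))
```

Per-tensor scaling keeps the sketch short; finer granularity (per channel, for instance) usually reduces the error at the cost of storing more scales.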