7.11. NPUs

The H7 and the RT1062 run inference on their Cortex-M CPU through TFLM and CMSIS-NN. The AE3 and the N6 add a dedicated NPU on the same die: a tensor pipeline in fixed silicon that runs the heavy operators without occupying the CPU. The two NPUs in OpenMV's lineup come from different vendors and use different toolchains, but the cam exposes both through the same ml.Model API. What differs is the model file on disk and the runtime that walks it.
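Because only the model file and the runtime change from cam to cam, a script can keep the choice in one lookup table. The sketch below does only that lookup. The board keys and filenames are hypothetical placeholders, not OpenMV names. The runtime strings simply restate this section. The ml.Model call itself is left as a comment so the block runs under plain Python.

```python
# Hypothetical lookup: which prepared model file and which runtime each cam uses.
# Board keys and filenames are illustrative assumptions, not OpenMV conventions.
MODEL_FOR_BOARD = {
    "H7":     ("model.tflite",      "TFLM + CMSIS-NN (CPU)"),
    "RT1062": ("model.tflite",      "TFLM + CMSIS-NN (CPU)"),
    "AE3":    ("model_vela.tflite", "TFLM + Ethos-U driver (Ethos-U55 NPU)"),
    "N6":     ("model_n6.bin",      "STAI (Neural-ART NPU)"),  # hypothetical name
}


def pick_model(board):
    """Return (model_path, runtime_description) for a board name."""
    if board not in MODEL_FOR_BOARD:
        raise ValueError("no model prepared for board %r" % board)
    return MODEL_FOR_BOARD[board]


path, runtime = pick_model("AE3")
# On the cam, the rest of the script would be identical for every board:
#   model = ml.Model(path)
```

The CPU-only boards share a plain .tflite. Each NPU board points at an artifact produced by its own offline compiler.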

7.11.1. AE3 – Arm Ethos-U55

The AE3 carries an Arm Ethos-U55 NPU on the same die as its Cortex-M55 application core. Vela is the offline compiler that prepares a model for it. Given a standard .tflite file, Vela emits a new .tflite in which the NPU-eligible subgraphs have been folded into a custom Ethos-U operator. That operator carries the command stream the NPU executes. At inference time, TFLM walks the file as usual. When it reaches the Ethos-U operator, that operator hands its command stream to the NPU through the Ethos-U driver. Any operator Vela did not fold stays in the graph and runs on the M55 through CMSIS-NN.
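The folding step can be pictured with a toy partitioner. It makes two assumptions. First, the model is a simple linear chain of operators. Second, the support set is a made-up subset. Vela itself works on the full graph and applies its own support rules, so this only shows the shape of the result. Contiguous NPU-eligible operators collapse into one Ethos-U entry, and everything else stays a CPU operator.

```python
# Toy model of Vela-style folding. NPU_SUPPORTED is a hypothetical subset,
# NOT Vela's real operator support table; real models are graphs, not lists.
NPU_SUPPORTED = {"CONV_2D", "DEPTHWISE_CONV_2D", "ADD", "AVERAGE_POOL_2D", "RESHAPE"}


def partition(ops):
    """Fold each run of consecutive NPU-eligible ops into one ("ETHOSU", ops) entry.

    Unsupported ops are kept individually as ("CPU", op) entries, standing in
    for the operators TFLM later runs on the M55 through CMSIS-NN.
    """
    result = []
    run = []
    for op in ops:
        if op in NPU_SUPPORTED:
            run.append(op)
        else:
            if run:  # close the current NPU run before the CPU op
                result.append(("ETHOSU", tuple(run)))
                run = []
            result.append(("CPU", op))
    if run:  # flush a trailing NPU run
        result.append(("ETHOSU", tuple(run)))
    return result


ops = ["CONV_2D", "DEPTHWISE_CONV_2D", "ADD", "TFLite_Detection_PostProcess"]
print(partition(ops))
# [('ETHOSU', ('CONV_2D', 'DEPTHWISE_CONV_2D', 'ADD')), ('CPU', 'TFLite_Detection_PostProcess')]
```

Walking the result in order mirrors what TFLM does at inference time. Each ETHOSU entry is one dispatch to the NPU driver, and each CPU entry is an ordinary kernel call on the M55.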

7.11.2. N6 – ST Neural-ART

The N6 carries ST's Neural-ART NPU and runs STAI, ST's runtime for it, in place of TFLM. STEdgeAI is the offline compiler: it takes a trained model and emits a relocatable network blob laid out for the Neural-ART hardware. At run time, STAI loads the blob from ROMFS and walks it directly on the NPU. Operator coverage is therefore whatever STEdgeAI supports for the N6.

7.11.3. Same script, different cam

On either NPU, a model exposes the same input and output tensors, with the same quantization parameters, as it would when run on the CPU. A script written for one cam therefore runs on another once it loads a model file prepared for that cam's NPU. The script-level decisions do not change. These include detection thresholds, ROI handling, and post-processor wiring.
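Thresholds carry over because they are written in real (float) units. Raw int8 outputs are mapped into those units with the tensor's own scale and zero point, using the standard TFLite affine rule real = scale * (q - zero_point). If the CPU build and the NPU build report the same parameters, the same float threshold selects the same detections. In the sketch below, the function names are hypothetical. The example scale 1/256 and zero point -128 are illustrative values for a sigmoid-style score output. On the cam they would be read from the loaded model's output tensor, which is not shown here.

```python
def dequantize(q, scale, zero_point):
    """Standard TFLite affine dequantization: real = scale * (q - zero_point)."""
    return scale * (q - zero_point)


def detections_above(scores_q, scale, zero_point, threshold):
    """Indices of raw int8 scores whose dequantized value meets a float threshold."""
    return [i for i, q in enumerate(scores_q)
            if dequantize(q, scale, zero_point) >= threshold]


# Illustrative parameters (assumed): scale 1/256, zero point -128 maps int8 to [0, 1).
raw_scores = [-128, 0, 64, 127]          # dequantize to 0.0, 0.5, 0.75, ~0.996
print(detections_above(raw_scores, 1 / 256, -128, 0.5))  # [1, 2, 3]
```

The threshold 0.5 lives in the script, not in the model file. Swapping a CPU .tflite for an NPU-compiled artifact with matching quantization parameters leaves this code, and its result, unchanged.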