7.6. Anatomy of predict

Model.predict(inputs, *, callback=None) is where the loaded model object actually does work. Between the inputs going in and the result coming out, three stages run in sequence: pre-process, engine dispatch, post-process. Two of the three take parameters the script controls directly; the engine in the middle is decided by the cam.

Figure: The three stages of predict(). Pre-process and post-process take parameters the script controls; the engine in the middle is fixed by the cam.
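The three stages can be pictured as plain function composition. The sketch below is not the firmware's implementation: `predict_sketch` and the three toy stages are hypothetical names, chosen only so the flow runs on a desktop Python.

```python
def predict_sketch(inputs, preprocess, engine, postprocess):
    """Hypothetical stand-in for the flow inside predict(); not firmware code."""
    tensors = [preprocess(x) for x in inputs]   # stage 1: script-controlled
    outputs = engine(tensors)                   # stage 2: fixed by the cam
    return postprocess(inputs, outputs)         # stage 3: script-controlled


# Toy stages so the sketch runs without any hardware.
def toy_preprocess(x):
    return [v / 255.0 for v in x]               # scale bytes to 0.0..1.0


def toy_engine(tensors):
    return [[sum(t)] for t in tensors]          # pretend "network": sum the inputs


def toy_postprocess(inputs, outputs):
    return [round(o[0], 3) for o in outputs]    # unwrap each output and round


result = predict_sketch([[255, 0, 255]], toy_preprocess, toy_engine, toy_postprocess)
```

Swapping `toy_preprocess` or `toy_postprocess` changes what goes in and what comes out, while `toy_engine` stays the same. That mirrors the split between the two stages the script controls and the one it does not.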

7.6.1. Pre-process

The pre-process stage turns each input into the dense tensor the network expects. The most common input is an image.Image, captured in RGB565. The stage crops and resizes it to the network’s input_shape, converts from RGB565 to the channel format the network was trained on (RGB888 for most vision networks), applies per-channel scale and offset, and – when the network expects integer input – quantizes to the model’s input_dtype in the same pass. Networks trained for float input skip the quantization step and receive the scale-and-offset result directly.
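The per-pixel arithmetic described above can be sketched in plain Python. This is an illustration under stated assumptions, not the firmware's code: the function names are hypothetical, cropping and resizing are left out, and the scale, offset, and quantization parameters are placeholder values (a real model supplies its own). The firmware does the work in a single pass; the sketch separates the steps for readability.

```python
def rgb565_to_rgb888(pixel):
    """Expand one 16-bit RGB565 pixel to an (r, g, b) tuple of 8-bit values."""
    r5 = (pixel >> 11) & 0x1F
    g6 = (pixel >> 5) & 0x3F
    b5 = pixel & 0x1F
    # Replicate the high bits into the low bits so full scale maps to 255.
    return ((r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2))


def quantize(value, scale, zero_point, lo=-128, hi=127):
    """Map a float to an integer code: round(value / scale) + zero_point, clamped."""
    q = int(round(value / scale)) + zero_point
    return max(lo, min(hi, q))


def preprocess_pixel(pixel, scale=1.0 / 255.0, offset=0.0,
                     q_scale=1.0 / 255.0, q_zero=-128):
    """Convert, scale-and-offset, then quantize one pixel (placeholder parameters)."""
    rgb = rgb565_to_rgb888(pixel)
    normalized = [c * scale + offset for c in rgb]
    return [quantize(v, q_scale, q_zero) for v in normalized]


white = preprocess_pixel(0xFFFF)   # all channels at full scale
red = preprocess_pixel(0xF800)     # only the red field set
```

For a network trained on float input, the `quantize` step would be dropped and `normalized` handed to the engine as is.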

The default ml.preprocessing.Normalization reads the model’s input dtype and runs the right transformation automatically. A hand-tuned Normalization overrides the scale, mean, and stdev values for models trained against custom channel statistics (the ImageNet-derived means and standard deviations are a common case). A plain callable overrides the stage entirely – useful when the input is not an image at all or when the application has already produced the dense tensor itself.
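To see what custom channel statistics amount to, the sketch below folds a mean and standard deviation into one per-channel scale and offset. It assumes the usual convention y = (x / 255 - mean) / stdev; the exact order in which ml.preprocessing.Normalization applies its parameters is not stated here, so treat the formula as an assumption. The helper names are hypothetical, and the constants are the widely used ImageNet statistics.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STDEV = (0.229, 0.224, 0.225)


def fold_normalization(mean, stdev, byte_scale=1.0 / 255.0):
    """Return per-channel (scale, offset) pairs so that y = x * scale + offset.

    Derived from y = (x * byte_scale - mean) / stdev (assumed convention).
    """
    return [(byte_scale / s, -m / s) for m, s in zip(mean, stdev)]


def normalize_pixel(rgb, params):
    """Apply the folded per-channel scale and offset to one RGB888 pixel."""
    return [c * scale + offset for c, (scale, offset) in zip(rgb, params)]


params = fold_normalization(IMAGENET_MEAN, IMAGENET_STDEV)
normalized = normalize_pixel((255, 128, 0), params)
```

Folding the statistics this way keeps the per-pixel work to one multiply and one add per channel, which is the same shape as the scale-and-offset step of the default pre-processor.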

7.6.2. Engine dispatch

The engine stage runs the network. Which engine it dispatches to is fixed by the cam: the H7 and RT1062 run TFLM (the TensorFlow Lite for Microcontrollers interpreter, dispatching ARM-optimised CMSIS-NN kernels where they exist); the AE3 runs the same TFLM interpreter, with the Ethos-U NPU handling any operator the offline Vela compiler tagged for the accelerator and the Cortex-M55 running the rest as the fallback; the N6 runs STAI, ST’s runtime for the N6’s purpose-built NPU.

The script does not pick the engine. The engine that ships with the cam runs every model the cam loads.

7.6.3. Post-process

The post-process stage turns the network’s raw output tensors back into a usable result. The default behaviour is to dequantize each output tensor to floating point (or pass it through unchanged for networks with float outputs) and return them as a list of ndarray objects. Most applications register a post-processor – a callable that knows the network’s output layout – to decode the tensors into the result form the application acts on: a list of bounding boxes, a list of keypoints, a list of classes.
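The two steps named above, dequantizing and then decoding, can be sketched for a small classifier. The helper names, labels, and raw values are hypothetical. The scale of 1/256 and zero point of -128 are illustrative values of the kind an int8 probability output commonly carries; a real model reports its own.

```python
def dequantize(codes, scale, zero_point):
    """Convert integer output codes to floats: real = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in codes]


def top_classes(scores, labels, threshold=0.5):
    """Pair scores with labels, keep those at or above threshold, best first."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return [(label, score) for label, score in ranked if score >= threshold]


raw = [-128, 102, -26]                       # made-up int8 output tensor
scores = dequantize(raw, scale=1.0 / 256.0, zero_point=-128)
classes = top_classes(scores, ["cat", "dog", "bird"])
```

`dequantize` corresponds to the default behaviour, and `top_classes` plays the part of a post-processor that knows the output layout: here, one score per label.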

The script controls this stage in two ways. The postprocess= keyword on the constructor registers a post-processor that runs on every call. The callback= keyword on predict() overrides the registered post-processor for one call only – useful for switching between several decoders without re-loading the model. Either form receives (model, inputs, outputs) and returns whatever the application expects.
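The precedence between the two keywords can be shown with a small stand-in class. `ToyModel` is hypothetical and only imitates the behaviour described above (a registered post-processor, a per-call override, and the (model, inputs, outputs) signature); it is not the firmware's Model class, and its "engine" just doubles the inputs.

```python
class ToyModel:
    """Hypothetical stand-in that imitates the post-process selection only."""

    def __init__(self, postprocess=None):
        self.postprocess = postprocess          # registered for every call

    def _run_engine(self, inputs):
        # Pretend network: one output tensor holding each input doubled.
        return [[2 * x for x in inputs]]

    def predict(self, inputs, *, callback=None):
        outputs = self._run_engine(inputs)
        # A per-call callback takes precedence over the registered one.
        decoder = callback if callback is not None else self.postprocess
        if decoder is None:
            return outputs                      # no decoder: raw output tensors
        return decoder(self, inputs, outputs)


def as_max(model, inputs, outputs):
    return max(outputs[0])


def as_sum(model, inputs, outputs):
    return sum(outputs[0])


model = ToyModel(postprocess=as_max)
registered = model.predict([1, 2, 3])                 # decoded by as_max
one_off = model.predict([1, 2, 3], callback=as_sum)   # as_sum, this call only
after = model.predict([1, 2, 3])                      # as_max again
```

Because the override lasts for a single call, the registered post-processor is back in effect on the next `predict()`, so a script can alternate decoders without touching the loaded model.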

7.6.4. What the script controls

Pre-process and post-process are the script’s two handles. The default pre-processor handles most vision models; the right post-processor for a given network family is picked from the catalogue under ml.postprocessing. The engine in the middle is decided by the build and runs the same way regardless of what the script asks for.