Wrap up
=======

The chapter walked through the parts of :mod:`ml` an OpenMV application
reaches for when an inference step is part of the pipeline:

* **Concepts** -- what a neural network is in arithmetic terms (a stack of
  trainable operators that maps a tensor to a tensor), what machine learning
  changed compared to classical image processing (the human-written summary
  algorithm is gone, replaced by weights learned from labelled data), and the
  hello demo that ran a face detector in a handful of lines of Python.
* **The ml module** -- the :class:`ml.Model` object and its properties for
  inspecting input and output tensors, the model file paths it accepts, and
  where those files live: a read-only :doc:`ROMFS ` partition for execution
  directly from flash, or any other MicroPython filesystem when the model can
  be copied into RAM at load time.
* **The inference pipeline** -- the three stages :meth:`~ml.Model.predict`
  runs in sequence (pre-process, engine dispatch, post-process), the
  :class:`~ml.preprocessing.Normalization` handle on stage one, the
  post-processor handle on stage three, and the quantization arithmetic that
  ties the integer tensors the cam runs back to the real-valued numbers the
  network was trained against.
* **Inference engines** -- TFLM (the operator interpreter most cams run),
  CMSIS-NN (the SIMD kernel library underneath it on Cortex-M), and the NPUs
  (Arm's Ethos-U55 on the AE3 paired with the Vela offline compiler, ST's
  Neural-ART on the N6 paired with STAI and STEdgeAI). The engine is fixed by
  the cam; the script does not pick it.
* **Decoding the output** -- the post-processors that turn raw output tensors
  into boxes, keypoints, or per-class lists, the :class:`~ml.utils.NMS` class
  that collapses overlapping candidates, the YOLOv8 walkthrough that shows how
  to keep the decode fast by thresholding before dequantizing, and the
  protocol for writing a custom decoder when the catalogue does not cover a
  model.

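
The quantization arithmetic and the threshold-before-dequantize idea recalled
above can be sketched in plain Python. This is a minimal illustration, assuming
the usual affine scheme ``real = scale * (q - zero_point)`` with a positive
scale. The function names, the ``scale`` and ``zero_point`` values, and the
sample scores are hypothetical and are not part of the :mod:`ml` API.

.. code-block:: python

   import math

   def dequantize(q, scale, zero_point):
       # Affine mapping from an integer tensor value back to a real number.
       return scale * (q - zero_point)

   def quantize_threshold(threshold, scale, zero_point):
       # Smallest integer q whose dequantized value is >= threshold.
       # Assumes scale > 0, so the mapping is monotonically increasing.
       return math.ceil(threshold / scale + zero_point)

   def filter_scores(q_scores, threshold, scale, zero_point):
       # Convert the real-valued threshold once, compare integers in the
       # loop, and dequantize only the candidates that survive.
       q_min = quantize_threshold(threshold, scale, zero_point)
       return [
           (i, dequantize(q, scale, zero_point))
           for i, q in enumerate(q_scores)
           if q >= q_min
       ]

   # Hypothetical int8 output: scale 1/256, zero point -128.
   survivors = filter_scores([-128, -10, 0, 50, 127], 0.5, 1 / 256, -128)

The point of the design is that the per-element work in the hot loop is a
single integer comparison; the floating-point multiply runs only for the few
candidates that clear the threshold.
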
What's now in reach
-------------------

Three things the chapter prepares for:

* **Loading a trained model and running it.** Anything in ``/rom/`` works
  without further preparation; anything supplied externally as a compatible
  ``.tflite`` works after the offline tool for the target cam (Vela for the
  AE3, STEdgeAI for the N6) has produced the right layout.
* **Decoding any output tensor.** When the architecture is in the catalogue,
  the right post-processor is mechanical:
  :class:`~ml.postprocessing.ultralytics.YoloV8` for a YOLOv8 model,
  :class:`~ml.postprocessing.mediapipe.BlazeFace` for BlazeFace, and so on.
  When it is not, the :doc:`writing-your-own ` protocol covers the contract
  and the :doc:`YOLOv8 walkthrough ` is the cleanest reference to copy from.
* **Reasoning about performance.** A model that runs at 30 FPS on an NPU may
  run at 3 FPS on a Cortex-M7; the ratio depends on how much of the network
  the cam can lift off the CPU. Quantization, ROMFS placement, NPU
  compilation, and the operator coverage of the target engine are the four
  levers, and the chapter covered each of them.

ML composes with the rest of the cam
------------------------------------

An inference rarely runs in isolation. The image module captures and
pre-processes the frame, the ml module runs the network, and
:mod:`ulab.numpy` does whatever numeric work neither side has a built-in for.
A typical detection script combines all three: capture with :mod:`csi`,
optionally adjust the frame with :mod:`image`, run
:meth:`~ml.Model.predict`, post-process the result with the right module out
of :mod:`ml.postprocessing`, and reach for :mod:`ulab.numpy` for any custom
math the application wants on top of the boxes the post-processor returned.
The three modules share the same memory model; the boundaries between them
are zero-copy wherever possible.
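
As an illustration of the kind of custom math an application might run on top
of the returned boxes, the sketch below collapses overlapping candidates with
a greedy intersection-over-union pass. It is written in plain Python rather
than :mod:`ulab.numpy` so that it runs anywhere, and it is not the
implementation of :class:`~ml.utils.NMS`. The ``(x, y, w, h, score)`` tuple
layout, the function names, and the sample boxes are assumptions made for the
example.

.. code-block:: python

   def iou(a, b):
       # Intersection over union of two (x, y, w, h, ...) boxes.
       ax, ay, aw, ah = a[:4]
       bx, by, bw, bh = b[:4]
       ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
       iy = max(0, min(ay + ah, by + bh) - max(ay, by))
       inter = ix * iy
       union = aw * ah + bw * bh - inter
       return inter / union if union > 0 else 0.0

   def greedy_nms(boxes, iou_threshold=0.5):
       # Visit boxes from highest to lowest score; keep a box only if it
       # does not overlap an already-kept box by more than the threshold.
       kept = []
       for box in sorted(boxes, key=lambda b: b[4], reverse=True):
           if all(iou(box, k) <= iou_threshold for k in kept):
               kept.append(box)
       return kept

   # Hypothetical candidates: two overlapping boxes and one far away.
   candidates = [
       (12, 12, 20, 20, 0.8),
       (10, 10, 20, 20, 0.9),
       (100, 100, 20, 20, 0.7),
   ]
   result = greedy_nms(candidates)

With these sample boxes the first two overlap heavily, so only the
higher-scoring one of the pair survives alongside the distant third box.
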