7. Machine Learning

Most detectors in the image chapter were hand-coded for a specific target: hand-tuned colour ranges for blob tracking, hand-designed convolution kernels for edge filters, fixed geometric assumptions for the line and circle finders. Each algorithm covered one kind of task, and adding a new target meant writing a new algorithm. Machine learning changes that workflow. Instead of one algorithm per target, the application loads a trained model – a set of weights learned off-board, on a desktop machine, from many example images – and runs it on the camera. The same inference engine that runs a face detector also runs a hand-pose estimator, a body-pose tracker, an object classifier, or any other task a model was trained for.

The ml module provides this toolkit. Every operation builds on a single Model object, which loads a model file from flash, manages the model's quantized input and output tensors, dispatches each inference to the appropriate engine on the camera, and passes the resulting tensors through an optional post-processor that converts them into results the application can act on: boxes, keypoints, classes, or whatever else the model produces.
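The quantized tensors are the least obvious step in that pipeline. The sketch below assumes the common int8 affine scheme (the one used by 8-bit TensorFlow Lite models), where a real value x maps to q = round(x / scale) + zero_point and back via x = (q - zero_point) * scale. Every name in it (QuantParams, quantize, dequantize, SketchModel, predict, argmax) is hypothetical and illustrative, not the ml module's actual API. Loading from flash is omitted, and the "engine" is any Python callable standing in for real inference.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical names throughout: this is NOT the camera's ml module API.


@dataclass
class QuantParams:
    """Affine quantization parameters for one tensor (assumed int8 scheme)."""
    scale: float
    zero_point: int


def quantize(values: List[float], qp: QuantParams,
             qmin: int = -128, qmax: int = 127) -> List[int]:
    """Map real values to int8: q = round(x / scale) + zero_point, clamped."""
    # Python's round() rounds halves to even; a real runtime may round differently.
    return [max(qmin, min(qmax, round(v / qp.scale) + qp.zero_point))
            for v in values]


def dequantize(qvalues: List[int], qp: QuantParams) -> List[float]:
    """Map int8 values back to reals: x = (q - zero_point) * scale."""
    return [(q - qp.zero_point) * qp.scale for q in qvalues]


class SketchModel:
    """Toy stand-in for a Model object: quantize, run engine, dequantize, post-process.

    Loading the model file from flash is omitted; `engine` is any callable
    that maps a quantized input tensor to a quantized output tensor.
    """

    def __init__(self, engine: Callable[[List[int]], List[int]],
                 input_qp: QuantParams, output_qp: QuantParams,
                 postprocess: Optional[Callable[[List[float]], object]] = None):
        self.engine = engine
        self.input_qp = input_qp
        self.output_qp = output_qp
        self.postprocess = postprocess

    def predict(self, inputs: List[float]):
        q_in = quantize(inputs, self.input_qp)    # real -> int8 input tensor
        q_out = self.engine(q_in)                 # inference on the quantized tensor
        out = dequantize(q_out, self.output_qp)   # int8 output -> real values
        # Without a post-processor the caller gets the raw output tensor.
        if self.postprocess is not None:
            return self.postprocess(out)
        return out


def argmax(scores: List[float]) -> int:
    """Example post-processor: index of the highest score (a class id)."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Usage: a fake "engine" that just reverses the tensor stands in for inference.
classifier = SketchModel(engine=lambda q: list(reversed(q)),
                         input_qp=QuantParams(0.5, 0),
                         output_qp=QuantParams(0.25, 0),
                         postprocess=argmax)
print(classifier.predict([0.0, 1.0, 2.0]))  # 0
```

Keeping the post-processor a separate, optional callable mirrors the division of labour described above: the same quantize, run, and dequantize path serves every model, and only the final step changes when the output should be a class id rather than boxes or keypoints.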