7.15. Writing your own
When the catalogue does not cover a model – a research network
whose output layout is bespoke, a tweak to an existing
architecture, a tensor whose semantic interpretation is
application-specific – the application provides its own
post-processor. The protocol is plain: a callable that takes
(model, inputs, outputs) and returns whatever the application
expects from predict().
A class with __call__ is the conventional form:
class MyPostprocessor:
    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def __call__(self, model, inputs, outputs):
        ...
        return result
A plain function works too – the engine only checks that the object is callable.
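As a minimal sketch of the plain-function form, assume a classifier whose first output tensor holds one score per class. The names top1 and LABELS are hypothetical, and NumPy stands in for the on-device ndarray type so the snippet runs without the engine:

```python
import numpy as np

# Hypothetical label table; a real application would load its own.
LABELS = ["background", "cat", "dog"]


def top1(model, inputs, outputs):
    """Return (label, score) for the highest-scoring class.

    Follows the (model, inputs, outputs) protocol; model and inputs
    are unused in this sketch.
    """
    scores = np.asarray(outputs[0]).reshape(-1)  # flatten (1, N) to (N,)
    best = int(np.argmax(scores))
    return LABELS[best], float(scores[best])


# Stand-in call: the engine would supply these arguments itself.
fake_outputs = [np.array([[0.1, 0.7, 0.2]], dtype=np.float32)]
label, score = top1(None, [], fake_outputs)
```

A function is enough when there is no configuration to carry; the class form earns its keep once the decoder needs state such as a threshold.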
7.15.1. Hooking it in
There are two attachment points. The postprocess= keyword argument
on the Model constructor binds the callable for every
predict() call on the model:
model = ml.Model("/rom/my_model.tflite",
                 postprocess=MyPostprocessor())
To override the binding for a single call – swap decoders without
re-loading the model – pass callback= to predict directly:
result = model.predict([img], callback=MyOtherPostprocessor())
The callable signature is the same in either case.
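The relationship between the two attachment points can be illustrated with a toy stand-in. ToyModel below is hypothetical and is not the engine's implementation; it only mimics the behaviour described above, where a per-call callback= overrides the constructor binding for that call alone. Returning the raw outputs when no post-processor is attached is an assumption of this sketch.

```python
class ToyModel:
    """Stand-in for the engine's model class; NOT the real implementation."""

    def __init__(self, postprocess=None):
        self.postprocess = postprocess

    def _run(self, inputs):
        # Pretend inference: echo the inputs back as "raw outputs".
        return list(inputs)

    def predict(self, inputs, callback=None):
        outputs = self._run(inputs)
        # A per-call callback wins over the constructor binding.
        handler = callback if callback is not None else self.postprocess
        if handler is None:
            return outputs  # assumption: no post-processor means raw outputs
        return handler(self, inputs, outputs)


def count_outputs(model, inputs, outputs):
    return len(outputs)


def first_output(model, inputs, outputs):
    return outputs[0]


toy = ToyModel(postprocess=count_outputs)
bound_result = toy.predict([10, 20])                             # count_outputs runs
override_result = toy.predict([10, 20], callback=first_output)   # first_output runs
again_bound = toy.predict([10, 20])                              # binding is unchanged
```

Both callables share the (model, inputs, outputs) signature, which is what lets them be swapped without touching the rest of the application.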
7.15.2. What the callable receives
model – the Model instance, useful for the quantization parameters (output_scale, output_zero_point, output_dtype) and the input dimensions (input_shape).
inputs – the list of inputs the application passed to predict(). The first element is usually the bound Normalization instance; its roi attribute is what NMS expects for remapping boxes back into the original image.
outputs – the raw output tensors as a list of ndarray objects, in their native dtype. Float outputs arrive as-is; integer outputs arrive quantized.
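The sketch below shows one way a callable might use these three arguments together. It assumes that output_scale and output_zero_point hold one entry per output tensor, which the text above does not state, and it uses a SimpleNamespace as a stand-in for the real Model so the snippet runs off-device. The function name describe is hypothetical.

```python
from types import SimpleNamespace

import numpy as np


def describe(model, inputs, outputs):
    """Dequantize output 0 if it is integer-typed and report what was seen."""
    raw = np.asarray(outputs[0])
    if np.issubdtype(raw.dtype, np.integer):
        # Standard affine dequantization: real = (q - zero_point) * scale.
        # Assumption: the attributes hold one entry per output tensor.
        scale = model.output_scale[0]
        zero_point = model.output_zero_point[0]
        values = (raw.astype(np.float32) - zero_point) * scale
    else:
        values = raw.astype(np.float32)  # float outputs arrive as-is
    return {
        "n_inputs": len(inputs),
        "input_shape": model.input_shape,
        "values": values,
    }


# Stand-in objects; the engine would pass the real ones.
fake_model = SimpleNamespace(
    output_scale=[0.5],
    output_zero_point=[-128],
    input_shape=[(1, 96, 96, 3)],
)
fake_outputs = [np.array([[-128, -126, 0]], dtype=np.int8)]
report = describe(fake_model, ["img"], fake_outputs)
```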
7.15.3. Quantized arithmetic
The shipped decoders all reach for the same helpers in
ml.utils, and a custom one usually wants the same pattern:
quantize() lifts a float threshold into the
model’s quantized space, threshold() filters
without dequantizing the whole tensor, and
dequantize() runs once on the survivors.
sigmoid() and logit() are
available for networks whose output channels are pre-sigmoid
logits (the MediaPipe detectors are the canonical case).
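The pattern can be sketched in plain NumPy. The helpers below (to_quantized, from_quantized, keep_confident) are hypothetical stand-ins written for illustration; they are not the ml.utils functions and do not claim to match their signatures. The sketch assumes a positive scale, so that ordering is preserved between the float and quantized domains.

```python
import math

import numpy as np


def to_quantized(value, scale, zero_point):
    """Map a float into the quantized domain: q = value / scale + zero_point."""
    return value / scale + zero_point


def from_quantized(q, scale, zero_point):
    """Map quantized values back to floats: value = (q - zero_point) * scale."""
    return (np.asarray(q, dtype=np.float32) - zero_point) * scale


def logit(p):
    """Inverse of the sigmoid: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def keep_confident(raw_logits, scale, zero_point, min_prob=0.5):
    """Return (indices, probabilities) of entries whose sigmoid exceeds min_prob."""
    # Sigmoid is monotonic, so sigmoid(x) > p is the same test as x > logit(p).
    # Move that float cut-off into the quantized domain once...
    q_cut = to_quantized(logit(min_prob), scale, zero_point)
    # ...compare the raw integers directly...
    idx = np.nonzero(raw_logits > q_cut)[0]
    # ...and dequantize only the survivors before applying the sigmoid.
    survivors = from_quantized(raw_logits[idx], scale, zero_point)
    probs = 1.0 / (1.0 + np.exp(-survivors))
    return idx, probs


raw = np.array([-40, 0, 10, 60], dtype=np.int8)
idx, probs = keep_confident(raw, scale=0.1, zero_point=0, min_prob=0.5)
```

The point of the ordering is cost: the comparison touches every element but stays in integer form, while the float conversion and the exponential run only on the entries that pass.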
For models with float outputs – regression heads, models with a final dequantize layer baked in – the quantization helpers pass through unchanged, so the same post-processor code works against either dtype without special-casing.
7.15.4. Return value
Whatever the callable returns is what predict()
returns. For box-emitting decoders the convention is to push
candidates through an NMS instance and return its
per-class lists – the call shape that the non-max suppression section
documents and that the YOLOv8 walkthrough builds in context.
For anything else, return whatever the application finds
convenient: a single ndarray, a label
string, a tuple of (class, score, embedding), a dictionary.