7.15. Writing your own

When the catalogue does not cover a model – a research network whose output layout is bespoke, a tweak to an existing architecture, a tensor whose semantic interpretation is application-specific – the application provides its own post-processor. The protocol is plain: a callable that takes (model, inputs, outputs) and returns whatever the application expects from predict().

A class with __call__ is the conventional form:

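class MyPostprocessor:
    def __init__(self, threshold=0.5):
        # Configuration lives on the instance, so one class serves several models.
        self.threshold = threshold

    def __call__(self, model, inputs, outputs):
        # Decode `outputs` here (application-specific), then return the result.
        result = ...
        return result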

A plain function works too – the engine only checks that the object is callable.
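As a minimal sketch of the function form, the example below picks the highest-scoring class from the first output tensor. The name top1 and the assumption that outputs[0] is a flat sequence of class scores are illustrative, not part of the library; model and inputs are accepted only to satisfy the protocol.

```python
def top1(model, inputs, outputs):
    # Assumption: outputs[0] is a flat sequence of per-class scores.
    scores = list(outputs[0])
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best]


# The function can be exercised without a model by calling it directly.
example = top1(None, [], [[0.1, 0.7, 0.2]])
```

Because the protocol is just a call with three arguments, a post-processor written this way can be unit-tested off-device with hand-made outputs.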

7.15.1. Hooking it in

There are two attachment points. The postprocess= keyword argument on the constructor binds the callable for every predict() call on the model:

model = ml.Model("/rom/my_model.tflite",
                 postprocess=MyPostprocessor())

To override the binding for a single call – for example, to swap decoders without reloading the model – pass callback= to predict() directly:

result = model.predict([img], callback=MyOtherPostprocessor())

The callable signature is the same in either case.
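The precedence between the two attachment points can be illustrated with a toy stand-in. ToyModel below is hypothetical and is not the library's Model class: it only mimics the described behaviour, where a per-call callback takes priority over the bound post-processor. Returning the raw outputs when nothing is bound is an assumption of this sketch.

```python
class ToyModel:
    # Hypothetical stand-in for illustration only; not the library's Model.
    def __init__(self, postprocess=None):
        self.postprocess = postprocess

    def predict(self, inputs, callback=None):
        outputs = [list(inputs[0])]  # pretend inference: echo the input
        decoder = callback if callback is not None else self.postprocess
        if decoder is None:
            return outputs  # assumption: raw outputs when nothing is bound
        return decoder(self, inputs, outputs)


def as_sum(model, inputs, outputs):
    return sum(outputs[0])


def as_max(model, inputs, outputs):
    return max(outputs[0])


toy = ToyModel(postprocess=as_sum)
bound = toy.predict([[1, 2, 3]])                         # uses as_sum
overridden = toy.predict([[1, 2, 3]], callback=as_max)   # as_max, this call only
again = toy.predict([[1, 2, 3]])                         # binding is unchanged
```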

7.15.2. What the callable receives

  • model – the Model instance, useful for the quantization parameters (output_scale, output_zero_point, output_dtype) and the input dimensions (input_shape).

  • inputs – the list of inputs the application passed to predict(). The first element is usually the bound Normalization instance; its roi attribute is what NMS expects for remapping boxes back into the original image.

  • outputs – the raw output tensors as a list of ndarray objects, in their native dtype. Float outputs arrive as-is; integer outputs arrive quantized.
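When developing a new decoder it can help to start with a post-processor that only reports what it was handed. The sketch below reads the attributes named above through getattr so that it also runs against stand-ins; FakeModel, FakeInput, and all of their values (including the per-tensor list layout) are made up for illustration.

```python
def describe(model, inputs, outputs):
    # Collect what the engine handed over, without decoding anything.
    return {
        "input_shape": getattr(model, "input_shape", None),
        "output_scale": getattr(model, "output_scale", None),
        "output_zero_point": getattr(model, "output_zero_point", None),
        "roi": getattr(inputs[0], "roi", None) if inputs else None,
        "n_outputs": len(outputs),
    }


class FakeModel:  # hypothetical stand-in; values are made up
    input_shape = [(1, 96, 96, 3)]
    output_scale = [0.00390625]
    output_zero_point = [-128]


class FakeInput:  # hypothetical stand-in for the first element of inputs
    roi = (0, 0, 96, 96)


info = describe(FakeModel(), [FakeInput()], [[-128, 0, 127]])
```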

7.15.3. Quantized arithmetic

The shipped decoders all reach for the same helpers in ml.utils, and a custom one usually wants the same pattern: quantize() lifts a float threshold into the model’s quantized space, threshold() filters without dequantizing the whole tensor, and dequantize() runs once on the survivors. sigmoid() and logit() are available for networks whose output channels are pre-sigmoid logits (the MediaPipe detectors are the canonical case).
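The quantize, filter, dequantize pattern can be sketched in plain Python. The helpers below are standalone illustrations that assume the usual affine scheme, real = scale * (q - zero_point); they share names with the ml.utils helpers but their signatures are not taken from the library, and the tensor values and quantization parameters are made up.

```python
import math


def quantize(value, scale, zero_point):
    # Affine mapping real = scale * (q - zero_point), solved for q.
    return int(round(value / scale)) + zero_point


def dequantize(q, scale, zero_point):
    return scale * (q - zero_point)


def logit(p):
    # Inverse of the sigmoid.
    return math.log(p / (1.0 - p))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Made-up quantized logits and parameters for one output tensor.
scale, zero_point = 0.1, -10
raw = [-50, -15, -7, 15, 50]

# A 0.6 probability cutoff becomes a single integer in the tensor's own space.
q_cut = quantize(logit(0.6), scale, zero_point)

# Filter on the raw integers; only the survivors are converted to floats.
keep = [i for i, q in enumerate(raw) if q > q_cut]
scores = [sigmoid(dequantize(raw[i], scale, zero_point)) for i in keep]
```

The float work is then proportional to the number of survivors rather than to the tensor size. Rounding the cutoff to an integer can move the decision boundary by up to half a quantization step, which is usually negligible next to the score threshold itself.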

For models with float outputs – regression heads, models with a final dequantize layer baked in – the quantization helpers pass values through unchanged, so the same post-processor code works against either dtype without special-casing.

7.15.4. Return value

Whatever the callable returns is what predict() returns. For box-emitting decoders the convention is to push candidates through an NMS and return its per-class lists, which is the call shape that the non-max suppression documentation describes and that the YOLOv8 walkthrough builds in context. For anything else, return whatever the application finds convenient: a single ndarray, a label string, a tuple of (class, score, embedding), a dictionary.
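To make the per-class return shape concrete, here is a generic greedy non-max suppression in plain Python. It is not the library's NMS and does not remap boxes through an roi; the (x, y, w, h) box format, the function names, and the list-of-lists layout indexed by class are assumptions of this sketch.

```python
def iou(a, b):
    # Boxes are (x, y, w, h); returns intersection over union.
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0


def greedy_nms(candidates, n_classes, iou_max=0.5):
    # candidates: iterable of (box, score, class_index).
    per_class = [[] for _ in range(n_classes)]
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    for box, score, cls in ranked:
        kept = per_class[cls]
        # Keep a box only if it does not overlap a stronger box of its class.
        if all(iou(box, k[0]) <= iou_max for k in kept):
            kept.append((box, score))
    return per_class


candidates = [
    ((10, 10, 20, 20), 0.9, 0),
    ((12, 12, 20, 20), 0.8, 0),  # overlaps the first box heavily
    ((60, 60, 10, 10), 0.7, 1),
]
result = greedy_nms(candidates, n_classes=2)
```

A post-processor would build the candidate list from the surviving tensor entries and return the per-class lists as its result.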