Tensor I/O
==========

The engine accepts a single tensor on the input side and produces
one or more on the output side. The tensors are
:class:`~ulab.numpy.ndarray` objects with the shape, dtype, and
descriptor vocabulary the numpy chapter introduced. Their shapes
and dtypes come from the model file and are reported through
:attr:`~ml.Model.input_shape` / :attr:`~ml.Model.output_shape` and
:attr:`~ml.Model.input_dtype` / :attr:`~ml.Model.output_dtype`.

Quantization
------------

Most networks the cam runs operate on quantized integer tensors --
``int8`` or ``uint8`` -- to fit within the cam's RAM and compute
budget. A quantized tensor carries integer values that represent
real-valued numbers through a per-tensor scale and zero point:

.. math::

   \text{real} = \text{scale} \times (q - \text{zero\_point})

.. math::

   q = \mathrm{round}(\text{real} / \text{scale}) + \text{zero\_point}
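
For a single value, the two formulas can be checked in plain Python.
The sketch below is illustrative only. ``quantize_value`` and
``dequantize_value`` are hypothetical helpers, and the scale and zero
point are made up. The clamp to the ``int8`` range is an assumption
added here, because the formulas above do not say how out-of-range
values are handled::

    def quantize_value(real, scale, zero_point, lo=-128, hi=127):
        # q = round(real / scale) + zero_point
        q = round(real / scale) + zero_point
        # Clamp to the int8 range (an assumption, not part of the formula)
        return max(lo, min(hi, q))


    def dequantize_value(q, scale, zero_point):
        # real = scale * (q - zero_point)
        return scale * (q - zero_point)


    scale, zero_point = 0.5, -3          # made-up calibration values
    q = quantize_value(2.0, scale, zero_point)
    real = dequantize_value(q, scale, zero_point)
    print(q, real)                        # 1 2.0

The round trip is exact only for values on the quantization grid. A
real value of ``0.3`` quantizes to ``-2`` and comes back as ``0.5``,
the nearest value this scale and zero point can represent.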

The scale and zero point are fixed by calibration when the model is
quantized and are stored in the model file. They are exposed as
:attr:`~ml.Model.input_scale`,
:attr:`~ml.Model.input_zero_point`,
:attr:`~ml.Model.output_scale`, and
:attr:`~ml.Model.output_zero_point` -- each a list with one entry
per input or output tensor.
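
Because each attribute is a list, a script selects the entry for the
tensor it is working on by index. ``StubModel`` below is a
hypothetical stand-in that carries only those four attributes with
made-up values, so the lookup can run without a model file::

    class StubModel:
        # Hypothetical stand-in for ml.Model: only the quantization
        # attributes, one list entry per tensor. Values are made up.
        input_scale = [0.0078125]
        input_zero_point = [0]
        output_scale = [0.00390625, 0.25]
        output_zero_point = [-128, 5]


    def output_params(model, index):
        # Return the (scale, zero_point) pair for one output tensor.
        return model.output_scale[index], model.output_zero_point[index]


    model = StubModel()
    scale, zero_point = output_params(model, 1)
    raw = [5, 9, 1]                       # made-up raw values for output 1
    print([scale * (q - zero_point) for q in raw])   # [0.0, 1.0, -1.0]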

:func:`ml.utils.quantize` and :func:`ml.utils.dequantize` apply the
formulas against a specified output index::

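    import ml.utils

    # model is a loaded ml.Model; q_tensor is a raw quantized output
    real_tensor = ml.utils.dequantize(model, q_tensor, index=0)
    # Convert the real values back to the quantized representation
    q_tensor    = ml.utils.quantize(model, real_tensor, index=0)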

Both functions return the value unchanged when the output dtype at
the given index is already float, so the call is safe regardless of
the model's quantization status.
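
The pass-through can be pictured as a dtype check ahead of the
formula. This is a sketch of the described behaviour, not the
library's source. ``dequantize_list`` is a hypothetical helper that
works on plain Python lists rather than
:class:`~ulab.numpy.ndarray` objects, and the dtype strings are
placeholders for whatever :attr:`~ml.Model.output_dtype` actually
reports::

    FLOAT_DTYPES = ("float32", "float")   # placeholder dtype labels


    def dequantize_list(values, dtype, scale, zero_point):
        # A float tensor is already real-valued: return it untouched.
        if dtype in FLOAT_DTYPES:
            return values
        # Otherwise apply real = scale * (q - zero_point) per element.
        return [scale * (q - zero_point) for q in values]


    print(dequantize_list([0.25, 0.75], "float32", 0.0, 0))
    print(dequantize_list([-128, 0, 127], "int8", 1 / 256, -128))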

What the script sees on the output side
---------------------------------------

What :meth:`~ml.Model.predict` returns depends on whether a
post-processor is registered.

With no post-processor, the engine's raw integer outputs are
*auto-dequantized* to float and returned as a list of float
:class:`~ulab.numpy.ndarray` objects. The script receives
real-valued numbers ready to read. This is the right path for
classification networks, whose single output tensor is already a
list of per-class confidence scores the application iterates over
-- no decoding step needed. It is also the easy path for getting
an unknown model running quickly or for ad-hoc inspection from the
REPL.
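
Reading a classifier's result is then plain iteration over floats.
In this sketch ``outputs`` stands in for what
:meth:`~ml.Model.predict` returns, using a made-up 1-D score list
instead of an ndarray, and ``labels`` is a hypothetical label list. A
real output tensor may carry extra dimensions, so check
:attr:`~ml.Model.output_shape` first::

    def rank_classes(scores, labels):
        # Sort class indices by score, highest first, and pair each
        # index with its label.
        order = sorted(range(len(scores)), key=lambda i: scores[i],
                       reverse=True)
        return [(labels[i], scores[i]) for i in order]


    # Stand-in for predict()'s return value without a post-processor:
    # one output tensor of already-dequantized class scores (made up).
    outputs = [[0.05, 0.80, 0.15]]
    labels = ["background", "person", "cat"]   # hypothetical labels

    for name, score in rank_classes(outputs[0], labels):
        print("%s: %.2f" % (name, score))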

With a post-processor registered (through ``postprocess=`` on the
constructor or ``callback=`` on the predict call), the engine skips
that step: the raw quantized tensors are handed directly to the
post-processor's callable, which is responsible for whatever
dequantization it needs.
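
A post-processor that needs only one value can dequantize just that
value. The sketch assumes the callable is invoked as
``(model, inputs, outputs)``, which this section does not specify.
``TopScore`` and ``StubModel`` are hypothetical. It also relies on a
positive scale, under which dequantization preserves ordering, so the
best class can be found directly on the raw integers::

    class TopScore:
        # Hypothetical post-processor returning (best_index, real_score).
        # The (model, inputs, outputs) call signature is an assumption.
        def __call__(self, model, inputs, outputs):
            raw = outputs[0]                  # raw quantized scores
            # With scale > 0, the argmax of the raw integers is the
            # argmax of the dequantized values.
            best = max(range(len(raw)), key=lambda i: raw[i])
            # Dequantize only the single value that is needed.
            scale = model.output_scale[0]
            zero_point = model.output_zero_point[0]
            return best, scale * (raw[best] - zero_point)


    class StubModel:
        # Made-up calibration values standing in for a loaded ml.Model.
        output_scale = [1 / 256]
        output_zero_point = [-128]


    print(TopScore()(StubModel(), None, [[-100, 76, -60]]))  # (1, 0.796875)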

The split is a performance choice. Auto-dequantization allocates a
new float tensor for each output and walks every element. A
post-processor that only needs a few values from each tensor --
threshold the confidence scores, then decode boxes for the
survivors -- skips the cost of dequantizing the rest. The box
decoders shipped under :mod:`ml.postprocessing` all take this
route, and :func:`ml.utils.threshold` is built for exactly this
case: it takes a quantized score tensor and returns the indices
whose dequantized values pass a real-valued threshold, without
dequantizing the whole tensor.
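
The saving comes from moving the threshold into the quantized domain
once instead of moving every score into the real domain. A sketch of
that idea in plain Python, assuming a positive scale and a ``>=``
comparison; ``threshold_indices`` is a hypothetical helper, not the
source of :func:`ml.utils.threshold`::

    def threshold_indices(raw, scale, zero_point, threshold):
        # For scale > 0:
        #   scale * (q - zero_point) >= t  <=>  q >= t / scale + zero_point
        # so convert the real threshold once and compare raw integers.
        q_threshold = threshold / scale + zero_point
        return [i for i, q in enumerate(raw) if q >= q_threshold]


    raw_scores = [-128, -20, 0, 51, 127]      # made-up int8 scores
    print(threshold_indices(raw_scores, 1 / 256, -128, 0.5))  # [2, 3, 4]

Only one division and one addition touch floating point; every score
is compared as stored, so no float tensor is allocated.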