.. currentmodule:: ml.apps

:mod:`ml.apps` --- ML Apps
==========================

.. module:: ml.apps
   :synopsis: ML Apps

The `ml.apps` module contains high-level ML application classes built on top of `ml.Model`.

.. _apps.MicroSpeech:

class MicroSpeech -- Speech Recognition
---------------------------------------

The MicroSpeech object recognizes simple spoken words using the MicroSpeech model from
TensorFlow Lite for Microcontrollers. The default model recognizes ``"Yes"`` and ``"No"``.

.. class:: MicroSpeech(preprocessor: ml.Model = None, micro_speech: ml.Model = None, labels: list[str] = None, **kwargs)

   Creates a MicroSpeech object.

   ``preprocessor`` is the audio preprocessor `ml.Model`. If ``None``,
   ``/rom/audio_preprocessor.tflite`` is loaded.

   ``micro_speech`` is the speech recognition `ml.Model`. If ``None``,
   ``/rom/micro_speech.tflite`` is loaded.

   ``labels`` is a list of label strings matching the model output categories. If ``None``,
   the labels are taken from ``micro_speech.labels``.

   Any additional keyword arguments are forwarded to `audio.init()` (the audio peripheral is
   initialized with ``channels=1``, ``frequency=16000``, and ``samples=320``).

   .. method:: audio_callback(buf: bytes) -> None

      Internal audio streaming callback. Appends new samples from ``buf`` into the rolling
      audio buffer, updates the spectrogram by running the ``preprocessor`` model on the
      latest window, and updates the prediction history by running the ``micro_speech``
      model on the spectrogram. Not normally called directly.

   .. method:: start_audio_streaming() -> None

      Clears the spectrogram and prediction history, then starts audio streaming with
      `MicroSpeech.audio_callback` as the callback. No-op if streaming is already started.

   .. method:: stop_audio_streaming() -> None

      Stops audio streaming.

   .. method:: listen(timeout: int = 0, callback: callable = None, threshold: float = 0.65, filter: list[str] = ["Yes", "No"]) -> tuple[str, numpy.ndarray]

      Listens for a spoken word and returns a tuple of ``(label, average_scores)`` once a
      label whose averaged score is above ``threshold`` and is contained in ``filter`` is
      detected. Calls `MicroSpeech.start_audio_streaming` if not already streaming.

      ``timeout`` is the maximum time in milliseconds to listen. If ``0``, listens
      indefinitely until a word is recognized. If ``-1``, runs in non-blocking mode and
      returns immediately with ``(None, average_scores)`` if no word is recognized; audio
      streaming is left running. For any positive value, listens for that many milliseconds
      and then returns ``(None, average_scores)`` on timeout.

      ``callback`` is an optional callable invoked as ``callback(label, average_scores)``
      when a word is recognized instead of returning. Combined with ``timeout=0``, this
      allows continuous recognition.

      ``threshold`` is the minimum averaged confidence required to accept a recognition.

      ``filter`` is the list of label strings to accept. Recognitions outside this list are
      ignored.

   .. attribute:: MicroSpeech._SLICE_SIZE
      :type: int

      Number of features per spectrogram slice (``40``).

   .. attribute:: MicroSpeech._SLICE_COUNT
      :type: int

      Number of spectrogram slices stored (``49``).

   .. attribute:: MicroSpeech._SLICE_TIME_MS
      :type: int

      Time span of one slice in milliseconds (``30``).

   .. attribute:: MicroSpeech._AUDIO_FREQUENCY
      :type: int

      Audio sample rate in Hz (``16000``).

   .. attribute:: MicroSpeech._SAMPLES_PER_STEP
      :type: int

      Audio samples per 10 ms step (``160``).

   .. attribute:: MicroSpeech._CATEGORY_COUNT
      :type: int

      Number of output categories (``4``).

   .. attribute:: MicroSpeech._AVERAGE_WINDOW_SAMPLES
      :type: int

      Number of prediction frames averaged over the smoothing window
      (``1020 // _SLICE_TIME_MS``).
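
The non-blocking mode of ``listen`` (``timeout=-1``) lends itself to a polling loop. The following is a minimal sketch, assuming an object that behaves like ``MicroSpeech`` as documented above; ``poll_for_word`` and ``max_polls`` are hypothetical names introduced only for illustration and are not part of ``ml.apps``.

```python
def poll_for_word(speech, max_polls=100, threshold=0.65, accepted=("Yes", "No")):
    """Poll ``speech.listen`` in non-blocking mode until a word is accepted.

    ``speech`` is assumed to behave like ``ml.apps.MicroSpeech``. The helper
    name and ``max_polls`` are hypothetical and not part of the module.
    """
    for _ in range(max_polls):
        # timeout=-1 returns immediately; label is None when nothing was recognized.
        label, scores = speech.listen(
            timeout=-1, threshold=threshold, filter=list(accepted)
        )
        if label is not None:
            return label, scores
        # Other work (display updates, sensor reads) could run here between polls.
    # Non-blocking mode leaves streaming running, so stop it when giving up.
    speech.stop_audio_streaming()
    return None, None
```

On a device, ``speech`` would typically be created with ``MicroSpeech()``. For continuous recognition without polling, the documented alternative is to pass ``callback`` together with ``timeout=0``.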
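
The acceptance rule described for ``listen`` (averaged score above ``threshold`` and label contained in ``filter``) can be illustrated in plain Python. This is only a sketch of one plausible reading of that rule, not the module's implementation: ``pick_label`` is a hypothetical helper, the choice of the highest-scoring category is an assumption, and the label order in ``LABELS`` is assumed for the example (on a device the labels come from the model).

```python
def pick_label(history, labels, threshold=0.65, accepted=("Yes", "No")):
    """Average per-category scores over ``history`` and apply threshold and filter.

    Hypothetical helper: an illustration of the acceptance rule, not the
    actual MicroSpeech code.
    """
    if not history:
        return None, []
    count = len(history)
    # Column-wise mean: one averaged score per output category.
    averages = [
        sum(frame[i] for frame in history) / count for i in range(len(labels))
    ]
    # Assumption: consider only the best-scoring category.
    best = max(range(len(labels)), key=lambda i: averages[i])
    if averages[best] > threshold and labels[best] in accepted:
        return labels[best], averages
    return None, averages


# Assumed label order for a four-category model.
LABELS = ["Silence", "Unknown", "Yes", "No"]
history = [[0.1, 0.1, 0.7, 0.1], [0.0, 0.1, 0.8, 0.1]]
label, averages = pick_label(history, LABELS)
print(label, averages)
```

Averaging over several prediction frames smooths out single-frame spikes, which is why a smoothing window such as ``_AVERAGE_WINDOW_SAMPLES`` is useful before thresholding.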
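
The documented constants are related by simple arithmetic, which the sketch below works through. The local names mirror the class attributes but are defined here only for illustration; ``STEP_MS`` and ``AUDIO_BUFFER_SAMPLES`` are hypothetical names for the 10 ms step and the ``samples=320`` value mentioned in the text, and reading ``samples`` as samples per audio buffer is an assumption.

```python
# Values as documented for MicroSpeech.
SLICE_SIZE = 40
SLICE_COUNT = 49
SLICE_TIME_MS = 30
AUDIO_FREQUENCY = 16000
STEP_MS = 10                # step length quoted for _SAMPLES_PER_STEP
AUDIO_BUFFER_SAMPLES = 320  # ``samples`` value passed to audio.init()

# Samples covered by one 10 ms step and by one 30 ms slice at 16 kHz.
samples_per_step = AUDIO_FREQUENCY * STEP_MS // 1000
samples_per_slice = AUDIO_FREQUENCY * SLICE_TIME_MS // 1000

# Prediction frames in the smoothing window, as given in the attribute text.
average_window_samples = 1020 // SLICE_TIME_MS

# Total features held in the spectrogram (slices x features per slice).
spectrogram_features = SLICE_SIZE * SLICE_COUNT

# Steps delivered per audio buffer, assuming 320 samples per buffer.
steps_per_buffer = AUDIO_BUFFER_SAMPLES // samples_per_step

print(samples_per_step, samples_per_slice, average_window_samples,
      spectrogram_features, steps_per_buffer)
```

This gives 160 samples per step, 480 samples per slice, a 34-frame smoothing window, 1960 spectrogram features, and 2 steps per audio buffer.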