Serial protocols, framing, and CRCs
===================================

:doc:`uart-code` moved bytes between two ends. By itself, that is not
enough to build a reliable link. Three problems show up the moment a
real device is at the other end of the wire:

* **Where does a message start and end?** Bytes arrive in a stream with
  no built-in delimiter. If the receiver misses the first byte (powered
  up after the sender; brief electrical glitch on the line), every byte
  after it is off-by-one until the receiver finds a fresh resync point.
* **How long is each message?** A 32-byte sensor reading and a 4-byte
  status reply look identical at the byte level. The receiver needs a
  way to know how many bytes belong to the current message.
* **Did the bytes arrive intact?** Noise can flip individual bits.
  Without a check, the receiver happily acts on corrupted data.

The standard answer to all three is to wrap the data in a packet frame:
a known byte sequence at the start, a length field, the payload itself,
and a checksum at the end.

Packet framing
--------------

A typical framing format:

.. figure:: ../figures/packet-frame.svg
   :alt: Six fields drawn in sequence: a two-byte HEADER labelled 0xAA 0x55, a one-byte CMD field selecting which command this packet carries, a two-byte LEN field giving the payload size, a one-byte HCRC field covering HEADER plus CMD plus LEN, a variable-length PAYLOAD of LEN bytes whose format depends on the CMD, and a four-byte DCRC field covering the payload.

   A framed packet with separate header and data CRCs: header (magic
   bytes), command, length, header CRC, payload, data CRC.

Each field does one job:

* **Header (magic bytes).** A fixed, unusual byte sequence -- often two
  bytes like ``0xAA 0x55`` -- that the receiver scans for in the
  incoming stream. When it finds the sequence, it knows a new packet is
  starting and can throw away any garbage that came before.
* **Command.** A single byte that says *what* the packet is.
  Different command values use different payload formats -- one command
  might mean "set servo angle" with two payload bytes, another might
  mean "read sensor" with no payload, another might be "log message"
  with a string. The receiver dispatches on the command byte to know how
  to interpret the rest of the packet.
* **Length.** Two bytes giving the size of the payload in bytes
  (little-endian here), allowing payloads up to 65,535 bytes (just under
  64 KiB). The receiver reads exactly this many bytes once the header
  CRC has been verified.
* **Header CRC.** A one-byte checksum over the HEADER, CMD, and LEN
  fields. The receiver checks it before reading any payload, so a
  corrupted LEN is caught after just a handful of bytes (see the CRC
  section below for why this matters).
* **Payload.** Command-specific application data, exactly LEN bytes
  long. The format is determined by the command byte: a
  :mod:`struct`-packed record of fixed-width fields, a string, raw
  memory -- whatever both sides agree on for that command.
* **Data CRC.** A four-byte CRC over the payload bytes. The receiver
  re-computes it from the bytes it just read and drops the packet if it
  does not match.

CRCs
----

The simplest "checksum" is the sum of all the bytes, modulo 256 or
65536. It catches every single-bit flip but misses many multi-bit errors
(two flips that cancel in the sum go unnoticed) and ignores byte order
entirely: swapping two bytes leaves the sum unchanged.

A *cyclic redundancy check* (CRC) is the standard upgrade. It treats the
input as one long binary number and divides it (using XOR in place of
subtraction, so there are no carries) by a fixed *polynomial*; the
remainder of the division is the CRC. Different polynomials catch
different classes of errors; the common 8-, 16- and 32-bit polynomials
each catch every burst of errors no longer than their width, plus a
large fraction of longer bursts.

Why two CRCs
~~~~~~~~~~~~

The packet diagram above carries *two* separate CRCs -- one over the
header (HEADER, CMD, LEN) and one over the payload.
This is what a robust framing actually needs, because of how a single
trailing CRC fails when the LEN field itself gets corrupted in transit:

* The receiver acts on the corrupt LEN and reads that many bytes from
  the wire -- possibly far more than the sender intended.
* Only the trailing CRC eventually tells the receiver something went
  wrong, and only after all those bytes have been consumed.
* While the parser is stuck waiting for the wrong number of bytes, real
  packets arriving behind the corrupt one get swallowed as payload, and
  the receiver loses several packets rather than just the one.

Splitting the CRC fixes this:

* The **header CRC** covers HEADER, CMD, and LEN. The receiver checks it
  before reading any payload, so a corrupt LEN is caught after a handful
  of bytes and the parser resyncs immediately, taking down only the one
  bad packet.
* The **data CRC** covers the payload. Once the header CRC has passed,
  the receiver knows it can trust LEN, reads exactly that many payload
  bytes, and verifies them against the data CRC.

A common sizing -- and what this page uses -- is one byte for the header
CRC (a CRC-8 is plenty for a five-byte header) and four bytes for the
data CRC (a CRC-32 covers many kilobytes of payload with a vanishingly
low collision rate).

Helpers
~~~~~~~

MicroPython ships :func:`binascii.crc32` for the four-byte CRC directly.
For the one-byte header CRC, a small helper using the polynomial Maxim's
1-wire devices use (``0x8C`` in reflected form) is short enough to write
inline:

::

    def crc8(data: bytes) -> int:
        crc = 0
        for byte in data:
            crc ^= byte
            for _ in range(8):
                crc = (crc >> 1) ^ 0x8C if crc & 1 else crc >> 1
        return crc & 0xFF

A complete encoder combines the two CRCs in one function:

::

    import binascii
    import struct


    def encode_packet(cmd: int, payload: bytes) -> bytes:
        header = b"\xAA\x55" + bytes([cmd]) + struct.pack(" None:
        print("cmd", cmd, "payload", payload)

    while True:
        # Drain whatever bytes have arrived since the last poll.
        # Idle briefly when the line is quiet so the loop is not a busy spin.
        n = uart.any()
        if not n:
            time.sleep_ms(1)
            continue
        for b in uart.read(n):
            if state == HUNT_FOR_HEADER:
                # Walk the magic bytes. On a mismatch, back off by one
                # so a stray HEADER[0] in the noise still counts as a
                # possible start.
                if b == HEADER[matched]:
                    matched += 1
                    if matched == HEADER_LEN:
                        state = READ_COMMAND
                        matched = 0
                else:
                    matched = 1 if b == HEADER[0] else 0
            elif state == READ_COMMAND:
                cmd = b
                length_bytes = bytearray()
                state = READ_LENGTH
            elif state == READ_LENGTH:
                # LEN is two bytes little-endian.
                length_bytes.append(b)
                if len(length_bytes) == 2:
                    length = struct.unpack("