6.20. Serial protocols, framing, and CRCs
UART in code moved bytes between two ends. By itself, that is not enough to build a reliable link. Three problems show up the moment a real device is at the other end of the wire:
Where does a message start and end? Bytes arrive in a stream with no built-in delimiter. If the receiver misses the first byte (powered up after the sender; brief electrical glitch on the line), every byte after it is off-by-one until the receiver finds a fresh resync point.
How long is each message? A 32-byte sensor reading and a 4-byte status reply look identical at the byte level. The receiver needs a way to know how many bytes belong to the current message.
Did the bytes arrive intact? Noise can flip individual bits. Without a check, the receiver happily acts on corrupted data.
The standard answer to all three is to wrap the data in a packet frame: a known byte sequence at the start, a length field, the payload itself, and a checksum at the end.
6.20.1. Packet framing
A typical framing format:
A framed packet with separate header and data CRCs: header (magic bytes), command, length, header CRC, payload, data CRC.
Each field does one job:
Header (magic bytes). A fixed, unusual byte sequence – often two bytes like 0xAA 0x55 – that the receiver scans for in the incoming stream. When it finds the sequence, it knows a new packet is starting and can throw away any garbage that came before.
Command. A single byte that says what the packet is. Different command values use different payload formats – one command might mean “set servo angle” with two payload bytes, another might mean “read sensor” with no payload, another might be “log message” with a string. The receiver dispatches on the command byte to know how to interpret the rest of the packet.
Length. Two bytes giving the size of the payload in bytes (little-endian here), allowing payloads up to about 64 KiB. The receiver reads exactly this many bytes once the header CRC has been verified.
Header CRC. A one-byte checksum over the HEADER, CMD, and LEN fields. The receiver checks it before reading any payload, so a corrupted LEN is caught after just a handful of bytes (see the CRC section below for why this matters).
Payload. Command-specific application data, exactly LEN bytes long. The format is determined by the command byte: a struct-packed record of fixed-width fields, a string, raw memory – whatever both sides agree on for that command.
Data CRC. A four-byte CRC over the payload bytes. The receiver re-computes it from the bytes it just read and drops the packet if it does not match.
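As a quick check on the layout above, the fixed framing overhead can be added up field by field. This is a minimal sketch; the names `FIELD_SIZES`, `OVERHEAD`, and `frame_size` are hypothetical and exist only for this illustration:

```python
# Field widths in bytes, taken from the layout described above.
FIELD_SIZES = {
    "header": 2,      # magic bytes, e.g. 0xAA 0x55
    "command": 1,
    "length": 2,      # little-endian payload length
    "header_crc": 1,
    "data_crc": 4,
}

# Everything except the payload itself.
OVERHEAD = sum(FIELD_SIZES.values())


def frame_size(payload_len: int) -> int:
    """Total bytes on the wire for a payload of payload_len bytes."""
    if not 0 <= payload_len <= 0xFFFF:
        raise ValueError("payload length must fit in the two-byte LEN field")
    return OVERHEAD + payload_len


print(OVERHEAD)          # 10
print(frame_size(0))     # 10
print(frame_size(32))    # 42
```

A 4-byte status reply therefore costs 14 bytes on the wire, so the framing dominates very short messages and becomes negligible for long ones.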
6.20.2. CRCs
The simplest “checksum” is the sum of all the bytes, modulo 256 or 65536. It catches every single-bit flip, but it misses many multi-bit errors (two flips whose effects cancel in the sum go unnoticed) and it cannot see bytes that arrive in the wrong order.
A cyclic redundancy check (CRC) is the standard upgrade. It treats the input as one long binary number and divides it by a fixed generator polynomial using carry-less (XOR) arithmetic; the remainder of that division is the CRC. Different polynomials catch different classes of errors; the common 8-, 16- and 32-bit polynomials each catch every error burst no longer than their width, plus a large fraction of longer bursts.
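The weakness of a plain byte sum is easy to demonstrate. The sketch below compares it with `binascii.crc32()` on two kinds of damage: bytes that arrive in the wrong order, and a pair of bit flips that cancel in the sum. The helper name `sum8` and the sample bytes are made up for illustration:

```python
import binascii


def sum8(data: bytes) -> int:
    """Additive checksum: sum of all bytes, modulo 256."""
    return sum(data) & 0xFF


original = bytes([0x01, 0x02, 0x03, 0x04])
swapped = bytes([0x02, 0x01, 0x03, 0x04])    # first two bytes reordered
cancelled = bytes([0x00, 0x03, 0x03, 0x04])  # bit 0 flipped in bytes 0 and 1

# The byte sum cannot tell any of these apart...
print(sum8(original) == sum8(swapped))       # True
print(sum8(original) == sum8(cancelled))     # True

# ...while the CRC changes for both kinds of damage.
print(binascii.crc32(original) == binascii.crc32(swapped))    # False
print(binascii.crc32(original) == binascii.crc32(cancelled))  # False
```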
6.20.2.1. Why two CRCs
The packet diagram above carries two separate CRCs – one over the header (HEADER, CMD, LEN) and one over the payload. Robust framing needs both, because a single trailing CRC fails badly when the LEN field itself gets corrupted in transit:
The receiver acts on the corrupt LEN and reads that many bytes from the wire – possibly far more than the sender intended.
Only the trailing CRC eventually tells the receiver something went wrong, and only after all those bytes have been consumed.
While the parser is stuck waiting for the wrong number of bytes, real packets arriving behind the corrupt one get swallowed as payload, and the receiver loses several packets rather than just the one.
Splitting the CRC fixes this:
The header CRC covers HEADER, CMD, and LEN. The receiver checks it before reading any payload, so a corrupt LEN is caught after a handful of bytes and the parser resyncs immediately, taking down only the one bad packet.
The data CRC covers the payload. Once the header CRC has passed, the receiver knows it can trust LEN, reads exactly that many payload bytes, and verifies them against the data CRC.
A common sizing – and what this page uses – is one byte for the header CRC (a CRC-8 is plenty for a five-byte header) and four bytes for the data CRC (a CRC-32 covers many kilobytes of payload with a vanishingly low collision rate).
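To put a rough number on the single-CRC failure mode, consider one bit flip in the high byte of LEN. The sketch assumes 8N1 framing (10 bits on the wire per byte) at 115200 baud and a sender that keeps transmitting; the function name `stall_seconds` is hypothetical:

```python
def stall_seconds(bogus_len: int, baudrate: int = 115200,
                  bits_per_byte: int = 10) -> float:
    """Time spent collecting bogus_len payload bytes plus a four-byte
    trailing CRC before the receiver can notice anything is wrong."""
    bytes_per_second = baudrate / bits_per_byte
    return (bogus_len + 4) / bytes_per_second


sent_len = 0x0004                 # the sender's real payload length
corrupt_len = sent_len ^ 0x8000   # top bit of LEN flipped in transit

print(corrupt_len)                               # 32772
print(round(stall_seconds(sent_len) * 1000, 2))  # under a millisecond
print(round(stall_seconds(corrupt_len), 2))      # about 2.85 seconds
```

With a header CRC in place, the same flip is rejected as soon as the sixth byte of the frame arrives. On a line where the sender goes quiet, the single-CRC parser would wait even longer than this estimate.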
6.20.2.2. Helpers
MicroPython ships binascii.crc32() for the four-byte CRC
directly. For the one-byte header CRC, a small helper using
the polynomial Maxim’s 1-wire devices use (0x8C in
reflected form) is short enough to write inline:
def crc8(data: bytes) -> int:
    # Bitwise CRC-8, reflected polynomial 0x8C, initial value 0.
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x8C if crc & 1 else crc >> 1
    return crc & 0xFF
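A quick way to gain confidence in a hand-written CRC routine is to run it against the conventional check string `b"123456789"`. The sketch below repeats the helper so that it runs on its own. `0xA1` is the published check value for the Dallas/Maxim CRC-8 parameters used here, and the final loop confirms that any corruption confined to a single header byte changes the CRC. The sample header bytes are made up for illustration:

```python
def crc8(data) -> int:
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x8C if crc & 1 else crc >> 1
    return crc & 0xFF


# Published check value for this polynomial (zero initial value, reflected).
print(hex(crc8(b"123456789")))                   # 0xa1

# Appending the CRC to the data makes the CRC of the whole run zero.
header = bytes([0xAA, 0x55, 0x10, 0x04, 0x00])   # hypothetical: cmd 0x10, LEN 4
print(crc8(header + bytes([crc8(header)])))      # 0

# Every non-zero XOR mask applied to any single byte is detected.
good = crc8(header)
missed = 0
for pos in range(len(header)):
    for mask in range(1, 256):
        damaged = bytearray(header)
        damaged[pos] ^= mask
        if crc8(damaged) == good:
            missed += 1
print(missed)                                    # 0
```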
A complete encoder combines the two CRCs in one function:
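import binascii
import struct

def encode_packet(cmd: int, payload: bytes) -> bytes:
    # LEN is two bytes, so the payload length must fit in 16 bits.
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a two-byte LEN field")
    # HEADER(2) + CMD(1) + LEN(2, little-endian)
    header = b"\xAA\x55" + bytes([cmd]) + struct.pack("<H", len(payload))
    hcrc = crc8(header)
    dcrc = binascii.crc32(payload)
    return header + bytes([hcrc]) + payload + struct.pack("<I", dcrc)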
The inverse function recovers the command and payload from a
complete packet, or returns None if either CRC check
fails:
def decode_packet(packet: bytes):
    # Layout: HEADER(2) + CMD(1) + LEN(2) + HCRC(1) + PAYLOAD(LEN) + DCRC(4)
    if len(packet) < 10 or packet[0:2] != b"\xAA\x55":
        return None
    header = packet[0:5]
    if crc8(header) != packet[5]:
        return None  # header CRC mismatch
    cmd = packet[2]
    length = struct.unpack("<H", packet[3:5])[0]
    if len(packet) != 6 + length + 4:
        return None  # truncated or oversized
    payload = packet[6:6 + length]
    received_dcrc = struct.unpack("<I", packet[6 + length:])[0]
    if binascii.crc32(payload) != received_dcrc:
        return None  # data CRC mismatch
    return cmd, bytes(payload)
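A round trip is the simplest way to check that the encoder and decoder agree. The sketch below repeats compact versions of the three helpers so that it runs on its own, then damages one bit in LEN and one bit in the payload to show each CRC doing its job. The command values 0x10 and 0x20 are hypothetical:

```python
import binascii
import struct


def crc8(data) -> int:
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x8C if crc & 1 else crc >> 1
    return crc & 0xFF


def encode_packet(cmd: int, payload: bytes) -> bytes:
    header = b"\xAA\x55" + bytes([cmd]) + struct.pack("<H", len(payload))
    return (header + bytes([crc8(header)]) + payload
            + struct.pack("<I", binascii.crc32(payload)))


def decode_packet(packet: bytes):
    if len(packet) < 10 or packet[0:2] != b"\xAA\x55":
        return None
    if crc8(packet[0:5]) != packet[5]:
        return None                      # header CRC mismatch
    length = struct.unpack("<H", packet[3:5])[0]
    if len(packet) != 6 + length + 4:
        return None                      # truncated or oversized
    payload = packet[6:6 + length]
    if binascii.crc32(payload) != struct.unpack("<I", packet[6 + length:])[0]:
        return None                      # data CRC mismatch
    return packet[2], bytes(payload)


frame = encode_packet(0x10, b"\x5A\x00")
print(len(frame))                        # 12
print(decode_packet(frame))              # (16, b'Z\x00')

# Flip one bit in the LEN field: the header CRC rejects the frame.
bad_len = bytearray(frame)
bad_len[3] ^= 0x01
print(decode_packet(bytes(bad_len)))     # None

# Flip one bit in the payload: the data CRC rejects the frame.
bad_payload = bytearray(frame)
bad_payload[6] ^= 0x80
print(decode_packet(bytes(bad_payload)))  # None

# A command with no payload (LEN == 0) still round-trips.
print(decode_packet(encode_packet(0x20, b"")))   # (32, b'')
```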
In practice the receiver does not get a whole packet handed to
it – bytes arrive one at a time over the UART, and a sender
that pauses mid-packet (or a noisy line that loses a byte)
cannot just be read()-ed into a buffer of the right size.
The next section runs the same decode logic byte by byte as a
state machine.
6.20.3. A state-machine receiver
The receiver cannot just call uart.read(N) for some fixed
N – it does not know how many bytes the next packet will
be, and any junk on the line throws the alignment off. The
solution is a small state machine that consumes the bytes one
at a time and reacts based on where it is in the packet. The
main loop polls any() to see how many
bytes are buffered, drains them in one read()
call, and feeds each byte through the state machine:
import time
import binascii
import struct
from machine import UART

# crc8() is the helper defined in the previous section.

HEADER = b"\xAA\x55"
HEADER_LEN = len(HEADER)

# States, in the order the receiver walks through them per packet.
HUNT_FOR_HEADER = 0
READ_COMMAND = 1
READ_LENGTH = 2
READ_HEADER_CRC = 3
READ_PAYLOAD = 4
READ_DATA_CRC = 5

uart = UART(3, baudrate=115200)

# Receiver state plus partial-field buffers.
state = HUNT_FOR_HEADER
matched = 0                   # bytes of HEADER matched so far
cmd = 0                       # CMD captured in READ_COMMAND
length_bytes = bytearray()    # raw LEN bytes (kept for the header CRC)
length = 0                    # unpacked LEN
payload = bytearray()         # payload bytes accumulated in READ_PAYLOAD
crc_bytes = bytearray()       # DCRC bytes accumulated in READ_DATA_CRC

def handle_packet(cmd: int, payload: bytes) -> None:
    print("cmd", cmd, "payload", payload)

while True:
    # Drain whatever bytes have arrived since the last poll. Idle
    # briefly when the line is quiet so the loop is not a busy spin.
    n = uart.any()
    if not n:
        time.sleep_ms(1)
        continue
    for b in uart.read(n):
        if state == HUNT_FOR_HEADER:
            # Walk the magic bytes. On a mismatch, back off by one
            # so a stray HEADER[0] in the noise still counts as a
            # possible start.
            if b == HEADER[matched]:
                matched += 1
                if matched == HEADER_LEN:
                    state = READ_COMMAND
                    matched = 0
            else:
                matched = 1 if b == HEADER[0] else 0
        elif state == READ_COMMAND:
            cmd = b
            length_bytes = bytearray()
            state = READ_LENGTH
        elif state == READ_LENGTH:
            # LEN is two bytes little-endian.
            length_bytes.append(b)
            if len(length_bytes) == 2:
                length = struct.unpack("<H", length_bytes)[0]
                state = READ_HEADER_CRC
        elif state == READ_HEADER_CRC:
            # Verify the CRC over HEADER + CMD + LEN before
            # committing to read LEN payload bytes. A mismatch
            # aborts here, after just five header bytes -- the
            # next valid header re-syncs quickly.
            expected = crc8(HEADER + bytes([cmd]) + length_bytes)
            if b == expected:
                payload = bytearray()
                crc_bytes = bytearray()
                # A packet with LEN == 0 has no payload bytes to
                # wait for, so go straight to the data CRC.
                state = READ_PAYLOAD if length else READ_DATA_CRC
            else:
                state = HUNT_FOR_HEADER
        elif state == READ_PAYLOAD:
            payload.append(b)
            if len(payload) == length:
                crc_bytes = bytearray()
                state = READ_DATA_CRC
        elif state == READ_DATA_CRC:
            # Verify the CRC over the payload and either deliver
            # the packet or drop it. Either way, go back to
            # looking for the next header.
            crc_bytes.append(b)
            if len(crc_bytes) == 4:
                expected = binascii.crc32(payload)
                received = struct.unpack("<I", crc_bytes)[0]
                if expected == received:
                    handle_packet(cmd, bytes(payload))
                state = HUNT_FOR_HEADER
Each byte advances the state machine by one step, or falls
back to HUNT_FOR_HEADER after a complete packet, a bad
header CRC, or a bad data CRC. Junk on the line that does not
match the header is silently discarded; the next valid header
re-syncs the receiver. The key safety property comes from the
header CRC: if the LEN field is corrupted, the parser catches
it after the header-CRC check (a handful of bytes), not after
committing to read a wildly wrong number of payload bytes.
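The loop above is tied to a `machine.UART` object, which makes it awkward to exercise away from the hardware. One possible arrangement, sketched here under assumptions, wraps the same states in a class with a `feed()` method so that test bytes can be pushed through it on any Python. The names `PacketParser`, `max_payload`, and `frame` are hypothetical. The `max_payload` cap is an addition that the listing above does not have, CMD and LEN are collected in a single state for brevity, and a packet with LEN == 0 skips the payload state entirely:

```python
import binascii
import struct

HEADER = b"\xAA\x55"


def crc8(data) -> int:
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x8C if crc & 1 else crc >> 1
    return crc & 0xFF


class PacketParser:
    """Byte-at-a-time receiver; feed() returns the packets it completed."""

    def __init__(self, max_payload: int = 1024):
        self.max_payload = max_payload    # assumed application limit
        self._reset()

    def _reset(self):
        self.state = "hunt"
        self.matched = 0
        self.head = bytearray()           # CMD + LEN bytes
        self.length = 0
        self.payload = bytearray()
        self.crc = bytearray()

    def feed(self, data: bytes) -> list:
        packets = []
        for b in data:
            if self.state == "hunt":
                if b == HEADER[self.matched]:
                    self.matched += 1
                    if self.matched == len(HEADER):
                        self._reset()
                        self.state = "head"
                else:
                    self.matched = 1 if b == HEADER[0] else 0
            elif self.state == "head":
                self.head.append(b)
                if len(self.head) == 3:
                    self.length = struct.unpack("<H", self.head[1:3])[0]
                    self.state = "hcrc"
            elif self.state == "hcrc":
                ok = b == crc8(HEADER + bytes(self.head))
                if not ok or self.length > self.max_payload:
                    self._reset()
                else:
                    self.state = "payload" if self.length else "dcrc"
            elif self.state == "payload":
                self.payload.append(b)
                if len(self.payload) == self.length:
                    self.state = "dcrc"
            elif self.state == "dcrc":
                self.crc.append(b)
                if len(self.crc) == 4:
                    received = struct.unpack("<I", self.crc)[0]
                    if received == binascii.crc32(self.payload):
                        packets.append((self.head[0], bytes(self.payload)))
                    self._reset()
        return packets


def frame(cmd: int, payload: bytes) -> bytes:
    head = HEADER + bytes([cmd]) + struct.pack("<H", len(payload))
    return (head + bytes([crc8(head)]) + payload
            + struct.pack("<I", binascii.crc32(payload)))


parser = PacketParser()
# Three junk bytes (including a stray 0xAA), then two valid frames.
stream = b"\x00\xAA\x13" + frame(0x10, b"\x5A\x00") + frame(0x20, b"")
got = []
for i in range(0, len(stream), 3):        # deliver in small chunks
    got += parser.feed(stream[i:i + 3])
print(got)                                # [(16, b'Z\x00'), (32, b'')]
```

Keeping the parser free of I/O means the UART loop shrinks to reading whatever `any()` reports and passing it to `feed()`.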
6.20.4. Beyond the baseline
The framing above is the minimum a serial link needs to recover from line noise: header magic, length, command, and two CRCs. It detects corruption and resyncs after garbled bytes, but it gives up on damaged packets rather than getting them through, and it leaves the sender with no idea what the receiver actually heard.
Real-world serial protocols layer features on top of that baseline. Not every embedded link needs all of them – pick what the application actually requires:
Sequence numbers. A small counter that increments on every send. The receiver detects gaps (a packet was lost), duplicates (the sender retransmitted but the receiver had already accepted the first copy), and – where the channel can reorder – out-of-order arrivals.
Acknowledgements. A dedicated ACK packet (or piggyback bit in a reply) the receiver sends back to confirm each packet. Without ACKs the sender has no way of knowing its data arrived.
Negative acknowledgements. A NACK sent when the receiver sees a CRC failure or a sequence gap. The sender retransmits immediately, instead of waiting for an ACK timeout to fire.
Retransmission. The sender keeps each unacked packet in a small queue and re-sends it after a timeout (or on a NACK). A retry limit and some backoff between retries stops a permanently broken link from looping forever.
Sliding windows. Allowing several packets in flight before requiring an ACK keeps throughput up on links where the round-trip is long compared to the per-packet send time. The cost is more sender-side state – one slot per in-flight packet.
Flow control. A signal from the receiver telling the sender to slow down or pause when its buffer is filling up. Implementations vary – explicit XON / XOFF bytes, credit-based grants where the receiver licences N more packets at a time, or the RTS / CTS hardware lines on the wire itself. Without flow control a fast sender eventually overruns a slow receiver and packets get dropped.
Protocol version. A version field early in the packet lets the format evolve. Each side can negotiate the highest version both support at startup, or reject packets from incompatible peers.
Fragmentation and reassembly. A two-byte LEN caps the payload at 65,535 bytes (just under 64 KiB); messages larger than that get split into multiple packets and reassembled on the other side. The fragmentation metadata (fragment index, total count, or a “more fragments” flag) lives inside the payload.
Heartbeats. A small periodic packet that says “I’m still here”. The other side notices when the heartbeats stop and reconnects (or fails loudly) instead of hanging silently.
Channels. A channel or stream ID in the header so one physical link carries several logical streams – a control channel, a telemetry channel, a log channel – distinguished only by that field.
Authentication. A short tag computed from the payload and a secret value that only the legitimate sender and receiver know. The receiver computes the tag again from the bytes it received and rejects the packet if the two do not match. This catches both tampering (an attacker modified the bytes) and – if a sequence number or timestamp is part of what the tag covers – replay, where an attacker records a real packet off the wire and re-sends it later to make the receiver act on it twice.
Encryption. Scrambling the payload bytes with a shared secret key so anyone reading the line without that key sees only noise. Usually combined with the authentication tag above – a CRC is no defence here, because anyone can compute a valid one, so without the tag an attacker can inject or alter ciphertext and the receiver will decrypt it and act on nonsense.
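Of the features above, sequence numbers are the cheapest to add and the easiest to get subtly wrong, because a small counter wraps around. A minimal sketch, assuming a one-byte counter and treating anything less than half the counter range ahead as a gap and everything else as an old packet; the function name `classify_seq` and that half-range rule are assumptions for illustration, not part of any particular protocol:

```python
def classify_seq(expected: int, received: int, modulus: int = 256) -> str:
    """Compare a received sequence number with the one the receiver expects."""
    ahead = (received - expected) % modulus
    if ahead == 0:
        return "in-order"
    if ahead < modulus // 2:
        return "gap"          # one or more packets were lost
    return "duplicate"        # an old packet, most likely a retransmission


print(classify_seq(5, 5))      # in-order
print(classify_seq(5, 7))      # gap
print(classify_seq(5, 4))      # duplicate
print(classify_seq(255, 0))    # gap (the counter wrapped; 255 was lost)
print(classify_seq(0, 255))    # duplicate
```

The modular subtraction is what keeps the comparison correct across the wrap from 255 back to 0.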
A typical “good” protocol for industrial gear ends up with framing, dual CRC, sequence numbers, ACK / NACK with retransmit, and heartbeats. Real-world examples worth a look: MAVLink (drone telemetry, with sequence numbers, system / component IDs, and optional packet signatures), Modbus (industrial PLCs, with function codes and CRC), and NMEA 0183 (the ASCII protocol every consumer GPS receiver speaks – line-based messages with a checksum after a star delimiter).
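For comparison with the binary framing on this page, the NMEA 0183 checksum is an XOR of every character between the leading `$` and the `*`, written as two hex digits after the star. A minimal sketch; the helper names are hypothetical and the sentence body is an illustrative position report rather than captured receiver output:

```python
def nmea_checksum(body: str) -> str:
    """XOR of every character between '$' and '*', as two hex digits."""
    value = 0
    for ch in body:
        value ^= ord(ch)
    return "%02X" % value


def nmea_wrap(body: str) -> str:
    return "$" + body + "*" + nmea_checksum(body)


def nmea_valid(sentence: str) -> bool:
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        return False
    body, _, received = sentence[1:].rpartition("*")
    return received.upper() == nmea_checksum(body)


line = nmea_wrap("GPGLL,4916.45,N,12311.12,W,225444,A")
print(nmea_valid(line))                           # True
print(nmea_valid(line.replace("4916", "4917")))   # False
```

Here the `$` and the line ending play the role of the magic bytes and the length field, and the one-byte XOR stands in for the CRCs, with correspondingly weaker error detection.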