Text vs bytes
=============

Python has two distinct sequence types, one for text and one for binary data:

* :class:`str` -- a sequence of Unicode codepoints. Used for all human-readable text: file paths, log messages, JSON payloads.
* :class:`bytes` -- a sequence of integers in the range 0 -- 255. Used for raw binary data: UART frames, image buffers, network packets, register values.

They cannot be mixed without an explicit conversion. Passing a :class:`str` to a hardware ``write`` method raises :exc:`TypeError`, and passing :class:`bytes` where a :class:`str` is expected is rejected as well.

.. figure:: ../figures/str-vs-bytes.svg
   :alt: A str of Unicode codepoints on the left and a bytes sequence of raw octets on the right, with encode and decode arrows between them.

   A :class:`str` stores Unicode characters; a :class:`bytes` stores raw octets. Going between them is *encoding* (str → bytes) and *decoding* (bytes → str).

bytes literals
--------------

A bytes literal is a string-like literal prefixed with ``b``:

::

    header = b"OMV"
    crlf = b"\r\n"
    payload = b"\x01\x02\x03"

Only ASCII characters are allowed directly inside a bytes literal; non-ASCII values must be written as ``\xHH`` hex escapes.

Encoding and decoding
---------------------

* :meth:`str.encode` converts a string to bytes using a named encoding (default ``"utf-8"``).
* :meth:`bytes.decode` does the reverse.

::

    >>> "hello".encode()
    b'hello'
    >>> "héllo".encode()
    b'h\xc3\xa9llo'          # é is two bytes in UTF-8
    >>> b"hello".decode()
    'hello'

UTF-8 is the default and the right choice for anything that might contain non-ASCII characters. Use ``"ascii"`` only when the data is guaranteed to be plain ASCII; that way a stray non-ASCII byte raises :exc:`UnicodeError` instead of silently passing through.

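The difference between the two encodings can be sketched as follows. This is a minimal illustration that assumes standard CPython codec behaviour; ``decode_ascii_line`` is a hypothetical helper written for this example, not part of any library:

::

    def decode_ascii_line(raw):
        """Decode raw as ASCII; return None if a non-ASCII byte is present."""
        try:
            return raw.decode("ascii")
        except UnicodeError:
            # Covers UnicodeDecodeError, which is a subclass of UnicodeError.
            return None

    good = decode_ascii_line(b"OK\r\n")        # plain ASCII decodes normally
    bad = decode_ascii_line(b"caf\xc3\xa9")    # non-ASCII bytes are rejected
    text = b"caf\xc3\xa9".decode("utf-8")      # the same bytes are valid UTF-8

Here ``good`` is the decoded string, ``bad`` is ``None``, and ``text`` is ``"café"``. Whether to return ``None``, re-raise, or log the failure is a design choice left to the caller.
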
Indexing and slicing
--------------------

A bytes value behaves like a sequence of integers when indexed, not a sequence of one-byte strings:

::

    >>> data = b"abc"
    >>> data[0]
    97                       # the int 97, not 'a'
    >>> data[0:1]
    b'a'                     # slicing returns bytes

A common mistake is comparing ``data[0] == "a"`` and being surprised it is :data:`False` -- ``data[0]`` is an integer, not a one-character string, so the two values can never match.

ord and chr -- bridging characters and integers
-----------------------------------------------

Because indexing a :class:`bytes` returns an integer but the rest of the program likely thinks in characters, Python provides two built-ins for moving between them:

* :func:`ord` -- takes a one-character string and returns its integer codepoint.
* :func:`chr` -- the inverse: given an integer, returns the one-character string for that codepoint.

::

    >>> ord("a")
    97
    >>> chr(97)
    'a'
    >>> ord("A"), chr(0x41)
    (65, 'A')

For ASCII characters the codepoint equals the byte value, so ``ord("a")`` and ``b"a"[0]`` both give 97. That makes byte comparisons read in terms of the character you actually care about:

::

    >>> data = b"abc"
    >>> data[0] == ord("a")  # instead of the magic number 97
    True

And :func:`chr` is handy for logging or debugging when you want to see the printable form of a byte:

::

    >>> chr(data[0])
    'a'

For non-ASCII characters :func:`ord` returns the Unicode codepoint, which is not the same as any single byte in the encoded form; the byte representation depends on the encoding.

bytearray for mutable buffers
-----------------------------

:class:`bytes` is immutable -- every "modification" returns a new object and leaves the original alone. For data you intend to modify, append to, or fill in piece by piece, use :class:`bytearray`.
It holds the same content as :class:`bytes` but supports in-place mutation:

::

    >>> s = b"hello"
    >>> s[0] = ord("H")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'bytes' object does not support item assignment
    >>> s = bytearray(b"hello")
    >>> s[0] = ord("H")
    >>> s
    bytearray(b'Hello')

Creating a bytearray
~~~~~~~~~~~~~~~~~~~~

The :class:`bytearray` constructor accepts several inputs:

* ``bytearray(8)`` -- a buffer of 8 zero bytes.
* ``bytearray(b"hello")`` -- a mutable copy of a bytes value.
* ``bytearray("hello", "utf-8")`` -- a bytearray from a string, using the given encoding.
* ``bytearray([72, 73, 74])`` -- a bytearray from a sequence of integers in 0 -- 255 (here, ``b"HIJ"``).

::

    >>> bytearray(4)
    bytearray(b'\x00\x00\x00\x00')
    >>> bytearray(b"abc")
    bytearray(b'abc')
    >>> bytearray("café", "utf-8")
    bytearray(b'caf\xc3\xa9')

Modifying a bytearray
~~~~~~~~~~~~~~~~~~~~~

Indexed and sliced assignment work just like a :class:`list`:

::

    >>> buf = bytearray(8)   # 8 zero bytes
    >>> buf[0] = 0xFF        # one byte at a time
    >>> buf[1:4] = b"ABC"    # replace a slice
    >>> buf
    bytearray(b'\xffABC\x00\x00\x00\x00')

Individual bytes must be integers in 0 -- 255; assigning any other value raises :exc:`TypeError` or :exc:`ValueError`.

Slice assignment can change the length of the buffer. Replacing a slice with a longer value grows the bytearray; replacing with a shorter value shrinks it.
Replacing with ``b""`` deletes the slice entirely:

::

    >>> buf = bytearray(b"abcdef")
    >>> buf[1:3] = b"XYZ"    # 2 bytes replaced with 3
    >>> buf
    bytearray(b'aXYZdef')
    >>> buf[1:4] = b""       # delete the inserted run
    >>> buf
    bytearray(b'adef')

The :meth:`bytearray.append` and :meth:`bytearray.extend` methods add bytes at the end without reallocating the whole buffer each time:

::

    >>> buf = bytearray()
    >>> buf.append(0x01)
    >>> buf.extend(b"abc")
    >>> buf
    bytearray(b'\x01abc')

Reading from a bytearray
~~~~~~~~~~~~~~~~~~~~~~~~

Indexing, slicing, iteration, and the :class:`bytes` inspection methods (:meth:`bytes.startswith`, :meth:`bytes.find`, :meth:`bytes.strip`, etc.) all work the same as on a :class:`bytes` value. Indexing returns an integer; slicing returns another bytearray:

::

    >>> buf = bytearray(b"OpenMV")
    >>> buf[0]
    79
    >>> buf[0:4]
    bytearray(b'Open')
    >>> buf.startswith(b"Open")
    True

Converting between bytes and bytearray
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:class:`bytes` and :class:`bytearray` convert to each other with their constructors. Use this when an API requires one form specifically:

::

    >>> ba = bytearray(b"hello")
    >>> snapshot = bytes(ba)         # immutable copy
    >>> ba[0] = ord("H")
    >>> ba, snapshot
    (bytearray(b'Hello'), b'hello')

memoryview for zero-copy slicing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Slicing a :class:`bytes` or :class:`bytearray` normally copies the bytes into a new buffer. :class:`memoryview` exposes the same bytes *without* copying:

::

    >>> buf = bytearray(b"OpenMV Cam")
    >>> view = memoryview(buf)
    >>> view[0:6]                    # shares storage with buf
    >>> bytes(view[0:6])             # materialise as bytes when needed
    b'OpenMV'

A view over a :class:`bytearray` is also writable -- mutating the view mutates the underlying buffer:

::

    >>> view[0] = ord("o")
    >>> buf
    bytearray(b'openMV Cam')

Reach for :class:`memoryview` when copying a slice would be wasteful -- typically when the same large buffer is passed around or processed in pieces.
For everyday string-style work on small bytes, plain slicing is fine.
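
Processing a buffer in pieces might look like the sketch below. It is only an illustration under stated assumptions: ``checksum_chunks`` is a hypothetical helper, and the 8-bit sum it computes stands in for whatever per-chunk work a real program does:

::

    def checksum_chunks(buf, chunk_size):
        """Return an 8-bit sum for each chunk of buf, without copying chunks."""
        view = memoryview(buf)
        sums = []
        for start in range(0, len(view), chunk_size):
            chunk = view[start:start + chunk_size]  # a view, not a copy
            sums.append(sum(chunk) & 0xFF)
        return sums

    frame = bytearray(b"\x01\x02\x03\x04\x05")
    sums = checksum_chunks(frame, 2)

    # Writing through a sliced view changes the original buffer.
    header = memoryview(frame)[0:2]
    header[0] = 0xFF

After this runs, ``sums`` is ``[3, 7, 5]`` and ``frame[0]`` is ``0xFF``, because ``header`` shares storage with ``frame``.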