:mod:`re` --- simple regular expressions ======================================== .. module:: re :synopsis: regular expressions This module implements regular expression operations. Regular expression syntax supported is a subset of CPython ``re`` module (and actually is a subset of POSIX extended regular expressions). Supported operators and special sequences are: ``.`` Match any character. ``[...]`` Match set of characters. Individual characters and ranges are supported, including negated sets (e.g. ``[^a-c]``). ``^`` Match the start of the string. ``$`` Match the end of the string. ``?`` Match zero or one of the previous sub-pattern. ``*`` Match zero or more of the previous sub-pattern. ``+`` Match one or more of the previous sub-pattern. ``??`` Non-greedy version of ``?``, match zero or one, with the preference for zero. ``*?`` Non-greedy version of ``*``, match zero or more, with the preference for the shortest match. ``+?`` Non-greedy version of ``+``, match one or more, with the preference for the shortest match. ``|`` Match either the left-hand side or the right-hand side sub-patterns of this operator. ``(...)`` Grouping. Each group is capturing (a substring it captures can be accessed with `match.group()` method). ``(?:...)`` Non-capturing grouping. Each group is matched using the same rules as regular grouping, but will not be part of the match object. ``\d`` Matches digit. Equivalent to ``[0-9]``. ``\D`` Matches non-digit. Equivalent to ``[^0-9]``. ``\s`` Matches whitespace. Equivalent to ``[ \t-\r]``. ``\S`` Matches non-whitespace. Equivalent to ``[^ \t-\r]``. ``\w`` Matches "word characters" (ASCII only). Equivalent to ``[A-Za-z0-9_]``. ``\W`` Matches non "word characters" (ASCII only). Equivalent to ``[^A-Za-z0-9_]``. ``\`` Escape character. Any other character following the backslash, except for those listed above, is taken literally. For example, ``\*`` is equivalent to literal ``*`` (not treated as the ``*`` operator). Note that ``\r``, ``\n``, etc. are not handled specially, and will be equivalent to literal letters ``r``, ``n``, etc. Due to this, it's not recommended to use raw Python strings (``r""``) for regular expressions. For example, ``r"\r\n"`` when used as the regular expression is equivalent to ``"rn"``. To match CR character followed by LF, use ``"\r\n"``. **NOT SUPPORTED**: * counted repetitions (``{m,n}``) * named groups (``(?P...)``) * more advanced assertions (``\b``, ``\B``) * special character escapes like ``\r``, ``\n`` - use Python's own escaping instead * etc. Example:: import re # As re doesn't support escapes itself, use of r"" strings is not # recommended. regex = re.compile("[\r\n]") regex.split("line1\rline2\nline3\r\n") # Result: # ['line1', 'line2', 'line3', '', ''] Functions --------- .. function:: compile(regex_str: str, flags: int = 0) -> "regex" Compile regular expression, return `regex ` object. .. function:: search(regex_str: str, string: str) -> "match | None" Compile *regex_str* and search it in a *string*. Unlike `match`, this will search string for first position which matches regex (which still may be 0 if regex is anchored). .. function:: sub(regex_str: str, replace: Union[str, Callable], string: str, count: int = 0, flags: int = 0, /) -> str Compile *regex_str* and search for it in *string*, replacing all matches with *replace*, and returning the new string. *replace* can be a string or a function. If it is a string then escape sequences of the form ``\`` and ``\g`` can be used to expand to the corresponding group (or an empty string for unmatched groups). If *replace* is a function then it must take a single argument (the match) and should return a replacement string. If *count* is specified and non-zero then substitution will stop after this many substitutions are made. The *flags* argument is ignored. .. _regex: Regex objects ------------- Compiled regular expression. Instances of this class are created using `re.compile()`. .. class:: regex :no-index: Compiled regular expression object returned by `re.compile()`. .. method:: match(string: str, pos: int = 0, endpos: int | None = None) -> "match | None" Apply this compiled regex to *string*, anchored at the start of the search region, and return a `match ` object, or ``None`` if the regex does not match. This is the compiled-pattern equivalent of the module-level :class:`match`, and is much more efficient when the same pattern is applied to multiple strings. The optional parameter *pos* gives an index in *string* where the search is to start; it defaults to ``0``. This is not completely equivalent to slicing the string; the ``'^'`` pattern character matches at the real beginning of the string and at positions just after a newline, but not necessarily at the index where the search is to start. The optional parameter *endpos* limits how far *string* is searched; it will be as if the string is *endpos* characters long, so only the characters from *pos* to ``endpos - 1`` are searched. If *endpos* is ``None`` (the default) the whole string is searched. .. method:: search(string: str, pos: int = 0, endpos: int | None = None) -> "match | None" Scan through *string* looking for the first location where this compiled regex produces a match, and return a `match ` object, or ``None`` if no position matches. This is the compiled-pattern equivalent of the module-level :func:`search` function, and is much more efficient when the same pattern is applied to multiple strings. The optional parameter *pos* gives an index in *string* where the search is to start; it defaults to ``0``. This is not completely equivalent to slicing the string; the ``'^'`` pattern character matches at the real beginning of the string and at positions just after a newline, but not necessarily at the index where the search is to start. The optional parameter *endpos* limits how far *string* is searched; it will be as if the string is *endpos* characters long, so only the characters from *pos* to ``endpos - 1`` are searched. If *endpos* is ``None`` (the default) the whole string is searched. .. method:: sub(replace: Union[str, Callable], string: str, count: int = 0, flags: int = 0, /) -> str Search for this compiled regex in *string*, replacing all matches with *replace*, and return the new string. This is the compiled-pattern equivalent of the module-level :func:`sub` function, and is much more efficient when the same pattern is applied to multiple strings. *replace* can be a string or a function. If it is a string then escape sequences of the form ``\`` and ``\g`` can be used to expand to the corresponding group (or an empty string for unmatched groups). If *replace* is a function then it must take a single argument (the match) and should return a replacement string. If *count* is specified and non-zero then substitution will stop after this many substitutions are made. The *flags* argument is ignored. .. method:: split(string: str, max_split: int = -1, /) -> List[str] Split a *string* using regex. If *max_split* is given, it specifies maximum number of splits to perform. Returns list of strings (there may be up to *max_split+1* elements if it's specified). Match objects ------------- A match object holds the result of a successful match. .. class:: match(regex_str: str, string: str) Match *regex_str* against *string*, anchored at the start of the string, and return a match object, or ``None`` if it does not match. This is the module-level shortcut: ``re.match(regex_str, string)`` is equivalent to ``re.compile(regex_str).match(string)``. The same match object is also returned by `search()`, by the compiled-pattern `regex.match()` / `regex.search()` methods, and is passed to the replacement function used by `sub()`. .. method:: group(index: int) -> str Return matching (sub)string. *index* is 0 for entire match, 1 and above for each capturing group. Only numeric groups are supported.