regex

Links

See also:
- regex (pypi, code)
  - A new regex implementation intended eventually to replace Python's current re module implementation.
- regular-expressions.info (http://www.regular-expressions.info/)
- pymotw
- Std Lib
  - Regular Expression HOWTO
- Regexp Syntax Summary
  - Compares Python, Perl and others.
- Regex objects
- Match objects

Code	Meaning
`\d`	a digit
`\D`	a non-digit
`\s`	whitespace (tab, space, newline, etc.)
`\S`	non-whitespace
`\w`	alphanumeric character or underscore
`\W`	anything other than an alphanumeric character or underscore
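
A quick sketch of the character classes above in use. The sample string is made up purely for illustration.

```python
import re

# Hypothetical sample text, chosen only for illustration.
sample = "Room 42, floor 3\tEast_Wing"

digits = re.findall(r"\d+", sample)      # runs of digits
words = re.findall(r"\w+", sample)       # runs of letters, digits and underscores
fields = re.split(r"\s+", sample)        # split on any whitespace, including the tab
separators = re.findall(r"\W+", sample)  # everything that \w does not match

print(digits)      # ['42', '3']
print(words)       # ['Room', '42', 'floor', '3', 'East_Wing']
print(fields)      # ['Room', '42,', 'floor', '3', 'East_Wing']
print(separators)  # [' ', ', ', ' ', '\t']
```

Note that `East_Wing` stays in one piece because `\w` also matches the underscore.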

Anchoring

Code	Meaning
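`^`	start of string; with MULTILINE, also the start of each line
`$`	end of string (or just before a trailing newline); with MULTILINE, also the end of each line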
`\A`	start of string
`\Z`	end of string
`\b`	empty string at the beginning or end of a word
`\B`	empty string not at the beginning or end of a word
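
The difference between the anchors shows up mostly with multi-line input. A small sketch, with made-up sample strings:

```python
import re

lines = "first line\nsecond line"

# ^ and $ follow line boundaries only when MULTILINE is set.
start_default = re.findall(r"^\w+", lines)                  # ['first']
start_multiline = re.findall(r"^\w+", lines, re.MULTILINE)  # ['first', 'second']
end_multiline = re.findall(r"\w+$", lines, re.MULTILINE)    # ['line', 'line']

# \A and \Z ignore MULTILINE and only match at the ends of the whole string.
start_strict = re.findall(r"\A\w+", lines, re.MULTILINE)    # ['first']
end_strict = re.findall(r"\w+\Z", lines, re.MULTILINE)      # ['line']

# \b matches at word edges, \B only inside a word.
whole_words = re.findall(r"\bline\b", "line inline lines line")  # ['line', 'line']
inside_word = re.findall(r"\Bline", "line inline")               # ['line'] (from "inline")
```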

Syntax	Description
`(?P<name>...)`	Capture group with name "name". To refer to this in the same regex, use `(?P=name)` and to refer to it in a substitution, use `\g<name>`
`(?=...)`	(Positive) Lookahead assertion: Matches if `...` matches next, but doesn’t consume any of the string.
`(?!...)`	Negative Lookahead assertion: Matches if `...` does not match next, but doesn’t consume any of the string.
`(?<=...)`	Positive lookbehind assertion: Succeeds only when the current position is preceded by a match for `...`. The contained `...` must only match strings of some fixed length, meaning that abc or a
`(?<!...)`	Negative lookbehind assertion: Similar to the Positive lookbehind assertion but requires `...` to not precede the current position in the string.
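
A sketch of the four lookaround assertions from the table. The price string is a hypothetical example; none of the assertions consume text, so only the digits end up in the results.

```python
import re

# Hypothetical sample text, chosen only for illustration.
prices = "apple $5, pear 7 EUR, plum $12"

# Positive lookbehind: digits directly preceded by a dollar sign.
dollars = re.findall(r"(?<=\$)\d+", prices)           # ['5', '12']

# Negative lookbehind: digits not preceded by "$" or by another digit
# (the digit check stops "2" in "$12" from matching on its own).
not_dollars = re.findall(r"(?<![$\d])\d+", prices)    # ['7']

# Positive lookahead: digits directly followed by " EUR".
euros = re.findall(r"\d+(?= EUR)", prices)            # ['7']

# Negative lookahead: whole numbers not followed by " EUR".
not_euros = re.findall(r"\b\d+\b(?! EUR)", prices)    # ['5', '12']
```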

Match objects

m.groupdict()
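
A minimal sketch of `groupdict()`, which returns the named groups of a match as a dict. The `key=value` input format is only an assumption for the example.

```python
import re

# Hypothetical "key=value" input, used only for illustration.
m = re.search(r"(?P<key>\w+)=(?P<value>\d+)", "timeout=30")
pairs = m.groupdict()
print(pairs)  # {'key': 'timeout', 'value': '30'}

# Named groups that did not take part in the match come back as None,
# or as the default passed to groupdict().
m2 = re.search(r"(?P<key>\w+)(?:=(?P<value>\d+))?", "verbose")
missing = m2.groupdict()
with_default = m2.groupdict(default="")
print(missing)       # {'key': 'verbose', 'value': None}
print(with_default)  # {'key': 'verbose', 'value': ''}
```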

Named capture groups

Specifying in a regular expression
- to match specified pattern: (?P<name>pattern)
- backreferences / match a previously defined group: (?P=name)
Specifying in a replacement string: \g<name>
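
The three forms above, sketched together. The sample sentences are invented for the example.

```python
import re

# (?P=word) must match the same text that the group "word" captured,
# so this pattern finds an accidentally doubled word.
doubled = re.compile(r"\b(?P<word>\w+) (?P=word)\b")

m = doubled.search("Paris in the the spring")
repeated = m.group("word")                                   # 'the'

# \g<word> in the replacement inserts the captured text once.
fixed = doubled.sub(r"\g<word>", "Paris in the the spring")  # 'Paris in the spring'

# Named groups can also be reordered in the replacement string.
swapped = re.sub(r"(?P<first>\w+) (?P<last>\w+)",
                 r"\g<last>, \g<first>",
                 "Ada Lovelace")                             # 'Lovelace, Ada'
```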

Specifying flags / compilation options

Ref: Compilation Flags

To specify flags inline in the regex, prefix the regex with (?FLAGS).
- e.g. For case insensitive matching, prefix with (?i).
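
A short sketch comparing an inline flag with the equivalent flag argument. Several inline flags can be combined in one prefix, e.g. `(?im)`.

```python
import re

# Without any flag the match is case-sensitive and fails.
plain = re.search(r"hello", "Say HELLO")                        # None

# These two are equivalent ways to ask for case-insensitive matching.
inline = re.search(r"(?i)hello", "Say HELLO")
explicit = re.search(r"hello", "Say HELLO", re.IGNORECASE)

# (?im) turns on IGNORECASE and MULTILINE together.
items = re.findall(r"(?im)^item", "Item\nITEM\nother")          # ['Item', 'ITEM']
```

Keep the inline flags at the very start of the pattern; recent Python versions reject global inline flags that appear later in the expression.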

Flag	Meaning
DOTALL, S	Make . match any character, including newlines
IGNORECASE, I	Do case-insensitive matches
LOCALE, L	Do a locale-aware match
MULTILINE, M	Multi-line matching, affecting ^ and $
VERBOSE, X	Enable verbose REs, which can be organized more cleanly and understandably.
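UNICODE, U	Makes several escapes like \w, \b, \s and \d dependent on the Unicode character database. In Python 3 this is already the default for str patterns; use ASCII, A to restrict these escapes to ASCII.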
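
A sketch of a few of these flags passed as arguments, assuming Python 3. Flags are combined with `|`. `re.ASCII` is not in the table above; it is included here only to show the contrast with the default Unicode behaviour.

```python
import re

text = "one\nTwo"

# DOTALL lets "." cross the newline.
no_dotall = re.search(r"one.Two", text)                   # None
with_dotall = re.search(r"one.Two", text, re.DOTALL)      # matches 'one\nTwo'

# MULTILINE and IGNORECASE combined with "|" (short names M and I).
lines = re.findall(r"^[a-z]+$", text, re.M | re.I)        # ['one', 'Two']

# Python 3 str patterns are Unicode-aware by default; ASCII narrows \w.
accented = "na\u00efve caf\u00e9"
unicode_words = re.findall(r"\w+", accented)              # ['naïve', 'café']
ascii_words = re.findall(r"\w+", accented, re.ASCII)      # ['na', 've', 'caf']
```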

Snippets

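import itertools
import re

text = "There are 24 hours in a day, 7 days in a week, 4 weeks in a month"

r = re.compile(r'\d+')

m = r.search(text)  # pass the variable, not the literal string "text"

# Renumber every match with a running counter (1, 2, 3, ...).
c = itertools.count(1)
re.sub(r'\d+', lambda m: str(next(c)), text)

# in_this_text is a placeholder for any input containing "index = N".
in_this_text = "index = 10, index = 20"
re.sub(r'index = (?P<counter>\d+)', lambda m: "index = {0}".format(next(c)), in_this_text)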


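# Verbose / multiline regex.
# (Python 3: use r"""...""", the Python 2 ur"" prefix is a syntax error.)
regex = re.compile(r"""
    \$ (?:
      (?P<name>\w+) |
      # Known limitation: this doesn't handle } inside the expression.
      \{(?P<expression>[^}]+)\}
    )
    """, re.VERBOSE)

# You can also use the (?x) flag instead of using re.VERBOSE.
# Keep (?x) at the very start of the pattern: newer Python versions
# reject global inline flags that appear later.
regex = re.compile(r"""(?x)
    Verbose multiline regex # comment
    """)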