De-mystifying Regex: How to Match Special Characters Literally in Python
Characters such as ., $, and * have a special meaning in regular expressions. Here's how to escape them in a Python regex pattern so that they match literally:
Using Backslashes (\)
The most common way to escape a character in a regex pattern is to put a backslash (\) before the character you want to match literally. Here are some examples:
import re

# Matching a literal dot
text = "This is a string. Here is another string."
pattern = r"\."  # Escaping the dot to match a literal dot
match = re.search(pattern, text)
if match:
    print(f"Match found at index {match.start()}")
else:
    print("No match found")

# Matching a literal dollar sign
text = "The cost is $10."
pattern = r"\$"  # Escaping the dollar sign to match a literal dollar sign
match = re.search(pattern, text)
if match:
    print(f"Match found at index {match.start()}")
else:
    print("No match found")

# Matching a literal asterisk
text = "Hello*world"
pattern = r"\*"  # Escaping the asterisk to match a literal asterisk
match = re.search(pattern, text)
if match:
    print(f"Match found at index {match.start()}")
else:
    print("No match found")
In these examples, the characters ., $, and * are each escaped with a backslash, so they are matched as literal characters in the text.
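To make the effect of the backslash concrete, here is a minimal sketch comparing an unescaped dot with an escaped one. The sample text and variable names are illustrative assumptions, not part of the examples above.

```python
import re

# Illustrative sample text (an assumption for this sketch).
text = "a.b axb"

# Unescaped: "." is a metacharacter that matches any character except a newline.
unescaped = re.findall(r"a.b", text)

# Escaped: "\." matches only a literal dot.
escaped = re.findall(r"a\.b", text)

print(unescaped)  # ['a.b', 'axb']
print(escaped)    # ['a.b']
```

The unescaped pattern also matches "axb", which is usually the bug that escaping is meant to prevent.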
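Using Raw Strings (r prefix)
Raw strings do not replace backslash escaping; they make it easier to write. A raw string is prefixed with the letter r before the opening quotation mark. In a raw string, Python leaves backslashes untouched instead of treating them as the start of a string escape sequence such as \n, so each backslash reaches the regex engine exactly as typed. Regex metacharacters keep their special meaning inside a raw string and must still be escaped.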
text = "This is a string. Here is another string."
# Both dots are escaped; an unescaped "." would match any character
pattern = r"This is a string\. Here is another string\."
match = re.search(pattern, text)
if match:
    print(f"Match found at index {match.start()}")
else:
    print("No match found")
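Here, the entire pattern string is defined as a raw string using the r prefix. The prefix does not remove the need to escape the dot (.); it only lets you write the escape as \. instead of doubling the backslash as \\. in an ordinary string literal.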
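The following sketch illustrates what the r prefix does and does not change. The sample strings and variable names are assumptions chosen for illustration.

```python
import re

# A raw string and a doubled-backslash string contain the same two characters.
raw_pattern = r"\."
plain_pattern = "\\."
same_pattern = raw_pattern == plain_pattern
print(same_pattern)  # True

# The r prefix does not disable metacharacters: r"." still matches any character.
any_char = re.findall(r".", "a.b")
literal_dot = re.findall(r"\.", "a.b")
print(any_char)     # ['a', '.', 'b']
print(literal_dot)  # ['.']

# Where the prefix matters: in an ordinary string "\b" is a backspace character,
# while r"\b" reaches the regex engine as a word-boundary assertion.
without_raw = re.findall("\bcat\b", "cat concat")
with_raw = re.findall(r"\bcat\b", "cat concat")
print(without_raw)  # []
print(with_raw)     # ['cat']
```

The last pair shows the kind of silent failure that raw strings avoid: the pattern without the prefix compiles without error but looks for backspace characters.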
Choosing the method:
The two techniques are complementary rather than alternatives. The backslash is what makes a metacharacter match literally; the r prefix only keeps Python from interpreting that backslash before the pattern reaches the re module. Without the prefix, every regex backslash has to be doubled (for example, "\\." instead of r"\."), so raw strings improve readability, especially in longer patterns with many special characters.
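When the text to match is not known in advance, or contains many metacharacters, escaping by hand becomes error-prone. The standard library's re.escape function, which is not covered above, can be used as a third option. This is a minimal sketch; the sample strings are hypothetical.

```python
import re

# Hypothetical literal text that happens to contain regex metacharacters.
literal = "$10.00 (approx.)"
text = "Price: $10.00 (approx.) per unit"

# re.escape returns a copy of the string with special characters backslash-escaped,
# so the result can be used as a pattern that matches the text literally.
pattern = re.escape(literal)

match = re.search(pattern, text)
start = match.start() if match else None
print(start)  # 7
```

This is most useful when the literal text comes from user input or another variable, since the escaped result can be combined with other pattern fragments.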