Regular Expressions, Searching, Replacing, Building

What is a Regular Expression

  • A regular expression (regex) is a pattern that matches a set of strings. It is built from operators, constructs, literal characters, and meta-characters.

📌 The grep command supports three regex syntaxes: basic (BRE), extended (ERE), and Perl-compatible (PCRE). Check some basic usage here.

  • Regular expressions are powerful and widely used, for example in search engines, programming languages, and text-processing applications.

Examples

# Find all email addresses in a file using grep
grep -E -o "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" filename.txt

# Python
r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)"

# JavaScript
/^(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/

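# MySQL (selects rows whose email does NOT match; the backslash is doubled because the string literal consumes one level of escaping)
SELECT * FROM `users` WHERE `email` NOT REGEXP '^[A-Z0-9._%-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}$';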
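
To show how such a pattern is applied in a program, here is a minimal Python sketch that reuses the Python pattern listed above. The helper name `is_email` is hypothetical and only for illustration. The pattern is a rough format check, not a full validation of email addresses.

```python
import re

# Pattern copied from the Python example above. It is anchored with ^ and $,
# so it has to match the whole string, not just a part of it.
EMAIL_RE = re.compile(r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)")


def is_email(text: str) -> bool:
    """Hypothetical helper: True when the whole string looks like an email."""
    return EMAIL_RE.match(text) is not None


print(is_email("jane.doe@example.com"))  # True
print(is_email("not-an-email"))          # False
```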

Searching with RegEx

  • There are four primary components of a regular expression:

    • character classes

    • quantifiers and alternation

    • anchors

    • roots and anchors
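
The components listed above can be tried out with Python's `re` module. This is a small sketch on a made-up sample string (the text and variable names are assumptions for illustration), covering character classes, quantifiers, alternation, and anchors.

```python
import re

text = "cat bat rat 2024-01-15 caaat ct"

# Character class: [cbr] matches exactly one of c, b, or r.
by_class = re.findall(r"[cbr]at", text)
print(by_class)      # ['cat', 'bat', 'rat']

# Quantifiers: a+ is "one or more a"; \d{4} is "exactly four digits".
by_quantifier = re.findall(r"ca+t", text)
print(by_quantifier)  # ['cat', 'caaat']
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)
print(dates)          # ['2024-01-15']

# Alternation: | matches either the left or the right alternative.
by_alternation = re.findall(r"cat|rat", text)
print(by_alternation)  # ['cat', 'rat']

# Anchors: ^ pins the match to the start of the string, $ to the end.
starts_with_cat = bool(re.search(r"^cat", text))
ends_with_ct = bool(re.search(r"ct$", text))
starts_with_bat = bool(re.search(r"^bat", text))
print(starts_with_cat, ends_with_ct, starts_with_bat)  # True True False
```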

Replacing with RegEx

  • Replacing text with regular expressions varies between implementations.

    • The RegExr site tool is useful for trying out replacements interactively.
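
As one concrete implementation, here is a minimal sketch of replacement with Python's `re.sub`. The sample strings are made up for illustration. Python writes backreferences in the replacement as `\1` or `\g<name>`, whereas JavaScript, for example, uses `$1`. This is one of the ways implementations differ.

```python
import re

# Numbered groups: swap "Last, First" into "First Last".
names = "Doe, Jane; Smith, John"
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", names)
print(swapped)  # Jane Doe; John Smith

# Named groups: rewrite DD/MM/YYYY as YYYY-MM-DD.
iso = re.sub(
    r"(?P<d>\d{2})/(?P<m>\d{2})/(?P<y>\d{4})",
    r"\g<y>-\g<m>-\g<d>",
    "Due 15/01/2024",
)
print(iso)  # Due 2024-01-15

# Plain replacement: every digit becomes '#'.
masked = re.sub(r"\d", "#", "PIN 1234")
print(masked)  # PIN ####
```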

Tips on Building RegEx

  • Regular expressions are very powerful but are not appropriate for every problem.

  • Quantifiers are greedy by default (they match as much as they can).

    • Add a ? after * or + to make the match lazy (match as little as possible).

  • Don't write an entire regex all at once.

    • Build a piece - test it - repeat

    • Use multiple, simpler, smaller expressions

  • Test with valid and invalid data - ensure the regex matches only what you want it to match.

  • Add comments using the x (extended, or verbose) modifier, in flavors that support it.
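
The difference between greedy and lazy matching from the tips above can be seen in a short Python sketch. The sample string is an assumption chosen so that the two patterns give visibly different results.

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: .* grabs as much as possible, so the match runs from the first
# <b> all the way to the last </b>.
greedy = re.findall(r"<b>.*</b>", html)
print(greedy)  # ['<b>one</b> and <b>two</b>']

# Lazy: .*? grabs as little as possible, so each tag pair matches separately.
lazy = re.findall(r"<b>.*?</b>", html)
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```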

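The tips on building in pieces, testing with valid and invalid data, and commenting can be combined. In Python the x modifier is spelled `re.VERBOSE`. The date pattern and the test strings below are assumptions chosen for illustration, and the pattern does not check month lengths.

```python
import re

# In verbose mode, whitespace in the pattern is ignored and '#' starts a
# comment, so each piece can be written and documented on its own line.
DATE_RE = re.compile(
    r"""
    ^                       # start of string
    (\d{4})                 # year: four digits
    -
    (0[1-9]|1[0-2])         # month: 01 to 12
    -
    (0[1-9]|[12]\d|3[01])   # day: 01 to 31 (month length is not checked)
    $                       # end of string
    """,
    re.VERBOSE,
)

valid = ["2024-01-15", "1999-12-31"]
invalid = ["2024-13-01", "2024-1-5", "2024-01-32", "x2024-01-15"]

# The regex should accept every valid sample and reject every invalid one.
accepted = [s for s in valid + invalid if DATE_RE.match(s)]
print(accepted)  # ['2024-01-15', '1999-12-31']
```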
