# A Complete Guide to Regular Expressions
Regular expressions, often referred to as regex or regexp, provide a powerful and flexible way to search, match, and manipulate text. They are widely used in programming, text processing, and data validation. This guide aims to provide a comprehensive overview of regular expressions, including syntax, common patterns, and practical examples.
## Table of Contents
- Introduction to Regular Expressions
- Basic Syntax and Matching
- Metacharacters
- Quantifiers
- Character Classes
- Anchors
- Grouping and Capturing
- Alternation
- Escape Characters
- Examples and Use Cases
### 1. Introduction to Regular Expressions
Regular expressions are patterns that describe sets of strings. They are used for pattern matching within strings. A regular expression is composed of ordinary characters and special characters called metacharacters.
### 2. Basic Syntax and Matching
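- `.` (dot): Matches any single character except a newline (most engines offer a flag that lets it match newlines too).
- `^`: Anchors the regex at the start of the input, or at the start of each line in multiline mode.
- `$`: Anchors the regex at the end of the input, or at the end of each line in multiline mode.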
Example:
`^Hello$`
This regex matches only when the entire line is exactly "Hello", with nothing before or after it.
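
The anchors and the dot can be tried out with a minimal Python sketch using the standard `re` module. The helper name `is_exact_hello` and the sample strings are illustrative assumptions, not part of any library.

```python
import re

# ^Hello$ is anchored at both ends, so nothing may precede or follow "Hello".
# (Python's $ also tolerates a single trailing newline.)
EXACT_HELLO = re.compile(r"^Hello$")

def is_exact_hello(text: str) -> bool:
    """Return True when the anchored pattern matches the text."""
    return EXACT_HELLO.search(text) is not None

print(is_exact_hello("Hello"))        # True
print(is_exact_hello("Hello world"))  # False: text follows "Hello"
print(is_exact_hello("Say Hello"))    # False: text precedes "Hello"

# The dot stands for any one character except a newline.
print(re.search(r"H.llo", "Hallo") is not None)   # True
print(re.search(r"H.llo", "H\nllo") is not None)  # False

# With re.MULTILINE, ^ and $ also match at the start and end of every line.
sample = "Hello\nHello world\nHello"
matching_lines = re.findall(r"^Hello$", sample, flags=re.MULTILINE)
print(matching_lines)  # ['Hello', 'Hello']
```

Other regex engines may treat `^`, `$`, and trailing newlines slightly differently, so it is worth checking the behavior in the engine you use.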
### 3. Metacharacters
- `*`: Matches zero or more occurrences of the preceding character or group.
- `+`: Matches one or more occurrences of the preceding character or group.
- `?`: Matches zero or one occurrence of the preceding character or group.
Example:
`\d+`
This regex matches one or more consecutive digits (`\d` is the digit class described in section 5).
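
As a rough illustration of how `*`, `+`, and `?` differ, the following Python sketch runs each one against small made-up strings. The variable names are hypothetical.

```python
import re

# ? makes the preceding "u" optional (zero or one occurrence).
optional_u = re.findall(r"colou?r", "color colour colouur")
print(optional_u)  # ['color', 'colour']

# * allows zero or more "b" characters between "a" and "c".
star = re.findall(r"ab*c", "ac abc abbbc")
print(star)  # ['ac', 'abc', 'abbbc']

# + requires at least one "b", so "ac" no longer matches.
plus = re.findall(r"ab+c", "ac abc abbbc")
print(plus)  # ['abc', 'abbbc']

# \d+ picks out each run of one or more digits.
digit_runs = re.findall(r"\d+", "Order 66 shipped 2024 items")
print(digit_runs)  # ['66', '2024']
```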
### 4. Quantifiers
- `{n}`: Matches exactly n occurrences of the preceding character or group.
- `{n,}`: Matches n or more occurrences of the preceding character or group.
- `{n,m}`: Matches between n and m occurrences (inclusive) of the preceding character or group.
Example:
`\w{3,6}`
This regex matches a run of 3 to 6 consecutive word characters (letters, digits, or underscore). Without anchors, it can also match part of a longer run.
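
The counted quantifiers can be checked with a short Python sketch. It assumes the standard `re` module; the helper `is_short_word` and the sample inputs are made up for illustration.

```python
import re

# {4} takes exactly four digits at a time, even out of a longer run.
four_digits = re.findall(r"\d{4}", "2024 and 12345")
print(four_digits)  # ['2024', '1234']

# {2,} needs at least two digits, so the lone "7" is skipped.
two_or_more = re.findall(r"\d{2,}", "7 42 1234")
print(two_or_more)  # ['42', '1234']

def is_short_word(text: str) -> bool:
    """Return True when the whole text is 3 to 6 word characters."""
    return re.fullmatch(r"\w{3,6}", text) is not None

print(is_short_word("abc"))      # True
print(is_short_word("ab"))       # False: too short
print(is_short_word("abcdefg"))  # False: too long for a full match

# An unanchored search still finds the first six characters of a longer run.
partial = re.search(r"\w{3,6}", "abcdefg").group(0)
print(partial)  # 'abcdef'
```

Using `re.fullmatch` (or explicit `^` and `$` anchors) is one way to enforce the length limit on the whole input rather than on a substring.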
### 5. Character Classes
- `\d`: Matches any digit (0-9).
- `\w`: Matches any word character (alphanumeric + underscore).
- `\s`: Matches any whitespace character (space, tab, newline).
Example:
`[A-Za-z]\d{2}`
This regex matches one uppercase or lowercase ASCII letter (the bracket expression `[A-Za-z]` lists the allowed ranges) followed by exactly two digits.
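
A small Python sketch, using invented sample strings, shows the shorthand classes and a bracket expression in action.

```python
import re

# One ASCII letter followed by exactly two digits.
codes = re.findall(r"[A-Za-z]\d{2}", "A12 b34 C5 7d89")
print(codes)  # ['A12', 'b34', 'd89']

# \s+ matches any run of whitespace, so it works as a flexible separator.
words = re.split(r"\s+", "one  two\tthree\nfour")
print(words)  # ['one', 'two', 'three', 'four']

# \w includes the underscore but not the hyphen.
tokens = re.findall(r"\w+", "snake_case, kebab-case")
print(tokens)  # ['snake_case', 'kebab', 'case']
```

In Python 3, `\d` and `\w` also match non-ASCII digits and letters on `str` patterns unless the `re.ASCII` flag is set.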
### 6. Anchors
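- `\b`: Word boundary (a position with a word character on one side and a non-word character or the edge of the input on the other).
- `\B`: Non-word boundary (any position that is not a word boundary).
- `^` (caret) and `$` (dollar): Match the start and end of the input, or of each line in multiline mode.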
Example:
`\bword\b`
This regex matches "word" only as a whole word, so it does not match inside "sword" or "wordy".
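
The effect of `\b` and `\B` can be sketched in Python as follows. The sample sentences are made up for illustration.

```python
import re

# \bword\b only matches when "word" stands alone.
whole_words = re.findall(r"\bword\b", "word sword wordy a word.")
print(whole_words)  # ['word', 'word']

# \Bword requires a word character right before "word", as in "sword".
inside = re.findall(r"\Bword", "word sword")
print(inside)  # ['word'] (the one inside "sword")

# Start positions make it clear which occurrences were matched.
positions = [m.start() for m in re.finditer(r"\bword\b", "word sword wordy a word.")]
print(positions)  # [0, 19]
```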
### 7. Grouping and Capturing
- `()`: Groups part of a pattern and captures the text it matches.
- `(?:)`: Non-capturing group; groups part of a pattern without capturing the text.
Example:
`(\d{3})-(\d{2})`
This regex captures a three-digit group and a two-digit group separated by a hyphen. The hyphen is matched but not captured.
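
To see what ends up in each group, here is a minimal Python sketch. The input string is a made-up example.

```python
import re

text = "ID 123-45 issued"

# Two capturing groups: the hyphen is matched but sits outside both groups.
m = re.search(r"(\d{3})-(\d{2})", text)
print(m.group(0))  # '123-45' (the whole match)
print(m.group(1))  # '123'
print(m.group(2))  # '45'
print(m.groups())  # ('123', '45')

# (?:...) still groups, but its text is not stored as a capture.
n = re.search(r"(?:\d{3})-(\d{2})", text)
print(n.groups())  # ('45',)
```

Non-capturing groups are handy when parentheses are needed only for grouping, since they keep the numbering of the remaining captures simple.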
### 8. Alternation
- `|`: Acts like a logical OR.
Example:
`cat|dog`
This regex matches either "cat" or "dog".
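
Alternation has low precedence, which the following Python sketch tries to illustrate with invented sample strings. Note that an unanchored `cat|dog` also matches inside longer words.

```python
import re

# Plain alternation also finds "cat" inside "catalog".
animals = re.findall(r"cat|dog", "cat dog bird catalog")
print(animals)  # ['cat', 'dog', 'cat']

# A group plus word boundaries restricts matches to whole words.
whole = re.findall(r"\b(?:cat|dog)s?\b", "cats dog catalog dogs")
print(whole)  # ['cats', 'dog', 'dogs']

# Without a group, | splits the entire pattern: "gra" OR "ey".
ungrouped = re.findall(r"gra|ey", "gray grey")
print(ungrouped)  # ['gra', 'ey']

# With a group, only the single letter alternates.
grouped = re.findall(r"gr(?:a|e)y", "gray grey")
print(grouped)  # ['gray', 'grey']
```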
### 9. Escape Characters
- `\`: Escapes a metacharacter, treating it as a literal character.
Example:
`\d\.\d`
This regex matches a digit followed by a literal dot and another digit.
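
The difference between an escaped and an unescaped dot can be sketched in Python like this. The sample strings are illustrative; `re.escape` is the standard-library helper for escaping literal text.

```python
import re

# Escaped dot: only a literal "." is accepted between the digits.
literal_dot = re.findall(r"\d\.\d", "3.5 3x5 10.25")
print(literal_dot)  # ['3.5', '0.2']

# Unescaped dot: any character is accepted, so "3x5" matches too.
any_char = re.findall(r"\d.\d", "3.5 3x5")
print(any_char)  # ['3.5', '3x5']

# re.escape builds a pattern that matches a piece of text literally.
escaped = re.escape("1.5")
print(escaped)  # 1\.5
print(re.search(escaped, "v1.5") is not None)  # True
print(re.search(escaped, "v1x5") is not None)  # False
```

Using raw strings (`r"..."`) keeps Python from interpreting the backslashes before the regex engine sees them.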
### 10. Examples and Use Cases
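- Email Validation (a simplified check that does not cover every valid address): `^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`
- URL Matching (`http`, `https`, or `ftp` schemes): `^(https?|ftp)://[^\s/$.?#].[^\s]*$`
- Phone Number Matching (E.164-style: an optional `+`, then 2 to 15 digits with a non-zero first digit): `^\+?[1-9]\d{1,14}$`
- Extracting Date from Text (`YYYY-MM-DD`, with year, month, and day captured separately): `(\d{4})-(\d{2})-(\d{2})`
- HTML Tag Matching (simple, non-nested elements; `\1` requires the closing tag name to match the opening one): `<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>(.*?)</\1>`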
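
The patterns above can be exercised with a short Python sketch. The constant names and all sample inputs are hypothetical, and the checks are only as strict as the patterns themselves.

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")
PHONE = re.compile(r"^\+?[1-9]\d{1,14}$")
DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
TAG = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>(.*?)</\1>")

# Email: needs a local part, "@", a domain, and a dot-separated suffix.
print(EMAIL.match("user.name+tag@example.com") is not None)  # True
print(EMAIL.match("user@localhost") is not None)             # False: no ".tld"

# Phone: optional "+", non-zero first digit, 2 to 15 digits in total.
print(PHONE.match("+14155552671") is not None)  # True
print(PHONE.match("0123") is not None)          # False: leading zero

# Date: the three groups hold year, month, and day as strings.
date_match = DATE.search("Released on 2024-03-15.")
year, month, day = date_match.groups()
print(year, month, day)  # 2024 03 15

# HTML tag: group 1 is the tag name, group 2 the inner text.
tag_match = TAG.search('<b class="x">bold</b>')
print(tag_match.group(1), tag_match.group(2))  # b bold
```

The date pattern checks only the shape of the text, so a string such as `2024-99-99` would also match. Likewise, the HTML pattern handles simple cases only, and a dedicated parser is usually a better fit for real markup.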
Remember that while regular expressions are powerful, they can be complex. It's essential to test and validate them thoroughly.
This guide provides a foundation for understanding regular expressions, but there is much more to explore. Practice and experimentation are key to mastering this powerful tool.