Showing posts with label Regex. Show all posts
Showing posts with label Regex. Show all posts

Thursday, November 23, 2023

 # A Complete Guide to Regular Expressions

Regular expressions, often referred to as regex or regexp, provide a powerful and flexible way to search, match, and manipulate text. They are widely used in programming, text processing, and data validation. This guide aims to provide a comprehensive overview of regular expressions, including syntax, common patterns, and practical examples.


## Table of Contents

  1. Introduction to Regular Expressions
  2. Basic Syntax and Matching
  3. Metacharacters
  4. Quantifiers
  5. Character Classes
  6. Anchors
  7. Grouping and Capturing
  8. Alternation
  9. Escape Characters
  10. Examples and Use Cases


### 1. Introduction to Regular Expressions

Regular expressions are patterns that describe sets of strings. They are used for pattern matching within strings. A regular expression is composed of ordinary characters and special characters called metacharacters.


### 2. Basic Syntax and Matching

- `.` (dot): Matches any single character except a newline.
- `^`: Anchors the regex at the start of a line.
- `$`: Anchors the regex at the end of a line.

Example:
^Hello$


This regex matches the string "Hello" only if it appears at the beginning of a line.

### 3. Metacharacters

- `*`: Matches zero or more occurrences of the preceding character or group.
- `+`: Matches one or more occurrences of the preceding character or group.
- `?`: Matches zero or one occurrence of the preceding character or group.

Example:
\d+ 

This regex matches one or more digits.

### 4. Quantifiers

- `{n}`: Matches exactly n occurrences of the preceding character or group.
- `{n,}`: Matches n or more occurrences of the preceding character or group.
- `{n,m}`: Matches between n and m occurrences of the preceding character or group.

Example:
\w{3,6}

This regex matches word characters (alphanumeric + underscore) with a length between 3 and 6.

### 5. Character Classes

- `\d`: Matches any digit (0-9).
- `\w`: Matches any word character (alphanumeric + underscore).
- `\s`: Matches any whitespace character (space, tab, newline).

Example:
[A-Za-z]\d{2}

This regex matches an uppercase or lowercase letter followed by two digits.

### 6. Anchors

- `\b`: Word boundary.
- `\B`: Non-word boundary.
- `^` (caret) and `$` (dollar): Match the start and end of a line, respectively.

Example:
\bword\b

This regex matches the word "word" as a whole word.

### 7. Grouping and Capturing

- `()`: Groups characters together.
- `(?:)`: Non-capturing group.

Example:
(\d{3})-(\d{2})


This regex captures a three-digit group, a hyphen, and a two-digit group.

### 8. Alternation

- `|`: Acts like a logical OR.

Example:
cat|dog

This regex matches either "cat" or "dog".

### 9. Escape Characters

- `\`: Escapes a metacharacter, treating it as a literal character.

Example:
\d\.\d

This regex matches a digit followed by a literal dot and another digit.

### 10. Examples and Use Cases

- Email Validation:
  ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

- URL Matching:
  ^(https?|ftp)://[^\s/$.?#].[^\s]*$

- Phone Number Matching:
  ^\+?[1-9]\d{1,14}$

- Extracting Date from Text:
  (\d{4})-(\d{2})-(\d{2})

- HTML Tag Matching:
  <([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>(.*?)</\1>


Remember that while regular expressions are powerful, they can be complex. It's essential to test and validate them thoroughly.

This guide provides a foundation for understanding regular expressions, but there is much more to explore. Practice and experimentation are key to mastering this powerful tool.