Untitled — Python Coding — Nerchuko Academy

Email Validation with Regular Expressions

MEDIUM

Write a Python function to validate an email address using regular expressions. The function should return True if the email address is valid according to a common pattern, and False otherwise.

A common (simplified) pattern for an email is: username@domain.extension

username: Can contain letters (a-z, A-Z), numbers (0-9), periods (.), underscores (_), percent signs (%), plus signs (+), and hyphens (-).
domain: Can contain letters, numbers, and hyphens (-). It typically consists of one or more parts separated by dots.
extension: The top-level domain (TLD) usually consists of 2 or more letters (e.g., .com, .org). In a multi-part suffix such as .co.uk, the TLD is the final label (uk).

Examples:

Input: "test.user+label@example.com"   Output: True
Input: "user@sub.domain.co.uk"       Output: True
Input: "invalid_email@"               Output: False
Input: "@domain.com"                 Output: False
Input: "user@domain"                 Output: False (missing .extension)
Input: "user@domain.c"               Output: False (extension too short)

Constraints:

The input will be a string.

Function Signature (Python):

import re

class Solution:
    def is_valid_email(self, email_address: str) -> bool:
        # Your code here
        pass

Solution: Email Validation with Regex

The Goal: We want to write a Python function that can look at a string and tell us if it looks like a valid email address (e.g., "name@example.com"). We'll use "regular expressions" (regex), which are special patterns that describe sequences of characters.

Creating a regex that perfectly matches all theoretically valid email addresses according to official standards (RFCs) is incredibly complex. For interviews, a reasonably robust pattern that covers common email formats is usually sufficient.

Approach: Using Python's `re` module and a Regex Pattern

The process involves:

Defining a regular expression pattern that describes the structure of a common email address.
Using a function from Python's re module (like re.fullmatch()) to test if the input email string conforms to this pattern.
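To see why the choice of matching function matters, the sketch below runs re.search(), re.match(), and re.fullmatch() against the same pattern, deliberately written without anchors. The names UNANCHORED and check_all and the sample strings are made up for this illustration.

```python
import re

# Same structure as the email pattern, deliberately written without ^ and $.
UNANCHORED = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"


def check_all(text: str) -> dict:
    """Report whether each matching function accepts `text`."""
    return {
        "search": re.search(UNANCHORED, text) is not None,        # match anywhere
        "match": re.match(UNANCHORED, text) is not None,          # match at the start
        "fullmatch": re.fullmatch(UNANCHORED, text) is not None,  # match the whole string
    }


if __name__ == "__main__":
    samples = [
        "user@example.com",
        "user@example.com and more",
        "see user@example.com",
    ]
    for sample in samples:
        print(repr(sample), check_all(sample))
```

Only re.fullmatch() rejects both strings that merely contain an email address, which is the behavior a validator needs.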

Constructing the Regex Pattern:

A common pattern structure is username@domain.extension.

^ : Matches the beginning of the string.
Username part: [a-zA-Z0-9._%+-]+
- [a-zA-Z0-9._%+-]: A character class allowing lowercase letters, uppercase letters, digits, period, underscore, percent, plus, or hyphen.
- +: Matches one or more occurrences of the preceding character class.
@ : Matches the literal "@" symbol.
Domain name part: [a-zA-Z0-9.-]+
- [a-zA-Z0-9.-]: A character class allowing letters, digits, period, or hyphen. Note that the hyphen is usually placed at the end or beginning of a character class, or escaped, to avoid being interpreted as a range.
- +: Matches one or more occurrences. This allows for subdomains (e.g., mail.example).
\. : Matches a literal dot (period). The backslash escapes the dot, as . by itself is a special regex character meaning "any character".
Extension (TLD) part: [a-zA-Z]{2,}
- [a-zA-Z]: Allows only letters for the TLD.
- {2,}: Matches two or more occurrences of letters (e.g., "com", "org", "co", "info").
$ : Matches the end of the string.

Combining these gives the pattern: r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$". The r"" denotes a raw string, which is good practice for regex patterns to avoid issues with backslashes.
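One optional way to keep the breakdown next to the pattern itself is the re.VERBOSE flag, which ignores whitespace in the pattern and allows inline comments. The following is a minimal sketch, assuming a module-level compiled pattern named EMAIL_RE (a name chosen for this example). The ^ and $ anchors are left out because re.fullmatch() already requires the whole string to match.

```python
import re

# EMAIL_RE is an illustrative name; the pattern has the same parts as described above.
EMAIL_RE = re.compile(
    r"""
    [a-zA-Z0-9._%+-]+   # username: one or more allowed characters
    @                   # literal at-sign
    [a-zA-Z0-9.-]+      # domain: letters, digits, dots, hyphens
    \.                  # literal dot before the extension
    [a-zA-Z]{2,}        # extension: two or more letters
    """,
    re.VERBOSE,
)


def is_valid_email(email_address: str) -> bool:
    """Return True if the whole string matches the simplified email pattern."""
    return EMAIL_RE.fullmatch(email_address) is not None


if __name__ == "__main__":
    for email in ["test.user+label@example.com", "user@domain", "user@domain.c"]:
        print(f"{email!r}: {is_valid_email(email)}")
```

Compiling once at module level also avoids looking the pattern up on every call, although the re module caches recently used patterns anyway.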

import re

class Solution:
    def is_valid_email(self, email_address: str) -> bool:
        # Regular expression for validating an Email
        # ^ : Start of string
        # [a-zA-Z0-9._%+-]+ : Username part (one or more of allowed characters)
        # @ : Literal "@" symbol
        # [a-zA-Z0-9.-]+ : Domain name part (one or more of allowed characters for domain)
        # \. : Literal dot (escaped)
        # [a-zA-Z]{2,} : Top-level domain (TLD), 2 or more letters
        # $ : End of string
        pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
        
        # re.fullmatch() checks if the entire string matches the pattern.
        # It returns a match object if there is a match, None otherwise.
        return re.fullmatch(pattern, email_address) is not None

# Example Usage:
# sol = Solution()
# emails_to_test = [
#     "test.user+label@example.com",
#     "user@sub.domain.co.uk",
#     "simple@domain.com",
#     "invalid_email@",
#     "@domain.com",
#     "user@domain",
#     "user@domain.c",
#     "user@domain..com" # Accepted by this regex (prints True), even though a double dot in the domain is invalid
# ]
# for email in emails_to_test:
#     print(f"'{email}': {sol.is_valid_email(email)}")

Complexity Analysis:

Time Complexity: For most practical regex patterns and input strings, the time complexity of matching with Python's re module (which uses a backtracking NFA engine) can be considered roughly O(L) on average, where L is the length of the input string. However, poorly constructed regex patterns (especially those with nested quantifiers and backtracking, known as "catastrophic backtracking") can lead to exponential time complexity in worst-case scenarios. The provided pattern is generally well-behaved.
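To get a feel for catastrophic backtracking, the sketch below times a failed match for a nested-quantifier pattern and for an equivalent flat pattern. Both patterns (NESTED and FLAT) and the helper time_failed_match are hypothetical and unrelated to the email solution; the timings depend on the machine and Python version, so treat the output as indicative only.

```python
import re
import time

# Hypothetical patterns for illustration only; neither is part of the solution.
NESTED = re.compile(r"(a+)+$")  # nested quantifiers: prone to heavy backtracking
FLAT = re.compile(r"a+$")       # accepts the same strings with a single quantifier


def time_failed_match(pattern: re.Pattern, n: int) -> float:
    """Return the seconds spent failing to match 'a' * n followed by '!'."""
    text = "a" * n + "!"
    start = time.perf_counter()
    result = pattern.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing "!" makes every attempt fail
    return elapsed


if __name__ == "__main__":
    # Keep n small: the nested pattern's running time grows quickly with n.
    for n in (10, 14, 18):
        nested = time_failed_match(NESTED, n)
        flat = time_failed_match(FLAT, n)
        print(f"n={n}: nested={nested:.5f}s flat={flat:.5f}s")
```

On a backtracking engine, the nested pattern's time typically grows much faster than the flat pattern's as n increases, because the engine retries many ways of splitting the run of "a" characters between the inner and outer quantifier.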

Space Complexity: O(1) for this specific implementation if we don't count the storage for the pattern string itself or the input string. The regex engine might use some space during matching, but it's typically not proportional to the input length for simple patterns.

Key Takeaways for Interviews:

Acknowledge Regex Complexity: Start by mentioning that a truly RFC-compliant email regex is very complex and usually not expected in an interview. State that you'll provide a pattern for common cases.
Break Down the Pattern: Explain each part of your regex (username, @, domain, TLD, anchors). This shows you understand how it works.
Use re.fullmatch() or Anchors: Emphasize the need to match the entire string. re.fullmatch() is ideal for this. re.match() only anchors at the start of the string, so with it the pattern must still end with $ (or \Z, since $ also matches just before a trailing newline). re.search() finds a match anywhere, so it needs both ^ and an end anchor and is usually not what you want for full string validation.
Raw Strings (r""): Use raw strings for regex patterns to avoid issues with backslashes being interpreted as Python escape sequences.
Discuss Limitations: Be ready to discuss what your regex doesn't cover (e.g., quoted usernames, IP addresses as domains, very new TLDs if your TLD length is too restrictive, comments in emails, etc.). This shows a deeper understanding. For example, the pattern [a-zA-Z0-9.-]+ for the domain allows consecutive dots (e.g., domain..com) and domain labels that start or end with - or ., which are invalid. A more robust domain part might be ([a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,} but this is getting much more complex.
Alternative (No Regex): Briefly, one could use string methods (split('@'), check parts), but it becomes much more cumbersome and error-prone than a well-crafted regex for this task.
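As a sketch of the stricter domain handling mentioned under limitations, the version below keeps the same username part and swaps in the more robust domain pattern, with the outer group made non-capturing. The names STRICT_EMAIL_RE and is_valid_email_strict are assumptions for this example, and the result is still a simplification rather than an RFC-compliant validator.

```python
import re

# Username part unchanged; each domain label must start and end with a letter
# or digit and is at most 63 characters long.
STRICT_EMAIL_RE = re.compile(
    r"[a-zA-Z0-9._%+-]+@"
    r"(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+"
    r"[a-zA-Z]{2,}"
)


def is_valid_email_strict(email_address: str) -> bool:
    """Return True if the whole string matches the stricter pattern."""
    return STRICT_EMAIL_RE.fullmatch(email_address) is not None


if __name__ == "__main__":
    for email in [
        "user@sub.domain.co.uk",
        "user@domain..com",   # consecutive dots in the domain
        "user@-domain.com",   # label starts with a hyphen
        "user@domain-.com",   # label ends with a hyphen
    ]:
        print(f"{email!r}: {is_valid_email_strict(email)}")
```

The username part is unchanged, so inputs such as a username with leading or consecutive dots are still accepted by this sketch.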
