gomills/email_generator_from_name

Random Email Generator

A Python library for generating realistic, human-like email addresses from names and surnames. The generator applies various transformations and patterns to create diverse email variations that appear natural and authentic.

Please note: this code comes from a spaghetti-coded script that I quickly wrote in an afternoon for another side project. Since it turned out to be curious and useful, I did a minimal refactoring so that other people can benefit from it. The architecture is still somewhat spaghetti-coded and could use optimization, but I leave that to each developer's needs. Right now it works quite well, and that's what matters.

Examples

Sample output for "John Doe" (50 generated emails). The grouping below was done by hand afterwards; the code itself does not classify its output:

Classic Format | Abbreviated | Numeric Variations | Creative Mutations
[email protected] | [email protected] | [email protected] | [email protected]
[email protected] | [email protected] | [email protected] | [email protected]
[email protected] | [email protected] | [email protected] | [email protected]
[email protected] | [email protected] | [email protected] | [email protected]
[email protected] | [email protected] | [email protected] | [email protected]
[email protected] | [email protected] | [email protected] | [email protected]
[email protected] | [email protected] | [email protected] | [email protected]
[email protected] | [email protected] | [email protected] | [email protected]
[email protected] | [email protected] | [email protected] | [email protected]
[email protected] | [email protected] | [email protected]

Pattern Analysis

  • Separators: ., _, -, or no separator
  • Abbreviations: Single letters, truncated names
  • Numeric additions: Years, random digits, prefixes/suffixes
  • Mutations: Letter substitution (o→0), duplication (bb, xx), vowel dropping
  • Domains: gmail.com, icloud.com, yahoo.com, outlook.com

Features

  • Human-like Output: Generates emails that feel authentic rather than bot-generated
  • Multiple Transformations: Applies various mutations including leetspeak, vowel dropping, random separators, and digit insertion
  • Configurable Settings: Generation parameters can be customized through settings.py
  • Collision Analysis: Built-in testing framework to analyze duplicate rates

Installation

# Clone the repository
git clone <repository-url>
cd random_email_gen

# Install dependencies
uv sync

Usage

from src.generate_rd_email import generate_email

# Generate a single email
email = generate_email("john", "doe")
print(email)  # Example: [email protected]

Statistical Analysis

Duplicate Rate Testing

Comprehensive testing was conducted to analyze the generator's collision rates across different scales:

Sample Size | Runs | Duplicate Rate | Estimated Unique Space
100 | 10 | 0.5% | ~1.0×10⁴
1,000 | 10 | 3.32% | ~1.5×10⁴
10,000 | 10 | 9.6% | ~4.9×10⁴
100,000 | 10 | 11.6% | ~4.0×10⁵
1,000,000 | 10 | 8.45% | ~5.6×10⁶
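The "Estimated Unique Space" figures are consistent with a simple uniform model, which can be sketched as follows. This is an illustration, not code from testing.py: it assumes "duplicate rate" means the fraction of generated emails that repeat an earlier one, and that all S addresses are equally likely, so k draws yield about S·(1 − e^(−k/S)) unique values. The function names are hypothetical.

```python
import math


def duplicate_rate(samples):
    """Fraction of samples that repeat an earlier value (assumed definition)."""
    return 1.0 - len(set(samples)) / len(samples)


def expected_duplicate_rate(k, space):
    """Expected duplicate rate for k uniform draws from `space` outcomes."""
    # Expected unique count under the Poisson approximation: S * (1 - e^(-k/S)).
    expected_unique = -space * math.expm1(-k / space)
    return 1.0 - expected_unique / k


def estimate_unique_space(k, observed_rate):
    """Bisect on a log scale for the space size that matches observed_rate."""
    lo, hi = 1.0, 1e15
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if expected_duplicate_rate(k, mid) > observed_rate:
            lo = mid  # model collides too often: the space must be larger
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Sample sizes and duplicate rates copied from the table above.
for k, rate in [(100, 0.005), (1_000, 0.0332), (10_000, 0.096),
                (100_000, 0.116), (1_000_000, 0.0845)]:
    print(f"{k:>9,}  {estimate_unique_space(k, rate):.1e}")
```

Because the model assumes a uniform distribution, each estimate is better read as an "apparent" space size at that sample size than as a true count of possible addresses.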

Key Findings

1. Non-Uniform Distribution

The generator exhibits structured, template-based behavior rather than purely random generation. The effective unique space (S) grows with sample size (k), indicating:

  • Multiple generation templates with varying probabilities
  • Deterministic suffix rules and character set limitations
  • Scale-dependent collision-avoidance mechanisms
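A toy calculation shows why a skewed distribution makes the apparent space grow with sample size. The two-tier distribution below is entirely hypothetical (it is not measured from the generator): a few common addresses absorb most early collisions, and the rare ones only start to matter at larger k. The estimate uses the small-rate approximation S ≈ k / (2d).

```python
import math

# Hypothetical two-tier space, as (probability per address, address count):
# 50 common addresses share 20% of the mass, 1,000,000 rare ones share 80%.
TIERS = [(0.20 / 50, 50), (0.80 / 1_000_000, 1_000_000)]


def expected_duplicate_rate(k, tiers):
    """Expected duplicate rate for k independent draws from the tiers."""
    # An address with probability p appears at least once with
    # probability 1 - (1 - p)^k; summing gives the expected unique count.
    expected_unique = sum(
        count * -math.expm1(k * math.log1p(-p)) for p, count in tiers
    )
    return 1.0 - expected_unique / k


def apparent_space(k):
    """Uniform-model estimate S ~ k / (2 d), valid for small duplicate rates."""
    return k / (2.0 * expected_duplicate_rate(k, TIERS))


for k in (100, 1_000, 10_000):
    print(k, round(apparent_space(k)))
```

In this toy model the estimate rises steadily with k even though the underlying space never changes, which matches the qualitative pattern in the table.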

2. Scale-Dependent Behavior

Small Scale (100-1,000 emails):

  • Very low collision rates (0.5-3.32%)
  • Large apparent unique space
  • Template diversity dominates

Medium Scale (10,000-100,000 emails):

  • Higher collision rates (9.6-11.6%)
  • Template limitations become apparent
  • Combinatorial constraints emerge

Large Scale (1,000,000 emails):

  • Stabilized collision rate (~8.45%)
  • Systematic generation strategy evident
  • Numeric suffix cycling dominates

3. Design Trade-offs

The constrained generation space is a deliberate design choice:

Advantages:

  • Every generated email appears authentically human
  • Avoids bot-like patterns
  • Maintains realistic email formatting conventions

⚠️ Limitations:

  • Finite unique space (~10⁴-10⁶ combinations per name pair)
  • Higher collision rates at scale
  • Not suitable for applications requiring unlimited unique addresses
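When a batch of distinct addresses is needed, one way to work within the finite space is to retry on collision with a bounded attempt budget. The helper below is a sketch, not part of the library; `unique_batch` and its parameters are hypothetical, and it accepts any zero-argument callable so it can wrap `generate_email`.

```python
from typing import Callable, List


def unique_batch(generate: Callable[[], str], count: int,
                 attempts_per_email: int = 20) -> List[str]:
    """Collect up to `count` distinct values, giving up after a fixed budget."""
    seen = set()
    batch = []
    budget = count * attempts_per_email
    while len(batch) < count and budget > 0:
        budget -= 1
        email = generate()
        if email not in seen:  # skip anything already produced
            seen.add(email)
            batch.append(email)
    return batch  # may be shorter than `count` if the space is exhausted
```

Assuming the import shown under Usage, a call would look like `unique_batch(lambda: generate_email("john", "doe"), 100)`. The caller should check the returned length, since a request larger than the unique space cannot be satisfied.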

Technical Architecture

Core Components

  • generate_rd_email.py: Main email generation engine
  • random_choices.py: Transformation and mutation functions
  • settings.py: Configuration parameters
  • testing.py: Performance testing utilities

Generation Pipeline

  1. Name Processing: Normalize and prepare input names
  2. Separator Selection: Choose joining characters (., _, etc.)
  3. Mutation Application: Apply random transformations:
    • Leetspeak conversion (e → 3, a → 4)
    • Vowel dropping
    • Random symbol insertion
    • Prepending or appending one or more digits
    • Letter duplication
  4. Domain Assignment: Select from a pool of realistic domains using predefined probability weights
  5. Validation: Ensure output meets heuristic standards
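The five stages above can be illustrated with a heavily simplified sketch. This is not the code in generate_rd_email.py or random_choices.py: the function name `sketch_email`, the mutation probabilities, the domain weights, and the validation pattern are all assumptions chosen for illustration, and only a subset of the listed mutations is shown.

```python
import random
import re

SEPARATORS = [".", "_", "-", ""]
LEET = str.maketrans({"e": "3", "a": "4", "o": "0"})
DOMAINS = ["gmail.com", "icloud.com", "yahoo.com", "outlook.com"]
DOMAIN_WEIGHTS = [0.6, 0.1, 0.15, 0.15]  # assumed weights, not the project's
LOCAL_PART = re.compile(r"[a-z0-9]+(?:[._-][a-z0-9]+)*")  # assumed heuristic


def sketch_email(name, surname, rng=None):
    if rng is None:
        rng = random.Random()
    # 1. Name processing: lowercase and keep ASCII letters only.
    first = re.sub(r"[^a-z]", "", name.lower())
    last = re.sub(r"[^a-z]", "", surname.lower())
    if not first or not last:
        raise ValueError("both names need at least one ASCII letter")
    # 2. Separator selection.
    sep = rng.choice(SEPARATORS)
    # 3. Mutation application: at most one name mutation, then optional digits.
    roll = rng.random()
    if roll < 0.2:
        first = first[0]  # abbreviate to an initial
    elif roll < 0.3:
        last = last[0] + re.sub(r"[aeiou]", "", last[1:])  # vowel dropping
    elif roll < 0.4:
        first = first.translate(LEET)  # leetspeak conversion
    local = first + sep + last
    if rng.random() < 0.4:
        local += str(rng.randint(1, 99))  # digit suffix
    # 4. Domain assignment with weighted choice.
    domain = rng.choices(DOMAINS, weights=DOMAIN_WEIGHTS, k=1)[0]
    # 5. Validation: fall back to plain concatenation if the check fails.
    if not LOCAL_PART.fullmatch(local):
        local = first + last
    return f"{local}@{domain}"


print(sketch_email("John", "Doe", random.Random(0)))
```

Passing a seeded `random.Random` makes the output reproducible, which helps when comparing collision rates between runs.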

Conclusion

This generator prioritizes quality over quantity, producing emails that consistently pass the "human test" while acknowledging the inherent trade-off with uniqueness at scale. The structured approach ensures reliable, realistic output suitable for testing, demos, and applications where authentic-looking email addresses are more valuable than unlimited uniqueness.
