A Python library for generating realistic, human-like email addresses from names and surnames. The generator applies various transformations and patterns to create diverse email variations that appear natural and authentic.
Please note: this code comes from a spaghetti-coded script that I wrote quickly in one afternoon for another side project. Since it turned out to be curious and useful, I did a minimal refactoring so that other people can benefit from it. The architecture is still messy and some optimizations are needed, but I leave those to each developer's needs. Right now it works quite well, and that's what matters.
Sample output for "John Doe" (50 generated emails; this classification was done afterwards, and the code does not distinguish between these categories):
- Separators: ., _, -, (no separator)
- Abbreviations: Single letters, truncated names
- Numeric additions: Years, random digits, prefixes/suffixes
- Mutations: Letter substitution (o→0), duplication (bb, xx), vowel dropping
- Domains: gmail.com, icloud.com, yahoo.com, outlook.com
- Human-like Output: Generates emails that feel authentic rather than bot-generated
- Multiple Transformations: Applies various mutations including leetspeak, vowel dropping, random separators, and digit insertion
- Configurable Settings: Customizable generation parameters through settings
- Collision Analysis: Built-in testing framework to analyze duplicate rates
    # Clone the repository
    git clone <repository-url>
    cd random_email_gen
    # Install dependencies
    uv sync

Usage from Python:

    from src.generate_rd_email import generate_email
    # Generate a single email
    email = generate_email("john", "doe")
    print(email)  # Example: [email protected]

Comprehensive testing was conducted to analyze the generator's collision rates across different scales:
| Sample Size | Runs | Duplicate Rate | Estimated Unique Space |
|---|---|---|---|
| 100 | 10 | 0.5% | ~1.0×10⁴ |
| 1,000 | 10 | 3.32% | ~1.5×10⁴ |
| 10,000 | 10 | 9.6% | ~4.9×10⁴ |
| 100,000 | 10 | 11.6% | ~4.0×10⁵ |
| 1,000,000 | 10 | 8.45% | ~5.6×10⁶ |
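
The table does not state how the estimated unique space was derived. One plausible reading, offered here as an assumption, is the uniform occupancy model: if k emails are drawn uniformly from S possibilities, the expected number of distinct emails is S(1 − e^(−k/S)). The sketch below inverts that relation numerically, treating the duplicate rate as (k − distinct) / k. The function names are hypothetical and not part of the library. Under these assumptions it gives values close to the table's figures, for example roughly 4.9×10⁴ for 10,000 samples at a 9.6% duplicate rate.

```python
import math


def expected_unique(space_size: float, draws: int) -> float:
    """Expected number of distinct values after `draws` uniform draws from `space_size` options."""
    # expm1 keeps precision when draws / space_size is very small.
    return space_size * -math.expm1(-draws / space_size)


def estimate_space(draws: int, duplicate_rate: float) -> float:
    """Estimate S from an observed duplicate rate (assumes uniform sampling, rate > 0)."""
    observed_unique = draws * (1.0 - duplicate_rate)
    # expected_unique grows with space_size, so plain bisection is enough.
    low, high = observed_unique, 1e15
    for _ in range(200):
        mid = (low + high) / 2
        if expected_unique(mid, draws) < observed_unique:
            low = mid
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    for draws, rate in [(100, 0.005), (10_000, 0.096), (1_000_000, 0.0845)]:
        print(f"k={draws:>9,}  rate={rate:.2%}  S ~ {estimate_space(draws, rate):.2e}")
```

Because a truly uniform space would yield the same S at every sample size, an estimate that keeps growing is a sign that the uniform assumption does not hold for this generator.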
The generator exhibits structured, template-based behavior rather than purely random generation. The estimated unique space (S) grows with sample size (k), whereas uniform sampling from a fixed set of addresses would give a roughly constant estimate. This suggests:
- Multiple generation templates with varying probabilities
- Deterministic suffix rules and character set limitations
- Scale-dependent collision-avoidance mechanisms
Small Scale (100-1,000 emails):
- Very low collision rates (0.5-3.32%)
- Large apparent unique space
- Template diversity dominates
Medium Scale (10,000-100,000 emails):
- Higher collision rates (9.6-11.6%)
- Template limitations become apparent
- Combinatorial constraints emerge
Large Scale (1,000,000 emails):
- Stabilized collision rate (~8.45%)
- Systematic generation strategy evident
- Numeric suffix cycling dominates
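
A measurement like the ones above can be reproduced with a small harness. The following is a minimal sketch, not the repository's testing.py: it assumes the duplicate rate is the fraction of generated emails that repeat an earlier one within the same run, and it uses a hypothetical stand-in generator so that it runs on its own. To measure the real generator, pass something like `lambda: generate_email("john", "doe")` instead.

```python
import random
from typing import Callable


def duplicate_rate(generate: Callable[[], str], sample_size: int) -> float:
    """Fraction of generated values that repeat an earlier value in the same run."""
    seen = set()
    duplicates = 0
    for _ in range(sample_size):
        value = generate()
        if value in seen:
            duplicates += 1
        else:
            seen.add(value)
    return duplicates / sample_size


def mean_duplicate_rate(generate: Callable[[], str], sample_size: int, runs: int = 10) -> float:
    """Average the duplicate rate over several independent runs."""
    return sum(duplicate_rate(generate, sample_size) for _ in range(runs)) / runs


# Hypothetical stand-in: uniform draws from 50,000 possible addresses.
_rng = random.Random(0)


def toy_generator() -> str:
    return f"john.doe{_rng.randrange(50_000)}@example.com"


if __name__ == "__main__":
    for size in (100, 1_000, 10_000):
        print(size, round(mean_duplicate_rate(toy_generator, size), 4))
```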
The constrained generation space is intentional and represents a deliberate design choice:
✅ Advantages:
- Every generated email appears authentically human
- Avoids bot-like patterns
- Maintains realistic email formatting conventions
⚠️ Trade-offs:
- Finite unique space (~10⁴-10⁶ combinations per name pair)
- Higher collision rates at scale
- Not suitable for applications requiring unlimited unique addresses
- generate_rd_email.py: Main email generation engine
- random_choices.py: Transformation and mutation functions
- settings.py: Configuration parameters
- testing.py: Performance testing utilities
- Name Processing: Normalize and prepare input names
- Separator Selection: Choose joining characters (., _, etc.)
- Mutation Application: Apply random transformations:
  - Leetspeak conversion (e → 3, a → 4)
  - Vowel dropping
  - Random symbol insertion
  - Digit prepending/appending
  - Letter duplication
- Domain Assignment: Select from a realistic domain pool using predetermined probability weights
- Validation: Ensure output meets heuristic standards
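
The steps above can be illustrated with a compact sketch. This is not the code in generate_rd_email.py: every name, weight, and probability below is a hypothetical stand-in chosen for illustration, and only a few of the listed mutations are included.

```python
import random
import re

# Illustrative assumptions only; the real values live in the project's settings.
SEPARATORS = [".", "_", "-", ""]
DOMAINS = ["gmail.com", "icloud.com", "yahoo.com", "outlook.com"]
DOMAIN_WEIGHTS = [0.5, 0.15, 0.15, 0.2]  # hypothetical weights
LEET = {"e": "3", "a": "4", "o": "0"}
VOWELS = set("aeiou")
EMAIL_RE = re.compile(r"[a-z0-9]+(?:[._-][a-z0-9]+)*@[a-z0-9.]+")


def normalize(name: str) -> str:
    """Lowercase the name and keep ASCII letters only."""
    return re.sub(r"[^a-z]", "", name.lower())


def leetspeak(word: str, rng: random.Random) -> str:
    """Replace one eligible letter with its digit look-alike."""
    positions = [i for i, ch in enumerate(word) if ch in LEET]
    if not positions:
        return word
    i = rng.choice(positions)
    return word[:i] + LEET[word[i]] + word[i + 1:]


def drop_vowels(word: str, rng: random.Random) -> str:
    """Keep the first character and drop later vowels, unless too little remains."""
    stripped = word[0] + "".join(ch for ch in word[1:] if ch not in VOWELS)
    return stripped if len(stripped) >= 2 else word


def duplicate_letter(word: str, rng: random.Random) -> str:
    """Double one randomly chosen character."""
    i = rng.randrange(len(word))
    return word[:i] + word[i] + word[i:]


PART_MUTATIONS = [leetspeak, drop_vowels, duplicate_letter]


def sketch_email(first: str, last: str, rng: random.Random) -> str:
    parts = [normalize(first), normalize(last)]          # name processing
    if not all(parts):
        raise ValueError("both names must contain at least one letter")
    for mutate in rng.sample(PART_MUTATIONS, k=rng.randint(0, 2)):
        i = rng.randrange(len(parts))                    # mutate one name part
        parts[i] = mutate(parts[i], rng)
    local = rng.choice(SEPARATORS).join(parts)           # separator selection
    if rng.random() < 0.5:
        local += str(rng.randint(0, 99))                 # digit appending
    domain = rng.choices(DOMAINS, weights=DOMAIN_WEIGHTS, k=1)[0]
    email = f"{local}@{domain}"
    if not EMAIL_RE.fullmatch(email):                    # validation
        raise ValueError(f"generated address failed validation: {email}")
    return email


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(5):
        print(sketch_email("John", "Doe", rng))
```

Mutations are applied to the individual name parts before joining, so a separator is never duplicated or altered by a mutation.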
This generator prioritizes quality over quantity, producing emails that consistently pass the "human test" while acknowledging the inherent trade-off with uniqueness at scale. The structured approach ensures reliable, realistic output suitable for testing, demos, and applications where authentic-looking email addresses are more valuable than unlimited uniqueness.