Skip to content

fix: add graceful error handling for VirusTotal invalid domains#111

Merged
jaspermayone merged 3 commits into
mainfrom
claude/issue-37-20250629_140833
Jun 29, 2025
Merged

fix: add graceful error handling for VirusTotal invalid domains#111
jaspermayone merged 3 commits into
mainfrom
claude/issue-37-20250629_140833

Conversation

@jaspermayone
Copy link
Copy Markdown
Member

@jaspermayone jaspermayone commented Jun 29, 2025

Summary

  • Add null checks in parseData.ts to prevent crashes when VirusTotal returns null
  • Add domain validation before VirusTotal API calls using existing validateDomain utility
  • Enable proper error logging with informative warning messages
  • Apply consistent domain validation to both check and report methods

Test plan

  • Test with valid domains (should work as before)
  • Test with invalid domains (should skip VT call and log warning)
  • Test with unresolvable domains (should handle gracefully)
  • Verify no crashes occur in parseData.ts
  • Check logs for transparency warnings

Fixes #37

Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Improved reliability by adding extra checks to prevent errors when analyzing virus scan data with missing information.
    • Enhanced error handling during domain checks and reports, ensuring invalid domains are safely skipped and users experience fewer disruptions.
  • Chores
    • Updated domain validation to include DNS resolution for more accurate domain verification.
    • Streamlined domain sanitization by making it synchronous across multiple services, improving performance and consistency.

- Add null checks in parseData.ts to prevent crashes when VirusTotal returns null
- Add domain validation before VirusTotal API calls using existing validateDomain utility  
- Enable proper error logging with informative warning messages
- Apply consistent domain validation to both check and report methods

Fixes #37 - prevents app crashes when domains are invalid or unresolvable

Co-authored-by: Jasper Mayone <jaspermayone@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 29, 2025 14:15
@jaspermayone jaspermayone requested a review from a team as a code owner June 29, 2025 14:15
@github-actions github-actions Bot added the bug label Jun 29, 2025
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Jun 29, 2025

Claude finished @jaspermayone's task —— View job


Code Review for VirusTotal Error Handling PR

Todo List:

  • Read CLAUDE.md for repository guidelines (none found)
  • Analyze changes in src/func/domain/parseData.ts
  • Analyze changes in src/services/VirusTotal.ts
  • Review code quality and best practices
  • Check for potential bugs and issues
  • Evaluate performance considerations
  • Assess security concerns
  • Review test coverage and suggestions
  • Provide comprehensive feedback

📋 Review Summary

Overall, this is a solid defensive programming improvement that addresses crash-prone scenarios with VirusTotal API integration. The changes are well-structured and follow security best practices.

✅ Code Quality & Best Practices

Excellent improvements:

  • Proper null safety: The null checks in parseData.ts:50-56 prevent runtime crashes when VirusTotal returns unexpected data structures
  • Input validation: Domain validation in VirusTotal.ts:25-33 and VirusTotal.ts:70-78 prevents malformed inputs from reaching the API
  • Consistent error handling: Both check() and report() methods now validate domains using the same pattern
  • Graceful degradation: The service returns null instead of crashing, allowing other security checks to continue

Code style observations:

  • The error handling pattern is consistent and follows TypeScript best practices
  • Good use of the existing DomainValidationError class for typed error handling
  • Warning logs provide good visibility for debugging without being overly verbose

🐛 Potential Issues

Minor concerns:

  1. Database insertion still occurs on error (VirusTotal.ts:45-49): If the API call succeeds but the domain is later found to be invalid during processing, the raw data is still stored. Consider validating earlier.

  2. Error message consistency: The warning messages use different formats:

    • Line 29: "VirusTotal: Skipping invalid domain "${domain}": ${error.message}"
    • Line 74: "VirusTotal: Skipping report for invalid domain "${domain}": ${error.message}"

    Consider using a consistent format for better log parsing.

  3. Async/await inconsistency (VirusTotal.ts:89-120): The report() method uses .catch() chaining while check() uses try/catch. Consider using try/catch consistently.

⚡ Performance Considerations

Positive impacts:

  • Reduced API calls: Invalid domains are filtered out before expensive HTTP requests
  • Early validation: Domain validation happens before sanitization and API calls, saving resources
  • Non-blocking errors: The service doesn't throw exceptions that could interrupt the processing pipeline

Suggestions:

  • The validation regex DOMAIN_REGEX in sanitizeDomain.ts:2 is relatively efficient for the validation scope

🔒 Security Assessment

Security improvements:

  • Input sanitization: Domain validation prevents potential injection attacks via malformed domain inputs
  • URL prevention: The validation explicitly blocks URL-like inputs (sanitizeDomain.ts:24-30)
  • Error information leakage: Warning logs are appropriately vague and don't expose sensitive API details

Security best practices followed:

  • No sensitive data logged in error messages
  • Graceful failure modes that don't expose internal state
  • Input validation before external API calls

🧪 Test Coverage Recommendations

The PR mentions a test plan but I don't see automated tests. Consider adding:

  1. Unit tests for parseData.ts:

    describe('parseData with null virusTotalData', () => {
      it('should handle null virusTotalData gracefully', async () => {
        const result = await parseData(mockWalshyData, ..., null, ...);
        expect(result).toBe(false); // or expected behavior
      });
    });
  2. Unit tests for VirusTotal.ts:

    describe('VirusTotal domain validation', () => {
      it('should skip invalid domains and log warning', async () => {
        const consoleSpy = jest.spyOn(console, 'warn');
        const result = await virusTotalService.domain.check('invalid-domain!@#');
        expect(result).toBeNull();
        expect(consoleSpy).toHaveBeenCalledWith(expect.stringContaining('Skipping invalid domain'));
      });
    });
  3. Integration tests for the complete flow with various domain types (valid, invalid, malformed)

🎯 Final Recommendations

  1. Consider extracting validation: The domain validation pattern is repeated. Consider creating a higher-order function:

    private async withDomainValidation<T>(domain: string, operation: (sanitized: string) => Promise<T>): Promise<T | null>
  2. Add JSDoc for new behavior: Document the null return behavior in the method documentation.

  3. Consider rate limiting awareness: With better error handling, ensure the service doesn't retry invalid domains unnecessarily.

🏆 Overall Assessment

This PR successfully addresses the crash issues mentioned in #37. The implementation is defensive, secure, and maintains backward compatibility. The changes follow established patterns in the codebase and improve system reliability.

Approval Status: ✅ Recommended - This is a solid improvement that makes the system more robust without introducing new risks.


@jaspermayone
Copy link
Copy Markdown
Member Author

@claude do a dns lookup as part of the domain validation

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 29, 2025

Walkthrough

The changes add defensive checks and domain validation before making VirusTotal API calls. The VirusTotalService methods now validate domains and skip API calls for invalid ones, logging warnings instead of throwing errors. Additionally, safer property access is implemented in data parsing to prevent runtime exceptions from undefined nested properties.

Changes

File(s) Change Summary
src/services/VirusTotal.ts Added synchronous domain sanitization and validation before API calls; improved error handling and logging.
src/func/domain/parseData.ts Added defensive null/undefined checks for nested property access in VirusTotal data parsing.
src/utils/sanitizeDomain.ts Changed validateDomain to async with DNS resolution check; throws error if domain does not resolve.
src/services/* (AbuseCh, GoogleSafebrowsing, IpQualityScore, PhishReport, SecurityTrails, SinkingYahts, UrlScan, Walshy) Removed unnecessary await from sanitizeDomain calls, treating it as synchronous in all these services.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant VirusTotalService
    participant Logger

    Client->>VirusTotalService: check(domain)
    VirusTotalService->>VirusTotalService: sanitizeDomain(domain)
    VirusTotalService->>VirusTotalService: validateDomain(domain)
    alt Domain invalid
        VirusTotalService->>Logger: log warning
        VirusTotalService-->>Client: return null
    else Domain valid
        VirusTotalService->>VirusTotalService: call VirusTotal API
        VirusTotalService-->>Client: return result
    end
Loading

Assessment against linked issues

Objective Addressed Explanation
Skip VirusTotal API call if domain is invalid/unresolvable (#37)
App should continue running without crashing on invalid domains (#37)
Log a warning or note when VirusTotal is skipped for invalid domains (#37)
Ensure VT integration is wrapped in error handling (try-catch or equivalent) (#37)
Update database schema to make VT data optional (#37) No schema/database changes observed in the provided code changes.

Poem

A hop and a skip, I checked the domain—
If it's not valid, I won't complain!
No more crashes, no more dread,
Just a gentle warning logged instead.
With careful checks, my code’s robust—
VirusTotal calls only when I trust!
🐇✨

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @jaspermayone, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the stability and error handling of the VirusTotal integration. It introduces proactive domain validation to prevent unnecessary API calls for invalid inputs and adds defensive programming with null checks to avoid crashes when processing potentially incomplete data from the VirusTotal API. The changes ensure more graceful degradation and clearer logging for problematic scenarios.

Highlights

  • Improved Robustness: Added null and undefined checks within parseData.ts to gracefully handle cases where VirusTotal data might be incomplete or missing, preventing potential crashes when accessing nested properties like last_analysis_stats.
  • Pre-API Call Domain Validation: Implemented domain validation using the existing validateDomain utility in the VirusTotalService's check and report methods. This ensures that API calls to VirusTotal are only made for valid domains, skipping calls and logging warnings for malformed inputs.
  • Graceful Error Handling & Logging: Enhanced error handling in VirusTotalService to log informative warning messages for invalid domains or API errors, rather than throwing exceptions or silently failing. This provides better transparency and prevents the application from crashing due to external service issues or bad input.
  • Consistent Application of Validation: Applied the new domain validation logic consistently across both the check and report functionalities within the VirusTotalService for uniform behavior.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Jun 29, 2025

Claude finished @jaspermayone's task —— View job


Enhanced Domain Validation with DNS Lookup ✅

I've successfully enhanced the domain validation to include DNS lookup functionality as requested.

Changes Made:

1. Enhanced Domain Validation (src/utils/sanitizeDomain.ts):

  • Added DNS lookup using Node.js built-in dns.promises.resolve()
  • Made validateDomain async to handle DNS resolution
  • Enhanced error handling for DNS resolution failures
  • Domain validation now includes:
    • Length validation (1-253 characters)
    • Format validation using regex
    • URL component prevention
    • NEW: DNS lookup verification - ensures domain actually exists

2. Updated VirusTotal Service (src/services/VirusTotal.ts):

  • Made validation calls async with await validateDomain()
  • Proper error handling for DNS resolution failures
  • Graceful logging when domains don't resolve

3. Code Quality Improvements:

  • Fixed incorrect await usage with synchronous sanitizeDomain() across 9 service files
  • Improved code consistency and clarity

How It Works:

When a domain is validated, it now:

  1. Checks basic format and length requirements
  2. Performs DNS lookup to verify the domain exists
  3. Only proceeds with VirusTotal API calls for domains that actually resolve
  4. Logs informative warnings for domains that don't resolve

This prevents unnecessary API calls and provides better error handling for invalid/non-existent domains. The enhancement has been committed to the PR branch at commit 8b73e77.


Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly enhances the robustness and error handling of the VirusTotal integration. The addition of explicit null checks in parseData.ts prevents potential crashes from incomplete API responses, and the proactive domain validation in VirusTotal.ts for both check and report methods is a crucial improvement, preventing unnecessary external API calls for invalid inputs. The refined error logging also contributes to better transparency and debugging.

Comment thread src/func/domain/parseData.ts
// FIXME: this is terrible practice, handle ratelimits better after issue addressed.
// console.error(`VirusTotal API error for domain ${domain}:`, error);
// Log the error for transparency, but don't throw to prevent crashes
console.warn(`VirusTotal API error for domain "${domain}":`, error instanceof Error ? error.message : error);

Check failure

Code scanning / CodeQL

Use of externally-controlled format string High

Format string depends on a
user-provided value
.

Copilot Autofix

AI 11 months ago

To fix the issue, the format string in the console.warn statement on line 54 of src/services/VirusTotal.ts should be updated to use the %s specifier for the domain parameter. This ensures that the domain is treated as a string and prevents any format specifiers in the user-controlled input from being interpreted.

Steps:

  1. Replace the format string `VirusTotal API error for domain "${domain}":` with "VirusTotal API error for domain '%s':".
  2. Pass the domain parameter as an argument to the format string.

No additional imports or dependencies are required for this fix.


Suggested changeset 1
src/services/VirusTotal.ts

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/services/VirusTotal.ts b/src/services/VirusTotal.ts
--- a/src/services/VirusTotal.ts
+++ b/src/services/VirusTotal.ts
@@ -53,3 +53,3 @@
         // Log the error for transparency, but don't throw to prevent crashes
-        console.warn(`VirusTotal API error for domain "${domain}":`, error instanceof Error ? error.message : error);
+        console.warn("VirusTotal API error for domain '%s':", domain, error instanceof Error ? error.message : error);
         return null;
EOF
@@ -53,3 +53,3 @@
// Log the error for transparency, but don't throw to prevent crashes
console.warn(`VirusTotal API error for domain "${domain}":`, error instanceof Error ? error.message : error);
console.warn("VirusTotal API error for domain '%s':", domain, error instanceof Error ? error.message : error);
return null;
Copilot is powered by AI and may make mistakes. Always verify output.
@jaspermayone jaspermayone committed this autofix suggestion 11 months ago.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances error resilience by validating and sanitizing domains before VirusTotal API calls and preventing null-related crashes in data parsing.

  • Add and apply validateDomain with graceful skips and warnings for invalid domains in both check and report methods
  • Switch sanitizeDomain calls from async to sync (removed await)
  • Add null checks in parseData.ts to guard against missing VirusTotal data

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/services/VirusTotal.ts Introduced validateDomain checks, adjusted sanitizeDomain usage, and improved error logging
src/func/domain/parseData.ts Added nested null checks around virusTotalData to prevent crashes
Comments suppressed due to low confidence (2)

src/services/VirusTotal.ts:22

  • sanitizeDomain previously returned a promise (used with await). Removing await means sanitizedDomain is now a Promise instead of a string. Add back await before sanitizeDomain(domain).
        const sanitizedDomain = sanitizeDomain(domain);

src/services/VirusTotal.ts:67

  • Same issue here: sanitizeDomain returns a promise and needs await to produce a string. Please restore await sanitizeDomain(domain).
      const sanitizedDomain = sanitizeDomain(domain);

Comment on lines +50 to +55
virusTotalData &&
virusTotalData.data &&
virusTotalData.data.attributes &&
virusTotalData.data.attributes.last_analysis_stats &&
(virusTotalData.data.attributes.last_analysis_stats.malicious > 0 ||
virusTotalData.data.attributes.last_analysis_stats.suspicious > 0)
Copy link

Copilot AI Jun 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] This long chain of null checks can be simplified using optional chaining (e.g., virusTotalData?.data?.attributes?.last_analysis_stats?.malicious > 0).

Suggested change
virusTotalData &&
virusTotalData.data &&
virusTotalData.data.attributes &&
virusTotalData.data.attributes.last_analysis_stats &&
(virusTotalData.data.attributes.last_analysis_stats.malicious > 0 ||
virusTotalData.data.attributes.last_analysis_stats.suspicious > 0)
virusTotalData?.data?.attributes?.last_analysis_stats?.malicious > 0 ||
virusTotalData?.data?.attributes?.last_analysis_stats?.suspicious > 0

Copilot uses AI. Check for mistakes.
…rolled format string

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Signed-off-by: Jasper Mayone <me@jaspermayone.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Jun 29, 2025

Claude finished @jaspermayone's task —— View job


Code Review for VirusTotal Error Handling PR

Todo List:

  • Read CLAUDE.md for repository guidelines (none found)
  • Analyze changes in src/func/domain/parseData.ts
  • Analyze changes in src/services/VirusTotal.ts
  • Review code quality and best practices
  • Check for potential bugs and issues
  • Evaluate performance considerations
  • Assess security concerns
  • Review test coverage and suggestions
  • Provide comprehensive feedback

📋 Review Summary

Overall, this is a solid defensive programming improvement that addresses crash-prone scenarios with VirusTotal API integration. The changes are well-structured and follow security best practices.

✅ Code Quality & Best Practices

Excellent improvements:

  • Proper null safety: The null checks in parseData.ts:50-56 prevent runtime crashes when VirusTotal returns unexpected data structures
  • Input validation: Domain validation in VirusTotal.ts:25-33 and VirusTotal.ts:70-78 prevents malformed inputs from reaching the API
  • Consistent error handling: Both check() and report() methods now validate domains using the same pattern
  • Graceful degradation: The service returns null instead of crashing, allowing other security checks to continue

Code style observations:

  • The error handling pattern is consistent and follows TypeScript best practices
  • Good use of the existing DomainValidationError class for typed error handling
  • Warning logs provide good visibility for debugging without being overly verbose

🐛 Potential Issues

Minor concerns:

  1. Database insertion still occurs on error (VirusTotal.ts:45-49): If the API call succeeds but the domain is later found to be invalid during processing, the raw data is still stored. Consider validating earlier.

  2. Error message consistency: The warning messages use different formats:

    • Line 29: "VirusTotal: Skipping invalid domain "${domain}": ${error.message}"
    • Line 74: "VirusTotal: Skipping report for invalid domain "${domain}": ${error.message}"

    Consider using a consistent format for better log parsing.

  3. Async/await inconsistency (VirusTotal.ts:89-120): The report() method uses .catch() chaining while check() uses try/catch. Consider using try/catch consistently.

  4. Optional chaining suggestion (parseData.ts:50-56): The nested null checks could be simplified using TypeScript's optional chaining:

    virusTotalData?.data?.attributes?.last_analysis_stats?.malicious > 0 ||
    virusTotalData?.data?.attributes?.last_analysis_stats?.suspicious > 0

⚡ Performance Considerations

Positive impacts:

  • Reduced API calls: Invalid domains are filtered out before expensive HTTP requests
  • Early validation: Domain validation happens before sanitization and API calls, saving resources
  • Non-blocking errors: The service doesn't throw exceptions that could interrupt the processing pipeline

Suggestions:

  • The validation regex DOMAIN_REGEX in sanitizeDomain.ts:2 is relatively efficient for the validation scope

🔒 Security Assessment

Security improvements:

  • Input sanitization: Domain validation prevents potential injection attacks via malformed domain inputs
  • URL prevention: The validation explicitly blocks URL-like inputs (sanitizeDomain.ts:24-30)
  • Error information leakage: Warning logs are appropriately vague and don't expose sensitive API details

Security best practices followed:

  • No sensitive data logged in error messages
  • Graceful failure modes that don't expose internal state
  • Input validation before external API calls

⚠️ Security Note: GitHub Advanced Security flagged the use of externally-controlled format strings in the console.warn statements (VirusTotal.ts:29,74). While the risk is minimal due to domain sanitization, consider using parameterized logging if available.

🧪 Test Coverage Recommendations

The PR mentions a test plan but I don't see automated tests. Consider adding:

  1. Unit tests for parseData.ts:

    describe('parseData with null virusTotalData', () => {
      it('should handle null virusTotalData gracefully', async () => {
        const result = await parseData(mockWalshyData, ..., null, ...);
        expect(result).toBe(false); // or expected behavior
      });
    });
  2. Unit tests for VirusTotal.ts:

    describe('VirusTotal domain validation', () => {
      it('should skip invalid domains and log warning', async () => {
        const consoleSpy = jest.spyOn(console, 'warn');
        const result = await virusTotalService.domain.check('invalid-domain!@#');
        expect(result).toBeNull();
        expect(consoleSpy).toHaveBeenCalledWith(expect.stringContaining('Skipping invalid domain'));
      });
    });
  3. Integration tests for the complete flow with various domain types (valid, invalid, malformed)

🎯 Final Recommendations

  1. Consider extracting validation: The domain validation pattern is repeated. Consider creating a higher-order function:

    private async withDomainValidation<T>(domain: string, operation: (sanitized: string) => Promise<T>): Promise<T | null>
  2. Add JSDoc for new behavior: Document the null return behavior in the method documentation.

  3. Consider rate limiting awareness: With better error handling, ensure the service doesn't retry invalid domains unnecessarily.

🏆 Overall Assessment

This PR successfully addresses the crash issues mentioned in #37. The implementation is defensive, secure, and maintains backward compatibility. The changes follow established patterns in the codebase and improve system reliability.

Approval Status: ✅ Recommended - This is a solid improvement that makes the system more robust without introducing new risks.


Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
src/services/VirusTotal.ts (1)

110-120: Use /votes endpoint instead of /comments for vote submissions

The VirusTotal API v3 requires a POST to /domains/{domain}/votes when casting a vote, not /comments. Update the second axios call accordingly:

• File: src/services/VirusTotal.ts
Lines: ~110–120

- axios.post(
-   `https://www.virustotal.com/api/v3/domains/${sanitizedDomain}/comments`,
-   voteData,
-   { headers: headersWithVirusTotal }
- )
+ axios.post(
+   `https://www.virustotal.com/api/v3/domains/${sanitizedDomain}/votes`,
+   voteData,
+   { headers: headersWithVirusTotal }
+ )

Reference: https://docs.virustotal.com/reference/domain-votes-post

🧹 Nitpick comments (1)
src/func/domain/parseData.ts (1)

50-55: Simplify with optional chaining for better readability.

The defensive null checks are excellent for preventing runtime errors when VirusTotal returns incomplete data. However, this can be simplified using optional chaining for better readability.

Apply this diff to use optional chaining:

-  } else if (
-    virusTotalData &&
-    virusTotalData.data &&
-    virusTotalData.data.attributes &&
-    virusTotalData.data.attributes.last_analysis_stats &&
-    (virusTotalData.data.attributes.last_analysis_stats.malicious > 0 ||
-      virusTotalData.data.attributes.last_analysis_stats.suspicious > 0)
-  ) {
+  } else if (
+    (virusTotalData?.data?.attributes?.last_analysis_stats?.malicious > 0 ||
+      virusTotalData?.data?.attributes?.last_analysis_stats?.suspicious > 0)
+  ) {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf6ebb8 and d1b4333.

📒 Files selected for processing (2)
  • src/func/domain/parseData.ts (1 hunks)
  • src/services/VirusTotal.ts (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/services/VirusTotal.ts (1)
src/utils/sanitizeDomain.ts (3)
  • sanitizeDomain (40-40)
  • validateDomain (40-40)
  • DomainValidationError (40-40)
🪛 Biome (1.9.4)
src/func/domain/parseData.ts

[error] 50-53: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

🪛 GitHub Check: CodeQL
src/services/VirusTotal.ts

[failure] 54-54: Use of externally-controlled format string
Format string depends on a user-provided value.

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: claude-review
  • GitHub Check: Codacy Security Scan
🔇 Additional comments (3)
src/services/VirusTotal.ts (3)

6-6: LGTM: Clean import of domain validation utilities.

The import properly brings in the necessary utilities for domain sanitization and validation with appropriate error handling.


22-33: Excellent domain validation implementation with proper error handling.

The validation logic correctly sanitizes domains first, then validates them before making API calls. The graceful handling of DomainValidationError by logging warnings and returning null prevents unnecessary API calls for invalid domains.


67-78: Consistent domain validation implementation.

Good implementation that mirrors the validation logic from the check method, ensuring consistent behavior across both API methods.

Comment thread src/services/VirusTotal.ts
- Add DNS lookup to validateDomain using Node.js dns.promises.resolve()
- Make validateDomain async to handle DNS resolution properly
- Update VirusTotal service to await async domain validation
- Fix incorrect await usage with synchronous sanitizeDomain across services
- Improve error handling for DNS resolution failures
- Add informative warnings when domains don't resolve

This enhancement prevents unnecessary API calls for non-existent domains
and provides better validation by checking if domains actually exist.

Co-authored-by: Jasper Mayone <jaspermayone@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🔭 Outside diff range comments (1)
src/services/IpQualityScore.ts (1)

42-46: Fix duplicate database insert.

The same data is being inserted twice into the database, which appears to be unintentional and could cause data duplication issues.

-      await db.insert(rawAPIData).values({
-        sourceAPI: "IpQualityScore",
-        domain: dbDomain!.id!,
-        data: data,
-      });
♻️ Duplicate comments (1)
src/services/VirusTotal.ts (1)

54-54: Fix format string vulnerability in logging.

Despite past review comments indicating this was addressed, the static analysis tool correctly identifies a format string vulnerability. The domain variable is embedded directly in the template string, which could allow format string attacks.

-        console.warn(`VirusTotal API error for domain "${domain}":`, error instanceof Error ? error.message : error);
+        console.warn(`VirusTotal API error for domain:`, { domain, error: error instanceof Error ? error.message : error });
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c3de9af and 8b73e77.

📒 Files selected for processing (10)
  • src/services/AbuseCh.ts (1 hunks)
  • src/services/GoogleSafebrowsing.ts (1 hunks)
  • src/services/IpQualityScore.ts (1 hunks)
  • src/services/PhishReport.ts (1 hunks)
  • src/services/SecurityTrails.ts (1 hunks)
  • src/services/SinkingYahts.ts (1 hunks)
  • src/services/UrlScan.ts (1 hunks)
  • src/services/VirusTotal.ts (4 hunks)
  • src/services/Walshy.ts (2 hunks)
  • src/utils/sanitizeDomain.ts (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (7)
src/services/AbuseCh.ts (1)
src/utils/sanitizeDomain.ts (1)
  • sanitizeDomain (49-49)
src/services/Walshy.ts (1)
src/utils/sanitizeDomain.ts (1)
  • sanitizeDomain (49-49)
src/services/UrlScan.ts (1)
src/utils/sanitizeDomain.ts (1)
  • sanitizeDomain (49-49)
src/services/SinkingYahts.ts (1)
src/utils/sanitizeDomain.ts (1)
  • sanitizeDomain (49-49)
src/services/SecurityTrails.ts (1)
src/utils/sanitizeDomain.ts (1)
  • sanitizeDomain (49-49)
src/services/PhishReport.ts (1)
src/utils/sanitizeDomain.ts (1)
  • sanitizeDomain (49-49)
src/services/VirusTotal.ts (1)
src/utils/sanitizeDomain.ts (3)
  • sanitizeDomain (49-49)
  • validateDomain (49-49)
  • DomainValidationError (49-49)
🪛 GitHub Check: CodeQL
src/services/VirusTotal.ts

[failure] 54-54: Use of externally-controlled format string
Format string depends on a user-provided value.

🔇 Additional comments (9)
src/utils/sanitizeDomain.ts (2)

1-1: DNS import looks good.

The import of the DNS promises API is appropriate for the new DNS resolution functionality.


12-12: Function signature change correctly reflects the new async behavior.

The change from synchronous to asynchronous is necessary for the DNS resolution functionality.

src/services/SinkingYahts.ts (1)

21-21: sanitizeDomain is synchronous—removal of await is correct

  • In src/utils/sanitizeDomain.ts (line 44), sanitizeDomain is defined as
    const sanitizeDomain = (domain: string): string => { … },
    confirming it returns a string synchronously.
  • All service calls now correctly invoke it without await.

No further changes required; approving.

src/services/SecurityTrails.ts (1)

22-22: LGTM - consistent with synchronous sanitizeDomain refactor.

This change aligns with the coordinated refactor to make sanitizeDomain synchronous across all service files.

src/services/IpQualityScore.ts (1)

23-23: LGTM - consistent with synchronous sanitizeDomain refactor.

This change aligns with the coordinated refactor to make sanitizeDomain synchronous across all service files.

src/services/GoogleSafebrowsing.ts (1)

22-22: LGTM - consistent with synchronous sanitizeDomain refactor.

This change aligns with the coordinated refactor to make sanitizeDomain synchronous across all service files.

src/services/VirusTotal.ts (3)

6-6: LGTM - proper imports for domain validation.

The imports correctly bring in the necessary validation utilities to implement graceful error handling for invalid domains.


22-33: Excellent domain validation implementation.

The synchronous sanitization followed by asynchronous validation with proper error handling is well-implemented. Invalid domains are logged and gracefully skipped rather than causing crashes.


67-78: Consistent domain validation in report method.

The validation implementation in the report method follows the same pattern as the check method, providing consistent error handling across both methods.

Comment on lines +34 to +40
// Perform DNS lookup to verify domain exists
try {
await dns.resolve(domain);
} catch (error) {
throw new DomainValidationError(`Domain does not resolve: ${domain}`);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Consider adding timeout and improving error handling for DNS resolution.

The DNS resolution logic could be more robust:

  1. DNS lookups can fail due to temporary network issues or DNS server problems
  2. No timeout is specified, which could lead to long delays
  3. Consider whether DNS validation should be mandatory or optional
  // Perform DNS lookup to verify domain exists
  try {
-    await dns.resolve(domain);
+    await Promise.race([
+      dns.resolve(domain),
+      new Promise((_, reject) => 
+        setTimeout(() => reject(new Error('DNS timeout')), 5000)
+      )
+    ]);
  } catch (error) {
-    throw new DomainValidationError(`Domain does not resolve: ${domain}`);
+    throw new DomainValidationError(`Domain does not resolve or DNS timeout: ${domain}`);
  }

Let me verify how this validation is being used across the codebase:


🏁 Script executed:

#!/bin/bash
# Check how validateDomain is being used across the codebase
echo "=== validateDomain usage ==="
rg -A 3 "validateDomain" --type ts

echo "=== DomainValidationError handling ==="
rg -A 5 "DomainValidationError" --type ts

Length of output: 4661


Add DNS lookup timeout and enhance error handling in sanitizeDomain.ts

To fail fast on hanging DNS queries and ensure downstream code handles timeouts correctly, update the DNS resolution block in src/utils/sanitizeDomain.ts as shown below. We’ve verified that validateDomain and its DomainValidationError are only used in src/services/VirusTotal.ts, where errors are already caught and logged.

src/utils/sanitizeDomain.ts

   // Perform DNS lookup to verify domain exists
   try {
-    await dns.resolve(domain);
+    await Promise.race([
+      dns.resolve(domain),
+      new Promise<never>((_, reject) =>
+        setTimeout(() => reject(new Error('DNS timeout')), 5000)
+      )
+    ]);
   } catch (error) {
-    throw new DomainValidationError(`Domain does not resolve: ${domain}`);
+    throw new DomainValidationError(
+      `Domain does not resolve or DNS timeout: ${domain}`
+    );
   }

   return true;

No additional changes are needed—every invocation of validateDomain already catches DomainValidationError and skips invalid domains.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Perform DNS lookup to verify domain exists
try {
await dns.resolve(domain);
} catch (error) {
throw new DomainValidationError(`Domain does not resolve: ${domain}`);
}
// Perform DNS lookup to verify domain exists
try {
await Promise.race([
dns.resolve(domain),
new Promise<never>((_, reject) =>
setTimeout(() => reject(new Error('DNS timeout')), 5000)
)
]);
} catch (error) {
throw new DomainValidationError(
`Domain does not resolve or DNS timeout: ${domain}`
);
}
return true;
🤖 Prompt for AI Agents
In src/utils/sanitizeDomain.ts around lines 34 to 40, the DNS lookup currently
lacks a timeout and has basic error handling. Modify the DNS resolution to
include a timeout mechanism that rejects the lookup if it takes too long,
ensuring the function fails fast on hanging queries. Also, enhance the error
handling to differentiate timeout errors from other DNS errors, throwing a
DomainValidationError with an appropriate message in both cases.

Comment thread src/services/AbuseCh.ts
);

let sanitizedDomain = await sanitizeDomain(domain);
let sanitizedDomain = sanitizeDomain(domain);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Correct removal of await, but consider adding domain validation.

The removal of await is correct since sanitizeDomain is synchronous. However, unlike the VirusTotal service, this doesn't validate domains before making API calls.

Consider adding domain validation to prevent unnecessary API calls with invalid domains:

+import { validateDomain, DomainValidationError } from "src/utils/sanitizeDomain";

  check: async (domain: string) => {
    // metrics.increment("services.abusech.domain.check");

+   let sanitizedDomain = sanitizeDomain(domain);
+   
+   try {
+     await validateDomain(sanitizedDomain);
+   } catch (error) {
+     if (error instanceof DomainValidationError) {
+       console.warn(`Skipping AbuseCh check for invalid domain: ${sanitizedDomain}`, error.message);
+       return { error: "Invalid domain" };
+     }
+     throw error;
+   }

    let response = await axios.get(
      `https://urlhaus.abuse.ch/api/v1/host/${domain}`,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let sanitizedDomain = sanitizeDomain(domain);
// at top of src/services/AbuseCh.ts, alongside your other imports
import { validateDomain, DomainValidationError } from "src/utils/sanitizeDomain";
export const AbuseChService = {
check: async (domain: string) => {
// metrics.increment("services.abusech.domain.check");
let sanitizedDomain = sanitizeDomain(domain);
try {
await validateDomain(sanitizedDomain);
} catch (error) {
if (error instanceof DomainValidationError) {
console.warn(
`Skipping AbuseCh check for invalid domain: ${sanitizedDomain}`,
error.message
);
return { error: "Invalid domain" };
}
throw error;
}
let response = await axios.get(
`https://urlhaus.abuse.ch/api/v1/host/${domain}`
);
// …rest of your logic…
},
};
🤖 Prompt for AI Agents
In src/services/AbuseCh.ts at line 29, after removing the incorrect await from
sanitizeDomain, add a domain validation step before proceeding with API calls.
Implement a function or use a utility to check if the domain string is valid,
and only continue with the API call if the domain passes validation. This
prevents unnecessary requests with invalid domains and aligns with the
VirusTotal service's approach.

// metrics.increment("services.phishreport.domain.check");

const sanitizedDomain = await sanitizeDomain(domain);
const sanitizedDomain = sanitizeDomain(domain);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Correct removal of await, but missing domain validation.

The removal of await is correct since sanitizeDomain is synchronous. However, this service lacks domain validation before making API calls, which could result in unnecessary requests for invalid domains.

Consider adding domain validation consistent with the VirusTotal service approach:

+import { validateDomain, DomainValidationError } from "src/utils/sanitizeDomain";

  check: async (domain: string) => {
    // metrics.increment("services.phishreport.domain.check");

    const sanitizedDomain = sanitizeDomain(domain);
+   
+   try {
+     await validateDomain(sanitizedDomain);
+   } catch (error) {
+     if (error instanceof DomainValidationError) {
+       console.warn(`Skipping PhishReport check for invalid domain: ${sanitizedDomain}`, error.message);
+       return { error: "Invalid domain" };
+     }
+     throw error;
+   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const sanitizedDomain = sanitizeDomain(domain);
// At the top of src/services/PhishReport.ts
import { validateDomain, DomainValidationError } from "src/utils/sanitizeDomain";
export const PhishReport = {
check: async (domain: string) => {
// metrics.increment("services.phishreport.domain.check");
const sanitizedDomain = sanitizeDomain(domain);
try {
await validateDomain(sanitizedDomain);
} catch (error) {
if (error instanceof DomainValidationError) {
console.warn(
`Skipping PhishReport check for invalid domain: ${sanitizedDomain}`,
error.message
);
return { error: "Invalid domain" };
}
throw error;
}
// …rest of existing logic…
},
};
🤖 Prompt for AI Agents
In src/services/PhishReport.ts at line 22, after sanitizing the domain, add
domain validation to ensure the domain is valid before proceeding with API
calls. Implement a validation check similar to the VirusTotal service, and if
the domain is invalid, handle it appropriately by skipping the API call or
returning an error. This prevents unnecessary requests for invalid domains.

Comment thread src/services/Walshy.ts
check: async (domain: string) => {
// metrics.increment("services.walshy.domain.check");
const sanitizedDomain = await sanitizeDomain(domain);
const sanitizedDomain = sanitizeDomain(domain);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Correct removal of await, but both methods lack domain validation.

The removal of await is correct since sanitizeDomain is synchronous. However, both check and report methods should validate domains before making API calls to prevent unnecessary requests with invalid domains.

Consider adding domain validation to both methods:

+import { validateDomain, DomainValidationError } from "src/utils/sanitizeDomain";

    check: async (domain: string) => {
      // metrics.increment("services.walshy.domain.check");
      const sanitizedDomain = sanitizeDomain(domain);
+     
+     try {
+       await validateDomain(sanitizedDomain);
+     } catch (error) {
+       if (error instanceof DomainValidationError) {
+         console.warn(`Skipping Walshy check for invalid domain: ${sanitizedDomain}`, error.message);
+         return { error: "Invalid domain" };
+       }
+       throw error;
+     }

    report: async (domain: string) => {
      // metrics.increment("services.walshy.domain.report");

      const sanitizedDomain = sanitizeDomain(domain);
+     
+     try {
+       await validateDomain(sanitizedDomain);
+     } catch (error) {
+       if (error instanceof DomainValidationError) {
+         console.warn(`Skipping Walshy report for invalid domain: ${sanitizedDomain}`, error.message);
+         return { error: "Invalid domain" };
+       }
+       throw error;
+     }

Also applies to: 52-52

🤖 Prompt for AI Agents
In src/services/Walshy.ts at lines 21 and 52, both the check and report methods
call sanitizeDomain without validating the domain input. Add domain validation
logic before these calls to ensure the domain is valid and prevent unnecessary
API requests. Implement a validation function or use existing validation
utilities to verify the domain format and handle invalid domains appropriately
before proceeding.

Comment thread src/services/UrlScan.ts
check: async (domain: string) => {
// metrics.increment("services.urlscan.domain.check");
const sanitizedDomain = await sanitizeDomain(domain);
const sanitizedDomain = sanitizeDomain(domain);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Correct removal of await, but validation is especially important for UrlScan.

The removal of await is correct since sanitizeDomain is synchronous. However, domain validation is particularly important for UrlScan since it performs actual web scanning, which is resource-intensive and should not be wasted on invalid domains.

Add domain validation before the expensive scanning operations:

+import { validateDomain, DomainValidationError } from "src/utils/sanitizeDomain";

    check: async (domain: string) => {
      // metrics.increment("services.urlscan.domain.check");
      const sanitizedDomain = sanitizeDomain(domain);
+     
+     try {
+       await validateDomain(sanitizedDomain);
+     } catch (error) {
+       if (error instanceof DomainValidationError) {
+         console.warn(`Skipping UrlScan check for invalid domain: ${sanitizedDomain}`, error.message);
+         return { error: "Invalid domain" };
+       }
+       throw error;
+     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const sanitizedDomain = sanitizeDomain(domain);
// at the top of the file, next to the existing import of sanitizeDomain
import { sanitizeDomain } from "src/utils/sanitizeDomain";
import { validateDomain, DomainValidationError } from "src/utils/sanitizeDomain";
export const UrlScan = {
check: async (domain: string) => {
// metrics.increment("services.urlscan.domain.check");
const sanitizedDomain = sanitizeDomain(domain);
try {
await validateDomain(sanitizedDomain);
} catch (error) {
if (error instanceof DomainValidationError) {
console.warn(
`Skipping UrlScan check for invalid domain: ${sanitizedDomain}`,
error.message
);
return { error: "Invalid domain" };
}
throw error;
}
// ... existing scanning logic ...
},
};
🤖 Prompt for AI Agents
In src/services/UrlScan.ts at line 21, after sanitizing the domain with
sanitizeDomain, add a validation step to check if the sanitized domain is valid
before proceeding with any scanning operations. Implement a domain validation
function or use an existing one to verify the domain format and reject or handle
invalid domains early to avoid unnecessary resource usage during scanning.

@jaspermayone jaspermayone merged commit 75e4d85 into main Jun 29, 2025
19 of 22 checks passed
@jaspermayone jaspermayone deleted the claude/issue-37-20250629_140833 branch June 29, 2025 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

🐛 Bug: VirusTotal throws error on invalid domains

3 participants