
[Feature] Add unstructured doctor CLI for dependency and capability diagnostics #4341

@jsdevninja

Summary

Implement a first-class diagnostics command that helps users quickly verify whether their environment is ready for specific document types before running partitioning.

Today, users typically discover missing dependencies only after a runtime failure (or by manually running scripts/collect_env.py). This creates friction, especially with optional extras and system-level dependencies across platforms.

Proposed feature

Add a CLI command such as:

  • unstructured doctor
  • unstructured doctor --for pdf
  • unstructured doctor --file path/to/document.pdf
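To make the proposed command surface concrete, here is a minimal sketch using `argparse`. It is an illustration only: the project's actual CLI framework may differ, and `build_parser`, the `file_type` and `path` destination names, and the choice to make `--for` and `--file` mutually exclusive are all assumptions, not part of the proposal.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the proposed `unstructured doctor` command."""
    parser = argparse.ArgumentParser(prog="unstructured")
    sub = parser.add_subparsers(dest="command", required=True)

    doctor = sub.add_parser(
        "doctor", help="Diagnose dependencies and capabilities"
    )
    # Assumption: a run targets either a file type or a concrete file, not both.
    target = doctor.add_mutually_exclusive_group()
    # `for` is a Python keyword, so the value is stored under `file_type`.
    target.add_argument(
        "--for", dest="file_type", metavar="TYPE",
        help="check readiness for one document type, e.g. pdf",
    )
    target.add_argument(
        "--file", dest="path", metavar="PATH",
        help="check readiness for the type of this file",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With no options, `doctor` would report on the whole environment; `--for` and `--file` narrow the report to one capability.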

Expected behavior

  • Report Python/package/system dependency status in a readable table.
  • Show supported file types in the current environment.
  • For unsupported file types, provide actionable install hints (e.g. pip install "unstructured[pdf]").
  • Include platform-aware checks (Windows/macOS/Linux) for common system requirements (e.g. libmagic, tesseract, libreoffice where relevant).
  • Return a non-zero exit code when a requested capability is not available (useful for CI).
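The behavior listed above could be sketched roughly as follows. This is not the project's implementation: the `REQUIREMENTS` and `BINARY_HINTS` tables, the `Check` type, and all function names are hypothetical, and the module and binary names in the tables are placeholders rather than the real dependency list. The sketch only uses standard-library probes (`importlib.util.find_spec` for Python modules, `shutil.which` for executables), so a shared library such as libmagic would need a different kind of check.

```python
import importlib.util
import platform
import shutil
from dataclasses import dataclass
from typing import Callable, List

# Illustrative only: NOT the project's real dependency map.
REQUIREMENTS = {
    "pdf": {"extra": "pdf", "modules": ["pdfminer"], "binaries": ["tesseract"]},
    "docx": {"extra": "docx", "modules": ["docx"], "binaries": []},
}

# Assumed per-platform install hints for system binaries.
BINARY_HINTS = {
    "tesseract": {
        "Linux": "sudo apt-get install tesseract-ocr",
        "Darwin": "brew install tesseract",
        "Windows": "install Tesseract and add it to PATH",
    },
}


@dataclass
class Check:
    name: str
    kind: str  # "python" or "system"
    ok: bool
    hint: str = ""


def module_available(name: str) -> bool:
    """True if a top-level module can be located without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def binary_available(name: str) -> bool:
    """True if an executable with this name is on PATH."""
    return shutil.which(name) is not None


def run_checks(
    file_type: str,
    has_module: Callable[[str], bool] = module_available,
    has_binary: Callable[[str], bool] = binary_available,
    system: str = "",
) -> List[Check]:
    """Probe every requirement for one file type and attach install hints."""
    req = REQUIREMENTS[file_type]
    extra = req["extra"]
    system = system or platform.system()
    checks = []
    for mod in req["modules"]:
        ok = has_module(mod)
        hint = "" if ok else f'pip install "unstructured[{extra}]"'
        checks.append(Check(mod, "python", ok, hint))
    for binary in req["binaries"]:
        ok = has_binary(binary)
        fallback = f"install {binary}"
        hint = "" if ok else BINARY_HINTS.get(binary, {}).get(system, fallback)
        checks.append(Check(binary, "system", ok, hint))
    return checks


def render(checks: List[Check]) -> str:
    """Format the checks as a plain-text table."""
    rows = [f"{'NAME':<12}{'KIND':<8}{'STATUS':<9}HINT"]
    for c in checks:
        status = "ok" if c.ok else "missing"
        rows.append(f"{c.name:<12}{c.kind:<8}{status:<9}{c.hint}")
    return "\n".join(rows)


def exit_code(checks: List[Check]) -> int:
    """0 when every requirement is met, 1 otherwise (for CI use)."""
    return 0 if all(c.ok for c in checks) else 1
```

A command handler would then print `render(run_checks("pdf"))` and call `sys.exit(exit_code(...))`. The probe functions are passed in as parameters so the logic can be tested without touching the real environment.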

Why this matters

  • Reduces setup confusion and support load.
  • Improves onboarding and developer experience.
  • Gives users proactive guidance instead of reactive runtime errors.

Metadata

Labels: enhancement (New feature or request)
