Summary
Implement a first-class diagnostics command that helps users quickly verify whether their environment is ready for specific document types before running partitioning.
Today, users typically discover missing dependencies only after a runtime failure (or by manually running scripts/collect_env.py). This creates friction, especially with optional extras and system-level dependencies across platforms.
Proposed feature
Add a CLI command such as:
- unstructured doctor
- unstructured doctor --for pdf
- unstructured doctor --file path/to/document.pdf
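To make the proposed command surface concrete, here is a minimal sketch of how the three invocations above could be parsed. It assumes argparse and hypothetical names (build_parser, doc_type, file_path); it is not the project's existing CLI wiring, and the real implementation may use a different framework.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser for `unstructured doctor` (illustrative only)."""
    parser = argparse.ArgumentParser(prog="unstructured")
    subcommands = parser.add_subparsers(dest="command", required=True)

    doctor = subcommands.add_parser(
        "doctor", help="Check whether the environment is ready for partitioning"
    )
    # --for and --file are treated as alternatives in this sketch:
    # check one document type, or infer the type from one file.
    target = doctor.add_mutually_exclusive_group()
    target.add_argument(
        "--for", dest="doc_type", metavar="TYPE",
        help="Document type to check, e.g. pdf",
    )
    target.add_argument(
        "--file", dest="file_path", metavar="PATH",
        help="File whose type should be checked",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With neither option given, the command would fall back to a full report covering every known document type.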
Expected behavior
- Report Python/package/system dependency status in a readable table.
- Show supported file types in the current environment.
- For unsupported file types, provide actionable install hints (e.g. pip install "unstructured[pdf]").
- Include platform-aware checks (Windows/macOS/Linux) for common system requirements (e.g. libmagic, tesseract, libreoffice where relevant).
- Return a non-zero exit code when a requested capability is not available (useful for CI).
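The behavior listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed final design: the REQUIREMENTS table, the Check class, and the function names are hypothetical, and the module and executable names in the table are placeholders rather than the library's actual dependency list. The sketch only probes importable Python modules and executables on PATH; shared libraries such as libmagic would need a separate, platform-specific probe.

```python
import importlib.util
import shutil
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    kind: str        # "python" or "system"
    available: bool
    hint: str        # install hint shown when the dependency is missing


# Hypothetical requirements table; entries are placeholders for illustration.
REQUIREMENTS = {
    "pdf": {
        "python": ["pdfminer"],      # assumed module name
        "system": ["tesseract"],     # executable looked up on PATH
        "hint": 'pip install "unstructured[pdf]"',
    },
}


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def run_checks(doc_type: str, requirements: dict) -> list:
    """Collect one Check per Python module and system executable."""
    spec = requirements[doc_type]
    checks = []
    for module in spec["python"]:
        checks.append(Check(module, "python", module_available(module), spec["hint"]))
    for binary in spec["system"]:
        found = shutil.which(binary) is not None
        checks.append(Check(binary, "system", found, f"install {binary} with your system package manager"))
    return checks


def render(checks: list) -> str:
    """Format the checks as a plain-text table."""
    rows = [f"{'dependency':<20}{'kind':<10}{'status':<10}hint"]
    for c in checks:
        status = "ok" if c.available else "missing"
        rows.append(f"{c.name:<20}{c.kind:<10}{status:<10}{'' if c.available else c.hint}")
    return "\n".join(rows)


def exit_code(checks: list) -> int:
    """0 when everything is available, 1 otherwise (so CI can fail fast)."""
    return 0 if all(c.available for c in checks) else 1


if __name__ == "__main__":
    results = run_checks("pdf", REQUIREMENTS)
    print(render(results))
    raise SystemExit(exit_code(results))
```

Keeping the requirements in a data table rather than in code would let the same check loop serve every document type and make the install hints easy to keep in sync with the optional extras.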
Why this matters
- Reduces setup confusion and support load.
- Improves onboarding and developer experience.
- Gives users proactive guidance instead of reactive runtime errors.