A FastAPI-based REST API service that converts various file formats to Markdown using Microsoft's MarkItDown library.
- Convert single files to Markdown format
- Batch conversion support
- Download converted files or get JSON response
- Image OCR support (requires OpenAI API key)
- Support for multiple file formats:
  - Documents: PDF, DOCX, PPTX, XLSX
  - Images: PNG, JPG, JPEG, GIF, BMP, TIFF
  - Audio: MP3, WAV, M4A, OGG
  - Web: HTML, XML
  - Data: CSV, JSON
  - Text: TXT, MD, RTF
  - Other: EPUB, ZIP
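The format list above can be mirrored in a small lookup table, which is handy for rejecting unsupported uploads on the client side before sending them. This is only a sketch: the names `SUPPORTED_FORMATS` and `format_category` are hypothetical and are not part of the service or of MarkItDown.

```python
from pathlib import Path
from typing import Optional

# Extension groups copied from the format list in this README.
# The table and helper names are hypothetical, not part of the service.
SUPPORTED_FORMATS = {
    "documents": {".pdf", ".docx", ".pptx", ".xlsx"},
    "images": {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff"},
    "audio": {".mp3", ".wav", ".m4a", ".ogg"},
    "web": {".html", ".xml"},
    "data": {".csv", ".json"},
    "text": {".txt", ".md", ".rtf"},
    "other": {".epub", ".zip"},
}


def format_category(filename: str) -> Optional[str]:
    """Return the category of a filename, or None if its extension is not listed."""
    suffix = Path(filename).suffix.lower()
    for category, extensions in SUPPORTED_FORMATS.items():
        if suffix in extensions:
            return category
    return None
```

The lookup is case-insensitive, so `Report.PDF` and `report.pdf` fall into the same category.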
This project uses modern development tools for improved developer experience:
We use Task instead of traditional Makefiles for task automation. Task provides:
- Cross-platform compatibility (works on Windows, macOS, Linux)
- YAML syntax that's more readable than Makefiles
- Built-in variable support and dependency management
- Better error handling and output formatting
API testing is done with Hurl instead of traditional curl scripts or Postman collections:
- Tests are written in plain text files that are easy to version control
- Human-readable format that serves as living documentation
- Built-in assertions and JSON path support
- Can be integrated into CI/CD pipelines
- Faster than UI-based testing tools
Create a virtual environment and install dependencies:
task install
Or manually:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Start the server:
task start
Or manually:
python main.py
The API will be available at http://localhost:8000
Create a .env file (see .env.example):
- OPENAI_API_KEY: Optional. Enables OCR for image files.
- MODEL: Optional. OpenAI model to use for OCR (default: gpt-4o, recommended: gpt-4-turbo).
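As an illustration of how these two variables might be consumed, here is a minimal sketch that reads them from a mapping such as `os.environ`. The `Settings` class and `load_settings` function are hypothetical; how the service actually loads its .env file is not shown in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container, not the service's own class."""
    openai_api_key: Optional[str]
    model: str

    @property
    def ocr_enabled(self) -> bool:
        # OCR is described as optional and tied to the API key being present.
        return self.openai_api_key is not None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read OPENAI_API_KEY and MODEL, treating blank values as unset."""
    key = env.get("OPENAI_API_KEY", "").strip() or None
    model = env.get("MODEL", "").strip() or "gpt-4o"  # documented default
    return Settings(openai_api_key=key, model=model)
```

Passing the environment in as a parameter keeps the function easy to test without touching the real process environment.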
POST /api/v1/convert
Upload a file to convert to Markdown. Add the optional query parameter download=true to download the result as a .md file.
Example:
curl -X POST "http://localhost:8000/api/v1/convert" \
-F "file=@document.pdf"
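The same request can be made from Python using only the standard library. This is a rough sketch under stated assumptions: the multipart form field name `file` is a guess rather than something this README confirms, and `convert_url`, `encode_multipart`, and `convert_file` are hypothetical helper names.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"


def convert_url(base_url: str = BASE_URL, download: bool = False) -> str:
    """Build the endpoint URL, adding download=true when requested."""
    url = f"{base_url}/api/v1/convert"
    if download:
        url += "?" + urlencode({"download": "true"})
    return url


def encode_multipart(field: str, filename: str, content: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def convert_file(path: str, field: str = "file") -> dict:
    """POST a local file and return the decoded JSON response.

    The field name "file" is an assumption; adjust it to match the API.
    """
    file_path = Path(path)
    body, content_type = encode_multipart(field, file_path.name, file_path.read_bytes())
    request = urllib.request.Request(
        convert_url(), data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `convert_file("document.pdf")` would return the JSON response as a dictionary. The interactive documentation at /docs shows the exact field name the endpoint expects.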
POST /api/v1/convert/batch
Convert multiple files in a single request.
Example:
curl -X POST "http://localhost:8000/api/v1/convert/batch" \
-F "files=@document1.pdf" \
-F "files=@document2.docx"
GET /health
Interactive API documentation is available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
All API tests are written in Hurl format and located in tests/hurl/. Tests serve as both validation and documentation.
Run all tests:
task test
Test specific endpoints:
task test:health # Health endpoints
task test:convert # Conversion endpoints
task test:image # Image OCR (requires OPENAI_API_KEY)
View all available tasks:
task --list
Common development tasks:
task install # Install dependencies
task start # Start the API server
task dev # Start with auto-reload
task test # Run all tests
Build and run with Docker:
# Build the image
docker build -t markitdown-api .
# Run the container
docker run -p 8000:8000 markitdown-api
# With environment variables
docker run -p 8000:8000 -e OPENAI_API_KEY=your_key markitdown-api
Single file conversion response:
{
  "filename": "document.pdf",
  "original_format": ".pdf",
  "markdown_content": "# Converted content...",
  "metadata": {
    "file_size": 12345,
    "converted_at": 1234567890.123
  },
  "conversion_time": 0.234
}
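A client might load a response of this shape into a typed object. The sketch below assumes that `converted_at` is a Unix timestamp in seconds, that `file_size` is in bytes, and that `conversion_time` is in seconds; none of these units are stated in this README. `ConversionResult` and `parse_conversion` are hypothetical names.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ConversionResult:
    """Hypothetical client-side view of a single conversion response."""
    filename: str
    original_format: str
    markdown_content: str
    file_size: int            # assumed to be bytes
    converted_at: datetime    # assumed Unix timestamp, converted to UTC
    conversion_time: float    # assumed to be seconds


def parse_conversion(payload: str) -> ConversionResult:
    """Parse the JSON text of a single conversion response."""
    data = json.loads(payload)
    metadata = data["metadata"]
    return ConversionResult(
        filename=data["filename"],
        original_format=data["original_format"],
        markdown_content=data["markdown_content"],
        file_size=metadata["file_size"],
        converted_at=datetime.fromtimestamp(metadata["converted_at"], tz=timezone.utc),
        conversion_time=data["conversion_time"],
    )
```

A missing key raises `KeyError`, which surfaces schema changes early instead of silently producing partial results.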
Batch conversion response:
{
  "successful_conversions": [...],
  "errors": [...],
  "total_files": 3,
  "successful": 2,
  "failed": 1
}
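The counters in a batch response lend themselves to a quick consistency check on the client side. The following sketch assumes that `successful` and `failed` add up to `total_files` and that the two lists have matching lengths; the README does not state either guarantee, and the shape of the list entries is left untouched because it is not shown. `summarize_batch` is a hypothetical helper.

```python
def summarize_batch(response: dict) -> str:
    """Validate the counters of a batch response and return a one-line summary."""
    total = response["total_files"]
    successful = response["successful"]
    failed = response["failed"]

    # Assumption: every file is counted exactly once, as success or failure.
    if successful + failed != total:
        raise ValueError("successful + failed does not equal total_files")

    # Assumption: each list holds one entry per counted file.
    if len(response["successful_conversions"]) != successful:
        raise ValueError("successful_conversions length does not match 'successful'")
    if len(response["errors"]) != failed:
        raise ValueError("errors length does not match 'failed'")

    return f"{successful} of {total} files converted, {failed} failed"
```

For the example response above, with two entries in `successful_conversions` and one in `errors`, the summary reads "2 of 3 files converted, 1 failed".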