CLI tool that pulls metadata from files and URLs. Built this because I got tired of opening 5 different apps to check file info.
## Install

```sh
npm install
```

You'll also need ffmpeg if you want audio/video extraction. On Ubuntu: `sudo apt install ffmpeg`.
## Usage

```sh
# basic
node src/index.js ./report.pdf
node src/index.js https://github.com

# quiet mode - just JSON, no status messages
node src/index.js -q ./photo.jpg

# skip security scan (faster)
node src/index.js --no-security ./archive.zip
```

## What it extracts

| Type | What you get |
|---|---|
| PDFs | pages, author, producer, text stats, language, watermark detection |
| Images | dimensions, EXIF, GPS coords, camera/lens info |
| Audio | duration, codec, bitrate, sample rate |
| Video | resolution, fps, codecs, audio tracks |
| URLs | title, meta tags, OG/Twitter cards, load time |
| ZIP | file listing, sizes, nested zip detection |
## Output

Always JSON. Always this shape:

```json
{
  "status": "success",
  "input_type": "file",
  "source": "./report.pdf",
  "metadata": { ... },
  "extracted_at": "2024-01-15T12:00:00.000Z"
}
```

If something fails, you get `"status": "error"` or `"status": "partial"` with an `errors` array.
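Because the shape is fixed, downstream scripts can branch on `status` safely. A minimal sketch in Node (the payload below is a hand-written sample, not real tool output):

```javascript
// Parse the tool's JSON output and branch on the reported status.
// Sample payload only - field names follow the shape documented above.
const raw = `{
  "status": "partial",
  "input_type": "file",
  "source": "./report.pdf",
  "metadata": { "pdf": { "pages": 12 } },
  "errors": ["text extraction failed"]
}`;

const result = JSON.parse(raw);

switch (result.status) {
  case "success":
    console.log("ok:", result.source);
    break;
  case "partial":
    // Partial results still carry metadata, plus an errors array.
    console.log("partial:", result.errors.join("; "));
    break;
  default:
    console.error("failed:", result.errors);
}
```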
## Examples

URL extraction:

```
$ node src/index.js https://example.com
{
  "status": "success",
  "input_type": "url",
  "source": "https://example.com",
  "metadata": {
    "web": {
      "title": "Example Domain",
      "statusCode": 200,
      "loadTimeMs": 245,
      "pageStats": {
        "linkCount": 1,
        "imageCount": 0
      }
    }
  }
}
```

Error case:

```
$ node src/index.js ./doesnt-exist.pdf
{
  "status": "error",
  "source": "./doesnt-exist.pdf",
  "errors": ["File not found or not accessible"]
}
```

## Project structure

```
src/
  index.js          # CLI entry
  core/
    extractor.js    # routes to handlers
    normalizer.js   # builds output JSON
  handlers/
    pdf.js, image.js, audio.js, video.js, web.js, zip.js
  utils/
    file.js, hash.js, time.js
  security/
    scan.js         # macro detection, pattern matching
```
## Notes

- The virus scan is mocked - swap in ClamAV or VirusTotal for real use
- ZIP extraction has depth limits (3 levels) and file count limits (100) to prevent zip bombs
- Scans for suspicious patterns like `powershell`, `eval()`, and encoded strings
- Flags external URLs in documents
- DOCX/XLSX text extraction not implemented yet (just the security scan)
- YouTube URLs are treated as regular web pages
- GPS extraction only works on images with EXIF data (obviously)
- ffprobe errors out if ffmpeg isn't installed
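The suspicious-pattern check can be approximated with a few regexes. A rough sketch (the patterns below are examples, not the actual list in `security/scan.js`):

```javascript
// Flag content matching known-suspicious patterns.
// These three patterns are illustrative; the real scanner may differ.
const SUSPICIOUS = [
  { name: "powershell", re: /powershell/i },
  { name: "eval-call", re: /\beval\s*\(/ },
  // Long unbroken base64-looking run, a common hiding spot for payloads.
  { name: "encoded-blob", re: /[A-Za-z0-9+/]{40,}={0,2}/ },
];

function scan(text) {
  return SUSPICIOUS.filter(({ re }) => re.test(text)).map(({ name }) => name);
}

console.log(scan("document calls eval(payload)")); // → [ 'eval-call' ]
```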
## License

MIT