
meta-extract

CLI tool that pulls metadata from files and URLs. Built this because I got tired of opening 5 different apps to check file info.

Install

npm install

You'll also need ffmpeg if you want audio/video extraction. On Ubuntu: sudo apt install ffmpeg

Usage

# basic
node src/index.js ./report.pdf
node src/index.js https://github.com

# quiet mode - just JSON, no status messages
node src/index.js -q ./photo.jpg

# skip security scan (faster)
node src/index.js --no-security ./archive.zip

What it extracts

Type     What you get
PDF      pages, author, producer, text stats, language, watermark detection
Images   dimensions, EXIF, GPS coords, camera/lens info
Audio    duration, codec, bitrate, sample rate
Video    resolution, fps, codecs, audio tracks
URLs     title, meta tags, OG/Twitter cards, load time
ZIP      file listing, sizes, nested zip detection

Output

Always JSON. Always this shape:

{
  "status": "success",
  "input_type": "file",
  "source": "./report.pdf",
  "metadata": { ... },
  "extracted_at": "2024-01-15T12:00:00.000Z"
}

If something fails, you get "status": "error" or "status": "partial" with an errors array.
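As a rough sketch of how that envelope could be assembled (the real logic lives in core/normalizer.js; the `buildEnvelope` name and the exact status rules are assumptions):

```javascript
// Sketch only: derive status from whether errors occurred and whether any
// metadata was still recovered, mirroring the shape documented above.
function buildEnvelope({ inputType, source, metadata = {}, errors = [] }) {
  const status = errors.length === 0 ? 'success'
    : Object.keys(metadata).length > 0 ? 'partial'
    : 'error';
  const out = {
    status,
    input_type: inputType,
    source,
    metadata,
    extracted_at: new Date().toISOString(),
  };
  if (errors.length > 0) out.errors = errors;
  return out;
}

console.log(JSON.stringify(buildEnvelope({
  inputType: 'file',
  source: './report.pdf',
  metadata: { pdf: { pages: 12 } },
}), null, 2));
```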

Examples

URL extraction:

$ node src/index.js https://example.com
{
  "status": "success",
  "input_type": "url",
  "source": "https://example.com",
  "metadata": {
    "web": {
      "title": "Example Domain",
      "statusCode": 200,
      "loadTimeMs": 245,
      "pageStats": {
        "linkCount": 1,
        "imageCount": 0
      }
    }
  }
}
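The HTML side of that extraction boils down to pulling the title and counting elements. The sketch below is illustrative only, not the code in handlers/web.js, and the regexes are simplifications:

```javascript
// Illustrative: extract title and basic page stats from an HTML string.
function pageStatsFromHtml(html) {
  const titleMatch = html.match(/<title[^>]*>([^<]*)<\/title>/i);
  return {
    title: titleMatch ? titleMatch[1].trim() : null,
    linkCount: (html.match(/<a\s/gi) || []).length,
    imageCount: (html.match(/<img\s/gi) || []).length,
  };
}

const sample = '<html><head><title>Example Domain</title></head>' +
  '<body><a href="https://www.iana.org/domains/example">More</a></body></html>';
console.log(pageStatsFromHtml(sample));
// → { title: 'Example Domain', linkCount: 1, imageCount: 0 }
```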

Error case:

$ node src/index.js ./doesnt-exist.pdf
{
  "status": "error",
  "source": "./doesnt-exist.pdf",
  "errors": ["File not found or not accessible"]
}

Project structure

src/
  index.js          # CLI entry
  core/
    extractor.js    # routes to handlers
    normalizer.js   # builds output JSON
  handlers/
    pdf.js, image.js, audio.js, video.js, web.js, zip.js
  utils/
    file.js, hash.js, time.js
  security/
    scan.js         # macro detection, pattern matching

Security stuff

  • The virus scan is mocked - swap in ClamAV or VirusTotal for real use
  • ZIP extraction has depth limits (3 levels) and file count limits (100) to prevent zip bombs
  • Scans for suspicious patterns like powershell, eval(), encoded strings
  • Flags external URLs in documents
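The pattern matching in the last two points might look something like this. It is only a sketch: the actual pattern list and return shape in security/scan.js are assumptions here.

```javascript
// Illustrative suspicious-pattern scan: each entry pairs a label with a regex.
const SUSPICIOUS = [
  { name: 'powershell', re: /powershell/i },
  { name: 'eval', re: /\beval\s*\(/ },
  // Very long base64-looking runs are a crude proxy for encoded payloads.
  { name: 'base64-blob', re: /[A-Za-z0-9+/]{80,}={0,2}/ },
];

function scanText(text) {
  return SUSPICIOUS.filter(({ re }) => re.test(text)).map(({ name }) => name);
}

console.log(scanText('run powershell -enc QUJD'));
// → [ 'powershell' ]
```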

Known limitations

  • DOCX/XLSX text extraction not implemented yet (just security scan)
  • YouTube URLs treated as regular web pages
  • GPS extraction only works on images with EXIF data (obviously)
  • ffprobe errors out if ffmpeg isn't installed

License

MIT

