kaelzhang/penteract-ocr

penteract

The native Node.js bindings to the Tesseract OCR project.

  • Uses native Node.js bindings, so there is no need to spawn the tesseract command line.
  • Asynchronous I/O: image reading and processing run in an isolated event loop backed by libuv.
  • Supports reading image data from JavaScript buffers.

Contributions are welcome.

Install

First of all, a g++ 4.9 compiler is required.

Before installing penteract, install the following dependencies:

$ brew install pkg-config tesseract # macOS

Then install penteract with npm:

$ npm install penteract

To Use with Electron

Native Node.js modules must be compiled against the runtime that loads them. To use penteract with Electron, add a .npmrc file to the root of your Electron project before running npm install:

runtime = electron
; The version of the local electron,
; use `npm ls electron` to figure it out
target = 1.7.5
target_arch = x64
disturl = https://atom.io/download/atom-shell

Usage

Recognize an Image Buffer

import path from 'path'
import fs from 'fs-extra'
import {
  recognize
} from 'penteract'

const filepath = path.join(__dirname, 'test', 'fixtures', 'penteract.jpg')

// Read the image into a buffer, then pass the buffer to `recognize`
fs.readFile(filepath).then(recognize).then(console.log) // 'penteract'

Recognize a Local Image File

import {
  fromFile
} from 'penteract'

fromFile(filepath, {lang: 'eng'}).then(console.log)     // 'penteract'

recognize(image [, options])

  • image Buffer the content buffer of the image file.
  • options PenteractOptions= optional

Returns Promise.<String> that resolves with the recognized text on success.

fromFile(filepath [, options])

  • filepath Path the file path of the image file.
  • options PenteractOptions=

Returns Promise.<String>
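Because fromFile returns a promise, several files can be processed together with Promise.all. The following is a minimal sketch, not part of penteract: recognizeAll is a hypothetical helper, and the function passed as fromFileFn is assumed to behave like fromFile, taking (filepath, options) and returning a promise of the recognized text.

```javascript
// Hypothetical helper, not part of penteract: runs OCR over several files.
// `fromFileFn` is assumed to behave like penteract's `fromFile`,
// i.e. (filepath, options) => Promise.<String>.
function recognizeAll (fromFileFn, filepaths, options = {lang: 'eng'}) {
  return Promise.all(
    filepaths.map(filepath =>
      // Pair each result with its source path so the output stays traceable
      fromFileFn(filepath, options).then(text => ({filepath, text}))
    )
  )
}

// Usage, assuming penteract is installed:
//   import {fromFile} from 'penteract'
//   recognizeAll(fromFile, ['a.png', 'b.png']).then(console.log)
```

All files are started at once, and the returned promise rejects as soon as any single file fails. For large batches, limiting the number of files processed concurrently may be preferable.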

PenteractOptions Object

{
  // @type `(String|Array.<String>)=eng`,
  //
  // Specifies language(s) used for OCR.
  //   Run `tesseract --list-langs` in command line for all supported languages.
  //   Defaults to `'eng'`.
  //
  // To specify multiple languages, use an array.
  //   English and Simplified Chinese, for example:
  // ```
  // lang: ['eng', 'chi_sim']
  // ```
  lang: 'eng'
}
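When the lang value comes from user input or configuration, it can help to check it against the documented type before calling recognize or fromFile. The sketch below is an assumption about how such a check might look; normalizeLang is a hypothetical helper and is not provided by penteract.

```javascript
// Hypothetical helper, not part of penteract: validates a `lang` value against
// the documented type (String | Array.<String>) and applies the 'eng' default.
function normalizeLang (lang = 'eng') {
  // Accept a single language code or an array of codes
  const langs = Array.isArray(lang) ? lang : [lang]

  const valid = langs.length > 0 &&
    langs.every(code => typeof code === 'string' && code.length > 0)

  if (!valid) {
    throw new TypeError(
      'lang must be a non-empty string or a non-empty array of non-empty strings'
    )
  }

  return langs
}

// Usage, assuming penteract is installed and `buffer` holds image data:
//   recognize(buffer, {lang: normalizeLang(['eng', 'chi_sim'])})
```

The helper only checks the shape of the value. Whether a language is actually installed still has to be confirmed with tesseract --list-langs.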

Promise.reject(error)

  • error Error The JavaScript Error instance
    • code String Error code.
    • message String Error message.
    • other properties of Error.

code: ERR_READ_IMAGE

Rejects if it fails to read image data from file or buffer.

code: ERR_INIT_TESSER

Rejects if Tesseract fails to initialize.
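The two documented codes can be told apart in a catch handler. The following is a minimal sketch of one way to do that; describeOcrError is a hypothetical helper, and the message strings are illustrative rather than anything penteract produces.

```javascript
// Hypothetical helper, not part of penteract: maps a rejection to a
// user-facing message based on the documented `code` property.
function describeOcrError (error) {
  switch (error.code) {
    case 'ERR_READ_IMAGE':
      return 'Could not read image data from the file or buffer.'
    case 'ERR_INIT_TESSER':
      return 'Tesseract failed to initialize.'
    default:
      // Any other error is reported with its own message
      return `Unexpected error: ${error.message}`
  }
}

// Usage, assuming penteract is installed and `buffer` holds image data:
//   recognize(buffer)
//     .then(console.log)
//     .catch(error => console.error(describeOcrError(error)))
```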

Example of Using with Electron

// For details of `mainWindow: BrowserWindow`, see
// https://github.com/electron/electron/blob/master/docs/api/browser-window.md
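// `recognize` is imported from 'penteract' as shown in Usage.
mainWindow.capturePage({
  x: 10,
  y: 10,
  width: 100,
  height: 10
}, (image) => {
  // The callback receives a NativeImage; `toPNG()` converts it to a Buffer
  recognize(image.toPNG()).then(console.log)
})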

Troubleshooting Compilation

On macOS, if compilation fails, installing the Xcode command line tools resolves most problems:

$ xcode-select --install

If the following error appears:

xcode-select: error: tool 'xcodebuild' requires Xcode, but active developer directory '/Library/Developer/CommandLineTools' is a command line tools instance

point xcode-select at the full Xcode installation:

$ sudo xcode-select -s /Applications/Xcode.app/Contents/Developer

License

MIT