----------------------------------------------------------------
  NHocr - the Japanese OCR
----------------------------------------------------------------

1. Introduction

NHocr is a command-line OCR (Optical Character Recognition)
program for the Japanese language. It is designed to recognize
machine-printed Japanese characters and some ASCII
characters/symbols in an image.
NHocr is probably the first Open Source Japanese OCR software,
apart from some experimental, partial code released to the
academic community.

The "nhocr" command reads PBM/PGM/PPM image file(s), recognizes
the text line image in each file, and outputs the text in UTF-8.
Each file should contain only ONE horizontal text line image,
with no surrounding noise or stray marks.
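
As a pre-flight check, the input format can be verified from the
file's magic number before calling nhocr. The sketch below is not
part of NHocr, and is_pnm is a hypothetical helper name. It relies
on the Netpbm convention that PBM, PGM, and PPM files start with
"P1" through "P6"; whether nhocr accepts every one of these
variants is not stated here.

```shell
# is_pnm FILE: succeed if FILE starts with a Netpbm magic number.
# Hypothetical helper, not shipped with NHocr.
is_pnm() {
    # Read only the first two bytes of the file.
    magic=$(dd if="$1" bs=2 count=1 2>/dev/null)
    case "$magic" in
        P[1-6]) return 0 ;;   # P1/P4 = PBM, P2/P5 = PGM, P3/P6 = PPM
        *)      return 1 ;;   # anything else, or an unreadable file
    esac
}
```

It could be combined with the recognizer like this:

  $ is_pnm input.pgm && nhocr -line -o output.txt input.pgm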

You can also use NHocr through the WeOCR service at:
  http://appsv.ocrgrid.org/nhocr/

The program is highly experimental, and its character
recognition performance is limited. (If you need
high-performance OCR, you will probably be happier with a
commercial product.)

The character feature used in NHocr is based on the Peripheral
Local Moment (P-LM) proposed by Hori et al. in the late 1990s.

NHocr began as a product of the author's weekend programming,
so development may be rather slow.




2. Installation and configuration

1) O2-tools-2.00 (or newer) is required for building NHocr.
   The source package is available at:
     http://www.imglab.org/p/O2/

   Download O2-tools-2.xx.tar.gz, build it, and install it.


2) Run the configure script with the --with-O2tools option in the
   top directory. Then build and install the programs.

  $ ./configure --with-O2tools=<O2tools_directory_on_your_system>
  $ make
  (switch to root if necessary)
  # make install


3) If you have changed --prefix and installed NHocr at a
   location other than the standard directory (/opt/nhocr), the
   environment variable NHOCR_DICDIR must be set so it points to
   the data directory.

   For example, if the prefix is set to /usr/local/nhocr:

  $ NHOCR_DICDIR=/usr/local/nhocr/share ; export NHOCR_DICDIR
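
   The same setting can be scripted, for example in a shell
   startup file. This is a minimal sketch: the helper name
   nhocr_dicdir_for_prefix is hypothetical, and it assumes the
   data directory is <prefix>/share, as in the example above.

```shell
# Hypothetical helper: print the data directory for an install prefix.
# Assumes the layout <prefix>/share; a trailing slash is tolerated.
nhocr_dicdir_for_prefix() {
    printf '%s/share\n' "${1%/}"
}

NHOCR_DICDIR=$(nhocr_dicdir_for_prefix /usr/local/nhocr)
export NHOCR_DICDIR

# Warn early if the directory is missing, rather than failing in nhocr.
if [ ! -d "$NHOCR_DICDIR" ]; then
    echo "warning: $NHOCR_DICDIR does not exist" >&2
fi
```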




3. Usage
 
Running nhocr without any arguments prints a usage message.
A typical invocation is:

  $ nhocr -line -o output.txt input.pgm
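
Since each input file holds a single text line, a page that has
already been cut into line images can be processed with a small
loop built on the invocation above. This is only a sketch: the
function name recognize_lines and the NHOCR override variable are
hypothetical, and one .txt file per image is just one possible
output layout.

```shell
# NHOCR is a hypothetical override so a different binary (or a stub)
# can be substituted; it defaults to the installed nhocr command.
NHOCR=${NHOCR:-nhocr}

# recognize_lines FILE...: run nhocr on each line image and write
# the result next to it, e.g. line01.pgm -> line01.txt.
recognize_lines() {
    for img in "$@"; do
        out="${img%.*}.txt"   # strip the extension, append .txt
        "$NHOCR" -line -o "$out" "$img" || echo "failed: $img" >&2
    done
}
```

For example:

  $ recognize_lines line01.pgm line02.pgm line03.pgm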




4. Using NHocr with OCRopus

NHocr can be used as a line recognizer together with OCRopus,
a document analysis and OCR system.

An NHocr-OCRopus bridge is included in the package. See the Lua
scripts in the ocropus/ directory.




5. License

See the LICENSE file.




For details:
  http://code.google.com/p/nhocr/
  http://sourceforge.jp/projects/nhocr/
--
May 15, 2009  Hideaki Goto,  Tohoku University, Japan
