In this challenge, the goal is to collect the data from a picture of a receipt or, alternatively, from a picture of a wine bottle.
Assumptions:
Sample Receipts
Sample Wine
Tesseract is available as a package in Ubuntu.
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.1 LTS
Release:        14.04
Codename:       trusty
$ sudo apt-get install tesseract-ocr
...
$ tesseract --version
tesseract 3.03
 leptonica-1.70
  libgif 4.1.6(?) : libjpeg 8d : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8 : webp 0.4.0
$ tesseract receipt_01.jpg stdout > receipt_01.tsa.txt
$ tesseract wine_01_front.jpg stdout > wine_01_front.tsa.txt
$ tesseract wine_01_back.jpg stdout > wine_01_back.tsa.txt
$ tesseract --print-parameters > tesseract_params.txt
$ man tesseract
I suppose I might get better results if I modify some of those parameters. Or maybe I need to explicitly specify the English language pack.
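One way to test both ideas is to pass the language and a page segmentation mode explicitly on the command line. This is only a minimal sketch, assuming the Tesseract 3.x options -l and -psm (newer releases spell the latter --psm). The ocr_with_psm helper and the TESSERACT variable are hypothetical names made up for illustration. As far as I know, eng is already the default language, so the segmentation mode is probably the more interesting knob to try.

```shell
# Sketch only: ocr_with_psm is a hypothetical helper, not part of Tesseract.
# TESSERACT may be overridden to point at a different binary.
TESSERACT="${TESSERACT:-tesseract}"

# Usage: ocr_with_psm IMAGE PSM [LANG]
# Writes the recognized text to stdout. LANG defaults to eng.
ocr_with_psm() {
    # -l selects the language data; -psm is the 3.x spelling of the
    # page segmentation option (assumption: 4.x and later want --psm).
    "$TESSERACT" "$1" stdout -l "${3:-eng}" -psm "$2"
}

# Example calls (commented out so the snippet has no side effects):
#   psm 6 assumes a single uniform block of text
#   psm 4 assumes a single column of text of variable sizes
# ocr_with_psm receipt_01.jpg 6 > receipt_01.psm6.txt
# ocr_with_psm receipt_01.jpg 4 > receipt_01.psm4.txt
```

Comparing the psm4 and psm6 outputs against the default run should show whether the layout analysis is what hurts the results on receipts.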
I guess I expected it to read the information much better by default...
Here is how to generate the Tesseract output and the thumbnails:
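bash$ for i in *.jpg
do
    # OCR the image; tesseract appends .txt to the output base name
    tesseract "$i" "tesseract_$i"
    # Decode, scale to 200 pixels wide, and re-encode as a progressive JPEG thumbnail
    djpeg "$i" | pnmscale -xsize 200 | cjpeg -opto -progr -qual 75 > "small_$i"
    echo "$i processed"
done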
by JJ Stiff - jjstiff arroba hotmail punto com - 31 August 2014