I've got an HP OfficeJet 5610 and I'm interested in finding a replacement for the bundled scanner software.

The HP software (Windows) can produce something it calls a "searchable PDF." I really like this format because it combines an image of the document with OCR'd text. The text gets embedded in such a way that you can select/copy text directly from acroread, evince, etc.

I've tried gscan2pdf and it comes pretty close to what I'm looking for. However...

1. The OCR'd text gets embedded differently, so you can't actually select/copy the OCR'd text from a PDF viewer.
2. The OCR back-ends for gscan2pdf (Tesseract and GOCR) seem to have trouble with multiple columns of text, or things like pay-stubs where the text doesn't flow in paragraphs. The free HP software seems to handle this without a problem.

So, I've been scanning from Windows. I'd really like to find an alternative. Any suggestions?

Thanks!
David
BLU is a member of BostonUserGroups.
We also thank MIT for the use of their facilities.