James R. Van Zandt wrote:
> I have put together a sizable collection of IEEE papers, but they're
> image-only PDFs, making them hard to search.
>
> Is there a convenient way to add the metadata to the PDF files
> themselves, along with (say) a hand-typed abstract and OCR of the
> rest, so the whole thing can be indexed by something like beagle
> <http://beaglewiki.org/Main_Page>?
>
> - Jim Van Zandt

I would start by running pdftotext on them, then using regular expressions to pull metadata out of the text versions.

Oddly enough, this is the basis of one of the projects I'm working on at Aptima: pulling metadata from information that comes from many sources in many formats, tracking that metadata, and grouping documents by it.
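A minimal sketch of the pdftotext-plus-regex approach, assuming poppler's `pdftotext` is on the PATH and that each PDF already has a text layer. For truly image-only scans, pdftotext returns little or nothing, so an OCR pass (e.g. tesseract) would have to come first. The patterns below are hypothetical: they assume IEEE-style "Abstract—" and "Index Terms—" headings and a DOI somewhere in the text, and would need tuning for a real collection.

```python
import re
import subprocess
import sys

def pdf_to_text(path):
    """Run pdftotext; the output file '-' sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Hypothetical patterns (assumption): adjust them to your papers' layout.
DOI_RE = re.compile(r"\b(10\.\d{4,9}/[^\s]+)")
ABSTRACT_RE = re.compile(
    r"Abstract\s*[-\u2014:]+\s*(.+?)(?:\n\s*\n|Index Terms)", re.S)
INDEX_TERMS_RE = re.compile(r"Index Terms\s*[-\u2014:]+\s*(.+?)(?:\n\s*\n|$)", re.S)

def extract_metadata(text):
    """Return a dict with whatever of doi/abstract/keywords the patterns find."""
    meta = {}
    m = DOI_RE.search(text)
    if m:
        # Drop trailing punctuation picked up from the surrounding sentence.
        meta["doi"] = m.group(1).rstrip(".,;")
    m = ABSTRACT_RE.search(text)
    if m:
        # Collapse the line breaks pdftotext leaves inside the paragraph.
        meta["abstract"] = " ".join(m.group(1).split())
    m = INDEX_TERMS_RE.search(text)
    if m:
        terms = m.group(1).replace("\n", " ").split(",")
        meta["keywords"] = [t.strip(" .") for t in terms if t.strip(" .")]
    return meta

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, extract_metadata(pdf_to_text(path)))
```

Keeping the text extraction and the metadata parsing in separate functions means the regular expressions can be tested against sample text without any PDFs, and the extracted fields could later be fed to an indexer such as beagle.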
BLU is a member of BostonUserGroups.
We also thank MIT for the use of their facilities.