Never used it, but there's Text::Extract::Word on CPAN.

Jerry Natowitz
j.natowitz (at) rcn.com

David Kramer wrote:
> On 09/19/2010 03:38 PM, jc-8FIgwK2HfyJMuWfdjsoA/w at public.gmane.org wrote:
>> Anyone here have advice on programs (scriptable and usable
>> on linux) that convert Word docs to plain text?
>>
>> I've been googling, of course, but most of the things I'm
>> finding start with "1. Load the file into Word". This is a
>> good clue that the scheme probably can't be used in a
>> script that's running on a linux system. ;-)
>
> If you want an automated solution, how about writing it in Java?
>
> http://poi.apache.org/
> The Apache POI Project's mission is to create and maintain Java APIs for
> manipulating various file formats based upon the Office Open XML
> standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2).
> In short, you can read and write MS Excel files using Java. In addition,
> you can read and write MS Word and MS PowerPoint files using Java.
> Apache POI is your Java Excel solution (for Excel 97-2008). We have a
> complete API for porting other OOXML and OLE2 formats and welcome others
> to participate.
>
> OLE2 files include most Microsoft Office files such as XLS, DOC, and PPT
> as well as MFC serialization API based file formats. The project
> provides APIs for the OLE2 Filesystem (POIFS) and OLE2 Document
> Properties (HPSF).
>
> Here are some other solutions:
> http://www.linux.com/archive/feed/52385
> _______________________________________________
> Discuss mailing list
> Discuss-mNDKBlG2WHs at public.gmane.org
> http://lists.blu.org/mailman/listinfo/discuss
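
For what it's worth, if the files happen to be in the newer OOXML format (.docx) rather than the old binary .doc, a script can get at the text without any Word-specific library, because a .docx is a ZIP archive whose body lives in word/document.xml. The following is only a rough sketch under that assumption, using the Python standard library. It is not Apache POI or Text::Extract::Word, the function name docx_to_text is made up for illustration, and it ignores headers, footers, footnotes and text boxes. Legacy OLE2 .doc files are not handled at all.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the body text of a .docx file, one line per paragraph.

    Hypothetical helper for illustration; .docx (OOXML) only.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    lines = []
    for para in root.iter(W + "p"):
        parts = []
        for node in para.iter():
            if node.tag == W + "t":        # a run of literal text
                parts.append(node.text or "")
            elif node.tag == W + "tab":    # explicit tab character
                parts.append("\t")
            elif node.tag in (W + "br", W + "cr"):  # line break inside a paragraph
                parts.append("\n")
        lines.append("".join(parts))
    return "\n".join(lines)


if __name__ == "__main__":
    # Usage: python docx2txt.py file1.docx [file2.docx ...]
    for name in sys.argv[1:]:
        print(docx_to_text(name))
```

Run from a shell or a cron job, this prints the text of each file named on the command line to standard output, so it can be redirected or piped like any other filter.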