
BLU Discuss list archive



Good Word doc -> plain text conversion



I've never used it, but there is Text::Extract::Word on CPAN.

	Jerry Natowitz
	j.natowitz (at) rcn.com


David Kramer wrote:
> On 09/19/2010 03:38 PM, jc-8FIgwK2HfyJMuWfdjsoA/w at public.gmane.org wrote:
>> Anyone here have advice on programs (scriptable and usable
>> on linux) that convert Word docs to plain text?
>>
>> I've been googling, of course, but most of the things I'm
>> finding start with "1. Load the file into Word". This is a
>> good clue that the scheme probably can't be used in a
>> script that's running on a linux system. ;-)
> 
> If you want an automated solution, how about writing it in Java?
> 
> http://poi.apache.org/
> The Apache POI Project's mission is to create and maintain Java APIs for
> manipulating various file formats based upon the Office Open XML
> standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2).
> In short, you can read and write MS Excel files using Java. In addition,
> you can read and write MS Word and MS PowerPoint files using Java.
> Apache POI is your Java Excel solution (for Excel 97-2008). We have a
> complete API for porting other OOXML and OLE2 formats and welcome others
> to participate.
> 
> OLE2 files include most Microsoft Office files such as XLS, DOC, and PPT
> as well as MFC serialization API based file formats. The project
> provides APIs for the OLE2 Filesystem (POIFS) and OLE2 Document
> Properties (HPSF).
> 
> Here are some other solutions:
> http://www.linux.com/archive/feed/52385
> 
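For the scripting case asked about above, here is a minimal sketch in Python using only the standard library. It is not the method of Text::Extract::Word or Apache POI, and it assumes the input is an OOXML .docx file (a zip archive whose body text lives in word/document.xml). The older binary .doc format, which is an OLE2 file, is not handled. The function name docx_to_text is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the body text of a .docx file, one line per paragraph.

    `path` may be a filename or a binary file-like object.
    Hypothetical helper: covers plain body text only (no headers,
    footers, footnotes, or legacy .doc files).
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    lines = []
    for para in root.iter(W + "p"):          # each <w:p> is a paragraph
        parts = []
        for node in para.iter():
            if node.tag == W + "t":          # run of literal text
                parts.append(node.text or "")
            elif node.tag == W + "tab":      # explicit tab character
                parts.append("\t")
            elif node.tag in (W + "br", W + "cr"):  # line break in a paragraph
                parts.append("\n")
        lines.append("".join(parts))
    return "\n".join(lines)
```

From a script, something like print(docx_to_text("report.docx")) would write the text to stdout, where "report.docx" is a placeholder filename. Paragraphs nested inside other paragraphs (text boxes, for example) may come out twice with this simple walk.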






BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org