On 09/19/2010 03:38 PM, jc-8FIgwK2HfyJMuWfdjsoA/w at public.gmane.org wrote:
> Anyone here have advice on programs (scriptable and usable
> on linux) that convert Word docs to plain text?
>
> I've been googling, of course, but most of the things I'm
> finding start with "1. Load the file into Word". This is a
> good clue that the scheme probably can't be used in a
> script that's running on a linux system. ;-)

If you want an automated solution, how about writing it in Java?

http://poi.apache.org/

The Apache POI Project's mission is to create and maintain Java APIs for manipulating various file formats based upon the Office Open XML standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2). In short, you can read and write MS Excel files using Java. In addition, you can read and write MS Word and MS PowerPoint files using Java. Apache POI is your Java Excel solution (for Excel 97-2008). We have a complete API for porting other OOXML and OLE2 formats and welcome others to participate.

OLE2 files include most Microsoft Office files such as XLS, DOC, and PPT as well as MFC serialization API based file formats. The project provides APIs for the OLE2 Filesystem (POIFS) and OLE2 Document Properties (HPSF).

Here are some other solutions:
http://www.linux.com/archive/feed/52385
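To give a feel for what a scripted Java converter involves, here is a rough sketch. It is NOT POI: it is a swapped-in, JDK-only approach that handles only the newer .docx (OOXML) format, which is a zip archive whose body text sits in word/document.xml. It walks that XML in document order, emitting the text of w:t runs, tabs, line breaks, and a newline per paragraph. Headers, footers, footnotes and comments live in other parts of the archive and are ignored, and some constructs (text boxes, tracked changes) may come out imperfectly. Legacy binary .doc (OLE2) files need a real parser; in POI, as far as I know, that is WordExtractor in org.apache.poi.hwpf.extractor (and XWPFWordExtractor in org.apache.poi.xwpf.extractor for .docx), but check the POI docs for your version. The class name DocxToText is just made up for illustration. Needs Java 9+.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

/** Hypothetical sketch: plain text from a .docx using only the JDK (not POI). */
final class DocxToText {
    // Namespace of the WordprocessingML elements (w:p, w:r, w:t, ...).
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static String extract(InputStream docx) throws IOException {
        byte[] xml = null;
        // A .docx is a zip archive; the main body text lives in word/document.xml.
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry e;
            while ((e = zip.getNextEntry()) != null) {
                if (e.getName().equals("word/document.xml")) {
                    xml = zip.readAllBytes();  // reads to the end of this entry only
                    break;
                }
            }
        }
        if (xml == null) {
            throw new IOException("word/document.xml not found; not a .docx file?");
        }
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);  // required for getNamespaceURI()/getLocalName()
            Document doc = f.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), out);
            return out.toString();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException(e);
        }
    }

    // Depth-first walk in document order, emitting text runs, tabs and breaks.
    private static void walk(Node n, StringBuilder out) {
        boolean wordElem = n.getNodeType() == Node.ELEMENT_NODE
                && W_NS.equals(n.getNamespaceURI());
        if (wordElem) {
            switch (n.getLocalName()) {
                case "t":   out.append(n.getTextContent()); return;  // visible text
                case "tab": out.append('\t'); return;                // tab inside a run
                case "br":
                case "cr":  out.append('\n'); return;                // manual line break
                case "pPr":
                case "rPr": return;  // formatting only (pPr also holds tab-stop w:tab elements)
                default:    break;
            }
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, out);
        }
        if (wordElem && n.getLocalName().equals("p")) {
            out.append('\n');  // end of paragraph
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.print(extract(in));
        }
    }
}
```

From a shell script that would be something like `java DocxToText report.docx > report.txt` (report.docx being whatever file you have). Using the DOM keeps the sketch short; for very large documents a streaming StAX parser would use less memory.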