Boston Linux & UNIX was originally founded in 1994 as part of The Boston Computer Society. We meet on the third Wednesday of each month at the Massachusetts Institute of Technology, in Building E51.

BLU Discuss list archive


[Discuss] Converting "rich" (MIME) email to plain text

This is not for extracting just the "Content-Type: text/plain" section of
an email message, but rather for converting an HTML file to plain text.

I needed to do this on a very limited scale, so I just wrote a few lines of
PHP that suited my situation:

function textify ($file) {
  $contents = file_get_contents($file);
  $contents = strip_tags($contents);
  $contents = htmlspecialchars_decode($contents, ENT_QUOTES); // including single and double quotes
  $contents = str_replace('&nbsp;', ' ', $contents); // replace &nbsp; entity with a space
  $contents = preg_replace('#\{literal\}.*?\{/literal\}#s', '', $contents); // remove {literal} Smarty blocks; .*? is lazy, so each block is removed separately
  $contents = preg_replace("/[\t ]+/", " ", $contents); // replace successive blanks with a single blank
  $contents = preg_replace("/^[\t ]+/m", "", $contents); // remove leading blanks
  $contents = preg_replace('/^[\t ]*\R/m', '', $contents); // remove empty lines (LF or CRLF endings)

  return $contents;
}

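`htmlspecialchars_decode()` only converts the entities for `&`, `"`, `'`, `<` and `>`, so named entities such as `&eacute;` or `&nbsp;` pass through that step untouched, which is why non-breaking spaces get separate treatment. PHP's `html_entity_decode()` covers the full entity table. The sketch below is a hypothetical helper (`decode_entities()` is not part of the original code), assuming UTF-8 input and PHP 7 or later, that could stand in for the decode step:

```php
<?php
// Hypothetical helper, not from the original post: decode every HTML entity
// (not only the five handled by htmlspecialchars_decode()) and turn
// non-breaking spaces into ordinary spaces. Assumes UTF-8 text.
function decode_entities(string $text): string
{
    // ENT_QUOTES decodes both single and double quotes; ENT_HTML5 selects
    // the HTML5 named-entity table (&nbsp;, &eacute;, ...).
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // &nbsp; decodes to U+00A0, not to an ASCII space, so replace it.
    return str_replace("\u{00A0}", ' ', $text);
}

echo decode_entities('Caf&eacute;&nbsp;&amp;&nbsp;&quot;Bar&quot;'), "\n";
```

Called on `Caf&eacute;&nbsp;&amp;&nbsp;&quot;Bar&quot;`, it prints `Café & "Bar"`. The final `str_replace()` is there because `&nbsp;` decodes to the UTF-8 non-breaking space rather than a plain space.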
There is a class[1] that does this in PHP, and it has been used by several
complete applications such as PHPMailer. Curiously, PHPMailer *removed* the
class because the class is licensed under the GPL while PHPMailer is LGPL[2].


Greg Rundlett

BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.

