[Discuss] Converting "rich" (MIME) email to plain text

Greg Rundlett (freephile) greg at freephile.com
Wed Feb 17 16:54:12 EST 2016


This is not for extracting just the "Content-Type: text/plain" section of
an email message, but rather for converting an HTML file to plain text.

I needed to do this on a very limited scale, so I just wrote a few lines of
PHP that suited my situation:

function textify ($file) {
  $contents = file_get_contents($file);
  $contents = strip_tags($contents);
  // decode HTML special characters, including single and double quotes
  $contents = htmlspecialchars_decode($contents, ENT_QUOTES);
  // replace the non-breaking space entity with a plain space
  $contents = str_replace('&nbsp;', ' ', $contents);
  // remove {literal} Smarty blocks; the U modifier is omitted because it
  // would invert the lazy .*? and make it greedy
  $contents = preg_replace('#\{literal\}.*?\{/literal\}#s', '', $contents);
  // replace successive blanks with a single blank
  $contents = preg_replace("/[\t ]+/", " ", $contents);
  // remove leading blanks
  $contents = preg_replace("/^[\t ]+/m", "", $contents);
  // remove empty lines
  $contents = preg_replace("/^ *$\n/m", "", $contents);

  return $contents;
}
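
To see what these steps do to a small input, here is a minimal sketch that
applies the same sequence to a string instead of a file. The function name
textify_string and the sample HTML are my own illustrative assumptions, not
part of the function above:

```php
<?php
// Hypothetical string-based variant of textify(); the name and the sample
// input are assumptions made for illustration only.
function textify_string(string $html): string {
    $text = strip_tags($html);
    // decode &amp; &lt; &gt; &quot; and &#039;
    $text = htmlspecialchars_decode($text, ENT_QUOTES);
    // htmlspecialchars_decode() leaves &nbsp; alone, so handle it explicitly
    $text = str_replace('&nbsp;', ' ', $text);
    // remove {literal} Smarty blocks (lazy match; "s" lets "." span newlines)
    $text = preg_replace('#\{literal\}.*?\{/literal\}#s', '', $text);
    // collapse runs of tabs and spaces into a single space
    $text = preg_replace('/[\t ]+/', ' ', $text);
    // remove leading blanks on each line
    $text = preg_replace('/^[\t ]+/m', '', $text);
    // remove empty lines
    $text = preg_replace('/^ *$\n/m', '', $text);
    return $text;
}

$sample = "<h1>Hello &amp; welcome</h1>\n\n<p>Some&nbsp;<b>bold</b>   text</p>\n";
echo textify_string($sample);
```

With that sample, the heading and the paragraph should each end up on their
own line, with the entities decoded and the extra blanks collapsed.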


There is a class[1] to do this in PHP that has been used by several full
programs such as PHPMailer. Curiously, PHPMailer *removed* the class
because the class is GPL-licensed while PHPMailer itself is LGPL[2].

[1] https://github.com/mtibben/html2text
[2] https://github.com/PHPMailer/PHPMailer/commit/127d26ef3c43118d82c244c15016cf37d67504c6


Greg Rundlett
https://eQuality-Tech.com
https://freephile.org


