[Discuss] Converting "rich" (MIME) email to plain text
Greg Rundlett (freephile)
greg at freephile.com
Wed Feb 17 16:54:12 EST 2016
This is not for extracting just the "Content-Type: text/plain" section of
an email message, but rather for converting an HTML file to plain text.
I needed to do this on a very limited scale, so I just wrote a few lines of
PHP that suited my situation:
function textify ($file) {
    $contents = file_get_contents($file);
    $contents = strip_tags($contents);
    // decode special characters, including single and double quotes
    $contents = htmlspecialchars_decode($contents, ENT_QUOTES);
    // replace the &nbsp; entity with a space
    $contents = str_replace('&nbsp;', ' ', $contents);
    // remove {literal} Smarty blocks (non-greedy, spanning lines)
    $contents = preg_replace('#\{literal\}.*?\{/literal\}#s', '', $contents);
    // replace successive blanks with a single blank
    $contents = preg_replace('/[\t ]+/', ' ', $contents);
    // remove leading blanks
    $contents = preg_replace('/^[\t ]+/m', '', $contents);
    // remove empty lines
    $contents = preg_replace('/^ *$\n/m', '', $contents);
    return $contents;
}
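To see what these steps do to a small input, here is a rough sketch that applies the same sequence to a string rather than a file. The name textify_string and the sample markup are made up for illustration, and the Smarty step is left out because the sample contains no {literal} blocks.

```php
<?php
// Hypothetical string-based variant of textify(), for illustration only.
function textify_string(string $html): string {
    $text = strip_tags($html);                          // drop HTML tags
    $text = htmlspecialchars_decode($text, ENT_QUOTES); // &amp; &lt; &gt; and both quote types
    $text = str_replace('&nbsp;', ' ', $text);          // &nbsp; entity becomes a plain space
    $text = preg_replace('/[\t ]+/', ' ', $text);       // collapse runs of blanks
    $text = preg_replace('/^[\t ]+/m', '', $text);      // strip leading blanks on each line
    $text = preg_replace('/^ *$\n/m', '', $text);       // drop empty lines
    return $text;
}

// Made-up sample input.
$sample = "<h1>Hello&nbsp;&amp;&nbsp;welcome</h1>\n\n<p>  It&#039;s   a <b>test</b>.</p>\n";
echo textify_string($sample);
```

Note that htmlspecialchars_decode() only handles a handful of entities, so anything beyond those and &nbsp; (for example &copy;) would pass through untouched; html_entity_decode() may be a better fit if that matters.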
There is a class[1] to do this in PHP that has been used by several full
programs such as PHPMailer. Curiously, PHPMailer *removed* the class
because the class is GPL-licensed while PHPMailer is LGPL[2].
[1] https://github.com/mtibben/html2text
[2] https://github.com/PHPMailer/PHPMailer/commit/127d26ef3c43118d82c244c15016cf37d67504c6
Greg Rundlett
https://eQuality-Tech.com
https://freephile.org