Boston Linux & UNIX was originally founded in 1994 as part of The Boston Computer Society. We meet on the third Wednesday of each month, online, via Jitsi Meet.

BLU Discuss list archive



[Discuss] Converting "rich" (MIME) email to plain text



This is not for extracting just the "Content-Type: text/plain" section of
an email message, but rather for converting an HTML file to plain text.

I needed to do this on a very limited scale, so I just wrote a few lines of
PHP that suited my situation:

function textify ($file) {
  $contents = file_get_contents($file);
  $contents = strip_tags($contents);
  // decode &amp;, &lt; and &gt;, including single and double quotes
  $contents = htmlspecialchars_decode($contents, ENT_QUOTES);
  // replace &nbsp; entity with space
  $contents = str_replace('&nbsp;', ' ', $contents);
  // remove {literal} Smarty blocks (lazy match; no U flag, which would
  // turn .*? greedy and swallow everything between two separate blocks)
  $contents = preg_replace('#\{literal\}.*?\{/literal\}#s', '', $contents);
  // replace successive blanks with a single blank
  $contents = preg_replace("/[\t ]+/", " ", $contents);
  // remove leading blanks
  $contents = preg_replace("/^[\t ]+/m", "", $contents);
  // remove empty lines
  $contents = preg_replace("/^ *$\n/mU", "", $contents);

  return $contents;
}
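
For anyone whose HTML is already in a string, or contains named entities
other than the basic five, here is a minimal sketch of the same idea. It is
not part of my original script: the name textify_string() is made up for
illustration, and swapping in html_entity_decode() is my assumption about
what a more general version would want.

```php
<?php
// Hypothetical helper: same approach as textify(), but it takes an HTML
// string instead of a file path.
function textify_string(string $html): string {
    $text = strip_tags($html);
    // html_entity_decode() also handles named entities such as &nbsp; and
    // &eacute;, which htmlspecialchars_decode() leaves untouched.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // &nbsp; decodes to U+00A0; turn it into an ordinary space.
    $text = str_replace("\u{00A0}", ' ', $text);
    $text = preg_replace('/[\t ]+/', ' ', $text);   // collapse runs of blanks
    $text = preg_replace('/^[\t ]+/m', '', $text);  // strip leading blanks
    $text = preg_replace('/^ *\n/m', '', $text);    // drop empty lines
    return $text;
}

echo textify_string("<p>Caf&eacute;&nbsp;&amp; bar</p>\n\n<p>  Open &quot;late&quot;</p>\n");
```

That call should print two lines: Café & bar, then Open "late". The "\u{00A0}"
escape needs PHP 7 or later.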


There is a class[1] to do this in PHP that has been used by several full
programs such as PHPMailer. Curiously, PHPMailer *removed* the class
because the class is GPL-licensed while PHPMailer itself is LGPL[2].

[1] https://github.com/mtibben/html2text
[2]
https://github.com/PHPMailer/PHPMailer/commit/127d26ef3c43118d82c244c15016cf37d67504c6


Greg Rundlett
https://eQuality-Tech.com
https://freephile.org



BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org