[Discuss] Converting "rich" (MIME) email to plain text

Derek Martin invalid at pizzashack.org
Thu Feb 18 20:45:56 EST 2016


Hi Michael,

On Wed, Feb 17, 2016 at 11:39:22AM -0500, Michael Tiernan wrote:
> I can find lots of things about going plain to HTML but I've not
> seen anything that allows you to just extract the "Content-Type:
> text/plain" section of an email.

I've read over the thread twice now, including both of the messages
you posted, and honestly I'm still not 100% sure I know what you're
asking.  I suspect that at least some people are finding the
terminology you're using to be unclear.  Some possible confusion:

"Rich Text Format" (RTF):
  This is a dead encoding format that was intended to be better suited
  to formatting e-mail messages than HTML is, on account of it being
  simpler and less prone to making your mail reader do untoward
  things.  It never gained much popularity, though, so this is probably
  NOT what you're trying to deal with.  Though if it is, the best
  alternative may be to get your senders to stop using it.

"Multi-part MIME message":
  This is a MIME content type that actually has several purposes, the
  most common of which is to supply multiple alternative versions of
  the same content (in theory), via the multipart/alternative subtype.
  Each of the parts has its own Content-Type.  It's meant to handle
  e-mail messages that someone wanted to format, with the
  understanding that not all clients can do it, and not all humans
  want it.  Support for this is somewhat variable among mail clients.
  Typically ONE of the multiple parts has a Content-Type of
  text/plain, though strictly speaking that need not be true.

  If this is what you're after, your mail client may or may not have a
  feature that lets you save individual message parts.  That's likely
  the best option.  Mutt is a mail client that allows this, and there
  are others, though I've not been keeping track of what clients have
  what features these days.
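
  For the automated case, here is a minimal sketch (not something
  proposed in this thread) that assumes Python 3.6 or later and its
  standard "email" package.  The function name extract_plain_text is
  made up for illustration; it returns the decoded text/plain part, or
  None when the message has no such part:

```python
from email import message_from_string, policy


def extract_plain_text(raw_message):
    """Return the decoded text/plain part of a message, or None if absent.

    extract_plain_text is a hypothetical helper name, not an existing tool.
    """
    # policy.default makes the parser return an EmailMessage, which has
    # the get_body() and get_content() helpers used below.
    msg = message_from_string(raw_message, policy=policy.default)

    # get_body() looks through multipart/alternative (and nested parts)
    # for the first part matching the preference list.  A single-part
    # text/plain message is returned as-is.
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return None

    # get_content() undoes the transfer encoding and the charset.
    return part.get_content()
```

  For a message read from disk as raw bytes, email.message_from_bytes
  is the equivalent entry point.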

HTML messages:
  A third possibility is you're receiving HTML mail, and simply want
  to extract the meaningful bits from the HTML.  Someone suggested
  using a text-based web browser like links to do that, though you may
  still need to save the message part to disk first, depending on your
  particular environment and mail client.
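
  If a text-based browser isn't available, a rough approximation can
  be scripted.  The sketch below assumes Python 3 and uses only the
  standard html.parser module; TextExtractor and html_to_text are
  hypothetical names, and the result is far cruder than what a real
  browser would render (no tables, links, or list markers):

```python
from html.parser import HTMLParser

# Tags whose contents are never shown to the reader.
SKIPPED = {"script", "style"}
# Tags that should end a line of output when they close (an assumption;
# extend as needed).
BLOCK = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; into &
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
        elif tag == "br":
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK:
            self.chunks.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the text of an HTML string, one block element per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split())
             for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)
```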

Plain Text:
  The last alternative I can think of is that your messages are
  already just single-part plain text messages, and you simply want to
  extract the message body part from the e-mail message (i.e. the
  message content without the headers).  This seems like the easiest
  of the problems, but the trick is that you may need to deal with
  "quoted-printable" or other alternative message encodings to make
  the text readable.  I'm not positive, but procmail (or formail, part
  of the procmail package) might do this for you...  Otherwise I'm
  unaware of any specific tool that does this.  I can think of a way
  to use Mutt to do this, but it would involve writing a script for
  Mutt's "display filter" feature that simply cats the message
  content into a text file.
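
  As a scripted alternative (again an assumption, not a tool named in
  this thread), Python's standard "email" package can drop the headers
  and undo quoted-printable or base64 in one step.  A minimal sketch,
  assuming Python 3.6 or later; body_without_headers is a hypothetical
  name:

```python
from email import message_from_string, policy


def body_without_headers(raw_message):
    """Return the decoded body of a single-part message as a string."""
    msg = message_from_string(raw_message, policy=policy.default)
    if msg.is_multipart():
        # Multi-part messages need a part to be chosen first.
        raise ValueError("expected a single-part message")
    # get_content() reverses the Content-Transfer-Encoding
    # (quoted-printable, base64, ...) and decodes the charset.
    return msg.get_content()
```

  With a quoted-printable body, "Caf=C3=A9" comes back as "Café" and
  soft line breaks (a trailing "=") are rejoined.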

Then of course, you might be trying to accomplish a combination of all
of these...

The long and short of it is, there ARE ways to do what you want, but
the exact details may depend on what mail client you're using, and
what exactly you're trying to accomplish.  I personally am not
familiar with any single canned solution to any of these problems, if
automating the process is what you're after.

-- 
Derek D. Martin    http://www.pizzashack.org/   GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address.  Replying to it will result in
undeliverable mail due to spam prevention.  Sorry for the inconvenience.



