[Discuss] Converting "rich" (MIME) email to plain text
Derek Martin
invalid at pizzashack.org
Thu Feb 18 20:45:56 EST 2016
Hi Michael,
On Wed, Feb 17, 2016 at 11:39:22AM -0500, Michael Tiernan wrote:
> I can find lots of things about going plain to HTML but I've not
> seen anything that allows you to just extract the "Content-Type:
> text/plain" section of an email.
I've read over the thread twice now, including both of the messages
you posted, and honestly I'm still not 100% sure I know what you're
asking. I suspect that at least some people are finding the
terminology you're using unclear. Some possible sources of confusion:
"Rich Text Format" (RTF):
This is a dead encoding format that was intended to be better suited
to formatting e-mail messages than HTML is, on account of it being
simpler and less prone to making your mail reader do untoward
things. It never gained much popularity, though, so this is probably
NOT what you're trying to deal with. Though if it is, the best
alternative may be to get your senders to stop using it.
"Multi-part MIME message":
This is a MIME content type that actually has several purposes, the
most common of which (multipart/alternative) is to supply multiple
alternative versions of the same content (in theory). Each of the
parts has its own Content-Type. It's meant to handle e-mail messages
that someone wanted to format, with the understanding that not all
clients can display the formatting, and not all humans want it.
Support for this varies somewhat among mail clients. Typically ONE
of the parts has a Content-Type of text/plain, though strictly
speaking that need not be true.
If this is what you're after, your mail client may or may not have a
feature that lets you save individual message parts. That's likely
the best option. Mutt is a mail client that allows this, and there
are others, though I've not been keeping track of what clients have
what features these days.
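If you'd rather script it than click through a mail client, Python's standard email package can pull out the text/plain part for you. This is a minimal sketch, assuming Python 3.6 or later and that the raw message (headers included) has been saved to a file; extract_plain_text is just a name I made up for illustration. It returns None when there is no text/plain part, which, as mentioned, is allowed.

```python
import email
from email import policy
from typing import Optional


def extract_plain_text(raw: bytes) -> Optional[str]:
    """Return the decoded text/plain body of a message, or None if it has none."""
    # policy.default makes the parser return EmailMessage objects,
    # which provide get_body() and get_content().
    msg = email.message_from_bytes(raw, policy=policy.default)
    # get_body() searches multipart/alternative (and related/mixed) trees
    # for a body part matching the preference list; here only text/plain.
    # A plain single-part text/plain message matches itself.
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return None
    # get_content() undoes the Content-Transfer-Encoding (base64,
    # quoted-printable) and decodes using the part's declared charset.
    return part.get_content()
```

Usage would be something like `print(extract_plain_text(open("saved-message.eml", "rb").read()))`, where the filename is hypothetical.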
HTML messages:
A third possibility is you're receiving HTML mail, and simply want
to extract the meaningful bits from the HTML. Someone suggested
using a text-based web browser like links to do that, though you may
still need to save the message part to disk first, depending on your
particular environment and mail client.
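If you don't have a text browser handy, here's a rough alternative that uses only Python's standard html.parser module: it keeps the text, drops the tags along with script and style contents, and starts a new line after block-ish elements. It's a sketch, not a replacement for links; tables and lists won't be laid out nicely, and html_to_text is a hypothetical helper name. You'd feed it the already decoded text/html part of the message.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, skipping script/style."""

    # Elements whose contents are not meant to be read as text.
    SKIP = {"script", "style", "head", "title"}
    # Closing one of these ends a line, so paragraphs stay apart.
    BREAKS = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into characters
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "br":
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in self.BREAKS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Strip tags from an HTML string and return readable plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)
```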
Plain Text:
The last alternative I can think of is that your messages are
already just single-part plain text messages, and you simply want to
extract the message body part from the e-mail message (i.e. the
message content without the headers). This seems like the easiest
of the problems, but the trick is that you may need to deal with
"quoted-printable" or other transfer encodings (such as base64) to make
the text readable. I'm not positive, but procmail (or formail, part
of the procmail package) might do this for you... Otherwise I'm
unaware of any specific tool that does this. I can think of a way
to use Mutt to do this, but it would involve writing a script for
Mutt's "display filter" feature that simply cats the message
content into a text file.
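For the single-part case, the decoding really is two steps: undo the Content-Transfer-Encoding (quoted-printable or base64), then turn the resulting bytes into text using the declared charset. Here is a minimal Python sketch that makes both steps explicit; body_without_headers is a hypothetical name, and it assumes the message is not multipart.

```python
import email


def body_without_headers(raw: bytes) -> str:
    """Return the body of a single-part text message with the headers dropped
    and its Content-Transfer-Encoding (e.g. quoted-printable) undone."""
    msg = email.message_from_bytes(raw)
    if msg.is_multipart():
        raise ValueError("multipart message; pick a part first")
    # Step 1: decode=True reverses quoted-printable or base64 and returns bytes.
    data = msg.get_payload(decode=True)
    # Step 2: bytes -> text via the declared charset (MIME's default is us-ascii).
    charset = msg.get_content_charset() or "us-ascii"
    return data.decode(charset, errors="replace")
```

A quoted-printable body such as `Caf=C3=A9 au lait` comes back as `Café au lait`, and soft line breaks (a trailing `=`) are rejoined.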
Then of course, you might be trying to accomplish a combination of all
of these...
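Since it isn't obvious which of these cases applies, it may help to look at a message's MIME structure first. This small Python sketch (mime_outline is a hypothetical name; it assumes Python 3.6 or later) lists one line per part, indented by nesting depth, so you can see whether you have a single text/plain body, a multipart/alternative with plain and HTML versions, or something else entirely.

```python
import email
from email import policy
from typing import List


def mime_outline(raw: bytes) -> List[str]:
    """Return one line per MIME part, indented two spaces per nesting level."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    lines = []

    def visit(part, depth):
        lines.append("  " * depth + part.get_content_type())
        if part.is_multipart():
            # iter_parts() yields the immediate sub-parts of a multipart.
            for sub in part.iter_parts():
                visit(sub, depth + 1)

    visit(msg, 0)
    return lines
```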
The long and short of it is, there ARE ways to do what you want, but
the exact details may depend on what mail client you're using, and
what exactly you're trying to accomplish. I personally am not
familiar with any single canned solution to any of these problems, if
automating the process is what you're after.
--
Derek D. Martin http://www.pizzashack.org/ GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address. Replying to it will result in
undeliverable mail due to spam prevention. Sorry for the inconvenience.