BLU Discuss list archive
[Discuss] Converting "rich" (MIME) email to plain text
- Subject: [Discuss] Converting "rich" (MIME) email to plain text
- From: cra at WPI.EDU (Chuck Anderson)
- Date: Wed, 17 Feb 2016 13:18:35 -0500
- In-reply-to: <56C4A23A.2040704@gMail.com>
- References: <56C4A23A.2040704@gMail.com>
On Wed, Feb 17, 2016 at 11:39:22AM -0500, Michael Tiernan wrote:
> I'm sure that I'm not the first who tried to find an easy way to
> filter a piece of email so that only the plain text comes out.
>
> I can find lots of things about going plain to HTML but I've not
> seen anything that allows you to just extract the "Content-Type:
> text/plain" section of an email.
>
> Any pointers available? I don't want to try and reinvent the
> reinvented wheel.

Here is what I use with Mutt to get lightly-formatted text and
unobfuscated links. It isn't perfect, but it works acceptably 90% of
the time and it avoids downloading any remote links, which was my
primary goal.

>grep mailcap .muttrc
set mailcap_path = ~/.muttmailcap
set mailcap_sanitize

>cat .muttmailcap
text/html; /home/cra/bin/striphtml.pl; copiousoutput
text/calendar; /home/cra/bin/vcalendar-filter; copiousoutput

>cat ~/bin/striphtml.pl
#!/usr/bin/perl -w

use HTML::Strip;
use HTML::LinkExtor;
use HTML::Entities qw/decode_entities/;
use URI::Escape qw/uri_unescape/;
use Encode qw/from_to/;

undef $/;
my $html_text = <ARGV>;

my $charset = 'UTF-8';
if ($html_text =~ /\ncontent-type:\s+text\/html;\s+charset=(.*)/i) {
    $charset = $1;
    $charset =~ s/\"//g;
} else {
    print "no char set\n";
    #print $html_text;
}

$html_text =~ s/<br>/\n/gi;
$html_text =~ s/<p>/\n/gi;

my $hs = HTML::Strip->new();
my $stripped_text = $hs->parse($html_text);
my $decoded_text = decode_entities($stripped_text);
$decoded_text =~ s/\n\s*\n/\n\n/g;
$decoded_text =~ s/\n\n+/\n\n/g;
$decoded_text =~ s/\240/ /g;
$decoded_text =~ s/\r//g;
#$decoded_text = decode($charset, $decoded_text);
###from_to($decoded_text, $charset, 'UTF-8');

my $hl = HTML::LinkExtor->new();
$hl->parse($html_text);
my @links = $hl->links;

print "Charset: $charset\n";
print "Message:\n\n";
print $decoded_text;
print "\nLinks:\n\n";
foreach my $link (@links) {
    printf "%-7s %-15s %s\n", $$link[0], $$link[1], uri_unescape($$link[2]);
}
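
The script above detects a charset but only prints it; both conversion lines are commented out. The following is a minimal sketch of how that step might be wired up with the core Encode module, not the author's method. The helper names detect_charset and to_utf8 are hypothetical, and the regex is deliberately tighter than the original: it also matches a header on the first line of input and captures only the charset token rather than the rest of the line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw/decode encode find_encoding/;

# Hypothetical helper (not in the original script): pull the charset
# token out of a text/html Content-Type header, defaulting to UTF-8.
sub detect_charset {
    my ($raw) = @_;
    if ($raw =~ /^content-type:\s*text\/html;\s*charset="?([\w.:-]+)"?/mi) {
        return $1;
    }
    return 'UTF-8';
}

# Hypothetical helper: convert raw bytes in $charset to UTF-8 bytes.
# Falls back to UTF-8 when Encode does not recognize the charset name.
sub to_utf8 {
    my ($bytes, $charset) = @_;
    $charset = 'UTF-8' unless find_encoding($charset);
    return encode('UTF-8', decode($charset, $bytes));
}

# Made-up sample input: a Latin-1 message with one accented character.
my $raw = "Content-Type: text/html; charset=\"iso-8859-1\"\n\ncaf\xe9\n";
my $charset = detect_charset($raw);
print "Charset: $charset\n";
print to_utf8($raw, $charset);
```

If the input were decoded to characters before the cleanup substitutions run, the \240 replacement would match only a real non-breaking space; applied to undecoded UTF-8 bytes it can also hit the second byte of a multi-byte character.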