Boston Linux & UNIX was originally founded in 1994 as part of The Boston Computer Society. We meet on the third Wednesday of each month at the Massachusetts Institute of Technology, in Building E51.

BLU Discuss list archive

[Discuss] emoji in my url

Eric Chadbourne <sillystring at> writes:

> I just noticed that you can have an emoji URL. Am I just old or is this moronic?
> The url bar should contain plain text and obscure nothing, else how do you know where you are?

Is this a URL with UCS characters? This is what RFC 3986 has to say:

   When a new URI scheme defines a component that represents textual
   data consisting of characters from the Universal Character Set
   [UCS], the data should first be encoded as octets according to the
   UTF-8 character encoding [STD63]; then only those octets that do
   not correspond to characters in the unreserved set should be
   percent-encoded.  For example, the character A would be
   represented as "A", the character LATIN CAPITAL LETTER A WITH
   GRAVE would be represented as "%C3%80", and the character KATAKANA
   LETTER A would be represented as "%E3%82%A2".

This is what it considers unreserved:

   unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

It also says this:

   A URI is a sequence of characters from a
   very limited set: the letters of the basic Latin alphabet, digits,
   and a few special characters.

So I'd say the URI with the emoji is supposed to be encoded (assuming
it's a standard UCS emoji).
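
To make the RFC's rule concrete, here is a minimal sketch in Python (my
choice of language, not something from the RFC). It assumes the standard
library's urllib.parse.quote is an acceptable stand-in for "UTF-8 encode,
then percent-encode everything outside the unreserved set"; the helper
name percent_encode and the sample characters are just for illustration.

```python
from urllib.parse import quote

def percent_encode(text: str) -> str:
    """UTF-8 encode text, then percent-encode each octet that is not unreserved."""
    # safe="" exempts no reserved characters. quote() always leaves
    # ALPHA / DIGIT / "-" / "." / "_" / "~" alone (Python 3.7 or later).
    return quote(text, safe="", encoding="utf-8")

# The RFC's three examples, plus U+1F63C (a cat face emoji) as an assumed test case.
for ch in ["A", "\u00C0", "\u30A2", "\U0001F63C"]:
    print(f"U+{ord(ch):04X} -> {percent_encode(ch)}")
```

The first three results should match the RFC's examples ("A", "%C3%80",
"%E3%82%A2"); the emoji comes out as four percent-encoded octets because
its UTF-8 form is four bytes long.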

But which is more obscure, %F0%9F%98%BC or a little cat face with a wry
smile? I might like a way to get the UCS code point and long description
from the glyph, but I think I'd rather see the kitty by default even if
the character in the actual HTTP stream has to be encoded. Actually,
there is a way outside the browser to find out the code point: copy
and paste the glyph to the command line and run the uni command
(included with the Perl module App::Uni on CPAN) on it. So yeah, if your
browser gets %F0%9F%98%BC in a URI and shows you a face instead of the
percent-encoded form, I think that's great (if there aren't security
implications from doing that, and if it lets you set this to your
preference).  But if it's some stupid thing like what Pidgin does to
certain character pairs then I'm with you. That would be awful.
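
If Perl and App::Uni aren't handy, roughly the same lookup can be
sketched with Python's standard library. This is not how uni works
internally, just an assumed equivalent: the helper name describe is made
up, and the input is the UTF-8 percent-encoding of U+1F63C.

```python
import unicodedata
from urllib.parse import unquote

def describe(glyphs: str) -> list:
    """Return 'U+XXXX NAME' for each character in the string."""
    # unicodedata.name() falls back to the given default for unnamed characters.
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in glyphs]

# Decode a percent-encoded path segment, then look up each character.
print(describe(unquote("%F0%9F%98%BC")))
```

Pasting the glyph itself as the argument to describe works the same way,
as long as the terminal passes it through as UTF-8.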

Mike Small
smallm at

BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.
