BLU Discuss list archive
[Discuss] emoji in my url
- Subject: [Discuss] emoji in my url
- From: smallm at sdf.org (Mike Small)
- Date: Thu, 23 Mar 2017 15:16:29 +0000
- In-reply-to: <hyZqFB0TRhr-e7Mm6dSHrMw1sNeMSLkd1S47eDXoNpuB10vevcGkoDBmRQ5G-y5D98uFAuWcn51mQIScPT7Ov2HThwShmxOH3z0X1mwjcdo=@protonmail.com> (Eric Chadbourne's message of "Thu, 23 Mar 2017 10:08:25 -0400")
- References: <hyZqFB0TRhr-e7Mm6dSHrMw1sNeMSLkd1S47eDXoNpuB10vevcGkoDBmRQ5G-y5D98uFAuWcn51mQIScPT7Ov2HThwShmxOH3z0X1mwjcdo=@protonmail.com>
Eric Chadbourne <sillystring at protonmail.com> writes:

> I just noticed that you can have an emoji URL. Am I just old or is this moronic?
>
> The url bar should contain plain text and obscure nothing, else how do you know where you are?

Is this a URL with UCS characters? This is what RFC 3986 has to say:

    When a new URI scheme defines a component that represents textual
    data consisting of characters from the Universal Character Set [UCS],
    the data should first be encoded as octets according to the UTF-8
    character encoding [STD63]; then only those octets that do not
    correspond to characters in the unreserved set should be
    percent-encoded. For example, the character A would be represented
    as "A", the character LATIN CAPITAL LETTER A WITH GRAVE would be
    represented as "%C3%80", and the character KATAKANA LETTER A would
    be represented as "%E3%82%A2".

This is what it considers unreserved:

    unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"

It also says this:

    A URI is a sequence of characters from a very limited set: the
    letters of the basic Latin alphabet, digits, and a few special
    characters.

So I'd say the URI with the emoji is supposed to be encoded (assuming it's a standard UCS emoji). But which is more obscure, %F0%9F%98%BC or a little cat face with a wry smile? I might like a way to get the UCS code point and long description from the glyph, but I think I'd rather see the kitty by default even if the character in the actual HTTP stream has to be encoded.

Actually, there is a way outside the browser to find out the code point. You could copy and paste the glyph to the command line and run a command named uni (included with the Perl module App::Uni on CPAN) on it.

So yeah, if your browser gets %F0%9F%98%BC in a URI and shows you a face instead of the standard URI encoding, I think that's great (if there aren't security implications from doing that, and if it lets you set this to your preference). But if it's some stupid thing like what Pidgin does to certain character pairs, then I'm with you. That would be awful.

-- 
Mike Small
smallm at sdf.org
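
The RFC 3986 rule quoted above (UTF-8 first, then percent-encode every octet outside the unreserved set) can be sketched in a few lines of Python, along with a code point and name lookup similar in spirit to what uni reports. This is only an illustration: the function names percent_encode and describe are made up for this sketch, and it assumes the emoji in question is U+1F63C (CAT FACE WITH WRY SMILE).

```python
import string
import unicodedata
from urllib.parse import unquote

# The "unreserved" set from RFC 3986: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def percent_encode(text: str) -> str:
    """Encode text as UTF-8, then percent-encode each octet that is
    not an unreserved character (hypothetical helper for illustration)."""
    out = []
    for octet in text.encode("utf-8"):
        ch = chr(octet)
        out.append(ch if ch in UNRESERVED else f"%{octet:02X}")
    return "".join(out)


def describe(glyphs: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for each character pasted in."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in glyphs
    ]


if __name__ == "__main__":
    cat = "\U0001F63C"
    encoded = percent_encode(cat)
    print(encoded)                 # what goes on the wire
    print(unquote(encoded) == cat) # decoding gets the glyph back
    print(describe(cat))           # code point and long description
```

The three examples given in the RFC excerpt ("A", "%C3%80", "%E3%82%A2") make a handy sanity check for percent_encode, since they cover one, two, and three octet UTF-8 sequences; an emoji outside the Basic Multilingual Plane takes four.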