BLU Discuss list archive
[Discuss] Please help with a PHP and/or Apache2 problem
- Subject: [Discuss] Please help with a PHP and/or Apache2 problem
- From: alex at pennace.org (Alex Pennace)
- Date: Sun, 7 Feb 2021 21:58:06 -0500
- In-reply-to: <44e0e029-3da7-cd56-3a2d-bda68743d3ed@gmail.com>
- References: <06ab24f0-ab46-350f-36ae-365a1a7b0f94@gmail.com> <20210207181343.GC621@buick.pennace.org> <44e0e029-3da7-cd56-3a2d-bda68743d3ed@gmail.com>
On Sun, Feb 07, 2021 at 04:31:05PM -0500, Bill Horne wrote:
> On 2/7/2021 1:13 PM, Alex Pennace wrote:
> > try this:
> >
> > <?php
> > header('Content-Type: text/html; charset=ISO-8859-1');
> > include "[redacted]/archives/back.issues/recent.single.issues/V40-38";
> > ?>
> >
> Alex, that cured the problem. THANK YOU for your help and expertise.

No problem!

> The archived html file contains a line which I thought would send the same
> info:
>
> <html>
>
> <head>
>
> <META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=iso-8859-1">
>
> Please tell me if the use of "META" tags is deprecated, and why it works on
> files which aren't included via PHP code. I'd also like to know where the
> "utf-8" response from Apache2 is set.

Answering the questions out of order:

Where is the "utf-8" default set: I believe it is somewhere in PHP
itself, most likely the default_charset setting in php.ini, which has
defaulted to "UTF-8" since PHP 5.6. Note that the non-PHP URL
(presumably served directly by Apache) does not have a "Content-Type"
HTTP header at all. Which leads us to:

What is going on with meta tags: Evidently[1], the browsers we are
using take the charset from the Content-Type HTTP header if there is
one, and otherwise fall back to the Content-Type meta tag in the
HTML[2]. The non-PHP URLs don't have a Content-Type HTTP header, so the
browser goes by the meta tag. The PHP URLs did set a Content-Type
header, which overrides the meta tag.

A debate on Stack Overflow seems to yield an answer of using both the
HTTP header and the meta tag (naturally, both should carry the same
value):
https://stackoverflow.com/questions/9417024/response-header-vs-meta-tag

https://html.spec.whatwg.org/multipage/semantics.html#pragma-directives
tells us that both the "Content-Type" http-equiv attribute and the
distinct charset attribute are valid HTML; however, it seems to insist
on the use of UTF-8 as the only allowed character encoding.
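
To make the header-versus-meta behavior described above concrete, here
is a rough Python sketch. It is not how any real browser is implemented
(browsers follow a much more involved encoding-sniffing procedure); the
function names, the sample page, and the "utf-8" fallback are all
assumptions made up for illustration. It only shows the precedence
observed in this thread: a charset in the HTTP header wins, and the
meta tag is consulted only when the header gives none.

```python
import re

# Hypothetical sample page: one non-ASCII byte (0xE9, "é" in ISO-8859-1)
# plus the same kind of meta tag as the archived Digest file.
PAGE = (b'<html><head><META HTTP-EQUIV="Content-Type" '
        b'CONTENT="text/html;charset=iso-8859-1"></head>'
        b'<body>caf\xe9</body></html>')

CHARSET_RE = r'charset=["\']?([\w.:-]+)'


def pick_charset(http_content_type, html_bytes):
    """Return the charset from the HTTP header if present, else from a
    meta tag near the top of the document, else None."""
    if http_content_type:
        m = re.search(CHARSET_RE, http_content_type, re.I)
        if m:
            return m.group(1).lower()
    # Rough approximation: only scan the first 1024 bytes for a meta tag.
    head = html_bytes[:1024].decode("ascii", errors="replace")
    m = re.search(r'<meta[^>]*' + CHARSET_RE, head, re.I)
    return m.group(1).lower() if m else None


def decode_like_browser(http_content_type, html_bytes, fallback="utf-8"):
    """Decode the page with whichever charset pick_charset() selects.
    The fallback value is an arbitrary choice for this sketch."""
    charset = pick_charset(http_content_type, html_bytes) or fallback
    return html_bytes.decode(charset, errors="replace")


if __name__ == "__main__":
    # No Content-Type header (the non-PHP case): the meta tag is used.
    print(decode_like_browser(None, PAGE))
    # Header says UTF-8 (the original PHP case): the meta tag is ignored
    # and the 0xE9 byte no longer decodes to "é".
    print(decode_like_browser("text/html; charset=UTF-8", PAGE))
    # Header corrected via header() in PHP: decodes properly again.
    print(decode_like_browser("text/html; charset=ISO-8859-1", PAGE))
```

In the second case the lone 0xE9 byte is not valid UTF-8, so it comes
out as a replacement character instead of "é", which matches the kind
of breakage that the header() call fixed.
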
https://html.spec.whatwg.org/multipage/semantics.html#charset goes
further and states "To enforce the above rules, authoring tools must
default to using UTF-8 for newly-created documents."

On the other hand, a quick read[3] of
https://encoding.spec.whatwg.org/#preface suggests that the WHATWG
acknowledges that there will be pages with legacy encodings forever,
and offers some direction for how browsers should cope with them.

It seems the powers that be want a UTF-8-only future. There is a nod
to keep things working for legacy content. But the practical
consequence of this direction is a shrinking body of non-UTF-8 pages on
the web, and consequently a declining quantity of non-UTF-8 examples to
test and qualify web browsers with (a web browser that doesn't work
with 50% of web pages won't ship, but one that doesn't work with 0.01%
of web pages might still squeak out into the world).
https://archive.is/6eNfW tells us that as of today, 96.3% of web pages
are UTF-8, versus 1.5% for ISO-8859-1.

What does this mean for sites with legacy content, such as the Telecom
Digest? Depends on what is on the page:

* Since ASCII is a subset of both ISO-8859-1 and UTF-8, pages that are
  ASCII only will work regardless, and will keep working until long
  after something comes along to deprecate ASCII and UTF-8. That will
  happen approximately never.

* The remaining Telecom Digest pages that use non-ASCII ISO-8859-1
  characters may encounter a problem in the future where some random
  browser was implemented by someone who has no concept of non-UTF-8
  charsets. But given that the standards body believes in supporting
  legacy charsets indefinitely, I'd file a bug with said browser maker
  telling them to fix their product (the alternative is to do iconv
  conversion on the server and serve everything as UTF-8. But that
  doesn't solve the problem for all legacy web pages, just yours).

[1] Before today I had never actually given any thought to the meta tag
    for this purpose. This was a learning experience for me, too.
[2] I don't know if this behavior applies to all browsers, and I don't
    know what charset browsers default to if neither the HTTP header
    nor the meta tag is available. It wouldn't be impossible to find
    out empirically, of course.

[3] Please read it more thoroughly; I may have missed something that
    would suggest a completely different conclusion!

-- 
Alex Pennace, alex at pennace.org