[Discuss] Please help with a PHP and/or Apache2 problem

Alex Pennace alex at pennace.org
Sun Feb 7 21:58:06 EST 2021


On Sun, Feb 07, 2021 at 04:31:05PM -0500, Bill Horne wrote:
> On 2/7/2021 1:13 PM, Alex Pennace wrote:
> > try this:
> > 
> > <?php
> > header('Content-Type: text/html; charset=ISO-8859-1');
> > include "[redacted]/archives/back.issues/recent.single.issues/V40-38";
> > ?>
> 
> 
> Alex, that cured the problem. THANK YOU for your help and expertise.

No problem!

> The archived html file contains a line which I thought would send the same
> info:
> 
> <html>
> 
> 	<head>
> 
> 	<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=iso-8859-1">
> 
> Please tell me if the use of "META" tags is deprecated, and why it works on
> files which aren't included via PHP code. I'd also like to know where the
> "utf-8" response from Apache2 is set.

Answering the questions out of order:

Where is the "utf-8" default set: I believe it would be somewhere in
PHP itself, most likely the default_charset setting in php.ini, which
has defaulted to UTF-8 since PHP 5.6. Note that the non-PHP URL
(presumably served directly by Apache) does not have a "Content-Type"
HTTP header at all. Which leads us to:

What is going on with meta tags: Evidently[1], the browsers we are using
take the charset from the Content-Type HTTP header when there is one,
and otherwise fall back to the Content-Type meta tag in the HTML[2]. The
non-PHP URLs don't have a Content-Type HTTP header, so the browser goes
by the meta tag. The PHP URLs did set a Content-Type header (with
charset=UTF-8), which overrides the meta tag.
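To make that precedence concrete, here is a rough PHP model of it. This is a simplification for illustration, not what any browser actually runs; the function name effective_charset is my own, and the windows-1252 fallback is an assumption (it is a common default for English-language locales, but footnote [2] applies):

```php
<?php
// Rough model of the charset precedence described above, for illustration
// only. Real browsers follow the WHATWG encoding-sniffing rules, which also
// give a byte order mark priority over the header and scan more carefully.
// The $fallback default is an assumption, not a documented browser value.
function effective_charset(?string $contentTypeHeader, string $html,
                           string $fallback = 'windows-1252'): string
{
    // 1. A charset parameter in the HTTP header wins,
    //    e.g. "text/html; charset=UTF-8".
    if ($contentTypeHeader !== null
        && preg_match('/charset\s*=\s*"?([\w.:-]+)/i', $contentTypeHeader, $m)) {
        return strtolower($m[1]);
    }
    // 2. Otherwise use a <meta> declaration, either <meta charset="...">
    //    or <meta http-equiv="Content-Type" content="text/html;charset=...">.
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $html, $m)) {
        return strtolower($m[1]);
    }
    // 3. Nothing declared at all: use the (assumed) fallback.
    return $fallback;
}
```

Fed the UTF-8 header the PHP script was sending plus the archived page's META tag, this picks utf-8, so the ISO-8859-1 bytes get decoded as UTF-8, which is the problem you saw. With no header, as in the static Apache case, it picks iso-8859-1 from the meta tag.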

A debate on Stack Overflow seems to settle on using both the HTTP
header and the meta tag (naturally, both should have the same value):
https://stackoverflow.com/questions/9417024/response-header-vs-meta-tag
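For pages PHP generates itself, one way to keep the two from drifting apart is to build both from a single value. A minimal sketch; charset_declarations is a hypothetical helper name, not anything built into PHP:

```php
<?php
// Hypothetical helper: derive the HTTP header and the <meta> tag from one
// charset value so they can never disagree. Pass whatever encoding the
// page content is actually stored in.
function charset_declarations(string $charset): array
{
    $value = 'text/html; charset=' . $charset;
    return [
        'header' => 'Content-Type: ' . $value,
        'meta'   => '<meta http-equiv="Content-Type" content="'
                    . htmlspecialchars($value, ENT_QUOTES) . '">',
    ];
}

$decl = charset_declarations('ISO-8859-1');
header($decl['header']);  // must be called before any output is sent
echo "<html>\n<head>\n" . $decl['meta'] . "\n";
```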

https://html.spec.whatwg.org/multipage/semantics.html#pragma-directives
tells us that both the "Content-Type" http-equiv attribute and the
distinct charset attribute are valid HTML; however, it seems to insist
on UTF-8 as the only allowed character encoding.
https://html.spec.whatwg.org/multipage/semantics.html#charset goes
further and states "To enforce the above rules, authoring tools must
default to using UTF-8 for newly-created documents." On the other
hand, a quick read[3] of https://encoding.spec.whatwg.org/#preface
suggests that the WHATWG acknowledges that there will be pages with
legacy encodings forever, and offers some direction for how browsers
should cope with them.

It seems the powers that be want a UTF-8-only future. There is a
nod to keep things working for legacy content. But the practical
consequence of this direction is a shrinking body of non-UTF-8 pages
on the web, and consequently a declining quantity of non-UTF-8
examples to test and qualify web browsers with (a web browser that
doesn't work with 50% of web pages won't ship, but one that doesn't
work with 0.01% of web pages might still squeak out into the
world). https://archive.is/6eNfW tells us that as of today, 96.3% of
web pages are UTF-8, versus 1.5% for ISO-8859-1.

What does this mean for sites with legacy content, such as the Telecom
Digest? Depends on what is on the page:

* Since ASCII is a subset of ISO-8859-1 and UTF-8, pages that are
  ASCII only will work regardless, and will work until long after
  something comes along to deprecate ASCII and UTF-8. That will happen
  approximately never.
* The remaining Telecom Digest pages that use non-ASCII ISO-8859-1
  characters may run into trouble in the future if some random browser
  is implemented by someone who has no concept of non-UTF-8 charsets.
  But given that the standards body believes in supporting legacy
  charsets indefinitely, I'd file a bug with said browser maker telling
  them to fix their product (the alternative is to do an iconv
  conversion on the server and serve everything as UTF-8, but that
  solves the problem only for your pages, not for all legacy web pages).
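For completeness, the iconv alternative might look roughly like this. It is a sketch: latin1_to_utf8 and serve_as_utf8 are hypothetical names, and $path stands in for whatever archived file you would otherwise include. ASCII bytes come through unchanged (the subset property from the first point), while a non-ASCII character such as é (0xE9 in ISO-8859-1) becomes the two UTF-8 bytes 0xC3 0xA9:

```php
<?php
// Hypothetical helper: convert ISO-8859-1 text to UTF-8 with iconv and
// update an in-document "charset=iso-8859-1" declaration to match.
function latin1_to_utf8(string $latin1): string
{
    // Every byte 0x00-0xFF is a valid ISO-8859-1 character, so this
    // conversion should not fail; the check is purely defensive.
    $utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);
    if ($utf8 === false) {
        throw new RuntimeException('iconv conversion failed');
    }
    // Rewrite the meta tag's charset so it agrees with the new header.
    return preg_replace('/charset\s*=\s*iso-8859-1/i', 'charset=utf-8', $utf8);
}

// Hypothetical replacement for the include-based script. Unlike include,
// file_get_contents() does not execute any PHP inside the file, which
// suits static archived HTML.
function serve_as_utf8(string $path): void
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        http_response_code(404);
        return;
    }
    header('Content-Type: text/html; charset=UTF-8');
    echo latin1_to_utf8($raw);
}
```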

[1] Before today I had never given any thought to the meta tag for
this purpose. This was a learning experience for me, too.

[2] I don't know if this behavior is applicable to all browsers, and I
don't know what charset browsers default to if neither the HTTP header
nor the meta tag is available. It wouldn't be impossible to find out
empirically, of course.

[3] Please read it more thoroughly, I may have missed something that
would suggest a completely different conclusion!

-- 
Alex Pennace, alex at pennace.org

