
BLU Discuss list archive



[Discuss] web file download



As Tom mentioned, these headers are designed to help control caching by the various clients and proxies that your web pages may pass through.

The HTTP spec (RFC 2616), http://www.w3.org/Protocols/rfc2616/rfc2616.html, defines the headers.

Here is the first line of the spec's definition for each of the headers in question:
Expires - The Expires entity-header field gives the date/time after which the response is considered stale.
Pragma - The Pragma general-header field is used to include implementation-specific directives that might apply to any recipient along the request/response chain.
Cache-Control - The Cache-Control general-header field is used to specify directives that MUST be obeyed by all caching mechanisms along the request/response chain.
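To make the Expires definition concrete, here is a minimal Python sketch of how a client or cache might decide that a response is stale. The function name is_stale is made up for illustration. It relies on the rule in RFC 2616 section 14.21 that an invalid date, especially the value "0", is treated as already expired, which is why "Expires: 0" works as a "do not cache" hint.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_stale(expires_value, now=None):
    """Return True if a response with this Expires value should be treated as stale.

    Illustrative helper only; real caches also weigh Cache-Control and Date.
    """
    now = now or datetime.now(timezone.utc)
    try:
        expires_at = parsedate_to_datetime(expires_value)
    except (TypeError, ValueError):
        # Invalid dates (including "0") count as "already expired".
        return True
    if expires_at.tzinfo is None:
        # Assumption: a date without a usable zone is read as UTC.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return now > expires_at
```

Calling is_stale("0") returns True regardless of the clock, while a date far in the future returns False.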

I hadn't heard of Pragma: public; I've only used Pragma: no-cache. Apparently someone has a use for it, though: http://stackoverflow.com/questions/1920781/what-does-the-http-header-pragma-public-mean

In a web app I work on with others (Perl based), when we want to prevent caching we do:
                # time2str comes from HTTP::Date
                $r->header_out('Expires' => "Sat, 26 Jul 1997 05:00:00 GMT");  # any date in the past
                $r->header_out('Last-Modified' => time2str(time));
                $r->header_out('Cache-Control' => "no-store, no-cache, must-revalidate, post-check=0, pre-check=0");
                $r->header_out('Pragma' =>  "no-cache");

And, if it's an HTML document, we add in:
                push @metaData, "<META HTTP-EQUIV=\"cache-control\" CONTENT=\"no-cache\">\n";
                push @metaData, "<META HTTP-EQUIV=\"pragma\" CONTENT=\"no-cache\">\n";
                push @metaData, "<META HTTP-EQUIV=\"expires\" CONTENT=\"0\">\n";
Which is really redundant.
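For readers who don't use Perl, the no-cache response headers above can be sketched in Python roughly as follows. The function name no_cache_headers is hypothetical, and the sketch leaves out the post-check/pre-check directives to keep it short. It only builds the header list; how you attach it to a response depends on your framework.

```python
from email.utils import formatdate


def no_cache_headers():
    """Build (name, value) pairs that ask browsers and proxies not to cache.

    Hypothetical helper; mirrors the kitchen-sink approach described above.
    """
    return [
        # A fixed date in the past marks the response as already expired.
        ("Expires", "Sat, 26 Jul 1997 05:00:00 GMT"),
        # Claim the content changed just now, formatted as an HTTP date.
        ("Last-Modified", formatdate(usegmt=True)),
        ("Cache-Control", "no-store, no-cache, must-revalidate"),
        # Pragma is the HTTP/1.0 fallback for clients that ignore Cache-Control.
        ("Pragma", "no-cache"),
    ]
```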

When we send a file, we use the following headers (again in Perl) in addition to the cache headers:
        $r->content_type($contentType);
        $r->header_out("Accept-Ranges", "bytes");
        $r->header_out("Content-Length", $fileSize);
        $r->header_out("Content-disposition","$attachmentType; filename=" . $file_title . "." . $ext);

Where:
$contentType is either "application/unknown" or an appropriate type for an inline download (like application/msword for a .doc file)
$fileSize is the size of the attachment
$attachmentType is either attachment or inline (I do inline for doc, docx, ppt, xls and pdf files).
$file_title is the name of the file
$ext is the extension
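As a rough illustration of how those variables fit together, here is a Python sketch that builds the same set of file headers. The function name download_headers and its parameters are made up for this example. One deliberate difference from the Perl above: the filename is wrapped in double quotes (with embedded quotes and backslashes escaped), on the assumption that this is safer for names containing spaces.

```python
def download_headers(content_type, file_size, file_title, ext, inline=False):
    """Build (name, value) header pairs for sending a file.

    Hypothetical helper; the inline flag plays the role of $attachmentType.
    """
    disposition = "inline" if inline else "attachment"
    # Escape backslashes and quotes so the name is a valid quoted string.
    filename = f"{file_title}.{ext}".replace("\\", "\\\\").replace('"', '\\"')
    return [
        ("Content-Type", content_type),
        ("Accept-Ranges", "bytes"),
        ("Content-Length", str(file_size)),
        ("Content-Disposition", f'{disposition}; filename="{filename}"'),
    ]
```

For example, download_headers("application/msword", 1024, "report", "doc", inline=True) yields a Content-Disposition of inline; filename="report.doc".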

As Nuno mentioned, Wireshark is a really good tool to know how to use. As a web developer you may also want to consider using something like Live HTTP Headers for Firefox, as it will show you the headers going to the server and coming back to the client without having to set up a full trace. If you install that, you can download some files from web sources and see what headers they give you back.
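If you'd rather check response headers from a script than from a browser add-on, a small Python sketch like the following can do it. The function name fetch_headers is hypothetical. It only shows the headers coming back from the server (not the request headers a browser would send), and it assumes the server answers HEAD requests, which not every server does.

```python
import http.client
from urllib.parse import urlsplit


def fetch_headers(url, timeout=10):
    """Send a HEAD request and return (status, [(name, value), ...]).

    Illustrative helper; does not follow redirects or use proxies.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheaders()
    finally:
        conn.close()
```

Printing the returned pairs for a few download links is a quick way to compare which Content-Disposition and caching headers different sites send.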

-John


On Feb 29, 2012, at 2:37 AM, Tom Metro wrote:

Stephen Adler wrote:
> I'm writing a web application which downloads files using PHP.

That's a vague statement, but I gather from the context that you are
generating an HTTP response document, and you want it to trigger the
requesting browser to prompt the user to save a file.


> The most comprehensive set of header commands I've seen to initiate a
> file download is the following...
>
> header('Content-Description: File Transfer');
> header('Content-Type: application/octet-stream');
> header('Content-Disposition: attachment;filename="'.$Filename.'"');
> header('Content-Transfer-Encoding: binary');
> header('Expires: 0');
> header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
> header('Pragma: public');
> header('Content-Length: '.$FileSize);
> header('X-SendFile: '.$StorageDirectory.'/'.$UUID);
> exit();

Why are you exiting? Per your headers, the content of the file should
follow.

(Those generally look like the correct headers, but I couldn't say if
you missed something without looking it up.)


> I'm wondering about the 'Expires:', 'Cache-Control:', and 'Pragma:'
> headers. What are they needed for and how do they make the transfer
> work better.

These are all attempts to get the browser or a proxy in the middle to
not cache the returned content. (Not everyone adhered to the same
standard, so this sort of multiple header kitchen sink approach is common.)

If the content is actually unchanged (i.e. not dynamically generated),
then you don't need and shouldn't include those headers.
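One way to act on that advice is to emit the anti-caching headers only for generated content. The Python sketch below is a guess at how that choice could look; the function name cache_headers and the dynamically_generated flag are invented here, and sending Last-Modified for unchanged files is an added assumption rather than something stated in this thread.

```python
import os
from email.utils import formatdate


def cache_headers(path, dynamically_generated):
    """Pick caching-related headers for a file response.

    Hypothetical helper: anti-cache headers only when content is generated.
    """
    if dynamically_generated:
        # Same values as the PHP example quoted above.
        return [
            ("Expires", "0"),
            ("Cache-Control", "must-revalidate, post-check=0, pre-check=0"),
            ("Pragma", "public"),
        ]
    # Unchanged content: skip the anti-cache headers and just report
    # when the file last changed (assumption, see lead-in).
    modified = formatdate(os.path.getmtime(path), usegmt=True)
    return [("Last-Modified", modified)]
```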

-Tom

--
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: http://tmetro.venturelogic.com/
_______________________________________________
Discuss mailing list
Discuss at blu.org
http://lists.blu.org/mailman/listinfo/discuss




BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org