<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Web Scaling Blog &#187; Caching</title>
	<atom:link href="http://www.webscalingblog.com/category/caching/feed" rel="self" type="application/rss+xml" />
	<link>http://www.webscalingblog.com</link>
	<description>Everything about web scaling and high availability</description>
	<lastBuildDate>Fri, 21 May 2010 13:31:12 +0000</lastBuildDate>
	
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Caching HTTP Headers, Last-Modified and ETag</title>
		<link>http://www.webscalingblog.com/performance/caching-http-headers-last-modified-and-etag.html</link>
		<comments>http://www.webscalingblog.com/performance/caching-http-headers-last-modified-and-etag.html#comments</comments>
		<pubDate>Sun, 12 Oct 2008 06:58:16 +0000</pubDate>
		<dc:creator>Nail</dc:creator>
				<category><![CDATA[Caching]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://www.webscalingblog.com/?p=13</guid>
		<description><![CDATA[What if we cannot predict the lifetime of page content? If a page has information that changes unpredictably, we can still use the browser cache to avoid unneeded traffic.
Using the validation mechanism, the browser sends an HTTP request with information about its cache entry, and the server can respond that the content has not changed.
There are two validation methods: one is [...]]]></description>
			<content:encoded><![CDATA[<p>What if we cannot predict the lifetime of page content? If a page has information that changes unpredictably, we can still use the browser cache to avoid unneeded traffic.<br />
Using the validation mechanism, the browser sends an HTTP request with information about its cache entry, and the server can respond that the content has not changed.<br />
There are two validation methods: one is based on Last-Modified and the other is based on ETag.<br />
<span id="more-13"></span><br />
<strong>Last-Modified</strong><br />
The server sends a <strong>Last-Modified</strong> header whose datetime value is the time when the content was last changed.<br />
<code>Cache-Control: must-revalidate<br />
Last-Modified: Mon, 15 Sep 2008 17:43:00 GMT<br />
</code></p>
<p>The first header, <strong>Cache-Control: must-revalidate</strong>, tells the browser that it must not reuse a stale cache entry without first validating it with the server. To force validation on every request regardless of freshness, HTTP also provides <strong>Cache-Control: no-cache</strong>.<br />
The browser receives the content and stores it in the cache along with the last modified value.<br />
Next time the browser will send an additional header:<br />
<code>If-Modified-Since: Mon, 15 Sep 2008 17:43:00 GMT</code></p>
<p>This header means that the browser has a cache entry that was last changed at 17:43.<br />
The server then compares this time with the last modified time of the actual content. If the content was changed, the server sends the whole updated object along with the new Last-Modified value.</p>
<p>If there were no changes since the previous request, the server sends a short answer with an empty body:<br />
<code>HTTP/1.x 304 Not Modified</code></p>
<p>And the browser will use the cached content.</p>
<p>What if the server doesn&#8217;t send <strong>Cache-Control: must-revalidate</strong>?<br />
Then modern browsers consult their settings or decide on their own whether to send a conditional request. So it is better to send Cache-Control to make sure that the browser sends a conditional request.</p>
<p>Sample PHP code:<br />
<code><br />
// Modification time rounded down to a 30-second boundary<br />
$last_modified_ts = (int) floor(time()/30)*30;<br />
if (<br />
    isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) &#038;&#038;<br />
    strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $last_modified_ts<br />
   )<br />
{<br />
  // The browser's copy is still current: answer without a body<br />
  header('HTTP/1.1 304 Not Modified');<br />
  exit;<br />
}<br />
header('Cache-Control: must-revalidate');<br />
// HTTP dates start with the day name, e.g. "Mon, 15 Sep 2008 17:43:00 GMT"<br />
header('Last-Modified: '.gmdate('D, d M Y H:i:s',$last_modified_ts).' GMT');<br />
echo date('d M Y H:i:s');<br />
</code></p>
<p>This example outputs a datetime that stays cached and expires every 30 seconds.</p>
<p>Last-Modified works well when we can easily calculate the modification time of the page content.</p>
<p>We must be careful with Last-Modified if the page content consists of many fragments. We should calculate the Last-Modified value of every fragment and take the latest one.<br />
Note that if we have authentication and a page fragment depends on it, we have to reset the Last-Modified value after login or logout for every page that contains the fragment.<br />
Also note that with several web servers we should make sure that the Last-Modified value changes synchronously on all of them.</p>
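<p>As a minimal sketch of taking the latest fragment time, assume each fragment can report its own modification time as a Unix timestamp. The fragment names and timestamps below are made up for illustration:<br />
<code>// Hypothetical fragment modification times (Unix timestamps)<br />
$fragment_times = array(<br />
  'header'   => 1221500000,<br />
  'article'  => 1221500580, // 15 Sep 2008 17:43:00 GMT<br />
  'comments' => 1221499000,<br />
);<br />
// The page is as new as its most recently changed fragment<br />
$last_modified_ts = max($fragment_times);<br />
header('Cache-Control: must-revalidate');<br />
header('Last-Modified: '.gmdate('D, d M Y H:i:s', $last_modified_ts).' GMT');<br />
</code></p>
<p>If any single fragment changes, the maximum moves forward and browsers receive the full page again on their next conditional request.</p>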
<p><strong>ETag</strong><br />
This method suits cases where it&#8217;s difficult to maintain a Last-Modified value: a complicated application with many page fragments, especially if third-party libraries are involved, or a page whose content depends on authentication info.<br />
The idea behind <strong>ETag</strong> is simple:<br />
the ETag value depends on the content; it must be different for different content and the same for the same content.</p>
<p>Sample usage of ETag header:<br />
<code>// Content that changes every 30 seconds<br />
$content = (int) floor(time()/30)*30;<br />
// An ETag value is a quoted string<br />
$etag = '"'.md5($content).'"';<br />
if (isset($_SERVER['HTTP_IF_NONE_MATCH']) &#038;&#038; $_SERVER['HTTP_IF_NONE_MATCH'] == $etag)<br />
{<br />
  header('HTTP/1.1 304 Not Modified');<br />
  exit;<br />
}<br />
header('Cache-Control: must-revalidate');<br />
header('ETag: '.$etag);<br />
echo $content;<br />
echo '&lt;br /&gt;Request time: '.date('d M Y H:i:s');<br />
</code></p>
<p>In this example the content changes every 30 seconds, and browsers download it only if it has changed.</p>
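<p>For a page built from many fragments, one possible approach is to buffer the whole output and hash it, so the ETag follows whatever the page actually prints. This is only a sketch: render_page() is a hypothetical stand-in for the application&#8217;s own output code.<br />
<code>// render_page() is hypothetical; it stands for the code that prints the page<br />
function render_page() {<br />
  echo 'Hello, page content';<br />
}<br />
ob_start();                 // capture everything the page prints<br />
render_page();<br />
$content = ob_get_clean();  // buffered output as a string<br />
$etag = '"'.md5($content).'"';<br />
if (isset($_SERVER['HTTP_IF_NONE_MATCH']) &#038;&#038; $_SERVER['HTTP_IF_NONE_MATCH'] == $etag)<br />
{<br />
  header('HTTP/1.1 304 Not Modified');<br />
  exit;<br />
}<br />
header('Cache-Control: must-revalidate');<br />
header('ETag: '.$etag);<br />
echo $content;<br />
</code></p>
<p>Note that this saves traffic only: the server still renders the page on every request in order to compute the hash.</p>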
<p><strong>Static Content and Unnecessary ETag Header</strong><br />
For static content it&#8217;s recommended to send a <strong>Cache-Control: max-age=&#8230;</strong> header with a large max-age value. In this case the browser won&#8217;t send any request on normal page views.<br />
So for static content the ETag header is of no use.<br />
It is even worse in a cluster of web servers, because the ETag value for the same file can differ between servers.<br />
For the Lighttpd server you can disable ETag using<br />
<code>static-file.etags = "disable"</code><br />
in lighttpd.conf</p>
<p>Disabling ETag in Apache:<br />
<code>Header unset ETag<br />
FileETag None</code></p>
<p>Note that we still want the <strong>Last-Modified</strong> header for static files. If the user presses the Refresh button, the browser sends a conditional request and the server responds “304 Not Modified”.<br />
If you disable both Last-Modified and ETag, the browser has to download the whole content again when the user presses Refresh.</p>
<p>Lighttpd and Apache keep sending Last-Modified for static files when mod_expires (mod_expire in Lighttpd) is configured.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webscalingblog.com/performance/caching-http-headers-last-modified-and-etag.html/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Caching HTTP Headers, Cache-Control: max-age</title>
		<link>http://www.webscalingblog.com/performance/caching-http-headers-cache-control-max-age.html</link>
		<comments>http://www.webscalingblog.com/performance/caching-http-headers-cache-control-max-age.html#comments</comments>
		<pubDate>Fri, 19 Sep 2008 17:57:01 +0000</pubDate>
		<dc:creator>Nail</dc:creator>
				<category><![CDATA[Caching]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://www.webscalingblog.com/?p=11</guid>
		<description><![CDATA[Caching speeds up repeated page views and saves a lot of traffic by not downloading unchanged content on every page view.
We can use Cache-Control: max-age=&#8230; to inform the browser that the component won&#8217;t change for a defined period. This way we avoid unneeded further requests if the browser already has the component in its cache and therefore [...]]]></description>
			<content:encoded><![CDATA[<p>Caching speeds up repeated page views and saves a lot of traffic by not downloading unchanged content on every page view.<br />
We can use <strong>Cache-Control: max-age=&#8230;</strong> to inform the browser that the component won&#8217;t change for a defined period. This way we avoid unneeded further requests if the browser already has the component in its cache, and therefore primed-cache page views are faster.<br />
Modern browsers are able to cache static files even without any cache control headers, using heuristic methods, but they do it more efficiently if we define caching headers explicitly.<br />
<span id="more-11"></span><br />
For Apache2 you can enable max-age using <a href="http://httpd.apache.org/docs/2.0/mod/mod_expires.html">mod_expires</a>:<code><br />
ExpiresActive On<br />
ExpiresByType image/gif "access plus 1 month"<br />
ExpiresByType image/png "access plus 1 month"<br />
ExpiresByType image/jpeg "access plus 1 month"<br />
ExpiresByType text/css "access plus 1 month"<br />
ExpiresByType text/javascript "access plus 1 month"<br />
ExpiresByType application/x-javascript "access plus 1 month"<br />
ExpiresByType application/x-shockwave-flash "access plus 1 month"<br />
</code></p>
<p>For Lighttpd there is the <a href="http://trac.lighttpd.net/trac/wiki/Docs%3AModExpire">mod_expire</a> module. Enable it in the server.modules section:<br />
<code>server.modules = (<br />
...<br />
"mod_expire",<br />
...<br />
)</code><br />
Then add the following directives for directories with static files:<br />
<code>$HTTP["url"] =~ "^/images/" {<br />
expire.url = ( "" => "access 30 days" )<br />
}<br />
</code></p>
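<p>Max-age for the Nginx server can be enabled using <a href="http://sysoev.ru/nginx/docs/http/ngx_http_headers_module.html">ngx_http_headers_module</a>:<br />
<code>expires max;</code><br />
Note that <strong>expires max</strong> sets a far-future lifetime; use <strong>expires 30d;</strong> to match the one-month lifetime of the Apache and Lighttpd examples.</p>
<p>With a lifetime of 30 days, the web server sends this caching header for static files:<br />
<code>Cache-Control: max-age=2592000</code></p>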
<p>When the design changes, we should prevent browsers from using the outdated content they have in their caches. This can be done by adding file versions to filenames:<br />
<code>script.js -> script1.js -> script2.js -> ... etc</code></p>
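<p>One way to keep such names consistent across templates is a small helper that builds the versioned filename. The sketch below is an assumption, not part of the original setup: versioned_name() is a hypothetical function, and it expects a bare filename without a directory part.<br />
<code>// versioned_name() is a hypothetical helper, not a PHP built-in<br />
function versioned_name($file, $version) {<br />
  $info = pathinfo($file); // splits 'script.js' into 'script' and 'js'<br />
  $ext = isset($info['extension']) ? '.'.$info['extension'] : '';<br />
  return $info['filename'].$version.$ext;<br />
}<br />
echo versioned_name('script.js', 2); // prints script2.js<br />
</code></p>
<p>Bumping the version number in one place then changes every reference, and browsers fetch the new file because its URL is new to them.</p>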
<p><strong>Cache-Control: max-age</strong> can also be useful when we output HTML. Imagine pages generated by PHP that change rarely, once per day or even less often. Browsers still have to download the HTML on every page view.<br />
We can improve this by sending a max-age value from PHP.<br />
<code>header('Cache-Control: max-age=28800');</code></p>
<p>This way we set the desired cache lifetime to 8 hours. Now if someone clicks a link for the second time within the 8-hour period, they get the page instantly.</p>
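<p>The arithmetic behind the value is 8 hours &#215; 3600 seconds = 28800. As a hedged sketch, a helper can derive the header from a lifetime in hours. cache_headers() is a hypothetical name, and the extra Expires line is an assumption added for older caches that ignore Cache-Control:<br />
<code>// cache_headers() is a hypothetical helper; it returns header lines<br />
function cache_headers($hours, $now) {<br />
  $max_age = $hours * 3600; // lifetime in seconds<br />
  return array(<br />
    'Cache-Control: max-age='.$max_age,<br />
    'Expires: '.gmdate('D, d M Y H:i:s', $now + $max_age).' GMT',<br />
  );<br />
}<br />
foreach (cache_headers(8, time()) as $line) {<br />
  header($line);<br />
}<br />
</code></p>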
<p>Max-age also makes proxy servers more efficient. We can easily set up transparent server-side caching by adding a proxy server in front of the web servers.</p>
<p>Note that things are harder when pages have content that changes often and must be current.<br />
For example, it is difficult to cache pages with a login form that turns into a box with «Hello username» after the user logs in. Likewise, if a page has user comments, the user who posted a comment will not see it. Because we cannot ask the browser to destroy a cache entry, it will keep getting the old page from its cache.<br />
One solution is to generate the login box with Javascript (this requires Javascript to be enabled). If we set a cookie after the user logs in, we can check it on the client side and generate suitable content for the logged-in user. This way the content is the same from the server&#8217;s point of view and can be cached.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webscalingblog.com/performance/caching-http-headers-cache-control-max-age.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
