Website Speed Optimization and Waterfall Diagram

A waterfall diagram is useful for finding out how the browser spends its time loading page components. There is an open-source tool built for IE called AOL Pagetest. It has an online version where we can test sites over connections in the USA and choose the connection speed.

I’ve prepared a sample page: http://webscalingblog.com/waterfall/1/. Let’s look at how we can optimize it.

Waterfall diagram of this page loading in IE7:

We see that the page contains 3 CSS components, 3 Javascript components and 7 images.
There are two orange spots on the diagram: the browser opened two connections and was ready to download two components in parallel.
The green part of each bar is the time to first byte, which is the HTTP request overhead. The overhead here is 150-200ms on each HTTP request, and we see a lot of green on the diagram.

Look at 2.css, 2.js and 3.js: the browser spent most of the time sending the request and waiting rather than downloading.
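
To get a feel for how much this per-request overhead costs, here is a rough back-of-the-envelope model. It is only a sketch: the 2 parallel connections and the 150-200ms overhead come from the diagram, but the per-component download times and the simple greedy scheduling are assumptions made for illustration, and a real browser also blocks on scripts.

```python
import heapq

def estimate_load_time(components, connections=2, overhead_ms=175):
    """Greedy model: each component goes to the connection that frees up first.

    components: list of download times in ms, excluding request overhead.
    Returns (total_ms, overhead_share), where overhead_share is the fraction
    of all connection-busy time spent waiting for the first byte.
    """
    if not components:
        return 0, 0.0
    # One entry per connection: the time at which it becomes free.
    free_at = [0] * connections
    heapq.heapify(free_at)
    for download_ms in components:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + overhead_ms + download_ms)
    total_ms = max(free_at)
    overhead_total = overhead_ms * len(components)
    busy_ms = overhead_total + sum(components)
    return total_ms, overhead_total / busy_ms

# Hypothetical download times (ms) for 13 small components:
# 3 CSS files, 3 JS files and 7 images. These are not measured values.
sample = [40, 20, 30, 60, 25, 25, 50, 50, 50, 50, 50, 50, 50]
total, share = estimate_load_time(sample)
print(f"estimated load time: {total} ms, overhead share: {share:.0%}")
```

With small components like these, the model spends most of its time in overhead, which is why reducing the number of requests is the first thing to try.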

Especially important is Time to Start Render, which is shown as a green vertical line. We see that it took 2 seconds to start rendering this pretty simple page.
For those 2 seconds the browser was downloading the HTML, CSS and JS components.

As a first step I’ve joined the CSS files and the JS files into 1 CSS file and 1 JS file:
http://webscalingblog.com/waterfall/2/
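
Joining the files can be done by hand or with a small build script. The following is a minimal sketch of such a script, not the way the sample pages were actually built; the function name join_files and the file names in the demo are placeholders.

```python
import tempfile
from pathlib import Path

def join_files(sources, target):
    """Concatenate text files in the given order, write them to target,
    and return the joined text."""
    parts = []
    for src in sources:
        text = Path(src).read_text(encoding="utf-8")
        # End every file with a newline (and ";" for JS) so the last
        # statement of one file cannot run into the first of the next.
        sep = ";\n" if str(src).endswith(".js") else "\n"
        parts.append(text.rstrip() + sep)
    data = "".join(parts)
    Path(target).write_text(data, encoding="utf-8")
    return data

# Demo in a temporary directory with two tiny placeholder scripts.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "1.js").write_text("var a = 1", encoding="utf-8")
    (tmp / "2.js").write_text("var b = a + 1", encoding="utf-8")
    combined = join_files([tmp / "1.js", tmp / "2.js"], tmp / "all.js")
    print(combined)
```

The order of the sources matters: CSS rules later in the file override earlier ones, and a script must come after the scripts it depends on.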

We see it gets better, but the browser is still spending time on the Javascript: rendering cannot start until the browser has downloaded and executed the Javascript. The Javascript also blocks the download of the following components.

The next example shows what we get after moving the Javascript to the bottom of the page (let’s suppose that in this case we can do it).

http://webscalingblog.com/waterfall/3/
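
For a simple page the move can even be automated. Below is a rough sketch that relocates external script tags to just before the closing body tag. It uses regular expressions, which is only good enough for simple, well-formed pages, and it assumes the scripts do not write into the document while the page loads. The function name is made up for this example.

```python
import re

# Matches external scripts only: <script ... src=...></script>
SCRIPT_RE = re.compile(
    r"<script\b[^>]*\bsrc=[^>]*>\s*</script>\s*", re.IGNORECASE
)
BODY_END_RE = re.compile(r"</body\s*>", re.IGNORECASE)

def move_scripts_to_bottom(html):
    """Move external <script src=...> tags to just before </body>."""
    scripts = [m.group(0).strip() for m in SCRIPT_RE.finditer(html)]
    if not scripts:
        return html
    stripped = SCRIPT_RE.sub("", html)
    body_ends = list(BODY_END_RE.finditer(stripped))
    if not body_ends:
        return html  # no </body> found: leave the page untouched
    idx = body_ends[-1].start()
    return stripped[:idx] + "\n".join(scripts) + "\n" + stripped[idx:]

page = (
    '<html><head><script src="all.js"></script>\n'
    "<title>t</title></head><body><p>hi</p></body></html>"
)
moved = move_scripts_to_bottom(page)
print(moved)
```

Inline scripts are left where they are, since they often depend on their position in the page.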

It gets much better: Time to Start Render has improved. The Javascript now loads after all the other components have loaded.
But still, the browser has to download the CSS: there is a green bar (wait time) after the HTML download is done, and the CSS component occupies one download slot.

Now let’s try to embed the CSS into the HTML.

http://webscalingblog.com/waterfall/4/
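
Embedding means replacing the link to the stylesheet with a style block that contains the same rules. A minimal sketch of doing this automatically is shown here; the regular expression only handles simple link tags, and inline_css and the read_css callback are names invented for this example.

```python
import re

# Matches <link ... href="something.css" ...>
LINK_RE = re.compile(
    r"<link\b[^>]*\bhref=[\"']([^\"']+\.css)[\"'][^>]*>", re.IGNORECASE
)

def inline_css(html, read_css):
    """Replace each <link href="*.css"> with a <style> block.

    read_css is a callable that maps the href to the stylesheet text,
    so the caller decides how hrefs map to files.
    """
    def replace(match):
        return "<style>\n" + read_css(match.group(1)) + "\n</style>"
    return LINK_RE.sub(replace, html)

# Demo with an in-memory "file system"; all.css is a placeholder name.
styles = {"all.css": "body { margin: 0; }"}
page = '<head><link rel="stylesheet" href="all.css"></head>'
inlined = inline_css(page, styles.__getitem__)
print(inlined)
```

One thing to watch for: relative url(...) references inside the CSS are resolved against the HTML page once the rules are embedded, so image paths may need adjusting.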

Finally, Time to Start Render was reduced dramatically. Now it’s 0.8s instead of the 2s we had at the beginning.

Note that by embedding the CSS we increased the download volume for cached page views, because the browser now has to download the CSS within the HTML every time. To avoid this, we can use a cookie to guess the cache state. This technique was described in the previous post.
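
As a reminder of the idea, here is a minimal server-side sketch of one common form of the cookie technique; the details in the previous post may differ. The cookie name css_cached, the file name all.css and the function render_css are assumptions for this example.

```python
def render_css(cookies, css_text, css_url="all.css", cookie_name="css_cached"):
    """Return (html_snippet, cookies_to_set) for one page view.

    Assumption: if the cookie is present, the browser has probably
    already cached css_url. It is only a guess, because the user may
    have cleared the cache but kept the cookie.
    """
    if cookies.get(cookie_name) == "1":
        # Probably a repeat view: link to the (hopefully cached) file.
        return f'<link rel="stylesheet" href="{css_url}">', {}
    # Probably a first view: embed the CSS so rendering can start sooner,
    # and set the cookie so later page views get the external link.
    return f"<style>\n{css_text}\n</style>", {cookie_name: "1"}

first_view = render_css({}, "body { margin: 0; }")
repeat_view = render_css({"css_cached": "1"}, "body { margin: 0; }")
print(first_view)
print(repeat_view)
```

For the guess to pay off, the first page view also has to fetch the external file after the page has loaded, so that it is in the cache next time. That part is left out of the sketch.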

Going from waterfall3 to waterfall4 we didn’t improve Full Load Time. So if we used a benchmark that measures only Full Load Time, we probably would not notice any difference, while from the end-user’s point of view there was a significant improvement in response time.

At the moment I don’t know of any benchmark (besides AOL Pagetest) that is able to measure Time to Start Render. Firebug 1.3 Beta for Firefox can draw waterfall diagrams and shows the time to DOMContentReady, which we can use most of the time. It isn’t the same thing, though: I noticed that moving the Javascript to the bottom doesn’t improve the DOMContentReady time, while the browser actually starts rendering earlier (I tested it with a big Javascript file).

As a last step, if we want to improve Full Load Time and if we can use CSS Image Sprites here, we can join the images into one bigger image.

http://webscalingblog.com/waterfall/5/
(Sorry, I’ve simply included the file and didn’t use CSS positioning)

It reduced Full Load Time from 2.4s to 2s. And we had only 7 images; sites usually contain many more, so the improvement will be more significant.
CSS Sprites were described in one of the previous posts.
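
The sample page skips the CSS positioning, so here is a small sketch of what it could look like. It assumes the images are stacked vertically in the sprite and generates one CSS rule per image; the class names, sizes and the file name sprite.png are made up for illustration.

```python
def sprite_rules(images, sprite_url="sprite.png"):
    """Build CSS rules for images stacked vertically in one sprite.

    images: list of (css_class, width_px, height_px), top to bottom.
    """
    rules = []
    y = 0
    for css_class, width, height in images:
        # Shift the background up by the combined height of the images above.
        offset = f"-{y}px" if y else "0"
        rules.append(
            f".{css_class} {{ width: {width}px; height: {height}px; "
            f"background: url({sprite_url}) 0 {offset} no-repeat; }}"
        )
        y += height
    return "\n".join(rules)

# Hypothetical icons: two images, 20px and 30px tall.
css = sprite_rules([("icon-a", 10, 20), ("icon-b", 10, 30)])
print(css)
```

Each image in the page then becomes an element with the matching class instead of a separate img tag, so the browser makes one request for all of them.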

13 comments ↓

#1 John Laur on 11.10.08 at 6:03 pm

Instead of CSS Sprites (which I personally despise) you could host images (and CSS/JS for that matter) on separate hostnames but the same webserver (via CNAMEs) to improve download concurrency. It’s not a particularly good technique if you use heavy Apache processes to serve both your dynamic and static content, but paired with a lightweight frontend server (nginx, lighttpd, pound) it will help tremendously without having to jack around with page layout and the horrible junk that is CSS sprites.

#2 Nail on 11.11.08 at 12:09 am

John,
Yes, for this case we could use several hostnames to load images in parallel.
It’s not always applicable though. I mean there are sites with 50+ images, and we are limited by DNS requests, which can slow down page loading. Plus the browser will have to establish 2 connections per hostname.

#3 frederic sidler on 11.11.08 at 9:32 pm

firebug with yslow and the 10 yahoo rules is also worth reading.

you can also use yuicompressor to concatenate JS and CSS

you can gzip these files too (make sure you have zip and non-zip version in case the client browser does not support it)

You definitely should use a version number in your JS and CSS files and set the cache never to expire so that you will never retrieve these files again if they have not changed.

Last, you can use a CDN to host these files so that your server will never be hit.

#4 zuborg on 11.19.08 at 3:54 pm

Nice article

But nowadays a limit of 6-8 connections per host is more common; IE8 will have 8 too (as far as I know).

http://performance.webpagetest.org:8080/ is a nice tool too, but I would also recommend trying this speedmeter: http://Site-Perf.com/
It’s highly customizable and supports a lot of features like Keep-Alive, HTTP-Compression and HTTP-Auth…
There is also a tool to measure the packet loss level on the internet link of your server.

#5 Nail on 11.29.08 at 1:11 pm

frederic,
Thanks for the addition. Sure, there are more things like server-side gzip compression, CDN, etc.
In this post I wanted to describe page composition issues, how to use tools to look at how browsers load pages, and how to measure performance from the end-user’s point of view.

#6 Nail on 11.29.08 at 1:13 pm

zuborg,
IE7 is still the most popular browser and it makes only 2 connections per host. According to http://stevesouders.com/ua/, in IE8 it will be 6 connections per host, which will definitely help performance.
Page composition will still be relevant, though, since pages usually have many more components and because of the HTTP request cost.
About Site-Perf.com: it’s a very good tool. It would also be good to have some presets for Link control, something like GPRS connection, Europe ADSL, etc.

#7 Patrick Meenan on 11.29.08 at 1:49 pm

Site-Perf looks interesting but the browser “emulation” appears to be pretty lacking. Testing http://www.aol.com only requests the base page (there are actually over 100 requests for the page) and http://www.yahoo.com only requests the base page and 4 images (also supposed to be a lot larger).

There really is no substitute for using a real browser. Even the best emulators I have seen (from companies that sell the service) don’t quite get it right.

If there’s something you’d like to see in the online version of pagetest just shoot me a note and I’ll see what I can do (there is a contact me link on the site).

#8 zuborg on 12.11.08 at 7:31 pm

2Patrick: Btw, yahoo.com looks at the User-Agent header; for unknown UAs it shows a special simple page with a small number of images. For http://www.aol.com Site-Perf.com fetches 53 objects.

You are right that only a browser is able to show what happens while the browser does its work, so the FireBug Firefox addon is irreplaceable for web developers. The only problem is that FireBug and similar tools rely on the internet link of whoever uses them ;)

#9 Patrick Meenan on 03.23.09 at 6:57 pm

The online version of pagetest that was used in the article runs the tests through a real IE browser at one of the hosted locations (Virginia and New Zealand) so you’re not reliant on the developer’s connection (which is often very fast and has low latency to the servers). That’s actually the main reason I bothered even putting up the site because I couldn’t find any (free) tools that weren’t running simulated browsers.

aol.com was re-launched recently which probably accounts for the difference in count but it didn’t get any lighter – it’s now at around 130 requests :( . The only other hosted tool I’m aware of that uses a real browser is Keynote. They also have some free testing that you can run tests from 5 different locations around the world (though at backbone speeds which tend to differ enough from end-user connections that you want to look at both).

#10 compound bows on 12.03.09 at 11:47 am

Using parallel images using host names is not always reliable. However, I agree with your point of view to some extent.

#11 Arne on 04.18.10 at 11:18 pm

Ahh, gotta love the warm, fuzzy internationalization/globalization/democracy of the internets. The result is that half the tech articles online are written by people who have a first language other than English, making this obscure, arcane, esoteric field even more difficult to understand.

#12 Ramoonus on 07.14.10 at 1:11 pm

The pictures in the article aren`t working

#13 Alfa on 08.31.10 at 8:49 am

Thank for website speed optimization tutorial.
