
Increase WordPress Website Speed for Free (using .htaccess and robots.txt)


I run this website and a few others, including my more popular Amateur Traveler website, on a bluehost.com account. I started to run into performance problems as my traffic pushed past 2,000 page views a day, especially when I would get bursts of traffic from sites like StumbleUpon. I started to see the CPU on my account being “throttled” for as much as half of the day. I investigated more expensive hosting options but eventually found I could get better performance from my WordPress site for free with a few small changes to two files: .htaccess and robots.txt.
Before you implement these changes, I would also recommend installing a caching plugin like WP Super Cache or W3 Total Cache.

.htaccess

Most hosting companies use a free web server program called Apache to run their customers’ websites. With Apache, you can change the local configuration for a site using a file called .htaccess, which lives in the top-level directory of your WordPress site. It should be in the same directory as wp-config.php.
Caching
The first set of changes to add to this file is a series of commands that tell the user’s browser to cache certain static files, like images, for longer. With these settings in place, a returning visitor’s browser downloads fewer files from your server on each visit. There is one risk: if you need to change an image, you will have to rename the image file so that browsers fetch the new version instead of serving the stale cached copy. For the same reason, you might not want to increase the caching time on your CSS and JavaScript files if you plan on changing them frequently.

# Browser caching of static files via mod_expires (wrapped in IfModule
# so the site does not break if the module is not available)
<IfModule mod_expires.c>
ExpiresActive on
ExpiresDefault "access plus 7 days"
ExpiresByType image/gif "access plus 3 months"
ExpiresByType image/jpg "access plus 3 months"
ExpiresByType image/png "access plus 3 months"
ExpiresByType image/jpeg "access plus 3 months"
ExpiresByType text/javascript "access plus 1 month"
ExpiresByType application/x-javascript "access plus 1 month"
ExpiresByType text/css "access plus 7 days"
ExpiresByType text/html "access plus 1 day"
ExpiresByType text/plain "access plus 1 day"
ExpiresByType application/x-shockwave-flash "access plus 3 months"
</IfModule>
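
If your host also has the mod_headers module enabled, the same idea can be expressed with explicit Cache-Control headers. This is an optional variation on the recipe above rather than something you need in addition to it, and the 90-day max-age here is just an example value:

# Optional alternative using mod_headers (not required if mod_expires is working)
# 7776000 seconds is roughly 3 months
<IfModule mod_headers.c>
<FilesMatch "\.(gif|jpe?g|png)$">
Header set Cache-Control "max-age=7776000, public"
</FilesMatch>
</IfModule>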

Spiders
The way that search engines work is that they periodically send a program (called a “bot” or a “spider”) to grab the contents of your website. You can make these bots more efficient by using a plugin like Google XML Sitemaps (or WPMU Google Sitemap on a multi-site configuration) to create a file with a map of all of the pages of your site. Normally you communicate with these programs via a file called robots.txt, which we will deal with in a minute.
I did find that one spider, from the Chinese search engine Baidu, was ignoring the contents of my robots.txt file, so I block that spider with the following lines in the .htaccess file. The reason I block Baidu is that its spider was hitting my site a thousand times a day but sending me maybe 6 page views a day of user traffic. Note that hits from spiders will not show up in Google Analytics, so you will have to look at your web server logs to see what kind of traffic you are receiving.

# Return "403 Forbidden" to Baidu's spider, which ignores robots.txt
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
RewriteRule ^.* - [F,L]
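
If your logs show other crawlers ignoring robots.txt, the same rewrite rule can be extended to cover several of them at once. A sketch of what that would look like follows; the extra bot names are placeholders, so substitute the user-agent strings you actually see in your own logs:

# Example only: SomeBadBot and AnotherBadBot are placeholder names
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SomeBadBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AnotherBadBot [NC]
RewriteRule ^.* - [F,L]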

robots.txt

The file robots.txt should also be in the top-level directory of your website. It tells the robots/bots/spiders how they should deal with your website: what content they should and should not crawl. The less content they crawl, the less load on your server. So what you want to do is tell the spiders not to look at content that you do not want to show up on Google anyway.
When you install one of the sitemap plugins mentioned above, you can tell the various bots where to find your sitemap in your robots.txt file. You can also tell Google via Google Webmaster Tools.

Sitemap: http://chris2x.com/sitemap.xml.gz

These commands block the comments form, directories that should stay hidden, trackbacks, and other dynamic pages.

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads

Another bot that I found spidering my sites a lot, but not sending any appreciable traffic, was the Majestic-12 bot (MJ12bot), which is part of an open-source search project. These lines block that bot. You could list other bots here, but keep in mind that if you block the Googlebot, for instance, you are saying you don’t want traffic from Google.
# Block the Majestic-12 crawler (its user-agent token is MJ12bot)
User-agent: MJ12bot
Disallow: /

You can also slow down how often various bots will crawl your site. Crawl-delay is specified in seconds, so these lines tell MSN’s bot not to crawl my pages more than once every 5 minutes and the Googlebot not more than once every minute.
User-agent: msnbot
Crawl-delay: 300
User-agent: Googlebot
Crawl-delay: 60

With just these changes, I found I could get my website’s performance back to what it was intended for: serving pages to real users.
