SEO Friendly ExpressionEngine Templates Part 2
- Published: Nov 17, 2011 - Tags: expressionengine seo
In Part 1 I went over the different ways of setting up your meta descriptions and title tags in ExpressionEngine. In Part 2 I go over cleaning up the problems you might have with 404 Not Found pages and redirecting incorrect URIs to the correct page using your .htaccess file. I'll also go over the basics of caching your media with your .htaccess file.
If you haven’t read Part 1 where I go over setting up Titles and Meta Descriptions for all your templates, you can read that here.
Checking For Broken Links
If you are working on a live site or have just published a new one, there is a good chance that Google has crawled some broken links. Because of the way ExpressionEngine URLs are set up, an incorrect link can still display content, which means these incorrect URLs will be indexed by Google. To fix this problem you will have to redirect these links before Google will remove them.
Even if you are not working actively on a live site, it is good to check Google Webmaster Tools somewhat frequently. Your templates might not have any problems but there may be broken links in your entries that you are not aware of.
Go to your Webmaster Tools and check under: Diagnostics > Crawl Errors
This list includes all the 404 and 500 errors on your site. If the Linked From number is one or two, then it is probably a broken link in one of your entries. Hopefully you don’t, but if you have errors with Linked From numbers of 5+ pages, then there is probably an incorrect link in one of your templates. If that is the case you need to fix it as soon as possible, because the error list will continue to get longer as you add content to your website.
Redirect URLs With Your .htaccess File
Now that you have your list of errors it is time to fix them. If the link is inside an entry then go ahead and make the fix. If the error is appearing on multiple pages, then it is probably an error inside one of your templates. In this case open the template and make the fix.
Once you have fixed all the broken links in your templates and entries, it is time to get into your .htaccess file and redirect those broken links. You will want to create a redirect for every broken link that points to an internal page. Google will continue to crawl a broken link until you redirect it, so make sure to do so.
redirect 301 /the/broken/link-goes-here/ http://yoursite.com/the/correct/url-here/
It is really that simple. On one line type “redirect 301”, a space, and then the broken link. Note: don’t include the domain in the broken link, just the URI segments starting with a slash. After the broken link add a space and the full URL of the correct page. Do this for all of your 404 and 500 errors.
Continue to check Webmaster Tools to make sure the same broken links, or additional ones, don’t appear again. When your site gets crawled again, no new errors should show up on the list.
Broken Links In Search Results
If you had problems in your templates, there is a good chance Google has indexed some incorrect URLs. Because ExpressionEngine will display a page even if the URL is incorrect, Google will index it if a link is pointing at it.
On this site I had a problem in one of my templates where I wrote a URL /blog/posts instead of /blog/post. Needless to say, Google indexed a bunch of incorrect URLs. It is easy to see these problems when you look at your HTML Suggestions. There will be a number of duplicate Meta and Title tags. It is essential that you fix these problems right away or else you can end up with tens and potentially hundreds of broken links being indexed.
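A template typo like this one can usually be handled with a single redirect instead of one line per indexed URL, because Apache's Redirect directive matches on whole path segments and carries the rest of the path over to the target. A minimal sketch, assuming index.php has been removed from your URLs; yoursite.com is a placeholder domain and my-entry is a made-up URL title:

```apache
# Sketch only: yoursite.com is a placeholder, swap in your own domain.
# Redirect matches /blog/posts and everything beneath it, so a request for
# /blog/posts/my-entry/ is sent to http://yoursite.com/blog/post/my-entry/
Redirect 301 /blog/posts http://yoursite.com/blog/post
```

Requests for the correct /blog/post URLs are not affected, since they do not begin with the /blog/posts segment.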
Check Your Search Results
The easiest way to check your site’s links in Google is to search “site:yourdomain.com”. Most likely any broken links will be at the end of your results, so go to your last page of search results and check those links first.
Removing Pages From Google
The redirects in your .htaccess file will keep people from visiting those incorrect links, but to remove them from Google permanently, go to Site Configuration > Crawler Access and click on the Remove URL tab. Enter all the search results you want to get rid of. Broken links are an obvious one, but you may also want to get rid of all pagination pages because they hold no real value to searchers and contain duplicate content.
Keep Pagination Pages From Being Indexed
To keep pagination pages from being indexed you need to create a NoIndex Meta Tag which looks like:
<meta name="robots" content="noindex" />
You need to be very careful when using these in your templates because it can be easy to accidentally keep Google from indexing lots of pages instead of the ones you intended. Here is how I use it in my homepage template.
{if segment_1 != ""}<meta name="robots" content="noindex" />{/if}
I use a conditional to hide the meta tag on the homepage itself. If there is something in segment_1 (a pagination page), the meta tag is displayed and prevents Google from indexing that page.
Here is a good description of why you want to remove pagination pages from search results: http://www.seomoz.org/blog/how-to-deal-with-pagination-duplicate-content-issues
Robot Text File
A robots.txt file, placed in the root of your site, is used to hide entire directories from search engine crawlers. I don’t do a lot with robots.txt, but I do use it to hide my miscellaneous directories like my JavaScript and CSS directories. To do so simply type:
User-Agent: Googlebot
Disallow: /directory-to-hide/
Caching With Your .htaccess File
This is more of a speed improvement than an SEO improvement, but those kind of go hand in hand, so I’ll go over it quickly. I basically use the .htaccess file from HTML5 Boilerplate, just with the time lengths changed.
# Perhaps better to whitelist expires rules? Perhaps.
ExpiresDefault "access plus 1 month"
# cache.appcache needs re-requests
# in FF 3.6 (thx Remy ~Introducing HTML5)
ExpiresByType text/cache-manifest "access plus 0 seconds"
# Your document html
ExpiresByType text/html "access plus 1 hour"
# Data
ExpiresByType text/xml "access plus 0 seconds"
ExpiresByType application/xml "access plus 0 seconds"
ExpiresByType application/json "access plus 0 seconds"
# RSS feed
ExpiresByType application/rss+xml "access plus 1 hour"
# Favicon (cannot be renamed)
ExpiresByType image/x-icon "access plus 1 week"
# Media: images, video, audio
ExpiresByType image/gif "access plus 1 year"
ExpiresByType image/png "access plus 1 year"
ExpiresByType image/jpg "access plus 1 year"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType video/ogg "access plus 1 month"
ExpiresByType audio/ogg "access plus 1 month"
ExpiresByType video/mp4 "access plus 1 month"
ExpiresByType video/webm "access plus 1 month"
# HTC files (css3pie)
ExpiresByType text/x-component "access plus 1 month"
# Webfonts
ExpiresByType font/truetype "access plus 1 month"
ExpiresByType font/opentype "access plus 1 month"
ExpiresByType application/x-font-woff "access plus 1 month"
ExpiresByType image/svg+xml "access plus 1 month"
ExpiresByType application/vnd.ms-fontobject "access plus 1 month"
# CSS and JavaScript
ExpiresByType text/css "access plus 1 year"
ExpiresByType application/javascript "access plus 1 year"
ExpiresByType text/javascript "access plus 1 year"
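Keep in mind that these are mod_expires directives, and Apache does not send any Expires headers unless ExpiresActive is switched on. A minimal sketch of the wrapper, assuming mod_expires is available on your host; the IfModule guard is there so the block is skipped, instead of causing a 500 error, when the module is missing:

```apache
<IfModule mod_expires.c>
  # Expires headers are only generated while ExpiresActive is on
  ExpiresActive on

  # ... the ExpiresDefault and ExpiresByType lines from above go here ...
</IfModule>
```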
In Part 3 I will go over the 404 template and how to create tighter controls for your URLs.