Sunday, September 05, 2010

.Htaccess rewrites, Mod_Rewrite Tricks and Tips

Mod_Rewrite Tips and Tricks are lot of .htaccess rewrite examples that show specific uses for creating .htaccess rewrites to do all kinds of cool and profitable stuff for your site. Htaccess Rewrites are enabled by using the Apache module mod_rewrite, which is one of the most powerful Apache modules and features availale. Htaccess Rewrites through mod_rewrite provide the special ability to Rewrite requests internally as well as Redirect request externally.
When the url in your browser’s location bar stays the same for a request it is an internal rewrite, when the url changes an external redirection is taking place. This is one of the first, and one of the biggest mental-blocks people have when learning about mod_rewrite… But I have a secret weapon for you to use, a new discovery from years of research that makes learning mod_rewrite drastically quicker and easier. It truly does or I wouldn’t be saying so in the introduction of this article.
Despite the tons of examples and docs, mod_rewrite is voodoo.
Damned cool voodoo, but still voodoo.
Brian Moore
Note: After years of fighting to learn my way through rewriting urls with mod_rewrite, I finally had a breakthrough and found a way to outsmart the difficulty of mod_rewrite that I just couldn’t seem to master. The Mod_Rewrite RewriteCond/RewriteRule Variable Value Cheatsheet is the one-of-a-kind tool that changed the game for me and made mod_rewriting no-harder than anything else.
So keep that mod_rewrite reference bookmarked and you will be able to figure out any RewriteRule or RewriteCond, an amazing feat considering it took me a LONG time to figure this stuff out on my own. But that was before the craziness, one of the most challenging and productive .htaccess experiments I’ve done… An experiment so ILL it’s sick like a diamond disease on your wrist! $$$. That mod_rewrite experiment/tutorial was the culmination of many different advanced mod_rewrite experiments I had done in the past and included most of my very best .htaccess tricks. With the cheatsheet it’s no longer Voodoo.. Its just what you do. Now lets dig in!

If you really want to take a look, check out the mod_rewrite.c and mod_rewrite.h files.
Be aware that mod_rewrite (RewriteRule, RewriteBase, and RewriteCond) code is executed for each and every HTTP request that accesses a file in or below the directory where the code resides, so it’s always good to limit the code to certain circumstances if readily identifiable.
For example, to limit the next 5 RewriteRules to only be applied to .html and .php files, you can use the following code, which tests if the url does not end in .html or .php and if it doesn’t, it will skip the next 5 RewriteRules.

RewriteRule !\.(html|php)$ - [S=5]
RewriteRule ^.*-(vf12|vf13|vf5|vf35|vf1|vf10|vf33|vf8).+$ - [S=1]
Options +FollowSymLinks
 
RewriteEngine On
RewriteBase /
Options +FollowSymLinks
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} !^www\.askapache\.com$ [NC]
RewriteRule ^(.*)$ http://www.askapache.com/$1 [R=301,L]
Sometimes your rewrites cause infinite loops, stop it with one of these rewrite code snippets.
RewriteCond %{REQUEST_URI} ^/(stats/|missing\.html|failed_auth\.html|error/).* [NC]
RewriteRule .* - [L]
 
RewriteCond %{ENV:REDIRECT_STATUS} 200
RewriteRule .* - [L]
This is probably my favorite, and I use it on every site I work on. It allows me to update my javascript and css files in my visitors cache’s simply by naming them differently in the html, on the server they stay the same name. This rewrites all files for /zap/j/anything-anynumber.js to /zap/j/anything.js and /zap/c/anything-anynumber.css to /zap/c/anything.css
RewriteRule ^zap/(j|c)/([a-z]+)-([0-9]+)\.(js|css)$ /zap/$1/$2.$4 [L]
When you use flash on your site and you properly supply a link to download flash that shows up for non-flash aware browsers, it is nice to use a shortcut to keep your code clean and your external links to a minimum. This code allows me to link to site.com/getflash/ for non-flash aware browsers.
RewriteRule ^getflash/?$ http://www.adobe.com/shockwave/download/download.cgi?P1_Prod_Version=ShockwaveFlash [NC,L,R=307]
On many sites, the page will be displayed for both page.html and page.html?anything=anything, which hurts your SEO with duplicate content. An easy way to fix this issue is to redirect external requests containing a query string to the same uri without the query_string.
RewriteCond %{THE_REQUEST} ^GET\ /.*\;.*\ HTTP/
RewriteCond %{QUERY_STRING} !^$
RewriteRule .* http://www.askapache.com%{REQUEST_URI}? [R=301,L]
This .htaccess rewrite example invisibly rewrites requests for all Adobe pdf files to be handled by /cgi-bin/pdf-script.php
RewriteRule ^(.+)\.pdf$  /cgi-bin/pdf-script.php?file=$1.pdf [L,NC,QSA]
For sites using multiviews or with multiple language capabilities, it is nice to be able to send the correct language automatically based on the clients preferred language.
RewriteCond %{HTTP:Accept-Language} ^.*(de|es|fr|it|ja|ru|en).*$ [NC]
RewriteRule ^(.*)$ - [env=prefer-language:%1]
This allows access to all files by php fopen, but denies anyone else.
RewriteEngine On
RewriteBase /
RewriteCond %{THE_REQUEST} ^.+$ [NC]
RewriteRule .* - [F,L]
If you are looking for ways to block or deny specific requests/visitors, then you should definately read Blacklist with mod_rewrite. I give it a 10/10
This can be very handy if you want to serve media files or special downloads but only through a php proxy script.
RewriteEngine On
RewriteBase /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+)/.*\ HTTP [NC]
RewriteRule .* - [F,L]
Options +FollowSymLinks
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} !^askapache\.com$ [NC]
RewriteRule ^(.*)$ http://askapache.com/$1 [R=301,L]
Uses a RewriteCond Directive to check QUERY_STRING for passkey, if it doesn’t find it it redirects all requests for anything in the /logged-in/ directory to the /login.php script.
RewriteEngine On
RewriteBase /
RewriteCond %{QUERY_STRING} !passkey
RewriteRule ^/logged-in/(.*)$ /login.php [L]
If the QUERY_STRING has any value at all besides blank than the?at the end of /login.php? tells mod_rewrite to remove the QUERY_STRING from login.php and redirect.
RewriteEngine On
RewriteBase /
RewriteCond %{QUERY_STRING} .
RewriteRule ^login.php /login.php? [L]
An error message related to this isRequest exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.or you may seeRequest exceeded the limit,probable configuration error,Use 'LogLevel debug' to get a backtrace, orUse 'LimitInternalRecursion' to increase the limit if necessary
RewriteCond %{ENV:REDIRECT_STATUS} 200
RewriteRule .* - [L]
RewriteRule ^(.*)\.php$ /$1.html [R=301,L]
Redirects all files that end in .html to be served from filename.php so it looks like all your pages are .html but really they are .php
RewriteRule ^(.*)\.html$ $1.php [R=301,L]
Options +FollowSymLinks
RewriteEngine On
RewriteBase /
# If the hour is 16 (4 PM) Then deny all access
RewriteCond %{TIME_HOUR} ^16$
RewriteRule ^.*$ - [F,L]
Converts all underscores “_” in urls to hyphens “-” for SEO benefits… See the full article for more info.
Options +FollowSymLinks
RewriteEngine On
RewriteBase /
 
RewriteRule !\.(html|php)$ - [S=4]
RewriteRule ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5 [E=uscor:Yes]
RewriteRule ^([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4 [E=uscor:Yes]
RewriteRule ^([^_]*)_([^_]*)_(.*)$ $1-$2-$3 [E=uscor:Yes]
RewriteRule ^([^_]*)_(.*)$ $1-$2 [E=uscor:Yes]
 
RewriteCond %{ENV:uscor} ^Yes$
RewriteRule (.*) http://d.com/$1 [R=301,L]
Options +FollowSymLinks
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} !^www\.[a-z-]+\.[a-z]{2,6} [NC]
RewriteCond %{HTTP_HOST} ([a-z-]+\.[a-z]{2,6})$     [NC]
RewriteRule ^/(.*)$ http://%1/$1 [R=301,L]
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} \.([a-z-]+\.[a-z]{2,6})$ [NC]
RewriteRule ^/(.*)$ http://%1/$1 [R=301,L]
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} \.([^\.]+\.[^\.0-9]+)$
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_URI} ^/feed\.gif$
RewriteRule .* - [L]
 
RewriteCond %{HTTP_USER_AGENT} !^.*(FeedBurner|FeedValidator) [NC]
RewriteRule ^feed/?.*$ http://feeds.feedburner.com/apache/htaccess [L,R=302]
 
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_METHOD} !^(GET|PUT)
RewriteRule .* - [F]
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?askapache.com/.*$ [NC]
RewriteRule \.(gif|jpg|swf|flv|png)$ /feed/ [R=302,L]
RewriteEngine On
SetEnvIfNoCase X-Forwarded-For .+ proxy=yes
SetEnvIfNoCase X-moz prefetch no_access=yes
 
# block pre-fetch requests with X-moz headers
RewriteCond %{ENV:no_access} yes
RewriteRule .* - [F,L]
This module uses a rule-based rewriting engine (based on a regular-expression parser) to rewrite requested URLs on the fly. It supports an unlimited number of rules and an unlimited number of attached rule conditions for each rule, to provide a really flexible and powerful URL manipulation mechanism. The URL manipulations can depend on various tests, of server variables, environment variables, HTTP headers, or time stamps. Even external database lookups in various formats can be used to achieve highly granular URL matching.
This module operates on the full URLs (including the path-info part) both in per-server context (httpd.conf) and per-directory context (.htaccess) and can generate query-string parts on result. The rewritten result can lead to internal sub-processing, external request redirection or even to an internal proxy throughput.
Further details, discussion, and examples, are provided in the detailed mod_rewrite documentation.
If you aren’t already comfortable using mod_rewrite then I recommend this excellent mod_rewrite guide by one of my favorite mod_rewrite gurus that I’ve met.


Thursday, September 02, 2010

mod rewrite

what is mod_rewrite for?

Simply, mod_rewrite is used for rewriting a URL at the server level, giving the user output for that final page. So, for example, a user may ask for http://www.somesite.com/widgets/blue/, but will really be given http://www.somesite.com/widgets.php?colour=blue by the server. Of course, the user will be none the wiser to this little bit of chicanery. 

What do I need to get mod_rewrite working?

There’s pretty much only one thing you’ll need to get mod_rewrite working for you, and that’s to have the mod_rewrite module installed on your Apache server!
For the purpose of this article, I’m going to assume that you don’t have access to view or edit the Apache server httpd.conf file, so the easiest way to check whether the mod_rewrite module is installed will be to look on your phpinfo page. If you’ve not already created one of these for yourself, just copy and paste the following code into an new text file using your favourite text editor, save it as phpinfo.php, and upload it to your server:
Load that page up in your web browser, and perform a search for “mod_rewrite”. All being well, you’ll find it in the “Apache loaded modules” section of the page. If it isn’t there, you’ll have to contact your hosting company and politely ask them to add it to the Apache configuration.
Assuming the mod_rewrite module is loaded, then you’re good to go!

A simple mod_rewrite example

So, let’s write a simple mod_rewrite example. This isn’t going to be anything fancy; we’re just going to redirect people who ask for alice.html to the page bob.html instead. First, let’s create the Alice and Bob pages. Below is Alice’s webpage - create a similar one for Bob.
(title)Alice's webpage(/title)
   
(html)
    (head)
       (title)Second mod_rewrite example(/title)
    (/head)
    (body)
      

The requested page was:

(/body) (/html)
Upload both of these to your web server, and check that you can view both of them. Now comes the fun - we’re going to add a couple of lines to your .htaccess file. The .htaccess file is a text file which contains Apache directives. Any directives which you place in it will apply to the directory which the .htaccess file sits in, and any below it. To ours, we’re going to add the following:
RewriteEngine on
RewriteRule ^alice.html$ bob.html
Upload this .htaccess file to the same directory as alice.html and bob.html, and reload Alice’s page. You should see Bob’s page being displayed, but Alice’s URL. If you still see Alice’s page being displayed, then check you’ve followed the instructions correctly (you may have to clear your cache). If things still aren’t working for you, then contact your technical support people and ask them to enable mod_rewrite and the FileInfo override in their httpd.conf file for you

The structure of a RewriteRule

RewriteRule Pattern Substitution [OptionalFlags]
The general structure of a RewriteRule is fairly simple if you already understand regular expressions. This article isn’t intended to be a tutorial about regular expressions though - there are already plenty of those available. RewriteRules are broken up as follows:
RewriteRule
This is just the name of the command.
Pattern
A regular expression which will be applied to the “current” URL. If any RewriteRules have already been performed on the requested URL, then that changed URL will be the current URL.
Substitution
Substitution occurs in the same way as it does in Perl, PHP, etc. You can include backreferences and server variable names (%{VARNAME}) in the substitution. Backreferences to this RewriteRule should be written as $N, whereas backreferences to the previous RewriteCond should be written as %N. A special substitution is -. This substitution tells Apache to not perform any substitution. I personally find that this is useful when using the F or G flags (see below), but there are other uses as well.
OptionalFlags
This is the only part of the RewriteRule which isn’t mandatory. Any flags which you use should be surrounded in square brackets, and comma separated. The flags which I find to be most useful are:
  • F - Forbidden. The user will receive a 403 error.
  • L - Last Rule. No more rules will be proccessed if this one was successful.
  • R[=code] - Redirect. The user’s web browser will be visibly redirected to the substituted URL. If you use this flag, you must prefix the substitution with http://www.somesite.com/, thus making it into a true URL. If no code is given, then a HTTP reponse of 302 (temporarily moved) is sent.
A full list of flags is given in the Apache mod_rewrite manual.

A slightly more complicated mod_rewrite example

Let’s try a slightly more meaty example now. Suppose you have a web page which takes a parameter. This parameter tells the page how to be displayed, and what content to pull into it. Humans don’t tend to like remembering the additional syntax of query strings for URLs, and neither do search engines. Both sets of people seem to much prefer a straight URL, with no extra bits tacked onto the end.
In our example, you’ve created a main index page with takes a page parameter. So, a link like index.php?page=software would take you to a software page, while a link to index.php?page=interests would take you to an interests page. What we’ll do with mod_rewrite is to silently redirect users from page/software/ to index.php?page=software etc.
The following is what needs to go into your .htaccess file to accomplish that:
RewriteEngine on
RewriteRule ^page/([^/\.]+)/?$ index.php?page=$1 [L]
Let’s walk through that RewriteRule, and work out exactly what’s going on:
^page/
Sees whether the requested page starts with page/. If it doesn’t, this rule will be ignored.
([^/.]+)
Here, the enclosing brackets signify that anything that is matched will be remembered by the RewriteRule. Inside the brackets, it says “I’d like one or more characters that aren’t a forward slash or a period, please”. Whatever is found here will be captured and remembered.
/?$
Makes sure that the only thing that is found after what was just matched is a possible forward slash, and nothing else. If anything else is found, then this RewriteRule will be ignored.
index.php?page=$1
The actual page which will be loaded by Apache. $1 is magically replaced with the text which was captured previously.
[L]
Tells Apache to not process any more RewriteRules if this one was successful.
Let’s write a quick page to test that this is working. The following test script will simply echo the name of the page you asked for to the screen, so that you can check that the RewriteRule is working.
(title)Second mod_rewrite example(/title)
   
   
      
         The requested page was:
         
      
Again, upload both the index.php page, and the .htaccess file to the same directory. Then, test it! If you put the page in http://www.somesite.com/mime_test/, then try requesting http://www.somesite.com/mime_test/page/software. The URL in your browser window will show the name of the page which you requested, but the content of the page will be created by the index.php script! This technique can obviously be extended to pass multiple query strings to a page - all you’re limited by is your imagination.

Conditional Statements and mod_rewrite

But what happens when you start getting people hotlinking to your images (or other files)? Hot linking is the act of including an image, media file, etc from someone else’s server in one of your own pages as if it were your own. Obviously, as a webmaster, there are plenty of times when you don’t want people doing that. You’ll almost certainly have seen examples where someone has linked to one image on a website, only for a completely different, “nasty” one to be shown instead. So, how is this done?
It’s pretty simple really. All it takes are a couple of RewriteCond statements in your .htaccess file.
RewriteCond statements are as they sound - conditional statements for RewriteRules. The basic format for a RewriteCond is RewriteCond test_string cond_pattern. For our purpose, we will set the test_string to be the HTTP_REFERER. If the test string is neither empty nor our own server, then we will serve an alternative (low bandwidth) image, which tells the person who is hotlinking off for stealing our bandwidth.
Here’s how we do that:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?somesite.com/.*$ [NC]
RewriteRule \.(gif|jpg|png)$ http://www.somesite.com/nasty.gif [R,L]
Here, the RewriteRule will only be performed if all the preceeding RewriteConds are fulfilled. In the second RewriteCond, [NC] simply means “No Case”, so it doesn’t matter whether the domain name was written in upper case, lower case or a mixture of the two. So, any requests for gif, jpg or png files from referers other than somesite.com will result in your “nasty” image being shown instead.
The [R,L] in the RewriteRule simply means “Redirect, Last”. So, the RewriteRule will visibly redirect output to “nasty.gif” and no more RewriteRules will be performed on this URL.
If you simply don’t want the hot linkers to see any image at all when they hot link to your images, then simply change the final line to RewriteRule \.(gif|jpg|png)$ - [F]. The - means “don’t rewrite the requested URL”, and the [F] means “Forbidden”. So, the hot linker will get a “403 Forbidden message”, and you don’t end up wasting your bandwidth.

Conclusion

mod_rewrite is an incredibly handy tool to have in your arsenal. This article only scratched the surface of what is possible with mod_rewrite, but should have given you enough information to go out and start mod_rewriting history yourself!

Wednesday, September 01, 2010

Search Engine Friendly URLs with WAMP & Apache

It has been a very hot topic for a long time. The Friendly URLs have taken the Internet by storm, mostly because the expanding use of Ruby on Fai..Rails. :) Today we are going to look at how we can create friendly URLs using the Apache HTTP server found in the WAMP software installation. It will actually be very simple! Depending on your environment and knowledge we are first looking at how we can add the support to use friendly URLs in WAMP and a quick look on how we can do it without the WAMP software. If you want to use the WAMP software you can download it below:
  • Download the WAMP software

We are going to start looking at the WAMP software first. After you have downloaded and installed the WAMP software you should find a short cut item on your Desktop. Click on that, in the right corner of your screen you should see some sort speed limit panel turning red, yellow and lastly white. If it doesn’t turn white something is wrong with the setup. If you haven’t changed anything from the default settings when installing the WAMP software. The activate the module making it possible to create friendly URLs we simple do this:
  1. Left click on the WAMP icon in the right corner of the screen.
  2. Move you mouse over the ‘Apache’ link.
  3. In the popup menu, move your mouse over the ‘Apache Modules’ link.
  4. Move down in the list by pressing the down arrow image until you find rewrite_module.
  5. Click on the rewrite_module.
  6. The WAMP software should restart, if it isn’t do it your self be left clicking on the icon again and choose ‘Restart All Services’.
So now we have the rewrite_module activated in WAMP. To active the rewrite_module without WAMP you need to go to your Apache directory, or make a search with the keyword httpd.conf. That file contains a lot of config options for your Apache HTTP server. Inside that file, search for:
LoadModule rewrite_module modules/mod_rewrite.so
That module will have a # in front of it, remove that and save the httpd.conf file and restart your Apache server. This is how you activate the rewrite_module in Apache and not via the WAMP software. Seems rather easy, right?
Okay, so what we have done so far is to activate a module we are going to use when we want to write friendly URLs. Friendly URLs you say? What is that? In my 12 SEO Tips post I talked about friendly URLs and how they are much better for web spiders, visitors and for yourself. The result after turning your unfriendly URLs into friendly once are this:
We turn this dynamically created, unfriendly URLs:
http://www.yourdomain.com/users.php?id=12
Into this friendly URL:
http://www.yourdomain.com/users/12/
Does that look a lot better than the first one? Especially if your website is targeting people with a lack of Internet experience. The simpler and cleaner your URL look like, the better. We will even have the chance of enter a name.html and turn it into name.php without the visitor knowing anything about it. Why should we like to  do that? Once again, it’s better for the web spider and in the end your chance of being found in a search will expand.
So, the rewrite module activated and running, what do to next? We are going to create a file with the name .htaccess. This file is a configuration file working tightly with the Apache HTTP server.  This is how the Apache team describes it:
.htaccess files (or “distributed configuration files”) provide a way to make configuration changes on a per-directory basis. A file, containing one or more configuration directives, is placed in a particular document directory, and the directives apply to that directory, and all subdirectories thereof.
Inside this file we are able to create patterns which will match the incoming URL. If the URL match any of our patterns we are able to transform the incoming URL into something else. What does this mean? It means that you don’t have to change any PHP query strings code to transform your URLs. With the .htaccess file you are able to redirect your visitors to the correct page you want them to look at, but the address looks the same to them. Here is an example:
Your visitor writes:
http://www.yourdomain.com/users/12/
In your .htaccess file you do this with the incoming URL:
http://www.yourdomain.com/users.php?id=12
Looks like magic? It is! Naa, not really but it looks a lot nicer and, as I have told you several times now, it helps your search rankings.
The following will be an example of a .htaccess file. We will go through it line by line and see what it does.
1. RewriteEngine on
2. RewriteRule ^(/)?$ /index.php [L]
3. RewriteRule ^([^\/\.]+)\.html$ $1.php [L]
4. RewriteRule ^([a-z]+)/([0-9]+)$ /$1/$2/ [R]
5. RewriteRule ^([a-z]+)/([0-9]+)/$ /$1.php?id=$2
Line 1: Activates the rewrite module for this particular folder. This line is required for ALL .htaccess files.
Line 2: We rewrite http://www.yourdomain.com/ to http://www.yourdomain.com/index.php. The [L] at the end is telling the .htaccess file to stop rewriting if this pattern has been match.
Line 3: We rewrite all incoming .html files into .php. We do this because it’s easier for visitors to enter .html instead of .php.
Line 4. We rewrite http://www.yourdomain.com/users into http://www.yourdomain.com/users/. We just add the extra / at the end to say: “Hey, this is how we like it”. The [R] at the end is telling the .htaccess to continue the rewrite regardless if the pattern matched or not.
Line 5. Is rewriting http://www.yourdomain.com/users/12 into http://www.yourdomain.com/users.php?id=12.
Very simple. The pattern is using the cross-language script regexp, or regular expressions.  A good place to start looking at regexp if you haven’t heard about it before is the link below:

So by using the rewrite module found in the Apache HTTP server and creating a .htaccess file inside our main directory we can accomplish the friendly URLs task. Excited? I know I was the first time. Start right away!