| Author | Topic: | 2652 Views |
Dealing with robots.txt requests?

4 March 2008 at 11:43am
Guys
Just a quick one, I hope. I have started migrating a SilverStripe installation from a test bench system to a live system and have noticed in the web server access logs that search engine requests for /robots.txt are getting a 404 Not Found response.
With the redirection, rewrites and other things that mangle the request on a SilverStripe installation, what is the best way to handle requests for the robots.txt file and what is a good list of URLs to deny on a SilverStripe installation?
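A rough way to confirm the 404s from the access log is sketched below. It assumes the server writes Apache-style "combined" log lines; the sample lines, the ExampleBot name and the `robots_404s` helper are made up for illustration and are not from the real log.

```python
import re

# Assumption: Apache/nginx "combined"-style lines, e.g.
#   ... "GET /robots.txt HTTP/1.1" 404 512 ...
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def robots_404s(lines):
    """Return the log lines where a request for /robots.txt got a 404."""
    hits = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # not a request line we understand
        path = match.group("path").split("?", 1)[0]  # drop any query string
        if path == "/robots.txt" and match.group("status") == "404":
            hits.append(line)
    return hits


# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [04/Mar/2008:11:40:01 +1300] '
    '"GET /robots.txt HTTP/1.1" 404 512 "-" "ExampleBot/1.0"',
    '192.0.2.11 - - [04/Mar/2008:11:41:15 +1300] '
    '"GET /about-us/ HTTP/1.1" 200 8042 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for hit in robots_404s(SAMPLE_LOG):
        print(hit)
```

To run it against a real log, pass the open file object instead of `SAMPLE_LOG`; the regular expression may need adjusting if the server uses a custom log format.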
Cheers
3d
Re: Dealing with robots.txt requests?

8 March 2008 at 11:44am Last edited: 8 March 2008 11:47am
My research into this:
Placing the robots.txt file in the root directory of the SilverStripe installation seems to work.
i.e. an ls listing should show something like:
.
.
cms/
robots.txt
mysite/
tutorials/
favicon.ico
.
.
.
The common Disallows would be:
/images/
/page-not-found/
Also add a Sitemap directive, e.g.
Sitemap:
Check out http://www.sitemaps.org for more details on this file, which SilverStripe automatically generates for you.
So a good starting point might be:
---
# Show the way to the site map file
Sitemap: http://www.example.co.nz/sitemap.xml

# LinkWalker knows where to go.
User-agent: LinkWalker
Disallow: /

# Archive.org was hammering site - Now knows where to get off
User-agent: archive.org_bot
Disallow: /

User-agent: *
Disallow: /images/
Disallow: /page-not-found/
---
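One way to sanity-check rules like these before going live is to feed them to Python's standard `urllib.robotparser` and ask which paths each crawler may fetch. This is only a sketch: the rules are copied from the starting point above with the last group written as `User-agent: *`, and the Googlebot agent, the `/about-us/` path and the two helper functions are assumptions picked for illustration.

```python
from urllib.robotparser import RobotFileParser

# The starting-point rules from this post, one directive per line.
ROBOTS_TXT = """\
# Show the way to the site map file
Sitemap: http://www.example.co.nz/sitemap.xml

# LinkWalker knows where to go.
User-agent: LinkWalker
Disallow: /

# Archive.org was hammering site - Now knows where to get off
User-agent: archive.org_bot
Disallow: /

User-agent: *
Disallow: /images/
Disallow: /page-not-found/
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(parser, agent, path):
    """True if the given user agent may fetch the given path."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    # Hypothetical agent/path pairs to try against the rules.
    checks = [
        ("LinkWalker", "/"),
        ("archive.org_bot", "/about-us/"),
        ("Googlebot", "/about-us/"),
        ("Googlebot", "/images/logo.png"),
        ("Googlebot", "/page-not-found/"),
    ]
    for agent, path in checks:
        verdict = "allowed" if allowed(parser, agent, path) else "blocked"
        print(f"{agent:16} {path:20} {verdict}")
```

Real crawlers may read robots.txt slightly differently from this parser, so treat the result as a quick check rather than a guarantee.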
Cheers
3d

