
Archive /

Our old forums are still available as a read-only archive.

Moderators: martimiz, Sean, biapar, Willr, Ingo, simon_w

Dealing with robots.txt requests?




2 Posts   2928 Views

3dkiwi

Community Member, 17 Posts

4 March 2008 at 11:43am

Guys

Just a quick one, I hope. I've started migrating a SilverStripe installation from a test-bench system to a live system, and I've noticed that search-engine requests for the /robots.txt file are returning 404 Not Found in the web server's access logs.

With the redirection, rewrites and other things that mangle requests on a SilverStripe installation, what is the best way to handle requests for the robots.txt file, and what is a good list of URLs to disallow on a SilverStripe installation?
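For context on why a plain file in the web root can still be served: the stock SilverStripe .htaccess only rewrites URLs that don't match a real file on disk. A rough sketch of the relevant mod_rewrite fragment (details vary between versions, and `sapphire/main.php` here is the 2.x-era handler, so treat this as illustrative only):

```
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Only rewrite requests that don't correspond to an existing file,
    # so /robots.txt, /favicon.ico etc. in the web root are served as-is.
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule .* sapphire/main.php?url=%{REQUEST_URI} [QSA,L]
</IfModule>
```

If robots.txt still 404s with a rule like this in place, check that the file actually sits in the same directory as the .htaccess, and that mod_rewrite's conditions haven't been altered.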

Cheers
3d

3dkiwi

Community Member, 17 Posts

8 March 2008 at 11:44am

Edited: 08/03/2008 11:47am

My research into this:

Placing the robots.txt file in the root directory of the SilverStripe installation seems to work,
i.e. an ls listing should show something like:

.
.
.
cms/
robots.txt
mysite/
tutorials/
favicon.ico
.
.
.

The common Disallow entries would be:

/images/
/page-not-found/

Also add a Sitemap directive, e.g.

Sitemap: http://www.example.com/sitemap.xml

Check out http://www.sitemaps.org for more details on this file, which SilverStripe generates for you automatically.

So a good starting point might be:

---

# Show the way to the site map file
Sitemap: http://www.example.co.nz/sitemap.xml

# LinkWalker knows where to go.
User-agent: LinkWalker
Disallow: /

# Archive.org was hammering site - Now knows where to get off
User-agent: archive.org_bot
Disallow: /

User-agent: *
Disallow: /images/
Disallow: /page-not-found/

---
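If it helps, the rules above can be sanity-checked before going live with Python's standard urllib.robotparser. A quick sketch (the domain and paths are just the examples from this thread):

```python
from urllib.robotparser import RobotFileParser

# The proposed robots.txt from above, as a string.
robots_txt = """\
Sitemap: http://www.example.co.nz/sitemap.xml

User-agent: LinkWalker
Disallow: /

User-agent: archive.org_bot
Disallow: /

User-agent: *
Disallow: /images/
Disallow: /page-not-found/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# LinkWalker is blocked from everything.
print(rp.can_fetch("LinkWalker", "http://www.example.co.nz/"))             # False
# Ordinary crawlers are blocked only from the listed paths.
print(rp.can_fetch("Googlebot", "http://www.example.co.nz/images/a.png"))  # False
print(rp.can_fetch("Googlebot", "http://www.example.co.nz/about-us/"))     # True
```

Note that robot matching is by substring, so a typo like "User Agent:" instead of "User-agent:" would silently make the whole group a no-op, which is exactly the kind of mistake this catches.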

Cheers
3d