
Dealing with robots.txt requests?




2 Posts   3350 Views

3dkiwi

Community Member, 18 Posts

4 March 2008 at 11:43am

Guys

Just a quick one I hope. I have started to migrate a SilverStripe installation from a test bench system to a live system, and I have noticed in the web server access logs that search engine requests for /robots.txt are getting a 404 Not Found.

With the redirection, rewrites and other things that mangle the request on a SilverStripe installation, what is the best way to handle requests for the robots.txt file? And what is a good list of URLs to deny on a SilverStripe installation?

Cheers
3d

3dkiwi

Community Member, 18 Posts

8 March 2008 at 11:44am

Edited: 08/03/2008 11:47am

My research into this:

Placing the robots.txt file in the root directory of the SilverStripe installation seems to work,
i.e. an ls listing should show something like:

.
.
.
cms/
robots.txt
mysite/
tutorials/
favicon.ico
.
.
.

The common Disallow entries would be:

/images/
/page-not-found/

Also add a Sitemap directive, e.g.

Sitemap: http://www.example.com/sitemap.xml

Check out http://www.sitemaps.org for more details on this file, which SilverStripe generates for you automatically.

So a good starting point might be:

---

# Show the way to the site map file
Sitemap: http://www.example.co.nz/sitemap.xml

# LinkWalker knows where to go.
User-agent: LinkWalker
Disallow: /

# Archive.org was hammering site - Now knows where to get off
User-agent: archive.org_bot
Disallow: /

User-agent: *
Disallow: /images/
Disallow: /page-not-found/

---
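If you want to sanity-check a file like this before it goes live, one option is Python's standard-library `urllib.robotparser`. This is only a rough sketch, assuming Python 3.8 or later (needed for `site_maps()`). The crawler name `Googlebot` and the path `/about-us/` are made up for illustration, and the catch-all group is written as `User-agent: *` with a hyphen, since that is the field name this parser recognises. Real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The starting-point rules from this post, with the catch-all group
# spelled "User-agent" (hyphenated).
ROBOTS_TXT = """\
# Show the way to the site map file
Sitemap: http://www.example.co.nz/sitemap.xml

# LinkWalker knows where to go.
User-agent: LinkWalker
Disallow: /

# Archive.org was hammering site - Now knows where to get off
User-agent: archive.org_bot
Disallow: /

User-agent: *
Disallow: /images/
Disallow: /page-not-found/
"""


def load_rules(text):
    """Parse robots.txt content from a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Hypothetical (agent, path) pairs to check; adjust to your own site.
CHECKS = [
    ("LinkWalker", "/about-us/"),       # banned from the whole site
    ("archive.org_bot", "/about-us/"),  # banned from the whole site
    ("Googlebot", "/images/logo.png"),  # caught by the catch-all group
    ("Googlebot", "/page-not-found/"),  # caught by the catch-all group
    ("Googlebot", "/about-us/"),        # not listed, so allowed
]

if __name__ == "__main__":
    print("Sitemaps:", rules.site_maps())
    for agent, path in CHECKS:
        verdict = "allowed" if rules.can_fetch(agent, path) else "blocked"
        print(f"{agent:16} {path:20} {verdict}")
```

To test the file actually deployed on the server instead of a string, `RobotFileParser` also has `set_url()` and `read()`, which fetch /robots.txt over HTTP.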

Cheers
3d