Manual installation of Magento robots.txt
The Magento robots.txt file we use on many websites has been circulating online since January 2010. I would love to credit its creator, but the original Magento forum post is no longer available.
Implementation is easy: copy the content below, paste it into a new file called robots.txt, change the location of the sitemap.xml to match your site, and upload the file to the root of your website. Be sure to upload it to the root even if your Magento installation lives in a subdirectory, because search engines only read robots.txt from the root of a website.
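To illustrate the root rule, here is a minimal Python sketch that derives the address crawlers will request for any store URL. The function name robots_url and the example.com addresses are hypothetical and only serve as an illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(store_url: str) -> str:
    """Return the robots.txt URL crawlers request for the host of store_url."""
    parts = urlsplit(store_url)
    # Keep scheme and host; drop any subdirectory, query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A Magento installation in a subdirectory still maps to the site root.
print(robots_url("https://www.example.com/shop/index.php"))
# prints https://www.example.com/robots.txt
```

If the printed address does not serve your file, the robots.txt was most likely uploaded to the subdirectory instead of the root.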
We commented out the Allow rule for catalogsearch/results because we use Google CSE for our Magento shops. Read our previous blog post on how to implement Google CSE on your Magento shop. Since we want search engines to index our product images, we set /media/catalog to Allow and Disallow the rest of the directories in /media.
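The /media rules can be sanity-checked locally before uploading. The sketch below uses Python's standard urllib.robotparser on a three-line sample that only mirrors the Allow/Disallow idea described above. It is not the full robots.txt file, and the sample paths (shoe.jpg, /media/downloadable/) are made up for illustration. Note that Python's parser applies the first matching rule, while Google picks the most specific one, so the Allow line is placed first to get the same answer from both.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sample rules, assumed for illustration only.
SAMPLE_RULES = """\
User-agent: *
Allow: /media/catalog/
Disallow: /media/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_RULES)

# Product images stay crawlable, the rest of /media is blocked.
for path in ("/media/catalog/product/shoe.jpg",
             "/media/downloadable/manual.pdf",
             "/index.php"):
    print(path, parser.can_fetch("Googlebot", path))
```

To test your real file, read it from disk and pass its text to build_parser instead of SAMPLE_RULES.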
Use an extension for robots.txt instead
If you’re more comfortable working from the Magento backend than changing files by hand, there is an extension that generates a Magento robots.txt file for you. It is called Robots.txt management tool and can be downloaded via Magento Connect. Out of the box, this module can generate a robots.txt file for Magento, and you can alter some main options in its settings. After that, go to CMS >> Robots.txt >> Manage and install the standard rules for Magento. You might want to change some of the standard settings. For example, I want search engines to index /media/catalog/. This rule is not present out of the box, so you have to add it as a new rule. When you’re done setting up the rules, click the button to generate robots.txt.
It can take a while for search engines to pick up a changed Magento robots.txt. In Google Webmaster Tools you can see when Google last downloaded your robots.txt. Google should download robots.txt every 24 hours or after 100 visits. If you want Google or other search engines to get the updated version sooner, you can set a Cache-Control header in your .htaccess file. Copy the statement below into your .htaccess file.
This statement tells clients that all .txt files expire after 60 seconds and must be downloaded again after that. Depending on how often Google crawls your site, it will notice the outdated robots.txt and download a fresh copy. Increase the max-age once you see that Google is using the new robots.txt, and set it back to 60 seconds the day before you change the file again. A longer max-age in between saves resources.
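After changing .htaccess, it is worth confirming that the server really sends the max-age you expect. The Python sketch below is one possible check, not part of the original setup. The helper names max_age_seconds and fetch_cache_control are hypothetical, and the header value in the example is assumed from the 60-second description above.

```python
import re
from typing import Optional
from urllib.request import Request, urlopen


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Extract the max-age value (in seconds) from a Cache-Control header."""
    match = re.search(r"(?:^|[,\s])max-age\s*=\s*(\d+)", cache_control,
                      re.IGNORECASE)
    return int(match.group(1)) if match else None


def fetch_cache_control(url: str) -> str:
    """Send a HEAD request and return the Cache-Control header, or ''."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return response.headers.get("Cache-Control", "")


# Offline check against an assumed header value.
print(max_age_seconds("max-age=60, public, must-revalidate"))  # prints 60
```

Against a live site you would call max_age_seconds(fetch_cache_control(...)) with the address of your own robots.txt. A result of None means the header carries no max-age, so the .htaccess statement is probably not being applied.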
Update 05-11-2013: Updated the link to the Robots Management Tool.
Update 18-09-2012: Following the blog post "Kennisartikel: serverload verlagen Byte" (a knowledge article on lowering server load), we changed the Magento robots.txt. URL parameters that we used to block are now allowed again. It is up to Google Webmaster Tools to block them.
Update 07-06-2013: Following the blog post "robots.txt in a multi domain website", I would like to point out that it is very easy to use robots.txt in a multi-domain website.
source of image: www.sxc.hu/photo/1171276