Magento robots.txt

Magento comes without robots.txt functionality. It can be useful to add a robots.txt file yourself to tell search engines which parts of your site they should not crawl. It can hide SID parameters and prevent some duplicate content (the first version of our file also hid JavaScript files; see the 2014 update at the end of this post). It helps your SEO and reduces the load on your server. In this blog post I explain how to set up your own Magento robots.txt, either from an existing example or with an extension. Both solutions are easy to handle.

Manual installation of Magento robots.txt

The Magento robots.txt file we use on many websites has been circulating on the net since January 2010. I would love to credit the creator of the file, but the original post on the Magento forum is no longer available.

Implementation is easy. Just copy the content below, paste it into a new file called robots.txt, change the sitemap.xml location to that of your own site and upload the file to the root of your website. Be sure to upload it to the root even if your Magento installation is in a subdirectory. Search engines only read robots.txt from the root of a website.
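The root-only rule can be illustrated with a few lines of Python. This is a minimal sketch, not part of the original file: the function name robots_txt_url and the example.com URLs are made up for illustration. It shows that whatever page a crawler visits, the robots.txt it asks for is always derived from the scheme and host alone.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL a crawler would request for page_url.

    Hypothetical helper for illustration: only the scheme and the host
    (including a port, if any) are kept, the path is always /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Magento installed in the /shop/ subdirectory (assumed example URL).
    print(robots_txt_url("https://www.example.com/shop/customer/account/"))
```

So a robots.txt uploaded to /shop/robots.txt would simply never be requested.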

 

We commented out the Allow rule for catalogsearch/results because we use Google CSE for our Magento shops. Read our previous blog post on how to implement Google CSE in your Magento shop. Since we want search engines to index our product images, we set /media/catalog to Allow and Disallow the other directories in /media.
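If you want to check how such Allow and Disallow rules interact before you upload the file, Python's standard urllib.robotparser module can help. The sketch below uses a small illustrative rule set, not the full robots.txt from this post; the example.com URLs, the /checkout/ rule and the helper name is_allowed are assumptions. Note that Python's parser has traditionally applied the first rule that matches, while Google uses the most specific one, so the Allow line is placed before the broader Disallow line to get the same result in both.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only (assumed), not the complete Magento robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /media/catalog/
Disallow: /media/
Disallow: /checkout/
Sitemap: https://www.example.com/sitemap.xml
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if user_agent may fetch url according to robots_txt."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in (
        "/media/catalog/product/shoe.jpg",  # product image, should be allowed
        "/media/tmp/upload.jpg",            # rest of /media, should be blocked
        "/checkout/cart/",                  # blocked
        "/some-product.html",               # no rule matches, allowed
    ):
        url = "https://www.example.com" + path
        print(path, is_allowed(SAMPLE_ROBOTS_TXT, url))
```

Run it against your own file by reading robots.txt into a string and passing it in place of SAMPLE_ROBOTS_TXT.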

Use an extension for robots.txt instead

If you’re more comfortable working from the Magento backend than changing files by hand, there is an extension you can use to generate a Magento robots.txt file. It is called Robots.txt management tool and can be downloaded via Magento Connect. Out of the box this module can generate a robots.txt file for Magento, and you can alter some main options in its settings. After that, go to CMS >> Robots.txt >> Manage and install the standard rules for Magento. You might want to change some of them. I like to allow search engines to index /media/catalog/. This rule is not present out of the box, so you have to add it as a new rule. When you’re done setting up the rules, click the button to generate robots.txt.

Reindex robots.txt

Sometimes it takes a while before search engines read the changed Magento robots.txt. In Google Webmaster Tools you can see when Google last downloaded your robots.txt. Google should download robots.txt every 24 hours or after 100 visits. If you want Google or other search engines to pick up the updated version sooner, you can set a Cache-Control header in your .htaccess file. Copy the statement below into your .htaccess file.

With this statement you tell clients that all .txt files expire after 60 seconds and must be downloaded again after that. Depending on how often Google crawls your site, it will notice that its copy of robots.txt is outdated and download a fresh one. Increase the max-age once you notice that Google uses the new robots.txt, and set it back to 60 seconds the day before you change the file again. This saves resources.
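To make the effect of max-age a bit more concrete, here is a minimal Python sketch of how a client could decide whether its cached copy of robots.txt is still fresh. The header value "max-age=60, must-revalidate" is an assumption based on the description above, and the function names are made up for illustration; real crawlers apply their own, more elaborate caching logic.

```python
def parse_max_age(cache_control):
    """Return the max-age value (in seconds) from a Cache-Control header
    value, or None if the header has no valid max-age directive."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        value = value.strip()
        if name.strip().lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def is_stale(cache_control, age_seconds):
    """Return True if a copy that is age_seconds old must be fetched again."""
    max_age = parse_max_age(cache_control)
    if max_age is None:
        # Assumption for this sketch: without max-age we always re-download.
        return True
    return age_seconds >= max_age


if __name__ == "__main__":
    header = "max-age=60, must-revalidate"  # assumed header value
    print(is_stale(header, 30))   # copy is 30 seconds old
    print(is_stale(header, 90))   # copy is 90 seconds old
```

With a max-age of 60 a copy that is 30 seconds old is still used, while a copy that is 90 seconds old is downloaded again. Raising max-age widens that window, which is why a higher value saves resources between changes.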

Update 30-10-2014: Adjusted robots.txt so that CSS, JavaScript and other resource files are no longer blocked by default. Blocking them prevents Googlebot from properly rendering the page and understanding that it’s optimized for mobile. Explained by Matt Cutts at SMX Advanced 2014.

Update 05-11-2013: Updated the link to the Robots Management Tool (broken link).

Update 18-09-2012: Following the Byte blog post "Kennisartikel: serverload verlagen" we adjusted the Magento robots.txt. URL parameters that we used to block are now allowed again. It is up to Google Webmaster Tools to block them.

Update 07-06-2013: Following the blog post "robots.txt in a multi domain website" I would like to point out that it is very easy to use robots.txt on a multi-domain website.

source of image: www.sxc.hu/photo/1171276


Hans Kuijpers is a Joomla! and Magento specialist. He built his first website in 1995 while studying Industrial Engineering and Management (Technische Bedrijfskunde), and dozens more have followed since. Having set up this Byte blog, he may also call himself a fairly advanced WordPress developer. Sharing knowledge and enjoying life is what Hans wants.