The robots.txt file is a directive file that tells search crawlers what they can and cannot access. It should always exist at the domain (or subdomain) root and be accessible from the web. Search crawlers will normally try to download your robots.txt file before any other pages on the site; a missing file (a 404 response) is generally treated as permission to crawl everything, but if the file is unreachable or cannot be processed (for example, the server returns an error), the crawler will typically delay its crawl of the site until it can successfully fetch and process the robots.txt file.

Robots.txt URL:

https://www.example.com/robots.txt (the file always lives at /robots.txt on the root of the host, and each subdomain has its own)
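
A crawler's handling of this file can be approximated with Python's built-in urllib.robotparser module. The sketch below is only an illustration: www.example.com stands in for your own domain, and the URL being checked is hypothetical. It fetches the live robots.txt and asks whether a given user-agent may fetch a given URL.

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")  # always at the root of the host
rp.read()  # fetch and parse the file, much as a crawler does before crawling other pages

# May googlebot fetch this URL under the rules it just read?
print(rp.can_fetch("googlebot", "https://www.example.com/admin"))
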
The robots.txt file is split into sections based on the crawler’s user-agent, and only the most specific matching section applies to a given bot: if a googlebot-specific section exists, googlebot will ignore the directives aimed at all robots (the example below, and the check that follows it, illustrate this).


Syntax                  Description
User-agent              The user-agent substring that the next set of directives applies to, e.g. ‘googlebot’, ‘googlebot-images’ (‘*’ means all robots).
Disallow and Allow      Disallow (or allow) the bot from accessing specific files or directories.
NoIndex (unofficial)    Supported unofficially by Google; allows URL patterns to be deindexed via robots.txt.
Sitemap                 A reference to the XML sitemap.
*                       The wildcard character, which can be used within a URI, e.g. /search/*?text=.


# Rules for all crawlers
User-agent: *
Disallow: /admin
Disallow: /files/secret-plans.pdf

# Googlebot ignores the section above and follows only these rules
User-agent: googlebot
Disallow: /admin
Disallow: /files/secret-plans.pdf
Disallow: /dont-show-to-google
Disallow: /*?search=

# Google's image crawler likewise follows only its own section
User-agent: googlebot-images
Disallow: /admin

Uses of robots.txt

Control crawling (Disallow:)

Robots.txt is most commonly used to prevent search engine crawlers from attempting to access certain parts of a website; a sketch of typical rules follows the list below.

  • Preventing access to dynamically generated pages (such as internal search result pages) and other low-value or duplicated content
  • Preventing access to development and staging environments
  • Stopping search crawlers from accessing crawler traps (such as a never-ending calendar)
  • Preventing crawling of potentially insecure files (such as the WordPress plugins folder)
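
As a rough sketch of how these cases translate into rules (the paths here are hypothetical, and Python's built-in parser follows the original robots.txt standard rather than Google's extensions, so treat it only as a sanity check):

from urllib.robotparser import RobotFileParser

# Illustrative rules only -- the paths are placeholders, not recommendations.
rules = """\
User-agent: *
Disallow: /search/              # dynamically generated internal search results
Disallow: /calendar/            # never-ending calendar (crawler trap)
Disallow: /wp-content/plugins/  # potentially insecure WordPress plugin files
"""
# A staging or development environment is normally blocked by its own robots.txt
# on the staging host, containing simply 'Disallow: /' for all user-agents.

rp = RobotFileParser()
rp.parse(rules.splitlines())
print(rp.can_fetch("googlebot", "https://www.example.com/search/widgets?page=2"))  # False (blocked)
print(rp.can_fetch("googlebot", "https://www.example.com/products/widgets"))       # True (allowed)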

Control indexing (NoIndex:)

While there is very little written about it, our empirical studies show that Google unofficially supports the “NoIndex” robots.txt directive. This is distinct from a page-level noindex (which lives in the HTML head or the HTTP header) and allows SEOs to encourage the removal of URLs from the index en masse, without forcing a crawler to visit every page individually.


User-agent: *
NoIndex: /path/*?*sortby=
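
Here the wildcard pattern covers any URL under /path/ whose query string contains a sortby parameter (for example, a hypothetical /path/shoes?colour=red&sortby=price), so a whole family of parameterised URLs can be targeted with a single line.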