robots.txt

A plain-text file at the root of a domain or subdomain (for example, https://www.example.com/robots.txt) that tells crawlers which URLs they may or may not request.
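Crawlers look for the file only at that fixed location, so you can derive the robots.txt that governs any page from the page's URL. The Python sketch below shows this. `robots_url` is a hypothetical helper name. It relies on the standard rule that each scheme, host, and port combination has its own robots.txt, so `blog.example.com` is not covered by `www.example.com/robots.txt`.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep scheme, host and port; drop any "user:pass@" prefix.
    # Each scheme/host/port combination has its own robots.txt, so a
    # subdomain is never covered by its parent domain's file.
    netloc = parts.netloc.rsplit("@", 1)[-1]
    # The path is always /robots.txt; query and fragment are discarded.
    return urlunsplit((parts.scheme, netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.com/blog/post?id=7#top"))
# → https://www.example.com/robots.txt
print(robots_url("http://shop.example.com:8080/cart"))
# → http://shop.example.com:8080/robots.txt
```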

robots.txt uses the Robots Exclusion Protocol (REP) to specify crawl rules. A Disallow rule stops compliant crawlers from fetching matching URLs, but it does not remove them from the index. A disallowed URL can still appear in search results if other pages link to it. To remove a page from the index, add a noindex robots meta tag or an X-Robots-Tag: noindex HTTP header, and leave the page crawlable in robots.txt. If the page is also disallowed, crawlers never fetch it and so never see the noindex directive.
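The interaction between Disallow and noindex can be modeled in a few lines of Python. This is a simplified sketch, not how any search engine is actually implemented. `index_outcome` and its `has_noindex` flag are hypothetical: the flag stands in for a robots meta tag or X-Robots-Tag header that the crawler would find on the page. The sketch uses the standard library's `urllib.robotparser`, whose matching is simpler than Google's (no `*`/`$` path wildcards, first matching rule wins).

```python
from urllib.robotparser import RobotFileParser


def index_outcome(robots_txt: str, user_agent: str, url: str, has_noindex: bool) -> str:
    """Rough model of how Disallow and noindex interact for one URL (hypothetical)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        # The crawler never downloads the page, so any noindex on it goes unseen.
        return "blocked: noindex unseen, URL can still be indexed via links"
    if has_noindex:
        # Page was fetched, so the meta tag / X-Robots-Tag header is honored.
        return "crawled: noindex seen, page dropped from index"
    return "crawled: page eligible for indexing"


RULES = """\
User-agent: *
Disallow: /private/
"""

# Disallowed AND noindexed: the noindex is never seen.
print(index_outcome(RULES, "Googlebot", "https://www.example.com/private/old", True))
# Crawlable and noindexed: the page is dropped.
print(index_outcome(RULES, "Googlebot", "https://www.example.com/old-page", True))
```

Only the second URL is actually removed from the index. The first is protected from crawling but not from indexing.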

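Rules are grouped under User-agent lines. Each crawler follows the single group that most specifically matches its name and falls back to the User-agent: * group only when no named group matches. Common crawlers include Googlebot, Bingbot, AhrefsBot, and GPTBot (OpenAI). A single file can contain multiple user-agent groups with different rules. The Sitemap: directive points crawlers to your XML sitemap. It is independent of the user-agent groups and is conventionally placed at the end of the file.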
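Group selection is easy to get wrong: a named group replaces the `*` group for that crawler rather than adding to it. The Python sketch below parses a hypothetical robots.txt for `example.com` with the standard library's `urllib.robotparser`. That parser checks named groups before the `*` group, but its matching is simpler than Google's, so treat this as an illustration rather than a validator.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for GPTBot, one for Googlebot, and a catch-all.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Googlebot
Disallow: /search

User-agent: *
Disallow: /search
Disallow: /cart

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("GPTBot", "Googlebot", "Bingbot"):
    # Each crawler obeys only the group matching its own name, falling back
    # to the "*" group when no named group matches (as Bingbot does here).
    print(agent, parser.can_fetch(agent, "https://www.example.com/cart"))

# Sitemap lines are collected regardless of which group they follow.
print(parser.site_maps())
```

Under these assumptions, GPTBot and Bingbot are blocked from /cart, but Googlebot is allowed. Its own group does not mention /cart, and it does not inherit the `*` group's rules. `site_maps()` requires Python 3.8 or later.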

Common mistakes: blocking /wp-admin/ but accidentally blocking /wp-content/uploads/ as well (which stops search engines from crawling your images), disallowing CSS and JS files that Google needs to render pages, and using robots.txt to try to remove pages from the index (use noindex instead).
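You can catch the first two mistakes by checking a list of URLs that must stay crawlable against your rules. The Python sketch below does this with `urllib.robotparser`. `blocked_resources` and the sample WordPress URLs are hypothetical. The standard-library parser uses prefix matching with first-match precedence and no `*`/`$` wildcards, whereas Google applies the longest matching rule, so results can differ for files that rely on those features.

```python
from urllib.robotparser import RobotFileParser


def blocked_resources(robots_txt: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return the URLs in `urls` that robots_txt forbids user_agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Over-broad: Disallow: /wp-content/ also blocks uploaded images and theme CSS.
TOO_BROAD = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-content/
"""

# Narrow: only the admin area is blocked.
NARROW = """\
User-agent: *
Disallow: /wp-admin/
"""

# Hypothetical resources that should stay crawlable for indexing and rendering.
MUST_CRAWL = [
    "https://www.example.com/wp-content/uploads/2024/05/hero.jpg",
    "https://www.example.com/wp-content/themes/site/style.css",
    "https://www.example.com/wp-includes/js/jquery/jquery.min.js",
]

# Flags the uploaded image and the theme stylesheet.
print(blocked_resources(TOO_BROAD, "Googlebot", MUST_CRAWL))
print(blocked_resources(NARROW, "Googlebot", MUST_CRAWL))  # → []
```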
