robots.txt
A plain-text file at the root of a domain that instructs crawlers which pages they may or may not access.
robots.txt uses the Robots Exclusion Protocol (REP) to specify crawl rules. A Disallow rule stops compliant crawlers from fetching the matching URLs, but it does not remove them from the index: a disallowed page can still appear in search results if other pages link to it. To remove a page from the index, leave it crawlable and add a noindex meta tag (or send noindex in the X-Robots-Tag HTTP header). Do not combine noindex with Disallow, because a crawler that cannot fetch the page never sees the noindex.
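As a rough illustration of the two noindex signals (the robots meta tag and the X-Robots-Tag header), the sketch below checks a fetched response for either one. The helper name has_noindex and the class _RobotsMetaFinder are hypothetical, and the check is a simplification: it ignores crawler-specific meta names such as "googlebot" and user-agent prefixes inside the header value.

```python
from html.parser import HTMLParser


class _RobotsMetaFinder(HTMLParser):
    """Records whether a <meta name="robots"> tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        is_robots_meta = attr.get("name", "").lower() == "robots"
        if is_robots_meta and "noindex" in attr.get("content", "").lower():
            self.noindex = True


def has_noindex(html, headers):
    """Return True if the header or the HTML carries a noindex signal.

    Hypothetical helper: `headers` is a plain dict of response headers.
    """
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    finder = _RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page, {}))
print(has_noindex("<p>hello</p>", {"X-Robots-Tag": "noindex"}))
```

Either signal only takes effect if the crawler is allowed to fetch the URL in the first place.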
Rules are grouped per crawler using User-agent directives. Common crawlers include Googlebot, Bingbot, AhrefsBot, and GPTBot (OpenAI). A single file can contain multiple user-agent blocks with different rules. The Sitemap: directive points crawlers to your XML sitemap; it is independent of the user-agent blocks and can appear anywhere in the file, though it is conventionally placed at the end.
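A minimal sketch of per-crawler blocks, using Python's standard-library urllib.robotparser to evaluate a sample file. The file contents and the example.com URLs are assumptions made up for illustration. Note that this parser does simple prefix matching and may not resolve Allow/Disallow conflicts or wildcards the way a given search engine does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot is blocked entirely, every other
# crawler is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

gptbot_blog = parser.can_fetch("GPTBot", "https://example.com/blog/")
googlebot_blog = parser.can_fetch("Googlebot", "https://example.com/blog/")
googlebot_private = parser.can_fetch("Googlebot", "https://example.com/private/x")
sitemaps = parser.site_maps()  # available in Python 3.8+

print(gptbot_blog)        # False: the GPTBot block disallows everything
print(googlebot_blog)     # True: falls through to the * block
print(googlebot_private)  # False: /private/ is disallowed for *
print(sitemaps)           # ['https://example.com/sitemap.xml']
```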
Common mistakes: blocking /wp-admin/ but also accidentally blocking /wp-content/uploads/ (which blocks image indexing), disallowing CSS/JS files that Google needs to render pages, and using robots.txt to attempt to remove pages from the index (use noindex instead).
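One way to catch the first two mistakes before deploying is to test a list of URLs that must stay crawlable against the draft file. The sketch below is an assumption-laden example: the helper blocked_urls, the overbroad rules, and the example.com URLs are all hypothetical, and urllib.robotparser only approximates how a real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the subset of `urls` that `user_agent` may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical draft: the second rule is too broad and also covers
# uploaded images and theme CSS/JS under /wp-content/.
OVERBROAD = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-content/
"""

MUST_BE_CRAWLABLE = [
    "https://example.com/wp-content/uploads/2024/01/photo.jpg",
    "https://example.com/wp-content/themes/site/style.css",
    "https://example.com/blog/post/",
]

flagged = blocked_urls(OVERBROAD, MUST_BE_CRAWLABLE)
for url in flagged:
    print("blocked:", url)
```

With the draft above, the image and the stylesheet are flagged while the blog post is not, which points at the /wp-content/ rule as the one to narrow or remove.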