Web Crawlers & Bots Directory

Complete reference for web crawlers, AI bots, and SEO spiders. Learn each bot's purpose, user-agent string, robots.txt behavior, and whether allowing or blocking it helps your SEO.

Categories: Search Engine, AI / LLM, SEO Tool, Social Media, Archive, Security

AI / LLM

GPTBot (OpenAI, respects robots.txt): OpenAI's crawler for collecting training data for its GPT language models. Content crawled…
ChatGPT-User (OpenAI, respects robots.txt): OpenAI's bot used when ChatGPT browses the web in real-time to answer user questions. Unli…
ClaudeBot (Anthropic, respects robots.txt): Anthropic's crawler for collecting web content used in Claude AI training and web retrieva…
PerplexityBot (Perplexity AI, respects robots.txt): Perplexity AI's crawler that indexes content for use in its AI-powered search engine. Cont…
CCBot (Common Crawl, respects robots.txt): Common Crawl's crawler that builds open datasets used to train many major AI models includ…
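
Since every bot in this category is listed as respecting robots.txt, a robots.txt rule is the natural way to control them. The following is a minimal sketch, using Python's standard `urllib.robotparser`, of how you might check which of these user-agent tokens a given rule set blocks. The robots.txt content, the `allowed_bots` helper, and the `example.com` URL are assumptions made up for illustration, not rules recommended by this directory.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks two AI training crawlers, allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed_bots(robots_txt, bots, url="https://example.com/"):
    """Return {bot_name: True/False} for whether each bot may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in-memory text, no network access
    return {bot: parser.can_fetch(bot, url) for bot in bots}


AI_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot", "CCBot"]
result = allowed_bots(ROBOTS_TXT, AI_BOTS)

for bot, ok in result.items():
    print(f"{bot}: {'allowed' if ok else 'blocked'}")
```

With these example rules, GPTBot and CCBot are reported as blocked and the other three as allowed. This only tells you what the rules say; whether a crawler honors them depends on the crawler.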

SEO Tool

SemrushBot (Semrush, respects robots.txt): Semrush's crawler used for powering its SEO tools: backlink analysis, keyword research, si…
AhrefsBot (Ahrefs, respects robots.txt): Ahrefs' crawler used for building its backlink index and powering its SEO research tools. …
MJ12bot (Majestic, respects robots.txt): Majestic's web crawler used for building its backlink index (Trust Flow, Citation Flow). O…

Social Media

facebookexternalhit (Meta, ignores robots.txt): Meta's bot that fetches Open Graph metadata when a URL is shared on Facebook or Instagram.…
Twitterbot (X (Twitter), ignores robots.txt): X's (formerly Twitter) crawler that fetches Twitter Card metadata when URLs are shared in …
LinkedInBot (LinkedIn, ignores robots.txt): LinkedIn's crawler that fetches Open Graph and Twitter Card metadata when URLs are shared …

Archive

ia_archiver (Internet Archive, respects robots.txt): The Internet Archive's crawler for saving snapshots of web pages to the Wayback Machine (w…
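
For bots that ignore robots.txt, the user-agent string is what you would look at in server logs. Below is a rough sketch of matching a request's User-Agent header against the bot names listed in this directory. The `BOT_DIRECTORY` table and `classify_user_agent` function are hypothetical names introduced here, and the sample header strings are illustrative rather than authoritative copies of what each bot sends. User-agent headers can be spoofed, so treat a match as a hint, not proof.

```python
# Bot token -> (category, respects robots.txt), as listed in this directory.
BOT_DIRECTORY = {
    "GPTBot": ("AI / LLM", True),
    "ChatGPT-User": ("AI / LLM", True),
    "ClaudeBot": ("AI / LLM", True),
    "PerplexityBot": ("AI / LLM", True),
    "CCBot": ("AI / LLM", True),
    "SemrushBot": ("SEO Tool", True),
    "AhrefsBot": ("SEO Tool", True),
    "MJ12bot": ("SEO Tool", True),
    "facebookexternalhit": ("Social Media", False),
    "Twitterbot": ("Social Media", False),
    "LinkedInBot": ("Social Media", False),
    "ia_archiver": ("Archive", True),
}


def classify_user_agent(user_agent):
    """Return (bot, category, respects_robots) for a known bot, else None."""
    ua = user_agent.lower()
    for token, (category, respects) in BOT_DIRECTORY.items():
        # Case-insensitive substring match on the bot's token.
        if token.lower() in ua:
            return token, category, respects
    return None


# Illustrative header values (assumed, not taken from this directory).
samples = [
    "Mozilla/5.0 (compatible; AhrefsBot/7.0)",
    "facebookexternalhit/1.1",
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0",
]
for ua in samples:
    print(ua, "->", classify_user_agent(ua))
```

A plain substring match is used because it is simple and the tokens above do not overlap with one another; a production setup would more likely verify crawlers by their published IP ranges or reverse DNS where the operator documents them.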