Indexing · Google Search Console · Technical SEO

How to Check If Your Website Is Indexed by Google (7 Methods)

13 min read · By SEOCheckPilot

You can't rank if you're not indexed. That sounds obvious, but you'd be surprised how often it isn't. A single misplaced noindex tag, a robots.txt misconfiguration, or a canonical pointing at the wrong URL can silently exclude a page from Google's index — and it looks identical to a page that just doesn't rank. Here are seven ways to tell the difference.

7 methods covered
  1. site: operator in Google Search
  2. URL Inspection in Google Search Console
  3. Google Search Console Coverage report
  4. Check for noindex tags
  5. Check robots.txt
  6. Canonical tag audit
  7. Crawl the site with Googlebot user agent

The indexing pipeline

Before checking whether a page is indexed, it helps to understand why it might not be. Getting a page into Google's index requires surviving four stages:

[Diagram: Step 1, Discovery (sitemap / links) → Step 2, Crawled (Googlebot fetches) → Step 3, Processed (JS rendered) → Step 4, Indexed (appears in Search). A block at any step means not indexed: noindex tag · robots.txt · canonical mismatch · crawl error · thin content]
A page must survive all four stages to appear in Google Search. Blocks at any stage prevent indexing.

A page can fail at discovery (Googlebot never finds the URL), at crawling (Googlebot is blocked by robots.txt or a crawl error), at processing (JavaScript render fails, canonical is wrong, noindex is found), or at the indexing decision itself (Google decides the content isn't worth indexing — thin, duplicate, or low quality). The seven methods below let you check each of these failure points.

Method 1: site: operator in Google Search

The fastest way to get a rough picture of your indexed pages is a site: search in Google:

google search operators
# Check how many pages are indexed from your domain:
site:yourdomain.com

# Check if a specific page is indexed:
site:yourdomain.com/specific-page

# Check a section of your site:
site:yourdomain.com/blog/

If your domain appears in the results for the first search, you have indexed pages. If a specific page appears for the second search, that page is indexed. No results usually means Google hasn't indexed that URL, though the reason could be anything from “never crawled yet” to “actively excluded.”

Important
The site: operator is not precise. Google's official position is that the result count from site: searches is an approximation and not representative of the actual index size. Use it for directional checks, not for counting indexed pages.
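
If you run these checks often, the query URLs can be generated rather than typed. The snippet below is a minimal sketch: `site_query_url` is a hypothetical helper name, and it only builds the search URL for you to open in a browser. It does not fetch or parse results.

```python
from urllib.parse import urlencode


def site_query_url(target: str) -> str:
    """Return a Google Search URL for a site: query on a domain or path."""
    # The site: operator expects a bare host and optional path, so drop the scheme.
    for scheme in ("https://", "http://"):
        if target.startswith(scheme):
            target = target[len(scheme):]
    return "https://www.google.com/search?" + urlencode({"q": f"site:{target}"})


for target in ("yourdomain.com", "yourdomain.com/specific-page", "yourdomain.com/blog/"):
    print(site_query_url(target))
```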

Method 2: URL Inspection in Google Search Console

This is the most authoritative way to check a specific URL. URL Inspection tells you exactly what Google knows about the page: when it was last crawled, whether it's indexed, what canonical it resolved to, and any coverage issues detected.

To use it: open Google Search Console → any property → URL Inspection (top search bar or left sidebar). Paste any URL from your domain. The result will show one of several states:

  • URL is on Google: Indexed. Clicking through shows crawl date, sitemap status, and enhancements detected.
  • URL is not on Google — not indexed: The page has been crawled but excluded. The reason is shown: noindex, canonical mismatch, crawl anomaly, soft 404, duplicate content.
  • URL is not on Google — not crawled: Googlebot hasn't fetched this URL recently. May be new, may be too low-priority, may be blocked by robots.txt at the crawl stage.
  • URL has coverage issues: Something is wrong — the report will specify what.

URL Inspection also has a “Request Indexing” button. Using this for new or recently-updated pages typically results in a crawl within hours to a few days, significantly faster than waiting for the regular crawl schedule. Note: it doesn't guarantee indexing — it just schedules a crawl.

Live URL test
URL Inspection has a “Test Live URL” option that fetches the page right now as Googlebot and shows you what it sees — including JavaScript-rendered content. This is invaluable for diagnosing client-side rendering issues where the raw HTML is correct but the rendered page has problems.
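
The same inspection data is also available programmatically through the Search Console URL Inspection API. The sketch below only builds the request body and summarizes a response. Sending the request needs an OAuth 2.0 token for a verified property, which is left out here. The helper names and the sample response are illustrative assumptions, and treating a `PASS` verdict as "indexed" is a simplification, so check the field names against the current API reference before relying on them.

```python
import json

# Endpoint of the URL Inspection API (authentication not shown in this sketch).
INSPECT_ENDPOINT = "https://searchconsole.googleapis.com/v1/urlInspection/index:inspect"


def build_inspect_request(page_url: str, property_url: str) -> str:
    """Return the JSON body for an inspection request."""
    return json.dumps({"inspectionUrl": page_url, "siteUrl": property_url})


def summarize_inspection(response: dict) -> dict:
    """Pull the indexing-related fields out of an API response."""
    status = response.get("inspectionResult", {}).get("indexStatusResult", {})
    return {
        "indexed": status.get("verdict") == "PASS",  # simplification, see lead-in
        "coverage": status.get("coverageState", "unknown"),
        "last_crawl": status.get("lastCrawlTime"),
        "canonicals_match": status.get("googleCanonical") == status.get("userCanonical"),
    }


# Hypothetical response, trimmed to the fields used above.
sample = {
    "inspectionResult": {
        "indexStatusResult": {
            "verdict": "PASS",
            "coverageState": "Submitted and indexed",
            "lastCrawlTime": "2024-01-01T00:00:00Z",
            "googleCanonical": "https://yourdomain.com/page",
            "userCanonical": "https://yourdomain.com/page",
        }
    }
}
print(summarize_inspection(sample))
```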

Method 3: Search Console Coverage report

For a full-site view: Google Search Console → Indexing → Pages. This report (labeled “Page indexing” in the current interface, formerly “Coverage”) sorts every URL Google has encountered on your site into Indexed and Not indexed, and gives a reason for each URL that isn't indexed. Four statuses to know:

  • Indexed: In the index and eligible to rank
  • Not indexed: Crawled but excluded, with a reason
  • Crawled, not indexed: Seen, but Google chose not to index
  • Discovered, not crawled: URL known but not yet fetched

The most actionable category is “Not indexed” with specific reasons. Click any reason to see which pages are affected. Common reasons and what they mean:

  • Excluded by 'noindex' tag: The page has a noindex directive. This is intentional if it's a thank-you page, but a problem if it's an important content page.
  • Duplicate, Google chose different canonical: Google found what it thinks is the canonical version at a different URL. Check whether your canonical tags are set correctly and whether there are redirect issues.
  • Crawled, currently not indexed: Google crawled it but decided not to include it. Often: thin content, near-duplicate of another page, or content that doesn't meet quality thresholds.
  • Page with redirect: The URL redirects to another. The redirect destination is what Google indexes.
  • Soft 404: The page returns a 200 status code but contains content that indicates “not found” — like “Page not found” or empty search results.

Method 4: Check for noindex tags

A noindex directive can exist in two places: the HTML meta tag, or an HTTP response header.

html / bash
<!-- Meta tag noindex (in HTML <head>): -->
<meta name="robots" content="noindex" />
<meta name="googlebot" content="noindex" />

# HTTP header noindex (check via curl):
# -s hides curl's progress meter so only matching headers are printed
curl -sI https://yourdomain.com/page | grep -i x-robots

Many CMS platforms add noindex to certain page types by default — archives, tag pages, search results, author pages, pagination pages. In WordPress, this is controlled by your SEO plugin. In Next.js, it's set via export const metadata = { robots: { index: false } }.

SEOCheckPilot's full audit includes an indexability module that checks both the meta robots tag and the X-Robots-Tag response header, and shows you the current directives for any URL you audit.
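
To check both locations in one pass, something like the following could work. It is a minimal sketch using only the Python standard library. `RobotsMetaParser` and `is_noindexed` are hypothetical names, the HTML and headers are passed in rather than fetched, and user-agent-scoped header values such as `googlebot: noindex` are not handled.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def is_noindexed(html: str, headers: dict[str, str]) -> bool:
    """True if the meta tags or the X-Robots-Tag header carry noindex (or none)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # Header names are case-insensitive, so compare them in lowercase.
    header_value = next((v for k, v in headers.items() if k.lower() == "x-robots-tag"), "")
    directives = parser.directives + [d.strip().lower() for d in header_value.split(",")]
    # "none" is shorthand for "noindex, nofollow".
    return any(d in ("noindex", "none") for d in directives)


print(is_noindexed('<head><meta name="robots" content="noindex" /></head>', {}))
```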

Method 5: Check robots.txt

Your robots.txt file (at yourdomain.com/robots.txt) tells crawlers which paths they may or may not access. A Disallow: directive for a path prevents Googlebot from crawling URLs under that path. Any noindex or canonical tags on those pages are then never seen, and the pages' content can't be indexed.

robots.txt
# Typical robots.txt:
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /private/

# A problem: accidentally disallowing content you want indexed
User-agent: *
Disallow: /blog/  # This blocks ALL blog pages from crawling

# Another common mistake:
User-agent: *
Disallow: /  # This blocks ALL pages from crawling

Important
A critical thing many people get wrong: robots.txt blocks crawling, not indexing. If a URL is blocked by robots.txt but is linked to from external sites, Google may still add it to the index without crawling it — showing just the URL without any content. To prevent a URL from appearing in search entirely, you need noindex, not robots.txt disallow.
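
A quick way to test a list of URLs against a robots.txt file is Python's built-in `urllib.robotparser`. The sketch below parses the rules from a string instead of fetching them, and `blocked_urls` is a hypothetical helper name. Treat the result as a first pass: the standard-library parser does not fully reproduce Googlebot's matching rules (wildcards and longest-match precedence in particular).

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Example rules, including the accidental /blog/ block described above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /blog/  # blocks ALL blog pages from crawling
"""

URLS = [
    "https://yourdomain.com/",
    "https://yourdomain.com/blog/post",
    "https://yourdomain.com/admin/login",
]

print(blocked_urls(ROBOTS_TXT, URLS))
```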

Method 6: Canonical tag audit

A canonical tag tells Google which URL is the “official” version of a page. If your canonical tag points to a different URL (especially one that isn't indexed itself), Google will usually index the canonical destination, not the page you're looking at. Google treats the tag as a strong hint rather than a command, so confirm which canonical it actually selected in URL Inspection.

html
<!-- Correct: canonical points to this page's own URL -->
<link rel="canonical" href="https://yourdomain.com/blog/this-article" />

<!-- Problem: canonical points to a different URL (maybe from copy/paste) -->
<link rel="canonical" href="https://yourdomain.com/blog/different-article" />

<!-- Serious problem: canonical points to HTTP when the page is HTTPS -->
<link rel="canonical" href="http://yourdomain.com/blog/this-article" />

Common canonical issues:

  • Canonical pointing to a URL that redirects, instead of the final destination URL
  • Canonical using HTTP on an HTTPS page
  • Canonical pointing to www. from a non-www. page or vice versa
  • Canonical pointing to a page that itself has a different canonical (chained canonicals)
  • CMS template generating incorrect canonical due to URL parameter issues
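
Several of these issues can be caught by comparing a page's own URL with the canonical in its HTML. The following is a rough sketch with hypothetical names (`CanonicalParser`, `canonical_issue`). It works on HTML you supply and makes no network requests, so it cannot detect redirects or chained canonicals.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalParser(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        is_canonical = (attr.get("rel") or "").lower() == "canonical"
        if tag == "link" and is_canonical and self.canonical is None:
            self.canonical = attr.get("href")


def canonical_issue(page_url: str, html: str) -> str:
    """Classify how the canonical in html relates to page_url."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.canonical:
        return "missing canonical"
    page = urlsplit(page_url)
    # Resolve relative canonicals against the page URL before comparing.
    canon = urlsplit(urljoin(page_url, parser.canonical))
    if canon.scheme != page.scheme:
        return "scheme mismatch"
    if canon.netloc.lower() != page.netloc.lower():
        return "host mismatch"
    if canon.path.rstrip("/") != page.path.rstrip("/"):
        return "points to a different page"
    return "ok"


PAGE = "https://yourdomain.com/blog/this-article"
print(canonical_issue(PAGE, '<link rel="canonical" href="http://yourdomain.com/blog/this-article" />'))
```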

Method 7: Crawl with Googlebot user agent

Sometimes pages serve different content to Googlebot than to regular browsers, either deliberately (cloaking, which violates Google's guidelines) or through a server misconfiguration. The Render Check tool compares the raw HTML response fetched with the Googlebot user agent against the JavaScript-rendered DOM, showing any differences.

You can also test this yourself with curl:

bash
# Fetch as Googlebot (grep -E is needed for the | alternation to work):
curl -s -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
  https://yourdomain.com/page | grep -iE "noindex|canonical|robots"

# Fetch as regular browser for comparison:
curl -s -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" \
  https://yourdomain.com/page | grep -iE "noindex|canonical|robots"

If the two outputs differ in ways that affect SEO directives — different canonical, one has noindex and the other doesn't, different content — you have a rendering or cloaking issue to investigate.
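
The comparison can also be scripted so the differences are listed for you. This is a sketch under a few assumptions: `fetch` and `directive_diff` are hypothetical helpers, only the raw HTML is compared (no JavaScript rendering), and a server that verifies Googlebot by IP address may not be fooled by the user-agent string alone.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"


class DirectiveParser(HTMLParser):
    """Record the robots meta content and canonical href found in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.found: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() in ("robots", "googlebot"):
            self.found[attr["name"].lower()] = attr.get("content", "").lower()
        elif tag == "link" and attr.get("rel", "").lower() == "canonical":
            self.found["canonical"] = attr.get("href", "")


def fetch(url: str, user_agent: str) -> str:
    """Download a page's raw HTML with the given User-Agent header."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def directive_diff(html_a: str, html_b: str) -> dict:
    """Return {directive: (value_a, value_b)} for every directive that differs."""
    found = []
    for html in (html_a, html_b):
        parser = DirectiveParser()
        parser.feed(html)
        found.append(parser.found)
    keys = sorted(set(found[0]) | set(found[1]))
    return {
        key: (found[0].get(key), found[1].get(key))
        for key in keys
        if found[0].get(key) != found[1].get(key)
    }


# Against a live page (makes two network requests):
# directive_diff(fetch(url, GOOGLEBOT_UA), fetch(url, BROWSER_UA))
```

An empty dictionary means the two responses carry the same robots and canonical directives. Any entry shows the Googlebot value first and the browser value second.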

How to speed up indexing for new pages

01

Submit via URL Inspection

Go to Google Search Console → URL Inspection → enter the URL → "Request indexing". This is the fastest signal you can send Google that a page exists and is ready.

02

Add to your XML sitemap

A sitemap is a list of URLs you want Google to crawl. New pages should be in your sitemap within hours of publishing. Make sure your sitemap is submitted in Search Console under Sitemaps.

03

Link from already-indexed pages

Internal links from high-authority pages that Googlebot crawls frequently are the fastest organic discovery path. Link to new content from your homepage, navigation, or related articles.

04

Share on external sites

A link from a frequently-crawled external site (social media, news sites, high-traffic blogs) can lead Googlebot to discover and crawl your URL within hours, because it follows the link while crawling that site.

05

Ensure Googlebot can access it

No noindex tag. Not blocked by robots.txt. Returns 200. Has a correct canonical. Has enough content for Google to consider it worth indexing. All five need to be true.
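
The five conditions can be written down as an explicit checklist, which makes it easy to report which one is failing for a given page. This is only an illustrative sketch: `IndexReadiness` and its field names are hypothetical, and the boolean values would come from your own checks.

```python
from dataclasses import dataclass, fields


@dataclass
class IndexReadiness:
    """The five conditions a page should meet before Google can index it."""

    no_noindex: bool
    allowed_by_robots_txt: bool
    returns_200: bool
    canonical_correct: bool
    has_enough_content: bool

    def failures(self) -> list[str]:
        """Names of the conditions that are not met, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


page = IndexReadiness(
    no_noindex=True,
    allowed_by_robots_txt=True,
    returns_200=True,
    canonical_correct=False,
    has_enough_content=True,
)
print(page.failures() or "ready for indexing")
```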

Frequently asked questions

My page was indexed but now it's not showing up. What happened?

Several things can cause de-indexing: a recently added noindex tag, Google recrawling and deciding the content no longer meets quality thresholds, a canonical change that shifted the preferred URL, a manual action (penalty) from Google, or the page now returning a non-200 status. URL Inspection in Search Console will tell you the current status and any issues detected.

Can Google index a page that's blocked by robots.txt?

Technically yes. If a URL is referenced by external links, Google may include it in the index without crawling it, showing the URL with a message like "A description for this result is not available." To fully prevent a URL from appearing in search, use noindex (not robots.txt disallow).

How many pages from my site should be indexed?

Ideally, all content pages you want to rank, and none of the pages you don't (admin, private, duplicate). Check by comparing your indexed page count in Search Console against your actual content page count. A large gap usually means indexing problems — noindex being applied too broadly, or a lot of thin content Google is choosing not to index.

What is "crawled, currently not indexed" and should I worry about it?

"Crawled, currently not indexed" means Googlebot visited the page but Google chose not to add it to the index. Common causes: thin content, near-duplicate of other pages, low-quality content, or too many similar pages. For important pages showing this status: improve content depth and uniqueness, reduce duplicate content, and build internal/external links to the page. For pages that aren't important, this is fine — it reduces crawl waste.
