Meta Robots Tag: When to Use Each Directive
Master meta robots tags for SEO. Learn every directive, when to use noindex, follow, and nofollow, and how to avoid conflicts. Step-by-step guide for founders.
Prerequisites: What You Need to Know Before Starting
Before you touch a single meta robots tag, get clear on three things:
First, understand the difference between page-level and site-level control. The meta robots tag lives in your page's HTML head and tells search engines how to treat that specific page. It's granular. It's powerful. And it does a different job than robots.txt: robots.txt decides whether a page gets crawled, while the meta robots tag decides whether a crawled page gets indexed. If robots.txt blocks a page, crawlers never see its meta robots tag—which is why you need to know when to use each.
Second, know your tools. You'll need a text editor (VS Code, Sublime, whatever you ship with), access to your site's HTML head, and ideally Google Search Console to verify implementation. If you're on WordPress, you'll need access to either your theme's header.php or a plugin like Yoast or Rank Math. No fancy tools required.
Third, accept that conflicts happen. If you combine conflicting directives in one tag (say, index and noindex), Google follows the most restrictive one. And if you block a page in robots.txt while also giving it a noindex tag, Google can't crawl the page to see that tag at all. You need to understand this hierarchy to avoid shooting yourself in the foot.
If you haven't already, read through when to use noindex vs. robots.txt—a decision tree to understand the broader context of crawl control. Then come back here for the tactical implementation.
What Is a Meta Robots Tag and Why It Matters
The meta robots tag is a single line of HTML that sits in your page's <head> section. It looks like this:
<meta name="robots" content="noindex, follow">
That's it. One line. But that one line tells Google, Bing, and every other search engine crawler exactly what to do with your page.
Why does this matter? Because a single directive decides whether a page can rank at all. A page that's accidentally noindexed is invisible to organic search. A page with the wrong directives wastes crawl budget on pages that shouldn't be crawled. A page with conflicting directives confuses search engines about what you actually want.
According to Google's official robots meta tag documentation, the meta robots tag provides granular, page-specific control over crawling, indexing, and serving content in search results. It's the fastest way to tell search engines "index this, crawl this, don't cache this" without waiting for DNS changes or robots.txt updates to propagate.
Founders need this because you're shipping fast. You don't have time to wait for agency consultants to debate robots.txt strategy. You need to block a duplicate page from the index. You need to noindex your staging environment. You need to stop search engines from caching an outdated pricing page. The meta robots tag is your weapon.
The Core Directives: What Each One Does
There are six directives worth knowing, grouped into the four decisions below. Learn these. Use these. Ignore the rest.
Index vs. Noindex
This is binary. It answers one question: "Should this page appear in search results?"
Index (the default) means "crawl this page and add it to your search results." You almost never write this explicitly because it's the default behavior. But you'll see it in documentation.
Noindex means "crawl this page, but don't add it to search results." This is the directive you'll use most often. Use it for:
- Duplicate pages (like pagination, filtered product lists, or session-specific URLs)
- Staging environments and development sites
- Thin pages that exist only to funnel traffic (doorway pages)
- Admin pages that leaked into your sitemap
- Pages with sensitive information that shouldn't rank
Example:
<meta name="robots" content="noindex">
Google will still crawl the page and see its links, but it won't show the page in search results. This is crucial: noindex doesn't block crawling. It blocks indexing. If you want to block crawling entirely, you need robots.txt; the nofollow directive (more on that below) only stops crawlers from following this page's links.
Follow vs. Nofollow
This directive answers: "Should search engines follow the links on this page?"
Follow (the default) means "crawl and pass authority through the links on this page." Again, this is the default, so you rarely write it explicitly.
Nofollow means "crawl this page, but don't follow or pass authority through its links." Use it for:
- Pages with untrusted content (comments, user-generated content)
- Pages where you don't want to pass link juice (affiliate pages, sponsored content)
- Pages with links to external sites you don't want to endorse
Example:
<meta name="robots" content="nofollow">
Important distinction: nofollow on a meta robots tag affects all links on the page. If you want to nofollow just one link, use the rel="nofollow" attribute on that specific anchor tag instead.
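If you need to add rel="nofollow" to external links in bulk (say, across user-submitted content), a script can patch the anchors for you. Here's a minimal sketch with BeautifulSoup, assuming a hypothetical OUR_DOMAIN value and sample markup:

from bs4 import BeautifulSoup
from urllib.parse import urlparse

OUR_DOMAIN = 'yoursite.com'  # assumption: your own hostname
html = '<p><a href="https://example.com">external</a> <a href="/pricing">internal</a></p>'

soup = BeautifulSoup(html, 'html.parser')
for a in soup.find_all('a', href=True):
    host = urlparse(a['href']).netloc
    if host and host != OUR_DOMAIN:  # leave internal and relative links alone
        a['rel'] = 'nofollow'

print(soup)
# <p><a href="https://example.com" rel="nofollow">external</a> <a href="/pricing">internal</a></p>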
Noarchive
This directive tells search engines not to store a cached version of your page. Use it for:
- Pages with time-sensitive information (pricing, stock levels)
- Pages with sensitive data
- Pages that change frequently and cached versions would be misleading
Example:
<meta name="robots" content="noarchive">
This is rarely critical for founders, but it's useful when you have pages that become outdated fast. (Note that Google retired its own cached-page links in 2024; other engines still honor noarchive.)
Noimageindex
This tells search engines not to index images on your page. Use it when:
- You have copyrighted images you don't want appearing in Google Images
- You have images you want to keep private
- You're hosting images for internal use only
Example:
<meta name="robots" content="noimageindex">
Again, rarely critical for most founders, but good to know.
Nocache (Deprecated)
This is a legacy directive that Google doesn't support. Skip it.
How to Combine Directives (And What Combinations Actually Work)
Here's where most founders mess up: they combine directives without understanding which combinations are valid.
You can combine multiple directives in a single meta robots tag by separating them with commas:
<meta name="robots" content="noindex, follow">
This tells Google: "Don't index this page, but follow its links and pass authority through them."
Let's walk through the valid combinations and when to use them:
Noindex, Follow — This is the most common combination. Use it for duplicate pages, thin pages, or pages you want to keep crawlable but invisible. Google crawls the page, sees its links, passes authority through them, but doesn't show the page in search results.
<meta name="robots" content="noindex, follow">
Noindex, Nofollow — This tells Google "don't index this page and don't follow its links." Use it for pages you want to completely isolate: staging sites, internal tools, admin dashboards that accidentally got indexed.
<meta name="robots" content="noindex, nofollow">
Index, Nofollow — This tells Google "index this page but don't follow its links." Use it for pages with untrusted content (user comments, forums) where you want the page to rank but don't want to pass authority through its links.
<meta name="robots" content="index, nofollow">
Warning: Writing "index" explicitly is unusual. Most developers just omit it and write content="nofollow" on its own, since index is the default. But if you're being explicit, this combination works.
Index, Follow — This is the default. You almost never write it. But if you do, it means "index this page and follow its links." Every normal page on your site should behave this way.
<meta name="robots" content="index, follow">
Now, what about combinations that DON'T work?
Noindex, Noindex — Redundant. Just write noindex once.
Noindex, Index — Conflicting. Google will choose the most restrictive directive, which is noindex. So the page won't be indexed.
Nofollow, Follow — Conflicting. Google chooses the most restrictive, which is nofollow. So links won't be followed.
According to Conductor's meta robots tag guide, handling conflicting directives means understanding that search engines apply the most restrictive rule. This is why you need to be intentional about what you write.
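If it helps to see the rule as logic, here's a toy model of "most restrictive wins" in Python. This illustrates the resolution behavior described above, not Google's actual implementation:

def effective_directives(content: str) -> set:
    # Split "noindex, index"-style content into normalized tokens.
    tokens = {t.strip().lower() for t in content.split(',')}
    resolved = set()
    # The restrictive option wins whenever both appear.
    resolved.add('noindex' if 'noindex' in tokens else 'index')
    resolved.add('nofollow' if 'nofollow' in tokens else 'follow')
    return resolved

print(sorted(effective_directives('noindex, index')))    # ['follow', 'noindex']
print(sorted(effective_directives('nofollow, follow')))  # ['index', 'nofollow']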
Step-by-Step Implementation Guide
Now let's actually put this into practice. Here's how to add meta robots tags to your site without breaking anything.
Step 1: Audit Your Current Site
Before you add a single meta robots tag, understand what's already there. Use Google Search Console URL Inspection to check if pages are indexed. Run a quick site search:
site:yourdomain.com
This shows you everything Google has indexed. Compare it to your actual sitemap. Are there pages indexed that shouldn't be? Are there pages missing that should be indexed?
If you're on WordPress, install Yoast SEO or Rank Math and check their site audit. They'll flag indexation issues automatically. If you're on a custom stack, use SE Ranking's guide on meta robots tags to understand what you're looking for.
Step 2: Identify Pages That Need Meta Robots Tags
Not every page needs a meta robots tag. Only pages that deviate from the default (index, follow) need one.
Common pages that need meta robots tags:
- Duplicate pages: Pagination (page=2, page=3), filtered product lists, sorted results
- Staging/development: Any pre-production environment
- Thin pages: Tag pages with minimal content, category pages with only product lists
- User-generated content: Comment threads, forum posts, user profiles
- Admin/internal pages: Login pages, settings, dashboards
- Sensitive pages: Pricing pages you want to keep private, internal docs
For each page, ask: "Do I want this page to appear in Google search results?" If the answer is no, it needs a noindex tag.
Step 3: Write the Meta Robots Tag
Decide which directive your page needs. Use this decision tree:
Question 1: Should this page appear in search results?
- Yes → Don't add a meta robots tag (or add "index, follow")
- No → Add "noindex"
Question 2: Should search engines follow the links on this page?
- Yes → Use "follow" (or omit it, it's the default)
- No → Use "nofollow"
Once you've answered both questions, combine the directives. For most founders, you'll end up with one of three tags:
<!-- Default: index everything, follow all links -->
<!-- Don't write this; it's the default -->
<!-- Duplicate/thin pages: hide from search, but crawl links -->
<meta name="robots" content="noindex, follow">
<!-- Untrusted content: index it, but don't pass authority -->
<meta name="robots" content="index, nofollow">
<!-- Staging/admin: hide completely -->
<meta name="robots" content="noindex, nofollow">
Step 4: Add the Tag to Your HTML Head
Now place the tag in your page's <head> section. It goes between <head> and </head>, ideally near the top after your title and meta charset.
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>My Page Title</title>
    <meta name="robots" content="noindex, follow">
    <!-- rest of your head -->
  </head>
  <body>
    <!-- content -->
  </body>
</html>
If you're on WordPress, don't edit the HTML manually. Use your SEO plugin:
Yoast SEO:
- Edit the post/page
- Scroll to the Yoast SEO block
- Click "Advanced" tab
- Find "Allow search engines to show this post in search results?"
- Toggle to "No" (this adds noindex)
- Check "Should search engines follow links on this page?"
- Toggle to "No" if needed (this adds nofollow)
Rank Math:
- Edit the post/page
- Find the Rank Math panel on the right
- Click "Advanced" tab
- Find "Robots" section
- Check/uncheck "Index" and "Follow" as needed
For detailed setup, see setting up Yoast or Rank Math.
Step 5: Verify the Tag Is Actually There
Don't assume it worked. Verify it.
In your browser:
- Go to the page
- Right-click → "View Page Source"
- Press Ctrl+F (or Cmd+F on Mac)
- Search for "robots"
- Confirm you see your meta robots tag
In Google Search Console:
- Go to Google Search Console
- Select your property
- Use URL Inspection to check a specific page
- It will show you the robots directives it detected
Programmatically (if you're technical):
curl -s https://yoursite.com/page | grep -i "robots"
This returns your meta robots tag if it exists.
Step 6: Monitor Indexation Changes
After you add noindex tags, Google won't instantly remove pages from search results. It takes time—usually a few days to a few weeks depending on your crawl frequency.
Check progress in Google Search Console:
- Go to the Pages report (formerly the Coverage report)
- Look for pages listed as not indexed
- Check for "Excluded by 'noindex' tag" as the reason
- This confirms Google detected your directive
You can also use the site: operator to check:
site:yoursite.com/old-page
If the page was noindexed, it should disappear from results within a few weeks.
User-Agent Targeting: Controlling Crawlers Individually
Sometimes you need different rules for different search engines. The meta robots tag has a solution: user-agent targeting.
Instead of name="robots" (which applies to all crawlers), you can target specific crawlers:
<!-- Block Google specifically -->
<meta name="googlebot" content="noindex">
<!-- Block Bing specifically -->
<meta name="bingbot" content="noindex">
<!-- Block all crawlers -->
<meta name="robots" content="noindex">
When would you use this? Rarely. But here are edge cases:
- You want to rank in Bing but not Google (unusual, but possible for niche markets)
- You want to keep a page out of one engine's index without affecting the others (if a crawler is hammering your server, robots.txt is the right tool, since meta tags don't stop crawling)
- You're testing noindex on Google while keeping a page indexed in Bing
For most founders, just use the generic name="robots" tag. It applies to all crawlers, which is what you want.
According to SE Ranking's comprehensive guide, user-agent targeting is supported by most major search engines, but the generic robots tag is the safest approach for broad coverage.
X-Robots-Tag: When to Use Header-Based Directives
The meta robots tag works for HTML pages. But what about PDFs, images, or other file types? What about dynamic content that doesn't have an HTML head?
Enter the X-Robots-Tag header.
Instead of adding a meta tag to the HTML, you add a directive to the HTTP response header:
X-Robots-Tag: noindex, follow
This is identical to the meta robots tag, but it's sent in the HTTP header instead of the HTML. It's useful for:
- PDFs and other non-HTML files: You can't add HTML meta tags to a PDF, so use the header instead
- Dynamic content: If your page is generated on-the-fly, adding headers is sometimes easier than modifying HTML
- API responses: If you're serving content via API, use headers for crawl control
- Images: Control how images are indexed without modifying the image file itself
How to add X-Robots-Tag headers depends on your stack:
On Apache (add to .htaccess):
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, follow"
</FilesMatch>
On Nginx (add to nginx.conf):
location ~ \.pdf$ {
  add_header X-Robots-Tag "noindex, follow";
}
In Node.js/Express:
app.use((req, res, next) => {
  // Note: this middleware applies to every response; scope it to
  // specific routes or paths in practice.
  res.set('X-Robots-Tag', 'noindex, follow');
  next();
});
In Python/Django:
response['X-Robots-Tag'] = 'noindex, follow'
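For context, here's a minimal sketch of where that line might live, using a hypothetical Django view (the view name and file path are illustrative):

from django.http import FileResponse

def report_pdf(request):
    # Hypothetical view serving a PDF that duplicates an HTML page.
    response = FileResponse(open('reports/summary.pdf', 'rb'),
                            content_type='application/pdf')
    # Keep the file crawlable but out of search results.
    response['X-Robots-Tag'] = 'noindex, follow'
    return response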
According to WebFX's robots meta directives guide, X-Robots-Tag headers are particularly useful for non-HTML content and provide the same flexibility as meta tags with the advantage of working on any file type.
For most founders shipping HTML sites, you won't need X-Robots-Tag headers. Stick with meta robots tags. But if you're serving PDFs, images, or APIs, X-Robots-Tag is your solution.
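To confirm the header is actually being sent, a quick check with Python's requests library (the URL is a placeholder):

import requests

resp = requests.get('https://yoursite.com/whitepaper.pdf')
# Prints the directive if the server sends one, otherwise "not set".
print('X-Robots-Tag:', resp.headers.get('X-Robots-Tag', 'not set'))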
Common Mistakes and How to Avoid Them
Here are the mistakes founders make most often with meta robots tags:
Mistake 1: Accidentally Noindexing Your Homepage
This happens more than you'd think. A developer copies a template with a noindex tag from a staging environment and deploys it to production. Your homepage disappears from search.
How to avoid it: Before deploying any code, search your codebase for "noindex". Make sure it's only on pages that actually need it. Use version control and code review. Better yet, use an SEO plugin that requires explicit confirmation before adding noindex.
Mistake 2: Using Noindex When You Should Use Robots.txt
Noindex is for pages you still want crawled. Robots.txt is for pages you want to block from crawling entirely. If you noindex a page with 50 internal links, Google will crawl that page and all 50 link targets just to discover that the page is noindexed. That's wasted crawl budget.
How to avoid it: Read when to use noindex vs. robots.txt. Use noindex for duplicate pages and thin pages you want to keep crawlable. Use robots.txt for pages you want to block from crawling entirely.
Mistake 3: Noindexing Pages You Want to Rank
A founder noindexes a page, forgets about it, then wonders why it's not ranking three months later. The page is invisible to search engines because of that one-line directive.
How to avoid it: Document why you noindexed each page. Add a comment in your code:
<!-- NOINDEX: This is a duplicate of /canonical-page, kept for backward compatibility -->
<meta name="robots" content="noindex, follow">
Review noindexed pages quarterly. If a page is no longer a duplicate, remove the directive.
Mistake 4: Conflicting Directives
You add a noindex meta tag but forget to remove the page from your sitemap. Or you noindex in the meta tag but allow it in robots.txt. Google sees conflicting signals and might get confused.
How to avoid it: Keep your crawl control consistent. If a page is noindexed:
- Remove it from your XML sitemap
- Block it in robots.txt if you want to save crawl budget
- Don't link to it from other pages (or use rel="nofollow" if you must link)
According to Moz's robots meta directives guide, consistent directives across meta tags, robots.txt, and sitemaps prevent confusion and ensure search engines understand your intent.
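If you're technical, you can automate part of that consistency check. Here's a rough sketch that flags sitemap URLs carrying a noindex tag, assuming a standard sitemap at /sitemap.xml (it fetches every URL, so throttle it on large sites):

import requests
from bs4 import BeautifulSoup
from xml.etree import ElementTree

SITEMAP = 'https://yoursite.com/sitemap.xml'  # placeholder URL
NS = {'sm': 'http://www.sitemaps.org/schemas/sitemap/0.9'}

tree = ElementTree.fromstring(requests.get(SITEMAP).content)
for loc in tree.findall('.//sm:loc', NS):
    url = loc.text.strip()
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    tag = soup.find('meta', attrs={'name': 'robots'})
    if tag and 'noindex' in tag.get('content', '').lower():
        # A noindexed page has no business being in the sitemap.
        print(f'CONFLICT: {url} is in the sitemap but noindexed')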
Mistake 5: Forgetting About Mobile vs. Desktop
If you have separate mobile and desktop versions of your site, you might need different directives for each. For example, you might want to index the desktop version but noindex the mobile version (though this is rare in 2024 with mobile-first indexing).
How to avoid it: If you still run separate mobile and desktop URLs, you can add user-agent targeting (note that googlebot-mobile refers to Google's long-retired feature-phone crawler, so treat this pattern as legacy):
<!-- For desktop crawlers -->
<meta name="googlebot" content="index, follow">
<!-- For mobile crawlers -->
<meta name="googlebot-mobile" content="index, follow">
But honestly, if you're building a new site, use responsive design. One URL, one version, one set of directives. Much simpler.
Checking and Auditing Meta Robots Tags at Scale
Once you've added meta robots tags to individual pages, you need a way to audit them across your entire site. Manual checking doesn't scale.
Google Search Console shows you pages Google has excluded due to noindex tags, but it's not a complete audit tool.
Screaming Frog (free up to 500 URLs, paid beyond that, and worth it) crawls your entire site and reports every meta robots tag it finds. You can export the data and filter by directive.
Semrush Site Audit (paid) crawls your site and flags pages with meta robots issues. It'll catch accidental noindex tags and conflicting directives.
Free option: If you're technical, write a simple script:
import requests
from bs4 import BeautifulSoup

urls = ['https://yoursite.com/page1', 'https://yoursite.com/page2']

for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    robots_tag = soup.find('meta', attrs={'name': 'robots'})
    if robots_tag:
        print(f"{url}: {robots_tag.get('content')}")
    else:
        print(f"{url}: No robots tag (default index, follow)")
This crawls your URLs and reports the robots directive on each one. Run it monthly to catch unexpected changes.
According to Semrush's robots meta guide, Site Audit tools are essential for detecting implementation issues at scale, especially on large sites where manual checking is impossible.
Meta Robots Tags for Different Content Types
Different types of content have different crawl needs. Here's a reference for common content types:
Blog Posts: Default (index, follow). You want these ranking.
Product Pages: Default (index, follow) unless it's a duplicate variant (color, size, etc.). Then use noindex, follow.
Category/Tag Pages: If they're thin (just a list of products with no unique content), use noindex, follow. If they have unique descriptions and meta, use default.
Pagination: Use noindex, follow. You don't want page=2 ranking separately; you want the first page to rank.
Login/Account Pages: Use noindex, nofollow. These shouldn't be indexed or crawled.
Checkout/Cart: Use noindex, nofollow. Definitely not for search results.
Search Results Pages: Use noindex, follow. These are thin pages that shouldn't rank.
User-Generated Content (Comments, Forums): Use index, nofollow. You want the page to rank (it has unique content), but you don't want to pass authority through user links.
Staging/Development: Use noindex, nofollow on everything. Nothing should leak into search results.
PDF Downloads: Use X-Robots-Tag header: noindex if it's a duplicate, or default if it's unique content you want to rank.
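One way to keep these rules in one place instead of in someone's head is a lookup table your templates read at render time. Here's a hedged sketch; the page-type names and helper function are illustrative, not a standard API:

# None means "use the default (index, follow)" and emit no tag at all.
ROBOTS_BY_PAGE_TYPE = {
    'blog_post':   None,
    'product':     None,
    'pagination':  'noindex, follow',
    'site_search': 'noindex, follow',
    'login':       'noindex, nofollow',
    'checkout':    'noindex, nofollow',
    'ugc':         'index, nofollow',
}

def robots_meta(page_type: str) -> str:
    content = ROBOTS_BY_PAGE_TYPE.get(page_type)
    return f'<meta name="robots" content="{content}">' if content else ''

print(robots_meta('pagination'))  # <meta name="robots" content="noindex, follow">
print(robots_meta('blog_post'))   # (empty string: default behavior)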
Integration with Your SEO Strategy
Meta robots tags don't exist in isolation. They're part of a broader SEO strategy that includes robots.txt, sitemaps, canonicals, and crawl budget management.
Here's how meta robots tags fit into your overall technical SEO:
Crawl Budget: Use robots.txt to block pages you don't want crawled at all. Use noindex in meta robots tags for pages you want crawled but not indexed. This preserves crawl budget for pages that matter.
Canonicals: Use canonical tags to tell Google which version of a page is the "official" one. Use noindex, follow on duplicate pages to make sure Google crawls them but doesn't index them. Together, these prevent duplicate content issues.
Sitemaps: Include only pages you want indexed in your XML sitemap. If a page is noindexed, remove it from the sitemap. This sends a consistent signal to Google.
Redirects: If you're consolidating pages, use 301 redirects instead of noindex. Redirects pass authority; noindex doesn't. Use noindex only for pages you want to keep around but hide from search.
For a complete reference on how these pieces fit together, read robots, sitemaps, and canonicals: the three files founders always get wrong.
Pro Tips and Advanced Tactics
Tip 1: Use Noindex as a Temporary Measure
You don't have to permanently noindex a page. Use it while you're working on content. Once the page is ready to rank, remove the noindex directive. Google will re-index it on the next crawl.
Tip 2: Noindex Staging Environments Automatically
If you have a staging environment, add a global noindex directive to the entire site:
<!-- In your staging environment template -->
<meta name="robots" content="noindex, nofollow">
This prevents the entire staging site from being indexed, even if a page accidentally gets linked from an external source.
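If your site is application-served, you can enforce this automatically instead of relying on templates. Here's a minimal sketch in Flask, assuming a hypothetical APP_ENV environment variable; using the X-Robots-Tag header covers every response, including PDFs and images:

import os
from flask import Flask

app = Flask(__name__)

@app.after_request
def block_non_production_from_search(response):
    # Anything that isn't production gets a blanket noindex, nofollow.
    if os.environ.get('APP_ENV', 'production') != 'production':
        response.headers['X-Robots-Tag'] = 'noindex, nofollow'
    return response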
Tip 3: Monitor Crawl Errors in Search Console
After adding noindex tags, check Google Search Console for crawl errors. If Google can't reach a page, it can't see the noindex directive. Fix any crawl errors before relying on noindex.
Tip 4: Combine with Robots.txt for Maximum Control
For pages you want to completely hide from search engines and save crawl budget, use both robots.txt and noindex:
# robots.txt
User-agent: *
Disallow: /admin/
Disallow: /staging/
And on the page itself:
<meta name="robots" content="noindex, nofollow">
This looks redundant, but it's really a safety net with one catch: while the robots.txt rule is in place, Google can't crawl the page and therefore won't see the noindex tag. If someone accidentally links to the page or removes the robots.txt rule, the noindex tag is still there to protect you.
Tip 5: Use Noindex for A/B Tests
If you're A/B testing two versions of a page, noindex the variant you don't want to rank:
<!-- Original page -->
<meta name="robots" content="index, follow">
<!-- Test variant -->
<meta name="robots" content="noindex, follow">
This lets you run the test without confusing Google about which version to rank. Once the test is done, remove the noindex directive from the winning variant.
Troubleshooting: Why Your Noindex Tag Isn't Working
You added a noindex tag, but the page is still indexed. Here's how to debug:
Step 1: Verify the Tag Is Actually There
View the page source and search for "robots". If it's not there, it wasn't added correctly. Check your code, your CMS, your template.
Step 2: Check for Conflicting Directives
Search for multiple robots tags on the same page. Some CMS platforms add robots tags automatically, and your custom tag might conflict with it. Keep only one.
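A quick way to spot duplicates, using the same requests/BeautifulSoup approach as the audit script above (the URL is a placeholder):

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get('https://yoursite.com/page').content, 'html.parser')
tags = soup.find_all('meta', attrs={'name': 'robots'})
if len(tags) > 1:
    print(f'{len(tags)} robots tags found; keep only one:')
    for t in tags:
        print(' ', t.get('content'))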
Step 3: Check robots.txt
If robots.txt blocks the page from crawling, Google can't see the noindex directive. Make sure robots.txt allows the page:
# robots.txt - Allow crawling so Google sees the noindex tag
User-agent: *
Disallow:
Step 4: Use URL Inspection in Google Search Console
Go to URL Inspection and check whether Google detected the noindex directive. If it shows "Excluded by 'noindex' tag", Google knows about it. The page will drop out of results within a few weeks.
Step 5: Wait
Google doesn't instantly remove pages from search results. It takes time. Check back in a few weeks. If the page is still indexed after a month, then you have a real problem.
Step 6: Request Removal (Nuclear Option)
If the page is still indexed after a month and you need it gone immediately, use Google Search Console to request removal. This removes the page from results for 6 months. But first, make sure your noindex tag is actually there and correct.
Key Takeaways and Your Action Plan
Meta robots tags are simple in theory but powerful in practice. Here's what you need to remember:
The Directives:
- Noindex = hide from search results (but still crawl)
- Nofollow = don't follow links on this page
- Noarchive = don't cache this page
- Noimageindex = don't index images
The Combinations:
- noindex, follow = most common (hide duplicates, crawl links)
- index, nofollow = for user-generated content (rank it, don't pass authority)
- noindex, nofollow = for staging/admin (hide completely)
The Implementation:
- Audit your site for pages that shouldn't rank
- Add noindex to duplicates, thin pages, and staging environments
- Verify the tags are there using View Source or Search Console
- Monitor indexation changes in Search Console
- Review quarterly and remove noindex from pages that are ready to rank
Your Next Steps:
- Right now: Check your homepage's page source. Confirm it doesn't have a noindex tag.
- Today: Identify 5 pages on your site that should be noindexed (duplicates, staging, thin pages).
- This week: Add noindex, follow to those pages using your CMS or by editing HTML.
- This month: Check Google Search Console to confirm Google detected the noindex directives.
Meta robots tags are one of the fastest wins in technical SEO. A single line of HTML can save you crawl budget, prevent duplicate content issues, and focus Google's attention on pages that actually matter.
For a complete technical SEO foundation, also set up SSL certificates properly, configure your canonical domain, and add organization schema.
Ship smart. Control what Google crawls and indexes. Rank what matters.