§ Dispatch № 235

Writing Your First robots.txt File: A Founder's Template

Learn to write a robots.txt file in 10 minutes. Template, directives, and step-by-step guide for founders shipping SEO-ready sites.

Filed: May 2, 2026 · Read: 18 min · Author: The Seoable Team

You shipped. Your product works. Customers love it. But Google can't find you because your site tells crawlers to stay out—or worse, you never told them what to do at all.

A robots.txt file is a single text file that sits at the root of your domain and tells search engine crawlers (and other bots) which pages to crawl, which to skip, and where your sitemap lives. It's not sexy. It won't make you viral. But it's the difference between Google indexing your product pages and indexing your admin panel by accident.

This guide gives you a plain-English template, step-by-step instructions, and the exact directives you need to ship. No agency fees. No confusion. Just a working robots.txt file in under 10 minutes.

Prerequisites: What You Need Before You Start

Before you write your first robots.txt file, make sure you have:

  • Access to your domain's root directory. You need to upload a file to https://yourdomain.com/robots.txt. If you're on shared hosting (GoDaddy, Bluehost), you can use the file manager. If you're on a custom server or cloud infrastructure (AWS, Vercel, Netlify), you know where to put files. If you don't, ask your hosting provider where the root directory is.
  • A text editor. Notepad, VS Code, Sublime—anything works. Don't use Microsoft Word. It adds invisible formatting that breaks robots.txt.
  • Your sitemap URL. You should already have a sitemap at https://yourdomain.com/sitemap.xml. If you don't, generate one using a free tool like XML-Sitemaps.com or your site builder's built-in feature. Your sitemap tells crawlers where all your important pages live.
  • A list of directories or file types you want to block. Common ones: /admin, /api, /private, /test, .pdf files you don't want indexed, /search? query strings. If you're not sure, don't block anything yet. You can always add restrictions later.
  • 5 minutes to test. After you upload your robots.txt file, use Google Search Console to validate it. This takes 30 seconds and prevents embarrassing mistakes.

If you're starting SEO from scratch, read Crawlability for Founders: A Plain-English Primer first. It explains why robots.txt matters in the context of your entire SEO strategy.

What robots.txt Actually Does (And Doesn't)

Here's the brutal truth: robots.txt is a suggestion, not a law.

Google's crawlers respect your robots.txt file. So do crawlers from Bing, DuckDuckGo, and most legitimate bots. But bad actors—scrapers, copyright thieves, malicious bots—ignore it completely. robots.txt is not a security tool. If you have sensitive data, use authentication (password protection, login walls, HTTP basic auth) instead.

What robots.txt actually does:

  1. Tells crawlers which pages to crawl. If you block /admin with a Disallow directive, Googlebot won't waste time crawling your admin panel. This saves crawl budget—the number of pages Google crawls on your site per day. For small sites, crawl budget is effectively a non-issue. For large sites with thousands of pages, it matters. Block the junk, let crawlers focus on content.
  2. Reduces the odds that blocked pages get indexed. This matters: blocking crawling is not the same as blocking indexing. If a page is in your robots.txt Disallow list, Google usually won't index it, but it's not guaranteed. If other sites link to that page, Google can index the URL anyway, without ever crawling it. For guaranteed blocking, use a meta noindex tag (and leave the page crawlable, otherwise Google never sees the tag) or put the page behind authentication.
  3. Points to your sitemap. The Sitemap directive tells crawlers where to find your XML sitemap. This is especially useful for large sites, sites with lots of dynamic content, or sites where not all pages are linked from the homepage.
  4. Manages crawl delays. You can ask crawlers to slow down (Crawl-delay directive) if your server is overloaded. Bing and some other crawlers honor it; Google ignores Crawl-delay and sets its own crawl rate. Most founders don't need this. Your hosting can handle it.

What robots.txt doesn't do:

  • It doesn't prevent indexing. Use <meta name="robots" content="noindex"> in the page's HTML head to prevent indexing. robots.txt blocks crawling; meta tags block indexing. They're different. Learn the distinction in The Difference Between Indexing and Ranking — And Why It Matters.
  • It doesn't hide your content from public view. If someone knows the URL, they can still visit the page in their browser. robots.txt only affects crawlers.
  • It doesn't prevent scraping or theft. Bad actors ignore robots.txt. Use real security (authentication, rate limiting, legal agreements) if you're worried about data theft.
  • It doesn't improve your rankings. robots.txt is a technical foundation. It doesn't make you rank higher. But a broken robots.txt can tank your visibility by blocking important pages from crawling.

For a complete understanding of how robots.txt fits into your broader SEO strategy, check out SEO Basics: The 12 Concepts a Busy Founder Can't Skip.

The Anatomy of a robots.txt File: Directives Explained

A robots.txt file is plain text. Each line contains a directive. Directives follow a simple pattern:

Directive: Value

Here are the directives you actually need to know:

User-Agent

What it does: Specifies which crawler(s) the following rules apply to.

Syntax:

User-Agent: *

The asterisk (*) means "all crawlers." This is what you'll use 99% of the time.

You can also target specific crawlers:

User-Agent: Googlebot
User-Agent: Bingbot
User-Agent: Slurp

But unless you're running a massive site and need fine-grained control, just use *.

Disallow

What it does: Tells crawlers not to crawl pages matching the specified path.

Syntax:

Disallow: /admin
Disallow: /private/
Disallow: /*.pdf$

A blank Disallow means "don't block anything":

Disallow:

Examples:

  • Disallow: /admin — Block all pages starting with /admin (e.g., /admin, /admin/users, /admin/settings).
  • Disallow: /search? — Block URLs whose path starts with /search? (e.g., /search?q=test). The ? is matched literally here; the wildcard character is *.
  • Disallow: /*.pdf$ — Block all PDF files. The * matches any run of characters and the $ means "end of the URL." (See the sketch after this list for how these wildcards behave.)
  • Disallow: /api/ — Block your API endpoints so crawlers don't waste time on machine-readable data.
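
Curious how those * and $ wildcards actually behave? Here's a rough sketch in Python that approximates Google-style path matching. It's an illustration, not Google's real parser, and robots_pattern_to_regex is just a name made up for this example:

import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    # Approximate Google-style matching: '*' matches any run of characters,
    # a trailing '$' anchors the rule to the end of the URL.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile("^" + regex + ("$" if anchored else ""))

rule = robots_pattern_to_regex("/*.pdf$")
for url in ["/guide.pdf", "/docs/pricing.pdf", "/guide.pdf?download=1", "/guide.html"]:
    print(url, "-> blocked" if rule.match(url) else "-> allowed")

Note the third URL: the query string counts as part of the URL, so /*.pdf$ does not block /guide.pdf?download=1. Drop the $ if you want to catch PDFs with query strings too.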

Allow

What it does: Tells crawlers to crawl pages even if they match a Disallow rule above.

Syntax:

Disallow: /private/
Allow: /private/public-page

This blocks everything under /private/ except /private/public-page. When an Allow and a Disallow both match, Google follows the most specific (longest) matching rule, so the narrower Allow wins. Use this when you have a broad Disallow but want to allow specific exceptions.

Sitemap

What it does: Points crawlers to your XML sitemap.

Syntax:

Sitemap: https://yourdomain.com/sitemap.xml

You can include multiple sitemaps:

Sitemap: https://yourdomain.com/sitemap.xml
Sitemap: https://yourdomain.com/sitemap-blog.xml
Sitemap: https://yourdomain.com/sitemap-products.xml

This is optional but recommended. It helps crawlers discover all your important pages faster.

Crawl-Delay

What it does: Tells crawlers to wait X seconds between requests.

Syntax:

Crawl-Delay: 1

This asks a crawler to wait 1 second between page requests. Bing and some other crawlers honor it; Google ignores Crawl-delay entirely and manages its own crawl rate. Most founders don't need this. Use it only if your server is consistently overloaded.

Request-Rate

What it does: A non-standard directive meant to limit how many pages a crawler requests in a given time window.

Syntax:

Request-Rate: 10/1s

This means "10 pages per 1 second." Again, most founders don't need this. It's for massive sites with millions of pages.

For a deeper dive into how crawlers interact with your site, see What Googlebot, GPTBot, and ClaudeBot Actually See on Your Site in 2026.

Step 1: Create Your robots.txt File

Open your text editor and create a new file. Name it exactly robots.txt (lowercase, no extensions like .txt.txt).

Don't use Microsoft Word. Use Notepad, VS Code, Sublime, or any plain-text editor.

Step 2: Write the Basic Template

Here's a starter template that works for 90% of founders:

# robots.txt for yourdomain.com
# Last updated: [Today's date]

User-Agent: *
Disallow: /admin
Disallow: /private
Disallow: /api
Disallow: /search?
Disallow: /cart
Disallow: /checkout
Disallow: /*.pdf$

Sitemap: https://yourdomain.com/sitemap.xml

Let's break this down:

  • Lines starting with # are comments. Crawlers ignore them. Use comments to document your rules so you remember why you wrote them.
  • User-Agent: * means these rules apply to all crawlers.
  • Disallow directives block common junk paths: admin panels, private areas, APIs, search results, shopping carts, and PDFs.
  • Sitemap points crawlers to your sitemap.

This template is conservative. It blocks obvious stuff that shouldn't be indexed. If your site doesn't have an admin panel at /admin, you can remove that line. If you want PDFs indexed, remove the /*.pdf$ line.

Step 3: Customize for Your Site

Now adapt the template to your actual site structure. Ask yourself:

What paths should never be indexed?

  • Admin panels: /admin, /dashboard, /wp-admin (if WordPress)
  • User accounts: /account, /profile, /settings
  • APIs: /api, /v1, /graphql
  • Internal tools: /staging, /test, /dev
  • Duplicate content: /old-site, /archive, /backup
  • Query strings: /search?, /filter?, /sort?

What file types should never be indexed?

  • PDFs: /*.pdf$
  • Images: /*.jpg$, /*.png$
  • Videos: /*.mp4$
  • Spreadsheets: /*.xlsx$

Add these to your Disallow list. Here's an example for a SaaS founder:

User-Agent: *
Disallow: /app/
Disallow: /dashboard/
Disallow: /api/
Disallow: /admin/
Disallow: /settings/
Disallow: /account/
Disallow: /billing/
Disallow: /search?
Disallow: /filter?
Disallow: /*.pdf$

Sitemap: https://yourdomain.com/sitemap.xml

Here's an example for a content site (blog, news, etc.):

User-Agent: *
Disallow: /admin/
Disallow: /draft/
Disallow: /private/
Disallow: /search?
Disallow: /tag?
Disallow: /category?
Disallow: /*.pdf$

Sitemap: https://yourdomain.com/sitemap.xml
Sitemap: https://yourdomain.com/sitemap-blog.xml

The key: only block what you're sure about. When in doubt, leave it open. You can always add restrictions later.

Step 4: Handle Edge Cases

Some situations require special handling:

Blocking Specific File Types While Allowing Others

If you want to block PDFs but allow everything else:

Disallow: /*.pdf$

If you want to block all files except HTML:

Disallow: /*.jpg$
Disallow: /*.png$
Disallow: /*.gif$
Disallow: /*.css$
Disallow: /*.js$

Or use a wildcard to block everything, then Allow specific exceptions:

Disallow: /
Allow: /public/
Allow: /blog/

This blocks your entire site except /public/ and /blog/. Useful if you're running a private site with public sections.

Blocking Query Strings

Query strings (the ?something=value part of a URL) create duplicate content problems. Block them:

Disallow: /*?

This blocks all URLs with query strings. If you need to allow specific query strings (like pagination), be more specific:

Disallow: /*?utm_
Disallow: /*?fbclid=

This blocks tracking parameters but allows other query strings. Note that these patterns only match when the tracking parameter sits immediately after the ?; a pattern like Disallow: /*?*utm_ also catches it later in the query string.

Multiple User-Agents with Different Rules

If you want different rules for different crawlers:

User-Agent: Googlebot
Disallow: /api/

User-Agent: Bingbot
Disallow: /api/
Disallow: /private/

User-Agent: *
Disallow: /admin/

But again, most founders just use User-Agent: * for simplicity.

Blocking AI Training Bots

If you don't want your content used to train AI models, block the relevant bots. According to Robots.txt Introduction and Guide from Google Search Central, you can block specific user agents:

User-Agent: GPTBot
Disallow: /

User-Agent: CCBot
Disallow: /

User-Agent: anthropic-ai
Disallow: /

User-Agent: *
Disallow: /admin/
Disallow: /api/

This blocks OpenAI's GPTBot, Common Crawl's CCBot, and Anthropic's anthropic-ai bot from crawling your site, while allowing normal search engine crawlers. AI vendors add and rename user agents over time, so check their documentation for the current tokens.
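
Before the file even goes live, you can sanity-check that per-bot groups behave the way you expect. Here's a minimal sketch using Python's built-in robots.txt parser (the domain is a placeholder, and the standard-library parser ignores * and $ wildcards inside paths):

from urllib.robotparser import RobotFileParser

# The rules from above, pasted as a string and parsed locally.
rules = """\
User-Agent: GPTBot
Disallow: /

User-Agent: CCBot
Disallow: /

User-Agent: anthropic-ai
Disallow: /

User-Agent: *
Disallow: /admin/
Disallow: /api/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

for bot in ["GPTBot", "CCBot", "Googlebot"]:
    verdict = "allowed" if parser.can_fetch(bot, "https://yourdomain.com/pricing") else "blocked"
    print(f"{bot}: {verdict}")

You should see GPTBot and CCBot blocked and Googlebot allowed, because Googlebot falls through to the User-Agent: * group.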

Step 5: Upload Your robots.txt File

Now upload the file to your domain's root directory. The URL must be exactly https://yourdomain.com/robots.txt.

If you're on shared hosting (GoDaddy, Bluehost, etc.):

  1. Log into your hosting control panel (cPanel, Plesk, etc.)
  2. Open the file manager
  3. Navigate to the public_html or www directory (the root of your domain)
  4. Upload the robots.txt file
  5. Make sure it's readable (permissions 644 or 755)

If you're on a custom server (AWS, DigitalOcean, etc.):

  1. SSH into your server
  2. Navigate to your web root (usually /var/www/yourdomain.com or similar)
  3. Create the file: nano robots.txt
  4. Paste your content
  5. Save and exit (Ctrl+X, then Y, then Enter)
  6. Make sure the file is readable: chmod 644 robots.txt

If you're on Vercel, Netlify, or similar:

  1. Add the robots.txt file to your public directory
  2. Deploy. The platform automatically serves it at the root.

If you're on WordPress:

  1. Log in to your WordPress admin
  2. Go to Settings → Reading and make sure "Discourage search engines from indexing this site" is unchecked
  3. WordPress serves a virtual robots.txt by default; there's no built-in editor for it
  4. To customize it, use an SEO plugin with a robots.txt editor (Yoast SEO and Rank Math both include one), or upload a physical robots.txt to your site's root via your host's file manager or FTP
  5. A physical file in the root overrides the virtual one

If you're on Shopify:

  1. Shopify generates a default robots.txt for every store
  2. To customize it, log in to your Shopify admin and go to Online Store → Themes
  3. Click "Edit code" on your current theme
  4. Under Templates, click "Add a new template" and choose robots.txt
  5. Edit the generated robots.txt.liquid template with your rules, then save

Once uploaded, verify the file is accessible by visiting https://yourdomain.com/robots.txt in your browser. You should see your robots.txt content as plain text.
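
If you'd rather verify from a terminal than a browser, here's a minimal Python sketch that fetches the file and prints what your server actually returns (yourdomain.com is a placeholder for your real domain):

from urllib.request import urlopen

# Fetch the live file and show exactly what crawlers will receive.
with urlopen("https://yourdomain.com/robots.txt", timeout=10) as response:
    body = response.read().decode("utf-8", errors="replace")
    print("Status:", response.status)                             # expect 200
    print("Content-Type:", response.headers.get("Content-Type"))  # expect text/plain
    print()
    print(body)  # your directives, not an HTML error page

A 404 or an HTML body here means the file isn't being served where crawlers expect it.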

Step 6: Test Your robots.txt File

Don't ship untested code. Same goes for robots.txt.

Method 1: Google Search Console (Recommended)

  1. Go to Google Search Console
  2. Select your property (your domain)
  3. Open Settings and find the robots.txt report (Google retired the standalone robots.txt Tester; this report confirms your file was fetched and parsed without errors)
  4. Paste a URL from your site (e.g., /blog/my-post) into the URL Inspection bar at the top
  5. The inspection result tells you whether that URL can be crawled or is blocked by robots.txt

Test a few URLs:

  • A page you want indexed (crawling should be allowed)
  • A page you blocked (should be reported as blocked by robots.txt)
  • An admin page (should be reported as blocked)

If a URL is reported as blocked when it shouldn't be, fix your robots.txt and re-test.

Method 2: Online robots.txt Tester

If you don't have Google Search Console set up yet, use a free online tester like Seobility's robots.txt Tester or Screaming Frog's robots.txt Validator. Upload your robots.txt file and test specific URLs.
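
Or run the check straight from Python, whose standard library ships a robots.txt parser. Here's a minimal sketch (placeholder domain and paths). One caveat: the stdlib parser doesn't implement Google's * and $ wildcards and resolves Allow/Disallow conflicts by file order rather than longest match, so lean on Search Console for anything beyond simple prefix rules.

from urllib.robotparser import RobotFileParser

# Load the live robots.txt and test a handful of URLs against it.
parser = RobotFileParser("https://yourdomain.com/robots.txt")
parser.read()

for url in [
    "https://yourdomain.com/blog/my-post",    # expect: Allowed
    "https://yourdomain.com/admin/users",     # expect: Blocked
    "https://yourdomain.com/api/v1/status",   # expect: Blocked
]:
    verdict = "Allowed" if parser.can_fetch("Googlebot", url) else "Blocked"
    print(f"{verdict}: {url}")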

Method 3: Manual Inspection

Visit https://yourdomain.com/robots.txt in your browser. You should see:

  • Your comments (lines starting with #)
  • Your User-Agent directives
  • Your Disallow/Allow directives
  • Your Sitemap line

If you see an error page (404, 403), the file isn't in the right place. If you see HTML or XML instead of plain text, your hosting is misconfigured. Contact your hosting provider.

For a comprehensive understanding of how robots.txt fits into your technical SEO foundation, read The 5 Pillars of Modern SEO Every Founder Should Master.

Step 7: Monitor and Update

Your robots.txt isn't set-and-forget. Check it monthly.

Add it to your monthly SEO checklist:

  • Visit https://yourdomain.com/robots.txt and make sure it's still there
  • Check the robots.txt report in Google Search Console (under Settings) for fetch or parse errors
  • Look for crawl and indexing problems in Google Search Console's Indexing → Pages report

If you add new sections to your site (a new product category, a new blog), update your robots.txt accordingly. If you're blocking paths that should now be indexed, remove them.
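
If you'd rather not rely on remembering, here's a small sketch you could run on a schedule (cron, CI, whatever you already have); the file path and domain are placeholders for your own setup:

from urllib.request import urlopen

# Compare the live file against the copy you keep in version control.
with open("robots.txt", encoding="utf-8") as f:
    expected = f.read().strip()

with urlopen("https://yourdomain.com/robots.txt", timeout=10) as response:
    live = response.read().decode("utf-8", errors="replace").strip()

if live == expected:
    print("OK: live robots.txt matches the committed copy")
else:
    print("WARNING: live robots.txt has drifted from the committed copy; review it")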

For a step-by-step monthly SEO process, see The 10-Minute SEO Review Every Founder Should Run Monthly.

Common Mistakes Founders Make with robots.txt

Mistake 1: Blocking your entire site

Disallow: /

This tells all crawlers "don't crawl anything." If you want your site indexed, don't do this. Use it only if you're running a private site or a staging environment that shouldn't be public.

Mistake 2: Using robots.txt to hide sensitive data

Don't do this. robots.txt is a suggestion. Bad actors ignore it. If you have sensitive data (passwords, API keys, financial info), use authentication or encryption. robots.txt won't protect you.

Mistake 3: Blocking important pages by accident

If you block /products when you meant to block /products/admin, you've just told Google not to crawl your product pages. Disallow matches by prefix, so /products also blocks /products/, /products/widget, and even /products-overview. Test your robots.txt before shipping, using Google Search Console's robots.txt report and URL Inspection tool.

Mistake 4: Forgetting the Sitemap directive

Include your sitemap in robots.txt:

Sitemap: https://yourdomain.com/sitemap.xml

This helps crawlers discover all your pages faster, especially if your site structure is complex.

Mistake 5: Getting the capitalization wrong

Directive names are case-insensitive, but paths are case-sensitive: Disallow: /Admin does not block /admin. So this works:

user-agent: *
Disallow: /admin

But it's better practice to use standard capitalization for directives, and to match the exact case of your URL paths:

User-Agent: *
Disallow: /admin

Stay consistent. Most tools expect standard capitalization.

Mistake 6: Adding spaces incorrectly

This is wrong:

User-Agent : *
Disallow : /admin

There should be no space before the colon. This is correct:

User-Agent: *
Disallow: /admin

Pro Tips for Founders

Tip 1: Start conservative, expand gradually

Don't block everything on day one. Start with obvious junk (admin panels, APIs, query strings). After a month, check Google Search Console to see if anything important got blocked. Then adjust.

Tip 2: Use a sitemap to guide crawlers

Your sitemap is a roadmap. Tell crawlers about it in robots.txt:

Sitemap: https://yourdomain.com/sitemap.xml

This is especially important if your site has pages that aren't linked from the homepage.

Tip 3: Block query strings to prevent duplicate content

Query strings create duplicate content. Block them:

Disallow: /*?

Or be specific:

Disallow: /*?utm_
Disallow: /*?fbclid=
Disallow: /*?gclid=

This prevents tracking parameters from creating indexed duplicates.

Tip 4: Document your decisions

Add comments to your robots.txt explaining why you made each decision:

# Block admin panel (internal use only)
Disallow: /admin

# Block API endpoints (machine-readable, not for humans)
Disallow: /api

# Block search results to prevent duplicate content
Disallow: /search?

When you come back to this file in six months, you'll remember why you made these choices.

Tip 5: Use robots.txt with meta tags for defense-in-depth

robots.txt controls crawling. Meta tags control indexing. Use each for its job. To keep a page out of the index, add a noindex tag:

<!-- In your HTML head -->
<meta name="robots" content="noindex">

One caveat: a crawler can only see a noindex tag on a page it's allowed to fetch. If you Disallow a URL in robots.txt and also mark it noindex, Google never reads the tag, and the URL can still get indexed from external links. For pages you truly want out of the index, leave them crawlable and rely on noindex (or put them behind a login); save robots.txt Disallow rules for the junk you simply don't want crawled.

Real-World Examples

Example 1: SaaS Founder (e.g., project management tool)

# robots.txt for projectmanager.io
# SaaS application - blocking user accounts and internal tools

User-Agent: *
Disallow: /app/
Disallow: /dashboard/
Disallow: /account/
Disallow: /settings/
Disallow: /billing/
Disallow: /api/
Disallow: /admin/
Disallow: /search?
Disallow: /filter?
Disallow: /*?sort=
Disallow: /*?page=

Sitemap: https://projectmanager.io/sitemap.xml

This blocks all user-facing app pages (they're behind login anyway), internal APIs, and query strings that create duplicates. Public pages like marketing, pricing, and blog are allowed.

Example 2: Content/Blog Founder

# robots.txt for techblog.com
# Blog site - blocking admin and duplicate content

User-Agent: *
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /draft/
Disallow: /private/
Disallow: /search?
Disallow: /tag?
Disallow: /category?
Disallow: /*?utm_
Disallow: /*?fbclid=
Disallow: /*.pdf$

Sitemap: https://techblog.com/sitemap.xml
Sitemap: https://techblog.com/sitemap-posts.xml

This blocks WordPress admin, draft posts, search results, and tracking parameters. Blog posts and public pages are fully crawlable.

Example 3: E-Commerce Founder

# robots.txt for shopstore.com
# E-commerce site - blocking cart, checkout, and duplicate content

User-Agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /api/
Disallow: /search?
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*?page=
Disallow: /*?utm_

Sitemap: https://shopstore.com/sitemap.xml
Sitemap: https://shopstore.com/sitemap-products.xml

This blocks checkout flows, user accounts, and query strings that create duplicate product pages (same product sorted differently, same product filtered differently, etc.).

Connecting robots.txt to Your Broader SEO Strategy

robots.txt is one piece of your technical SEO foundation. It works alongside:

  • Your sitemap — tells crawlers where your pages are
  • Your meta tags — tells crawlers what to index
  • Your site structure — makes it easy for crawlers to navigate
  • Your crawlability — ensures crawlers can actually render your pages

For a complete picture, read Crawlability for Founders: A Plain-English Primer. It explains how robots.txt fits into crawlability, rendering, and indexing.

If you're building your SEO strategy from scratch, follow Week 1 of SEO: What a Busy Founder Should Actually Ship. robots.txt is part of week one.

For a complete first 100 days of SEO, see Your First 100 Days of SEO: A Day-by-Day Founder Playbook. You'll set up robots.txt early, then build on it with keyword research, content, and link building.

Key Takeaways

Your robots.txt file is a simple text file that tells crawlers what to crawl and what to skip. It's not magic. It's not a security tool. It's a technical foundation that prevents crawlers from wasting time on junk pages.

Here's what you need to ship:

  1. Create a robots.txt file with your text editor
  2. Use the template provided and customize it for your site
  3. Block obvious junk: admin panels, APIs, query strings, internal tools
  4. Include your sitemap so crawlers can find all your pages
  5. Upload it to your domain's root at https://yourdomain.com/robots.txt
  6. Test it using Google Search Console's robots.txt report and URL Inspection tool
  7. Monitor it monthly and update as your site grows

Don't overthink it. A simple, well-documented robots.txt file is better than a complex one you don't understand. Start with the template, test it, and adjust based on what you learn from Google Search Console.

Remember: robots.txt is a suggestion, not a guarantee. Use it alongside authentication, meta tags, and real security for sensitive data. For more on how robots.txt fits into your complete SEO strategy, check out SEO Triage for Busy Founders: The 80/20 You Can't Skip.

You shipped. Now make sure Google can find you. A working robots.txt file takes 10 minutes. Do it today.

Additional Resources

For deeper technical details, consult Robots.txt Guide: Essential Rules & Disallow Best Practices from Conductor Academy. For official Google guidance, see Robots.txt Introduction and Guide from Google Search Central.

For best practices and validation tools, What Is A Robots.txt File? A Guide to Best Practices and Syntax from Moz provides comprehensive coverage. The ultimate guide to robots.txt from Yoast offers detailed examples and common pitfalls.

If you're managing bots beyond search engines, What is robots.txt? | Robots.txt file guide from Cloudflare explains bot management at scale. For U.S. government standards, An introduction to robots.txt files from Digital.gov provides official guidance.

For technical SEO troubleshooting, The keys to building a Robots.txt that works from Oncrawl covers testing and validation. And for absolute beginners, The Ultimate Robots.txt Guide for Beginners walks through every directive step-by-step.

Once your robots.txt is live, move on to The 30-Day SEO Sprint: A Busy Founder's First Month to build the rest of your SEO foundation. Your domain audit, keyword roadmap, and AI-generated content are next.

If you want to accelerate this process, Seoable delivers a complete domain audit, brand positioning, keyword roadmap, and 100 AI-generated blog posts in under 60 seconds for a one-time $99 fee. robots.txt is just the beginning. The real work is content.
