Robots.txt Configuration: Complete Setup Guide
Robots.txt is deceptively simple — a few lines of plain text at your domain root that control how search engines crawl your entire site. Get it right and you steer crawlers toward your important content and away from crawl-budget-wasting clutter. Get it wrong and you can accidentally deindex your most valuable pages or block the CSS and JavaScript Google needs to render them. It's one of the highest-leverage (and highest-risk) files on your site. This guide covers how to configure it correctly, with Vincony's Site Audit catching mistakes before they cost you traffic.
One important caveat up front: robots.txt controls *crawling*, not *indexing*. A disallowed page can still be indexed if other pages link to it. To keep a page out of the index, use a noindex meta tag (or an X-Robots-Tag HTTP header) and leave the page crawlable so the crawler can actually see that instruction.
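To make the noindex side of that distinction concrete, here is a minimal Python sketch that checks whether a page's HTML carries a noindex robots meta tag. The sample HTML and the names `RobotsMetaParser` and `has_noindex` are illustrative assumptions, not part of any tool mentioned in this guide.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # The content attribute looks like "noindex, follow".
            content = attr_map.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page that should stay out of the index.
SAMPLE_HTML = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body>Thank-you page</body></html>"
)
print(has_noindex(SAMPLE_HTML))
```

A crawler only sees this tag if it is allowed to fetch the page, which is why a page that carries noindex should not also be disallowed in robots.txt.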
Step 1: Understand Robots.txt Basics
Robots.txt sits at your domain root (example.com/robots.txt) and uses four key directives:
- User-agent: Which crawler the rules apply to
- Disallow: Paths the crawler should not access
- Allow: Overrides Disallow for specific sub-paths
- Sitemap: Points crawlers to your XML sitemap
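The four directives can be seen working together in a short Python sketch that uses the standard library's `urllib.robotparser`. The file contents, the example.com URLs, and the /admin/help/ exception are placeholders chosen for illustration. Python's parser applies the first matching rule in file order, so the Allow line is listed before the Disallow it overrides; Google instead picks the most specific matching rule, and this ordering gives the same answer under both.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file; example.com and the paths are placeholders.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/help/
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Public content: no rule matches, so crawling is permitted.
blog_ok = parser.can_fetch("*", "https://example.com/blog/")
# Falls under Disallow: /admin/
admin_ok = parser.can_fetch("*", "https://example.com/admin/users")
# Falls under the Allow exception.
help_ok = parser.can_fetch("*", "https://example.com/admin/help/faq")

print(blog_ok, admin_ok, help_ok)
print(parser.site_maps())  # site_maps() requires Python 3.8+
```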
Step 2: Identify What to Block
Block paths that waste crawl budget or expose unwanted content:
- Admin areas (/admin/, /wp-admin/)
- Internal search results (/search?)
- User-specific pages (/my-account/, /dashboard/)
- Faceted navigation parameters (?sort=, ?filter=)
- Duplicate content paths (/print/, /amp/ if not used)
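- Staging or development environments (robots.txt applies per host, so each staging host needs its own file; password protection is the more reliable safeguard)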
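As a rough sketch, the categories above could be turned into a robots.txt body like this. The blocklist, the sitemap URL, and the helper name `build_robots_txt` are assumptions for illustration; adjust the paths to your own site structure.

```python
# Hypothetical blocklist based on the categories above.
BLOCKED_PATHS = [
    "/admin/",
    "/wp-admin/",
    "/search?",
    "/my-account/",
    "/dashboard/",
    "/*?sort=",    # '*' wildcard; matches only when sort is the first parameter
    "/*?filter=",
    "/print/",
]


def build_robots_txt(blocked_paths, sitemap_url, user_agent="*"):
    """Render a robots.txt body with one Disallow line per blocked path."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in blocked_paths]
    lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


robots_body = build_robots_txt(BLOCKED_PATHS, "https://example.com/sitemap.xml")
print(robots_body)
```

The `*` wildcard is understood by the major search engines but not by every crawler, so treat the parameter rules as a best-effort measure.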
Step 3: Ensure Critical Content Is Allowed
Never block:
- CSS and JavaScript files (Google needs them to render pages)
- Public content directories (/blog/, /products/)
- Image and media directories (blocking these prevents your images from appearing in image search)
- Your sitemap.xml
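One way to guard this list is to keep a set of must-crawl URLs and check them against every draft. The sketch below uses Python's `urllib.robotparser`; the draft file is deliberately flawed, and the URLs and the helper name `blocked_urls` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Deliberately flawed draft: /assets/ holds the site's CSS and JavaScript.
DRAFT_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /assets/
"""

# Hypothetical URLs that must stay crawlable.
MUST_CRAWL = [
    "https://example.com/assets/site.css",
    "https://example.com/assets/app.js",
    "https://example.com/blog/",
    "https://example.com/products/",
    "https://example.com/images/hero.jpg",
    "https://example.com/sitemap.xml",
]


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text blocks for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


problems = blocked_urls(DRAFT_ROBOTS_TXT, MUST_CRAWL)
print(problems)  # the two /assets/ URLs are reported as blocked
```

An empty list means none of the critical URLs are blocked by the draft.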
Step 4: Test Before Deploying
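Verify your rules before going live. Google has retired the legacy robots.txt tester in Search Console; the robots.txt report now shows the files Google has fetched along with any parsing errors, and the URL Inspection tool tells you whether a specific URL is blocked. Test URLs from every major section of your site to ensure nothing important is accidentally blocked.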
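A local pre-check can catch obvious problems before the file is published. Python's built-in `urllib.robotparser` matches plain path prefixes only, so the sketch below uses a small custom matcher that approximates the most-specific-match rule from RFC 9309: the longest matching pattern wins, and Allow wins a tie. It is a simplification (one rule group, no percent-encoding normalization), and the rules, sample paths, and function names are hypothetical. It does not replace checking the live file in Search Console.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern ('*' wildcard, trailing '$') to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """Apply longest-match-wins; Allow wins ties; no matching rule means allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            is_allow = directive == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# Hypothetical draft rules for a single "User-agent: *" group.
DRAFT_RULES = [
    ("disallow", "/search?"),
    ("disallow", "/*?sort="),
    ("disallow", "/downloads/"),
    ("allow", "/downloads/*.pdf$"),
]

# One sample path per major section, with the behavior you expect.
EXPECTED = {
    "/blog/robots-guide": True,
    "/products/shoes": True,
    "/products/shoes?sort=price": False,
    "/search?q=shoes": False,
    "/downloads/setup.exe": False,
    "/downloads/manual.pdf": True,
}

failures = {p: want for p, want in EXPECTED.items() if is_allowed(DRAFT_RULES, p) != want}
print(failures or "all sample URLs behave as expected")
```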
Step 5: Monitor with Site Audit
Run Vincony's Site Audit to detect robots.txt issues:
- Pages accidentally blocked from crawling
- Contradictory directives (Allow and Disallow for same path)
- Missing sitemap reference
- Rules that don't match your current site structure
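To give a sense of what such checks involve, here is a rough Python sketch of three of them. It is not how Vincony's Site Audit works internally; the function name `audit_robots_txt` and the sample file are assumptions, and the sketch ignores user-agent groups for brevity.

```python
def audit_robots_txt(text):
    """Return a list of warnings for a robots.txt body (simplified checks)."""
    allow, disallow, has_sitemap = set(), set(), False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "allow":
            allow.add(value)
        elif field == "disallow":
            disallow.add(value)
        elif field == "sitemap":
            has_sitemap = True

    warnings = []
    if "/" in disallow:
        warnings.append("Disallow: / blocks the whole site")
    for path in sorted(allow & disallow):
        if path:  # skip empty values, which match nothing
            warnings.append(f"Allow and Disallow both target {path}")
    if not has_sitemap:
        warnings.append("No Sitemap directive found")
    return warnings


# Hypothetical file with three problems.
SAMPLE = """\
User-agent: *
Disallow: /
Allow: /blog/
Disallow: /blog/
"""
for warning in audit_robots_txt(SAMPLE):
    print(warning)
```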
Common Mistakes
- Using `Disallow: /` (blocks entire site)
- Blocking CSS/JS files (prevents rendering)
- Using robots.txt instead of a noindex meta tag to keep pages out of the index
- Forgetting to update after redesigns or migrations
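- Mishandling trailing slashes: `Disallow: /admin` also matches /administrator and /admin.html, while `Disallow: /admin/` matches only URLs inside that directory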
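Because rules are prefix matches, a trailing slash changes what a rule covers. The following Python sketch compares two variants of the same rule using the standard library's `urllib.robotparser`; the paths and the helper name `can_crawl` are illustrative.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(disallow_rule, path):
    """Check one path against a robots.txt containing a single Disallow rule."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_rule}"])
    return parser.can_fetch("*", f"https://example.com{path}")


RESULTS = {
    (rule, path): can_crawl(rule, path)
    for rule in ("/admin", "/admin/")
    for path in ("/admin", "/admin/users", "/administrator-guide")
}

for (rule, path), crawlable in RESULTS.items():
    print(f"Disallow: {rule:<8} {path:<22} crawlable={crawlable}")
```

Without the slash, the rule also blocks /administrator-guide, which is probably not intended. With the slash, the bare /admin URL stays crawlable, which may or may not matter for your site.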
Key Takeaways
- Block crawl-budget wasters — admin, internal search, user-specific, and faceted-navigation paths
- Never block CSS, JS, or public content — Google needs assets to render pages correctly
- Remember: robots.txt blocks crawling, not indexing — use noindex to keep pages out of the index
- Always test changes in Search Console before deploying
- Include your sitemap URL and re-audit after redesigns or migrations
Frequently Asked Questions
What is robots.txt used for?
It tells search-engine crawlers which parts of your site they may or may not crawl, using User-agent, Disallow, Allow, and Sitemap directives. It's mainly for managing crawl efficiency, not for keeping pages out of the index.
Does robots.txt prevent a page from being indexed?
No. Robots.txt controls crawling, not indexing — a disallowed page can still be indexed if other pages link to it. To keep a page out of the index, use a noindex meta tag (and don't block it in robots.txt, or Google can't see the noindex).
Should I block CSS and JavaScript in robots.txt?
Never. Google needs your CSS and JS to render and understand pages. Blocking them can cause Google to see a broken version of your site and hurt rankings.
What's the most dangerous robots.txt mistake?
`Disallow: /` under `User-agent: *`, which blocks your entire site from crawling. It's commonly left in place accidentally after a site moves from staging to production, and over time it can cause your pages to drop out of search results.
How do I test my robots.txt?
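Check sample URLs from every major section before deploying. In Google Search Console, the robots.txt report (which replaced the legacy robots.txt tester) shows the fetched file and any parsing errors, and the URL Inspection tool shows whether a given URL is blocked. Re-test after any redesign or migration, when these files often break.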