Robots.txt Configuration Guide: Control What Search Engines Crawl
Robots.txt is one of the most powerful and most dangerous SEO tools: a single misplaced directive can block Google from crawling your entire site. Every SEO professional needs to understand how it works.
What Robots.txt Controls
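Robots.txt tells search engine crawlers which URLs they can and cannot access. It controls crawling, not indexing: it doesn't remove pages from the index (use noindex for that), and a blocked URL can still be indexed without its content if other pages link to it. A crawler also cannot see a noindex tag on a page it is not allowed to fetch, so don't apply both to the same URL.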
Essential Directives
- `User-agent`: Specifies which crawler the rules apply to. Use `*` for all crawlers or target specific bots.
- `Disallow`: Blocks crawling of specified paths.
- `Allow`: Overrides Disallow for specific sub-paths.
- `Sitemap`: Points crawlers to your XML sitemap location.
- `Crawl-delay`: Requests a delay between requests (not supported by Google).
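As a rough illustration of how these directives fit together, the sketch below parses a small robots.txt with Python's standard `urllib.robotparser` and asks which URLs a crawler may fetch. The domain `www.example.com` and the paths are placeholders, not taken from any real site. Note that Python's parser applies the first matching rule while Google applies the most specific one, so the `Allow` line is placed before the broader `Disallow` to get the same answer under both interpretations.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the domain and paths are placeholders.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/help/
Disallow: /admin/
Crawl-delay: 5

Sitemap: https://www.example.com/sitemap.xml
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Disallow blocks the admin area...
print(rules.can_fetch("*", "https://www.example.com/admin/settings"))  # False
# ...but Allow re-opens one sub-path inside it.
print(rules.can_fetch("*", "https://www.example.com/admin/help/faq"))  # True
# Paths that match no rule are crawlable by default.
print(rules.can_fetch("*", "https://www.example.com/blog/post"))       # True
# Crawl-delay is exposed to crawlers that choose to honor it.
print(rules.crawl_delay("*"))                                          # 5
# Sitemap lines are collected separately (Python 3.8+).
print(rules.site_maps())
```

This only shows how one particular parser reads the file. Each search engine has its own implementation, so treat the result as a sanity check rather than a guarantee of how a given crawler will behave.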
Common Configuration Patterns
- Block admin areas, user dashboards, and internal search pages
- Allow all public content directories
- Block faceted navigation parameters that create duplicate content
- Point to sitemap.xml location
- Block development/staging areas if accessible
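The patterns above could be assembled into a file along the following lines. This is a minimal sketch: `build_robots_txt` is a hypothetical helper, and every path, query parameter, and the sitemap URL are made-up placeholders to be replaced with your own. The `/*?color=` style rules rely on the `*` wildcard, which Google and Bing document but Python's built-in parser treats literally, so the check at the end only exercises the plain prefix rules.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallow, allow, sitemap_url):
    """Render one 'User-agent: *' group followed by a Sitemap line."""
    lines = ["User-agent: *"]
    # Allow rules go first so first-match and most-specific-match parsers agree.
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


# All values below are illustrative assumptions, not recommendations for a specific site.
robots_txt = build_robots_txt(
    disallow=[
        "/admin/",      # admin area
        "/dashboard/",  # user dashboards
        "/search",      # internal search pages
        "/staging/",    # staging area, if publicly reachable
        "/*?color=",    # faceted navigation parameters (wildcard syntax)
        "/*?sort=",
    ],
    allow=["/blog/", "/products/"],  # public content directories
    sitemap_url="https://www.example.com/sitemap.xml",
)
print(robots_txt)

# Sanity-check the prefix rules with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for url in (
    "https://www.example.com/admin/users",
    "https://www.example.com/search",
    "https://www.example.com/blog/robots-guide",
):
    print(url, parser.can_fetch("Googlebot", url))
```

Generating the file from lists keeps the rules reviewable in version control, which makes it easier to spot an unintended change after a migration.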
Dangerous Mistakes to Avoid
- Accidentally disallowing `/` (blocks entire site)
- Blocking CSS/JS files (prevents proper rendering)
- Using robots.txt instead of noindex (pages can still appear in index)
- Forgetting to update after site migration
- Not testing changes before deploying
Vincony's Site Audit validates your robots.txt configuration and flags issues that could impact crawling and indexation.
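Two of the mistakes above, a site-wide `Disallow: /` and blocked CSS/JS files, are easy to catch with a small check before deploying. The sketch below is one possible approach and is not how any particular audit tool works; `find_risky_rules` is a hypothetical helper, and it only inspects `Disallow` lines, so it does not account for `Allow` overrides.

```python
def find_risky_rules(robots_txt):
    """Return warnings for Disallow rules that commonly cause crawling problems."""
    warnings = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() != "disallow":
            continue
        if value == "/":
            warnings.append(f"line {number}: 'Disallow: /' blocks the entire site")
        # Ignore a trailing '$' end-of-URL anchor before checking the extension.
        elif value.rstrip("$").lower().endswith((".css", ".js")):
            warnings.append(f"line {number}: '{value}' blocks CSS/JS needed for rendering")
    return warnings


# Hypothetical file containing two of the mistakes listed above.
sample = """\
User-agent: *
Disallow: /
Disallow: /*.css$
Disallow: /admin/
"""

for warning in find_risky_rules(sample):
    print(warning)
```

A check like this could run in a deployment pipeline and fail the build when it returns any warnings, so a risky change is reviewed before it reaches production.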