What is ExaBot?

ExaBot is a web crawler that indexes web content to power Exa's AI search engine and semantic search APIs for AI applications. You can use Known Agents (formerly Dark Visitors) Agent Analytics to see when ExaBot visits your website.

Agent Type

AI Data Provider
Crawls websites to supply structured content to AI systems as a third-party service

Expected Behavior

AI data providers are API services that crawl, scrape, and index the web to supply structured data to AI models, agents, and applications. They act as intermediaries between the open web and AI systems, converting web content into LLM-ready formats for training, retrieval-augmented generation (RAG), search, and other AI workflows. Traffic from these services can be high-volume and systematic, as they maintain their own indexes or crawl on-demand in response to API requests from their customers. A single provider may serve thousands of downstream AI applications, amplifying the reach of each crawl.

Detail

Operated By Exa
Last Updated 7 hours ago

Top Website Robots.txts

1% of top websites are blocking ExaBot

Country of Origin

United States
ExaBot normally visits from the United States

Top Website Blocking Trend Over Time

The percentage of the world's top 1000 websites that are blocking ExaBot

Overall AI Data Provider Traffic

The percentage of all internet traffic coming from AI data providers

Top Visited Website Categories

People and Society
Sports
Computers and Electronics
Home and Garden
Business and Industrial
How Do I Get These Insights for My Website?
Use the WordPress plugin, Node.js package, or API to get started in seconds.

User Agent String

Example: Mozilla/5.0 (compatible; ExaBot/1.0; +https://exa.ai)

Access other known user agent strings and recent IP addresses using the API.
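If you want to spot this user agent in your own access logs, a small matcher is enough. This is a minimal sketch: it assumes the product token stays `ExaBot/<version>` as in the example above, and matches the token rather than the full string so that version bumps still match. The function name `is_exabot` is our own, not part of any API.

```python
import re

# Match the ExaBot product token; the version number may change over time,
# so match "ExaBot/<digits>" rather than the full example string.
EXABOT_RE = re.compile(r"\bExaBot/\d+(\.\d+)*", re.IGNORECASE)

def is_exabot(user_agent: str) -> bool:
    """Return True if the User-Agent string claims to be ExaBot."""
    return EXABOT_RE.search(user_agent) is not None
```

Keep in mind that a User-Agent match only tells you what the request *claims* to be; anyone can send this string, so pair it with authentication (see below) before trusting it.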

Robots.txt

In this example, all pages are blocked. You can customize which pages are off-limits by swapping out / for a different disallowed path.

User-agent: ExaBot # https://knownagents.com/agents/exabot
Disallow: /
How Do I Block All AI Data Providers?
⚠️ Manually copying and pasting this rule is not scalable, because new AI data providers are discovered every day. Instead, serve a robots.txt that updates automatically.
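To illustrate what an automatically generated robots.txt looks like, here is a sketch that renders one rule block per agent token. The token list below is illustrative only; in practice you would feed it from a maintained source of known agents rather than hard-coding it.

```python
def build_robots_txt(agent_tokens, disallow="/"):
    """Render a robots.txt body with one User-agent/Disallow block
    per agent token. `agent_tokens` is expected to come from a
    regularly updated list of known AI data providers."""
    lines = []
    for token in agent_tokens:
        lines.append(f"User-agent: {token}")
        lines.append(f"Disallow: {disallow}")
        lines.append("")  # blank line between rule blocks
    return "\n".join(lines)

# Illustrative token list -- a real deployment would fetch this dynamically.
print(build_robots_txt(["ExaBot", "SomeOtherBot"]))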

Frequently Asked Questions About ExaBot

Should I Block ExaBot?

Consider your priorities. ExaBot crawls websites on behalf of its customers to supply data for AI training, search, and retrieval-augmented generation. Your content may be redistributed to many downstream AI applications through a single provider. You may want to block it if you're concerned about how your content is being used across those systems, or allow it if you value the discoverability and reach it can provide.

How Do I Block ExaBot?

You can block or limit ExaBot's access by configuring user agent token rules in your robots.txt file. The best way to do this is using Automatic Robots.txt, which updates automatically as new agents are discovered. While the vast majority of agents operated by reputable companies honor these robots.txt directives, bad actors may choose to ignore them entirely. In that case, you'll need to implement alternative blocking methods such as firewall rules or server-level restrictions. You can verify whether ExaBot is respecting your rules by setting up Agent Analytics to monitor its visits to your website.
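One server-level option is to reject matching requests before they reach your application. As a sketch, assuming a Python WSGI stack, the middleware below returns 403 for any request whose User-Agent contains a blocked token. The middleware name and token list are our own illustrations, not an official API.

```python
def block_agents(app, blocked_tokens=("ExaBot",)):
    """WSGI middleware returning 403 Forbidden for requests whose
    User-Agent contains any blocked token (case-insensitive)."""
    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if any(tok.lower() in ua.lower() for tok in blocked_tokens):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)
    return middleware
```

The same idea applies to nginx `if ($http_user_agent ...)` rules or a CDN firewall policy; the key design choice is blocking on the user agent token at the edge, independent of whether the crawler honors robots.txt.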

Will Blocking ExaBot Hurt My SEO?

Blocking AI data providers has no direct impact on traditional SEO rankings since they don't control search engine indexing. However, these services feed content into AI search engines, RAG pipelines, and conversational AI platforms. Blocking them could reduce your content's representation across multiple AI-powered discovery channels simultaneously, since a single provider may supply data to many downstream applications.

Does ExaBot Access Private Content?

AI data providers typically crawl publicly accessible web content to build their indexes and fulfill API requests. Some providers operate large-scale proxy networks and may attempt to access content aggressively or bypass rate limits. The scope depends on what their customers request and the provider's own indexing priorities. Most focus on public content, but their scale and the diversity of downstream use cases mean your content could be accessed more broadly than with a single-purpose crawler.

How Can I Tell if ExaBot Is Visiting My Website?

Setting up Agent Analytics will give you real-time visibility into ExaBot's visits to your website, along with hundreds of other AI agents, crawlers, and scrapers. This will also let you measure human traffic to your website coming from AI search and chat LLM platforms like ChatGPT, Perplexity, and Gemini.

Why Is ExaBot Visiting My Website?

ExaBot crawled your site to fulfill data requests from its customers or to build and maintain its own web index. Your site was likely identified as containing content relevant to AI training datasets, search indexes, or retrieval-augmented generation pipelines. The crawl may have been triggered by a specific customer API request or as part of the provider's broader web indexing efforts.

How Can I Authenticate Visits From ExaBot?

Agent Analytics authenticates visits from many agents, letting you know whether each one actually came from that agent or was spoofed by a bad actor. This helps you identify suspicious traffic patterns and make informed decisions about blocking or allowing specific user agents.
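A common technique for authenticating a crawler yourself is two-step DNS verification: reverse-resolve the visiting IP to a hostname, check the hostname belongs to the operator's domain, then forward-resolve that hostname and confirm it maps back to the same IP. The sketch below assumes Exa's crawlers reverse-resolve to hostnames under `exa.ai`, which is an assumption on our part; check the operator's own documentation for the verified hostname suffixes before relying on this. Resolvers are injectable so the logic can be tested without network access.

```python
import socket

def verify_crawler_ip(ip, allowed_suffixes,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname):
    """Verify a crawler IP via reverse DNS plus forward confirmation.

    1. Reverse-resolve the IP to a hostname.
    2. Check the hostname ends with an expected operator suffix.
    3. Forward-resolve the hostname and confirm it maps back to the IP.
    """
    try:
        hostname = reverse(ip)[0]
    except OSError:
        return False
    if not any(hostname.endswith(suffix) for suffix in allowed_suffixes):
        return False
    try:
        return forward(hostname) == ip
    except OSError:
        return False
```

A spoofed request fails step 2 (its IP reverse-resolves to some other domain, or not at all), and a forged PTR record fails step 3, since the attacker cannot make the forward lookup of the operator's hostname point at their own IP.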
