
MCP Tool Discovery: How to Scale Beyond 1,000 Tools

The promise of MCP (Model Context Protocol) is that any agent can talk to any tool. But there is a problem few people talk about: what happens when your agent has access to 500 MCP servers exposing 10,000 tools?

The Tool Explosion Problem

Every MCP server exposes a list of tools, each with a name, a description, and a JSON Schema for its input parameters. The agent loads these schemas into its prompt so it knows which tools are available and how to call them.

With 10 servers and 200 tools, this works fine. The schemas fit comfortably in context. With 500 servers and 10,000 tools, the schemas alone can exceed most context windows — before the agent has done any actual work.
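A rough way to see the scaling is to serialize one tool definition and multiply. The sketch below uses a hypothetical tool in the shape MCP servers return from `tools/list` (`name`, `description`, `inputSchema`) and estimates its cost with a ~4-characters-per-token rule of thumb. Both the tool and the ratio are assumptions. Real tokenizers differ, and production schemas with longer descriptions and nested parameters can be much larger, so treat the output as a rough floor rather than a measurement.

```python
import json

# Hypothetical tool definition in the shape MCP servers return from
# tools/list: a name, a description, and a JSON Schema for the arguments.
EXAMPLE_TOOL = {
    "name": "search_repositories",
    "description": "Search GitHub for repositories matching a query.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search keywords"},
            "limit": {"type": "integer", "description": "Max results to return"},
        },
        "required": ["query"],
    },
}

CHARS_PER_TOKEN = 4  # assumption: crude rule of thumb, not a real tokenizer


def estimate_tokens(tool: dict) -> int:
    """Approximate token cost of one serialized tool definition."""
    return len(json.dumps(tool)) // CHARS_PER_TOKEN


def catalog_tokens(num_tools: int, tokens_per_tool: int) -> int:
    """Token cost of loading every tool definition up front."""
    return num_tools * tokens_per_tool


per_tool = estimate_tokens(EXAMPLE_TOOL)
for n in (200, 10_000):
    print(f"{n:>6} tools -> ~{catalog_tokens(n, per_tool):,} tokens")
```

Because the cost is linear in the number of tools, a 50× larger catalog costs 50× the tokens on every request, regardless of which tools the task actually needs.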

This is the tool explosion problem: the agent needs to know about available tools, but loading everything destroys performance and burns tokens.

Why Static Loading Fails

Most MCP clients today handle tool discovery the same way: they load every tool on startup. That is fine for demos with 5-10 tools. It breaks completely at scale.

The problems are compounding:

Dynamic Discovery: Load Only What Matters

The solution is dynamic tool discovery. Instead of loading all schemas upfront, the agent queries a search index to find relevant tools based on intent.
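Here is a minimal sketch of such an index, assuming tools are plain dicts with `name` and `description` fields. The hypothetical `ToolIndex` class builds an inverted index from terms to tool names and ranks candidates by summed inverse document frequency (IDF), so rare, specific terms count for more than common ones. This is plain keyword matching chosen for illustration, not the method of any particular product. It does no stemming, so "repositories" and "repository" are treated as different terms.

```python
import math
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToolIndex:
    """Hypothetical inverted index over tool names and descriptions."""

    def __init__(self) -> None:
        self.tools: dict[str, dict] = {}
        # term -> names of tools whose name or description contains it
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, tool: dict) -> None:
        name = tool["name"]
        self.tools[name] = tool
        for term in set(tokenize(name + " " + tool.get("description", ""))):
            self.postings[term].add(name)

    def search(self, query: str, k: int = 5) -> list[dict]:
        """Return up to k tools, ranked by the summed IDF of matching query terms."""
        n = len(self.tools)
        scores: dict[str, float] = defaultdict(float)
        for term in set(tokenize(query)):
            matches = self.postings.get(term, set())
            if not matches:
                continue
            idf = math.log(1 + n / len(matches))  # rarer term -> larger weight
            for name in matches:
                scores[name] += idf
        # Highest score first; ties broken by name for a stable order.
        ranked = sorted(scores, key=lambda name: (-scores[name], name))
        return [self.tools[name] for name in ranked[:k]]


index = ToolIndex()
for tool in [
    {"name": "github_search_repositories",
     "description": "Search GitHub for a repository by keyword."},
    {"name": "github_create_issue", "description": "Open a new issue on GitHub."},
    {"name": "slack_send_message", "description": "Send a message to a Slack channel."},
]:
    index.add(tool)

results = index.search("github, search, repository", k=2)
print([t["name"] for t in results])  # ['github_search_repositories', 'github_create_issue']
```

Only the posting lists for the query's terms are visited, so search cost depends on how many tools share those terms rather than on the total size of the catalog.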

Here is how it works:

  1. The agent receives a user request: “Search GitHub for repositories about MCP.”
  2. Instead of scanning all 10,000 schemas, it sends a search query: “github, search, repository”
  3. The discovery engine returns the top 5 matching tools with their schemas.
  4. Only those 5 schemas enter context. The agent picks the right one and executes.
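The four steps can be sketched end to end. This is a minimal illustration under stated assumptions: a synthetic catalog of 10,000 hypothetical tools, a hard-coded keyword query standing in for the one the model would write, and a simple term-overlap scorer in the hypothetical `discover` function. The final model call, where the agent picks and executes a tool, is out of scope.

```python
import json
import re


def terms(text: str) -> set[str]:
    """Lowercased alphanumeric terms; underscores and punctuation split words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def make_catalog() -> list[dict]:
    """Hypothetical catalog: 9,998 filler tools plus two GitHub tools."""
    catalog = [
        {
            "name": f"service{i}_get_record",
            "description": f"Fetch a record from internal service {i}.",
            "inputSchema": {"type": "object", "properties": {"id": {"type": "string"}}},
        }
        for i in range(9_998)
    ]
    catalog.append({
        "name": "github_search_repositories",
        "description": "Search GitHub for a repository by keyword.",
        "inputSchema": {"type": "object", "properties": {"query": {"type": "string"}}},
    })
    catalog.append({
        "name": "github_create_issue",
        "description": "Open a new issue on GitHub.",
        "inputSchema": {"type": "object", "properties": {"title": {"type": "string"}}},
    })
    return catalog


def discover(catalog: list[dict], query: str, k: int = 5) -> list[dict]:
    """Step 3: rank tools by how many query terms they share; keep the top k."""
    wanted = terms(query)
    scored = []
    for tool in catalog:
        overlap = len(wanted & terms(tool["name"] + " " + tool["description"]))
        if overlap:  # tools sharing no terms with the query are never returned
            scored.append((-overlap, tool["name"], tool))
    scored.sort(key=lambda item: (item[0], item[1]))
    return [tool for _, _, tool in scored[:k]]


catalog = make_catalog()
# Step 1: the user asks "Search GitHub for repositories about MCP."
# Step 2: the agent turns that request into a keyword query. In practice the
# model writes this query; it is hard-coded here.
query = "github, search, repository"
# Step 3: the discovery engine returns at most 5 matching tools.
selected = discover(catalog, query, k=5)
# Step 4: only these schemas would be passed to the model as its tool list.
print([tool["name"] for tool in selected])  # ['github_search_repositories', 'github_create_issue']
print(f"full catalog: {len(json.dumps(catalog)):,} chars; "
      f"selected: {len(json.dumps(selected)):,} chars")
```

On this synthetic catalog only two tools share any term with the query, so fewer than five come back, and the serialized tool list handed to the model is a small fraction of the full catalog.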

What a Discovery Engine Needs

A production-ready tool discovery layer needs more than keyword search:

The Bottom Line

Tool discovery is not a nice-to-have. It is the critical enabler for MCP ecosystems with more than a handful of servers. Without it, agents drown in schemas and context costs spiral.

AnyMCP solves this with a unified search layer that works across all connected MCP servers. The agent only loads what it needs, when it needs it.