📄 ArXiv Knowledge Graph · Live · EU-Hosted · Apr 2026

The AI Research Graph
for Every Agent.

Papers, researchers, and citations, all mapped.
One MCP server. Ask anything about AI research — and get the truth.

Papers · Authors · Citations · GitHub Repos · 12 MCP Tools
https://arxiv.mcp.brunosan.de/mcp
What makes this different

Not a search engine. A knowledge graph.

Semantic Scholar gives you a website. OpenAlex gives you a REST API. BrunoSan gives you a chatbar MCP — fully queryable by any AI agent, deterministic, UUID-stable, EU-hosted. No other system exposes all of this as MCP.

🔍
Deterministic

Every paper is a UUID-stable object. Every connection is an explicit edge. det_uuid("arxiv_paper", arxiv_id) — same input, same UUID. Always. No vectors, no approximation, no hallucination. Pure SQL over verified ArXiv data.
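
One common way to get the "same input, same UUID" property is a name-based UUID (version 5), which hashes a namespace together with a name. The sketch below is only an assumption about how a helper like det_uuid could behave, not the service's actual implementation; the root namespace and the "kind:key" name format are hypothetical.

```python
import uuid

# Hypothetical root namespace; the real service may derive its IDs differently.
ROOT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://arxiv.mcp.brunosan.de")


def det_uuid(kind: str, key: str) -> uuid.UUID:
    """Return the same UUID for the same (kind, key) pair on every call."""
    # uuid5 is a SHA-1 hash of namespace + name, so no randomness is involved.
    return uuid.uuid5(ROOT_NAMESPACE, f"{kind}:{key}")


paper_id = det_uuid("arxiv_paper", "2302.13971")
```

Calling det_uuid again with the same arguments yields an identical value, while a different kind for the same key yields a different one, so IDs for papers and authors do not collide.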

🔗
Connected

Citation graph. Co-occurrence network. Author → Institution edges. Paper → GitHub repo links. Questions that no other platform answers as an API: "Which papers cite this paper?" "Where do LoRA and MMLU overlap?" Two JOINs. No inference.

📈
Historical

Entity trends across months and years. When did LoRA explode? When did RAG become mainstream? When did Chain-of-Thought peak? Every data point is deterministic — grouped by month, quarter, or year. Reproducible. Auditable.

cs.AI — Artificial Intelligence
cs.LG — Machine Learning
cs.CL — NLP & LLMs
cs.CV — Computer Vision
cs.RO — Robotics
Queries no other platform answers

Three questions. Zero alternatives.

These are the questions every AI researcher, investor, and journalist asks first. No website answers them. No other MCP exposes them. BrunoSan does — deterministically.

01

"Which 20 papers are cited most often in AI research right now?"

arxiv_most_cited(limit=20)

COUNT(refs.target_arxiv_id) GROUP BY — pure SQL over citation edges. No inference. The most cited paper in our database is the most cited paper in our database — not an opinion.

★ No other MCP exposes this
02

"How did LoRA grow — month by month from 2023 to today?"

arxiv_entity_trend("LoRA", granularity="month")

Tracks entity mentions across papers over time. Watch LoRA explode in early 2023. Watch RAG go mainstream. Watch BERT slowly decline. Grouped by month, quarter, or year — fully deterministic.

★ 30+ monthly data points per entity
03

"Show me all papers mentioning GPT-4 AND RLHF together."

arxiv_co_occurrence("GPT-4", "RLHF")

Intersection over papers via two JOINs. Reveals research that explicitly bridges two concepts — not just papers mentioning one. The papers where LoRA meets MMLU, or RAG meets Chain-of-Thought.

★ No hallucination — explicit graph edges only
The Toolkit

12 Tools. Everything your agent needs.

Five blocks: Search, Trends, People, Graph, and a System status tool. Each tool is purpose-built, read-only, and deterministic. Every result is citable. Every query is reproducible.

🔍 Block A — Search & Discovery
01 arxiv_search_papers

Full-text search over the indexed papers using FTS5, SQLite's built-in full-text search extension. Searches title and abstract simultaneously. Supports boolean operators, phrase matching, and prefix search. Filter by category, date range, empirical-only, or papers with a code release.

arxiv_search_papers(
  query="LoRA fine-tuning efficiency",
  category="cs.CL",
  date_from="2024-01-01",
  has_code_only=True,
  limit=20
)
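
The boolean, phrase, and prefix operators mentioned above correspond to standard FTS5 MATCH syntax. A minimal sketch of that syntax, assuming a hypothetical papers_fts table with invented rows (the service's real schema is not shown on this page) and a Python build whose SQLite includes FTS5:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one FTS5 virtual table covering title and abstract.
conn.execute(
    "CREATE VIRTUAL TABLE papers_fts USING fts5(arxiv_id UNINDEXED, title, abstract)"
)
conn.executemany(
    "INSERT INTO papers_fts VALUES (?, ?, ?)",
    [
        ("p1", "Diffusion models for image generation", "We sample images."),
        ("p2", "Diffusion models for protein design", "We sample structures."),
        ("p3", "Chain of thought prompting", "Reasoning with fine-tuned models."),
    ],
)


def search(match_expr: str) -> list[str]:
    """Run an FTS5 MATCH expression over title and abstract together."""
    rows = conn.execute(
        "SELECT arxiv_id FROM papers_fts WHERE papers_fts MATCH ? ORDER BY arxiv_id",
        (match_expr,),
    )
    return [row[0] for row in rows]


boolean_hits = search("diffusion NOT image")  # boolean operator
phrase_hits = search('"chain of thought"')    # phrase match
prefix_hits = search("fine*")                 # prefix search
```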
"Find all papers on LoRA fine-tuning published in cs.CL since January 2024 — only those with code release"
"Search for 'chain of thought' papers in cs.AI — phrase match, last 6 months, empirical only"
"Find papers on diffusion models NOT about image generation — use FTS5 NOT operator"
02 arxiv_get_paper

Complete paper object with all connected data — authors with their position (first/middle/last), matched entities (benchmarks, models, methods, datasets), full reference list (up to 100), and linked GitHub repos. The full knowledge graph node in one call.

arxiv_get_paper(
  arxiv_id="2302.13971"  # LLaMA
)
→ paper metadata + 5 authors + 12 entities
→ 87 references + 2 GitHub repos
"Get the full details on 2402.01234 — authors, which entities it mentions, and all its references"
"What GitHub repos does the LLaMA paper link to? Who are the first and last authors?"
"Show me the complete reference list for 2309.10814 — how many of its citations are also in the database?"
📈 Block B — Trends & Entities
03 arxiv_top_entities

Entity ranking by mention count across all papers — benchmarks, models, methods, datasets. The title_only flag is a precision filter: MMLU in the title means the paper IS about MMLU. MMLU in the abstract means it just uses it. That distinction exists nowhere else.

arxiv_top_entities(
  type="benchmark",
  date_from="2025-01-01",
  title_only=True,  # papers where the entity is the topic
  limit=20
)
"Which benchmarks dominate cs.AI papers in 2025 — only those appearing in paper titles?"
"Show the top 20 most-used methods in cs.LG right now — ranked by mention count"
"Which datasets are used most in cs.CV research? All types, no date filter"
04 arxiv_entity_trend

How often is an entity mentioned — per month, quarter, or year? Tracks the rise and fall of any benchmark, model, method, or dataset across the entire research literature. Watch LoRA explode. Watch BERT decline. Watch RAG go from niche to mainstream. All deterministic.

arxiv_entity_trend(
  entity_name="LoRA",
  granularity="month"  # or "quarter", "year"
)
→ [{period: "2023-01", count: 12}, ...]
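
A result of this shape can come from a single GROUP BY over publication dates. The sketch below illustrates the idea with an in-memory SQLite database; the table names, column names, and rows are invented for the example and are not the service's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: papers with a publication date, plus paper-to-entity edges.
conn.executescript("""
CREATE TABLE papers (arxiv_id TEXT PRIMARY KEY, published TEXT);
CREATE TABLE paper_entities (arxiv_id TEXT, entity TEXT);
INSERT INTO papers VALUES ('a', '2023-01-10'), ('b', '2023-01-25'), ('c', '2023-02-03');
INSERT INTO paper_entities VALUES ('a', 'LoRA'), ('b', 'LoRA'), ('c', 'LoRA'), ('c', 'RAG');
""")


def entity_trend(entity_name: str) -> list[dict]:
    """Count papers mentioning an entity, grouped by publication month."""
    rows = conn.execute(
        """
        SELECT strftime('%Y-%m', p.published) AS period, COUNT(*) AS mentions
        FROM paper_entities e
        JOIN papers p ON p.arxiv_id = e.arxiv_id
        WHERE e.entity = ?
        GROUP BY period
        ORDER BY period
        """,
        (entity_name,),
    )
    return [{"period": period, "count": mentions} for period, mentions in rows]


trend = entity_trend("LoRA")
```

Because the query only groups and counts stored rows, running it twice on the same data returns the same series.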
"Show me the monthly growth of LoRA mentions from 2023 to today — when did it peak?"
"How has RAG evolved quarterly since 2023? Is it still growing or plateauing?"
"Compare Chain-of-Thought yearly mentions — 2023, 2024, 2025. Is it still dominant?"
👥 Block C — People & Institutions
05 arxiv_top_authors

Researchers ranked by paper count, with a critical role filter. role=last_author is the PI filter: in academic AI, the last author is typically the lab director, the grant holder, the research agenda setter. role=first_author finds who typically does the hands-on work. No other research intelligence system exposes this distinction as an API.

arxiv_top_authors(
  role="last_author",  # PI filter: lab directors
  category="cs.LG",
  date_from="2025-01-01",
  limit=20
)
"Which PIs (last authors) lead the most cs.LG research in 2025? Show lab directors, not PhD students"
"Who are the top 20 most prolific first authors in cs.AI — the researchers actually writing the papers?"
"Which cs.RO researchers published the most in Q1 2026 — any role, any position?"
06 arxiv_author_papers

All papers by a specific researcher — newest first, with their position on each paper. Fuzzy name matching handles variations. Shows first/middle/last role per paper, LLM task classification, and one-sentence contribution summary (where available).

arxiv_author_papers(
  author_name="Karpathy",  # partial match works
  limit=30
)
→ papers[] with position + role per paper
"Show me all papers by Andrej Karpathy — what was his role (first/last/middle) on each?"
"What has Yann LeCun published since 2023? Newest first, with contribution summaries"
"Find all papers where Geoffrey Hinton was the last author — the ones he supervised"
10 arxiv_institution_ranking

Institutions ranked by paper count — with an optional second signal from GitHub org links. Affiliation data comes from ArXiv HTML parsing. GitHub orgs (openai, google-deepmind, microsoft) are often more complete. Combining both signals gives the most accurate picture of who produces AI research — and who ships the code.

arxiv_institution_ranking(
  include_github_orgs=True,  # two signals
  date_from="2025-01-01",
  limit=20
)
"Which universities and labs produce the most cs.AI research? Include GitHub org signal"
"Rank institutions by paper count in 2025 — show both affiliation data and GitHub org data"
"Which research labs published the most cs.RO papers in the last 12 months?"
🔗 Block D — Graph & Network · ★ Exclusive Queries
07 arxiv_most_cited ★ Exclusive

The most cited papers in the database, ranked by inbound citation count. COUNT(refs.target_arxiv_id) GROUP BY over the citation edges. Pure SQL. No inference. This is the question every researcher, VC, and journalist asks first. No other platform answers it as a queryable API.

arxiv_most_cited(
  category="cs.CL",
  date_from="2024-01-01",
  limit=20
)
→ [{arxiv_id, citation_count, title, ...}]
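
The COUNT ... GROUP BY ranking described above can be sketched against a toy edge table. Only refs.target_arxiv_id is named on this page; the source_arxiv_id column and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per citation edge: source paper cites target paper.
conn.executescript("""
CREATE TABLE refs (source_arxiv_id TEXT, target_arxiv_id TEXT);
INSERT INTO refs VALUES
  ('p1', 'p9'), ('p2', 'p9'), ('p3', 'p9'),
  ('p1', 'p8'), ('p2', 'p8'),
  ('p3', 'p7');
""")


def most_cited(limit: int = 20) -> list[tuple[str, int]]:
    """Rank papers by inbound citation count, with ties broken by ID."""
    rows = conn.execute(
        """
        SELECT target_arxiv_id, COUNT(target_arxiv_id) AS citation_count
        FROM refs
        GROUP BY target_arxiv_id
        ORDER BY citation_count DESC, target_arxiv_id
        LIMIT ?
        """,
        (limit,),
    )
    return rows.fetchall()


top = most_cited(limit=2)
```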
"What are the 20 most cited papers in cs.LG — the papers that define the field right now?"
"Which cs.CL papers published in 2024 are already accumulating the most citations?"
"Show me the most cited robotics papers in cs.RO from the last 12 months — ranked by inbound citations"
08 arxiv_citation_network

Citation graph for any paper — who cites it, or what does it cite. Direction cited_by finds papers in our database that reference this paper. Direction citing shows its full reference list with ArXiv IDs. Depth 2 expands one hop further — the papers that cite the papers that cite it.

arxiv_citation_network(
  arxiv_id="2302.13971",  # LLaMA
  direction="cited_by",
  depth=2
)
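
The depth parameter can be read as a breadth-first expansion over inbound citation edges. A minimal sketch under that reading, using an invented edge list and a hypothetical cited_by helper rather than the tool's actual implementation:

```python
from collections import defaultdict

# Hypothetical edges: (citing paper, cited paper).
EDGES = [("B", "A"), ("C", "A"), ("D", "B"), ("E", "D")]


def cited_by(arxiv_id: str, depth: int = 1) -> list[list[str]]:
    """Return one sorted list of citing papers per hop, up to `depth` hops."""
    inbound = defaultdict(set)
    for source, target in EDGES:
        inbound[target].add(source)

    seen = {arxiv_id}
    frontier = {arxiv_id}
    levels = []
    for _ in range(depth):
        next_frontier = set()
        for paper in frontier:
            # Skip papers already visited so cycles cannot loop forever.
            next_frontier |= inbound[paper] - seen
        seen |= next_frontier
        levels.append(sorted(next_frontier))
        frontier = next_frontier
    return levels


network = cited_by("A", depth=2)
```

With depth=2 the first list holds direct citers and the second holds the papers that cite those citers.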
"Which papers in the database cite the LLaMA paper (2302.13971)? Show depth 2"
"What does GPT-3 (2005.14165) cite? Show its outbound references with titles"
"Find everything building on the Attention is All You Need paper — cited_by, depth 1"
09 arxiv_co_occurrence ★ Exclusive

Papers that mention BOTH entity A and entity B: the intersection over the indexed papers via two JOINs. Reveals research that explicitly bridges two concepts. The papers where LoRA meets MMLU. Where RAG meets Chain-of-Thought. Where GPT-4 meets RLHF. No hallucination, explicit graph edges only.

arxiv_co_occurrence(
  entity_a="LoRA",
  entity_b="MMLU",
  date_from="2024-01-01"
)
→ papers where BOTH appear — explicit edges
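
The "two JOINs" intersection can be illustrated by joining a paper-to-entity edge table to the papers table twice, once per entity. The schema and rows below are hypothetical and exist only to make the sketch runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: papers, plus explicit paper-to-entity edges.
conn.executescript("""
CREATE TABLE papers (arxiv_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE paper_entities (arxiv_id TEXT, entity TEXT);
INSERT INTO papers VALUES ('p1', 'LoRA on MMLU'), ('p2', 'LoRA only'), ('p3', 'MMLU only');
INSERT INTO paper_entities VALUES
  ('p1', 'LoRA'), ('p1', 'MMLU'), ('p2', 'LoRA'), ('p3', 'MMLU');
""")


def co_occurrence(entity_a: str, entity_b: str) -> list[str]:
    """Return papers that have an edge to both entities."""
    rows = conn.execute(
        """
        SELECT p.arxiv_id
        FROM papers p
        JOIN paper_entities a ON a.arxiv_id = p.arxiv_id AND a.entity = ?
        JOIN paper_entities b ON b.arxiv_id = p.arxiv_id AND b.entity = ?
        ORDER BY p.arxiv_id
        """,
        (entity_a, entity_b),
    )
    return [row[0] for row in rows]


both = co_occurrence("LoRA", "MMLU")
```

A paper with an edge to only one of the two entities drops out because an inner join needs a matching row on both sides.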
"Find all papers mentioning both GPT-4 AND RLHF — the fine-tuning alignment intersection"
"Which papers combine LoRA and MMLU? Show me research benchmarking fine-tuned models"
"Find papers where RAG and Chain-of-Thought appear together — both must be in the same paper"
11 arxiv_repo_landscape ★ Exclusive

GitHub organizations and repositories ranked by paper count: the open-source output of the research community. repos.org is extracted from GitHub links in paper HTML. Shows which orgs (openai, google-deepmind, microsoft, huggingface) ship the most research code. Filterable by org name and date.

arxiv_repo_landscape(
  org_filter="google-deepmind",
  date_from="2025-01-01"
)
→ org_ranking[] + repo_ranking[]
"Which GitHub organizations publish the most research code? Rank by paper count"
"Show me all repos from google-deepmind linked to papers in 2025 — how many papers per repo?"
"Which orgs released the most cs.AI code in 2025 — compare openai vs huggingface vs microsoft"
⚙️ Block E — System
12 arxiv_pipeline_status

Full system status in one call — database counts, pipeline progress percentages, frontier date (how far back the backfill has reached), quality report from the last automated check, and the 5 most recent quality log entries. The health dashboard for the entire knowledge graph.

arxiv_pipeline_status()
→ counts: papers, authors, refs, repos, entities
→ pipeline: html_fetched 6.4% · wl_done 32.2%
→ frontier: "2025-03-09" · alert: null
→ quality_status: "OK" · last_run: "2026-04-05"
"How many papers are in the database right now? What's the current frontier date?"
"What percentage of papers have been HTML-parsed? How far has the pipeline progressed?"
"Show me the quality report — any warnings or critical issues in the last check?"
Integration

Connect in 60 seconds.

One URL. Any MCP-compatible agent. API key required for all tool calls.

Direct MCP URL

For Claude.ai, Claude Code, Cursor, n8n, and any MCP-compatible platform. Copy the URL — that's it.

https://arxiv.mcp.brunosan.de/mcp
1. Claude.ai: Settings → Integrations → Add MCP Server
2. Paste https://arxiv.mcp.brunosan.de/mcp
3. Done. Each tool call requires your api_key parameter.
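
For agents that talk to the server without a ready-made client, an MCP tool call is a JSON-RPC 2.0 request with the method tools/call. The sketch below only builds that request body and does not send it. It assumes the key is passed as an ordinary tool argument named api_key, as the step above suggests; build_tool_call is a hypothetical helper, and a real client would normally complete the MCP initialize handshake first, usually through an MCP SDK.

```python
import json

MCP_URL = "https://arxiv.mcp.brunosan.de/mcp"


def build_tool_call(tool_name: str, arguments: dict, api_key: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' request body for an MCP server."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool_name,
            # Assumption: the key travels as a regular argument named api_key.
            "arguments": {**arguments, "api_key": api_key},
        },
    }


request = build_tool_call("arxiv_most_cited", {"limit": 20}, api_key="YOUR_API_KEY")
body = json.dumps(request)  # the string a client would POST to MCP_URL
```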

Full Access Bundle

All intelligence verticals — one API key. AI News · Biotech · Crypto · Cyber · Finance · Geopolitics · Regulatory · Robotics · ArXiv.

1. Get a Full Access key (works across all MCP servers)
2. Add any vertical MCP URL (same key everywhere)
3. Cross-domain intelligence: research meets news meets markets

Full Bundle — €150/month →

Start with a trial. Scale when it matters.

Free Trial
Free / 24h
on request — experience the full stack
All 12 MCP tools
10 calls / min
24h data window
Citation graph access
Entity trend queries
Request Trial →
Full MCP Access
€150 / month
all 9 intelligence domains
All verticals incl. ArXiv
Unlimited MCP calls
Cross-domain intelligence
ArXiv × News × Markets
All webhook alerts
Priority support
Get All Domains →

All payments via Mollie — secure, EU-based payment processing. Questions? hello@brunosan.de

Who builds with it

📚
AI Researchers

Find the most cited papers in your area. Track which benchmarks dominate. Discover who is building on your work. Citation network in one API call.

💰
VCs & Investors

Spot research trends before they reach product. Identify the PIs whose labs become startups. Track which orgs ship the most research code on GitHub.

🤖
AI Agent Builders

Give your agent real research context — not hallucinated summaries. Ask about SOTA methods, benchmark results, and paper citations. All deterministic.

📰
Journalists & Analysts

"What is the most cited AI paper of 2025?" — one tool call. Entity trends over time. Institutional rankings. All citable, all sourced from ArXiv directly.

🎓
Educators

Live MCP demo for students — show how AI agents query research databases in real time. The ArXiv graph is the perfect hands-on MCP integration example.

🏢
Enterprise R&D

Track competitor research labs over time. Monitor which methods your competitors are publishing on. Feed research signals into internal knowledge bases.
