BrunoSan ArXiv MCP — The AI Research Graph for Every Agent. 12 Tools. 125K+ Papers.

01 arxiv_search_papers

Full-text search over 125K+ papers using FTS5, SQLite's built-in full-text search extension. Searches title and abstract simultaneously. Supports boolean operators, phrase matching, and prefix search. Filter by category, date range, empirical-only, or papers with a code release.

arxiv_search_papers(
  query="LoRA fine-tuning efficiency",
  category="cs.CL",
  date_from="2024-01-01",
  has_code_only=True,
  limit=20
)

→"Find all papers on LoRA fine-tuning published in cs.CL since January 2024 — only those with code release"

→"Search for 'chain of thought' papers in cs.AI — phrase match, last 6 months, empirical only"

→"Find papers on diffusion models NOT about image generation — use FTS5 NOT operator"
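
The three FTS5 query styles named above (boolean operators, phrase matching, prefix search) can be tried locally with Python's built-in sqlite3 module. This is a minimal sketch, not the server's implementation: the table name papers_fts, its columns, and the sample rows with made-up IDs are assumptions for illustration, and it requires an SQLite build with FTS5 enabled.

```python
import sqlite3

# Hypothetical schema: an FTS5 table indexing title and abstract.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE papers_fts USING fts5(arxiv_id UNINDEXED, title, abstract)"
)
conn.executemany(
    "INSERT INTO papers_fts VALUES (?, ?, ?)",
    [
        # Made-up sample rows, not real arXiv records.
        ("0000.00001", "Diffusion models for image generation",
         "We generate images with a denoising diffusion model."),
        ("0000.00002", "Diffusion models for molecular design",
         "A diffusion process over molecular graphs."),
        ("0000.00003", "Efficient LoRA fine-tuning",
         "Chain of thought prompts are used for evaluation."),
    ],
)


def search(match_expr):
    """Run an FTS5 MATCH expression over title and abstract together."""
    cur = conn.execute(
        "SELECT arxiv_id FROM papers_fts WHERE papers_fts MATCH ? ORDER BY rank",
        (match_expr,),
    )
    return [row[0] for row in cur]


boolean_hits = search("diffusion NOT image")   # boolean operator
phrase_hits = search('"chain of thought"')     # phrase match
prefix_hits = search("molec*")                 # prefix search
```

An unqualified MATCH expression is checked against every indexed column, which is how a single query can cover title and abstract at once.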

02 arxiv_get_paper

Complete paper object with all connected data — authors with their position (first/middle/last), matched entities (benchmarks, models, methods, datasets), full reference list (up to 100), and linked GitHub repos. The full knowledge graph node in one call.

arxiv_get_paper(
  arxiv_id="2302.13971"  # LLaMA
)
→ paper metadata + 5 authors + 12 entities
→ 87 references + 2 GitHub repos

→"Get the full details on 2402.01234 — authors, which entities it mentions, and all its references"

→"What GitHub repos does the LLaMA paper link to? Who are the first and last authors?"

→"Show me the complete reference list for 2309.10814 — how many of its citations are also in the database?"

03 arxiv_top_entities

Entity ranking by mention count across all papers — benchmarks, models, methods, datasets. The title_only flag is a precision filter: MMLU in the title means the paper IS about MMLU. MMLU in the abstract means it just uses it. That distinction exists nowhere else.

arxiv_top_entities(
  type="benchmark",
  date_from="2025-01-01",
  title_only=True,  # papers where the entity is the topic
  limit=20
)

→"Which benchmarks dominate cs.AI papers in 2025 — only those appearing in paper titles?"

→"Show the top 20 most-used methods in cs.LG right now — ranked by mention count"

→"Which datasets are used most in cs.CV research? All types, no date filter"
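
To make the title_only distinction concrete, here is a small sketch of how such a ranking could be computed. The mentions table, its in_title flag, and the sample rows are hypothetical; the real schema is not documented here.

```python
import sqlite3

# Hypothetical schema: one row per (paper, entity) mention,
# with a flag recording whether the entity appears in the title.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mentions (
    arxiv_id TEXT, entity TEXT, entity_type TEXT, in_title INTEGER
);
INSERT INTO mentions VALUES
    ('p1', 'MMLU',  'benchmark', 1),
    ('p2', 'MMLU',  'benchmark', 0),
    ('p3', 'MMLU',  'benchmark', 0),
    ('p2', 'GSM8K', 'benchmark', 1),
    ('p3', 'GSM8K', 'benchmark', 1),
    ('p1', 'LoRA',  'method',    1);
""")


def top_entities(entity_type, title_only=False, limit=20):
    """Rank entities of one type by the number of distinct papers mentioning them."""
    sql = """
        SELECT entity, COUNT(DISTINCT arxiv_id) AS n
        FROM mentions
        WHERE entity_type = ?
          AND (? = 0 OR in_title = 1)   -- title_only keeps title mentions only
        GROUP BY entity
        ORDER BY n DESC, entity ASC
        LIMIT ?
    """
    return conn.execute(sql, (entity_type, int(title_only), limit)).fetchall()
```

In this toy data MMLU leads when every mention counts, while GSM8K leads once only title mentions are kept, which is the kind of shift the flag is meant to expose.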

04 arxiv_entity_trend

How often is an entity mentioned — per month, quarter, or year? Tracks the rise and fall of any benchmark, model, method, or dataset across the entire research literature. Watch LoRA explode. Watch BERT decline. Watch RAG go from niche to mainstream. All deterministic.

arxiv_entity_trend(
  entity_name="LoRA",
  granularity="month"  # or quarter, year
)
→ [{period: "2023-01", count: 12}, ...]

→"Show me the monthly growth of LoRA mentions from 2023 to today — when did it peak?"

→"How has RAG evolved quarterly since 2023? Is it still growing or plateauing?"

→"Compare Chain-of-Thought yearly mentions — 2023, 2024, 2025. Is it still dominant?"
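
The per-period counting behind a trend like this can be sketched in a few lines. This assumes one publication date per paper that mentions the entity; the "YYYY-Qn" label for quarters is an assumption, since only the monthly "2023-01" format appears in the example output above.

```python
from collections import Counter
from datetime import date


def period_key(d, granularity):
    """Map a date to its period label for the chosen granularity."""
    if granularity == "month":
        return f"{d.year}-{d.month:02d}"
    if granularity == "quarter":
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"  # label format is assumed
    if granularity == "year":
        return str(d.year)
    raise ValueError(f"unknown granularity: {granularity!r}")


def entity_trend(mention_dates, granularity="month"):
    """Count mentions per period and return them in chronological order."""
    counts = Counter(period_key(d, granularity) for d in mention_dates)
    return [{"period": p, "count": c} for p, c in sorted(counts.items())]


# Made-up publication dates of papers mentioning one entity.
sample_dates = [date(2023, 1, 5), date(2023, 1, 20), date(2023, 4, 2), date(2024, 2, 1)]
monthly = entity_trend(sample_dates, "month")
```

Because the result is a plain count per period, the same input always yields the same series.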

05 arxiv_top_authors

Researchers ranked by paper count, with a critical role filter. role=last_author is the PI filter: by convention in academic AI, the last author is usually the lab director, the grant holder, the research agenda setter. role=first_author finds who does the work. No other research intelligence system exposes this distinction as an API.

arxiv_top_authors(
  role="last_author",  # PI filter: lab directors
  category="cs.LG",
  date_from="2025-01-01",
  limit=20
)

→"Which PIs (last authors) lead the most cs.LG research in 2025? Show lab directors, not PhD students"

→"Who are the top 20 most prolific first authors in cs.AI — the researchers actually writing the papers?"

→"Which cs.RO researchers published the most in Q1 2026 — any role, any position?"
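
The role filter depends on each author's position in the byline. A minimal sketch of that logic follows, assuming papers arrive as ordered author lists; the "middle_author" label, the choice to treat a sole author as first author, and the sample names are all assumptions.

```python
from collections import Counter


def author_role(index, n_authors):
    """Derive a role label from byline position (sole authors count as first)."""
    if index == 0:
        return "first_author"
    if index == n_authors - 1:
        return "last_author"
    return "middle_author"  # assumed label; not named in the tool description


def top_authors(papers, role=None, limit=20):
    """Rank authors by paper count, optionally restricted to one role."""
    counts = Counter()
    for authors in papers:
        for i, name in enumerate(authors):
            if role is None or author_role(i, len(authors)) == role:
                counts[name] += 1
    return counts.most_common(limit)


# Made-up bylines, in author order.
sample_papers = [
    ["Ada", "Bo", "Cy"],
    ["Dee", "Cy"],
    ["Ada", "Eve", "Cy"],
    ["Bo"],
]
pis = top_authors(sample_papers, role="last_author")
```

In this sample, Cy tops the last-author ranking without ever appearing first, while Ada leads the first-author ranking.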

06 arxiv_author_papers

All papers by a specific researcher — newest first, with their position on each paper. Fuzzy name matching handles variations. Shows first/middle/last role per paper, LLM task classification, and one-sentence contribution summary (where available).

arxiv_author_papers(
  author_name="Karpathy",  # partial match works
  limit=30
)
→ papers[] with position + role per paper

→"Show me all papers by Andrej Karpathy — what was his role (first/last/middle) on each?"

→"What has Yann LeCun published since 2023? Newest first, with contribution summaries"

→"Find all papers where Geoffrey Hinton was the last author — the ones he supervised"

10 arxiv_institution_ranking

Institutions ranked by paper count — with an optional second signal from GitHub org links. Affiliation data comes from ArXiv HTML parsing. GitHub orgs (openai, google-deepmind, microsoft) are often more complete. Combining both signals gives the most accurate picture of who produces AI research — and who ships the code.

arxiv_institution_ranking(
  include_github_orgs=True,  # two signals
  date_from="2025-01-01",
  limit=20
)

→"Which universities and labs produce the most cs.AI research? Include GitHub org signal"

→"Rank institutions by paper count in 2025 — show both affiliation data and GitHub org data"

→"Which research labs published the most cs.RO papers in the last 12 months?"

07 arxiv_most_cited ★ Exclusive

The most cited papers in the database, ranked by inbound citation count. The query is a COUNT(refs.target_arxiv_id) with GROUP BY over the citation edges in the database. Pure SQL. No inference. This is the question every researcher, VC, and journalist asks first. No other platform answers it as a queryable API.

arxiv_most_cited(
  category="cs.CL",
  date_from="2024-01-01",
  limit=20
)
→ [{arxiv_id, citation_count, title, ...}]

→"What are the 20 most cited papers in cs.LG — the papers that define the field right now?"

→"Which cs.CL papers published in 2024 are already accumulating the most citations?"

→"Show me the most cited robotics papers in cs.RO from the last 12 months — ranked by inbound citations"
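
The COUNT and GROUP BY described above can be sketched against a toy database. Only refs.target_arxiv_id comes from the description; the papers table, the refs.source_arxiv_id column, the sample rows, and the choice to apply the category and date filters to the cited paper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema; only refs.target_arxiv_id is named in the description.
CREATE TABLE papers (arxiv_id TEXT PRIMARY KEY, title TEXT, category TEXT, published TEXT);
CREATE TABLE refs (source_arxiv_id TEXT, target_arxiv_id TEXT);
INSERT INTO papers VALUES
    ('A', 'Paper A', 'cs.CL', '2024-02-01'),
    ('B', 'Paper B', 'cs.CL', '2024-03-01'),
    ('C', 'Paper C', 'cs.LG', '2024-04-01'),
    ('D', 'Paper D', 'cs.CL', '2023-01-01');
INSERT INTO refs VALUES
    ('B', 'A'), ('C', 'A'), ('D', 'A'), ('C', 'B'), ('A', 'D'), ('B', 'D'), ('B', 'C');
""")


def most_cited(category=None, date_from=None, limit=20):
    """Rank papers by inbound citation edges, filtering on the cited paper."""
    sql = """
        SELECT p.arxiv_id, COUNT(r.target_arxiv_id) AS citation_count, p.title
        FROM refs AS r
        JOIN papers AS p ON p.arxiv_id = r.target_arxiv_id
        WHERE (? IS NULL OR p.category = ?)
          AND (? IS NULL OR p.published >= ?)
        GROUP BY p.arxiv_id
        ORDER BY citation_count DESC, p.arxiv_id
        LIMIT ?
    """
    rows = conn.execute(sql, (category, category, date_from, date_from, limit)).fetchall()
    return [{"arxiv_id": a, "citation_count": n, "title": t} for a, n, t in rows]
```

Each count is the number of edges pointing at a paper, so the ranking only reflects citations from papers that are themselves in the database.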

08 arxiv_citation_network

Citation graph for any paper: who cites it, or what it cites. Direction cited_by finds papers in our database that reference this paper. Direction citing shows its full reference list with ArXiv IDs. Depth 2 expands one hop further: the papers that cite the papers that cite it.

arxiv_citation_network(
  arxiv_id="2302.13971",  # LLaMA
  direction="cited_by",
  depth=2
)

→"Which papers in the database cite the LLaMA paper (2302.13971)? Show depth 2"

→"What does GPT-3 (2005.14165) cite? Show its outbound references with titles"

→"Find everything building on the Attention is All You Need paper — cited_by, depth 1"
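
The depth parameter amounts to a breadth-first expansion over citation edges. The sketch below assumes edges are (source, target) pairs meaning "source cites target" and returns one list of IDs per hop; the function shape and the sample edges are illustrative, not the server's actual response format.

```python
from collections import defaultdict


def citation_network(edges, arxiv_id, direction="cited_by", depth=1):
    """Expand the citation graph hop by hop; returns one sorted ID list per level."""
    if direction not in ("cited_by", "citing"):
        raise ValueError(f"unknown direction: {direction!r}")

    # Build an adjacency map in the direction we want to walk.
    neighbors = defaultdict(set)
    for source, target in edges:
        if direction == "cited_by":
            neighbors[target].add(source)   # who cites this paper
        else:
            neighbors[source].add(target)   # what this paper cites

    seen = {arxiv_id}
    frontier = [arxiv_id]
    levels = []
    for _ in range(depth):
        found = set()
        for node in frontier:
            found |= neighbors.get(node, set())
        next_frontier = sorted(found - seen)  # skip papers already visited
        if not next_frontier:
            break
        seen.update(next_frontier)
        levels.append(next_frontier)
        frontier = next_frontier
    return levels


# Made-up edges: (citing paper, cited paper).
sample_edges = [("B", "A"), ("C", "A"), ("D", "B"), ("E", "D"), ("A", "Z")]
two_hops = citation_network(sample_edges, "A", direction="cited_by", depth=2)
```

Tracking visited IDs keeps citation cycles from being reported twice or looping forever.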

09 arxiv_co_occurrence ★ Exclusive

Papers that mention BOTH entity A and entity B: the intersection over 125K+ papers via two JOINs. Reveals research that explicitly bridges two concepts. The papers where LoRA meets MMLU. Where RAG meets Chain-of-Thought. Where GPT-4 meets RLHF. No hallucination, explicit graph edges only.

arxiv_co_occurrence(
  entity_a="LoRA",
  entity_b="MMLU",
  date_from="2024-01-01"
)
→ papers where BOTH appear — explicit edges

→"Find all papers mentioning both GPT-4 AND RLHF — the fine-tuning alignment intersection"

→"Which papers combine LoRA and MMLU? Show me research benchmarking fine-tuned models"

→"Find papers where RAG and Chain-of-Thought appear together — both must be in the same paper"
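
The "two JOINs" intersection can be sketched as a self-join on a paper-to-entity edge table. The papers and paper_entities tables, their columns, and the sample rows are hypothetical stand-ins for whatever schema the server actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: one row per explicit paper-to-entity edge.
CREATE TABLE papers (arxiv_id TEXT PRIMARY KEY, published TEXT);
CREATE TABLE paper_entities (arxiv_id TEXT, entity TEXT);
INSERT INTO papers VALUES
    ('p1', '2023-06-01'), ('p2', '2024-02-01'),
    ('p3', '2024-03-01'), ('p4', '2024-05-01');
INSERT INTO paper_entities VALUES
    ('p1', 'LoRA'), ('p1', 'MMLU'),
    ('p2', 'LoRA'),
    ('p3', 'MMLU'),
    ('p4', 'LoRA'), ('p4', 'MMLU'), ('p4', 'RAG');
""")


def co_occurrence(entity_a, entity_b, date_from=None):
    """Return papers that have an edge to both entities (one JOIN per entity)."""
    sql = """
        SELECT DISTINCT p.arxiv_id
        FROM papers AS p
        JOIN paper_entities AS a ON a.arxiv_id = p.arxiv_id AND a.entity = ?
        JOIN paper_entities AS b ON b.arxiv_id = p.arxiv_id AND b.entity = ?
        WHERE (? IS NULL OR p.published >= ?)
        ORDER BY p.arxiv_id
    """
    rows = conn.execute(sql, (entity_a, entity_b, date_from, date_from))
    return [row[0] for row in rows]
```

A paper survives both inner joins only if it has a stored edge to each entity, so nothing is inferred beyond what is in the table.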

11 arxiv_repo_landscape ★ Exclusive

GitHub organizations and repositories ranked by paper count: the open-source output of the research community. The repos.org field is extracted from the GitHub links found in paper HTML. Shows which orgs (openai, google-deepmind, microsoft, huggingface) ship the most research code. Filterable by org name and date.

arxiv_repo_landscape(
  org_filter="google-deepmind",
  date_from="2025-01-01"
)
→ org_ranking[] + repo_ranking[]

→"Which GitHub organizations publish the most research code? Rank by paper count"

→"Show me all repos from google-deepmind linked to papers in 2025 — how many papers per repo?"

→"Which orgs released the most cs.AI code in 2025 — compare openai vs huggingface vs microsoft"
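
Extracting an org from a GitHub link and ranking orgs by paper count could look roughly like the sketch below. The regular expression, the lowercasing of org names, and the sample links (with placeholder org names) are assumptions, not the server's actual extraction rules.

```python
import re

# Matches https://github.com/<org>/<repo>, ignoring any trailing path.
GITHUB_REPO = re.compile(r"https?://github\.com/([A-Za-z0-9-]+)/([A-Za-z0-9._-]+)")


def parse_repo(url):
    """Return (org, repo) for a GitHub URL, or None if it is not one."""
    match = GITHUB_REPO.match(url)
    if match is None:
        return None
    org, repo = match.group(1).lower(), match.group(2)
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return org, repo


def org_ranking(paper_links):
    """Rank orgs by the number of distinct papers linking to any of their repos."""
    papers_per_org = {}
    for arxiv_id, url in paper_links:
        parsed = parse_repo(url)
        if parsed is not None:
            papers_per_org.setdefault(parsed[0], set()).add(arxiv_id)
    ranked = [(org, len(ids)) for org, ids in papers_per_org.items()]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Made-up (paper, link) pairs with placeholder org names.
sample_links = [
    ("p1", "https://github.com/example-lab/model-a"),
    ("p2", "https://github.com/Example-Lab/model-a.git"),
    ("p2", "https://github.com/other-org/tool/tree/main"),
    ("p3", "https://example.com/not-github"),
]
ranking = org_ranking(sample_links)
```

Counting distinct papers per org, rather than raw links, keeps a paper that links the same repo twice from inflating the ranking.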

12 arxiv_pipeline_status

Full system status in one call — database counts, pipeline progress percentages, frontier date (how far back the backfill has reached), quality report from the last automated check, and the 5 most recent quality log entries. The health dashboard for the entire knowledge graph.

arxiv_pipeline_status()
→ counts: papers, authors, refs, repos, entities
→ pipeline: html_fetched 6.4% · wl_done 32.2%
→ frontier: "2025-03-09" · alert: null
→ quality_status: "OK" · last_run: "2026-04-05"

→"How many papers are in the database right now? What's the current frontier date?"

→"What percentage of papers have been HTML-parsed? How far has the pipeline progressed?"

→"Show me the quality report — any warnings or critical issues in the last check?"
