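OpenZIM MCP is a modern, secure, high-performance MCP (Model Context Protocol) server that lets AI models access and search ZIM-format knowledge bases offline. It turns static ZIM archives into a dynamic knowledge engine for large language models, giving them intelligent, structured access so they can navigate and understand large knowledge repositories effectively.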
# Install from PyPI (recommended)
pip install openzim-mcp
For contributors and developers:
# Clone the repository
git clone https://github.com/cameronrye/openzim-mcp.git
cd openzim-mcp
# Install dependencies
uv sync
# Install development dependencies
uv sync --dev
Download ZIM files (for example Wikipedia or Wiktionary) from the Kiwix Library and place them in a directory:
mkdir ~/zim-files
# Download ZIM files into ~/zim-files/
# Use the console script (after installing with pip)
openzim-mcp /path/to/zim/files
# Or run it as a module
python -m openzim_mcp /path/to/zim/files
# Development (run from source)
uv run python -m openzim_mcp /path/to/zim/files
# Or use make (development)
make run ZIM_DIR=/path/to/zim/files
Add the server to your MCP client configuration:
{
"openzim-mcp": {
"command": "openzim-mcp",
"args": ["/path/to/zim/files"]
}
}
Alternative configuration using the Python module:
{
"openzim-mcp": {
"command": "python",
"args": [
"-m",
"openzim_mcp",
"/path/to/zim/files"
]
}
}
Development (from source):
{
"openzim-mcp": {
"command": "uv",
"args": [
"--directory",
"/path/to/openzim-mcp",
"run",
"python",
"-m",
"openzim_mcp",
"/path/to/zim/files"
]
}
}
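The three client configurations above differ only in the command and its leading arguments. As a minimal sketch, the helper below generates each variant from a ZIM directory path. `build_client_config` and its `mode` names are hypothetical and not part of OpenZIM MCP; the helper only reproduces the JSON shapes shown above.

```python
import json


def build_client_config(zim_dir, mode="script", source_dir=None):
    """Return an MCP client config entry for one of the three launch modes.

    Hypothetical helper: it mirrors the JSON examples above and nothing more.
    """
    if mode == "script":
        # Console script installed by pip.
        command, args = "openzim-mcp", [zim_dir]
    elif mode == "module":
        # Run the package as a Python module.
        command, args = "python", ["-m", "openzim_mcp", zim_dir]
    elif mode == "dev":
        # Run from a source checkout through uv.
        if source_dir is None:
            raise ValueError("dev mode needs the path to the source checkout")
        command = "uv"
        args = ["--directory", source_dir, "run", "python", "-m", "openzim_mcp", zim_dir]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return {"openzim-mcp": {"command": command, "args": args}}


if __name__ == "__main__":
    print(json.dumps(build_client_config("/path/to/zim/files"), indent=2))
```

Generating the entry from one function keeps the ZIM directory in a single place if you switch between an installed package and a source checkout.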
No parameters required.
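Required parameters:
- zim_file_path (string): path to the ZIM file.
- query (string): search query.
Optional parameters:
- limit (integer, default: 10): maximum number of results to return.
- offset (integer, default: 0): starting offset for results (for pagination).
Required parameters:
- zim_file_path (string): path to the ZIM file.
- entry_path (string): path of the entry, for example 'A/Some_Article'.
Optional parameters:
- max_content_length (integer, default: 100000, minimum: 1000): maximum length of the returned content.
Smart retrieval features: see the description of the smart entry retrieval system later in this document.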
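Required parameters:
- zim_file_path (string): path to the ZIM file.
Returns: a JSON string with ZIM metadata, including entry counts, archive information, and metadata entries such as title, description, language, and creator.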
Required parameters:
- zim_file_path (string): path to the ZIM file.
Returns: the main page content, or information about the main page entry.
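Required parameters:
- zim_file_path (string): path to the ZIM file.
Returns: a JSON string with namespace information, including entry counts, descriptions, and sample entries for each namespace (C, M, W, X, and so on).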
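Required parameters:
- zim_file_path (string): path to the ZIM file.
- namespace (string): namespace to browse (C, M, W, X, A, I, and so on).
Optional parameters:
- limit (integer, default: 50, range: 1-200): maximum number of entries to return.
- offset (integer, default: 0): starting offset for pagination.
Returns: a JSON string with the namespace entries, including titles, content previews, and pagination information.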
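Required parameters:
- zim_file_path (string): path to the ZIM file.
- query (string): search query.
Optional parameters:
- namespace (string): namespace filter (C, M, W, X, and so on).
- content_type (string): content type filter (text/html, text/plain, and so on).
- limit (integer, default: 10, range: 1-100): maximum number of results to return.
- offset (integer, default: 0): starting offset for pagination.
Returns: filtered search results with namespace and content type information.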
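Required parameters:
- zim_file_path (string): path to the ZIM file.
- partial_query (string): partial search query (at least 2 characters).
Optional parameters:
- limit (integer, default: 10, range: 1-50): maximum number of suggestions to return.
Returns: a JSON string with search suggestions based on article titles and content.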
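Required parameters:
- zim_file_path (string): path to the ZIM file.
- entry_path (string): path of the entry, for example 'C/Some_Article'.
Returns: a JSON string with the article structure, including headings, sections, metadata, and word count.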
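Required parameters:
- zim_file_path (string): path to the ZIM file.
- entry_path (string): path of the entry, for example 'C/Some_Article'.
Returns: a JSON string with categorized links (internal, external, media), including their titles and metadata.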
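The parameter reference above states several numeric constraints: limit ranges, a minimum max_content_length, and a minimum partial_query length. The following is a minimal client-side sketch that checks a call against those constraints before sending it. `validate_arguments` is a hypothetical helper, not part of the server, and the tool names are taken from the request examples in this document.

```python
# Ranges copied from the parameter reference; the helper itself is hypothetical.
LIMIT_RANGES = {
    "browse_namespace": (1, 200),
    "search_with_filters": (1, 100),
    "get_search_suggestions": (1, 50),
}
MIN_CONTENT_LENGTH = 1000
MIN_PARTIAL_QUERY_CHARS = 2


def validate_arguments(tool, arguments):
    """Return a list of problems found in a tool call's arguments (empty if none)."""
    problems = []

    # Range check for tools whose limit has a documented range.
    limit = arguments.get("limit")
    if tool in LIMIT_RANGES and limit is not None:
        low, high = LIMIT_RANGES[tool]
        if not low <= limit <= high:
            problems.append(f"limit must be between {low} and {high}, got {limit}")

    # get_zim_entry rejects very small content windows.
    if tool == "get_zim_entry":
        length = arguments.get("max_content_length")
        if length is not None and length < MIN_CONTENT_LENGTH:
            problems.append(f"max_content_length must be at least {MIN_CONTENT_LENGTH}")

    # Suggestions need at least two characters to work with.
    if tool == "get_search_suggestions":
        partial = arguments.get("partial_query", "")
        if len(partial) < MIN_PARTIAL_QUERY_CHARS:
            problems.append(
                f"partial_query needs at least {MIN_PARTIAL_QUERY_CHARS} characters"
            )

    return problems
```

Checking locally avoids a round trip for requests the server would reject anyway. How the server itself reports such errors is not covered here.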
{
"name": "list_zim_files"
}
Response:
Found 1 ZIM file in 1 directory:
[
{
"name": "wikipedia_en_100_2025-08.zim",
"path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
"directory": "C:\\zim",
"size": "310.77 MB",
"modified": "2025-09-11T10:20:50.148427"
}
]
{
"name": "search_zim_file",
"arguments": {
"zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
"query": "biology",
"limit": 3
}
}
Response:
Found 51 matches for "biology", showing 1-3:
## 1. Taxonomy (biology)
Path: Taxonomy_(biology)
Snippet: # Taxonomy (biology) Part of a series on
---
Evolutionary biology
Darwin's finches by John Gould
* Index
* Introduction
* [Main](Evolution "Evolution")
* Outline
## 2. Protein
Path: Protein
Snippet: # Protein A representation of the 3D structure of the protein myoglobin showing turquoise α-helices. This protein was the first to have its structure solved by X-ray crystallography. Toward the right-center among the coils, a prosthetic group called a heme group (shown in gray) with a bound oxygen molecule (red).
## 3. Ant
Path: Ant
Snippet: # Ant Ants
Temporal range: Late Aptian – Present
---
Fire ants
[Scientific classification](Taxonomy_\(biology\) "Taxonomy \(biology\)")
Kingdom: | [Animalia](Animal "Animal")
Phylum: | [Arthropoda](Arthropod "Arthropod")
Class: | [Insecta](Insect "Insect")
Order: | Hymenoptera
Infraorder: | Aculeata
Superfamily: |
Latreille, 1809[1]
Family: |
Latreille, 1809
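The search response above is plain text with a regular layout: each hit starts with a numbered "## N. Title" heading, followed by a "Path:" line and a free-form snippet. As a minimal sketch, assuming that layout holds for every hit, a client could extract ranks, titles, and paths as shown below. `parse_search_results` is a hypothetical helper, not something the server provides.

```python
import re

# A hit heading looks like "## 2. Protein"; snippet lines never carry the number prefix.
ENTRY_HEADING = re.compile(r"^## (\d+)\. (.+)$")


def parse_search_results(text):
    """Extract rank, title, and path for each hit in a search_zim_file response."""
    results = []
    current = None
    for line in text.splitlines():
        heading = ENTRY_HEADING.match(line)
        if heading:
            current = {
                "rank": int(heading.group(1)),
                "title": heading.group(2),
                "path": None,
            }
            results.append(current)
        elif current is not None and current["path"] is None and line.startswith("Path: "):
            # Only the first Path: line after a heading belongs to the hit.
            current["path"] = line[len("Path: "):]
    return results
```

The extracted paths can be passed directly as entry_path to get_zim_entry, as the next example does with "Protein".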
{
"name": "get_zim_entry",
"arguments": {
"zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
"entry_path": "Protein"
}
}
Response:
# Protein
Path: Protein
Type: text/html
## Content
# Protein
A representation of the 3D structure of the protein myoglobin showing turquoise α-helices. This protein was the first to have its structure solved by X-ray crystallography. Toward the right-center among the coils, a prosthetic group called a heme group (shown in gray) with a bound oxygen molecule (red).
**Proteins** are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, responding to stimuli, providing structure to cells and organisms, and transporting molecules from one location to another. Proteins differ from one another primarily in their sequence of amino acids, which is dictated by the nucleotide sequence of their genes, and which usually results in protein folding into a specific 3D structure that determines its activity.
A linear chain of amino acid residues is called a polypeptide. A protein contains at least one long polypeptide. Short polypeptides, containing less than 20–30 residues, are rarely considered to be proteins and are commonly called peptides.
... [Content truncated, total of 56,202 characters, only showing first 1,500 characters] ...
{
"name": "get_zim_entry",
"arguments": {
"zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
"entry_path": "A/Test Article"
}
}
Response (showing smart retrieval at work):
# Test Article
Requested Path: A/Test Article
Actual Path: A/Test_Article
Type: text/html
## Content
# Test Article
This article demonstrates the smart retrieval system automatically handling
path encoding differences. The system tried "A/Test Article" directly,
then automatically searched and found "A/Test_Article".
... [Content continues] ...
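No parameters required.
Returns: server health information such as status, cache state, and instance tracking.
Example response: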
{
"status": "healthy",
"server_name": "openzim-mcp",
"allowed_directories": 1,
"cache": {
"enabled": true,
"size": 1,
"max_size": 100,
"ttl_seconds": 3600
},
"instance_tracking": {
"active_instances": 1,
"conflicts_detected": 0
}
}
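No parameters required.
Returns: the full server configuration, including diagnostic information, validation results, and conflict detection.
Example response: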
{
"configuration": {
"server_name": "openzim-mcp",
"allowed_directories": ["/path/to/zim/files"],
"cache_enabled": true,
"config_hash": "abc123...",
"server_pid": 12345
},
"diagnostics": {
"validation_status": "healthy",
"conflicts_detected": [],
"warnings": [],
"recommendations": []
}
}
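No parameters required.
Returns: detailed diagnostic information, including instance conflicts, configuration validation, file accessibility checks, and actionable recommendations.
Example response: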
{
"status": "healthy",
"server_info": {
"pid": 12345,
"server_name": "openzim-mcp",
"config_hash": "abc123..."
},
"conflicts": [],
"issues": [],
"recommendations": ["Server appears to be running normally"],
"environment_checks": {
"directories_accessible": true,
"cache_functional": true
}
}
No parameters required.
Returns: the result of conflict resolution, including cleanup actions and recommendations.
Example response:
{
"status": "success",
"cleanup_results": {
"stale_instances_removed": 2
},
"conflicts_found": [],
"actions_taken": ["Removed 2 stale instance files"],
"recommendations": ["No active conflicts detected"]
}
{
"name": "search_zim_file",
"arguments": {
"zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
"query": "computer",
"limit": 2
}
}
Response:
Found 39 matches for "computer", showing 1-2:
## 1. Video game
Path: Video_game
Snippet: # Video game First-generation _Pong_ console at the Computerspielemuseum Berlin
---
Platforms
## 2. Protein
Path: Protein
Snippet: # Protein A representation of the 3D structure of the protein myoglobin showing turquoise α-helices. This protein was the first to have its structure solved by X-ray crystallography. Toward the right-center among the coils, a prosthetic group called a heme group (shown in gray) with a bound oxygen molecule (red).
{
"name": "get_zim_entry",
"arguments": {
"zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
"entry_path": "Evolution",
"max_content_length": 1500
}
}
Response:
# Evolution
Path: Evolution
Type: text/html
## Content
# Evolution
Part of the Biology series on
---
****
Mechanisms and processes
* Adaptation
* Genetic drift
* Gene flow
* History of life
* Maladaptation
* Mutation
* Natural selection
* Neutral theory
* Population genetics
* Speciation
... [Content truncated, total of 110,237 characters, only showing first 1,500 characters] ...
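The truncated response above ends with a notice that reports the total length and the number of characters shown. A minimal sketch of that behavior follows. `truncate_content` is a hypothetical function, the notice wording is an English rendering of the one in the example, and the actual server may cut at different boundaries.

```python
def truncate_content(content, max_content_length=100000):
    """Cut content to max_content_length characters and append a truncation notice.

    Sketch only: the default and the minimum of 1000 come from the parameter
    reference; everything else is an assumption.
    """
    if max_content_length < 1000:
        raise ValueError("max_content_length must be at least 1000")
    total = len(content)
    if total <= max_content_length:
        # Short entries are returned unchanged, without a notice.
        return content
    notice = (
        f"... [Content truncated, total of {total:,} characters, "
        f"only showing first {max_content_length:,} characters] ..."
    )
    return content[:max_content_length] + "\n\n" + notice
```

When a client sees the notice, it can repeat the request with a larger max_content_length to retrieve more of the entry.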
{
"name": "get_zim_metadata",
"arguments": {
"zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim"
}
}
Response:
{
"entry_count": 100000,
"all_entry_count": 120000,
"article_count": 80000,
"media_count": 20000,
"metadata_entries": {
"Title": "Wikipedia (English)",
"Description": "Wikipedia articles in English",
"Language": "eng",
"Creator": "Kiwix",
"Date": "2025-08-15"
}
}
{
"name": "browse_namespace",
"arguments": {
"zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
"namespace": "C",
"limit": 5,
"offset": 0
}
}
Response:
{
"namespace": "C",
"total_in_namespace": 80000,
"offset": 0,
"limit": 5,
"returned_count": 5,
"has_more": true,
"entries": [
{
"path": "C/Biology",
"title": "Biology",
"content_type": "text/html",
"preview": "Biology is the scientific study of life..."
}
]
}
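The browse_namespace response carries everything needed for pagination: offset, returned_count, and has_more. The sketch below walks a whole namespace page by page. `fetch_page` is a hypothetical caller-supplied function that performs the actual tool call and returns the parsed JSON, and `fake_fetch` is an in-memory stand-in used only to make the example runnable.

```python
def iter_namespace_entries(fetch_page, namespace, page_size=50):
    """Yield every entry in a namespace by following offset and has_more."""
    offset = 0
    while True:
        page = fetch_page(namespace=namespace, limit=page_size, offset=offset)
        yield from page["entries"]
        # Stop when the server reports no more data, or on an empty page.
        if not page["has_more"] or page["returned_count"] == 0:
            break
        offset += page["returned_count"]


def fake_fetch(namespace, limit, offset):
    """In-memory stand-in for a real browse_namespace call (7 made-up entries)."""
    paths = [f"{namespace}/Entry_{i}" for i in range(7)]
    chunk = paths[offset:offset + limit]
    return {
        "namespace": namespace,
        "total_in_namespace": len(paths),
        "offset": offset,
        "limit": limit,
        "returned_count": len(chunk),
        "has_more": offset + len(chunk) < len(paths),
        "entries": [{"path": path} for path in chunk],
    }


if __name__ == "__main__":
    for entry in iter_namespace_entries(fake_fetch, "C", page_size=3):
        print(entry["path"])
```

Advancing by returned_count rather than by the requested limit keeps the loop correct if a page comes back shorter than requested.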
{
"name": "search_with_filters",
"arguments": {
"zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
"query": "evolution",
"namespace": "C",
"content_type": "text/html",
"limit": 3
}
}
{
"name": "get_article_structure",
"arguments": {
"zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
"entry_path": "C/Evolution"
}
}
Response:
{
"title": "Evolution",
"path": "C/Evolution",
"content_type": "text/html",
"headings": [
{"level": 1, "text": "Evolution", "id": "evolution"},
{"level": 2, "text": "History", "id": "history"},
{"level": 2, "text": "Mechanisms", "id": "mechanisms"}
],
"sections": [
{
"title": "Evolution",
"level": 1,
"content_preview": "Evolution is the change in heritable traits...",
"word_count": 150
}
],
"word_count": 5000
}
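The headings list in the get_article_structure response is enough to build a table of contents before fetching the full article. As a minimal sketch, assuming every heading has the level, text, and id fields shown above, the hypothetical helper `render_outline` indents each heading by its level:

```python
def render_outline(structure):
    """Render the 'headings' list of an article structure as an indented outline."""
    lines = []
    for heading in structure["headings"]:
        # Level 1 is flush left; each deeper level adds two spaces.
        indent = "  " * (heading["level"] - 1)
        lines.append(f"{indent}- {heading['text']} (#{heading['id']})")
    return "\n".join(lines)


if __name__ == "__main__":
    example = {
        "headings": [
            {"level": 1, "text": "Evolution", "id": "evolution"},
            {"level": 2, "text": "History", "id": "history"},
            {"level": 2, "text": "Mechanisms", "id": "mechanisms"},
        ]
    }
    print(render_outline(example))
```

An outline like this lets a model decide which section it needs before requesting a long article.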
{
"name": "get_search_suggestions",
"arguments": {
"zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
"partial_query": "bio",
"limit": 5
}
}
Response:
{
"partial_query": "bio",
"suggestions": [
{"text": "Biology", "path": "C/Biology", "type": "title_start_match"},
{"text": "Biochemistry", "path": "C/Biochemistry", "type": "title_start_match"},
{"text": "Biodiversity", "path": "C/Biodiversity", "type": "title_start_match"}
],
"count": 3
}
{
"name": "get_server_health"
}
Response:
{
"status": "healthy",
"server_name": "openzim-mcp",
"uptime_info": {
"process_id": 12345,
"started_at": "2025-09-14T10:30:00"
},
"cache_performance": {
"enabled": true,
"size": 15,
"max_size": 100,
"hit_rate": 0.85
},
"instance_tracking": {
"active_instances": 1,
"conflicts_detected": 0
}
}
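A monitoring script can reduce the health response above to a short list of warnings. The sketch below is one possible reading of the fields shown in the example. `summarize_health` is hypothetical, and the 0.5 hit-rate threshold is an arbitrary assumption rather than a value recommended by the project.

```python
def summarize_health(health, min_hit_rate=0.5):
    """Return warnings derived from a get_server_health response (empty if all is well)."""
    warnings = []

    if health.get("status") != "healthy":
        warnings.append(f"status is {health.get('status')!r}")

    tracking = health.get("instance_tracking", {})
    if tracking.get("conflicts_detected", 0) > 0:
        warnings.append("conflicts detected; consider resolve_server_conflicts()")

    cache = health.get("cache_performance", {})
    hit_rate = cache.get("hit_rate")
    # Only judge the hit rate when caching is on and a rate is reported.
    if cache.get("enabled") and hit_rate is not None and hit_rate < min_hit_rate:
        warnings.append(f"cache hit rate {hit_rate:.0%} is below {min_hit_rate:.0%}")

    return warnings
```

For the example response, with status "healthy", a hit rate of 0.85, and no conflicts, the function returns an empty list.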
{
"name": "diagnose_server_state"
}
Response:
{
"status": "healthy",
"server_info": {
"pid": 12345,
"server_name": "openzim-mcp",
"config_hash": "abc123def456..."
},
"conflicts": [],
"issues": [],
"recommendations": ["Server appears to be running normally. No issues detected."],
"environment_checks": {
"directories_accessible": true,
"cache_functional": true,
"zim_files_found": 5
}
}
{
"name": "resolve_server_conflicts"
}
Response:
{
"status": "success",
"cleanup_results": {
"stale_instances_removed": 2,
"files_cleaned": ["/home/user/.openzim_mcp_instances/server_99999.json"]
},
"conflicts_found": [],
"actions_taken": ["Removed 2 stale instance files"],
"recommendations": ["No active conflicts detected after cleanup"]
}
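OpenZIM MCP implements a smart entry retrieval system that automatically handles the path encoding inconsistencies common in ZIM files:
- A/Test Article → A/Test_Article (spaces converted to underscores)
- C/Café → C/Caf%C3%A9 (URL encoding differences)
- A/Some-Page → A/Some_Page (hyphens converted to underscores)
Direct entry access: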
{
"name": "get_zim_entry",
"arguments": {
"zim_file_path": "/path/to/file.zim",
"entry_path": "A/Article_Name"
}
}
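When an entry is not found, the system automatically provides guidance:
Entry not found: 'A/Article_Name'.
The entry path may not exist in this ZIM file.
Try using search_zim_file() to find available entries,
or browse_namespace() to explore the file structure.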
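The three path variations handled by smart retrieval (spaces to underscores, URL encoding, hyphens to underscores) can be generated mechanically. The sketch below lists candidate paths a client or server might try in order. It is not the project's actual implementation, which, according to the example response earlier, falls back to a search when the direct path fails; `candidate_paths` is a hypothetical helper.

```python
from urllib.parse import quote, unquote


def candidate_paths(entry_path):
    """Return plausible variants of an entry path, original first, without duplicates."""
    variants = [
        entry_path,
        entry_path.replace(" ", "_"),                    # A/Test Article -> A/Test_Article
        entry_path.replace("-", "_"),                    # A/Some-Page -> A/Some_Page
        entry_path.replace(" ", "_").replace("-", "_"),  # both substitutions at once
        quote(entry_path),                               # C/Café -> C/Caf%C3%A9 ("/" is kept)
        unquote(entry_path),                             # C/Caf%C3%A9 -> C/Café
    ]
    unique = []
    for variant in variants:
        if variant not in unique:
            unique.append(variant)
    return unique


if __name__ == "__main__":
    for path in ("A/Test Article", "C/Café", "A/Some-Page"):
        print(path, "->", candidate_paths(path))
```

Trying the original path first keeps the common case cheap; the variants only matter when the direct lookup fails.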
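- The max_content_length parameter of get_zim_entry must be at least 1000 characters.
- The examples were tested with a real ZIM file (wikipedia_en_100_2025-08.zim).
- Paths in the examples use escaped backslashes (\\).
When issues are detected, OpenZIM MCP automatically includes conflict warnings in search results and file listings: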
🔍 **Server conflict detected**
⚠️ Configuration mismatch with server PID 12345. Search results may be inconsistent.
💡 Use 'resolve_server_conflicts()' to fix these issues.
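- Use diagnose_server_state() to check for conflicts.
- Use resolve_server_conflicts() to clean up stale instances.
- Use get_server_health() to monitor server health and get instance tracking information.
OpenZIM MCP can be configured through environment variables with the OPENZIM_MCP_ prefix: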
# Cache configuration
export OPENZIM_MCP_CACHE__ENABLED=true
export OPENZIM_MCP_CACHE__MAX_SIZE=200
export OPENZIM_MCP_CACHE__TTL_SECONDS=7200
# Content configuration
export OPENZIM_MCP_CONTENT__MAX_CONTENT_LENGTH=200000
export OPENZIM_MCP_CONTENT__SNIPPET_LENGTH=2000
export OPENZIM_MCP_CONTENT__DEFAULT_SEARCH_LIMIT=20
# Logging configuration
export OPENZIM_MCP_LOGGING__LEVEL=DEBUG
export OPENZIM_MCP_LOGGING__FORMAT="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
# Server configuration
export OPENZIM_MCP_SERVER_NAME=my_openzim_mcp_server
| Setting | Default | Description |
|---|---|---|
| OPENZIM_MCP_CACHE__ENABLED | true | Enable or disable caching |
| OPENZIM_MCP_CACHE__MAX_SIZE | 100 | Maximum number of cache entries |
| OPENZIM_MCP_CACHE__TTL_SECONDS | 3600 | Cache TTL in seconds |
| OPENZIM_MCP_CONTENT__MAX_CONTENT_LENGTH | 100000 | Maximum content length |
| OPENZIM_MCP_CONTENT__SNIPPET_LENGTH | 1000 | Maximum snippet length |
| OPENZIM_MCP_CONTENT__DEFAULT_SEARCH_LIMIT | 10 | Default search result limit |
| OPENZIM_MCP_LOGGING__LEVEL | INFO | Log level |
| OPENZIM_MCP_LOGGING__FORMAT | %(asctime)s - %(name)s - %(levelname)s - %(message)s | Log message format |
| OPENZIM_MCP_SERVER_NAME | openzim-mcp | Server instance name |
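The variable names above follow a visible pattern: the OPENZIM_MCP_ prefix, then a section and a key separated by a double underscore, with SERVER_NAME as a top-level key. The sketch below shows how such names could be folded into a nested dictionary. It illustrates the naming scheme only; how the server actually loads its settings is not described here, and `load_settings` is a hypothetical function that leaves every value as a string.

```python
import os

PREFIX = "OPENZIM_MCP_"
NESTED_DELIMITER = "__"


def load_settings(environ=None):
    """Collect OPENZIM_MCP_* variables into a nested dict with lowercase keys."""
    environ = os.environ if environ is None else environ
    settings = {}
    for key, value in environ.items():
        if not key.startswith(PREFIX):
            continue
        # CACHE__MAX_SIZE -> ["cache", "max_size"]; SERVER_NAME -> ["server_name"]
        parts = key[len(PREFIX):].lower().split(NESTED_DELIMITER)
        node = settings
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value  # values are kept as strings in this sketch
    return settings


if __name__ == "__main__":
    example_env = {
        "OPENZIM_MCP_CACHE__ENABLED": "true",
        "OPENZIM_MCP_CACHE__MAX_SIZE": "200",
        "OPENZIM_MCP_SERVER_NAME": "my_openzim_mcp_server",
    }
    print(load_settings(example_env))
```

The double underscore matters because single underscores already occur inside key names such as MAX_SIZE and SERVER_NAME.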
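The project includes comprehensive tests with over 90% coverage, using both mock data and real ZIM files.
OpenZIM MCP uses a hybrid testing approach:
- ZIM_TEST_DATA_DIR specifies a custom test data location.
# Run tests with a coverage report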
make test-cov
# View the coverage report
open htmlcov/index.html
# Run the full test suite with real ZIM files
make test-with-zim-data
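Tests are organized with pytest markers:
- @pytest.mark.requires_zim_data: tests that need ZIM test data files.
- @pytest.mark.integration: integration tests.
- @pytest.mark.slow: long-running tests.
OpenZIM MCP provides built-in monitoring capabilities: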
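This project uses Semantic Versioning, with automated version management through release-please.
Version bumps and releases happen automatically based on Conventional Commits:
- feat: new feature (minor version bump)
- fix: bug fix (patch version bump)
- feat!: or BREAKING CHANGE: breaking change (major version bump)
- perf: performance improvement (patch version bump)
- docs:, style:, refactor:, test:, chore: no version bump
The project uses an improved, unified release system with automated validation. See the Release Process Guide for its key features and detailed instructions.
<type>[optional scope]: <description>
[optional body]
[optional footer(s)]
Examples: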
feat: add search suggestions endpoint
fix: resolve path traversal vulnerability
feat!: change API response format
docs: update installation instructions
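The mapping from commit type to version bump can be written as a small function. The sketch below is not how release-please is implemented; it is a simplified illustration of the rules listed in this document, with `version_bump` as a hypothetical name. Following the Conventional Commits specification, it treats a "!" after any type as a breaking change, not only after feat.

```python
import re

# Header shape: type, optional (scope), optional "!", then ": description".
HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<breaking>!)?: .+")


def version_bump(commit_message):
    """Return 'major', 'minor', 'patch', or None for a Conventional Commits message."""
    lines = commit_message.splitlines()
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        return None  # not a conventional commit header
    has_breaking_footer = any(line.startswith("BREAKING CHANGE:") for line in lines[1:])
    if match.group("breaking") or has_breaking_footer:
        return "major"
    if match.group("type") == "feat":
        return "minor"
    if match.group("type") in ("fix", "perf"):
        return "patch"
    return None  # docs, style, refactor, test, chore: no bump


if __name__ == "__main__":
    for message in (
        "feat: add search suggestions endpoint",
        "fix: resolve path traversal vulnerability",
        "feat!: change API response format",
        "docs: update installation instructions",
    ):
        print(f"{message!r} -> {version_bump(message)}")
```

Applied to the four example commits, the function yields minor, patch, major, and no bump respectively.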
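- Create a feature branch (git checkout -b feature/amazing-feature).
- Run the checks (make check).
- Commit your changes (git commit -m 'feat: add amazing feature').
- Push the branch (git push origin feature/amazing-feature).
This project is licensed under the MIT License. See the LICENSE file for details.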