Introduction
mnem is a knowledge-graph substrate. It stores nodes as content-addressed objects, retrieves them with vector + sparse + graph signals, and exposes the result over CLI, HTTP, and MCP surfaces.
What it does
- Content-addressed nodes - every node has a CID; identical content collapses to one node.
- Versioned commits - every change is a commit with a parent chain (Git-style for graphs).
- Hybrid retrieval - vector (HNSW), sparse (BM25 / SPLADE), and graph traversal in one query.
- In-process embedder - bundled ONNX MiniLM-L6-v2 (no Ollama / API keys required).
- MCP-native - drop-in memory layer for Claude / Cursor / any MCP client.
- WASM target - same core compiles to wasm32 for in-browser use.
What it is not
- A vector database (it’s a graph; vectors are one signal among several).
- An LLM (mnem holds memory; the LLM uses it).
- A finished product. 0.1.0 is the first public cut.
Where to next
- Install - single command per platform.
- Quickstart - five minutes from zero to retrieve.
- Core concepts - what’s a CID, what’s a commit, what’s a label.
Install
mnem ships a single mnem binary plus optional Python and HTTP daemons. Pick
the source that matches your platform.
From Cargo (any platform with Rust toolchain)
cargo install --locked mnem-cli
mnem --version
Requires Rust 1.95+ (see rust-toolchain.toml).
From npm (Node.js users)
npm install -g mnem-cli
mnem --version
# or one-shot via npx
npx mnem-cli --version
Downloads the prebuilt native binary for your platform at install time. Node 18+ required. No Rust toolchain needed.
From PyPI (Python users)
pip install mnem-cli
mnem --version
The PyPI package ships the same mnem binary as a manylinux / macOS / Windows wheel.
From a release binary
Download the platform tarball from the latest GitHub release:
curl -L https://github.com/Uranid/mnem/releases/latest/download/mnem-linux-x86_64.tar.gz | tar xz
sudo mv mnem /usr/local/bin/
mnem --version
Replace linux-x86_64 with linux-aarch64 / macos-x86_64 / macos-aarch64 / windows-x86_64.zip as appropriate.
Per-OS package managers
After v0.2.0, mnem ships only via Cargo and PyPI. The Homebrew
tap, AUR, Nix, winget, and scoop channels have been dropped in favour
of a lean three-channel model (cargo / PyPI / npm). The Cargo channel
supports bundled-embedder, bundled-embedder-cuda,
bundled-embedder-directml feature flags.
macOS / Linux / Windows
# npm (Node 18+, no Rust toolchain needed)
npm install -g mnem-cli
# Cargo (any platform with Rust 1.95+)
cargo install --locked mnem-cli --features bundled-embedder
# or via cargo-binstall (faster, downloads prebuilt)
cargo binstall mnem-cli
# PyPI (Python users)
pip install mnem-cli
Docker
docker run --rm -p 9876:9876 ghcr.io/uranid/mnem:latest http serve
WASM (in-browser)
cargo build --release --target wasm32-unknown-unknown -p mnem-core
See crates/mnem-core/README.md for embedding examples.
Verify
mnem --version
mnem doctor
mnem doctor probes embedder, store, and config - useful first command after install.
Quickstart
Five minutes from zero to retrieve.
1. Install
cargo install --locked mnem-cli
(See Install for other platforms.)
2. Initialise a repo
mkdir my-graph && cd my-graph
mnem init
This creates .mnem/ with default config (in-process MiniLM embedder, redb store).
3. Ingest
mnem ingest README.md
mnem ingest docs/*.md
mnem ingest <(echo '{"text": "the cat sat on the mat", "label": "demo"}') --json
4. Retrieve
mnem retrieve "what does this project do"
mnem retrieve "what is X" --label demo --top-k 5
5. Serve over HTTP (optional)
mnem http serve --repo . # bind 127.0.0.1:9876
curl http://127.0.0.1:9876/v1/retrieve -d '{"text": "what does this do"}'
6. Wire into Claude / Cursor (optional)
mnem mcp install
Adds an MCP server entry to your client config; subsequent agent turns can
call mnem_retrieve and mnem_ingest natively.
Next steps
- CLI reference for every flag.
- MCP server for agent integrations.
- Retrieval tuning for top-K, hybrid, and graph traversal options.
CLI reference
mnem is the single entry point. Subcommands wrap repo operations.
Common subcommands
mnem init [path] # create .mnem/ in path (default: cwd)
mnem ingest <file|-> [...] # add nodes from file or stdin
mnem retrieve <text> [...] # query (vector + sparse + graph)
mnem mcp # start the MCP JSON-RPC server over stdio
mnem mcp --repo ~/notes # point the MCP server at a specific graph
mnem http serve # start the HTTP JSON API (loopback by default)
mnem integrate # wire as MCP server in your agent host
mnem doctor # probe embedder + store + config
Inspection
mnem stats # commits, nodes, embeddings, store size
mnem log [-n N] # commit history
mnem cat-file <cid> # dump a node by CID
mnem diff <cid> <cid> # diff two commits
mnem export # export as CAR archive
Advanced retrieve flags
--limit N # number of items to return (default 10); short: -n
--vector-cap N # candidate pool from vector lane (default 256)
--graph-expand N # multi-hop expansion budget
--graph-mode <decay|ppr> # graph scoring: decay (default) or PPR
--rerank <provider:model> # post-rerank with a model
--summarize # add community summarization layer
--community-filter # Leiden community filter; drop low-coverage communities
Ingest flags
--chunker <auto|paragraph|recursive|session> # chunking strategy (default: auto)
--extractor keybert # enable KeyBERT keyphrase extraction
--max-tokens N # token budget per chunk (default: 512)
--recursive # ingest a directory recursively
For complete option lists run mnem <subcommand> --help. Long-form
documentation for each subcommand lives in guides.
MCP server
mnem implements the Model Context Protocol over stdio. Drop it into any MCP client (Claude Desktop, Cursor, Zed, custom).
Install
mnem integrate # auto-detect installed hosts and wire everything
mnem integrate claude-code # wire a specific host
For manual registration in any MCP client:
{
"mcpServers": {
"mnem": {
"command": "mnem",
"args": ["mcp", "--repo", "/path/to/your-graph"]
}
}
}
Tools exposed
| Tool | Purpose |
|---|---|
mnem_stats | Repo overview: op-head, commit count, label list, embedder health |
mnem_schema | List every node label and edge label in the current commit |
mnem_search | Exact property-match search with optional outgoing-edge expansion |
mnem_get_node | Fetch a single node by UUID (full props + content) |
mnem_traverse | One-hop neighbour walk from a start node via named edge labels |
mnem_list_nodes | Enumerate nodes at head, optionally filtered by label |
mnem_retrieve | Hybrid retrieval: vector + sparse + graph, fused via RRF |
mnem_commit | Add nodes and/or edges as a single commit |
mnem_commit_relation | Resolve-or-create subject + object + edge in one call |
mnem_resolve_or_create | Find-or-create a node by a primary-key property |
mnem_recent | Walk the op-log backwards (last N operations) |
mnem_vector_search | Cosine nearest-neighbour search over stored embeddings |
mnem_delete_node | Hard-remove a node from the current head |
mnem_tombstone_node | Soft-delete (forget) a node; subsequent retrieves exclude it |
mnem_ingest | Ingest a file or inline text as Doc + Chunk + Entity subgraph |
mnem_global_retrieve | Semantic search on the global graph (~/.mnemglobal/.mnem/) only |
mnem_global_ingest | Ingest a file or inline text into the global graph |
mnem_global_add | Write nodes/edges directly to the global graph |
mnem_community_summarize | Extractive centroid + MMR summarizer over a set of node UUIDs (summarize feature) |
Notes
- The server runs in-process — no separate daemon, no port to manage.
- Embedder is bundled (MiniLM-L6-v2, ONNX). No network calls unless you wire one.
- Local vs global:
mnem_retrievesearches the repo the server is pointed at.mnem_global_retrievealways searches~/.mnemglobal/.mnem/regardless of--repo. - For the full field-level schema of each tool, run
mnem mcp --list-toolsor inspectcrates/mnem-mcp/src/tools/descriptions.rs.
Core concepts
Three primitives. Everything else is composed from these.
Node
A node is content + metadata, addressed by its CID (content identifier derived from a hash of canonical bytes). Two nodes with identical content collapse to one CID. Nodes carry:
text- the unit of content (a sentence, a chunk, a fact)label- string scope; queries can filter to a labelmetadata- opaque JSON map for caller-defined tags
The embedding lives in a per-commit sidecar bucket, not on the node, so two nodes with the same text but different embedders share one CID.
Commit
A commit is a snapshot of the graph at a point in time. Every ingest, every edit, every tombstone produces a new commit. Commits chain by parent CID; the head commit is the working tree’s “current state”. Older commits are immutable and reachable.
Label
A label is an opt-in namespace string attached to nodes at ingest time. Used for:
- per-user / per-conversation isolation in agent memory
- bench harness scoping (per-question, per-document)
- coarse multi-tenancy
A query without a label sees the whole repo; a query with a label sees only nodes carrying that label.
Retrieval lanes
Every retrieve call fans out across three lanes and fuses the results:
- Vector - HNSW over the per-commit sidecar embeddings
- Sparse - BM25 / SPLADE (optional, feature-gated)
- Graph - n-hop traversal over authored edges, optionally PPR-scored
Lanes are configurable. Vector-only is the default and is what the 0.1.0 benchmarks measure.
Configuration
mnem reads config from three sources, in priority order:
- Environment variables -
MNEM_*(highest precedence) - Per-repo config -
<repo>/.mnem/config.toml - User-global config -
~/.mnem/config.toml
Defaults
# .mnem/config.toml
[embed]
provider = "onnx"
model = "all-MiniLM-L6-v2"
[store]
backend = "redb" # "redb" | "in-memory"
[retrieve]
top_k = 10
vector_cap = 256
Common environment overrides
| Variable | Effect |
|---|---|
MNEM_EMBED_PROVIDER | onnx / ollama / openai / mock |
MNEM_EMBED_MODEL | model name (e.g. all-MiniLM-L6-v2) |
MNEM_EMBED_BASE_URL | for ollama / openai providers |
MNEM_EMBED_API_KEY_ENV | name of env var holding the API key |
MNEM_ORT_INTRA_THREADS | pin ONNX runtime thread count (bench harness) |
MNEM_BENCH | enable bench-only label scoping |
MNEM_HTTP_ALLOW_NON_LOOPBACK | allow mnem http to bind 0.0.0.0 (Docker) |
Provider switching
Embedder, sparse encoder, reranker, and LLM are all configured via
provider:model strings - no code change to switch from local ONNX to
hosted Cohere.
[embed]
provider = "cohere"
model = "embed-english-v3.0"
api_key_env = "COHERE_API_KEY"
See Embedding providers for the full provider matrix.
Methodology
Every published number ships with the harness, the dataset hash, and the raw artifacts. If you cannot reproduce a number, that is a bug.
Dataset matrix
| Dataset | Version | n queries | Source |
|---|---|---|---|
| LongMemEval | longmemeval_s_cleaned.json | 500 | xiaowu0162/longmemeval-cleaned |
| LoCoMo | locomo10.json | 1986 (session-level) | snap-research/LoCoMo |
| ConvoMem | 5 cat × 50 items (250) | 250 | Salesforce/ConvoMem |
| MemBench simple/roles | 100 items | 100 | import-myself/Membench |
| MemBench highlevel/movie | 100 items | 100 | import-myself/Membench |
Embedder
ONNX MiniLM-L6-v2 (sentence-transformers/all-MiniLM-L6-v2 via
Xenova/all-MiniLM-L6-v2), bundled in-process via the onnx-bundled
feature. No network calls, no API keys, no per-call model load.
Hardware
Pinned 4 cores per lane (cpuset 0-3 / 4-7 / 8-11 / 12-15),
MNEM_ORT_INTRA_THREADS=4, mem cap 3 GiB per lane. Bench host is
documented per run in benchmarks/results/.
Scoring
| Metric | Definition |
|---|---|
| R@K | hit if any gold item is in top-K retrieved |
| avg recall | mean per-item recall (ConvoMem) |
| Hybrid v4 | dense + sparse score boost (mirrors MP harness helper) |
Apple-to-apple pledge
- Same dataset version, same query count.
- Same scoring code (
benchmarks/harness/). - No secret post-filters, no LLM rerank in the headline numbers.
- Latency reported alongside recall, not separately.
Reproduce in 1 command
bash benchmarks/harness/run_bench.sh
See Reproduce for the full step-by-step.
Reproduce
End-to-end recipe to regenerate the 0.1.0 benchmark numbers locally.
Prerequisites
- Docker 24+ (or
podmanwith compose plugin) - 16 cores recommended, 8 cores minimum
- 16 GiB RAM
- Datasets downloaded:
bash benchmarks/harness/download-datasets.sh
One-shot run
bash benchmarks/harness/run_bench.sh
Wall ETA: 30-50 min on a 16-core box. Output: benchmarks/results/<UTC-stamp>/.
What happens
- Build Docker image (release, FEATURES=onnx-bundled):
- Bring up 4 lanes with cpuset pinning + thread caps.
- Run 6 benches (LongMemEval, LoCoMo, ConvoMem, MemBench × 2, Hybrid v4) sequentially across the lanes via a token-bucket dispatcher.
- Render
RESULTS.mdfrom per-bench JSONs.
Per-bench manual run
docker compose -f benchmarks/harness/compose.yml up -d mnem-bench-1
python benchmarks/harness/adapters/longmemeval_session.py \
--dataset benchmarks/datasets/longmemeval/longmemeval_s_cleaned.json \
mnem http serve --bind 127.0.0.1:9876 \
--limit 500 --top-k 10 \
--out benchmarks/results/longmemeval-500q.json
docker compose -f benchmarks/harness/compose.yml down
Verify against shipped numbers
python benchmarks/harness/comparison_table.py \
--results benchmarks/results/<UTC-stamp> \
--out /tmp/RESULTS.md
diff /tmp/RESULTS.md benchmarks/results/RESULTS.md
If your numbers diverge by more than ±0.01 on recall, open an issue with the host spec and the bench logs.
Run benchmarks locally with mnem bench
mnem bench is the 0.1.0 first-class entrypoint for running
mnem against published memory benchmarks. It replaces the legacy
bash benchmarks/harness/run_bench.sh flow as the default;
the Bash harness stays around for reproducing the headline
numbers from the project README until 0.2.0 wires the same set of
embedders into mnem bench.
Quickstart
# 1. Interactive setup wizard (lists every bench; toggles unshipped
# options behind [0.2.0] tags so you see what is on the roadmap).
mnem bench
# 2. CI-friendly explicit form.
mnem bench run \
--benches longmemeval,locomo \
--with mnem \
--mode cpu-local \
--top-k 10 \
--out ./bench-out \
--non-interactive
# 3. Cache datasets without running anything (network step isolated
# so you can pre-warm a CI image).
mnem bench fetch longmemeval # ~264 MB from HuggingFace
mnem bench fetch locomo # ~3 MB from snap-research/LoCoMo
mnem bench fetch # fetch every shipped bench in one go
# 4. Re-render RESULTS.md from a previous run directory.
mnem bench results ./bench-out
Output layout:
bench-out/
RESULTS.md markdown table, one row per (bench, adapter)
timing.log per-bench wall-time breakdown
longmemeval.json summary
longmemeval.jsonl per-question rows
locomo.json
locomo.jsonl
logs/<bench>.log
What ships in 0.1.0
| Component | Status | Notes |
|---|---|---|
| LongMemEval (per-session) | shipped | R@5 / R@10 over LmeQs:<qid> per-question repos. |
| LoCoMo (session granularity) | shipped | MAX-aggregate dialog scores up to session keys. |
| mnem cpu-local adapter | shipped | In-process Repo::open_in_memory + bag-of-tokens. |
| ConvoMem | 0.2.0 | TUI lists; runtime prints “coming 0.2.0” and skips. |
| MemBench (simple-roles) | 0.2.0 | Same. |
| MemBench (highlevel-movie) | 0.2.0 | Same. |
| LongMemEval-hybrid-v4 | 0.2.0 | MemPalace v4 hybrid post-filter port. |
| mem0 adapter | 0.2.0 | Same. |
| MempalaceAdapter | 0.2.0 | Same. |
| CPU parallel mode | 0.2.0 | Falls back to cpu-local with a stderr note. |
| Docker compose mode | 0.2.0 | Same. |
| ONNX MiniLM / Ollama / OpenAI embedders | 0.2.0 | Falls back to bag-of-tokens with a note. |
The bag-of-tokens embedder ships built into mnem-bench. It is
deterministic, network-free, and good enough to deliver
recall@5 > 0 on the smoke test. It is NOT the embedder we use for
the headline R@5 numbers in the project README - those still come
from the legacy Bash harness driving Ollama / ONNX MiniLM /
OpenAI. 0.2.0 swaps mnem-bench onto the same provider stack so the
two harnesses produce identical numbers.
Pre-flight smoke test
cargo run --example smoke -p mnem-bench
Runs a 5-question LongMemEval canary and exits non-zero if
recall@5 == 0. Used as the gate for releases of
mnem-bench and mnem-cli.
See also
benchmarks/README.mdfor the legacy Bash harness (still the source of the published headline numbers; sunset after 0.2.0 ports the embedder stack).
Results
mnem vs MemPalace published numbers. Dense retrieval (vector + top-k); hybrid-v4 row mirrors MemPalace’s harness helper. No LLM rerank.
ONNX MiniLM-L6-v2 (bundled, in-process). 4 cores per lane.
| Benchmark | Split | Metric | MP | mnem | Δ vs MP | Latency (ms) |
|---|---|---|---|---|---|---|
| LongMemEval | 500 Q (full) | R@5 session | 0.966 | 0.966 | ±0 | 711 (retr) |
| LongMemEval | 500 Q (full) | R@10 session | 0.982 | 0.982 | ±0 | 711 (retr) |
| LoCoMo | 1986 Q (full) | R@5 session | 0.508 | $\color{green}{\textbf{0.726}}$ | +0.218 | 333 (retr) |
| LoCoMo | 1986 Q (full) | R@10 session | 0.603 | $\color{green}{\textbf{0.855}}$ | +0.252 | 333 (retr) |
| ConvoMem | 5 cat × 50 items (250) | avg recall | 0.929 | $\color{green}{\textbf{0.976}}$ | +0.047 | 398 (retr) |
| MemBench | simple/roles, 100 items | R@5 | 0.840 | $\color{green}{\textbf{0.960}}$ | +0.120 | 1874 (e2e) |
| MemBench | highlevel/movie, 100 items | R@5 | 0.950 | $\color{green}{\textbf{1.000}}$ | +0.050 | 491 (e2e) |
| LongMemEval | 500 Q, Hybrid v4 | R@5 session | 0.982 | $\color{red}{\textbf{0.976}}$ | -0.006 | 729 (retr) |
(retr) = retrieve-only mean (from summary timing).
(e2e) = end-to-end mean (runtime / n) when adapter doesn’t expose phase timing.
Headlines
- Matches MemPalace exactly on LongMemEval (0.966 / 0.982).
- Beats by +0.218 / +0.252 on LoCoMo session-level retrieval.
- Beats by +0.047 on ConvoMem.
- Beats by +0.120 / +0.050 on MemBench tasks.
- Within ±0.006 on Hybrid v4 (no LLM rerank).
Raw artifacts
Per-bench JSON + JSONL in benchmarks/results/v0.1.0/. Each artifact carries
the question, the gold set, the retrieved top-K, and per-item recall.
Reproduce
See Reproduce. One command:
bash benchmarks/harness/run_bench.sh
Ingest pipeline
mnem ingest is the only path content takes into the graph. The pipeline:
parse -> chunk -> extract -> embed -> commit
Sources
- file path (
mnem ingest README.md) - glob (
mnem ingest 'docs/**/*.md') - stdin (
cat data.txt | mnem ingest -) - structured JSON (
mnem ingest data.json --json)
Chunking
Default: ~1k-token chunks with sentence-boundary alignment. Override via config:
[ingest]
chunk_size_tokens = 512
chunk_overlap_tokens = 50
Document-aware chunkers exist for code (Tree-sitter) and for Markdown (heading-aware). Auto-detected by file extension.
Extractors
Optional ingest-time enrichment:
| Extractor | What it does |
|---|---|
none (default) | raw text only |
keybert | KeyBERT keyphrase extraction; phrases stored in node metadata |
Enable via flag:
mnem ingest README.md --extractor keybert
Labels
Pass --label <str> to scope the ingested nodes:
mnem ingest user-42-chat.json --label user-42 --json
Subsequent retrieve calls with --label user-42 will see only this scope.
Idempotency
Ingesting the same content twice produces the same CID; the second commit is a no-op (parent points at the same tree). Edit-and-reingest produces a new CID and a child commit.
Embedding providers
mnem decouples embedder from store. Switch providers without re-ingesting.
Built-in providers
| Provider | Model | Network? | Notes |
|---|---|---|---|
onnx | all-MiniLM-L6-v2 (bundled) | no | default; in-process; fastest cold-start |
ollama | any pulled model | local HTTP | e.g. bge-large, nomic-embed-text |
openai | text-embedding-3-small/-large | yes | needs OPENAI_API_KEY |
cohere | embed-english-v3.0 | yes | needs COHERE_API_KEY |
voyage | voyage-3 | yes | needs VOYAGE_API_KEY |
mock | deterministic blake3 | no | tests / smoke |
Switching
Edit <repo>/.mnem/config.toml:
[embed]
provider = "ollama"
model = "bge-large"
base_url = "http://127.0.0.1:11434"
Or override per-process:
MNEM_EMBED_PROVIDER=ollama MNEM_EMBED_MODEL=bge-large mnem retrieve "..."
After switching, run mnem reindex to regenerate the per-commit
embedding sidecar. Node CIDs are unchanged (they don’t carry embeddings);
only the sidecar changes.
Sidecar layout
.mnem/
store.redb # nodes + commits
sidecars/
<embedder-id>/ # one dir per (provider, model) pair
<commit-cid>.bin # embedding bucket for that commit
Multiple sidecars co-exist. retrieve picks the sidecar matching the active
embedder; if missing, it builds on-demand.
Adding a provider
Implement the Embedder trait in mnem-embed-providers/src/<your>.rs,
gate behind a feature flag, register in the provider registry. See
for the
contract.
Comparisons
How mnem stacks up against other agent-memory and knowledge-graph systems. Each comparison is honest: where they win, where mnem wins, when to pick which.
mnem is open source (Apache-2.0). Numbers come from public artefacts; where a competitor’s claim is closed-source we say so. Where a benchmark is not directly comparable, we say so rather than fabricate a single-number league table.
| Competitor | License | Server / Embedded | LLM at ingest | Bitemporal | Stars | Compare |
|---|---|---|---|---|---|---|
Graphiti (getzep/graphiti) | Apache-2.0 | server (Neo4j / Kuzu / FalkorDB / Neptune) | mandatory | yes | 25,409 | graphiti.md |
mem0 (mem0ai/mem0) | Apache-2.0 | library + cloud | default-on (opt-out) | no | 54,113 | mem0.md |
MemPalace (MemPalace/mempalace) | MIT | embedded (Python + ChromaDB) | no | partial | 49,768 | mempalace.md |
Supermemory (supermemoryai/supermemory) | MIT (repo) / closed (cloud) | hosted cloud | yes | no | 22,218 | supermemory.md |
Cognee (topoteretes/cognee) | Apache-2.0 | library + cloud | yes (cognify) | no | 16,807 | cognee.md |
Letta (letta-ai/letta) | Apache-2.0 | server + CLI | yes (agent is the writer) | partial | 22,305 | letta.md |
graphify (safishamsi/graphify) | MIT | one-shot CLI | yes (Claude subagents) | no | 35,262 | graphify.md |
| mnem | Apache-2.0 | embedded + four surfaces | no | no | small / pre-launch | (this repo) |
Star counts pulled from the GitHub API on 2026-04-26. License columns reflect the repository SPDX identifier; commercial / hosted layers above some of these projects ship under different terms.
mnem positioning
mnem is the substrate underneath the products in the table: a content- addressed, versioned, hybrid-retrieval graph that runs in-process, ingests without an LLM, and exposes token-budget telemetry on every retrieve. We are not building a memory product; we are building the thing the next memory product is built on.
Reading order
If you have read about agent memory before, the most useful first read is one of:
- mnem vs Graphiti if you have been thinking about bitemporal knowledge graphs.
- mnem vs mem0 if you have been using the LangChain / LlamaIndex / CrewAI defaults.
- mnem vs MemPalace if you care about no-LLM-on- write retrieval and reproducible benchmarks.
- mnem vs Supermemory if you have been weighing the closed cloud vs self-host trade-off.
- mnem vs Cognee if you have been looking at ECL- pipeline-shaped knowledge engines.
- mnem vs Letta if you have been looking at the MemGPT lineage of agent platforms.
- mnem vs graphify if you have been using one-shot folder-to-graph extractors.
mnem vs mem0
mem0: “Universal memory layer for AI Agents” (repo description, mem0ai/mem0) mnem: a content-addressed, versioned graph substrate underneath the memory layer.
At a glance
| mnem | mem0 | |
|---|---|---|
| License | Apache-2.0 | Apache-2.0 |
| Stars | small / pre-launch | 54,113 (GitHub API, 2026-04-26) |
| Embedded / Server | embedded | library + optional managed Platform |
| LLM at ingest | no | yes by default (single-pass ADD-only since v3, Apr 2026); infer=False opt-out exists |
| Content-addressed | yes | no (UUIDs over a vector store) |
| Bitemporal | no | no (event log, not bitemporal) |
| WASM target | yes | no (Python + external vector DB) |
| MCP server | yes | yes (mem0 MCP exists) |
| Hybrid retrieval | yes (vector + sparse + graph + RRF) | yes (semantic + BM25 + entity matching, fused) since v3 |
| Token-budget retrieval metadata | yes | no |
| 3-way merge | yes | no (event log with add/update/delete) |
| Reproducible benchmarks in-repo | yes | partial (separate memory-benchmark repo) |
Feature comparison
| # | Dimension | mnem | mem0 | Source |
|---|---|---|---|---|
| 1 | Data model | open-schema content-addressed nodes + edges | rows in a vector store with {role, content} history; user_id / agent_id / run_id scoping | mem0 README “Basic Usage” + docs |
| 2 | Default ingest | parse + chunk + statistical extract | LLM (gpt-5-mini default) extracts atomic facts on every add | mem0 README “Basic Usage” sha bd9d27ff509f |
| 3 | LLM requirement | optional | required by default; infer=False opts out but loses the “magic” | mem0 v3 README “New Memory Algorithm” |
| 4 | Identity | BLAKE3 CID over DAG-CBOR | UUIDs over a vector row | mem0 docs |
| 5 | History | signed commit DAG, diff / log / branch / merge | history event log of add/update/delete records | mem0 SDK |
| 6 | Conflict resolution | 3-way merge over graph | “latest LLM extraction wins” before v3; v3 is ADD-only and accumulates | mem0 v3 release notes |
| 7 | Vector backends | redb default, pluggable via Blockstore | 20+ (Qdrant, Chroma, PGVector, Pinecone, Weaviate, etc.) | mem0 docs “Supported Vector Stores” |
| 8 | LLM providers | optional, 16 via mnem-llm-providers | 16+ (OpenAI, Anthropic, Gemini, Groq, Ollama, …) | mem0 docs “Supported LLMs” |
| 9 | Embedding model | bundled ONNX MiniLM-L6-v2 in-process | configurable; default OpenAI text-embedding-3-small | mem0 README |
| 10 | Retrieval lanes | dense (HNSW) + sparse (BM25/SPLADE) + graph + RRF | semantic + BM25 + entity match (v3) | mem0 v3 README |
| 11 | Token-budget metadata | first-class on every retrieve | not exposed | mnem CLI / HTTP API |
| 12 | Multi-tenancy | repo-per-tenant or scope by node label | hardcoded user_id / agent_id / run_id triple | mem0 SDK |
| 13 | Bindings | Rust + Python + HTTP + MCP + CLI | Python + TypeScript + REST + MCP | mem0 README badges |
| 14 | Cloud | none yet | “mem0 Platform”: Hobby free, Starter $19, Pro $249, Enterprise | mem0.ai pricing |
| 15 | Distribution | pre-launch | YC S24, ~2.6M monthly PyPI downloads | mem0 README badge |
Benchmarks (where comparable)
mem0 v3 (April 2026) reports on LoCoMo and LongMemEval as a full pipeline (LLM extract + retrieve + answer). mnem reports retrieval-only (R@K) under an identical embedder, no LLM in the loop.
We have a same-harness, same-embedder reproduction of mem0 with
infer=False (LLM extraction off) so the comparison lands on the
retrieval layer:
| Benchmark | Split | Metric | mem0 (infer=False, MiniLM) | mnem | Delta |
|---|---|---|---|---|---|
| LongMemEval | 500 Q | R@5 session | 0.946 | $\color{green}{\textbf{0.966}}$ | +0.020 |
| LongMemEval | 500 Q | R@10 session | 0.962 | $\color{green}{\textbf{0.982}}$ | +0.020 |
| LoCoMo | 1986 Q | R@5 session | 0.466 | $\color{green}{\textbf{0.726}}$ | +0.260 |
| LoCoMo | 1986 Q | R@10 session | 0.676 | $\color{green}{\textbf{0.855}}$ | +0.179 |
Adapter notes: infer=False, persistent Memory, per-item user_id
scoping. See benchmarks/methodology.md.
mem0’s own v3 numbers (LoCoMo 91.6, LongMemEval 93.4) are full-pipeline end-to-end accuracy, not retrieval R@5; not directly comparable to the table above.
Latency (where measured)
| System | Setup | Latency |
|---|---|---|
| mnem | LongMemEval 500 Q, MiniLM-L6-v2, embedded redb | 711 ms mean retrieve |
| mnem | LoCoMo 1986 Q, same setup | 333 ms mean retrieve |
| mem0 | LongMemEval, v3 single-pass | 1.09 s p50 (mem0 README, Apr 2026) |
| mem0 | LoCoMo, v3 single-pass | 0.88 s p50 (mem0 README) |
mem0 v3 latency includes one LLM retrieval call per query; mnem’s numbers are pure retrieval. Different mechanisms, useful only as an order-of-magnitude check.
Architecture differences
mem0 is a Python (and TS) memory layer designed to drop into LLM apps.
The default flow is: mem.add(messages, user_id=...) runs an LLM to
extract atomic facts, embeds them into a configured vector store, and
returns a UUID per memory. Retrieval (mem.search(...)) does semantic
- keyword + entity matching, optionally with a reranker. Multi-tenancy
is hardcoded as
user_id/agent_id/run_id. mem0 Platform layers a managed cloud, dashboards, and SOC 2 / GDPR on top.
mnem is one layer below: a content-addressed, versioned graph substrate. There is no fixed conversation schema; you commit nodes and edges with whatever labels and properties you need. Identity is a CID over canonical DAG-CBOR + BLAKE3, so the same fact on two machines collapses to the same node. History is a signed commit DAG, not an event log, so old facts remain addressable after newer ones supersede them. The write path runs no LLM by default; ingest is statistical parse + chunk + key extract. Retrieval is 3-lane RRF (HNSW dense + sparse + graph) with token-budget telemetry on every response.
Where mem0 clearly wins
- Distribution. ~2.6M monthly PyPI downloads, default memory in LangChain / LlamaIndex / CrewAI / Vercel AI SDK / LiveKit / Pipecat / AWS Bedrock. mem0 is the path of least resistance.
- Backend breadth. 20+ vector stores, 16 LLMs, 10 embedders work out of the box.
- Managed product. Hobby tier is free; Pro is $249/mo with dashboards, SOC 2, on-prem.
- LLM-assisted ingest.
mem.add("I met Alice in Berlin")auto-extracts{entity: Alice, city: Berlin}with no upstream modelling effort. - YC + commercial momentum. YC S24, $24M raised, weekly release cadence on v3.
Where mnem clearly wins
- No LLM in the write path. Regulated, offline, or cost-sensitive workloads ingest deterministically. mem0 v3 reduced the LLM cost to one call per add but did not eliminate it.
- Content-addressed CIDs. Globally stable identity; CID-citations stay reproducible. mem0’s UUIDs are per-instance random.
- Versioned history with 3-way merge. Diff / log / branch / merge / signed commits. mem0 ships an event log, not a commit graph.
- Embedded + single binary. ~40 MB Docker image, no external vector DB. Runs offline.
- WASM target. mnem-core compiles to
wasm32; mem0 cannot. - Retrieval-quality lead under identical-embedder conditions. +0.20 R@5 on LongMemEval, +0.260 R@5 on LoCoMo (same MiniLM weights, dense lane only).
- Token-budget telemetry.
tokens_used/droppedper retrieve.
When to pick mem0, when to pick mnem
Pick mem0 if: you want drop-in agent memory with the broadest LangChain / LlamaIndex / CrewAI footprint, you are happy paying an LLM call per add for “magic” extraction, or you want a managed cloud and dashboards today.
Pick mnem if: you want an embedded substrate with no LLM at ingest, you need content-addressing and a real commit graph, you care about reproducibility and audit, or you are shipping to the edge / WASM / offline.
Sources
- mem0 repo, sha
bd9d27ff509fonmain, 2026-04-26: https://github.com/mem0ai/mem0 - mem0 README (“New Memory Algorithm (April 2026)”, “Basic Usage”, “CLI”): https://github.com/mem0ai/mem0/blob/main/README.md
- mem0 docs: https://docs.mem0.ai
- mem0 evaluation framework: https://github.com/mem0ai/memory-benchmark
- mnem benchmarks:
/benchmarks/proofs/v0.1.0/ - mnem README:
/README.md
mnem vs MemPalace
MemPalace: “The best-benchmarked open-source AI memory system. And it’s free.” (repo description, MemPalace/mempalace) mnem: a content-addressed, versioned graph substrate that shares MemPalace’s no-LLM-on-write philosophy and pushes further on identity and history.
At a glance
| mnem | MemPalace | |
|---|---|---|
| License | Apache-2.0 | MIT |
| Stars | small / pre-launch | 49,768 (GitHub API, 2026-04-26) |
| Embedded / Server | embedded | embedded (Python + ChromaDB) |
| LLM at ingest | no | no (verbatim store) |
| Content-addressed | yes | no (ChromaDB row IDs) |
| Bitemporal | no | partial (valid_from / valid_to on KG entries) |
| WASM target | yes | no |
| MCP server | yes (18 tools) | yes (29 tools) |
| Hybrid retrieval | yes (vector + sparse + graph) | yes (semantic + hybrid v4 / v5 with keyword + temporal boost) |
| Token-budget retrieval metadata | yes | no |
| 3-way merge | yes | no |
| Reproducible benchmarks in-repo | yes | yes (per-question JSONL committed) |
Feature comparison
| # | Dimension | mnem | MemPalace | Source |
|---|---|---|---|---|
| 1 | Schema | open (any labels / properties) | fixed: wings, rooms, halls, drawers | MemPalace README “What it is” sha 6890948e092b |
| 2 | Storage | redb embedded | ChromaDB + SQLite | MemPalace README + mempalace/backends/base.py |
| 3 | Default embedder | bundled ONNX MiniLM-L6-v2 | ChromaDB default (MiniLM-L6-v2 implied) | MemPalace requirements |
| 4 | LLM at ingest | none | none | MemPalace README |
| 5 | LLM at retrieval | optional rerank | optional hybrid-v4 + LLM rerank tier | MemPalace Benchmarks table |
| 6 | Identity | content CID (BLAKE3 over DAG-CBOR) | ChromaDB row IDs | implementation |
| 7 | History | signed commit DAG | append-only with valid_from / valid_to | MemPalace KG section |
| 8 | Conflict resolution | 3-way merge | manual invalidate tool | MemPalace MCP tool list |
| 9 | Sparse lane | BM25 + SPLADE | hybrid-v4 keyword boost | MemPalace BENCHMARKS.md |
| 10 | Graph lane | first-class (label / prop / adjacency) | KG with timeline + cross-wing tunnels | MemPalace MCP tools |
| 11 | MCP surface | 18 tools | 29 tools | MemPalace README “MCP server” |
| 12 | Plugin scaffolds | mnem mcp + mnem integrate | .claude-plugin/, .codex-plugin/ in repo | MemPalace repo |
| 13 | Bindings | Rust + Python + TS + HTTP + CLI + MCP | Python + MCP | MemPalace README |
| 14 | Hosted product | none | none | n/a |
| 15 | Velocity | maturing 1.0 | 433 commits in first 12 days, 30 contributors (early 2026) | internal notes; verify on repo today |
Benchmarks (where comparable)
MemPalace publishes retrieval R@5 / R@10 numbers in the same family as
mnem’s harness. We pulled their numbers from benchmarks/BENCHMARKS.md
and ran ours on the same datasets and embedder weights:
| Benchmark | Split | Metric | MemPalace | mnem | Delta |
|---|---|---|---|---|---|
| LongMemEval | 500 Q | R@5 session, raw dense | 0.966 | 0.966 | 0 |
| LongMemEval | 500 Q | R@10 session, raw dense | 0.982 | 0.982 | 0 |
| LongMemEval | 500 Q hybrid-v4 | R@5 session | 0.982 | $\color{red}{\textbf{0.976}}$ | -0.006 |
| LoCoMo | 1986 Q | R@5 session, raw dense | 0.508 | $\color{green}{\textbf{0.726}}$ | +0.218 |
| LoCoMo | 1986 Q | R@10 session, raw dense | 0.603 | $\color{green}{\textbf{0.855}}$ | +0.252 |
| ConvoMem | 250 Q | Avg recall | 0.890 | $\color{green}{\textbf{0.976}}$ | +0.086 |
| MemBench | 100 Q (movie) | R@5 | 0.950 | $\color{green}{\textbf{1.000}}$ | +0.050 |
Method: identical MiniLM-L6-v2 ONNX weights, no reranker, no LLM, no lexical lane on the raw-dense rows. The LoCoMo gap comes from mnem’s adapter aggregating user-turn text per session before embedding; MemPalace’s adapter embeds at a finer grain. Mechanism, not magic.
MemPalace’s hybrid-v4 numbers tune on dev splits; the held-out 98.4% they report is the honest figure to compare against.
Latency (where measured)
| System | Setup | Latency |
|---|---|---|
| mnem | LongMemEval 500 Q, MiniLM ONNX | 711 ms mean retrieve |
| mnem | LoCoMo 1986 Q, MiniLM ONNX | 333 ms mean retrieve |
| MemPalace | LongMemEval, raw dense | not headlined; ChromaDB-default latency |
MemPalace does not publish a single mean-latency number; their benchmark tables focus on accuracy.
Architecture differences
MemPalace stores conversation history verbatim in ChromaDB and indexes
people / projects as wings, topics as rooms, flows as
halls, content as drawers. The retrieval layer is pluggable
behind mempalace/backends/base.py. A SQLite-backed knowledge graph
adds valid_from / valid_to windows, an invalidate verb, and a
timeline view. The MCP server exposes 29 tools including agent
diaries and cross-wing tunnels. The product is opinionated: the palace
metaphor is the user experience.
mnem ships no metaphor. Nodes and edges are open-schema; you commit
whatever shape your application needs. Identity is a CID over canonical
DAG-CBOR + BLAKE3, so identical content collapses to the same node
across machines. History is a signed commit DAG with diff / log /
branch / 3-way merge. Retrieval is 3-lane RRF (HNSW dense + BM25/SPLADE
sparse + graph traversal) with first-class token-budget telemetry on
every response. mnem-core is no-tokio / no-fs / no-net and compiles
to WASM unchanged.
Where MemPalace clearly wins
- Verbatim store with measured 96.6% R@5 on LongMemEval, no API key. Same as mnem on raw dense, and reproducible from their repo.
- MCP breadth. 29 tools to mnem’s 18. Agent diaries and cross-wing tunnels are original ideas.
- Plugin scaffolds in-repo.
.claude-plugin/and.codex-plugin/lower install friction for Claude Code / Codex users. - Velocity and community. Hundreds of commits, dozens of contributors, rapid issue response.
- Reproducibility culture. Per-question JSONL result files committed for every benchmark run.
- Working temporal KG.
valid_from/valid_to/invalidate/timelineshipped today.
Where mnem clearly wins
- Open schema. No fixed wings/rooms/halls/drawers hierarchy. Use any labels and properties for any domain.
- Content-addressed identity. Same fact = same CID across machines. Stable citations forever.
- Real commit DAG. Branch, diff, 3-way merge, signed Ed25519 history. MemPalace stores facts and a timeline; mnem stores commits over a graph.
- WASM target. Same retrieval logic in browsers, Workers, Lambda. Python + ChromaDB cannot.
- Retrieval-quality lead on LoCoMo. +0.218 R@5 raw dense, same embedder.
- Token-budget telemetry.
tokens_used,candidates_seen,droppedreturned on every retrieve.
When to pick MemPalace, when to pick mnem
Pick MemPalace if: the wings / rooms / halls / drawers metaphor matches your domain, you want the largest MCP tool surface available, or you specifically want a Claude-Code-paired personal memory appliance with reproducible benchmark numbers today.
Pick mnem if: you want an open-schema substrate, you need content-addressing and a real commit DAG, you are shipping to multiple languages or to the edge / WASM, or you want token-budget telemetry as a first-class response field.
Sources
- MemPalace repo, sha
6890948e092bondevelop, 2026-04-26: https://github.com/MemPalace/mempalace - MemPalace README (license MIT, “What it is”, Benchmarks table): https://github.com/MemPalace/mempalace/blob/develop/README.md
- MemPalace
benchmarks/BENCHMARKS.mdfor benchmark provenance - mnem benchmark artefacts:
/benchmarks/proofs/v0.1.0/ - mnem README + benchmark methodology:
/README.md,benchmarks/methodology.md
mnem vs Supermemory
Supermemory: “Memory engine and app that is extremely fast, scalable. The Memory API for the AI era.” (repo description, supermemoryai/supermemory) mnem: an open-source, embedded, content-addressed knowledge-graph substrate. Self-host or nothing.
At a glance
| mnem | Supermemory | |
|---|---|---|
| License | Apache-2.0 | MIT (repo); cloud product is closed-core |
| Stars | small / pre-launch | 22,218 (GitHub API, 2026-04-26) |
| Embedded / Server | embedded | hosted cloud; self-host issue (#707) closed without resolution |
| LLM at ingest | no | yes (Extractors layer; entity / fact extraction) |
| Content-addressed | yes | no (custom vector graph engine, internals undisclosed) |
| Bitemporal | no | no |
| WASM target | yes | n/a (cloud only) |
| MCP server | yes | yes (https://mcp.supermemory.ai/mcp, OAuth + bearer) |
| Hybrid retrieval | yes | yes (multi-mode search across graph) |
| Token-budget retrieval metadata | yes | not exposed |
| 3-way merge | yes | no |
| Reproducible benchmarks in-repo | yes | self-reported; memorybench skill is a benchmarking harness vs supermemory |
Feature comparison
| # | Dimension | mnem | Supermemory | Source |
|---|---|---|---|---|
| 1 | Deployment | embedded; single binary | cloud only; self-host requested + closed (issue #707) | internal research |
| 2 | Storage | redb embedded | Postgres via Cloudflare Hyperdrive + Cloudflare AI vector embeddings + R2 + KV | internal research |
| 3 | Vector engine | HNSW via mnem-ann | undisclosed; “custom vector graph engine with ontology-aware edges” | internal research |
| 4 | Embedding model | bundled ONNX MiniLM-L6-v2; pluggable | undisclosed (Cloudflare AI) | internal research |
| 5 | Identity | content CID | undisclosed | n/a |
| 6 | Multi-tenancy | by repo or graph scope | containerTag and project scoping | internal research |
| 7 | Ingest pipeline | parse + chunk + statistical extract | five stacked layers: User Profiles, Memory Graph, Retrieval, Extractors, Connectors | internal research |
| 8 | LLM use | optional, opt-in | yes, in Extractors layer | internal research |
| 9 | Connectors | none yet | webhook-driven connectors live (Notion, GDrive, etc.) | supermemory.ai docs |
| 10 | Plugin / IDE ecosystem | MCP + mnem integrate | 12+ integration plugins, dedicated repos | internal research |
| 11 | API | local Rust / Python / HTTP / MCP / CLI | REST api.supermemory.ai/v3 + /v4, TS / Python SDKs | supermemory README |
| 12 | Pricing | self-host, free | tiered cloud (free / pro / team / enterprise) | supermemory.ai/pricing |
| 13 | Funding / brand | self-funded indie | $3M seed, ~$40M valuation, named angels (Jeff Dean, Dane Knecht, Logan Kilpatrick, …) | internal research |
| 14 | Founder reach | small | Dhravya Shah, ~51.5k X followers | internal research |
| 15 | Self-reported benchmarks | reproducible artefacts in-repo | “#1 on LongMemEval, LoCoMo, ConvoMem”; sub-300 ms recall at 85.4% accuracy | internal research |
Benchmarks (where comparable)
Not directly comparable in any apples-to-apples sense. Supermemory’s
benchmark numbers are self-reported, the engine is closed, and the
evaluation harness is bundled as the memorybench skill that points
at supermemory by default. Their headline:
Supermemory: 85.2-85.4% on LongMemEval; sub-300 ms recall; “#1 on LongMemEval, LoCoMo, ConvoMem”.
mnem’s reproducible numbers under ONNX MiniLM-L6-v2, no LLM in the loop:
| Benchmark | Split | Metric | mnem |
|---|---|---|---|
| LongMemEval | 500 Q | R@5 session | 0.966 |
| LongMemEval | 500 Q | R@10 session | 0.982 |
| LoCoMo | 1986 Q | R@5 session | 0.726 |
| ConvoMem | 250 Q | Avg recall | 0.976 |
Putting 0.852 next to 0.966 looks favorable for mnem, but the
metrics are not the same shape: Supermemory’s number is end-to-end QA
accuracy; mnem’s is retrieval R@5 with no LLM. Both columns are
honest; the column headers are not the same column.
Latency (where measured)
| System | Setup | Latency |
|---|---|---|
| mnem | LongMemEval 500 Q, MiniLM ONNX | 711 ms mean retrieve |
| mnem | LoCoMo 1986 Q, MiniLM ONNX | 333 ms mean retrieve |
| Supermemory | self-reported | sub-300 ms recall, “sub-400 ms at scale” |
Closed engine, edge network, undisclosed embedder. Supermemory cloud is fast at the retrieve hop; mnem runs in your process so total end-to-end (no network round-trip) tends to win for self-hosted users.
Architecture differences
Supermemory is a Cloudflare-native cloud product. The repo is MIT-
licensed but the production engine is closed: a “custom vector graph
engine with ontology-aware edges” sitting on top of Postgres
(Hyperdrive), Cloudflare AI vector embeddings, R2 object storage, and
KV. The product is five stacked layers behind one API: User Profiles,
Memory Graph, Retrieval, Extractors, Connectors. MCP server is
production today at mcp.supermemory.ai/mcp with OAuth or API-key
auth. Connectors (Notion, GDrive, etc.) ship as live webhook
integrations. The strength is GTM: $3M seed, named angels, 50,000+
self-reported users on the consumer app, integrations with Cluely,
Composio, Scira AI.
mnem is the opposite: open-source Apache-2.0, embedded, single-binary,
no cloud. The graph substrate is content-addressed (BLAKE3 CIDs over
DAG-CBOR), versioned (signed commit DAG with 3-way merge), and runs
in-process from a cargo install away. There is no managed offering;
hosting is explicitly out of scope for 0.1.0. Where Supermemory wins
on distribution and managed
operations, mnem wins on substrate guarantees: identity, history, and
deterministic retrieval that you can run offline.
Where Supermemory clearly wins
- Hosted product with live connectors. Notion, GDrive, etc. work out of the box. mnem has none yet.
- Distribution and brand. 22k stars, $3M seed, named angels (Jeff Dean, Dane Knecht, Logan Kilpatrick, David Cramer), founder reach ~51.5k X followers.
- MCP-native cloud. Drop one URL into a client config and you have agent memory.
- IDE plugin ecosystem. 12+ integration plugins live.
- Cloudflare edge latency. Sub-300 ms recall claims are plausible given the Workers + Hyperdrive stack.
Where mnem clearly wins
- Open-source substrate. Apache-2.0, no vendor lock-in. Self-host on a laptop or a Lambda. Supermemory’s self-host issue (#707) closed without a resolution; the cloud is structural.
- No closed engine. mnem’s vector lane (HNSW), sparse lane (BM25 / SPLADE), graph lane, and RRF weights are all configurable and documented. Supermemory’s “custom vector graph engine” is a black box.
- Content-addressed identity. Same fact = same CID across machines.
- Real commit history. Diff, log, branch, 3-way merge, signed history. Supermemory has soft “versioning” in their sense; not a DAG.
- Privacy by default. Nothing leaves your machine unless you opt in.
- Reproducible benchmarks. Numbers ship with a runnable harness; Supermemory’s are self-reported.
- Token-budget retrieval metadata. First-class on every retrieve.
When to pick Supermemory, when to pick mnem
Pick Supermemory if: you want a managed memory API today with hosted connectors, you trust Cloudflare for storage and inference, you want OAuth-MCP plug-and-play for ChatGPT / Claude / Cursor, or distribution on hosted infrastructure beats substrate control for your use case.
Pick mnem if: you need self-host or air-gapped, you want an open substrate with documented internals, you need content-addressing and a commit DAG, or you are building a product on top of a memory layer rather than consuming one.
Sources
- Supermemory repo, sha
a41bbeecb395onmain, 2026-04-26: https://github.com/supermemoryai/supermemory - Supermemory README, MCP details:
mcp.supermemory.ai/mcp - Supermemory cloud and pricing: https://supermemory.ai
- mnem benchmark artefacts:
/benchmarks/proofs/v0.1.0/ - mnem README + architecture:
/README.md
mnem vs Cognee
Cognee: “Knowledge Engine for AI Agent Memory in 6 lines of code” (repo description, topoteretes/cognee) mnem: a content-addressed, versioned graph substrate that ingests without an LLM.
At a glance
| mnem | Cognee | |
|---|---|---|
| License | Apache-2.0 | Apache-2.0 |
| Stars | small / pre-launch | 16,807 (GitHub API, 2026-04-26) |
| Embedded / Server | embedded | library + Cognee Cloud |
| LLM at ingest | no | yes (remember calls add + cognify + improve) |
| Content-addressed | yes | no (extracted graph node IDs) |
| Bitemporal | no | no |
| WASM target | yes | no |
| MCP server | yes | yes (Cognee MCP exists; integrates with Claude Code, Hermes) |
| Hybrid retrieval | yes (vector + sparse + graph + RRF) | yes (auto-routing across graph + vector) |
| Token-budget retrieval metadata | yes | no |
| 3-way merge | yes | no |
| Reproducible benchmarks in-repo | yes | partial (research / paper claims; no in-repo harness) |
Feature comparison
| # | Dimension | mnem | Cognee | Source |
|---|---|---|---|---|
| 1 | Storage | redb embedded | Kuzu default + vector DB; pluggable | Cognee README “Deploy Cognee” sha f4964c31db04 |
| 2 | Default flow | commit -> CID; no LLM | remember runs add + cognify + improve (LLM extraction) | Cognee README Quickstart Step 3 |
| 3 | LLM requirement | optional | required to configure before Quickstart Step 2 | Cognee README “Step 2: Configure the LLM” |
| 4 | Identity | BLAKE3 CID over DAG-CBOR | extracted graph-node IDs (LLM-derived) | Cognee internals |
| 5 | History | signed commit DAG | none; standard graph state | Cognee docs |
| 6 | Conflict resolution | 3-way merge | re-run cognify to refresh graph | Cognee Quickstart |
| 7 | Vector lane | HNSW via mnem-ann | configurable vector store | Cognee docs |
| 8 | Sparse lane | BM25 + SPLADE | not headlined | Cognee README |
| 9 | Graph lane | first-class | first-class | Cognee README “About Cognee” |
| 10 | LLM providers | optional | OpenAI, Anthropic, Gemini, Ollama, others | Cognee docs |
| 11 | Session memory | open (model your own) | remember(..., session_id="...") first-class | Cognee Quickstart |
| 12 | Auto-routing retrieval | manual lane configuration | “picks best search strategy automatically” | Cognee README Step 3 |
| 13 | Bindings | Rust + Python + TS + HTTP + CLI + MCP | Python + CLI + MCP | Cognee README |
| 14 | Cloud | none yet | Cognee Cloud (managed) | Cognee README “Connect to Cognee Cloud” |
| 15 | Determinism | byte-identical CIDs same input | LLM extraction is non-deterministic | Cognee README Step 3 |
Benchmarks (where comparable)
Not directly comparable. Cognee publishes research and use-case narratives rather than retrieval R@K artefacts in the repo. Their strength is the ECL pipeline as a finished product (drop a PDF, get a typed knowledge graph), not retrieval-quality benchmarks at the substrate layer.
mnem’s measured retrieval numbers under ONNX MiniLM-L6-v2:
| Benchmark | Split | Metric | mnem |
|---|---|---|---|
| LongMemEval | 500 Q | R@5 session | 0.966 |
| LoCoMo | 1986 Q | R@5 session | 0.726 |
| ConvoMem | 250 Q | Avg recall | 0.976 |
| MemBench | 100 Q (movie) | R@5 | 1.000 |
If you want to compare like-for-like, run Cognee against the same
LongMemEval / LoCoMo dumps with infer=False-equivalent (skip
cognify’s LLM extraction). We have not published a Cognee adapter;
contributions welcome.
Latency (where measured)
| System | Setup | Latency |
|---|---|---|
| mnem | LongMemEval 500 Q, MiniLM ONNX | 711 ms mean retrieve |
| mnem | LoCoMo 1986 Q, MiniLM ONNX | 333 ms mean retrieve |
| Cognee | not headlined; depends on LLM provider + vector store | n/a |
Cognee’s retrieval latency depends heavily on whether the auto-router calls a vector store, the graph DB, or LLM rerank. They do not publish a single number.
Architecture differences
Cognee is a Python knowledge-engine library and a managed cloud. The write path is the ECL pipeline: Extract -> Cognify -> Load. You hand it documents, Cognee runs an LLM to extract entities and relationships, embeds them, and writes them into a graph DB (Kuzu by default) plus a vector store. Retrieval auto-routes across vector and graph based on the query. The strength is “drop a PDF and get a typed knowledge graph” with minimal modelling effort.
mnem is the substrate beneath that pattern. There is no required LLM at ingest: parse + chunk + statistical extract -> CID -> commit. The graph shape is whatever your application commits, not whatever the LLM happened to extract that hour. Identity is content-addressed, so the same document on two machines collapses to the same nodes. History is a signed commit DAG with diff / 3-way merge. Retrieval is explicit 3-lane RRF with token-budget telemetry, not auto-routed.
Where Cognee clearly wins
- Drop-in ingest. Hand it a PDF, conversation, or URL; get a typed knowledge graph. mnem expects you to model what you want stored.
- Auto-routing retrieval. The router picks vector vs graph vs hybrid for you. mnem makes you choose.
- Multi-LLM-provider support. OpenAI, Anthropic, Gemini, Ollama, others work with minimal config.
- Cloud + self-host parity. Cognee Cloud is managed; the OSS library works standalone.
- Rich ontology derivation. LLM derives the ontology from the corpus rather than forcing one upfront.
- Session memory primitive.
session_idis first-class with a background sync to the long-term graph.
Where mnem clearly wins
- No LLM in the write path. Deterministic, replayable, fuzz-tested.
Cognee’s
cognifystep is non-deterministic by design. - Content-addressed identity. Same input -> same CIDs across machines. Cognee’s extracted node IDs are extraction-run-dependent.
- Real commit DAG. Branch, diff, 3-way merge, signed Ed25519 history. Cognee has graph state, not a commit history.
- Embedded, single binary. ~40 MB Docker image. No external graph DB or vector store to operate.
- WASM target.
mnem-coreships towasm32unchanged. - Token-budget retrieval metadata. First-class on every response.
When to pick Cognee, when to pick mnem
Pick Cognee if: you want PDF / URL / document -> typed knowledge graph in 6 lines, you are happy with an LLM at ingest, you want auto- routed retrieval, or you want a managed cloud option.
Pick mnem if: you need deterministic ingest, content-addressed identity, a real commit DAG, embedded / single-binary deployment, or WASM / edge targets.
Sources
- Cognee repo, sha
f4964c31db04onmain, 2026-04-26: https://github.com/topoteretes/cognee - Cognee README (“About Cognee”, Quickstart Steps 1-3, “Use with AI Agents”, “Deploy Cognee”): https://github.com/topoteretes/cognee/blob/main/README.md
- Cognee docs: https://docs.cognee.ai
- Cognee Cloud: https://www.cognee.ai
- mnem benchmarks:
/benchmarks/proofs/v0.1.0/ - mnem README:
/README.md
mnem vs Letta
Letta: “Letta is the platform for building stateful agents: AI with advanced memory that can learn and self-improve over time.” (repo description, letta-ai/letta) mnem: a content-addressed graph substrate that stores the memory an agent uses, without assuming the agent.
At a glance
| mnem | Letta | |
|---|---|---|
| License | Apache-2.0 | Apache-2.0 |
| Stars | small / pre-launch | 22,305 (GitHub API, 2026-04-26) |
| Embedded / Server | embedded | server (Letta API) + Letta Code CLI |
| LLM at ingest | no | yes; the agent is the writer |
| Content-addressed | yes | no (DB row IDs) |
| Bitemporal | no | partial (Letta tracks message timestamps) |
| WASM target | yes | no (Python server) |
| MCP server | yes | yes (Letta supports MCP integrations) |
| Hybrid retrieval | yes | recall + archival memory; not headlined as hybrid |
| Token-budget retrieval metadata | yes | not exposed |
| 3-way merge | yes | no |
| Reproducible benchmarks in-repo | yes | partial (Letta leaderboard external) |
Feature comparison
| # | Dimension | mnem | Letta | Source |
|---|---|---|---|---|
| 1 | Product shape | memory substrate | agent platform (agents + memory + tools + runtime) | Letta README sha bb52a8900a79 |
| 2 | Memory model | open graph of content-addressed nodes + edges | tiered: core blocks (in-context) + recall + archival | MemGPT paper arXiv:2310.08560 |
| 3 | Who writes memory | the application | the agent itself, via tool calls | Letta docs |
| 4 | LLM at ingest | none | yes; agent decides what to write, promote, evict | MemGPT paper |
| 5 | Identity | content CID | DB row IDs | Letta SDK |
| 6 | History | signed commit DAG | standard DB state with timestamps | Letta SDK |
| 7 | Conflict resolution | 3-way merge | agent-to-agent messaging | Letta docs |
| 8 | Scoping | open (any node label) | agent_id first-class | Letta API |
| 9 | Vector lane | HNSW via mnem-ann | recall / archival via configurable embedder | Letta docs |
| 10 | Sparse lane | BM25 + SPLADE | not first-class | Letta docs |
| 11 | Graph lane | first-class | not first-class | Letta docs |
| 12 | Bindings | Rust + Python + TS + HTTP + CLI + MCP | Python + REST + Letta Code CLI (Node 18+) | Letta README |
| 13 | Cloud | none yet | hosted Letta API + free dashboard | https://docs.letta.com |
| 14 | Model agnosticism | yes (provider-not-tactic) | “fully model-agnostic; recommends Opus 4.5 / GPT-5.2” | Letta README |
| 15 | Headline use-case | agent-memory substrate | “stateful agents that learn and self-improve” | Letta repo description |
Benchmarks (where comparable)
Letta publishes a model leaderboard at leaderboard.letta.com ranking
LLMs on Letta’s agent benchmarks (multi-turn, tool-use, reasoning).
This measures models inside the Letta agent, not retrieval quality of
a memory layer. mnem’s benchmarks measure retrieval R@K over corpora,
not agent task success.
The two systems are not directly comparable on a single number. Letta’s “how well does this LLM run my agent” answers a different question from mnem’s “how well does the substrate retrieve under a fixed embedder.”
mnem’s retrieval numbers under ONNX MiniLM-L6-v2:
| Benchmark | Split | Metric | mnem |
|---|---|---|---|
| LongMemEval | 500 Q | R@5 session | 0.966 |
| LoCoMo | 1986 Q | R@5 session | 0.726 |
| ConvoMem | 250 Q | Avg recall | 0.976 |
Latency (where measured)
| System | Setup | Latency |
|---|---|---|
| mnem | LongMemEval 500 Q, MiniLM ONNX | 711 ms mean retrieve |
| mnem | LoCoMo 1986 Q, MiniLM ONNX | 333 ms mean retrieve |
| Letta | varies wildly with model + tool-use depth | not headlined |
Letta’s user-perceived latency is dominated by the agent loop, not the memory tier. Different mechanism; not comparable.
Architecture differences
Letta is the platform descended from MemGPT. The headline pattern is
tiered memory: core memory blocks held in the LLM’s context window,
recall memory (recent conversation history) accessible via tool calls,
and archival memory (long-term store) similarly accessed by tool. The
agent itself decides what to promote and evict, using the LLM’s own
reasoning. Letta ships as a Python framework, a hosted API, and a
local CLI (letta via @letta-ai/letta-code). The product
optimisation is “give an LLM persistent memory and let it manage the
tiers.”
mnem is one layer below that. There is no agent in mnem. mnem is a
graph substrate: content-addressed nodes and edges, signed commit
history, 3-way merge, hybrid retrieval. If you wanted to build the
MemGPT pattern on top of mnem, you would ship: core_blocks as a
small ad-hoc graph, recall as an HNSW lane over recent commits,
archival as the full graph traversal lane. mnem doesn’t impose any
of that; it gives you the storage primitives and lets you choose the
agent shape.
Where Letta clearly wins
- The agent is in the box. Drop in Letta, you have an agent with memory and tool-use today. mnem requires you to bring your own agent / framework.
- The MemGPT brand and lineage. Anyone reading the agent-memory literature has seen the paper. Letta’s the canonical implementation.
- Hosted API + leaderboard. Comparing models on Letta’s harness is one click.
- Skills + subagents. Bundled patterns for advanced memory and continual learning.
- Multi-agent reconciliation via messaging. Agent-to-agent conversations are first-class.
Where mnem clearly wins
- No agent assumed. Letta’s memory belongs to a Letta agent; mnem’s memory belongs to your application. Port your agent framework next year, the data stays.
- No LLM in the write path. Letta writes to memory through the agent (LLM tool calls). mnem writes deterministically.
- Content-addressed identity. Same fact = same CID across machines.
- Real commit DAG. Diff, log, 3-way merge, signed history. Letta has DB state, not commits.
- Structural multi-agent merge. Two agents working offline in the same scope reconcile by 3-way graph merge, not by chat messages.
- WASM, embedded, single binary. Ship to the edge. Letta is a Python server.
- Hybrid 3-lane retrieval with token-budget metadata. Explicit
RRF over dense / sparse / graph;
tokens_usedper response.
When to pick Letta, when to pick mnem
Pick Letta if: you want the MemGPT pattern in a box, you want a ready-made agent platform with skills and subagents, you want to use Letta’s leaderboard to pick a model, or you are building a single stateful agent rather than a multi-application substrate.
Pick mnem if: you want the memory layer separate from the agent, you need content-addressing and a commit DAG, you are running multiple agent frameworks against the same store, or you need embedded / edge / WASM deployment.
Sources
- Letta repo, sha
bb52a8900a79onmain, 2026-04-26: https://github.com/letta-ai/letta - Letta README (Letta Code CLI, Letta API, model-agnostic note): https://github.com/letta-ai/letta/blob/main/README.md
- Letta docs: https://docs.letta.com
- Letta leaderboard: https://leaderboard.letta.com
- MemGPT paper, arXiv:2310.08560: https://arxiv.org/abs/2310.08560
- mnem README:
/README.md
mnem vs graphify
graphify: “AI coding assistant skill (Claude Code, Codex, OpenCode, …). Turn any folder of code, docs, papers, images, or videos into a queryable knowledge graph.” (repo description, safishamsi/graphify) mnem: a content-addressed graph substrate. graphify builds a graph from a folder; mnem is the graph.
At a glance
| mnem | graphify | |
|---|---|---|
| License | Apache-2.0 | MIT |
| Stars | small / pre-launch | 35,262 (GitHub API, 2026-04-26) |
| Embedded / Server | embedded | one-shot CLI; outputs static graph.json + HTML |
| LLM at ingest | no | yes (Claude subagents extract concepts + relationships) |
| Content-addressed | yes | no (NetworkX node IDs) |
| Bitemporal | no | no |
| WASM target | yes | no (Python + faster-whisper + Claude API) |
| MCP server | yes | no native MCP; integrates via AGENTS.md / hooks / Supermemory MCP |
| Hybrid retrieval | yes | partial (Leiden communities; no vector DB) |
| Token-budget retrieval metadata | yes | no |
| 3-way merge | yes | no (re-runs against SHA256 cache) |
| Reproducible benchmarks in-repo | yes | no (benchmarking pointer is the memorybench skill which targets supermemory) |
Feature comparison
| # | Dimension | mnem | graphify | Source |
|---|---|---|---|---|
| 1 | Product shape | runtime substrate (CLI / HTTP / MCP / Python / Rust) | one-shot CLI skill that emits a static graph artefact | graphify README “How it works” sha 770d7f54c40d |
| 2 | Inputs | text, code, conversations (anything you commit) | code, docs, papers, images, videos (multimodal) via tree-sitter + Whisper + Claude | graphify README |
| 3 | Ingest pipeline | parse + chunk + statistical extract -> commit | three passes: AST extract -> Whisper transcribe -> Claude subagents extract concepts | graphify README “How it works” |
| 4 | LLM requirement | optional | yes (Claude subagents are central) | graphify README |
| 5 | Identity | BLAKE3 CID over DAG-CBOR | NetworkX node IDs | graphify implementation |
| 6 | Output | live graph + retrieve API | static graph.html, GRAPH_REPORT.md, graph.json, cache/ | graphify README directory listing |
| 7 | Retrieval | 3-lane RRF (vector + sparse + graph) | graph-topology (Leiden communities) and /graphify query slash command | graphify README |
| 8 | Vector DB | HNSW via mnem-ann | none (graph-topology-based clustering, no embeddings) | graphify README “Clustering is graph-topology-based” |
| 9 | Re-ingest | commit append | SHA256 cache only re-processes changed files | graphify README directory listing |
| 10 | Tags on relations | edge labels | EXTRACTED / INFERRED / AMBIGUOUS confidence tags | graphify README |
| 11 | Always-on assistant integration | MCP + mnem integrate | platform-specific: Claude Code PreToolUse hook, Cursor alwaysApply rule, AGENTS.md | graphify README “Make your assistant always use the graph” |
| 12 | Supported AI clients | MCP (Claude Desktop, Cursor, Zed, etc.) | 15+ named installers (Claude Code, Codex, Cursor, Aider, Gemini, Copilot CLI, …) | graphify README install table |
| 13 | Versioning | signed commit DAG | none; re-run produces a fresh graph | graphify implementation |
| 14 | License | Apache-2.0 | MIT | repo metadata |
| 15 | Cloud | none yet | Supermemory API integration documented in README (“Build with Supermemory”) | graphify README |
Benchmarks (where comparable)
Not directly comparable. graphify produces a static knowledge graph artefact for an assistant to read, not a retrieval API benchmarked on LongMemEval / LoCoMo / etc. Their headline number is a token-efficiency claim (“71.5x fewer tokens per query vs reading the raw files”), measured against a different baseline than mnem’s R@K-on-public-corpora methodology.
graphify’s README points at the memorybench skill
(npx skills add supermemoryai/memorybench) for benchmarking, but
that harness is supermemory-tilted by default; running mnem through
it would require an adapter we have not built.
mnem’s measured retrieval numbers under ONNX MiniLM-L6-v2:
| Benchmark | Split | Metric | mnem |
|---|---|---|---|
| LongMemEval | 500 Q | R@5 session | 0.966 |
| LoCoMo | 1986 Q | R@5 session | 0.726 |
| ConvoMem | 250 Q | Avg recall | 0.976 |
Latency (where measured)
| System | Setup | Latency |
|---|---|---|
| mnem | LongMemEval 500 Q, MiniLM ONNX | 711 ms mean retrieve |
| mnem | LoCoMo 1986 Q, MiniLM ONNX | 333 ms mean retrieve |
| graphify | retrieval is “open graph.json and traverse” by an LLM | not measured by them |
graphify’s user-perceived latency at query time is “LLM reads
GRAPH_REPORT.md then traverses graph.json,” which is a different
loop entirely.
Architecture differences
graphify is a one-shot CLI you point at a folder. It runs a
deterministic AST pass (tree-sitter, 25 languages), transcribes audio
and video with faster-whisper using a domain-aware prompt, then
runs Claude subagents in parallel to extract concepts and
relationships from docs, papers, images, and transcripts. Results are
merged into a NetworkX graph, clustered with Leiden community
detection (no embeddings; graph topology is the similarity signal),
and exported as graph.html (interactive viewer), GRAPH_REPORT.md
(plain-language audit), graph.json (queryable), and a SHA256 cache
for incremental re-runs. Coding assistants integrate via platform-
specific install commands that wire the graph into rules / hooks /
AGENTS.md so the assistant always considers it before searching raw
files.
mnem is a runtime substrate, not a one-shot extractor. You commit nodes and edges, then retrieve via a 3-lane fused query (HNSW dense + BM25 / SPLADE sparse + graph traversal) under explicit RRF weights and a token budget. There is no LLM in the write path. Identity is a content CID, history is a signed commit DAG, and the graph evolves by commit + diff + merge rather than by re-extraction. mnem ships embedded; the same Rust core compiles to WASM unchanged.
The two are complementary more than competitive: graphify is a great ingestor for the kind of corpora mnem stores (run graphify on a folder, ingest the resulting graph into mnem). They do not solve the same problem.
Where graphify clearly wins
- Multimodal ingest in one command. Code, PDFs, markdown, screenshots, diagrams, whiteboard photos, video, audio, in 25 languages via tree-sitter. mnem ingests text and structured commits; you bring your own multimodal extractor.
- Coding-assistant integration breadth. 15+ named platforms with install commands (Claude Code, Codex, OpenCode, Copilot CLI, VS Code Copilot Chat, Aider, Cursor, Gemini CLI, OpenClaw, Factory Droid, Trae, Hermes, Kiro, Antigravity).
- PreToolUse hooks. Inject “the graph exists, read it first” into Claude Code / Codex / OpenCode tool flows automatically.
- Static, portable artefact.
graph.htmlopens in any browser;graph.jsonships in a repo; auditors readGRAPH_REPORT.md. - Confidence tags on relations.
EXTRACTED/INFERRED/AMBIGUOUSlets a reader filter what was found vs guessed. - No vector DB needed. Leiden over graph topology produces communities without embeddings.
Where mnem clearly wins
- Runtime, not one-shot. mnem keeps serving as your corpus grows; graphify is a re-extract loop.
- No LLM in the write path. graphify’s concept extraction is Claude-subagent-driven by design. mnem ingests deterministically.
- Content-addressed identity + commit DAG. Stable identity across re-runs and machines; full diff / 3-way merge. graphify regenerates a fresh NetworkX graph.
- Hybrid retrieval API. Vector + sparse + graph fused with token- budget metadata. graphify exposes traversal slash-commands but no retrieval API.
- Embedded + WASM. Same retrieval logic in Rust, Python, TS, MCP, Workers, Lambda. graphify is a Python CLI.
- License. Apache-2.0 vs MIT (both permissive; matters in some corporate review contexts).
When to pick graphify, when to pick mnem
Pick graphify if: you want a folder -> queryable knowledge graph in one command, you need multimodal extraction (video / audio / images), you want an always-on coding-assistant integration, or you need a static artefact you can ship in a repo.
Pick mnem if: you need a runtime memory substrate, you require deterministic ingest, you want content-addressed identity and a real commit DAG, you are building a product (not a personal coding assistant), or you need embedded / WASM deployment.
You can also pair them: graphify as the multimodal extractor, mnem as the runtime substrate that holds the resulting graph and serves queries.
Sources
- graphify repo, sha
770d7f54c40donv5, 2026-04-26: https://github.com/safishamsi/graphify - graphify README (“How it works”, “Install”, “Make your assistant always use the graph”, “Benchmarks”): https://github.com/safishamsi/graphify/blob/v5/README.md
- mnem README:
/README.md - mnem benchmarks:
/benchmarks/proofs/v0.1.0/
Migrating to 0.1.0
0.1.0 is the first public release. There is no prior public version; this document exists for completeness and to fix the upgrade pathway forward.
Summary
If you are arriving at 0.1.0 fresh: skip this page. There is nothing to migrate.
What 0.1.0 establishes
| Surface | Stability guarantee |
|---|---|
| CLI subcommand names | stable; deprecation cycle for breaking changes |
HTTP /v1/* routes | stable; new routes additive only |
| MCP tool names + schemas | stable |
<repo>/.mnem/config.toml keys | stable |
| Object encoding (DAG-CBOR + CIDs) | stable; bump = new schema version |
| Internal Rust crate APIs | unstable pre-1.0 |
What may change between 0.x minor releases
- New optional config keys (with defaults).
- New retrieval lanes / scoring modes (off-by-default).
- New CLI subcommands; existing subcommands keep semantics.
- Sidecar formats (re-build required, node CIDs unchanged).
Upgrading from a 0.x.y to 0.x.(y+1)
cargo install --locked mnem-cli # or platform package manager
mnem --version
mnem doctor # config sanity check
No data migration needed within 0.x.
Future migrations
When 1.0 lands, this directory will gain a v0.x-to-v1.0.md doc covering
any on-disk schema changes. Until then, treat 0.x as the pre-launch line
where patch releases are non-breaking and minors may introduce additive
changes.