Architecture
Memori is a modular memory layer for AI applications. You connect your LLM client, set attribution, point Memori at your database, and it handles everything else — storage, augmentation, knowledge graph construction, and recall. All data stays on your infrastructure.
System Overview
Core Components
Memori Core — The central coordinator between your application and your database. Manages attribution, coordinates storage and augmentation, provides LLM wrappers, and exposes the Recall API.
LLM Provider Wrappers — Wraps your existing LLM client transparently. Intercepts calls, captures messages and responses, persists conversation data to your database. Supports sync, async, and streaming.
Attribution System — Tags every memory with who created it and in what context. Tracks three dimensions: entity (the user), process (the agent), and session (the conversation thread).
Storage System — Stores all data in your database with no external dependencies. Supports SQLAlchemy sessionmaker (PostgreSQL, MySQL, SQLite, Oracle), DB-API 2.0 connections, Django ORM, and MongoDB.
Advanced Augmentation — Turns raw conversations into structured memories. Extracts facts, preferences, and skills, generates vector embeddings locally, and builds a knowledge graph. Runs asynchronously with zero latency impact.
Configuration
Setting up Memori requires a database connection and attribution:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from memori import Memori
from openai import OpenAI
engine = create_engine("sqlite:///memori.db")
SessionLocal = sessionmaker(bind=engine)
client = OpenAI()
mem = Memori(conn=SessionLocal).llm.register(client)
mem.attribution(entity_id="user_123", process_id="my_agent")
mem.config.storage.build()
Data Flow
-
Conversation Capture — Every LLM call through the wrapped client is captured and stored in your database. Your app gets the response immediately.
-
Attribution Tracking — Attribution links every conversation to a specific entity and process so memories are properly scoped and indexed.
-
Augmentation — After a conversation completes, Memori processes it asynchronously — extracts facts, generates embeddings locally, and builds knowledge graph triples.
-
Recall — On the next LLM call, Memori embeds the query locally, performs vector similarity search against your database, and injects the most relevant memories into the system prompt.