Architecture

Memori is a modular memory layer for AI applications. You connect your LLM client, set attribution, point Memori at your database, and it handles everything else — storage, augmentation, knowledge graph construction, and recall. All data stays on your infrastructure.

System Overview

Core Components

Memori Core — The central coordinator between your application and your database. Manages attribution, coordinates storage and augmentation, provides LLM wrappers, and exposes the Recall API.

LLM Provider Wrappers — Wraps your existing LLM client transparently. Intercepts calls, captures messages and responses, persists conversation data to your database. Supports sync, async, and streaming.

Attribution System — Tags every memory with who created it and in what context. Tracks three dimensions: entity (the user), process (the agent), and session (the conversation thread).

Storage System — Stores all data in your database with no external dependencies. Supports SQLAlchemy sessionmaker (PostgreSQL, MySQL, SQLite, Oracle), DB-API 2.0 connections, Django ORM, and MongoDB.

Advanced Augmentation — Turns raw conversations into structured memories. Extracts facts, preferences, and skills, generates vector embeddings locally, and builds a knowledge graph. Runs asynchronously with zero latency impact.

Configuration

Setting up Memori requires a database connection and attribution:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from memori import Memori
from openai import OpenAI

engine = create_engine("sqlite:///memori.db")
SessionLocal = sessionmaker(bind=engine)

client = OpenAI()
mem = Memori(conn=SessionLocal).llm.register(client)
mem.attribution(entity_id="user_123", process_id="my_agent")
mem.config.storage.build()

Data Flow

Conversation Capture — Every LLM call through the wrapped client is captured and stored in your database. Your app gets the response immediately.
Attribution Tracking — Attribution links every conversation to a specific entity and process so memories are properly scoped and indexed.
Augmentation — After a conversation completes, Memori processes it asynchronously — extracts facts, generates embeddings locally, and builds knowledge graph triples.
Recall — On the next LLM call, Memori embeds the query locally, performs vector similarity search against your database, and injects the most relevant memories into the system prompt.