Algroveon Agent – An assistant without memory is just a chatbot

The Core Problem: An Assistant Without Memory Loses the Thread

An LLM "remembers" nothing. There is no permanent knowledge, no stable state, and no true continuity between individual requests. Everything a model needs to know about a person, a task, or a preference must be contained within the context of the current request. And this context is limited. Especially with local models, the practical limit in everyday use is often significantly lower than what is possible on paper—not just because of the model itself, but also due to computational effort, memory requirements, and speed.

For an assistant to be truly useful in daily life, this is a central problem:

An agreement made in a conversation a week ago? Gone.
Learned preferences from the last session? Gone.
The context of a project you have been working on for weeks? Gone.

The answer from Algroveon-Agent to this problem is a four-layer memory system. This post explains how it is structured, how the assistant's multi-stage working logic interacts with it, and which problems have emerged in practice.

The Four Layers of Memory

Layer 1: Session Log (ChatHistoryStore)

The session log is the working layer. Every user message, every model response, every tool call, and every tool result is immediately written to a SQLite database. This is not an extra feature, but the foundation of the system.

Why write immediately instead of at the end of a session? Because sessions can crash. A server restart, a browser crash, or an interrupted request is enough. If the state only exists in RAM, it is lost. Therefore, the session log writes every entry in such a way that a restored session can continue without interruption.

When opening an existing session, the latest messages are loaded from the session log and passed to the model as context. This creates conversational continuity within an active session.

Layer 2: Session Memory (Volatile Note List)

Session memory consists of short notes that the model can store for the duration of a session. It is not long-term memory—once the session ends, these entries disappear. Essentially, it is a notepad: "For this task, I need file X" or "The user wants the result in format Y today."

The model can actively write to and read from session memory via a memory tool. The distinction from long-term memory is deliberately clear: no approval, no SourceTag gate, immediate write access. The scope is explicitly ephemeral.

Layer 3: Long-Term Memory

Long-term memory consists of permanent information that remains available across sessions. It is passed to the model as part of the context upon login—even before the first user message is received. This is the layer that turns a simple dialogue system into an assistant with true continuity.

Two sources fill this long-term memory:

SessionSummarizer: After logout, a summarizer compresses the completed session into a maximum of five sentences. This summary is stored as a long-term memory item. During the next login session, up to five such summaries are loaded into the context as seed_messages—before the actual chat history. This allows the model to immediately have a rough idea of what occurred in previous sessions, which topics were active, and what agreements were already in place.

ProfileFacts: During conversations, the agent can identify facts about the user. These are not adopted silently. The user receives a confirmation request, and the fact is only stored persistently after explicit consent. This simultaneously serves as a safeguard against memory poisoning and unwanted permanent assumptions.

Layer 4: Retrieval Index (Hybrid Search)

For larger amounts of information, it is no longer sufficient to push everything directly into the context. That is where the retrieval index comes in. At its core, it answers a simple question: "Which of this is truly relevant to the current request?"

The index combines two methods:

FTS5 (SQLite Full-Text Search): keyword-based search, very fast, good for exact terms.
Vector Similarity: Ollama nomic-embed-text generates embeddings for stored content and for the current request; semantic proximity is determined via cosine similarity using NumPy.

The hybrid retrieval combines both scores. Pure keyword matches work well for names, commands, and specific terms. Semantic search, on the other hand, finds content that is thematically relevant even if different wording was used.

The result: Relevant long-term entries are specifically pulled into the context window instead of carrying everything along permanently.

When a user logs in, the first LLM call is usually the most intensive. At this moment, the model must not only react to the current request but also be re-integrated into the correct context.

Algroveon-Agent solves this with a two-stage context header:

Stage 1 – Seed-Messages: Up to five long-term memory summaries from past sessions. This immediately gives the model a form of continuity: it knows which topics were recently relevant, which projects are open, and where previous conversations ended.

Stage 2 – Recent Context: The last 20 messages from the user's previous session. This ensures immediate conversational continuity for ongoing tasks—without having to reload the entire history.

The effect is crucial in everyday use: a session from yesterday can be continued without the user having to explain, "where we left off."

What an Agentic Loop Is: When the Assistant Executes Multiple Steps in Succession

A single tool call is relatively simple: the user asks something, the model calls a tool, the result comes back, and an answer is generated from it.

Agentic tasks are significantly more complex. An example:

"Check what my last five emails in mailboxes A and B are, and create a short overview for me."

For this, the agent needs several steps:

Call mail_list for mailbox A
Call mail_list for mailbox B
Process tool results and identify relevant mail IDs
Call mail_read for the relevant emails
Generate a coherent summary

This quickly involves several rounds of tool calls. Algroveon-Agent supports loops of up to six iterations. If no final text is output after six rounds, the loop terminates in a controlled manner. This is a deliberate choice to prevent the agent from becoming an infinite loop generator.

Loop Behavior in Practice

The loop works as follows:

1. Build context (System Prompt, Seed-Messages, Session-Log)
2. First LLM call
3. Parse response: Text or Tool-Calls?
   → Only text: finished, stream
   → Tool-Calls: continue
4. Check Tool-Calls via Policy Engine
   → BLOCKED: report error
   → APPROVAL_REQUIRED: pause, wait
   → ALLOWED: execute
5. Insert tool results into the message stack
6. Next LLM call → return to step 3

On paper, this looks quite straightforward. In practice, it becomes difficult in one specific area: Reasoning models do not always strictly adhere to this sequence.

HeadlessRunner: The Loop Without HTTP

The agentic loop is implemented entirely within a class that does not require an HTTP layer. The HeadlessRunner executes the same process as the actual endpoint—same context construction, same policy engine, same audit log, just without the SSE stream.

This enables two important use cases:

Time-scheduled tasks: An internal scheduler can trigger HeadlessRunner calls according to a schedule. The daily morning briefing is the obvious example: weather, calendar, and configured RSS feeds are compiled and sent via email—completely without direct user interaction.

Messenger Integration: Integration with messengers is fundamentally considered in the design but has not yet been implemented.

SessionSummarizer: Why After Logout?

The timing is deliberate: the summarizer does not run during the session, but only after logout.

During an active session, a continuous summarizer would be more of a distraction. It would interfere with an active work process and, if in doubt, would only create unnecessary load. After logout, however, the session is complete. The entire history is fully available and can be condensed in peace.

The summarizer receives the complete session and generates a maximum of five sentences from it. The compression ratio is typically 10:1 or higher. Five sentences from an hour of work might seem sparse at first, but in practice, they are often sufficient to allow the user to immediately get back into the topic upon the next login.

Limitations of the Current System

The memory system works and provides a noticeable benefit in everyday use. However, it also has clear limitations:

RAG for large documents is prepared but not yet active. The retrieval index exists, as does the embedding pipeline. However, a clean indexing pipeline is still missing for true Retrieval-Augmented Generation over larger volumes of documents: automatically embedding new files in the workspace, detecting changes, and reliably managing re-indexing. The foundation is present, but the actual production layer on top of it is not.

Summaries lose nuances. This refers to the SessionSummarizer mentioned in the previous section: it condenses a complete session at the end into a maximum of five sentences. This is often enough to quickly find the thread again during the next login. However, in longer or more technically deep sessions, details are inevitably lost.

The seed-message sequence continues to grow in the long term. Every completed session generates a new entry. In the medium term, the system will therefore require a second compression stage—i.e., summaries of summaries—once a user has accumulated a very large number of sessions. This is not currently an acute problem, but it is a clearly identified point of technical debt.

Conclusion

The Algroveon-Agent memory system attempts to bring the fundamental statelessness problem of LLMs into a form that is truly usable in everyday life: four layers with different lifespans, a hybrid retrieval index, a session summarizer, and an agentic loop that can cleanly process multiple tool rounds in succession.

The most important insight here is: memory in an AI system is not purely a storage problem. It is primarily a question of relevance. What is loaded, when, in what form, and to what extent? If you load too much, noise is created. If you load too little, you lack the very continuity that makes an assistant useful.

This balance was not created theoretically on a drawing board, but iteratively through operation. Every real usage session shows anew what the system truly needs—and what is better left out.

Algroveon Agent – An assistant without memory is just a chatbot

The Core Problem: An Assistant Without Memory Loses the Thread

The Four Layers of Memory

Layer 1: Session Log (ChatHistoryStore)

Layer 2: Session Memory (Volatile Note List)

Layer 3: Long-Term Memory

Layer 4: Retrieval Index (Hybrid Search)

What an Agentic Loop Is: When the Assistant Executes Multiple Steps in Succession

Loop Behavior in Practice

HeadlessRunner: The Loop Without HTTP

SessionSummarizer: Why After Logout?

Limitations of the Current System

Conclusion

More posts

AlgroveonBook – On the way to a free-thinking, local AI agent

Algroveon-AI – Running your own local LLM: Hardware and Setup

Algroveon Agent – An assistant without memory is just a chatbot

The Core Problem: An Assistant Without Memory Loses the Thread

The Four Layers of Memory

Layer 1: Session Log (ChatHistoryStore)

Layer 2: Session Memory (Volatile Note List)

Layer 3: Long-Term Memory

Layer 4: Retrieval Index (Hybrid Search)

The Two-Stage Context at Login

What an Agentic Loop Is: When the Assistant Executes Multiple Steps in Succession

Loop Behavior in Practice

HeadlessRunner: The Loop Without HTTP

SessionSummarizer: Why After Logout?

Limitations of the Current System

Conclusion

More posts

AlgroveonBook – On the way to a free-thinking, local AI agent

Algroveon-AI – Running your own local LLM: Hardware and Setup