Designing multi-agent pipelines with shared state — how are you approaching it?

I’ve been working with LLM-based agents arranged as small pipelines (planner → executor → reviewer, sometimes with tools in between), and the hardest part so far hasn’t been the models themselves; it’s managing shared state across steps.

Once pipelines include branching, retries, or tool calls, context ends up spread across prompts, intermediate artifacts, and bits of application code. At that point, it becomes hard to reason about agent behavior or reproduce failures consistently.

I’ve been experimenting with making the pipeline and state more explicit — treating the workflow as a graph where agents read from and write to a shared spec/state instead of relying purely on implicit prompt passing. I’ve been testing this using a tool called Zenflow to see if it helps with inspectability and determinism, but I’m still evaluating the approach.

I’m curious how others here are structuring agent pipelines today. Are you managing state manually in code, using external stores/state machines, or relying on higher-level frameworks? Would love to hear what’s worked or not for you.


I think the commonly used methods these days look something like this (based on searches and my own resources).

I was able to do this when I built a new file system for AI and added multi-tenant session storage. I can also make multiple agents think as one, though it’s quite technical to explain, so I’ll paste the information document that explains everything. I’m honestly super close to releasing the first beta; I just have to see a lawyer to finalize the licensing, and then I’ll post the free dev license details on Hugging Face ASAP.

# AIFS AI File System
Technical overview plus token savings report


Author
Astto Scott
Triskel Data Pty Ltd
ABN 22686914376
Contact [email protected]

## Executive summary
AIFS is a local persistence engine designed for agents and chat systems.
It externalizes memory out of the language model context and into an indexed local store.
When an AI process restarts, it can resume by reading the durable session state from disk and retrieving only the relevant memory slices for the next turn.

The product goals are simple.
1. Persist every session indefinitely on local storage.
2. Provide fast lexical and semantic retrieval over that history.
3. Reduce repeated prompt context tokens by replacing full history replay with targeted retrieval.
4. Avoid vendor lock-in by exposing a stable HTTP API plus SDKs.

## What AIFS does
AIFS provides these capabilities.
1. Multi-tenant session storage with append only event logs.
2. Durable blob storage keyed by digest for large artifacts.
3. Search and retrieval over stored events using two index types.
   1. SQLite full text search for lexical recall.
   2. A vector index for semantic recall when embeddings are enabled.
4. Tools and presets endpoints to support agent style workflows.
5. Admin endpoints for tenant management, config, reindex, retention, and license status.
6. Offline licensing using signed license files, no phone home required.

## Core concepts
1. Tenant
   1. A tenant is an isolated namespace for sessions, blobs, config, and audit logs.
   2. A tenant is selected by the request header `X_Tenant_ID` or a default tenant.
2. Session
   1. A session is a durable conversation or agent run.
   2. Each session stores an append only log of events.
3. Event
   1. An event is a JSON object written as one line in a JSONL file.
   2. Typical fields include timestamp, event id, kind, type, session id, user id, text, and metadata.
   3. Deletions are handled via tombstone events rather than rewriting history.
4. Blob
   1. A blob is an immutable payload stored once, addressed by a sha256 digest.
   2. Events can reference blobs by digest.
5. Index
   1. Indexes provide retrieval over event text fields and selected metadata.
   2. Indexes are maintained asynchronously by the ingest daemon and embedding worker.
6. License and entitlements
   1. A license defines caps and feature switches such as tenant limits, session limits, quota bytes, and workers allowed.
   2. Licenses are verified offline using ed25519 signatures and configurable public keys.
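To make the event concept concrete, here is a minimal sketch of one event serialized as a JSONL line. The field names follow the "typical fields" list above, but the exact AIFS schema is not spelled out in this document, so treat the names and values as illustrative assumptions:

```python
import json

# Illustrative event record; field names follow the "typical fields" list
# above, but the real AIFS schema may differ.
event = {
    "ts": "2025-01-15T10:32:00Z",
    "event_id": "evt_0001",
    "kind": "message",
    "type": "user",
    "session_id": "sess_abc",
    "user_id": "user_1",
    "text": "What did we decide about retention?",
    "metadata": {"source": "chat"},
}

# One event per line, append only: serialize without embedded newlines.
line = json.dumps(event, separators=(",", ":"))
print(line)
```

Because each event is one self-contained line, deletions can be expressed as later tombstone events without rewriting earlier lines.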

## On disk layout
AIFS stores everything under a single base directory.

Base directory
1. If `AIFS_FS_BASE` is set, it is used.
2. Otherwise the default is `var/aifs`.

Recommended operational base directory
1. For Linux hosts use a persistent path such as `/var/lib/aifs`.
2. For containers mount a volume and set `AIFS_FS_BASE` to the mounted path.

Directory layout overview
```text
{AIFS_FS_BASE}/
  admin/
    license.json                   global license, optional
  index/
    fts.sqlite                     SQLite FTS index
    vec.sqlite                     vector index, optional
  tenants/
    {tenant}/
      .usage.json                  cached quota scan results and counters
      admin/
        config.json                tenant config
        license.json               tenant license, optional override
        retention.json             retention policy, if configured
      sessions/
        {session_id}/
          events.jsonl             append only session log
      blobs/
        sha256/
          ab/
            cd/
              <digest>             blob payload
      audit/
        audit.jsonl                append only audit log
      jobs/
        jobs.jsonl                 retention and maintenance logs, if enabled
```

Key properties
1. Every tenant is a directory.
2. Every session is a directory.
3. The main session record is append only JSONL, which is friendly to streaming and incremental indexing.
4. Blobs are content addressed, enabling dedupe across events inside a tenant.
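The content addressing can be sketched as follows; the two-level fan-out (`ab/cd`) is inferred from the directory tree above, so take the exact path shape as an assumption:

```python
import hashlib
from pathlib import Path

def blob_path(base: Path, payload: bytes) -> Path:
    """Derive the on-disk blob path from the sha256 digest.

    Mirrors the blobs/sha256/ab/cd/<digest> layout shown above; identical
    payloads map to the same path, which is what gives dedupe for free.
    """
    digest = hashlib.sha256(payload).hexdigest()
    return base / "blobs" / "sha256" / digest[:2] / digest[2:4] / digest

p = blob_path(Path("/var/lib/aifs/tenants/acme"), b"hello world")
print(p)
```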

## Durability model
AIFS is designed to be restart safe on commodity filesystems.

1. Events are appended to `events.jsonl`.
2. Metadata files use atomic write semantics.
   1. Write to a temp file.
   2. fsync the file.
   3. rename to the target name.
   4. fsync the parent directory.
3. Index state is stored in SQLite files under the index directory.
4. If indexing lags, the source of truth is still the event log on disk.
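The atomic write sequence in step 2 can be sketched like this. It is a minimal POSIX-oriented sketch, not AIFS's actual implementation, which may differ in details such as temp-file naming and error handling:

```python
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    """Write-to-temp, fsync, rename, fsync-parent (POSIX semantics).

    A crash at any point leaves either the old file or the complete new
    file at `path`, never a torn write.
    """
    parent = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=parent)  # 1. temp file on the same filesystem
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())            # 2. flush file contents to disk
        os.replace(tmp, path)               # 3. atomic rename to the target name
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    dir_fd = os.open(parent, os.O_RDONLY)   # 4. fsync the parent directory so
    try:                                    #    the rename itself is durable
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The directory fsync in step 4 matters: without it, a crash can lose the rename even though the file contents were flushed.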

## Write path
This is the typical flow when your application writes memory.

1. Create or select a session.
   1. `POST /v1/sessions` creates a session and writes a session start event.
2. Append events.
   1. `POST /v1/sessions/events` appends one or more events to the session log.
3. Store large payloads as blobs.
   1. `POST /v1/blobs/sign` provides a signed upload intent when enabled.
   2. `POST /v1/blobs/upload` writes a blob payload.
   3. The blob path is derived from the sha256 digest and stored once.
4. Quota enforcement
   1. Writes can reserve quota by updating `.usage.json` and may reject writes when a cap is exceeded.
   2. Periodic rescans reconcile cached usage with actual disk usage.

## Read path
This is the typical flow when your application reconstructs state to respond.

1. Read session log
   1. `GET /v1/sessions/events` streams events for a session.
2. Retrieve relevant memory
   1. `POST /v1/search/hybrid` queries both lexical and semantic indexes.
   2. Results are returned as event slices with source metadata.
3. Build the prompt
1. Instead of sending the full session transcript, your app sends a small, fixed retrieval budget of context.
   2. This fixed budget is what drives token savings as sessions grow.
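The prompt-building step can be sketched as below. Token counting is approximated by whitespace splitting purely for illustration; a real integration would use the model provider's tokenizer:

```python
def build_prompt(system: str, user: str, memories: list[str], budget: int) -> str:
    """Assemble a prompt under a fixed retrieval budget.

    `memories` is assumed to be ordered by relevance, best first; we take
    retrieved slices until the budget would be exceeded, so the prompt size
    stays roughly constant no matter how long the session history is.
    """
    picked, used = [], 0
    for mem in memories:
        cost = len(mem.split())  # crude token estimate for this sketch
        if used + cost > budget:
            break
        picked.append(mem)
        used += cost
    context = "\n".join(picked)
    return f"{system}\n\n[retrieved memory]\n{context}\n\n[user]\n{user}"
```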

## Indexing pipeline
Indexing is asynchronous by design so writes stay fast.

1. Ingest daemon
   1. A process scans tenant session logs and tracks file offsets.
   2. New lines are parsed and indexed.
   3. Tombstone events remove or hide prior events in indexes.
2. Embedding worker
   1. When vector indexing is enabled, selected text fields are embedded.
   2. Vectors are stored in the vector index backend.
3. Reindex and repair
   1. Admin reindex can rebuild indexes from the source logs.
   2. The fsck tool validates consistency and can quarantine orphans.
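The offset-tracking tail in step 1 can be sketched as follows. This is a simplified version; the real daemon also persists offsets across restarts and applies tombstones:

```python
import json

def read_new_events(path: str, offset: int):
    """Read complete JSONL lines appended since `offset`.

    Returns (events, new_offset). A partially written last line is left
    for the next scan, so an in-flight append is never half-indexed.
    """
    events = []
    with open(path, "rb") as f:
        f.seek(offset)
        new_offset = offset
        while True:
            line = f.readline()
            if not line or not line.endswith(b"\n"):
                break  # EOF or torn tail: resume from new_offset next scan
            new_offset = f.tell()
            if line.strip():
                events.append(json.loads(line))
    return events, new_offset
```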

## Maintenance and support
AIFS includes operational tooling so environments can run unattended.

1. Retention runner
   1. Applies retention policies per tenant.
   2. Removes expired audit entries, job logs, temp data, and optionally sessions.
2. Garbage collection
   1. Identifies unreferenced blobs.
   2. Deletes or quarantines them based on policy.
3. fsck tool
   1. Validates that session logs, indexes, and blob references are consistent.
   2. Fix mode quarantines orphans rather than deleting immediately.
4. Support bundle tool
   1. Generates a sanitized diagnostics bundle that avoids secrets.
   2. Includes config view, license status, usage summary, and metrics snapshot.

## Security model
AIFS is designed to be safe for local deployment.

1. Transport security
   1. HTTPS is supported.
   2. Optional mutual TLS is supported using a client CA.
2. Auth
   1. API key auth is supported.
   2. JWT and OIDC can be supported based on configured mode.
   3. Startup validation prevents ambiguous auth configuration.
3. Offline licensing
   1. Licenses are verified using ed25519 signatures and embedded public keys.
   2. Key rotation is supported by allowing multiple public keys and an optional key id in the payload.
4. Log redaction
   1. Sensitive headers and secret-like fields are redacted from logs and audit.
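A minimal redaction pass over log fields might look like the sketch below. The key patterns are illustrative assumptions, not AIFS's actual list:

```python
# Illustrative secret-like key patterns; AIFS's real list is not documented here.
SENSITIVE = ("authorization", "api_key", "token", "secret", "password", "cookie")

def redact(record: dict) -> dict:
    """Return a copy with secret-like fields masked, recursing into nested
    dicts. Keys are normalized (lowercase, hyphens to underscores) so
    header-style names like X-Api-Key are caught too."""
    out = {}
    for key, value in record.items():
        norm = key.lower().replace("-", "_")
        if isinstance(value, dict):
            out[key] = redact(value)
        elif any(s in norm for s in SENSITIVE):
            out[key] = "[REDACTED]"
        else:
            out[key] = value
    return out
```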

## Integration patterns
The standard integration pattern is retrieval augmented continuity.

1. On each user turn
   1. Append the user message and any tool output as events.
   2. Query AIFS for relevant prior events for this turn.
   3. Build a prompt containing only:
      1. system instructions
      2. the current user request
      3. a fixed set of retrieved memories
      4. a short recent window if desired
   4. Call your model provider.
   5. Append the assistant response as an event.
2. On restart
   1. Load the session metadata or read the event stream.
   2. Continue appending events to the same session id.
3. Why this becomes infrastructure
   1. It separates durable state from compute.
   2. It works with any model provider because it sits outside the model call.
   3. It makes long lived agents practical without context replay costs.
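The per-turn loop above can be sketched with injected stand-ins; here `store` is an in-memory substitute for the AIFS session log, and `retrieve` and `call_model` are placeholder callables, so none of these names are real AIFS or provider APIs:

```python
def run_turn(store, session_id, system, user_msg, retrieve, call_model, budget=2000):
    """One turn of the retrieval-augmented-continuity loop described above.

    `store` stands in for the AIFS event log; `retrieve` and `call_model`
    are injected so the sketch stays provider-neutral.
    """
    # 1. Append the user message as an event.
    store.setdefault(session_id, []).append({"role": "user", "text": user_msg})
    # 2. Query for relevant prior events under a fixed budget.
    memories = retrieve(session_id, user_msg, budget)
    # 3. Build a small, bounded prompt instead of replaying the transcript.
    prompt = "\n".join([system, *memories, user_msg])
    # 4. Call the model provider.
    reply = call_model(prompt)
    # 5. Append the assistant response as an event.
    store[session_id].append({"role": "assistant", "text": reply})
    return reply
```

On restart, the same `session_id` keys back into the durable log, so the loop simply resumes appending.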

## Token savings report
This section shows why token usage reduction improves as sessions grow.

### Baseline problem
If you do not have external memory, the usual approach is to resend a large portion of the conversation history on every turn.
This makes the input tokens per turn grow with the session length.
Across many turns, total tokens grow roughly quadratically.

### AIFS approach
With AIFS, you store the full history locally and send only a fixed retrieval budget to the model each turn.
This makes total tokens grow roughly linearly with the number of turns.

### Definitions
Let:
1. n be the number of turns in a session
2. M be the average number of tokens added to history per turn
3. B be constant per call tokens such as system plus framing
4. W be the fixed retrieval budget tokens sent per call when using AIFS

Baseline total input tokens across a session
1. Total baseline is approximately:
```text
TotalBaseline = n*B + M*n*(n+1)/2
```

AIFS total input tokens across a session
1. Total AIFS is approximately:
```text
TotalAIFS = n*(B + W)
```

Reduction percentage
```text
ReductionPercent = 1 - TotalAIFS / TotalBaseline
```

### Example scenario
Assumptions for this example
1. B equals 500 tokens
2. M equals 650 tokens per turn added to history
3. W equals 2000 tokens per call retrieval budget

Total token comparison for one session
```text
turns  baseline_tokens_total  aifs_tokens_total  reduction_percent
10     40750                 25000             38.7
20     146500                50000             65.9
50     853750                125000            85.4
100    3332500               250000            92.5
200    13165000              500000            96.2
500    81662500              1250000           98.5
```

Per call reduction improves with session age
```text
turn  baseline_tokens_call  aifs_tokens_call  reduction_percent
5     3750                 2500             33.3
10    7000                 2500             64.3
20    13500                2500             81.5
50    33000                2500             92.4
100   65500                2500             96.2
200   130500               2500             98.1
```

### Embedding overhead
Vector search requires embeddings. Embedding adds tokens billed by the embeddings provider, but each text segment is embedded once and then reused for retrieval.
This overhead is linear, while the baseline context replay cost is quadratic.

Illustrative net savings when embeddings consume 200 tokens per turn
```text
turns  embed_tokens_total  net_tokens_saved  net_reduction_percent
10     2000               13750            33.7
20     4000               92500            63.1
50     10000              718750           84.2
100    20000              3062500          91.9
200    40000              12625000         95.9
500    100000             80312500         98.3
```

How to interpret this
1. Early in a session, savings can be modest because history has not grown yet.
2. As the session grows, savings quickly exceed 60 percent and approach 95 percent or more.
3. The older the session, the better the reduction because AIFS keeps the per call prompt budget roughly constant.

### How to compute your real savings
You can compute savings using real telemetry from your application.

1. Measure baseline tokens
   1. Track input tokens sent to your model provider for each call without external memory.
2. Measure AIFS tokens
   1. Track input tokens sent with AIFS retrieval enabled and a fixed budget W.
3. Compute reduction
   1. Use the reduction formula above over the same workload.

Practical guidance
1. Start with a retrieval budget W in the 1000 to 4000 token range.
2. Keep W fixed so savings scale with time.
3. Increase W only when it improves task quality.

## Selling points, phrased for buyers
1. Restart continuity
   1. Agents resume where they left off because session state is durable on disk.
2. Token cost reduction
   1. Replace repeated full context replay with fixed budget retrieval.
   2. Reduction improves as sessions age.
3. Vendor neutral infrastructure
   1. Works with any LLM because it does not replace the model; it feeds the model.
4. Data locality
   1. Memory stays on the customer host.
   2. No central service required.
5. Operationally simple
   1. One base directory.
   2. One API service plus optional ingest and worker processes.
   3. Maintenance tools for retention and repair.

## Quickstart for evaluation
1. Start services
```sh
docker compose up --build
```

2. Call status
```sh
curl http://localhost:8000/v1/status
```

3. Use SDK examples
1. Python
```sh
python examples/python/basic.py
```
2. TypeScript
```sh
node examples/typescript/basic.ts
```

## Appendix: Key endpoints
Core
1. `POST /v1/sessions`
2. `POST /v1/sessions/events`
3. `POST /v1/search/hybrid`
4. `POST /v1/blobs/upload`
5. `GET /v1/status`

Admin
1. `GET /v1/admin/license`
2. `GET /v1/admin/auth/status`
3. `POST /v1/admin/reindex`
4. `POST /v1/admin/retention/run`
5. `GET /v1/admin/retention/status`

End