Hi all — I’ve been building a verification-first toolkit for agent runs because most “agent debugging” today is still opinion-based. When something goes wrong, we get partial logs, screenshots, and a postmortem story — but no way for a third party to independently prove what happened, replay it, or pinpoint the exact divergence.
RFTSystems: Agent Forensics Suite turns agent behaviour into verifiable artifacts: hash-chained timelines, tamper-evident receipts, deterministic replays, and first-divergence diffs. The goal is simple: replace “trust me” with evidence you can validate.
Start here (guided entrypoint):
Full collection:
https://huggingface.co/collections/RFTSystems/rftsystems-agent-forensics-suite
What you can do right now
- Generate a verifiable proof run in under a minute (deterministic replay + exportable bundle):
ReplayProof Agent POV Verified Replay - a Hugging Face Space by RFTSystems - Record real agent sessions with chain-of-custody logging (hash-chained events across prompts/tools/memory):
Agent Flight Recorder - a Hugging Face Space by RFTSystems - Seal runs into tamper-evident receipts (download + upload to verify integrity):
RFT Memory Receipt Engine - a Hugging Face Space by RFTSystems - Diff two runs and find the first divergence (where/why they split):
TimelineDiff Differential Reproducibility - a Hugging Face Space by RFTSystems - Audit runs and state transitions (inspection cockpit):
TrustStack Console - a Hugging Face Space by RFTSystems
The workflow
learn → generate proof → record reality → seal it → diff it → audit it → benchmark it
What I’m looking for (to harden this for real-world use)
If you’re working with agents (LangGraph/LangChain/custom), I’d value:
- run bundles you’re willing to share (even tiny ones) so we can validate cross-machine reproducibility
- failure cases where you can’t explain why two “similar” runs diverged
- feedback on what you’d want in a “minimum viable audit trail” for deployment/compliance
If you try it and it breaks, tell me exactly where — I’m building this to survive professional scrutiny.
#Agents #LLMOps #MLOps #Reproducibility #Observability #Forensics #AISafety #Governance
— Liam (RFTSystems)