MSBAi / K-ai / NanoClaw — Process Architecture Audit

Created: 2026-05-30 · Status: living document · Scope: the processes that move information from stakeholders and Box into the knowledge base and back out as answers. Grounded in the nanoclaw-msbai source, the msba-online repo, and observed incidents. Companion: architecture-email-pipeline.md describes the message plumbing; this document audits the processes and governance on top of it (and flags where that doc is stale).


A. Topology (one sentence)

6 channels → one HTTP server (:3003, path-routed) + Telegram long-poll → NanoClaw orchestration (allowlist, queue, container) → per-group Docker agent (Claude Code) operating on a single shared msba-online git working tree → reply out + audit log + behaviour-sensor. Box feeds the tree via a launchd autosync (Vishal’s Mac); GitHub pushes feed the VPS checkout via a webhook.

Channels actually in use (as of 2026-05-30): Email (most stakeholders) · Telegram (Vishal; sometimes Ron) · Web Chat (a few). The three Teams/Copilot channels are built but dormant — this audit and its governance model are scoped to the three active channels; the dormant ones are not assumed live.


B. Process inventory

| # | Process | Trigger | Mechanism | Enforcement tier | Key gap |
|---|---|---|---|---|---|
| 1 | Ingestion + allowlist | inbound msg | webhook.ts path-routes; email/telegram allowlist checked before agent spawn; Box-notification short-circuit (da98a7b) ahead of the allowlist | Code (hard) | Allowlist read off the live working tree → race (patched 285d67d, not rooted) |
| 2 | Orchestration / container | post-allowlist | per-group container; spawn → reuse 30 min via IPC --resume → fresh | Code | All 5 channel groups mount the same repo rw → cross-group concurrent writers |
| 3 | Agent processing | container msg | Modes: Question (read-only) / Status-update (commit) / Outreach | CLAUDE.md (soft) | Mode classification + which file to cite is model judgment |
| 4 | KB write governance | DECISIONS write | kb-conflict-check skill (skeptical subagent → commit or route to kb-triage/); provenance line; lore:intentional override | Skill gate (semi-hard) for DECISIONS; conventions for ACTION_ITEMS / OPEN_QUESTIONS | Only DECISIONS gated; other files free-form |
| 5 | Source-of-truth + propagation | SoT edit | program/curriculum.md = authority; propagate Type-A downstream per reference/DEPENDENCY_GRAPH.md; guardrail: agent “NEVER edits curriculum.md → flag in reply” | CLAUDE.md (soft) | Guardrail leaks (F1); propagation is flag-and-pray (F4) |
| 6 | Box → KB sync | launchd 4×/day (Mac) | scripts/box-autosync.py: rclone read → openpyxl → regenerate courses/<code>/sync/*.md → commit-if-md-changed → push | Deterministic script (hard) | Mac-only; Course Map flag-only; rclone token expiry |
| 7 | github-push webhook | GitHub push | /api/github-push HMAC → git pull --ff-only + chown → synthesize audit entry for non-bot commits | Code (hard) | Pulls the live checkout (race) |
| 8 | Audit logging | after container exits | host-side append to discussions/audit-log/YYYY-MM-DD.md, 5-min commit batch | Code (hard) | Writes into the shared tree (race) |
| 9 | Behaviour sensor | after each outbound | Haiku checks reply vs curriculum.md + Confirmed DECISIONS.md + citation existence → [sensor: pass/flag/fail] in the audit log; FAIL → kb-triage/ | Code (hard) | Blind to sync/ rosters (F3) |
| 10 | Concurrency control | | per-group serialization (assumed, unverified; Phase 0 of the concurrency plan) | unknown | No mutation gate across writers |
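
The HMAC step in row 7 can be illustrated with a short Python sketch. It is not the webhook.ts code: it assumes GitHub's standard X-Hub-Signature-256 header (the hex HMAC-SHA256 of the raw request body, prefixed with sha256=), and the function name is hypothetical.

```python
import hashlib
import hmac


def verify_github_signature(secret: bytes, body: bytes, header: str) -> bool:
    """Return True when the header equals 'sha256=' + HMAC-SHA256(secret, body).

    Assumption: the webhook is configured with a shared secret and GitHub
    sends the signature in the X-Hub-Signature-256 header.
    """
    prefix = "sha256="
    if not header.startswith(prefix):
        return False
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    received = header[len(prefix):]
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected.encode(), received.encode())
```

The signature has to be computed over the raw bytes of the request body, before any JSON parsing, or the digest will not match.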

C. Cross-cutting findings

F1 — Soft guardrails leak; the source of truth has no hard gate

Every channel prompt says “NEVER modify program/CURRICULUM.md.” The git log says otherwise — msbai-bot@illinihunt.org (the container agent, “K-ai”) has edited it repeatedly (e.g. 7173ff0 STEM designation, 2026-05-30; 401ffb7 quantum name, 177cd2e catalog name). The guardrail lives only in CLAUDE.md — the weakest enforcement tier (deterministic hooks > Claude hooks > CLAUDE.md). There is no hard mechanism (pre-commit hook, read-only mount, author gate) stopping the container from writing the source of truth. Consequence: “who is allowed to change the source of truth, and how” has no reliable answer — both the intended path (flag → human applies) and the forbidden path (agent self-edits) are simultaneously live.

F2 — Source-of-truth fragmentation → wrong answers (the “8 recorded” incident, 2026-05-30)

The recording count for BADM 554 existed in ACTION_ITEMS.md, curriculum.md, the narrative course file, and the new sync/ roster — with no precedence. K-ai cited the stale ACTION_ITEMS.md “8 (as of 5/27)” over the fresh roster’s “13”. Patched (de-staled the item to defer to the roster; added a sync breadcrumb to all 5 channel prompts), but the pattern recurs anywhere a fact is duplicated. The autosync did not cause this — it exposed it by adding a fourth, fresher authority the others didn’t defer to.

F3 — The behaviour-sensor is blind to the freshest source of truth

behaviour-sensor.ts grounds on curriculum.md (primary) + Confirmed DECISIONS.md (recency override) + citation existence. It does not read courses/<code>/sync/*.md. So a correct answer sourced from today’s autosync’d roster gets a FLAG (“cannot verify”) at best, a FAIL (“contradicts stale curriculum”) at worst — observed on the BADM 554 question. The QA layer cannot see the data layer the autosync just made authoritative. Highest-value fix: add sync/ rosters to the sensor’s ground truth, with roster-match downgrading a curriculum contradiction to propagation-lag (same treatment DECISIONS already gets).
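
A minimal sketch of the proposed precedence, in Python rather than the sensor's own code. It reduces each source to a single optional value for one fact; the function name, the string comparison, and the "propagation-lag" label as a return value are illustrative assumptions.

```python
from typing import Optional


def sensor_verdict(claim: str,
                   curriculum: Optional[str],
                   roster: Optional[str]) -> str:
    """Classify one factual claim against two sources.

    roster is the value from courses/<code>/sync/*.md and curriculum the
    value from curriculum.md; None means the source does not state the fact.
    """
    if roster is not None:
        if claim != roster:
            return "fail"             # contradicts the freshest source
        if curriculum is not None and curriculum != claim:
            return "propagation-lag"  # roster wins; curriculum is behind
        return "pass"
    if curriculum is None:
        return "flag"                 # nothing to verify against
    return "pass" if claim == curriculum else "fail"
```

With this ordering, a reply that matches the roster can no longer fail just because curriculum.md has not caught up; the disagreement is surfaced as lag to be propagated instead.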

F4 — Propagation is flag-and-pray

No mechanism guarantees a flagged curriculum.md change is ever applied. Curriculum drifts → sensor (F3) false-flags correct replies → audit noise. Self-reinforcing with F1.

F5 — Shared mutable working tree (concurrency)

/root/repos/msba-online is read by the host, written rw by N per-channel containers, and fast-forwarded by the webhook’s git pull --ff-only, all with no gate. The allowlist bug (285d67d) was one symptom; the race class is intact. The Codex design (nanoclaw-msbai/.claude/plans/repo-concurrency-decoupling.md) is sound, but Phase 0 (verify group-queue.ts serialization) is not done, so current write-race exposure is unknown. A live instance recurred during this audit (push rejected, 12 commits behind, rebase required).

Observed live + root-caused, 2026-05-30: the VPS msba-online checkout was found diverged: 2 local Audit log commits ahead (written by the host but never pushed to origin) and 3 behind. Root cause: audit-log.ts commitAuditLog() did a bare git push with no rebase-on-reject, so whenever origin moved (e.g. a push from the maintainer’s laptop) the audit commit was stranded locally; the github-push webhook’s git pull --ff-only then kept failing silently and the VPS drifted (access control survived because the allowlist was already present, but the audit trail stopped reaching GitHub and K-ai went briefly stale on docs). Fixed (audit-log.ts): the push now runs git push || (git pull --rebase --autostash origin main && git push) || (git rebase --abort), i.e. rebase-on-reject, conflict-safe, with the commit retained for retry. This is the cheap fix for the concurrency plan’s “audit-log batch fails on divergence” failure mode; the durable fix remains Phase 1 (pinned-ref reads) + Phase 2 (out-of-tree audit spool).
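
The rebase-on-reject sequence can be sketched in Python as below. This is an illustration of the control flow only, not the audit-log.ts implementation; the function names and the injectable runner are hypothetical, added so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def _git(args: Sequence[str]) -> int:
    """Run a git command in the current directory and return its exit code."""
    return subprocess.run(["git", *args]).returncode


def push_with_rebase(run: Runner = _git) -> bool:
    """Push; if rejected, rebase onto origin/main and retry once.

    Returns True when the commit reached origin. If the rebase itself
    fails (for example on a conflict), it is aborted so the local commit
    survives for the next batch.
    """
    if run(["push"]) == 0:
        return True
    if run(["pull", "--rebase", "--autostash", "origin", "main"]) == 0:
        return run(["push"]) == 0
    run(["rebase", "--abort"])
    return False
```

Unlike the shell one-liner, this version skips the abort when the rebase succeeded and only the second push failed; in the shell form that abort runs but has nothing to undo.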

F6 — The architecture doc is stale

architecture-email-pipeline.md lists the audit log + behaviour-sensor as “planned, not yet implemented” — both are live and central. Onboarding from that doc yields a wrong mental model.

F7 — Operational single points / expiring clocks


D. Prioritized recommendations

| Pri | Action | Why | Effort |
|---|---|---|---|
| P1 | Add courses/<code>/sync/ rosters to the behaviour-sensor’s ground truth (F3) | QA can’t see the authoritative data; false-flags correct answers | Short |
| P1 | Decide F1: hard-enforce the curriculum guardrail or relax it to match reality | “who edits the SoT” is undefined | Decision + Short |
| P2 | Concurrency plan Phase 0 (verify group-queue.ts) → Phase 1 | Unknown write-race exposure; reads still race | Quick → Short |
| P2 | SoT precedence policy: “for X, sync/ wins; ACTION_ITEMS/curriculum reference, never duplicate” (F2) | Prevent the next 8-vs-13 | Short (partly done) |
| P3 | Refresh architecture-email-pipeline.md (audit log + sensor are live) (F6) | Doc integrity | Quick |
| P3 | Calendar TEAMS_APP_SECRET 9/11 expiry; add Box-token + autosync health check (F7) | Silent failures | Quick |

E. Synthesis

The system is well-built at the plumbing layer (ingestion, channels, webhook, sync) but has a governance/consistency gap at the knowledge layer: multiple un-prioritized sources of truth, a soft guardrail that doesn’t hold, and a QA sensor that can’t see the freshest data. F1, F2, F3 are one connected problem — there is no authoritative answer to “what is true, and who gets to change it.”


F. First-principles analysis from the system’s own history

(Evidence mined from git history + audit logs + kb-triage, 2026-05-30.)

One root fault, expressed five ways: there is no single, authoritative, concurrency-safe source of truth. program/curriculum.md is simultaneously (a) the sensor’s “primary ground truth,” (b) a file the agent is forbidden to edit, (c) a file the sensor’s own code calls known-stale by design (behaviour-sensor.ts:53: “CURRICULUM.md routinely lags a logged decision”), and (d) a file mutated by git pull on a live working tree with no lock. Every finding traces back to this.

F-hist-1 · The sensor fails almost as often as it passes. Across discussions/audit-log/*.md: ~113 pass / ~127 flag / ~146 fail. The dominant fail class is stale-source false-positives — the sensor checks against curriculum.md, which is stale by design. Example: Practicum drives 13 fail/flag lines where the agent correctly says “4 cr, confirmed” but curriculum.md still says Draft. “Cannot be verified against ground truth” appears ~24×. Course clusters in flags: Quantum 61, BADM 557 26, Practicum 24, FIN 550 23, BADM 554 22 — the two biggest (Quantum, Practicum) are driven by stale/contradictory source docs, not agent error.
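
A tally like the one above can be reproduced with a small counter over the audit-log lines. The sketch assumes each verdict appears as a literal tag such as [sensor: fail] (section B gives the tag as [sensor: pass/flag/fail]); the regex and function name are illustrative, not taken from the repo.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed tag shape: "[sensor: pass", "[sensor: flag" or "[sensor: fail".
SENSOR_TAG = re.compile(r"\[sensor:\s*(pass|flag|fail)\b", re.IGNORECASE)


def tally_verdicts(lines: Iterable[str]) -> Counter:
    """Count sensor verdict tags across audit-log lines."""
    counts: Counter = Counter()
    for line in lines:
        for verdict in SENSOR_TAG.findall(line):
            counts[verdict.lower()] += 1
    return counts
```

Feeding it the lines of every file under discussions/audit-log/ gives the pass/flag/fail split; grouping by course would need a second pattern for course codes.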

F-hist-2 · The guardrail demonstrably does not hold — and is self-contradictory. git log program/curriculum.md = 23 human edits vs 7 bot edits (msbai-bot@illinihunt.org), every bot edit a guardrail violation. The smoking gun: 7173ff0 (2026-05-30, “Document STEM designation”) — commit message says it was made to “prevent future K-ai responses defaulting to ‘unconfirmed’… Source: sensor flag 2026-05-29.” The bot edited the file it’s forbidden to edit, specifically to satisfy the sensor. The system requires curriculum.md to be fresh (or the sensor false-flags) but forbids the only always-on actor from freshening it. The guardrail is aspirational; the bot violates it or the file rots.

F-hist-3 · “Single source of truth” has eroded into a drifting multi-file mirror.

F-hist-4 · kb-triage shows un-retired supersession chains. Of 28 triage files, 26 are sensor response-error dumps; the 2 genuine conflict files (2026-05-02-conflicts, 2026-05-19-quantum-names) show the same fact (Quantum names, prep deadlines) re-decided 3–4× with no mechanism to retire the superseded DECISIONS entry → recurring conflicts.

F-hist-5 · The concurrency root cause is still fully present; only one symptom was patched. 285d67d only made the email allowlist reader tolerate a failed read. webhook.ts:118-135 (pullRepo) still runs git pull --ff-only + chown -R on the live tree on every push; a grep of src/ for flock|mutex|lockfile|worktree|mirror returns zero hits. The sensor, Telegram allowlist, and container reads retain the identical race. The Phase-1 fix is designed but unimplemented.


G. External best practices (researched + mapped, with citations)

G1 · Single source of truth (docs-as-code). Store each fact once; reference/transclude, never copy — copies are what drift (DRY: “one unambiguous authoritative representation”). Plain markdown has no live transclusion, so back it with a CI consistency check that greps duplicated facts and fails when copies disagree — turning drift into a build failure. — DRY; Transclusion/SSOT, Paligo SSOT. Map: make curriculum.md canonical per fact; other docs link, don’t restate; add a grep-test for the handful of duplicated numbers (credits/months/courses).
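
The consistency check could look like the sketch below: extract one duplicated fact from each document with a regex and report when the copies disagree. The pattern, the function name, and the idea of passing file contents as a dict are assumptions for illustration; a real check would need one pattern per duplicated fact.

```python
import re
from typing import Dict, Set

# Hypothetical pattern for one duplicated fact: a credit count.
CREDITS = re.compile(r"(\d+)\s+credits\b")


def find_disagreements(docs: Dict[str, str],
                       pattern: re.Pattern) -> Dict[str, Set[str]]:
    """Return {file: values} when the files state different values.

    An empty dict means every copy of the fact carries the same value
    (or the fact appears nowhere).
    """
    found = {name: set(pattern.findall(text)) for name, text in docs.items()}
    found = {name: values for name, values in found.items() if values}
    all_values: Set[str] = set()
    for values in found.values():
        all_values |= values
    return found if len(all_values) > 1 else {}
```

In CI the job would exit non-zero whenever the returned dict is non-empty, which turns drift between copies into a build failure.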

G2 · Grounding the QA validator. Staleness is the silent failure mode — a confident answer off an old snapshot with no signal. Fixes: incremental re-index on change (content-hash change detector); freshness-aware precedence (rank on relevance + recency so the newest authoritative file wins); groundedness judging sampled against a ground-truth set. — RAG freshness rot; Solving Freshness in RAG (arXiv 2509.19376); Databricks groundedness judge. Map: the sensor must read the same committed ref the agent did, and treat sync/ rosters + recent DECISIONS as higher-precedence than curriculum.md — directly fixes F3. At a few msgs/day, a content-hash check + pinned-ref read suffices; a vector RAG stack is over-engineering.
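
A content-hash change detector is small enough to show in full. This sketch assumes the sensor keeps a path-to-hash map from its last run; the function names and the in-memory dicts are illustrative.

```python
import hashlib
from typing import Dict, List


def content_hash(text: str) -> str:
    """SHA-256 of the file content, used as a change fingerprint."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_paths(previous: Dict[str, str],
                  current_docs: Dict[str, str]) -> List[str]:
    """Return paths that are new or whose content hash differs from before.

    previous maps path -> stored hash; current_docs maps path -> content.
    """
    return sorted(
        path for path, text in current_docs.items()
        if previous.get(path) != content_hash(text)
    )
```

Only the returned paths need to be re-read into the sensor's ground truth, which keeps the refresh cheap at a few messages per day.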

G3 · Generated vs human-authored content. Mark generated files with a DO NOT EDIT — generated by … header; make generators deterministic (no timestamps, stable ordering) so regen doesn’t churn; golden-test committed-vs-generated in CI = “commit only on real change.” — Go “DO NOT EDIT” convention; deterministic codegen + golden tests. Map: box-autosync.py already commits-only-on-change and stamps “auto-generated”; add an explicit DO-NOT-EDIT header to sync/*.md and keep generation deterministic.
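
The three properties (header, deterministic output, commit only on real change) fit in a few lines. This is a sketch, not box-autosync.py: the header wording, the (session, status) row shape, and the function names are assumptions.

```python
from typing import Iterable, Tuple

# Assumed header text; the exact wording in the repo may differ.
HEADER = "<!-- DO NOT EDIT: generated by scripts/box-autosync.py -->"


def render_roster(rows: Iterable[Tuple[str, str]]) -> str:
    """Render (session, status) rows with stable ordering and no timestamp."""
    body = [f"- {session}: {status}" for session, status in sorted(rows)]
    return "\n".join([HEADER, *body]) + "\n"


def needs_commit(existing: str, rows: Iterable[Tuple[str, str]]) -> bool:
    """True only when regeneration would change the committed file."""
    return render_roster(rows) != existing
```

Because the output depends only on the rows, re-running the sync on unchanged Box data produces a byte-identical file and no commit.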

G4 · Concurrency on a shared git tree. Git is a single-process tool; concurrent ops on one object DB/index can corrupt it, and a stale index.lock freezes everything. So: serialize writers; read immutable committed blobs via git show <ref>:<path> / cat-file (unaffected by an in-flight pull); keep an append-only event spool out of the contended tree. — git single-process lock contention; git-cat-file, Pro Git: immutable objects. Map: this is exactly the repo-concurrency-decoupling.md plan — validates Phase 0 (serialize) + Phase 1 (pinned-ref reads) + Phase 2 (out-of-tree audit spool).
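
A pinned-ref read can be sketched as follows. git show <ref>:<path> is the real command; the class, its last-good cache (content rather than a SHA), and the injectable show function are hypothetical simplifications of the plan's Phase 1.

```python
import subprocess
from typing import Callable, Dict, Optional

# show(ref, path) returns the file text, or None when git cannot resolve it.
ShowFn = Callable[[str, str], Optional[str]]


def git_show(ref: str, path: str) -> Optional[str]:
    """Read <path> as committed at <ref>, bypassing the working tree."""
    result = subprocess.run(
        ["git", "show", f"{ref}:{path}"], capture_output=True, text=True
    )
    return result.stdout if result.returncode == 0 else None


class PinnedReader:
    """Serve reads from a pinned ref, falling back to the last good content."""

    def __init__(self, show: ShowFn = git_show) -> None:
        self._show = show
        self._last_good: Dict[str, str] = {}

    def read(self, ref: str, path: str) -> Optional[str]:
        text = self._show(ref, path)
        if text is None:
            # Ref or path could not be resolved: serve the previous good copy.
            return self._last_good.get(path)
        self._last_good[path] = text
        return text
```

Because the read goes to a committed blob, a git pull in flight on the working tree cannot hand the reader a half-written file.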

G5 · Protecting a SoT file from an autonomous agent. Prompt/markdown guardrails are the weakest layer. Defense-in-depth, strongest first: read-only mount / file perms (agent physically can’t write) → deterministic hook that fires every time and that the agent can’t edit (a Claude Code PreToolUse hook is not skipped by git commit --no-verify; a plain client-side pre-commit hook is) → CI/branch protection server-side → CODEOWNERS on the protected file and the guardrail files. Governance rule: if the agent can edit the rules that evaluate it, you have a governance problem. Sources: Making it hard to cheat the guardrails; Claude Code hooks. Map: the “flag, don’t edit” policy is the right soft layer; back it with a read-only mount of curriculum.md into the container (highest-leverage single control at this scale). Full CODEOWNERS+CI is enterprise-grade and likely over-engineering here.
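
The decision a path-block hook would make can be sketched as a pure function. The protected set and the matching rule are assumptions; the bot address is the one named in F1. The function only decides, so wiring it into a hook (for example by feeding it the output of git diff --cached --name-only) is left out.

```python
from typing import Iterable, List

# Assumed protected set; the audit names program/curriculum.md as the SoT.
PROTECTED = {"program/curriculum.md"}
BOT_EMAIL = "msbai-bot@illinihunt.org"


def blocked_paths(author_email: str, staged: Iterable[str]) -> List[str]:
    """Return protected paths that a bot-authored commit tries to change.

    Paths are compared case-insensitively so program/CURRICULUM.md is
    caught as well. Human authors are not restricted by this check.
    """
    if author_email.strip().lower() != BOT_EMAIL:
        return []
    return sorted(p for p in staged if p.lower() in PROTECTED)
```

A hook would reject the commit whenever the list is non-empty. A client-side pre-commit hook can be skipped with git commit --no-verify, so on its own this check is weaker than the read-only mount.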


H. Path forward

(Sequence below is the Codex-Architect-reviewed revision, 2026-05-30. Codex verdict: REVISE — the directional reframe “fix the sensor before hard-locking curriculum.md” is right, but two claims were corrected; see notes.)

The reframe still holds: the sensor’s stale ground truth (F3) is what makes the guardrail unenforceable (F1) — the bot edits curriculum.md only because leaving it stale false-flags the sensor. But two corrections from the Codex review:

Sequenced accordingly (separating what is authoritative / how reads are stable / how writes are serialized):

  1. Define fact ownership by category (Quick, F2). One short table: which file owns which fact class — curriculum.md (program policy), DECISIONS.md (confirmed decisions), sync/ rosters (production facts), ACTION_ITEMS.md (open work, no hard facts). This is the cheap policy fix that prevents the next 8-vs-13 and feeds steps 3-4.
  2. Pinned committed reads for host-side readers (Short, F5/G4). Serve the sensor + allowlists from git show <ref>:<path> (last-good SHA on fetch failure). Kills the read-race class (incl. the allowlist bug at root); retires the reconcileAllowlistReload band-aid.
  3. Sensor reads sync/ rosters with field-specific precedence (Short, P1, F3/G2). Per the ownership map from step 1. Regression-test against the BADM 554 “13 recorded” case (today’s false-flag must turn green).
  4. Hard-protect curriculum.md (Quick/Short, P1, F1/G5) — AFTER step 3’s false positives pass. Read-only mount into the container (or a pre-commit path block). Safe only once the sensor no longer needs the bot to keep curriculum fresh.
  5. Clean obvious drift (Quick, F3/G1). Delete the untracked program/CURRICULUM.md duplicate; fix the in-file practicum Weeks contradiction; finish propagating the 15-month figure; add DO-NOT-EDIT headers to sync/*.md; add a CI grep-test for the few cross-file duplicated numbers.
  6. Write gate / audit spool (Medium) — only if write collisions actually manifest. Do not build first (concurrency plan Phase 2). Verify group-queue.ts serialization (Phase 0) before assuming you need it.

The leverage is steps 1→3: a category ownership map + the sensor reading the right source per field. That dissolves the F1/F3 conflict and stops the false-flags — without new infrastructure. Steps 4-6 are downstream and partly conditional.
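
Step 1's ownership table can be written down as a small lookup that later steps consult. The fact-class labels and the glob patterns are hypothetical; the file assignments are the ones listed in step 1.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List

# fact class (hypothetical label) -> glob for the owning file(s)
OWNER: Dict[str, str] = {
    "program_policy": "program/curriculum.md",
    "confirmed_decision": "*DECISIONS.md",
    "production_fact": "courses/*/sync/*",
    "open_work": "*ACTION_ITEMS.md",
}


def is_owner(fact_class: str, path: str) -> bool:
    """True when path is allowed to state facts of this class."""
    return fnmatchcase(path, OWNER[fact_class])


def misplaced(fact_class: str, stated_in: Iterable[str]) -> List[str]:
    """Files that restate a fact they do not own and should reference it."""
    return sorted(p for p in stated_in if not is_owner(fact_class, p))
```

Applied to the 8-vs-13 incident, a recording count is a production fact, so ACTION_ITEMS.md would be reported as restating something only the sync/ roster owns.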


I. KB / memory options (does NanoClaw offer an alternative substrate?)

(From the 2026-05-30 ecosystem exploration. Verdict: keep-and-augment — no off-the-shelf plugin fits.)

The NanoClaw framework ships no RAG / vector / memory infrastructure — its memory model is git-markdown-per-agent-group, which is exactly what this system runs. Options surveyed:

| Option | What it is | Fit |
|---|---|---|
| add-karpathy-llm-wiki (in-repo, uninstalled) | 3-layer wiki: immutable raw sources / LLM-maintained interlinked markdown (index.md + append-only log.md) / a CLAUDE.md “schema” making the agent a disciplined maintainer. Ops: ingest / query / lint. Optional qmd hybrid-search CLI only at scale. | Best fit: partial-adopt the discipline (index + log + lint + synthesis), not a full separate wiki layer. Its lint op directly attacks F2/F3/F4 (contradictions, stale claims, orphan pages); index.md is a lightweight retrieval layer; synthesis-into-pages fights fragmentation. |
| In-repo skills knowledge-builder / kb-conflict-check / quality-review / self-eval | Homegrown governance pipeline on git-markdown. kb-conflict-check is keyword-only over one file (DECISIONS.md). | Already in use; the keyword-only single-file scope is itself part of the F3 root. Strengthen these rather than replace. |
| claude-mem (installed) | Real hybrid search (Chroma + SQLite FTS5), but over Claude Code dev-session transcripts, not the curriculum repo; the container can’t reach it at answer time. | Wrong corpus. Keep for dev recall only (per the memory-architecture rules), never the curriculum KB. |
| add-mnemon (upstream) | Graph memory in-container, recall-before/write-after hooks (Claude-Code-only). | Conversational, per-group-local; not a curriculum KB; doesn’t unify source-of-truth. |

Recommendation (Codex + exploration agree): keep git-markdown; partial-adopt the Karpathy index+log+lint+synthesis discipline by strengthening knowledge-builder + kb-conflict-check (let knowledge-builder maintain synthesis pages; add a lightweight lint pass). Reserve qmd local hybrid search as the retrieval/freshness upgrade for the behaviour-sensor (G2) only when keyword overlap visibly stops sufficing. A vector DB / dedicated service is over-engineering at a few messages/day — the scale needs cleaner ownership rules, not new infrastructure.


J. Authority & gatekeeping model (partially shipped — concrete F1 resolution)

(Grounded in mined usage, 2026-05-30: DECISIONS.md provenance + audit-log inbound volume, cross-checked against the stated role split.)

Status — shipped 2026-05-30: the access-tier model’s first phase is live. (1) The allowlist was narrowed to the 13-person pilot group (11 read-write + Ocasio/Love read-only); the other 13 were off-boarded. (2) Blocked @illinois.edu senders now receive a one-time courteous “limited pilot” reply instead of a silent drop (nanoclaw c04ebb1, codex-reviewed; verified end-to-end). (3) Observer read-only (Ocasio/Love) is enforced softly via the email-group prompt. Still future: hard read-only enforcement (the Access-column write-block), and the J.3 two-stage PR gate on the source of truth.
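
The shipped inbound behaviour can be summarized in a short decision function. This is a sketch of the described policy, not the nanoclaw code: the tier strings, the notified-set, and the assumption that every other blocked sender is dropped silently are illustrative.

```python
from typing import Dict, Set


def handle_inbound(sender: str,
                   allowlist: Dict[str, str],
                   already_notified: Set[str]) -> str:
    """Decide what happens to one inbound email.

    allowlist maps address -> tier ("read-write" or "read-only").
    Returns the tier, "pilot-reply", or "drop".
    """
    address = sender.strip().lower()
    tier = allowlist.get(address)
    if tier is not None:
        return tier
    if address.endswith("@illinois.edu") and address not in already_notified:
        # One-time courteous "limited pilot" reply, then silence.
        already_notified.add(address)
        return "pilot-reply"
    # Assumption: everyone else is dropped without a reply.
    return "drop"
```

Note that a "read-only" result here only labels the sender; as stated above, enforcement of that tier is still soft and lives in the email-group prompt.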

Strategic context. MSBAi’s coordination through K-ai is a living embodiment of the Gies AI governance/orchestration strategy and is under active research observation (W. Ocasio, G. Love are study observers). The gatekeeping model below is therefore not just an ops control — it is itself a study artifact, which raises the bar on getting authority, provenance, and auditability right.

Canonical roster, titles, access tiers, roles, and gate routing now live in program/EMAIL_ALLOWLIST.md (single source of truth — 2026-05-30 consolidation). This section keeps only the rationale, evidence, and mechanism; it does not restate the per-person data.

J.1 Access tiers (definitions)

Senior staff with no defined K-ai role are simply off the pilot allowlist; if they email from @illinois.edu they get the one-time limited-pilot reply (J status block).

J.2 Authority map — rationale (the canonical per-person mapping is the allowlist Role / domain column)

Why the domain owners are who they are, from mined usage:

J.3 Two-stage gate (maker-checker on the source of truth)

(Routing table is canonical in EMAIL_ALLOWLIST.md → “Gatekeeping routing”; mechanism + rationale below.)

J.4 Why this resolves F1 (supersedes the read-only-mount in H step 4)

A read-only mount merely blocks the bot. The gate is better: the bot (and any human) can still propose good SoT changes — e.g. the 7173ff0 STEM-designation edit — but nothing lands without a domain owner’s merge. It fixes the human accidental-write hole and the bot guardrail-leak with one deterministic, tracked control, while preserving the agent’s usefulness.

J.5 Open items