FinAI Build: Building a Financial AI in Four Months
Building a financial recommendation system with Claude Code, on consumer hardware, as a three-person team.
[FinAI Build] Ep 1 — The Premise
What three people with one desktop GPU set out to replace — an 80-DAG ALS recommender that could not explain itself and could not scale past one task — and how that starting constraint shaped every later architectural choice.
[FinAI Build] Ep 2 — Organizing the AI Agents
Five phases, four tools: how Gemini, Claude Opus, Cursor, and Claude Code were each assigned to specific phases (ideation / technical validation / scaffolding / implementation / paper writing). Why tool separation beat tool uniformity on both speed and quality.
[FinAI Build] Ep 3 — How We Adapted: Guardrails, Memory Bank, Contract Verification
The mechanisms that kept three parallel AI-agent teams from breaking at integration — the CLAUDE.md constitution, the migration from a manual memory bank to Claude Code auto-memory, and the interface-key diff check run after every parallel session.
[FinAI Build] Ep 4 — The Seven Experts: Importing Structural Isomorphism Across Eleven Disciplines
Why seven experts, and why these seven. The cross-disciplinary scan with Gemini surfaced eleven fields; the feasibility review with Opus narrowed the candidates to DeepFM, Temporal Ensemble, HGCN, PersLay, Causal, LightGCN, and Optimal Transport.
[FinAI Build] Ep 5 — The Data Integrity Hunt
Before any architecture debate — three chained label-leakage detections, the deterministic-leakage rationale behind the 18→13 task reduction, and the self-replicating features that surfaced across synthetic-data iterations v2→v3→v4.
[FinAI Build] Ep 6 — The Bug That Overwhelmed All Architectural Decisions
For weeks, sigmoid gating seemed to beat softmax. Fixing a bug in the uncertainty-weighting implementation flipped the result. A case study in how a training-environment bug contaminates architectural conclusions.
[FinAI Build] Ep 7 — Distillation and Serving: PLE → LightGBM → Lambda + 5 Bedrock Agents
Teacher is PLE, student is per-task LightGBM, serving is AWS Lambda. Why this combination, what happens when teacher-student fidelity fails, and the role division across the 5-agent Bedrock pipeline.
[FinAI Build] Ep 8 — Honest Negative Results and What Comes Next
The record from four months: how adaTT converged to a null effect at 13-task scale, why GradSurgery was rejected for its VRAM overhead, the work-in-progress status of Paper 3, and the real-data metrics pending after 2026-04-30. Why what did not work matters as much as what did.