Firmwatch Data-Completeness Audit — 2026-04-24

Cross-reference of:

Live prod Neon (the only branch with data; staging/dev are CoW-empty for non-test rows)
Original v1 spec (specs/001-firmwatch-v1/spec.md REVISION 2, Paul-approved 2026-04-22 22:16 PDT)
Constitution + strata.config.md non-negotiables

A. What the database actually has

21 tables. Row counts:

Has data	Empty
`deals` (456)	`companies` (0)
`firms` (17)	`company_news` (0)
`thesis` (1)	`deal_email_provenance` (0)
`thesis_match_scores` (93)	`deal_score_explanations` (0)
`thesis_versions` (4)	`deal_source_provenance` (0)
`provider_usage` (10)	`digest_item_opened` (0) — NSM tracking
`sources` (66)	`digest_items` (0) — the actual digest
`schema_migrations` (28)	`email_inbound` (0) — newsletter ingest
	`firm_drift_centroids` (0)
	`partner_thumbs` (0)
	`render_tool_invocations` (0)
	`saved_dashboards` (0)
	`source_snapshots` (0)
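
The populated/empty split above can be reproduced with one `COUNT(*)` per table. The sketch below is a minimal illustration, not the audit's actual tooling: it uses an in-memory `sqlite3` database as a stand-in for the Neon Postgres branch, and the helper names `table_row_counts` and `split_by_data` are hypothetical.

```python
import sqlite3


def table_row_counts(conn, tables):
    """Return {table_name: row_count} for each named table."""
    counts = {}
    for name in tables:
        # Table names cannot be bound as query parameters,
        # so only pass trusted, hard-coded names here.
        (n,) = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()
        counts[name] = n
    return counts


def split_by_data(counts):
    """Split table names into (has_data, empty), each sorted alphabetically."""
    has_data = sorted(t for t, n in counts.items() if n > 0)
    empty = sorted(t for t, n in counts.items() if n == 0)
    return has_data, empty


# Stand-in database: two tables, only one of which has rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE firms (slug TEXT)")
conn.execute("CREATE TABLE companies (id INTEGER)")
conn.executemany("INSERT INTO firms VALUES (?)", [("a",), ("b",)])

counts = table_row_counts(conn, ["firms", "companies"])
has_data, empty = split_by_data(counts)
print("has data:", has_data)
print("empty:", empty)
```

Against Postgres the same queries should work through any DB-API driver; only the connection setup would differ.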

B. Critical missing data — by category

B1. Deals — 86-100% null on the columns that actually matter

The 456 deal rows are headlines without payloads. Sample null rates from the deals table:

Column	Null %	Why it matters
`firm_slug`	88%	Without this, a deal isn't attached to a watched firm. 88% are just "there was an announcement somewhere."
`amount_usd`	86%	Spec calls for "raising $5-15M" filter as a thesis dimension. Can't filter on what's null.
`round_type`	86%	Spec wants Series A/B/Growth Equity etc. for stage match.
`sector`	86%	Spec's check-1 for thesis match. Can't bucket by sector if column is null.
`lead_investors`	90%	Whole point of competitive intel: who led? Co-investor analysis can't run on nulls.
`company_id`	100%	The deals table doesn't link to companies (which is itself empty — see B2).
`canonical_id`	100%	The dedup column has zero call sites. Cross-source duplicates show as separate rows.
`enrichment_attempted`	only `false`	Enrichment pipeline (Sonnet extraction) never ran on these rows post-ingest.

What's actually populated: id, source_id, announced_at, created_at, source_url, via_scraper, raw_json. That's the headline + URL + scraper provenance — not the structured fields the thesis matcher needs.
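
A per-column null rate like the ones in the table above can be computed by comparing `COUNT(*)` with `COUNT(column)`, since `COUNT(column)` skips NULLs. The following is a minimal sketch under stated assumptions: `sqlite3` stands in for the prod database, the toy `deals` table has only three columns, and `null_rates` is a hypothetical helper name.

```python
import sqlite3


def null_rates(conn, table, columns):
    """Return {column: percent of rows where the column IS NULL}."""
    (total,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    rates = {}
    for col in columns:
        # COUNT(col) ignores NULLs, so total - COUNT(col) is the NULL count.
        (non_null,) = conn.execute(
            f'SELECT COUNT("{col}") FROM "{table}"'
        ).fetchone()
        rates[col] = round(100.0 * (total - non_null) / total, 1) if total else 0.0
    return rates


# Toy data: 4 deals, sector missing on 3 of them, amount_usd missing on 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deals (id INTEGER, sector TEXT, amount_usd REAL)")
conn.executemany(
    "INSERT INTO deals VALUES (?, ?, ?)",
    [(1, "fintech", 5e6), (2, None, 1e7), (3, None, None), (4, None, None)],
)

rates = null_rates(conn, "deals", ["id", "sector", "amount_usd"])
for col, pct in rates.items():
    print(f"{col}: {pct}% null")
```

Rerunning a check of this shape after each enrichment fix would show whether the structured columns are actually filling in.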

B2. Companies — empty (0 rows)

The companies table has 18 columns including current_employee_count, twelve_months_growth_rate, current_employee_range, job_count, last_reported_revenue_usd, linkedin_url, sourcescrub_company_id. All these were specced. None are populated. Spec said:

"Sourcescrub: tomorrow → Phase 0.5 add-on"

Still tomorrow. Spec FIRM-N15 covers Sourcescrub credit budget; FIRM-313/314 cover the newsletter ingest. The companies-enrichment leg is not yet wired — even though the schema is ready.

B3. Newsletters — 0 rows in `email_inbound`

Spec said:

"Newsletters remain priority-1 in source fusion"

email_inbound table has 0 rows. That means ZERO newsletter-derived deals have been ingested. The Resend webhook decommissioning (FIRM-347) returned the system to a state where:

Resend inbound returns 410 Gone
Newsletter scraping (FIRM-N6/N7/N8) ships per spec 023 but appears not to have run end-to-end yet
The "first-class source" never actually fired

B4. Source provenance — 0 rows in `source_snapshots` + `deal_source_provenance`

Spec 023 (newsletter scraping-first) added source_snapshots as a raw-content archive and deal_source_provenance to link deals to their source. Both have zero rows: no current ingest path writes to either table. Migrations 0027 and 0028 ran (verified in schema_migrations), so the schema is there; the pollers just don't write to it.

B5. Engagement / NSM signals — all empty

partner_thumbs: 0 rows. Spec called this a v1.5 NSM-driving feature.
digest_item_opened: 0 rows. The NSM (digest open rate per partner per week ≥ 60%) literally cannot be measured because the tracking table is empty. v1 spec said this is the project's North Star.
render_tool_invocations: 0 rows. The render tools would log here, but they never fire client-side; that work is still pending under FIRM-026.
saved_dashboards: 0 rows. Save-as-dashboard feature has UI but no one's saved one yet (also blocked on the broken render path).
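
To make the NSM definition concrete (digest open rate per partner per week ≥ 60%), here is a minimal sketch of the calculation. The record shapes are assumptions, since the real columns of `digest_items` and `digest_item_opened` are not shown in this audit; treating the target as "every partner-week must clear 60%" is also an assumption, and the spec may intend an average instead.

```python
from collections import defaultdict

NSM_TARGET = 0.60  # threshold quoted from the v1 spec


def open_rates(delivered, opened):
    """Compute open rate per (partner, week).

    delivered: iterable of (partner, iso_week, item_id) tuples (hypothetical shape)
    opened:    set of (partner, item_id) tuples (hypothetical shape)
    """
    sent = defaultdict(int)
    hits = defaultdict(int)
    for partner, week, item_id in delivered:
        sent[(partner, week)] += 1
        if (partner, item_id) in opened:
            hits[(partner, week)] += 1
    return {key: hits[key] / n for key, n in sent.items()}


def nsm_met(rates):
    """True/False if measurable; None when there is nothing to measure."""
    if not rates:
        return None
    return all(r >= NSM_TARGET for r in rates.values())


delivered = [
    ("ann", "2026-W17", 1), ("ann", "2026-W17", 2), ("ann", "2026-W17", 3),
    ("bo", "2026-W17", 1), ("bo", "2026-W17", 2),
]
opened = {("ann", 1), ("ann", 2), ("bo", 1)}

rates = open_rates(delivered, opened)
print(rates)
print("NSM met:", nsm_met(rates))

# With both tables empty, as in prod today, the metric is undefined.
print("NSM with no data:", nsm_met(open_rates([], set())))
```

The last call mirrors the current state: with no delivered items and no open events, there is no denominator, so the result is "unmeasurable" rather than 0%.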

B6. Drift signals — empty

firm_drift_centroids: 0 rows. v1 spec called for "thesis-drift detection" as one of 5 differentiators. The schema exists, the cron job (0 3 * in wrangler.toml) exists, but no firm has a centroid computed yet because deals are 86% null on sector (B1), and the centroid algorithm requires sector distribution.
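
The dependency on sector data can be illustrated with a toy version of the idea. This is not the Firmwatch centroid algorithm, which this audit does not describe. As a stand-in, it assumes a "centroid" is a firm's normalized sector distribution and measures drift as total variation distance between two such distributions. All names are hypothetical.

```python
from collections import Counter


def sector_distribution(sectors):
    """Normalized sector shares; None (null) sectors are skipped.

    Returns None when no deal has a known sector.
    """
    known = [s for s in sectors if s is not None]
    if not known:
        return None
    counts = Counter(known)
    return {sector: c / len(known) for sector, c in counts.items()}


def drift(baseline, current):
    """Total variation distance: 0.0 for identical mixes, 1.0 for disjoint ones."""
    keys = set(baseline) | set(current)
    return 0.5 * sum(abs(baseline.get(k, 0.0) - current.get(k, 0.0)) for k in keys)


baseline = sector_distribution(["fintech", "fintech", "health", "saas"])
current = sector_distribution(["health", "health", "saas", "saas"])
print("drift:", drift(baseline, current))

# A firm whose deals all have null sector yields no distribution at all.
print("all-null firm:", sector_distribution([None, None, None]))
```

Under this reading, a firm whose deals are mostly null on sector either has no distribution or one built from a handful of rows, which is why enrichment has to land before drift can be computed meaningfully.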

B7. Digest — 0 rows in `digest_items`

The actual digest table is empty. The spec's #1 deliverable was "every morning a digest arrives." There's a daily cron (brief-digest), but the cron's output isn't being persisted to digest_items. Either the cron silently fails OR the persistence step was never wired.

C. Cross-reference vs original v1 spec deliverables

From REVISION 2 (Paul-approved), the user-observable behaviors:

v1 promise	Status	Evidence
Single named Blueprint thesis	✅ Working	`thesis` row 1, `thesis_versions` 4 versions
Daily digest delivers 5/5 business days	❌ FAILING	`digest_items` empty; can't measure SLA
3-5 thesis-filtered items with "why this matters" narrative	⚠️ Partial	`thesis_match_scores` has 93 rows but `why_narrative` IS NULL on most
Watchlist Home dense table with sparklines	⚠️ Partial	UI exists, but `firm_drift_centroids` empty so sparklines have no drift data
Drift indicators per firm	❌ Empty	0 centroid rows
Generative chart/table responses with citations	⚠️ Server-side ready, client-side blocked	Spec 026 in flight (PR #446)
Thumbs feedback per item	❌ Empty	`partner_thumbs` 0 rows
Newsletter as first-class source	❌ Empty	`email_inbound` 0 rows
Co-investor network analytics	❌ Schema not started	No `co_investors` or `network_edges` table
Per-firm thesis-match scoring	⚠️ Partial	93 scores but mostly score-only, no breakdown
18 firms watched	⚠️ Off — we have 17	spec's seed list is 17 (per MEMORY.md), but spec text says "18 firms"
Sourcescrub Data Connect API	❌ Not yet writing to companies	Phase 0.5 deferred
Axios Pro Rata, StrictlyVC, PitchBook newsletters	❌ Not yet ingested	spec 023 partially shipped
Gmail OAuth	❌ Not yet wired	Superseded by Resend, which is itself now decommissioned (FIRM-347), so neither inbound path exists

Constitution cross-check:

Principle I (env isolation): partial — Neon branches exist but only prod has data
Principle II (orch branches off dev): ✅ working
Principle III (Strata owns specs): ✅ working
Principle IV (data model = migrations not rewrites): ✅ all 28 migrations recorded
Principle V (Access binary): ✅ working
Principle VI (every agent run leaves a trail): ⚠️ render_tool_invocations + digest_items are empty, which breaks the audit promise
Principle IX (tickets not weeks): ✅ working

D. Top-priority gaps (Paul-actionable)

In rough Build/Severity order:

🔴 P0 — Things that break the v1 promise

1. Deal enrichment is not running — 86% of deals are headline-only. The Sonnet extraction step + Apify retry path that was specced isn't producing structured fields. Without this, every other feature (scoring, digest, charts) is starving.
2. digest_items is empty — the v1 NSM (digest open rate) literally cannot be measured. Either cron fails silently or persistence step missing.
3. partner_thumbs is empty — feedback-loop NSM also cannot be measured.

🟠 P1 — Feature exists but data missing

4. firm_drift_centroids empty — drift glyph in Watchlist UI shows nothing because there's no centroid history.
5. why_narrative null on most thesis_match_scores — Sonnet narrative-generation is either off or its writes are dropped.
6. source_snapshots + deal_source_provenance empty — provenance/audit promise from spec 023 is not being kept.

🟡 P2 — Phase 0.5 add-ons that never landed

7. Companies table — 0 rows despite SourceScrub schema being fully ready.
8. email_inbound — 0 rows — newsletter pipeline has no real ingestion (FIRM-347 decommissioned the Resend path).
9. Co-investor graph — schema not even started; spec 023 mentioned as differentiator but no co_investors / firm_relationships table.

🟢 P3 — Schema gaps that haven't been filed yet

10. 18 vs 17 firms — minor; spec says "18 firms" but we have 17. Either add the 18th or update spec.
11. firms.team 100% null — spec said partner-team data should be ingested per firm. Not happening.
12. sources.url 100% null — sources table has 66 rows but no URLs on any of them. Probably fine for RSS-known names but breaks if we re-derive from URL.

E. Recommendation

Frame these as two specs:

Spec 027 — Deal-enrichment recovery (P0/P1): items 1, 2, 3, 5, 6 above. The schema is right; the writes aren't happening. This is mostly diagnose-and-wire-up work. ~6-10 tickets, fast track / Standard split.

Spec 028 — Phase 0.5 backlog (P2, plus item 4 from P1): items 4, 7, 8, 9 above. Bigger scope: actually building the SourceScrub→companies pipeline, reviving newsletter ingest, and adding co-investor graph schema. Standard brief, 8-12 tickets.

P3 items are incremental; file as a someday-maybe.md follow-up.

Run them sequentially (027 first since the v1 promise is the bigger fire), or in parallel (different parts of the codebase) — Paul's call.

Generated 2026-04-24 by Morty · Source: ~/Documents/Mojo/Morty/briefs/firmwatch-data-completeness-audit-2026-04-24.md