Cross-reference of:
specs/001-firmwatch-v1/spec.md (REVISION 2, Paul-approved 2026-04-22 22:16 PDT)
21 tables. Row counts:
| Has data | Empty |
|---|---|
| deals (456) | companies (0) |
| firms (17) | company_news (0) |
| thesis (1) | deal_email_provenance (0) |
| thesis_match_scores (93) | deal_score_explanations (0) |
| thesis_versions (4) | deal_source_provenance (0) |
| provider_usage (10) | digest_item_opened (0), NSM tracking |
| sources (66) | digest_items (0), the actual digest |
| schema_migrations (28) | email_inbound (0), newsletter ingest |
| | firm_drift_centroids (0) |
| | partner_thumbs (0) |
| | render_tool_invocations (0) |
| | saved_dashboards (0) |
| | source_snapshots (0) |
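
A census like the one above can be regenerated rather than hand-counted. The following is a minimal sketch, assuming the database is SQLite-compatible (the `wrangler.toml` reference suggests this, but it is an assumption) and reachable through Python's `sqlite3` module. The helper names `table_row_counts` and `split_by_data` are hypothetical.

```python
import sqlite3


def table_row_counts(conn: sqlite3.Connection) -> dict:
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    # Names come from the catalog rather than user input; quote them anyway.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


def split_by_data(counts: dict) -> tuple:
    """Split table names into (has_data, empty), each sorted alphabetically."""
    has_data = sorted(t for t, n in counts.items() if n > 0)
    empty = sorted(t for t, n in counts.items() if n == 0)
    return has_data, empty
```

Running this on every audit pass would keep the "Has data" and "Empty" columns from drifting out of date.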
The 456 deal rows are headlines without payloads. Sample null rates from deals table:
| Column | Null % | Why it matters |
|---|---|---|
| firm_slug | 88% | Without this, the deal isn't attached to a watched firm. 88% are just "there was an announcement somewhere." |
| amount_usd | 86% | Spec calls for a "raising $5-15M" filter as a thesis dimension. Can't filter on what's null. |
| round_type | 86% | Spec wants Series A/B/Growth Equity etc. for stage match. |
| sector | 86% | Spec's check-1 for thesis match. Can't bucket by sector if the column is null. |
| lead_investors | 90% | Whole point of competitive intel: who led? Can't run co-investor analysis on nulls. |
| company_id | 100% | The deals table doesn't link to companies (which is itself empty; see B2). |
| canonical_id | 100% | The dedup column has zero call sites. Cross-source duplicates show as separate rows. |
| enrichment_attempted | only false | Enrichment pipeline (Sonnet extraction) never ran on these rows post-ingest. |
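
The null percentages in the table can be recomputed with one `COUNT` per column. This is a sketch under the same SQLite assumption; `null_rates` is a hypothetical helper, not part of the codebase. It checks column names against `PRAGMA table_info` first, because SQLite can silently treat an unknown double-quoted identifier as a string literal, which would report 0% null for a column that does not exist.

```python
import sqlite3


def null_rates(conn: sqlite3.Connection, table: str, columns: list) -> dict:
    """Return {column: percent_null} for the given columns of one table."""
    # Validate names up front so a typo cannot masquerade as "0% null".
    known = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    missing = [c for c in columns if c not in known]
    if missing:
        raise ValueError(f"unknown columns on {table}: {missing}")

    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    if total == 0:
        # A null rate is undefined on an empty table.
        return {col: None for col in columns}

    rates = {}
    for col in columns:
        # COUNT(col) skips NULLs, so total - COUNT(col) is the NULL cell count.
        non_null = conn.execute(
            f'SELECT COUNT("{col}") FROM "{table}"'
        ).fetchone()[0]
        rates[col] = round(100.0 * (total - non_null) / total, 1)
    return rates
```

A call such as `null_rates(conn, "deals", ["firm_slug", "sector"])` would return one percentage per column, which can be diffed between audits to confirm that enrichment has started writing.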
What's actually populated: id, source_id, announced_at, created_at, source_url, via_scraper, raw_json. That's the headline + URL + scraper provenance — not the structured fields the thesis matcher needs.
The companies table has 18 columns including current_employee_count, twelve_months_growth_rate, current_employee_range, job_count, last_reported_revenue_usd, linkedin_url, sourcescrub_company_id. All these were specced. None are populated. Spec said:
"Sourcescrub: tomorrow → Phase 0.5 add-on"
Still tomorrow. Spec FIRM-N15 covers Sourcescrub credit budget; FIRM-313/314 cover the newsletter ingest. The companies-enrichment leg is not yet wired — even though the schema is ready.
email_inbound
Spec said:
"Newsletters remain priority-1 in source fusion"
email_inbound table has 0 rows. That means ZERO newsletter-derived deals have been ingested. The Resend webhook decommissioning (FIRM-347) returned the system to a state where:
source_snapshots + deal_source_provenance
Spec 023 (newsletter scraping-first) added source_snapshots for raw-content archive + deal_source_provenance to link deals to their source. Zero rows in either. Neither table is being written by current ingest paths. Migration 0027/0028 ran (verified in schema_migrations), so the schema is there — pollers just don't write to it.
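
One way to close this gap is to make the snapshot and provenance writes part of the same transaction as the deal insert, so a poller cannot write a deal without them. The sketch below is illustrative only: the column names (`content_sha256`, `raw_content`, `snapshot_id`) and the `ingest_deal` function are assumptions, since the real columns from migrations 0027/0028 are not shown here.

```python
import hashlib
import sqlite3

# Hypothetical minimal schema; the real tables have more columns.
SCHEMA = """
CREATE TABLE deals (id INTEGER PRIMARY KEY, source_url TEXT);
CREATE TABLE source_snapshots (
    id INTEGER PRIMARY KEY,
    source_id INTEGER,
    content_sha256 TEXT UNIQUE,
    raw_content TEXT
);
CREATE TABLE deal_source_provenance (
    deal_id INTEGER REFERENCES deals(id),
    snapshot_id INTEGER REFERENCES source_snapshots(id),
    PRIMARY KEY (deal_id, snapshot_id)
);
"""


def ingest_deal(conn, source_id, source_url, raw_content):
    """Insert a deal, archive its raw source, and link the two atomically."""
    digest = hashlib.sha256(raw_content.encode("utf-8")).hexdigest()
    with conn:  # commits on success, rolls back if any statement raises
        # Identical raw content is archived once and shared between deals.
        conn.execute(
            "INSERT OR IGNORE INTO source_snapshots "
            "(source_id, content_sha256, raw_content) VALUES (?, ?, ?)",
            (source_id, digest, raw_content),
        )
        snapshot_id = conn.execute(
            "SELECT id FROM source_snapshots WHERE content_sha256 = ?",
            (digest,),
        ).fetchone()[0]
        deal_id = conn.execute(
            "INSERT INTO deals (source_url) VALUES (?)", (source_url,)
        ).lastrowid
        conn.execute(
            "INSERT INTO deal_source_provenance (deal_id, snapshot_id) "
            "VALUES (?, ?)",
            (deal_id, snapshot_id),
        )
    return deal_id
```

Keying snapshots on a content hash is a design choice made for this sketch: it keeps the archive from growing when two pollers fetch the same page, while every deal still gets its own provenance row.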
partner_thumbs: 0 rows. The spec called this a v1.5 NSM-driving feature.
digest_item_opened: 0 rows. The NSM (digest open rate per partner per week ≥ 60%) cannot be measured because the tracking table is empty. The v1 spec said this is the project's North Star.
render_tool_invocations: 0 rows. Render tools would log to it, but they never fire client-side (per FIRM-026 still-pending work).
saved_dashboards: 0 rows. The save-as-dashboard feature has UI, but no one has saved a dashboard yet (also blocked on the broken render path).
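
Once digest_item_opened has rows, the NSM (digest open rate per partner per week ≥ 60%) could be computed along these lines. This is a sketch, not the spec's definition: it assumes the rate is opened items divided by delivered items, grouped by ISO week of the send date, and the function names and input shapes are hypothetical.

```python
from collections import defaultdict
from datetime import date

NSM_TARGET = 0.60  # threshold quoted from the v1 spec


def weekly_open_rates(delivered, opened):
    """Compute open rate per (partner, iso_year, iso_week).

    delivered: iterable of (partner, item_id, sent_date) tuples
    opened:    set of (partner, item_id) pairs that were opened
    """
    sent = defaultdict(int)
    hits = defaultdict(int)
    for partner, item_id, sent_date in delivered:
        iso = sent_date.isocalendar()
        key = (partner, iso[0], iso[1])
        sent[key] += 1
        if (partner, item_id) in opened:
            hits[key] += 1
    # Only weeks with at least one delivery appear, so no division by zero.
    return {key: hits[key] / sent[key] for key in sent}


def weeks_below_target(rates, target=NSM_TARGET):
    """Return the (partner, year, week) keys that miss the target, sorted."""
    return sorted(key for key, rate in rates.items() if rate < target)
```

With both input tables empty, `weekly_open_rates` returns an empty dict. That is the current state: there are no partner-weeks to score, so the metric is unmeasured rather than failing.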
firm_drift_centroids: 0 rows. v1 spec called for "thesis-drift detection" as one of 5 differentiators. The schema exists, the cron job (0 3 * in wrangler.toml) exists, but no firm has a centroid computed yet because deals are 86% null on sector (B1), and the centroid algorithm requires sector distribution.
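
To illustrate why null sectors block this feature, here is a minimal sketch that treats a centroid as a firm's normalized sector distribution and measures drift as the total variation distance between two such distributions. Both choices are assumptions made for illustration; the spec's actual centroid algorithm is not described in this audit.

```python
from collections import Counter


def sector_centroid(sectors):
    """Normalized sector distribution, or None when every sector is null."""
    known = [s for s in sectors if s is not None]
    if not known:
        return None  # nothing to build a centroid from
    counts = Counter(known)
    return {sector: n / len(known) for sector, n in counts.items()}


def drift(a, b):
    """Total variation distance: 0.0 = identical mix, 1.0 = no overlap."""
    keys = set(a) | set(b)
    return 0.5 * sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)
```

Under this sketch, a firm whose deals all have a null sector yields `None`, so the nightly job has nothing to write. A firm with only a handful of non-null sectors would get a centroid, but one too noisy to read drift from.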
digest_items
The actual digest table is empty. The spec's #1 deliverable was "every morning a digest arrives." There's a daily cron (brief-digest), but the cron's output isn't being persisted to digest_items. Either the cron silently fails OR the persistence step was never wired.
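
Whichever cause turns out to be true, the persistence step can be written so that an empty digest fails loudly instead of silently. The sketch below assumes a simplified digest_items shape (`digest_date`, `deal_id`, `slot`); those columns, `EmptyDigestError`, and `persist_digest` are hypothetical names for illustration.

```python
import sqlite3

# Hypothetical minimal schema; the real digest_items table may differ.
SCHEMA = """
CREATE TABLE digest_items (
    digest_date TEXT NOT NULL,
    deal_id INTEGER NOT NULL,
    slot INTEGER NOT NULL,
    PRIMARY KEY (digest_date, deal_id)
);
"""


class EmptyDigestError(RuntimeError):
    """Raised when a digest run selected nothing to persist."""


def persist_digest(conn, digest_date, deal_ids):
    """Write one digest's items and return the row count stored for that date."""
    if not deal_ids:
        # Surface the problem to the cron's error reporting instead of
        # finishing "successfully" with nothing written.
        raise EmptyDigestError(f"no items selected for {digest_date}")
    with conn:  # single transaction: all items land or none do
        conn.executemany(
            "INSERT OR REPLACE INTO digest_items (digest_date, deal_id, slot) "
            "VALUES (?, ?, ?)",
            [(digest_date, d, i) for i, d in enumerate(deal_ids, start=1)],
        )
    return conn.execute(
        "SELECT COUNT(*) FROM digest_items WHERE digest_date = ?",
        (digest_date,),
    ).fetchone()[0]
```

The composite primary key with `INSERT OR REPLACE` makes a re-run for the same date idempotent, which matters if the cron is retried after a partial failure.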
From REVISION 2 (Paul-approved), the user-observable behaviors:
| v1 promise | Status | Evidence |
|---|---|---|
| Single named Blueprint thesis | ✅ Working | thesis row 1, thesis_versions 4 versions |
| Daily digest delivers 5/5 business days | ❌ FAILING | digest_items empty; can't measure SLA |
| 3-5 thesis-filtered items with "why this matters" narrative | ⚠️ Partial | thesis_match_scores has 93 rows but why_narrative IS NULL on most |
| Watchlist Home dense table with sparklines | ⚠️ Partial | UI exists, but firm_drift_centroids empty so sparklines have no drift data |
| Drift indicators per firm | ❌ Empty | 0 centroid rows |
| Generative chart/table responses with citations | ⚠️ Server-side ready, client-side blocked | Spec 026 in flight (PR #446) |
| Thumbs feedback per item | ❌ Empty | partner_thumbs 0 rows |
| Newsletter as first-class source | ❌ Empty | email_inbound 0 rows |
| Co-investor network analytics | ❌ Schema not started | No co_investors or network_edges table |
| Per-firm thesis-match scoring | ⚠️ Partial | 93 scores but mostly score-only, no breakdown |
| 18 firms watched | ⚠️ Off — we have 17 | spec's seed list is 17 (per MEMORY.md), but spec text says "18 firms" |
| Sourcescrub Data Connect API | ❌ Not yet writing to companies | Phase 0.5 deferred |
| Axios Pro Rata, StrictlyVC, PitchBook newsletters | ❌ Not yet ingested | spec 023 partially shipped |
| Gmail OAuth | ❌ Not yet wired | superseded by Resend; Resend now decommissioned (FIRM-347) — circular |
Constitution cross-check:
In rough Build/Severity order:
1. Deal enrichment is not running — 86% of deals are headline-only. The Sonnet extraction step + Apify retry path that was specced isn't producing structured fields. Without this, every other feature (scoring, digest, charts) is starving.
2. digest_items is empty — the v1 NSM (digest open rate) literally cannot be measured. Either cron fails silently or persistence step missing.
3. partner_thumbs is empty — feedback-loop NSM also cannot be measured.
4. firm_drift_centroids empty — drift glyph in Watchlist UI shows nothing because there's no centroid history.
5. why_narrative null on most thesis_match_scores — Sonnet narrative-generation is either off or its writes are dropped.
6. source_snapshots + deal_source_provenance empty — provenance/audit promise from spec 023 is not being kept.
7. Companies table is empty: 0 rows despite the SourceScrub schema being fully ready.
8. email_inbound is empty: 0 rows, so the newsletter pipeline has no real ingestion (FIRM-347 decommissioned the Resend path).
9. Co-investor graph: schema not even started. Spec 023 mentioned it as a differentiator, but there is no co_investors / firm_relationships table.
10. 18 vs 17 firms — minor; spec says "18 firms" but we have 17. Either add the 18th or update spec.
11. firms.team 100% null — spec said partner-team data should be ingested per firm. Not happening.
12. sources.url 100% null — sources table has 66 rows but no URLs on any of them. Probably fine for RSS-known names but breaks if we re-derive from URL.
Frame these as two specs:
Spec 027 — Deal-enrichment recovery (P0/P1): items 1, 2, 3, 5, 6 above. The schema is right; the writes aren't happening. This is mostly diagnose-and-wire-up work. ~6-10 tickets, fast track / Standard split.
Spec 028 — Phase 0.5 backlog (P2): items 4, 7, 8, 9 above. Bigger scope — actually building the SourceScrub→companies pipeline + reviving newsletter ingest + adding co-investor graph schema. Standard brief, 8-12 tickets.
P3 items (the remaining items above, 10 through 12) are incremental; file as a someday-maybe.md follow-up.
Run them sequentially (027 first since the v1 promise is the bigger fire), or in parallel (different parts of the codebase) — Paul's call.
~/Documents/Mojo/Morty/briefs/firmwatch-data-completeness-audit-2026-04-24.md