# AIQ v2 — Methodology, Regrade-Stability, and Flexible Capstone Spec

**Version:** 2.0.0
**Status:** Canonical. Supersedes PRD §10 (4-lens AJIR taxonomy).
**Owners:** LWL Platform team. Review cadence: quarterly.
**Audience:** Engineers building `/aiq`, council scoring functions, capstone tooling, and educator dashboards. Also readable by sponsors and educators — write accordingly.

---

## 1. Why AIQ exists

In the AI era, output is cheap. Anyone can ship a polished prototype in a weekend. What's scarce — and what schools, sponsors, and employers actually need to identify — is **judgment**: the ability to choose the right problem, use AI with discernment, ship something real, and reflect honestly on what happened. AIQ is a portable, defensible index of that judgment, scored against artifacts (not opinions), versioned (so it's stable over time), and earned project-by-project (so it can't be faked with a single lucky build). It exists so a Year 11 student in a UAE classroom and a self-taught maker in Lagos can both walk into the same sponsor conversation with comparable, trusted evidence of who they are as builders.

---

## 2. The 5 Dimensions

Every project is scored on five dimensions. Each dimension is scored 0–20. Total raw score is 0–100.

For each dimension below: **definition → 4 anchor levels → council evidence → common failure modes**.

### 2.1 Artifact Integrity (0–20)

**Does it exist? Does it run? Is it complete?**
Not a deck. Not a wireframe. A thing that works for a real user.

| Range | Anchor |
|---|---|
| 0–5 | Idea-only. Slides, mood boards, or a Figma file. No code, no link, no demo. |
| 6–10 | Prototype with significant gaps. Happy-path only, breaks on edge cases, missing core flow. Reviewer can see the intent but not actually use it. |
| 11–15 | Working build with rough edges. End-to-end flow completes. Visible bugs or unfinished states, but the core promise of the product is reachable. |
| 16–20 | Shipped. Public URL or installable build. Survives a stranger trying to break it. Includes README, deploy, and at least one real user other than the builder. |

**Council evidence (Tech voice leads):** live URL crawl, browser console errors, Lighthouse score, README completeness, commit history vs. claimed scope, presence of error states and empty states.

**Failure modes:** Submitting Figma as "the product." Demo video instead of a link. "It works on my machine." Skeleton UI passed off as a build.

### 2.2 Problem Precision (0–20)

**One user. One pain. Named specifically.**

| Range | Anchor |
|---|---|
| 0–5 | Vague user, vague pain. *"Students struggle with motivation."* No evidence the builder talked to anyone. |
| 6–10 | A user segment named, but pain is generic. *"University students need better study tools."* |
| 11–15 | Specific user, specific pain, some research. *"First-year engineering students at IIT Madras forget assignment deadlines because announcements are spread across WhatsApp groups."* Includes ≥3 user conversations or a clear personal anecdote. |
| 16–20 | Surgical. *"Year 11 IGCSE students in UAE schools lose track of past-paper attempts and can't identify their weak topics."* Quantified frequency, named context, observed (not hypothesized) friction. The builder has shadowed, interviewed, or *is* the user. |

**Council evidence (Strategy + Business voices):** problem statement specificity, user-research artifacts, presence of named user vs. abstract demographic, evidence in the artifact that the problem informed the build (not retrofitted to it).

**Failure modes:** "Everyone struggles with X." Demographic without context. Retrofitted problem statement that doesn't match what the build actually does.

### 2.3 AI Judgment (0–20)

**Did they use AI with intention — or just prompt and paste?**
This is the new literacy. Anyone can generate. The score is for *discernment*.

| Range | Anchor |
|---|---|
| 0–5 | No reflection on AI use at all. Either claims "no AI used" implausibly, or pasted output without inspection. Prompts not shared. |
| 6–10 | Used AI but can't articulate where it helped vs. hurt. Mentions tools used but no decisions described. |
| 11–15 | Can explain at least 2 specific AI decisions: what was generated, what was edited, what was rejected. Caught at least one AI error. |
| 16–20 | Clear evidence of judgment, not just generation. Can articulate: what AI did, what they decided, where AI was *deliberately not used* and why, how they verified output. Shows prompt iteration log or rationale. |

**Council evidence (Tech + Strategy voices):** AI-use disclosure section in submission, prompt log if shared, explicit "I overrode the AI here because..." moments in the reflection, evidence of fact-checking AI output, evidence of choosing a non-AI path when appropriate.

**Failure modes:** Pretending no AI was used. "I used ChatGPT" with no detail. Hallucinated facts in the artifact that the builder didn't catch. AI-generated reflections about AI use (meta-irony, scored 0).

### 2.4 Reflective Depth (0–20)

**Do they know what they learned?**
The reflection separates a builder from someone who got lucky once. Three reflection prompts, each worth roughly a third of the 20 points:

- What didn't work and why?
- What would you do differently?
- What surprised you?

| Range | Anchor |
|---|---|
| 0–5 | "I learned a lot." Generic positive. No specifics, no discomfort, no concrete second-attempt plan. |
| 6–10 | One specific lesson identified, but stays surface-level. Can name *what* failed but not *why*. |
| 11–15 | Two of three prompts answered with specifics. Names a real wrong turn. Has at least one concrete change for next time. |
| 16–20 | All three prompts answered with discomfort and specificity. Identifies a wrong assumption (not just a tactical mistake). Surprise is genuine, not performative. The reflection makes the reader trust the builder more, not less. |

**Council evidence (Business + Impact voices; see the §4.2 matrix):** specificity of named failures, presence of "I was wrong about X" moments, whether the surprise is about the *user* or about the *self*, whether the "differently" answer is concrete enough to be testable.

**Failure modes:** "I learned to manage my time better." Performative humility ("everything was hard"). LLM-generated reflection (detectable by uniform sentence length and absence of named specifics).

### 2.5 Original Insight (0–20)

**Is the thinking theirs?**
The code might be AI. The design might be Tailwind. But the *angle* — the specific user, the non-obvious problem, the unexpected connection — that has to be human.

| Range | Anchor |
|---|---|
| 0–5 | Tutorial clone. Idea pulled from a "30 startup ideas" list. Could be anyone's project. |
| 6–10 | Familiar problem, slight twist. "To-do app but for students." Twist isn't load-bearing. |
| 11–15 | Real observation underneath. The build addresses something the builder noticed in their own life that others would have missed. |
| 16–20 | This project would not exist without *this* builder's specific experience, context, or angle. Non-obvious framing. The kind of project that, when sponsors see it, they want to know who made it. |

**Council evidence (Strategy + Impact voices):** uniqueness of problem framing, presence of personal context that informs the angle, comparison to obvious-default version of the same idea (would 9/10 builders have made the same choice?), whether the project earns its existence.

**Failure modes:** Hackathon-template ideas (AI tutor, habit tracker, journal app). Trending-tech-driven ("I wanted to use vector DBs"). Solving a problem the builder doesn't actually have or understand.

---

## 3. Score Bands

Raw score is 0–100. Public display is 200–1000 (rescaled — see §6). Band thresholds are the same cut points expressed in either scale.

| 0–100 | 200–1000 | Band | What it means | What unlocks |
|---|---|---|---|---|
| 0–40 | 200–520 | **Explorer** | Still learning to ship. | Profile is private by default; can opt in to public Vault. Not yet sponsor-visible. |
| 41–60 | 528–680 | **Builder** | Shipping, not yet consistent. | Public profile + Vault. Eligible for cohort showcases. Bounty waitlist. |
| 61–79 | 688–832 | **Maker** | Verified builder, sponsor-visible. | Appears in sponsor-search; eligible for paid bounties up to $250; unlocks Maker badge. |
| 80–89 | 840–912 | **Architect** | Bounty-eligible, fellowship-ready. | All bounties; fellowship application unlocked; Demo Day finalist track; mentor-pair invitation. |
| 90–100 | 920–1000 | **Founder-grade** | Top of class, direct intro to partners. | Direct partner intros; seed-grant eligibility; Champion-tier Pitch Night seeding; co-marketing slots. |

**Band display rules:**

- The **band name** is shown wherever the number is shown. Number-only is forbidden — context matters more than score.
- Bands are **sticky** in one direction: a student promoted to a band stays in that band for ≥30 days even if their rolling score dips, to prevent churn-anxiety.
- Demotions from Architect → Maker (the most consequential transition) require ≥45 days below threshold and trigger a "what changed" notification with appeal CTA.
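
As a minimal sketch of how these rules might resolve a displayed band, assuming a per-student state record that tracks the last promotion date and how long the rolling score has sat below the current band's threshold (the state shape, and the 30-day default for non-Architect demotions, are assumptions, not spec):

```ts
type Band = "Explorer" | "Builder" | "Maker" | "Architect" | "Founder-grade";
const BAND_ORDER: Band[] = ["Explorer", "Builder", "Maker", "Architect", "Founder-grade"];
const rank = (b: Band) => BAND_ORDER.indexOf(b);

interface BandState {
  band: Band;              // currently displayed band
  promotedAt: Date;        // when the student entered this band
  belowSince: Date | null; // when the rolling score first dipped below the threshold
}

function displayBand(state: BandState, computed: Band, now: Date): Band {
  const days = (from: Date) => (now.getTime() - from.getTime()) / 86_400_000;
  if (rank(computed) >= rank(state.band)) return computed; // promotions apply immediately
  if (days(state.promotedAt) < 30) return state.band;      // sticky for >=30 days after promotion
  // Architect -> Maker demotion requires >=45 days below threshold; others assumed 30.
  const required = state.band === "Architect" && computed === "Maker" ? 45 : 30;
  return state.belowSince !== null && days(state.belowSince) >= required ? computed : state.band;
}
```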

---

## 4. The Council Model

Every project is scored by **3 of 5 voices**, selected according to the project's declared type. Each voice scores only the dimensions where it has authority. Voices are AI-run in the first pass; human mentors can override any score with a written note.

### 4.1 The Five Voices

| Voice | Scores | Focus |
|---|---|---|
| **Tech** | Artifact Integrity, AI Judgment | Does it run, does it scale, was AI used well |
| **Strategy** | Problem Precision, AI Judgment, Original Insight | Is the angle real, is the user real |
| **Business** | Problem Precision, Reflective Depth | Could this be a business, did they learn the right things |
| **Impact** | Original Insight, Reflective Depth | Does it matter, does it move the world |
| **Design** | Artifact Integrity, Original Insight | Is it considered, is it crafted |

Every dimension is scored by **2–3 voices**; the final dimension score is the **median** across the voices that scored it. (Median, not mean, to dampen one outlier voice; with only two voices the median is simply their mean.)

### 4.2 Voice → Dimension Matrix

| | Artifact Integrity | Problem Precision | AI Judgment | Reflective Depth | Original Insight |
|---|---|---|---|---|---|
| Tech | ✓ | | ✓ | | |
| Strategy | | ✓ | ✓ | | ✓ |
| Business | | ✓ | | ✓ | |
| Impact | | | | ✓ | ✓ |
| Design | ✓ | | | | ✓ |
| **Voices per dim** | 2 | 2 | 2 | 2 | 3 |
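
For implementers, a minimal TypeScript sketch of the matrix and the median rule (type and function names here are illustrative, not the shipped `aiq.ts` API):

```ts
type Voice = "tech" | "strategy" | "business" | "impact" | "design";
type Dim = "aIntegrity" | "pPrecision" | "aiJudgment" | "rDepth" | "oInsight";

// The §4.2 voice → dimension matrix, encoded as data.
const VOICE_DIMS: Record<Voice, Dim[]> = {
  tech:     ["aIntegrity", "aiJudgment"],
  strategy: ["pPrecision", "aiJudgment", "oInsight"],
  business: ["pPrecision", "rDepth"],
  impact:   ["rDepth", "oInsight"],
  design:   ["aIntegrity", "oInsight"],
};

// Median of the scores from every participating voice with authority over the dim.
function dimScore(dim: Dim, votes: Partial<Record<Voice, Record<Dim, number>>>): number {
  const scores = (Object.keys(votes) as Voice[])
    .filter((v) => VOICE_DIMS[v].includes(dim))
    .map((v) => votes[v]![dim])
    .sort((a, b) => a - b);
  if (scores.length === 0) {
    // A 3-voice council can lack authority over a dim (e.g. Tech/Strategy/Design
    // and Reflective Depth); how that gap is filled is an open implementation question.
    throw new Error(`no voice on this council scores ${dim}`);
  }
  const mid = Math.floor(scores.length / 2);
  // Even count (two voices): the median is the mean of the middle pair.
  return scores.length % 2 ? scores[mid] : (scores[mid - 1] + scores[mid]) / 2;
}
```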

### 4.3 Council Selection by Project Type

The builder declares one or more project types at submission. The council voices are selected by the rules below.

| Declared type | Voices selected (3) |
|---|---|
| Tech tool / SaaS | Tech, Strategy, Design |
| Business idea | Business, Strategy, Impact |
| Social impact | Impact, Strategy, Business |
| Design system / brand | Design, Tech, Strategy |
| Research / report | Strategy, Impact, Business |
| Hybrid (>1 declared) | Union of the above, capped at 4; if the union exceeds 4, drop the voice(s) with the lowest dim coverage for the declared union |
| None declared | Default: Tech, Strategy, Design |

**Override:** mentors and educators can manually swap voices with a logged reason. Sponsors cannot.
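
The selection rules, sketched in the same style (reusing `Voice` and `VOICE_DIMS` from the §4.2 snippet; the type slugs and the tie-break inside the hybrid rule are assumptions where the spec leaves room):

```ts
// §4.3 council table as data; keys are illustrative slugs for the declared types.
const TYPE_COUNCILS: Record<string, Voice[]> = {
  "tech-tool":     ["tech", "strategy", "design"],
  "business-idea": ["business", "strategy", "impact"],
  "social-impact": ["impact", "strategy", "business"],
  "design-system": ["design", "tech", "strategy"],
  "research":      ["strategy", "impact", "business"],
};

function selectCouncil(declared: string[]): Voice[] {
  if (declared.length === 0) return ["tech", "strategy", "design"]; // default council
  const union = [...new Set(declared.flatMap((t) => TYPE_COUNCILS[t] ?? []))];
  if (union.length <= 4) return union;
  // Hybrid cap: keep the four voices with the widest dim coverage,
  // dropping the lowest-coverage voice(s) first.
  return union
    .sort((a, b) => VOICE_DIMS[b].length - VOICE_DIMS[a].length)
    .slice(0, 4);
}
```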

### 4.4 Human Override Protocol

Any council score can be overridden by a verified mentor. Rules:

- Override requires a **written note** ≥80 characters, visible to the builder.
- Override is **recorded as a separate row** — original AI score is preserved for audit.
- Override magnitude > 5 points on any single dim auto-flags for senior-mentor review (prevents grade inflation/deflation).
- Override is **attributed** publicly: profile shows "AIQ 724 (mentor-reviewed)" with the mentor's display name on hover.
- Builders can request a re-grade from a different mentor once per override.

---

## 5. Program Track Weights

Different LWL programs weight the same 5 dimensions differently. Each dim's raw score is multiplied by the track weight; the weighted total is then renormalized to 0–100.

| Program | Artifact Integrity | Problem Precision | AI Judgment | Reflective Depth | Original Insight |
|---|---|---|---|---|---|
| **Default (unaffiliated)** | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| **TEO / NBL** (tech entrepreneurship) | 1.0 | 1.2 | 1.3 | 0.8 | 0.7 |
| **Edge Club** (process-focused) | 0.7 | 1.0 | 1.0 | 1.4 | 1.3 |
| **FDF** (founder dev fellowship) | 1.0 | 1.4 | 0.7 | 1.3 | 0.9 |
| **WLA** (writers, leaders, artists) | 0.6 | 1.0 | 0.8 | 1.4 | 1.5 |

**Renormalization:** `score100 = round( 100 × Σ(dim_i × weight_i) / (20 × Σ weight_i) )` — so a perfect-by-track-criteria submission still yields exactly 100.
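
Worked example (illustrative numbers): under FDF weights, dim scores of 16, 18, 12, 17, 14 give a weighted sum of 16×1.0 + 18×1.4 + 12×0.7 + 17×1.3 + 14×0.9 = 84.3 against a maximum of 5.3 × 20 = 106, so `score100 = round(84.3 × 100 / 106) = 80`, which lands in the Architect band.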

**Display rule:** Wherever the score is shown for a program-affiliated submission, the program name and "weighted" tag appear next to the score: `AIQ 87 — Founder-grade (FDF-weighted)`.

---

## 6. Display math

```
Inputs:
  d = [aIntegrity, pPrecision, aiJudgment, rDepth, oInsight]   // each 0..20
  w = trackWeights                                              // five floats

Computation:
  weighted_sum = Σ (d[i] × w[i])
  max_possible = Σ (20 × w[i])
  score100     = round( weighted_sum × 100 / max_possible )    // 0..100
  publicScore  = round( 200 + score100 × 8 )                   // 200..1000

Public band:
  if score100 < 41 → "Explorer"
  elif < 61        → "Builder"
  elif < 80        → "Maker"
  elif < 90        → "Architect"
  else             → "Founder-grade"
```
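
The same computation as a runnable TypeScript sketch (reusing the `Band` type from the §3 snippet; `scoreProject` is an illustrative name, not the final `computeAIQ` signature):

```ts
function scoreProject(d: number[], w: number[]): { score100: number; publicScore: number; band: Band } {
  const weightedSum = d.reduce((sum, di, i) => sum + di * w[i], 0);
  const maxPossible = w.reduce((sum, wi) => sum + 20 * wi, 0);
  const score100 = Math.round((weightedSum * 100) / maxPossible); // 0..100
  const publicScore = Math.round(200 + score100 * 8);             // 200..1000
  const band: Band =
    score100 < 41 ? "Explorer" :
    score100 < 61 ? "Builder" :
    score100 < 80 ? "Maker" :
    score100 < 90 ? "Architect" : "Founder-grade";
  return { score100, publicScore, band };
}

// scoreProject([16, 18, 12, 17, 14], [1.0, 1.4, 0.7, 1.3, 0.9])
// → { score100: 80, publicScore: 840, band: "Architect" }
```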

### 4-lens projection (for the existing radar UI)

The existing radar/sparkline components render four lenses. Until the radar is rewritten to show the 5 dims natively, project the 5 dims onto 4 lenses with a fixed matrix. Each lens output is 0–100 (the 0–20 weighted combination is multiplied by 5).

```
Design   = (0.5·OriginalInsight + 0.3·ProblemPrecision + 0.2·ArtifactIntegrity) × 5
Business = (0.6·ProblemPrecision + 0.4·OriginalInsight)                          × 5
Tech     = (0.6·ArtifactIntegrity + 0.4·AIJudgment)                              × 5
Content  = (0.5·ReflectiveDepth + 0.3·OriginalInsight + 0.2·AIJudgment)          × 5
```

These coefficients are editorial choices, not statistically derived. They will be revisited after the first 500 v2-rated projects to check whether the projection holds up. **Reflective Depth deliberately does not feed Tech or Business** — it lives almost entirely in Content/Brand because reflection is ultimately a communication artifact.
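
The projection as data plus one function, in the same sketch style (`Dim` from the §4.2 snippet; lens keys are illustrative):

```ts
const LENS_MATRIX: Record<string, Partial<Record<Dim, number>>> = {
  design:   { oInsight: 0.5, pPrecision: 0.3, aIntegrity: 0.2 },
  business: { pPrecision: 0.6, oInsight: 0.4 },
  tech:     { aIntegrity: 0.6, aiJudgment: 0.4 },
  content:  { rDepth: 0.5, oInsight: 0.3, aiJudgment: 0.2 },
};

function projectToLenses(dims: Record<Dim, number>): Record<string, number> {
  const lenses: Record<string, number> = {};
  for (const [lens, coeffs] of Object.entries(LENS_MATRIX)) {
    let raw = 0; // coefficients sum to 1.0, so raw stays in 0..20
    for (const [dim, c] of Object.entries(coeffs)) raw += dims[dim as Dim] * (c as number);
    lenses[lens] = Math.round(raw * 5); // rescale to 0..100
  }
  return lenses;
}
```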

### Implementation note

`computeAIQ` in `src/lib/aiq.ts` will be migrated to read `dimScores: { aIntegrity, pPrecision, aiJudgment, rDepth, oInsight }` per project. Critique records gain a `dim_scores` jsonb column alongside the existing lens scores. Old (lens-only) critiques continue to render via a backfill: `aIntegrity = (Tech×0.7 + Design×0.3) / 5`, etc. — exact backfill matrix lives in `aiq.ts` JSDoc.

---

## 7. Regrade-Stability Rules — The Trust Contract

A score people can defend is more valuable than a score that's slightly more accurate. These rules trade some accuracy for stability.

### 7.1 Idempotency

The same artifact, submitted under the same `AIQ_RUBRIC_VERSION`, scored by the same council voice composition, must yield the same score within **±2 points** on any dim. This is enforced by:

- Council prompts are versioned and checked into git.
- LLM `temperature = 0.2` for scoring calls (deterministic-ish, not zero, to allow modest variance).
- Each scoring run records prompt hash, model, model version, and a content hash of the artifact inputs.
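
A sketch of the per-run record implied by the list above (field names assumed; the real table may differ):

```ts
import { createHash } from "node:crypto";

interface ScoringRun {
  rubricVersion: string; // e.g. "2.0.0" (stamped per §7.2)
  model: string;
  modelVersion: string;
  promptHash: string;    // hash of the versioned, checked-in council prompt
  artifactHash: string;  // content hash of the artifact inputs
  temperature: number;   // 0.2 for scoring calls
}

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

function recordRun(prompt: string, artifact: string, model: string, modelVersion: string): ScoringRun {
  return {
    rubricVersion: "2.0.0",
    model,
    modelVersion,
    promptHash: sha256(prompt),
    artifactHash: sha256(artifact),
    temperature: 0.2,
  };
}
```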

### 7.2 Versioning

- Every score record stamps `rubric_version` (e.g. `2.0.0`).
- Rubric changes (anchor edits, weight changes, council recomposition) **bump the version**.
- Existing scores are **never** silently re-rated when the rubric changes. Re-grade is **opt-in** by the builder, and the original score is preserved as a separate row.
- The public profile shows scores under the version they were rated under, with a small "rated under v2.0" tag on hover.

### 7.3 Confidence intervals

The council emits a per-dim score **and** a confidence value `0.0–1.0`. Low-confidence dims (`< 0.6`) are flagged for human review **before** the score is published. The student sees "AIQ pending mentor review" instead of a partial score. SLA: 48h human-review turnaround.

### 7.4 Re-grade cool-down

A single project can be re-graded at most **once per 14 days** unless a mentor unlocks the cool-down. This prevents score-farming (resubmitting until favorable variance hits).

### 7.5 Drift monitoring

A monthly job samples 50 historical projects across all bands and re-runs the council. If mean drift exceeds **0.5σ** in any direction, an alert fires and the rubric/version is reviewed before further publishing. Drift logs are public on the methodology page.
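
A sketch of the drift test at the core of that job. Here σ is taken as the standard deviation of the historical sample's scores, which is an assumption; the spec does not pin the reference distribution:

```ts
// Returns true when the re-run sample has drifted beyond 0.5σ in either direction.
function driftAlert(historical: number[], rerun: number[]): boolean {
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const mu = mean(historical);
  const sigma = Math.sqrt(mean(historical.map((x) => (x - mu) ** 2)));
  const meanDrift = mean(rerun.map((x, i) => x - historical[i]));
  return Math.abs(meanDrift) > 0.5 * sigma;
}
```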

### 7.6 Trailing average (the "rolling AIQ")

The number shown on the public profile is **not** the most recent project's score — it's a **weighted trailing average over the last 90 days**:

```
rollingAIQ = Σ (projectScore_i × recencyWeight_i × magnitudeWeight_i) / Σ weights

recencyWeight  = exp(-daysAgo / 30)         // half-life ~21 days
magnitudeWeight = clamp(projectAmbition, 0.5, 1.5)   // mentor-set per project
```

- Brand-new students with <3 scored projects show "AIQ forming — N projects scored" instead of a number.
- One bad grade can drop rolling AIQ at most **30 points** in a single update (clamp).
- Rolling AIQ recomputes nightly + on every new score insert (write-through, see `saveCritique`).
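
Putting the formula and the three rules above together, a sketch (the `ScoredProject` shape is assumed, and the "forming" rule is read as counting projects inside the 90-day window):

```ts
interface ScoredProject { score100: number; daysAgo: number; ambition: number }

function rollingAIQ(projects: ScoredProject[], previous: number | null): number | null {
  const recent = projects.filter((p) => p.daysAgo <= 90);
  if (recent.length < 3) return null; // render "AIQ forming — N projects scored"
  let num = 0, den = 0;
  for (const p of recent) {
    const recency = Math.exp(-p.daysAgo / 30);                  // half-life ≈ 21 days
    const magnitude = Math.min(1.5, Math.max(0.5, p.ambition)); // mentor-set, clamped
    num += p.score100 * recency * magnitude;
    den += recency * magnitude;
  }
  const next = Math.round(num / den);
  // A single update may drop the rolling score by at most 30 points.
  return previous !== null && previous - next > 30 ? previous - 30 : next;
}
```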

### 7.7 Appeals

See §10. An appeal pauses public-display of the disputed score until resolution.

---

## 8. Capstone Rubrics — Flexible, School-Owned

Schools and educators bring their own rubrics. Platform provides templates and tooling; schools own the criteria. Both the school grade and the AIQ score are computed and shown — they don't compete.

### 8.1 Data model (target shape for M3 migration)

```sql
-- Reusable rubric, owned by an org (school, cohort, or LWL central)
create table public.rubrics (
  id uuid primary key default gen_random_uuid(),
  owner_org_id uuid references public.orgs(id) on delete cascade,
  name text not null,
  description text,
  scale_min int not null default 0,
  scale_max int not null default 100,
  version int not null default 1,
  is_template boolean not null default false,        -- LWL-provided defaults
  is_locked boolean not null default false,          -- locked once first submission graded
  created_at timestamptz default now(),
  updated_at timestamptz default now()
);

create table public.rubric_criteria (
  id uuid primary key default gen_random_uuid(),
  rubric_id uuid references public.rubrics(id) on delete cascade not null,
  label text not null,
  description text,
  weight numeric not null default 1.0,
  -- Anchors: array of { score: int, label: text, description: text }
  anchors jsonb not null default '[]'::jsonb,
  position int not null,                              -- display order
  -- v2: maps_to_aiq_dim text                         -- optional dim mapping
  created_at timestamptz default now()
);

-- Capstone submission gets BOTH a school score and an AIQ score
alter table public.capstone_submissions
  add column school_rubric_id uuid references public.rubrics(id),
  add column school_score jsonb,                      -- { criteria: [{id, score, note}], total: number }
  add column aiq_dim_scores jsonb;                    -- { aIntegrity, pPrecision, ... }
```

RLS:
- `rubrics`: read by anyone in the same org; write by org admins; LWL templates readable by all.
- `rubric_criteria`: same as parent rubric.
- `capstone_submissions.school_score`: read by submitter, mentors in same org, and educators who own the rubric; write by educators only.
- `capstone_submissions.aiq_dim_scores`: read by submitter and (if `is_public`) public; write by council functions only.

### 8.2 Educator UX (M3 build)

1. **Clone or build:** "Start from a template" (6 LWL defaults — see §8.4) or "Build from scratch."
2. **Criterion editor:** label, description, weight, optional anchor levels (2–6 levels). Anchor labels render as a Likert-style grading widget for the educator at submission review.
3. **Preview as student:** shows the rubric exactly as the student will see it on the submission page.
4. **Lock on first submission graded:** prevents mid-cohort criteria changes that would invalidate prior grades. To edit after lock, educator clones into v2.
5. **Reusable across cohorts:** a single rubric can be assigned to multiple capstone instances.

### 8.3 Student-facing display

On the capstone submission page, two cards side-by-side:

```
┌──────────────────────┐  ┌──────────────────────┐
│  School Grade        │  │  AIQ Dim Scores      │
│  84 / 100            │  │  87 / 100  Architect │
│  (Distinction)       │  │                      │
│  by Ms. Chen         │  │  AI 18 · Prob 19 ... │
│                      │  │  Contributes to      │
│  Criteria breakdown… │  │  rolling AIQ +0.4    │
└──────────────────────┘  └──────────────────────┘
```

- School grade appears on transcript + certificate.
- AIQ dim scores feed `rollingAIQ` via the same council pipeline as any other project.
- The two never average together. They are **independent signals**.

### 8.4 LWL Default Rubric Library (templates)

Six starter templates ship with the platform. Educators can clone and modify.

| Template | Criteria (count) | Best for |
|---|---|---|
| Design Capstone | Concept, Craft, Coherence, Critique-readiness, Presentation (5) | Visual/UX projects |
| Business Capstone | Problem, Market, Model, Traction, Pitch (5) | Founder-track capstones |
| Tech Capstone | Working Build, Code Quality, AI Use, Deployment, Documentation (5) | Engineering-heavy |
| Social Impact Capstone | Beneficiary, Theory of Change, Evidence, Scalability, Reflection (5) | Community projects |
| Research Capstone | Question, Method, Data, Analysis, Write-up (5) | Academic / report-style |
| Build-in-Public Capstone | Cadence, Vulnerability, Community, Iteration, Outcome (5) | Process-focused (Edge Club) |

Each template ships with anchor descriptions for every criterion at every score band, written by LWL faculty.

### 8.5 v2 — Mappable rubrics (deferred, noted in §12)

In v2, each `rubric_criteria` row gets an optional `maps_to_aiq_dim` column. When set, school grading on that criterion contributes proportionally to the AIQ dim. This collapses the dual-track into a single scored event but requires careful UX so educators understand they're influencing a portable score, not just their own grade.

---

## 9. Full Skill Mapping

Every project is tagged with 1–6 skills from a master taxonomy. Skills are the bridge between AIQ (a number) and a recruiter-readable narrative ("strong on AI fluency and product thinking").

### 9.1 Master skill taxonomy (~40 skills, 6 families)

| Family | Skills |
|---|---|
| **Build / Engineering** | Frontend Implementation, Backend Implementation, Database Design, API Design, Deployment & DevOps, Performance Optimization, Debugging |
| **Product Thinking** | User Research, Problem Framing, Feature Scoping, Roadmapping, Trade-off Reasoning, Metrics Definition, Prioritization |
| **AI Fluency** | Prompt Engineering, Model Selection, Output Verification, Tool Chaining, AI Ethics in Practice, Cost & Latency Reasoning, Eval Design |
| **Communication** | Written Reflection, Pitch Delivery, Demo Recording, Documentation, Storytelling, Visual Explanation |
| **Research** | Literature Review, User Interviewing, Data Analysis, Market Sizing, Competitive Analysis, Hypothesis Design |
| **Collaboration** | Async Coordination, Code Review, Mentor-feedback Integration, Cohort Engagement, Conflict Navigation, Open-source Contribution |

### 9.2 Skill → AIQ Dim Contribution

Every skill primarily contributes to 1–2 dims. The contribution table feeds the auto-tagging suggester and the "skills that grew this score" UI on the profile.

| Family | Primary dim | Secondary dim |
|---|---|---|
| Build / Engineering | Artifact Integrity | AI Judgment |
| Product Thinking | Problem Precision | Original Insight |
| AI Fluency | AI Judgment | Artifact Integrity |
| Communication | Reflective Depth | Original Insight |
| Research | Problem Precision | Reflective Depth |
| Collaboration | Reflective Depth | — |

(Per-skill table — 40 rows — lives in `src/lib/skills.ts` once implemented; auto-generated from this family mapping with hand-tuned overrides for ~6 outliers.)

### 9.3 Tagging flow

1. **Auto-suggest at submission:** the council reads the artifact + reflection and proposes 3–6 skill tags with confidence scores.
2. **Builder confirms or edits:** can accept, remove, or add skills. Cap at 6.
3. **Mentor review:** mentor can approve/edit during scoring. Approved tags get a verified badge.
4. **Aggregation:** profile shows top 5 skills by `frequency × mean_dim_contribution` over the last 90 days.
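
A sketch of the step-4 ranking, `frequency × mean_dim_contribution` over a 90-day window (the tag shape is assumed):

```ts
interface SkillTag { skill: string; dimContribution: number; daysAgo: number }

function topSkills(tags: SkillTag[], n = 5): string[] {
  const recent = tags.filter((t) => t.daysAgo <= 90);
  const groups = new Map<string, number[]>();
  for (const t of recent) {
    const arr = groups.get(t.skill) ?? [];
    arr.push(t.dimContribution);
    groups.set(t.skill, arr);
  }
  return [...groups.entries()]
    .map(([skill, contribs]) => ({
      skill, // frequency × mean_dim_contribution
      score: contribs.length * (contribs.reduce((a, b) => a + b, 0) / contribs.length),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, n)
    .map((s) => s.skill);
}
```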

### 9.4 Recruiter-readable export

Public profile emits JSON-LD for skills:

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Maya R.",
  "knowsAbout": [
    { "@type": "DefinedTerm", "name": "Prompt Engineering", "termCode": "AI-FLUENCY/PROMPT-ENG" },
    { "@type": "DefinedTerm", "name": "User Research", "termCode": "PRODUCT/USER-RESEARCH" }
  ]
}
```

Recruiting systems that parse schema.org markup (LinkedIn, Greenhouse, and similar) can ingest this. Skill term codes are stable across rubric versions.

---

## 10. PII, Fairness, and Appeals

### 10.1 PII handling in council prompts

- **Names redacted** before the artifact is sent to the LLM. Replaced with `[BUILDER]`.
- **Demographic features** (age, gender, nationality, school) **never** appear in scoring prompts.
- **Profile photos** never sent to scoring models.
- Inputs to scoring: artifact URL/content, problem statement, reflection text, AI-use disclosure, declared project type.

### 10.2 Fairness checks

- Per-cohort score distribution monitored monthly. Outlier cohorts (mean ±1.5σ from platform mean) trigger review.
- A/B comparison: the same artifact submitted by builders from different programs should yield the same dim scores within ±3 points.
- Quarterly bias audit: 100 random scored projects reviewed by a 3-person human panel; results published in the methodology page.

### 10.3 Appeal process

1. Builder clicks "Appeal this score" on their dashboard. Picks the dim(s) they're disputing and writes a justification (≥200 characters).
2. Score is **frozen** (still visible to builder, hidden from public profile and rolling AIQ) within 1 hour.
3. Mentor (different from any prior reviewer of this project) is auto-assigned. **3-day SLA.**
4. Mentor decision: uphold, partial adjust, full re-grade. Written reasoning required.
5. If adjusted, new score replaces the old in rolling AIQ; original preserved in audit log.
6. Builder may escalate once to AIQ council human review (5-day SLA, final).

Appeal rate is published per-cohort. Cohorts with appeal rate >15% trigger educator-facing review.

---

## 11. Telemetry

Tracked at platform level for ops + published on the methodology page (aggregated, not individual):

- **Score distributions** per program, per cohort, per band — shown as histograms updated weekly.
- **Dimension drift** — monthly drift sample (§7.5).
- **Appeal rate** — overall + by program, target <8%.
- **Mentor-override rate** — overall + by mentor, target 5–15% (too low = rubber-stamp, too high = AI not trusted).
- **Time-to-first-score** for new builders — target <72h from first submission.
- **Confidence-flag rate** — % of submissions that hit human-review-required (§7.3), target <10%.
- **Skill diversity** — Gini coefficient across the skill taxonomy, watching for over-concentration.
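
For the last metric, one standard way to compute the Gini coefficient over per-skill tag counts (a sketch; the production job may bucket differently):

```ts
// 0 = tags spread evenly across the taxonomy; values near 1 = heavy concentration.
function gini(countsPerSkill: number[]): number {
  const xs = [...countsPerSkill].sort((a, b) => a - b);
  const n = xs.length;
  const total = xs.reduce((a, b) => a + b, 0);
  if (n === 0 || total === 0) return 0;
  // G = (2 · Σ i·x_i) / (n · Σ x_i) − (n + 1) / n, with 1-based rank i over sorted x.
  const weighted = xs.reduce((sum, x, i) => sum + (i + 1) * x, 0);
  return (2 * weighted) / (n * total) - (n + 1) / n;
}
```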

Per-builder telemetry stays private to the builder + their mentors.

---

## 12. Open questions and v2 follow-ups

1. **Mappable school rubrics** — per §8.5. Cleanest UX long-term but needs careful educator onboarding so they understand their grading affects a portable score.
2. **Multi-language rubric anchors** — current spec is English-only. Hindi, Arabic, Mandarin localizations needed before international rollout.
3. **Sponsor-defined bounty rubrics** — sponsors paying for bounties want their own rubric. Should plug into the same flexible-rubric infrastructure as schools (§8) but with sponsor-org ownership.
4. **5-axis radar** — currently we project 5 dims onto 4 lenses to keep the existing radar component. After the v2 rubric is live for 90 days, revisit whether to ship a native 5-axis radar.
5. **Public methodology page versioning** — the `/aiq` page should render the *current* rubric version *and* let visitors view archived prior versions for trust.
6. **Council voice weighting refinement** — currently median-of-voices. After 500+ v2 scores, evaluate whether weighted-mean (with voice-confidence-as-weight) would be more accurate without sacrificing stability.
7. **Builder-led re-tagging** — should builders be able to edit project skill tags after grading? Currently locked post-grade; debate is whether this is too rigid.

---

## Appendix A — Glossary

| Term | Definition |
|---|---|
| **Dim** | One of the 5 dimensions (Artifact Integrity, Problem Precision, AI Judgment, Reflective Depth, Original Insight). Scored 0–20. |
| **Lens** | One of the 4 legacy projections (Design, Business, Tech, Content). Derived from dims via fixed matrix (§6). |
| **Voice** | One of 5 council perspectives (Tech, Strategy, Business, Impact, Design). Each project is scored by 3 voices. |
| **Band** | One of 5 score tiers (Explorer → Founder-grade). |
| **Track** | A program affiliation (TEO, NBL, Edge Club, FDF, WLA) that applies dim weights. |
| **Rolling AIQ** | The number shown on a public profile; trailing 90-day weighted average across that builder's project scores. |
| **AIQ_RUBRIC_VERSION** | Semver of this spec. Stamped on every score record. Currently `2.0.0`. |

## Appendix B — What changed from v1 (the AJIR taxonomy)

| v1 (AJIR, deprecated) | v2 (this doc) |
|---|---|
| 4 lenses (Design, Business, Tech, Content) as primary scoring axis | 5 dims as primary axis; lenses are derived |
| 4 AJIR pillars (Applied, Judgment, Impact, Responsible) | Folded into the 5-dim rubric |
| Tiers: Apprentice / Practitioner / Operator / Architect / Pioneer | Bands: Explorer / Builder / Maker / Architect / Founder-grade |
| Single-shot score per critique | 90-day weighted rolling average |
| No formal regrade-stability rules | §7 trust contract |
| No formal council selection | §4 voice selection by project type |
| Capstone rubric = single fixed scheme | §8 flexible school-owned rubrics, dual-track |
| No skill taxonomy | §9 40-skill taxonomy with JSON-LD export |
