Auditor

What this is

Auditor scans a case folder for problems before files leave the office. It is a sister tool to Filer: it uses the same canonical reference rules and runs either independently or wired into Filer's Compile pre-flight (v0.8).

Workflow

  1. Case Details: set the case's identifiers (case ref, client ref, HO refs, client name). Import case summary populates everything from a gmiau-case-summary/1 JSON; alternatively, type the refs by hand.
  2. Scan: drop the case folder (or specific files). Tick Scan PDF text so Auditor can search document bodies, not just filenames. Hit Run audit.
  3. Report: review the findings. The headline question is per file: does this document belong in this client's folder, yes or no?

The checks (v0.7)

  • C1 Files that belong to this case: for each file, Auditor searches the filename and the body text for any of this case's known identifiers (case ref, client ref with a context cue, HO refs, client name, client email). First hit wins: once a file's ownership is confirmed, Auditor stops scanning it. Files in which no identifier appears anywhere are flagged for review (they may be misfiled). For emails saved to PDF, a client-email match in the Subject, From or To line is enough.
  • C2 Third-party recipient allow-list: for each PDF named Letter to <X> or Email to <X>, Auditor extracts the recipient and checks it against gmiau-specs/parties.yaml (canonical names and aliases), then against the Case Details ad-hoc parties textarea. Unknown ⇒ WARN. CCL / OAL client letters are exempt; instead, their first-page salutation is cross-checked against the case summary's client name (mismatch ⇒ WARN).
  • C3 Three-monthly client-letter gap: finds client-facing letters by filename keyword (CCL / OAL / Opening Advice Letter / Client Care Letter), takes the newest date (filename YYMMDD or YYYY-MM-DD, then PDF /CreationDate, then file mtime), and flags WARN at 91+ days old, INFO at 81–90 days, and INFO if no client letter is present. Thresholds and keywords are tunable in Config.
  • C4 Missing key documents: per funding flag (LH / CLR / CLR2 / PFC / …), Auditor checks that the expected key documents are present. v1.1 of auditor-keydocs.yaml extends beyond CCL/OAL to identity verification, eligibility/means evidence, and CW4 merits forms on CLR matters.
  • C5–C8 (stubs): regulator-aligned checks scaffolded per gmiau-specs/AUDITOR-REGULATORY-MAP.md (C5 CW4 merits assessment, C6 closure letter, C7 retention/destruction, C8 attendance-note presence). Each emits a pending row in the Report until the matching v0.x release lands.

The tabs

📋 Case Details

The case's identifiers (case ref, client ref, HO umbrella refs, client name): anything that positively identifies a document as belonging to this case. Import a Case Summary JSON to populate.

🔍 Scan

Drop zone, file list, Run audit button.

📊 Report

Findings from the last audit run, per check.

⚙️ Config

Per-caseworker thresholds + keywords (Phase 0.5+).

🎨 Settings

App appearance, font, tab visibility, embedded spec.

Confidentiality: this tool runs entirely in your browser; nothing is uploaded. The files you load, the case references and the client names are confidential, so keep audit reports on this machine.

Scan

Drop a case folder, or individual files. Filenames are always scanned; tick Scan PDF text to scan body content too (uses pdf.js; slower).

Diagnostics: run a smoke test against the shared check library

Confirms that the @@AUDITOR_CHECKS@@ sentinel resolved and that the loader registered window.gmiauAuditor. Useful after a rebuild.

Case Details

The case's identifiers. C1 checks each scanned file for at least one of these: case ref, client ref (digits preceded by a cue word such as "Our ref:"), Home Office umbrella ref, client name, or client email. A file with none of them is flagged for review.

Report

No audit has been run yet. Go to the Scan tab, drop a case folder, and hit Run audit.

Config

Per-caseworker preferences, shared with other GMIAU tools via the One File projection in localStorage['ifyi_config_bundle'] under auditor.*. They persist across reloads; to sync between machines, export and import from Filer's Config tab.

C3 · Three-monthly client-letter gap

Coming with later phases.
  • C2 parties allow-list editor (currently sourced from gmiau-specs/parties.yaml at build time; the Case Details ad-hoc textarea is the runtime override).
  • C1 bare-digit context cue list β€” surface the existing heuristic for editing.
  • Severity overrides per check.
  • Override marker text for Filer pre-flight (Compiled with GMIAU Auditor) β€” used by v0.8.

App font

App appearance

Dracula themes are dark-only β€” switching to Light keeps them dark.

Tab visibility

Syncs across every GMIAU Shell tool via ifyi_hide_guide_tab + ifyi_hide_config_tab.

Canonical spec

The current AUDITOR-SPEC.md is inlined at build time. Edit the spec at gmiau-specs/AUDITOR-SPEC.md and run immigrationfyi-tools update to refresh every tool that embeds it.

# GMIAU Auditor Spec
**Version 0.4 · 2026-05-19 · v0.7 C2 shipped**

v0.4 (2026-05-19): C2 recipient allow-list check shipped.
- `gmiau-specs/parties.yaml` bumped to v1.1 with HMCTS + GMIAU additions; inlined into auditor.html via `<!-- @@DATA:parties.yaml@@ -->` (resolves to `window.IFYI_PARTIES`).
- C2 extracts the recipient from `Letter to <X>` / `Email to <X>` filenames and looks it up against parties.yaml canonical names + aliases, then the Case Details ad-hoc textarea. Unknown ⇒ WARN.
- CCL / OAL client letters are exempt from the recipient check; their first page is salutation-checked against `caseSummary.client_name` (mismatch ⇒ WARN).
- Database experts/barristers layer deferred: left as a TODO in `auditor-checks.js` until the encrypted Database's unlock state is surfaced into the audit ctx.

v0.3 (2026-05-19): C3 three-monthly gap check shipped.
- C3 threshold reverted to **91 days, comparison `>=`** (one day strictly past three months). Functionally equivalent to v0.2's "90 days, strict `>`"; this form is clearer in the Config UI ("WARN at 91+ days").
- Added a second knob, `auditor.approachDays` (default **81**). Gap 81–90 days → INFO "approaching"; 91+ → WARN. Gives caseworkers a ten-day heads-up before the gap actually trips.
- Config persisted via `localStorage['ifyi_config_bundle'].auditor.*`, the same bundle Filer's Config tab reads/writes; cross-tab live sync via the `storage` event.

v0.2 (2026-05-14): user feedback on §10 open questions:
- C2 simplified β€” recipient extraction now reads filenames (`Letter to X.pdf` / `Email to X.pdf`), not body salutations. Client letters identified by salutation `Dear <FirstName>(?:\s+<MiddleName>)?` matched against `case_summary.client_name`.
- Parties allow-list is **system-wide**, not per-case-summary. Sourced from a new canonical `gmiau-specs/parties.yaml` + the encrypted GMIAU Database (experts, barristers) + caseworker ad-hoc list per audit run. CASE-SUMMARY-IMPORT extension dropped.
- Threshold tightened to **90 days** (was 91). _(Reverted to 91 in v0.3 with `>=` comparison.)_
- Client-letter keywords narrowed to `CCL`, `OAL`, `Opening Advice Letter`, `Client Care Letter` only.
- Override marker text: `Compiled with GMIAU Auditor` (typo-correction reading; the alternative "Complied with GMIAU Auditor" was left open here and later resolved in §10).
- Auditor is a **sibling tool**, not a Filer mode (user delegated; see §10).

Sister tool to **Filer**. Auditor scans a case's files (folder, sub-tree, or staged-for-filing set) and reports compliance + data-protection issues *before* anything leaves the office. Filer borrows Auditor's checks as a pre-flight before Compile.

Companion specs: [`FILER-SPEC.md`](./FILER-SPEC.md), [`REFERENCES-SPEC.md`](./REFERENCES-SPEC.md), [`CASE-SUMMARY-IMPORT.md`](./CASE-SUMMARY-IMPORT.md), [`GMIAU-STYLE-GUIDE.md`](./GMIAU-STYLE-GUIDE.md).

---

## §1 Purpose

A caseworker about to transfer a file to a third party (counsel, new firm, LAA, peer reviewer, court) needs to know:

1. Are there any **stray references** to other clients in this case's files? (Data protection: the first reason for the tool to exist.)
2. Are there **letters addressed to recipients not on the parties allow-list**? (Same concern, different vector.)
3. Has a **client-facing letter** been sent in the last three months? (LAA expectation on live matters.)
4. Are any **key documents** missing for this matter / funding type? (CW1, COI, CCL, OAL etc.)

Auditor runs these checks. Filer runs them automatically before Compile. The caseworker can also run Auditor ad hoc on any case at any time.

---

## §2 Architecture

- **Standalone HTML tool** at `immigrationfyi-tools/source/auditor.html`. Standard GMIAU Shell canon (header + tabs + Settings tab + sentinel order per [[ref_ifyi_shell]]).
- **Shared check library** at `immigrationfyi-tools/scripts/auditor-checks.js`. Single source of truth for the check algorithms; both `auditor.html` and `filer.html` inline it via a new build-time sentinel `<!-- @@AUDITOR_CHECKS@@ -->` (same pattern as `@@IFYI_CONFIG@@`, `@@DATABASE@@`). One edit propagates to both tools on `immigrationfyi-tools update`.
- **No backend.** Pure-client. Inputs come from file pickers + case summary JSON; outputs render in-tab + as a downloadable HTML report.
- **No client data leaves the machine.** Same offline contract as Filer.

---

## §3 Inputs

### 3.1 Case context

- **Case Summary JSON** (`gmiau-case-summary/1`). Auditor uses:
  - `case_reference`: the allow-list of refs that belong to this case (Case ref, Client ref, all Home Office umbrella refs)
  - `client_name`: first name(s) used to identify client-addressed letters (`Dear <FirstName>`); also used in the report header

- **Parties allow-list (system-wide, not per-case)**: three layers, in this lookup order:
  1. **Canonical bodies** from `gmiau-specs/parties.yaml` (NEW spec; see §7). Government bodies, courts, tribunals: UKVI, Home Office (alias HO), FtTIAC (aliases FtT, First-tier Tribunal), UTIAC (aliases UT, Upper Tribunal), EHRC, Court of Appeal, etc.
  2. **GMIAU Database**: experts, barristers, interpreters from the encrypted register, read via the existing `@@DATABASE@@` sentinel (`window.databaseGet('experts')`, `…('barristers')`, …). Cross-tool single source of truth per [[ref_ifyi_database_sot]].
  3. **Caseworker ad-hoc list**: a textarea in Auditor's Case Details tab, one name per line. Per audit run; not persisted. Covers one-off contacts: a new instructing firm, the client's GP for this matter, etc.

- **Ref allow-list** populates from the imported case summary; editable per run. **Parties allow-list** is read-only for layers 1–2 (canonical + database); editable for layer 3.

### 3.2 Files

- Drop a **folder** (whole case root, or a specific sub-tree, or Filer's staged set).
- Drop **individual files**.
- Auditor scans:
  - **PDFs**: text content via pdf.js (already inlined in the suite) + filename
  - **Filenames only** for everything else (`.docx`, `.xlsx`, images, audio)
- Auditor does NOT open Office files in v0.1 (left to v0.2; OOXML text extraction via JSZip is plausible).

### 3.3 Config

`ifyiConfig.auditor.*` (additive within `gmiau-config/1`, no schema bump). Persisted in `localStorage['ifyi_config_bundle'].auditor.*`:
- `threeMonthDays`: C3 WARN threshold; default **91** (gap ≥ this many days → WARN)
- `approachDays`: C3 INFO threshold; default **81** (gap in [`approachDays`, `threeMonthDays`) → INFO "approaching")
- `clientLetterKeywords`: list of filename tokens that count as a client-facing letter (defaults `CCL`, `OAL`, `Opening Advice Letter`, `Client Care Letter`; configurable)
- `keyDocsByFunding`: per funding flag (`LH`/`CLR`/`CLR2`/`PFC`/...), the list of required document keywords. Seeds from `gmiau-cases/gmiau.py:_REVIEW_DEFAULT_KEY_DOCS`; see §7.
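A minimal sketch of how these settings might be loaded, assuming the bundle is JSON with an `auditor` object at its top level. `readAuditorConfig` and `AUDITOR_DEFAULTS` are hypothetical names, not the shipped loader; the clamp mirrors the C3 rule in §4.

```javascript
// Sketch only: merge saved auditor.* settings over the spec's defaults.
// Takes the raw bundle string so it runs outside a browser; in the tool the
// string would come from localStorage.getItem("ifyi_config_bundle").
const AUDITOR_DEFAULTS = {
  threeMonthDays: 91, // C3 WARN threshold (gap >= this)
  approachDays: 81, // C3 INFO "approaching" threshold
  clientLetterKeywords: ["CCL", "OAL", "Opening Advice Letter", "Client Care Letter"],
  keyDocsByFunding: {}, // seeded from auditor-keydocs.yaml in the real tool
};

function readAuditorConfig(rawBundleJson) {
  let saved = {};
  try {
    const bundle = JSON.parse(rawBundleJson || "{}");
    if (bundle && typeof bundle.auditor === "object" && bundle.auditor !== null) {
      saved = bundle.auditor;
    }
  } catch (err) {
    // Unparseable bundle: fall back to defaults rather than failing the audit.
  }
  const cfg = { ...AUDITOR_DEFAULTS, ...saved };
  // Clamp so the INFO band can never start after the WARN threshold.
  cfg.approachDays = Math.min(cfg.approachDays, cfg.threeMonthDays);
  return cfg;
}
```

Merging over defaults means a bundle written by an older tool version, which lacks `approachDays`, still yields a complete config.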

---

## §4 Checks (v0.1 scope)

Four checks. Each emits zero or more findings; each finding has `{checkId, severity, file, evidence, suggestion}`. Severity ceiling per check:

### C1: Files that belong to this case *(severity ceiling: WARN)*

**The question is per-file:** does this document belong in this client's folder, yes or no? (Reframed 2026-05-14 from the v0.4–v0.5.2 per-token approach. The user reported that the old algorithm produced noise from PGP signatures, base64 / data: URIs, URLs and long hashes; the new algorithm searches for *specific known strings*, so noise can't match by accident.)

For each scanned file, Auditor searches the **filename** and (if PDF text was extracted) every **body page** for any of the case's known identifiers. **First hit wins** β€” once a file's ownership is confirmed, scanning stops. Files with no hit anywhere get a WARN finding: *"Couldn't find any identifier for case <ref> in this file. Review whether this file belongs here."*

**Identifiers** built from `caseSummary` + `refsAllowList`:

| Kind | Source | Search rule |
|---|---|---|
| `case_ref` | `case_reference`, `refsAllowList.caseRef` | Substring (case-insensitive). The funding-tail stem (`RB12345` from `RB12345-LH-GF`) is also added. |
| `client_ref` | `client_reference`, `refsAllowList.clientRef` | Word-boundary digit match, but **only counts as a hit if a cue word** ("Our ref", "Client ref", "Your ref", "Matter ref", "File ref", "Case ref", "Reference") appears within 30 chars before. Avoids matching page numbers, postcodes etc. |
| `ho_ref` | `home_office_reference`, `uan`, `gwf`, `port_reference`, `case_id`, `refsAllowList.homeOfficeRefs[]` | Substring. |
| `appeal_ref` | `appeal_reference`, `refsAllowList.appealRefs[]` | Substring. |
| `name` | `client_name` | Substring of the full name as imported. (First name alone is too weak: it gives false positives on common forenames.) |
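The per-file search can be sketched as follows. This is an illustration assuming flat case-detail fields (`caseRef`, `clientRef`, `homeOfficeRefs`, `appealRefs`, `clientName`); it borrows the shape of the `_collectIdentifiers` / `_searchIdentifier` / `_findFirstIdentifier` helpers described in the provenance notes, but the function bodies are not taken from `auditor-checks.js`. Treating the funding-tail stem as everything before the first hyphen is also an assumption.

```javascript
// Illustrative sketch of C1's per-file ownership test (not the shipped code).
const CUE_RE = /(our ref|client ref|your ref|matter ref|file ref|case ref|reference)/i;

function escapeRe(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build {kind, token, needsContext} entries from the case details.
function collectIdentifiers(d) {
  const ids = [];
  const add = (kind, token, needsContext = false) => {
    if (token && token.trim()) ids.push({ kind, token: token.trim(), needsContext });
  };
  add("case_ref", d.caseRef);
  // Assumed funding-tail stem: RB12345-LH-GF -> RB12345.
  if (d.caseRef && d.caseRef.includes("-")) add("case_ref", d.caseRef.split("-")[0]);
  add("client_ref", d.clientRef, true);
  (d.homeOfficeRefs || []).forEach((r) => add("ho_ref", r));
  (d.appealRefs || []).forEach((r) => add("appeal_ref", r));
  add("name", d.clientName);
  return ids;
}

function searchIdentifier(text, id) {
  if (!id.needsContext) {
    return text.toLowerCase().includes(id.token.toLowerCase());
  }
  // Client refs are bare digits: require word boundaries AND a cue word in
  // the 30 characters before the match, so page numbers etc. don't count.
  const re = new RegExp(`\\b${escapeRe(id.token)}\\b`, "g");
  let m;
  while ((m = re.exec(text)) !== null) {
    const before = text.slice(Math.max(0, m.index - 30), m.index);
    if (CUE_RE.test(before)) return true;
  }
  return false;
}

// Filename first (cheap), then body pages; stop at the first hit.
function findFirstIdentifier(filename, pages, ids) {
  for (const source of [filename, ...(pages || [])]) {
    for (const id of ids) {
      if (searchIdentifier(source, id)) return { kind: id.kind, token: id.token };
    }
  }
  return null; // no identifier anywhere: C1 raises a WARN for review
}
```

Checking the filename before body pages keeps the common case cheap, since GMIAU filenames usually carry the case ref.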

**What was dropped from earlier drafts:** the `REF_FINDER` alternation, the `BARE_DIGIT_FINDER`+`BARE_DIGIT_CONTEXT_RE` open-universe sweep, `NOISE_PATTERNS` / `_noiseSpans` / `_inAnySpan`, the allow-list "anything else is flagged" model, and the "stray reference" findings shape. Cross-reference detection ("this case's file mentions ANOTHER case's ref") is **not** part of v0.5.3 C1: a file that contains its own ref is treated as belonging, irrespective of what else it mentions. That cross-reference question is left for a future check (possibly part of C2 or a new C5).

### C2: Recipient not on the system parties allow-list *(severity ceiling: WARN)*

C2 reads filenames, not body content, because GMIAU's naming convention carries the recipient.

**Identification rules:**
- `^.*\bLetter to (.+?)(?:,\s*\d|\.\w+$)` captures `<X>` from `Letter to <X>.pdf` or `Letter to <X>, 12 May 2026.pdf` (Filer-style trailing date).
- `^.*\bEmail to (.+?)(?:,\s*\d|\.\w+$)`: same shape, "Email" variant.
- Files matching client-letter keywords (`CCL`, `OAL`, `Opening Advice Letter`, `Client Care Letter`) are **not** subject to C2: they're for the client, who is always allowed.
- Files that match neither pattern are not outbound letters; C2 skips them.
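A sketch of the capture step, using the two patterns above verbatim and applying the client-letter exemption first. `extractRecipient` and `isClientLetter` are hypothetical helper names.

```javascript
// Sketch: capture <X> from outbound-letter filenames (patterns from §4 C2).
const RECIPIENT_PATTERNS = [
  /^.*\bLetter to (.+?)(?:,\s*\d|\.\w+$)/,
  /^.*\bEmail to (.+?)(?:,\s*\d|\.\w+$)/,
];
const CLIENT_LETTER_KEYWORDS = ["CCL", "OAL", "Opening Advice Letter", "Client Care Letter"];

// Case-insensitive substring match, as the spec describes.
function isClientLetter(filename, keywords = CLIENT_LETTER_KEYWORDS) {
  const lower = filename.toLowerCase();
  return keywords.some((k) => lower.includes(k.toLowerCase()));
}

// Returns the captured recipient, or null when C2 should skip the file.
function extractRecipient(filename) {
  if (isClientLetter(filename)) return null; // client letters are exempt
  for (const re of RECIPIENT_PATTERNS) {
    const m = filename.match(re);
    if (m) return m[1].trim();
  }
  return null; // matches neither pattern: not an outbound letter
}
```

The lazy `(.+?)` stops at the first `, <digit>` or at the final extension, which is why a Filer-style trailing date does not leak into the captured name.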

**Lookup (`<X>` from filename):**
- Case-insensitive match against `parties.yaml` canonical name OR any alias.
- Then against database experts + barristers (name + aliases if stored).
- Then against caseworker's ad-hoc list for this run.
- Not found in any layer ⇒ WARN finding `{file, capturedRecipient, suggestion: "Confirm <X> is a permitted recipient; add to ad-hoc list or update parties.yaml."}`.
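The layered lookup might look like this. It assumes the parsed parties.yaml keeps the `bodies` map shown in §7 and that the ad-hoc textarea is passed as raw text; the database layer is left out, matching its deferred status in v0.7. All names here are illustrative, and the finding uses the generic `{checkId, severity, file, evidence, suggestion}` shape from §4.

```javascript
// Sketch: case-insensitive lookup against parties.yaml, then the ad-hoc list.
function buildPartyIndex(parties) {
  const index = new Map(); // lower-cased name or alias -> canonical name
  for (const body of Object.values(parties.bodies || {})) {
    for (const label of [body.name, ...(body.aliases || [])]) {
      index.set(String(label).toLowerCase(), body.name);
    }
  }
  return index;
}

function lookupRecipient(recipient, partyIndex, adHocText) {
  const key = recipient.trim().toLowerCase();
  // Layer 1: canonical names and aliases from parties.yaml.
  if (partyIndex.has(key)) return { layer: "parties.yaml", canonical: partyIndex.get(key) };
  // Layer 3: ad-hoc textarea, one name per line (database layer omitted).
  const adHoc = (adHocText || "").split(/\r?\n/).map((s) => s.trim()).filter(Boolean);
  const hit = adHoc.find((name) => name.toLowerCase() === key);
  return hit ? { layer: "ad-hoc", canonical: hit } : null;
}

function recipientFinding(file, recipient) {
  return {
    checkId: "C2",
    severity: "WARN",
    file,
    evidence: recipient,
    suggestion: `Confirm ${recipient} is a permitted recipient; add to ad-hoc list or update parties.yaml.`,
  };
}
```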

**Body-content cross-check (defensive):**
- For PDFs that *look* like client letters by filename (CCL / OAL), verify that the salutation `^Dear (<client_first_name>)(?:\s+<client_middle_name>)?,` is present on the first page and matches `case_summary.client_name`'s first name(s). Mismatch ⇒ WARN ("Filed as client letter but salutation reads 'Dear <other>'").
- For PDFs that *look* like third-party letters (`Letter to X`), no body cross-check in v0.1 β€” filename is authoritative.
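One way to implement the salutation cross-check without building a regex per client is to compare the greeted name tokens against the leading tokens of `client_name`. This assumes `client_name` is ordered first name, optional middle name, then surname; `salutationMatches` is a hypothetical helper.

```javascript
// Sketch: find the first "Dear ...," line and compare its name tokens with
// the start of client_name (at most first + middle name, per the spec).
function salutationMatches(firstPageText, clientName) {
  const m = firstPageText.match(/^Dear ([^,\n]+),/m);
  if (!m) return { ok: false, reason: "no salutation found" };
  const greeted = m[1].trim().toLowerCase().split(/\s+/);
  const nameParts = clientName.trim().toLowerCase().split(/\s+/);
  // Each greeted token must equal the client's name token in the same position.
  const ok = greeted.length <= 2 && greeted.every((t, i) => t === nameParts[i]);
  return ok ? { ok: true } : { ok: false, reason: `salutation reads 'Dear ${m[1].trim()}'` };
}
```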

Severity is WARN (not FAIL): the caseworker may have a legitimate new contact that just needs adding to the ad-hoc list or the canonical YAML.

### C3: Three-monthly client-letter gap *(severity ceiling: WARN)*

- Default WARN threshold: **91 days** (`ifyiConfig.auditor.threeMonthDays`); comparison `today - newest >= 91`.
- Default INFO ("approaching") threshold: **81 days** (`ifyiConfig.auditor.approachDays`). Triggers when the gap is in `[approachDays, threeMonthDays)`. Clamped to `<= threeMonthDays` so the two never invert.
- Client-facing letters are identified by filename keyword (case-insensitive substring): **`CCL`**, **`OAL`**, **`Opening Advice Letter`**, **`Client Care Letter`**. The list is config-editable (`ifyiConfig.auditor.clientLetterKeywords`).
- For each match, extract a date; first hit wins:
  1. Filename `YYYY-MM-DD` (anywhere; year 19xx/20xx) or `YYMMDD` (six consecutive digits at a word boundary; YY pivots on current year + 5).
  2. PDF `/CreationDate` (or `/ModDate` fallback) via pdf.js `getMetadata()`. pdf.js is already loaded for text extraction, so there is no extra dependency.
  3. OS `lastModified` (`File.lastModified`).
- Take the most recent letter date. If `today - newest >= threeMonthDays` ⇒ WARN. If `approachDays <= gap < threeMonthDays` ⇒ INFO. Otherwise PASS.
- If **no** client-facing letter exists at all ⇒ a separate INFO finding ("no client letter"). A brand-new matter legitimately has none, so this is not escalated to WARN.
- If letters exist but none has a usable date ⇒ INFO listing each undated file ("rename with a YYMMDD prefix").
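The date extraction and gap classification above can be sketched as a single pass. Two assumptions: the PDF metadata date is handed in already parsed as a `Date` (`pdfDate`), and "YY pivots on current year + 5" is read as "use 20YY unless that lands more than five years in the future, else 19YY". Function names are hypothetical.

```javascript
// Sketch of C3: filename date -> PDF date -> mtime, then classify the gap.
const DAY_MS = 24 * 60 * 60 * 1000;

// Return a UTC Date only if y/m/d is a real calendar date.
function validYMD(y, m, d) {
  const dt = new Date(Date.UTC(y, m - 1, d));
  const ok = dt.getUTCFullYear() === y && dt.getUTCMonth() === m - 1 && dt.getUTCDate() === d;
  return ok ? dt : null;
}

function dateFromFilename(name, currentYear) {
  const iso = name.match(/\b((?:19|20)\d{2})-(\d{2})-(\d{2})\b/);
  if (iso) {
    const dt = validYMD(Number(iso[1]), Number(iso[2]), Number(iso[3]));
    if (dt) return dt;
  }
  const six = name.match(/\b(\d{2})(\d{2})(\d{2})\b/);
  if (six) {
    let year = 2000 + Number(six[1]);
    if (year > currentYear + 5) year -= 100; // assumed pivot reading
    const dt = validYMD(year, Number(six[2]), Number(six[3]));
    if (dt) return dt;
  }
  return null;
}

// file: { name, pdfDate?: Date, lastModified?: number (ms since epoch) }
function letterDate(file, currentYear) {
  const fromName = dateFromFilename(file.name, currentYear);
  if (fromName) return fromName;
  if (file.pdfDate instanceof Date && !isNaN(file.pdfDate)) return file.pdfDate;
  if (typeof file.lastModified === "number") return new Date(file.lastModified);
  return null;
}

function wholeDaysBetween(later, earlier) {
  const a = Date.UTC(later.getUTCFullYear(), later.getUTCMonth(), later.getUTCDate());
  const b = Date.UTC(earlier.getUTCFullYear(), earlier.getUTCMonth(), earlier.getUTCDate());
  return Math.round((a - b) / DAY_MS);
}

function checkC3(letterFiles, today, cfg) {
  if (letterFiles.length === 0) return { severity: "INFO", note: "no client letter" };
  const dates = letterFiles.map((f) => letterDate(f, today.getUTCFullYear())).filter(Boolean);
  if (dates.length === 0) return { severity: "INFO", note: "no usable date" };
  const newest = new Date(Math.max(...dates.map((d) => d.getTime())));
  const gap = wholeDaysBetween(today, newest);
  const approach = Math.min(cfg.approachDays, cfg.threeMonthDays); // never invert
  if (gap >= cfg.threeMonthDays) return { severity: "WARN", gap };
  if (gap >= approach) return { severity: "INFO", gap, note: "approaching" };
  return { severity: "PASS", gap };
}
```

With today set to 2026-05-19 and the defaults, a letter dated 2026-02-17 gives a 91-day gap (WARN) and one dated 2026-02-18 gives 90 (INFO), which is the one-day boundary described in the v0.3 note.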

### C4: Missing key documents *(severity ceiling: WARN)*

- Read `case_summary.funding_flags` (e.g. `LH`, `CLR-ECF`), or whichever field the case summary uses to mark the funding stage. If no flag is set, C4 falls back to a hyphenated case-ref tail (`RB12345-CLR-ECF` → `CLR-ECF`). For each flag present, look up the expected key-doc keywords in `keyDocsByFunding`.
- For each expected document, search filenames for the keyword (case-insensitive substring). Missing ⇒ finding.
- The matter-type → expected-docs table is the v0.1 scope risk; lift the GMIAU-canonical table from `gmiau-cases/gmiau.py:_REVIEW_DEFAULT_KEY_DOCS` and promote it to `gmiau-specs/auditor-keydocs.yaml`. Auditor reads the YAML at build time via a sentinel; the caseworker edits it via Auditor's Config tab.
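A sketch of the C4 lookup, including the case-ref fallback described in the v0.5 provenance note. The `keyDocsByFunding` shape used here (flag mapped to a list of `{label, aliases}`) is an assumed reading of `auditor-keydocs.yaml`, not its documented schema.

```javascript
// Sketch of C4: resolve the funding flag, then report required docs with no
// matching filename. Returns null when the flag is missing or unknown.
function fundingFlagFor(explicitFlag, caseRef) {
  if (explicitFlag && explicitFlag.trim()) return explicitFlag.trim().toUpperCase();
  const ref = (caseRef || "").trim();
  const dash = ref.indexOf("-");
  return dash > 0 ? ref.slice(dash + 1).toUpperCase() : null; // RB12345-CLR-ECF -> CLR-ECF
}

function missingKeyDocs(flag, keyDocsByFunding, filenames) {
  const required = flag ? keyDocsByFunding[flag] : undefined;
  if (!required) return null; // no flag or unknown flag: INFO with a hint
  const names = filenames.map((n) => n.toLowerCase());
  // A document counts as present if any alias appears as a substring of any filename.
  const present = (doc) =>
    doc.aliases.some((alias) => names.some((n) => n.includes(alias.toLowerCase())));
  return required.filter((doc) => !present(doc)).map((doc) => doc.label);
}
```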

---

## §5 Output

### 5.1 In-tab Report pane

Per check, a section with:
- Title + status badge (PASS · INFO · WARN · FAIL)
- One-line summary ("3 stray refs in 2 files")
- Expandable rows per finding: file path, evidence excerpt (highlighted token / matched line), suggestion
- Overall verdict at top: green / amber / red
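A sketch of how findings could roll up into badges and the overall verdict. The severity order follows the badge list; mapping FAIL to red, WARN to amber and everything else to green is an assumption, since the spec does not spell out the mapping.

```javascript
// Sketch: worst severity per check, plus a traffic-light verdict.
const SEVERITY_RANK = { PASS: 0, INFO: 1, WARN: 2, FAIL: 3 };

function worstSeverity(severities) {
  return severities.reduce((worst, s) => (SEVERITY_RANK[s] > SEVERITY_RANK[worst] ? s : worst), "PASS");
}

function summariseFindings(findings) {
  const byCheck = {};
  for (const f of findings) {
    if (!byCheck[f.checkId]) byCheck[f.checkId] = [];
    byCheck[f.checkId].push(f.severity);
  }
  const badges = {};
  for (const [checkId, severities] of Object.entries(byCheck)) {
    badges[checkId] = worstSeverity(severities);
  }
  const overall = worstSeverity(findings.map((f) => f.severity));
  const verdict = overall === "FAIL" ? "red" : overall === "WARN" ? "amber" : "green";
  return { badges, overall, verdict };
}
```

Checks that emitted no findings get no entry in `badges`; the caller would render those as PASS.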

### 5.2 Downloadable report

- `<Case> Audit Report <YYYY-MM-DD>.pdf`: bound report mirroring the in-tab view. PRIVATE & CONFIDENTIAL header, GMIAU metadata, optional qpdf-wasm lock (reuse Filer's `_gmiauMaybeLockBytes`).
- `<Case> Audit Report <YYYY-MM-DD>.json`: machine-readable findings for archival or for piping into other tools.

### 5.3 Persistence

- Audit runs are ephemeral. No history kept by default.
- Saving the JSON report is the caseworker's responsibility (drop into `~/Documents/Work/<case>/Reviews/`).

---

## §6 Filer integration

Filer's Compile button gains a pre-flight:

1. On click, before `buildFilerBundle` runs, Filer invokes `window.gmiauAuditor.runChecks({caseSummary, fileState})`.
2. If any **FAIL**: modal shows the report inline, "Cancel" / "Compile anyway" buttons. "Compile anyway" requires a one-time confirm.
3. If only WARN/INFO: status bar shows the count; Compile proceeds without blocking.
4. If everything passes: silent.
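The four outcomes above reduce to a pure decision function. This sketch assumes findings carry the `severity` field from §4; the returned action names are hypothetical, and the real hook would sit between `window.gmiauAuditor.runChecks(...)` and `buildFilerBundle`.

```javascript
// Sketch of the pre-flight gate: FAIL blocks behind a modal, WARN/INFO only
// update the status bar, and a clean run stays silent.
function preflightAction(findings) {
  const count = (sev) => findings.filter((f) => f.severity === sev).length;
  const fails = count("FAIL");
  if (fails > 0) return { action: "modal", fails };
  const warns = count("WARN");
  const infos = count("INFO");
  if (warns + infos > 0) return { action: "status-bar", warns, infos };
  return { action: "silent" };
}
```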

**Compile marker in PDF metadata.** Every Filer compile that ran the pre-flight stamps a marker into the bound PDF's `Producer` field (appended to `GMIAU Toolkit · `):
- **Clean compile** (no FAIL, or only WARN/INFO findings): `GMIAU Toolkit · Compiled with GMIAU Auditor (YYYY-MM-DD)`
- **Override compile** (FAIL accepted via Compile anyway): `GMIAU Toolkit · Compiled with GMIAU Auditor (YYYY-MM-DD) — overrides accepted`
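A sketch that builds the two Producer strings exactly as listed above. `producerMarker` is a hypothetical helper; using the local calendar date for `YYYY-MM-DD` is an assumption.

```javascript
// Sketch: Producer-field marker for an audited compile.
function producerMarker(date, overridesAccepted) {
  const pad = (n) => String(n).padStart(2, "0");
  const ymd = `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`;
  let marker = `GMIAU Toolkit · Compiled with GMIAU Auditor (${ymd})`;
  if (overridesAccepted) marker += " — overrides accepted"; // suffix flags a non-clean compile
  return marker;
}
```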

The marker means "audit ran"; the suffix flags a non-clean compile. The audit JSON sidecar (when written; see §5.2) carries the full finding list for a reviewer who wants more detail than the marker.

The caseworker can run the full audit independently at any time via Auditor's own UI; Filer's pre-flight is just a convenient hook.

---

## §7 Dependencies (need to land before / alongside v0.2 build)

1. **`gmiau-specs/parties.yaml`** *(NEW)*: canonical recipients. Cross-consumed by Auditor (C2 allow-list) and eventually by the `letter` / `court-doc` CLIs for filename autocomplete. Shape:
   ```yaml
   name: parties
   version: "1.0"
   updated: 2026-05-14
   bodies:
     home_office:
       name: "Home Office"
       aliases: ["HO", "UKVI"]
       role: government
     fttiac:
       name: "FtTIAC"
       aliases: ["FtT", "First-tier Tribunal", "First-tier Tribunal IAC"]
       role: tribunal
     utiac:
       name: "UTIAC"
       aliases: ["UT", "Upper Tribunal", "Upper Tribunal IAC"]
       role: tribunal
     # … Court of Appeal, EHRC, AIT, LAA, etc.
   ```
2. **`gmiau-specs/auditor-keydocs.yaml`** *(NEW)*: promoted from `gmiau-cases/gmiau.py:_REVIEW_DEFAULT_KEY_DOCS`. Cross-consumed by Auditor + the gmiau-cases CLI.
3. **`immigrationfyi-tools/scripts/auditor-checks.js`** *(NEW)*: shared check library. New build-time sentinel `<!-- @@AUDITOR_CHECKS@@ -->` (extends [[ref_spec_sentinel_pattern]]).
4. **`@@DATABASE@@`** sentinel: already exists; Auditor pulls experts/barristers from the encrypted register via `window.databaseGet('experts')` / `…('barristers')`. No new build wiring.
5. **`@@SPEC:AUDITOR-SPEC@@`** sentinel in `auditor.html`'s Settings tab: embeds this spec, per [[ref_spec_sentinel_pattern]] (already wired).
6. **No CASE-SUMMARY-IMPORT change.** The previous v0.1 plan to add a `parties:` field to `gmiau-case-summary/1` is dropped; parties are system-wide (via #1 + #4), not per-case.

---

## §8 Out of scope (v0.1)

- **Named-entity recognition** on document body content (would need an ML model, which violates the offline + airgap rules).
- **`.docx` / `.xlsx` text extraction.** v0.2 candidate via JSZip + OOXML parse, the same pattern as Filer's `_filerScrubOfficeXml`.
- **Email / SharePoint scanning** (out of scope; Auditor scans the on-disk case folder only).
- **Auto-redact / auto-fix.** Auditor reports; the caseworker fixes.
- **Cross-case scanning** ("are any files from case X showing up in case Y?"). Possible v0.3 with the encrypted GMIAU register's case list.
- **Three-monthly auto-letter generation.** Different tool; Auditor only flags the gap.
- **History / trend reports** (audit results over time). v0.3+ if asked for.

---

## §9 Build phases

| Phase | Scope | Output |
|---|---|---|
| **v0.1 → v0.2 (this doc)** | Design + user-confirmed scope. C2 simplified to filename-based; parties source moved system-wide. | This file, for user re-review of §10 (only Q4 remained; since resolved) |
| **v0.3** | Scaffold `auditor.html` (canon shell, 5 tabs incl. Guide + 🏠 Index back-link) · `auditor-checks.js` skeleton · `@@AUDITOR_CHECKS@@` sentinel wired into `build-all-offline.py` · seed `gmiau-specs/parties.yaml` + `auditor-keydocs.yaml` from existing GMIAU sources | tool boots; no real audit yet |
| **v0.4** | Check C1 (stray refs): highest-value, lowest-ambiguity | scan a folder; see flagged refs |
| **v0.5** | Check C4 (missing key docs): reuses a well-defined table | full findings UI |
| **v0.6** (shipped 2026-05-19) | Check C3 (three-monthly gap): filename date / PDF `/CreationDate` / mtime; WARN at 91+, INFO at 81–90, INFO if no letter; Config tab live for `threeMonthDays` + `approachDays` + `clientLetterKeywords`; PDF metadata extraction folded into the existing pdf.js pass | gap detection live |
| **v0.7** (shipped 2026-05-19) | Check C2 (recipient allow-list): `Letter to <X>` / `Email to <X>` filename capture, lookup against `parties.yaml` + ad-hoc list, CCL/OAL salutation cross-check. Database experts layer deferred. | parties lookup live |
| **v0.8** | Filer pre-flight integration + PDF metadata Producer marker | Compile-time blocking |
| **v0.9** | PDF report + JSON report download | shareable artefact |
| **v0.10** | Vault docs (Cheatsheet + Reference + Guide tab export) + tool-audit clean | feature-complete |

Each phase = a single backup snapshot to `gmiau-backup/auditor-vNN-YYYY-MM-DD/`, in line with [[feedback_ifyi_snapshot_every_change]].

---

## §10 Open questions

### Resolved 2026-05-14 (user reply)

1. ✅ **Recipient extraction (C2)**: GMIAU letters use `Dear FIRSTNAME,` / `Dear FIRSTNAME MIDDLENAME,` for client letters; third-party letters are identified by filename (`Letter to X.pdf` / `Email to X.pdf`). Spec rewritten in C2.
2. ✅ **Client-letter keywords (C3)**: narrowed to `CCL`, `OAL`, `Opening Advice Letter`, `Client Care Letter`. The user's proposed additions (Letter to Client / Client Update / Follow-up) were dropped.
3. ✅ **Three-monthly threshold**: 90 days (reverted to 91 days with a `>=` comparison in v0.3).
5. ✅ **Parties source**: system-wide (canonical YAML + GMIAU Database experts/barristers + caseworker ad-hoc list per run). CASE-SUMMARY-IMPORT extension dropped. See §3 + §7.
6. ✅ **Sibling tool, not Filer mode**: user delegated. Rationale: distinct purpose (checking correctness vs packaging outbound files), distinct cadence (ad-hoc audit vs file-transfer moment), alignment with the GMIAU Shell pattern (Bundle Builder + Evidence Exhibitor are sibling tools, not one tool with modes), and a separate launcher icon makes "run an audit" discoverable.

### Resolved 2026-05-14 (user reply, Reading A)

4. ✅ **Override marker: Reading A confirmed.** `Compiled with GMIAU Auditor (YYYY-MM-DD)` is stamped on every audited compile; overrides append ` — overrides accepted` after the date. The marker lives in the bound PDF's metadata `Producer` field, appended to `GMIAU Toolkit · …`. Marker = "audit ran"; the suffix flags non-clean compiles.

---

**Status:** v0.7 shipped 2026-05-19. The C2 recipient allow-list is live via `parties.yaml` + ad-hoc textarea + CCL/OAL salutation cross-check. C1 + C2 + C3 + C4 are all live. C5–C8 are scaffolded as stubs per `AUDITOR-REGULATORY-MAP.md` (CW4 merits / closure letter / retention / attendance notes). The Database experts/barristers layer for C2 is deferred. Browser testing is still pending. Next: v0.8 (Filer pre-flight integration + Producer marker), then the C5–C8 implementations in subsequent v0.x releases.

Regulatory framework cross-reference: [`AUDITOR-REGULATORY-MAP.md`](./AUDITOR-REGULATORY-MAP.md) maps every file-level evidence requirement from SQM v3, LAA Peer Review IA, LAA Imm Common Errors (2022 + 2024), and IAA Code of Standards 2024 to a live / partial / planned / out-of-scope status.

**Provenance:**
- v0.1 · 2026-05-14: initial design doc; written by Claude with user (conversational thread "Filer + Auditor + spec sentinel", 2026-05-14). No build started.
- v0.2 · 2026-05-14: same-day revision after the user's §10 answers. C2 simplified (filename-based recipient extraction), parties moved system-wide (`parties.yaml` + database + ad-hoc), threshold tightened to 90 days, client-letter keywords narrowed, sibling-tool decision confirmed. Snapshot of v0.1 at `gmiau-backup/auditor-v0.1-to-v0.2-2026-05-14/`. Still no build.
- v0.3 build · 2026-05-14: scaffold shipped. `parties.yaml` + `auditor-keydocs.yaml` seeded, `auditor-checks.js` skeleton, `@@AUDITOR_CHECKS@@` sentinel wired into `build-all-offline.py`, `source/auditor.html` (GMIAU Shell, 7 tabs) booting; all 4 checks PASS-stubbed. Snapshot at `gmiau-backup/auditor-v0.3-scaffold-2026-05-14/`.
- v0.4 build · 2026-05-14: C1 (stray case references) implemented. `auditor-checks.js` gains `REF_FINDER` alternation + bare-digit context heuristic + `findRefsIn` + `buildAllowList` + `classifyRef` + `canonicalRef` (strips funding-flag tail). `auditor.html` gains drop zone (folder+files), file list, pdf.js text extraction (CDN 3.11.174 + worker blob), Run audit button, full Case Details form (Case ref · Client ref · HO refs · Appeal refs · Funding flag · Client name · Ad-hoc parties) + Import Case Summary, Report tab rendering (verdict pill + per-check cards + findings table). Smoke-tested against a 3-file scenario: filename stray ref ✓, in-body stray ref ✓, bare-digit with context ✓, own refs not flagged ✓. **Not yet browser-tested**: `node --check` ✓, build ✓ (1.88 MB), tool-audit ✓. Snapshot at `gmiau-backup/auditor-v0.4-c1-2026-05-14/`.
- v0.5 build · 2026-05-14: C4 (missing key documents) implemented + generic `@@DATA:<filename>@@` sentinel infrastructure added. `auditor-keydocs.yaml` (v1.0, 11 funding-flag keys → CCL/OAL aliases) is now consumed at runtime via `window.IFYI_AUDITOR_KEYDOCS`. New `inline_data` + `_parse_yaml_subset` + `_split_flow_items` in `build-all-offline.py` (hand-rolled YAML parser for the gmiau-specs shape: top-level scalars, nested maps, flat lists, inline flow arrays). `auditor.html` gains the `@@DATA:auditor-keydocs.yaml@@` sentinel, and `readCaseDetails` passes `fundingFlag` into `ctx`. C4 also derives the flag from a hyphenated case-ref tail (`RB12345-CLR-ECF` → `CLR-ECF`) when the explicit field is blank. Smoke-tested against 6 scenarios: empty fileset + LH → 2 missing (WARN), CCL present but no OAL → 1 missing, both present → PASS, no flag → INFO with hint, unknown flag → INFO with hint, flag-from-case-ref → PASS. **Not yet browser-tested**: `node --check` ✓, build ✓ (1.89 MB), tool-audit ✓. Snapshot at `gmiau-backup/auditor-v0.5-c4-2026-05-14/`. The `@@DATA@@` sentinel is reusable; `parties.yaml` (v0.7) will plug into it the same way.
- v0.5.1 fixtures + test runner · 2026-05-14: `gmiau-testing/auditor/` added. `build.py` generates 8 stdlib-only PDFs covering every C1+C4 path (own refs clean · stray filename ref · stray body ref · bare-digit with context · bare-digit without context · clean CCL+OAL satisfying C4 · WARN scenario by removal). `test-auditor.js` is a Node-based automated runner: 18 scenarios, no browser needed. The runner caught a v0.4 shipping bug: `REF_FINDER` was missing the FtTIAC appeal pattern (`[A-Za-z]{2,3}-\d{4,5}-\d{4}`), so PA/HU/EA/DA appeal references in stray-file text wouldn't have been flagged. Fixed inline; 18/18 pass post-fix.
- v0.5.2 false-positive filter + Report UX · 2026-05-14: the user reported "lots of false flags from my email signature's PGP" and "doesn't tell me what the issue is". Two fixes plus a tab reorder. (1) `NOISE_PATTERNS` + `_noiseSpans` + `_inAnySpan` added to `auditor-checks.js`: any C1 finding whose index falls inside a PGP/PEM armored block, a `data:…;base64,…` URI, an `http(s)`/`ftp`/`mailto` URL, an email address, or a long (40+) unbroken alphanumeric run is dropped. The original-text snippet is still drawn from the un-stripped source so context stays human-readable. (2) Report tab rewritten: each finding now renders as a small card with explicit `Where`, `Token`, `Kind`, `Snippet`, and a visible **Why** line (previously tooltip-only). Each check card carries a `check-card-scope` subtitle: C1 lists the actual allow-list tokens scanned against (so the caseworker sees what counted as own-case); C4 names the funding flag + requirement count. (3) Tab order reshuffled per user: Guide · Case Details · Scan · Report · Config · Settings (Case Details moved before Scan because you set up the allow-list before scanning; Scan stays the default-active tab per §2C). Three new regression tests added (PGP block, stray-ref-then-PGP-block, URL/email/data:URI/hash). 21/21 pass. Snapshot at `gmiau-backup/auditor-v0.5.2-noise-filter-2026-05-14/`.
- v0.5.3 C1 redesign: **per-file ownership, not per-token** · 2026-05-14. User feedback: "the auditor isn't working. If the auditor verifies that a document contains information matching the client's data, then it does not need to review the entire document. The issue is does this document belong in the client's file, yes or no." This is a fundamental reframe: C1 is no longer "scan every token and flag those not in the allow-list" (which produced PGP / base64 / URL noise). It is now "search the file for any of the case's known identifiers; if at least one appears, the file belongs; if none, flag the file for review." Implementation: `_collectIdentifiers(ctx)` builds a small list of `{kind, token, needsContext}` entries (case ref + funding-tail stem, client ref with cue, every HO umbrella ref, appeal refs, full client name). `_searchIdentifier` does a case-insensitive substring search, with the bare-digit cue test for client refs. `_findFirstIdentifier` returns the first hit and stops. The check iterates files, checking the filename first (cheap, and under GMIAU naming it almost always carries the ref), then body pages until a hit. Files with no hit get a WARN finding. Title renamed: "Stray case references" → "Files that belong to this case". NOISE_PATTERNS and the `_noiseSpans` filter dropped as no longer needed (specific-string search can't false-match base64). The Report scope-line now lists the identifiers being searched for (grouped by kind: case ref, client ref with cue, HO ref, appeal ref, client name). Test runner rewritten: 14 C1 scenarios cover every identifier kind + cue-gated bare-digit + no-identifier WARN + pages:null fallback + PGP-content-doesn't-block-ownership; plus 5 C4 unchanged + 1 E2E = 20/20 pass. Fixtures README rewritten: expected outcomes change from "3 stray-ref findings" to "1 mis-filing flag on the RB67890 file". Snapshot at `gmiau-backup/auditor-v0.5.3-ownership-2026-05-14/`.