Query 20 years of data in plain English

Point Archivist at any legacy system - emails, spreadsheets, databases, PDFs. AI discovers the structure, you approve the schema, and extraction runs automatically. Then ask questions like you'd ask a colleague.

How It Works

Four steps from dark archive to queryable data. AI does the heavy lifting, you stay in control.

Discover

AI scans your messy source data and maps every field, format, and relationship.

Propose

A clean schema is designed and presented for your review. You approve, edit, or reject.

Extract

Deterministic extraction runs fast and predictably: 500,000+ records, with no AI inference per row.

Query

Ask questions in plain English. Summaries, semantic search, and dashboards - instantly.

1. Discover

Your data is messy. That's the point.

Legacy systems leave behind a mess - inconsistent formats, cryptic column names, data scattered across CSVs, emails, PDFs, and ancient databases. Archivist's AI samples your source data and automatically identifies every field, format, and relationship.

Handles CSV, email archives, JSON, XML, PDFs, spreadsheets
Detects data types, relationships, and patterns automatically
No predefined parsers or manual mapping required
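
As a rough illustration of what profiling sampled data can look like, here is a minimal sketch in Python. It is not Archivist's actual discovery logic: the sample data, the column names (`cust_no`, `signup_dt`, `bal`), and the functions `infer_type` and `discover` are all hypothetical, and a real discovery pass would cover far more formats than this one CSV.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample; a real run would sample rows from the source system.
SAMPLE_CSV = """cust_no,signup_dt,bal
1001,2004-03-15,250.00
1002,2005-11-02,1320.50
1003,,75
"""

def infer_type(values):
    """Return the narrowest type that fits every non-empty sampled value."""
    present = [v for v in values if v != ""]
    if not present:
        return "unknown"
    candidates = (
        ("integer", int),
        ("float", float),
        ("date", lambda v: datetime.strptime(v, "%Y-%m-%d")),
    )
    for name, parse in candidates:
        try:
            for v in present:
                parse(v)
            return name
        except ValueError:
            continue  # this type does not fit; try the next, wider one
    return "text"

def discover(csv_text):
    """Map each column name to an inferred type and a nullable flag."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    profile = {}
    for column in reader.fieldnames or []:
        values = [row[column] for row in rows]
        profile[column] = {
            "type": infer_type(values),
            "nullable": any(v == "" for v in values),
        }
    return profile

profile = discover(SAMPLE_CSV)
for column, info in profile.items():
    print(column, info)
```

In this toy sample, `signup_dt` comes back as a nullable date because one row leaves it empty. That is the kind of detail a discovery pass would surface for review.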

2. Propose

AI designs the schema. You approve it.

Archivist analyses the discovered data and proposes a clean, normalised database schema. Review column mappings, data types, and relationships before anything is extracted. Nothing runs without your sign-off.

Human-in-the-loop - approve, edit, or reject the proposed schema
AI explains its reasoning for each mapping decision
Full control before a single record is touched
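
To make the approval gate concrete, here is a minimal Python sketch of a schema proposal that cannot produce any DDL until a reviewer approves it. The classes `ColumnMapping` and `SchemaProposal`, their fields, and the example mappings are assumptions for illustration only, not Archivist's real data model.

```python
from dataclasses import dataclass

@dataclass
class ColumnMapping:
    source: str    # legacy column name as found in the source data
    target: str    # proposed clean column name
    sql_type: str  # proposed SQL type
    reason: str    # explanation shown to the reviewer

@dataclass
class SchemaProposal:
    table: str
    columns: list
    status: str = "pending"  # pending -> approved | rejected

    def edit(self, source, **changes):
        """Let the reviewer override part of one mapping before approving."""
        for col in self.columns:
            if col.source == source:
                for key, value in changes.items():
                    setattr(col, key, value)
                return
        raise KeyError(source)

    def approve(self):
        self.status = "approved"

    def reject(self):
        self.status = "rejected"

    def to_ddl(self):
        """Emit CREATE TABLE only once a human has approved the proposal."""
        if self.status != "approved":
            raise PermissionError("schema has not been approved")
        cols = ",\n  ".join(f"{c.target} {c.sql_type}" for c in self.columns)
        return f"CREATE TABLE {self.table} (\n  {cols}\n);"

proposal = SchemaProposal(
    table="customers",
    columns=[
        ColumnMapping("cust_no", "customer_id", "INTEGER",
                      "all sampled values are whole numbers"),
        ColumnMapping("bal", "balance", "REAL",
                      "sampled values are decimal amounts"),
    ],
)
proposal.edit("bal", sql_type="NUMERIC(12,2)")  # reviewer tightens the type
proposal.approve()
ddl = proposal.to_ddl()
print(ddl)
```

Calling `to_ddl()` on a pending or rejected proposal raises an error, which is one simple way to enforce that nothing runs without sign-off.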

Figure: AI schema proposal and approval interface

3. Extract

500,000+ records. Fast. Predictable.

Once you approve the schema, extraction kicks off using compiled deterministic rules, with no AI inference per record. Because no model runs on individual rows, extraction is fast, repeatable, and cannot hallucinate values. Quality issues are flagged for review, never silently ignored.

Deterministic extraction - no AI inference per row
Quality exceptions surfaced for human review
Runs on your infrastructure, on-premises or private cloud
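
The idea of rule-based extraction with surfaced exceptions can be sketched in a few lines of Python. Everything below is an assumption for illustration: the `RULES` table, the converter functions, the column names, and the DD/MM/YYYY date format are hypothetical, and this is not how Archivist compiles its rules.

```python
import re
from datetime import datetime

# Hypothetical rules derived from an approved schema: one pure function
# per target column, so the same input always yields the same output.
def to_int(value):
    return int(value)

def to_iso_date(value):
    # Assumes the legacy system wrote dates as DD/MM/YYYY.
    return datetime.strptime(value, "%d/%m/%Y").date().isoformat()

def to_amount(value):
    # Strip currency symbols, thousands separators, and whitespace.
    return float(re.sub(r"[$,\s]", "", value))

RULES = {
    "customer_id": ("cust_no", to_int),
    "signup_date": ("signup_dt", to_iso_date),
    "balance": ("bal", to_amount),
}

def extract(rows):
    """Apply the rules to every row; collect failures instead of dropping them."""
    records, exceptions = [], []
    for line_no, row in enumerate(rows, start=1):
        record, failed = {}, False
        for target, (source, convert) in RULES.items():
            try:
                record[target] = convert(row[source])
            except (KeyError, ValueError) as err:
                exceptions.append(
                    {"row": line_no, "column": source, "error": str(err)}
                )
                failed = True
        if not failed:
            records.append(record)
    return records, exceptions

rows = [
    {"cust_no": "1001", "signup_dt": "15/03/2004", "bal": "$1,250.00"},
    {"cust_no": "1002", "signup_dt": "31/02/2005", "bal": "80"},  # invalid date
]
records, exceptions = extract(rows)
print(records)
print(exceptions)
```

The second row contains an impossible date, so it lands in `exceptions` with its row number and column rather than being loaded with a guessed value or silently skipped.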

Figure: Extraction progress and quality dashboard

4. Query

Your archive is alive. Ask it anything.

Once extracted, your data becomes fully searchable and queryable. Ask questions in plain English, explore with semantic search, or generate dashboards - Archivist translates your intent into SQL and returns results with visualisations.
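
For a sense of the question-to-SQL flow, here is a minimal Python sketch using the standard `sqlite3` module. The `translate` function is a stand-in with one canned answer where a language model would sit, and the `customers` table, its rows, and the `run_readonly` guard are hypothetical. None of this reflects Archivist's internals.

```python
import sqlite3

def translate(question):
    """Stand-in for the language model step (hypothetical, canned output)."""
    canned = {
        "how many customers signed up in 2004?":
            "SELECT COUNT(*) FROM customers "
            "WHERE signup_date BETWEEN '2004-01-01' AND '2004-12-31'",
    }
    return canned[question.strip().lower()]

def run_readonly(conn, sql):
    """Execute generated SQL only if it is a single SELECT statement."""
    statement = sql.strip().rstrip(";")
    if ";" in statement or not statement.upper().startswith("SELECT"):
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(statement).fetchall()

# Toy extracted archive held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, signup_date TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1001, "2004-03-15"), (1002, "2005-11-02"), (1003, "2004-09-30")],
)

sql = translate("How many customers signed up in 2004?")
result = run_readonly(conn, sql)
print(sql)
print(result)
```

Checking that generated SQL is a single `SELECT` before running it is one simple design choice for keeping plain-English queries from modifying the extracted data. A production system would likely add stricter controls, such as a read-only database connection.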

Smart Summaries

Ask a question in plain English and get a clear, contextual summary drawn from across your entire archive.

Semantic Search

Find related records across the entire archive using meaning, not just keywords. Powered by retrieval-augmented generation.

Instant Dashboards

AI-generated charts, tables, and visualisations. Save and share dashboards with your team.

Built for Real-World Data

Legacy data is messy. Archivist is designed for exactly that.

Any Source Format

CSV, email archives, JSON, XML, PDFs, spreadsheets - Archivist handles whatever your legacy system left behind. No predefined parsers needed.

Human-in-the-Loop

AI proposes the schema, but you approve before anything is extracted. Review column mappings, data types, and relationships. Nothing runs without your sign-off.

Scalable Extraction

Extraction uses compiled rules, not AI inference per record. Process 500,000+ records quickly and predictably. Quality issues are flagged for review, never silently ignored.

On-Premises & Private

Your data stays on your infrastructure. Archivist runs locally, supports local AI models, and never sends your archive to the cloud unless you choose to.