Human knowledge connects naturally. Until AI systems break it apart.

Supermat's Structured Citations encode relationships that vector embeddings miss—unifying what humans need for clarity and machines need for precision.

For as long as humans have shared information, we’ve been deliberate. Every paragraph builds upon the last. Definitions appear just in time to clarify. Key concepts emerge at precise moments. References loop back to deepen meaning.

These aren’t arbitrary choices; they’re the result of centuries of experience in how humans craft and absorb information.

Yet, modern AI workflows split documents into disconnected chunks, forcing teams to:

Accept Inaccuracy

Without original structure, LLMs hallucinate or miss critical dependencies (e.g., footnotes, section headers, exact details).

Tolerate Opaque Citations

Citations like [Source 3] mean nothing to humans and lack granularity, forcing manual audits and guesswork.


UUIDs like f47ac10b-58cc-0e02b... or non-standard flat IDs like doc123_chunk003 are equally opaque to machines: they're simply pointers that force isolated chunk processing, losing critical relationships that can't be fully recovered.

Reconstruct Context

Developers waste cycles on secondary lookups and internal mappings just to approximate traceability.

Supermat inverts this from the ground up:

Atomic, yet fully connected.

We treat each piece of data as a discrete, referenceable unit—while keeping the big picture intact. This fundamental rethink enables something new…

Supermat reimagines Data Representation and Citations for the AI era.

For the first time, this gives common cause and common ground to both developers and domain stakeholders.

Parse like a human, not a tokenizer

Supermat unifies pre-processing, enrichment, and hierarchical citations in a single pass. The result?

Double-digit gains in reliability.

+15.56% Accuracy

+12.53% Faithfulness

+33.33% ROUGE-1 Recall

In our internal evals, we see double-digit lifts in factual correctness with broader coverage and more complete outputs. This translates to fewer hallucinations and more trust in automated answers.
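ROUGE-1 recall is the share of words (unigrams) in a reference answer that also appear in the generated answer, which is why a lift there reads as broader coverage. The page does not describe the eval harness behind these figures; the sketch below only illustrates the standard metric, assuming clipped counts and naive lowercase whitespace tokenization (real ROUGE packages add stemming and punctuation handling). The example sentences are made up.

```python
from collections import Counter


def rouge1_recall(reference: str, candidate: str) -> float:
    """Share of reference unigrams that also appear in the candidate.

    Counts are clipped: a word is credited at most as many times as it
    appears in the candidate. Tokenization is naive lowercase whitespace.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(n, cand_counts[tok]) for tok, n in ref_counts.items())
    return overlap / total


reference = "the lease term is five years with an option to renew"
short_answer = "the lease term is five years"
full_answer = "the lease term is five years with an option to renew"

print(rouge1_recall(reference, short_answer))  # 6 of 11 reference unigrams
print(rouge1_recall(reference, full_answer))   # 1.0
```

The shorter answer is not wrong, it is just incomplete, and recall is the metric that penalizes that.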

What this enables

Adaptive Chunking

Dynamically adjust chunk sizes based on density and importance, and attach parent-child context to prevent silos, all while staying within token limits.
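A minimal sketch of the budget-and-parent-context part of adaptive chunking, under stated assumptions: whitespace word counts stand in for a real tokenizer, the parent context is just the section header, and sizing by density or importance is not modeled. `Sentence`, `chunk_section` and the sample text are hypothetical, not Supermat's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sentence:
    sid: str   # structured ID, e.g. "2.1.4.3"
    text: str


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())


def chunk_section(header: str, sentences: list[Sentence], budget: int) -> list[dict]:
    """Greedily pack consecutive sentences into chunks of at most `budget`
    tokens, counting the parent header that is prepended to every chunk.
    A single sentence longer than the budget still gets its own chunk
    (a real system would split it further)."""
    header_cost = count_tokens(header)
    if header_cost >= budget:
        raise ValueError("budget leaves no room after the parent context")
    groups, current, used = [], [], header_cost
    for s in sentences:
        cost = count_tokens(s.text)
        if current and used + cost > budget:
            groups.append(current)
            current, used = [], header_cost
        current.append(s)
        used += cost
    if current:
        groups.append(current)
    return [
        {
            "parent": header,
            "ids": [s.sid for s in group],
            "text": header + "\n" + " ".join(s.text for s in group),
        }
        for group in groups
    ]


section = [
    Sentence("2.1.1.1", "The tenant pays rent monthly."),
    Sentence("2.1.1.2", "Late payments incur a fee."),
    Sentence("2.1.2.1", "The fee is five percent of rent."),
]
for chunk in chunk_section("Section 1: Payments", section, budget=14):
    print(chunk["ids"])  # prints ['2.1.1.1', '2.1.1.2'], then ['2.1.2.1']
```

Because every chunk carries both its parent header and the structured IDs of its sentences, a retrieved chunk can still be cited at sentence level and placed back in its section.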

Fine-Tune at Ingestion

Iteratively enrich and add domain-specific labels, easily get SME validation, and enforce security controls at any level (document, section, sentence)—no re-ingestion required.
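The page does not say how level-specific controls are stored. One way hierarchical IDs make "any level, no re-ingestion" possible, sketched under that assumption: keep access rules in a separate map keyed by ID prefix and resolve them at query time, so changing a rule never touches the ingested text. The `policy` map, the role names and the most-specific-rule-wins resolution are all hypothetical.

```python
from __future__ import annotations

from typing import Optional


def ancestors(sid: str) -> list[str]:
    """'2.1.4.3' -> ['2', '2.1', '2.1.4', '2.1.4.3'], coarsest level first."""
    parts = sid.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def effective_roles(sid: str, policy: dict[str, set[str]]) -> Optional[set[str]]:
    """Most specific rule wins; None means no rule covers this unit."""
    for prefix in reversed(ancestors(sid)):
        if prefix in policy:
            return policy[prefix]
    return None


def can_read(sid: str, role: str, policy: dict[str, set[str]]) -> bool:
    roles = effective_roles(sid, policy)
    return roles is None or role in roles


# Hypothetical policy: document 2 is open to two roles, but paragraph 2.1.4
# is restricted to legal. Editing this map changes access without re-ingesting.
policy = {"2": {"legal", "analyst"}, "2.1.4": {"legal"}}

print(can_read("2.1.4.3", "analyst", policy))  # False: inherits the 2.1.4 rule
print(can_read("2.1.5.1", "analyst", policy))  # True: falls back to the document rule
```

Matching whole dot-separated ancestors, rather than a raw string prefix, keeps a rule on `2.1.4` from leaking onto `2.1.40`. Most-specific-wins is one design choice; a stricter system might require a reader to pass every ancestor's rule.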

Structured Citations

Machine-Ready + Token-Efficient + Human-Friendly

Our IDs encode hierarchy, sequence and relationships in a core structure:

2.1.4.3 = Document [2], Section [1], Paragraph [4], Sentence [3] 
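Because the coordinates are positional, parsing an ID is mechanical. A minimal sketch, assuming IDs are plain dot-separated integers with at most the four levels shown in the example (the page says the structure adapts to other content types, so real IDs may differ); `CitationID` and its methods are hypothetical names, not Supermat's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Assumed level names, in order, matching the example "2.1.4.3".
LEVELS = ("document", "section", "paragraph", "sentence")


@dataclass(frozen=True)
class CitationID:
    """Hypothetical parsed form of a structured citation like '2.1.4.3'.

    Shorter IDs address coarser units: '2.1' is a whole section.
    """

    document: int
    section: Optional[int] = None
    paragraph: Optional[int] = None
    sentence: Optional[int] = None

    @classmethod
    def parse(cls, text: str) -> CitationID:
        parts = text.strip().split(".")
        if not 1 <= len(parts) <= len(LEVELS) or not all(p.isdigit() for p in parts):
            raise ValueError(f"not a structured citation: {text!r}")
        return cls(*(int(p) for p in parts))

    def _values(self) -> tuple:
        return (self.document, self.section, self.paragraph, self.sentence)

    def describe(self) -> str:
        """Human-readable coordinates for the levels that are present."""
        return ", ".join(
            f"{name.capitalize()} {value}"
            for name, value in zip(LEVELS, self._values())
            if value is not None
        )

    def __str__(self) -> str:
        return ".".join(str(v) for v in self._values() if v is not None)


cid = CitationID.parse("2.1.4.3")
print(cid.describe())           # Document 2, Section 1, Paragraph 4, Sentence 3
print(CitationID.parse("2.1"))  # 2.1
```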

Each ID immediately communicates its exact coordinates. This forms a stable, graph-like backbone to complement AI applications with built-in context, bi-directional traversal (up, down, sibling) and flexible granularities at scale.
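Since the hierarchy lives in the ID itself, moving up needs no lookup at all, and moving down or sideways only needs the set of IDs that exist. A minimal sketch, assuming dot-separated integer IDs as in the example above; the function names and sample IDs are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def coords(sid: str) -> tuple[int, ...]:
    """'2.1.4.3' -> (2, 1, 4, 3); integer tuples sort in document order."""
    return tuple(int(p) for p in sid.split("."))


def parent(sid: str) -> Optional[str]:
    """Up: drop the last coordinate. Top-level IDs have no parent."""
    parts = sid.split(".")
    return ".".join(parts[:-1]) if len(parts) > 1 else None


def children(sid: str, ids: set[str]) -> list[str]:
    """Down: existing IDs exactly one level deeper that share this prefix."""
    k = coords(sid)
    found = (c for c in ids if len(coords(c)) == len(k) + 1 and coords(c)[: len(k)] == k)
    return sorted(found, key=coords)


def siblings(sid: str, ids: set[str]) -> list[str]:
    """Sideways: other children of the same parent (other documents at the top)."""
    p = parent(sid)
    pool = children(p, ids) if p is not None else [c for c in ids if "." not in c]
    return sorted((c for c in pool if c != sid), key=coords)


ids = {"2", "2.1", "2.1.4", "2.1.4.1", "2.1.4.2", "2.1.4.3", "2.1.5", "2.1.10"}
print(parent("2.1.4.3"))         # 2.1.4
print(children("2.1", ids))      # ['2.1.4', '2.1.5', '2.1.10']
print(siblings("2.1.4.3", ids))  # ['2.1.4.1', '2.1.4.2']
```

Sorting by integer tuples rather than strings keeps `2.1.10` after `2.1.5`, so document order survives.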

Standardized and self-contained, our citations tell the complete story and adapt naturally to various content types, from documents and presentations to transcripts and web pages.

For Developers

Supermat’s IDs encode both location and relationships. This single shift, from random strings to Structured Citations, removes an entire layer of complexity and makes retrieval and generation far more intuitive and complete.

For Everyone Else

Citations finally align with how humans naturally reference, building trust and auditability between source text and AI output.

Verbatim Retrieval

Original Expression, Zero Waste

Most AI systems force models to paraphrase or regenerate text, even when the source is perfect. Supermat flips this: when content is identified as relevant, we retrieve and serve the exact original text, unaltered and equipped with provenance.
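One way to wire this up, sketched under assumptions the page does not spell out: the model is prompted to emit a short placeholder such as `{{cite:2.1.4.3}}` wherever it would otherwise restate source text, and a post-processing step splices in the stored original, quoted and tagged with its ID. The placeholder syntax, the `STORE` contents and `expand_verbatim` are hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical store mapping structured IDs to exact source sentences.
STORE = {
    "2.1.4.2": "The Licensee may not sublicense the Work.",
    "2.1.4.3": "This Agreement is governed by the laws of England and Wales.",
}

# Hypothetical placeholder the model is prompted to emit instead of
# restating source text, e.g. "{{cite:2.1.4.3}}".
PLACEHOLDER = re.compile(r"\{\{cite:(\d+(?:\.\d+)*)\}\}")


def expand_verbatim(model_output: str, store: dict[str, str]) -> str:
    """Swap each placeholder for the stored original, quoted and tagged with
    its ID so readers can tell source text from generated text."""

    def substitute(match: re.Match) -> str:
        sid = match.group(1)
        if sid not in store:
            raise KeyError(f"model cited unknown ID {sid}")
        return f'"{store[sid]}" [{sid}]'

    return PLACEHOLDER.sub(substitute, model_output)


answer = "On governing law, the contract states: {{cite:2.1.4.3}}"
print(expand_verbatim(answer, STORE))
```

The model spends a handful of output tokens on the placeholder instead of the full sentence, and the quoted span is the stored text, character for character.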

Saves Output Tokens

No need to re-generate content you already have.

Ensures Perfect Fidelity

Verbatim text eliminates AI paraphrasing errors or hallucinations.

Clear Boundaries

Audiences see exactly what’s original versus AI-generated.

Ideal for High-Stakes Domains

For both regulated (legal, medical) and IP-led (media, creative) fields, which rely heavily on precise, exact wording and provenance rather than AI interpretations.

It’s more than a cost saver.

Sometimes, the original expression carries the nuance or authority you simply can’t replicate.

By integrating verbatim retrieval as a core design principle, we’re building for the future.

A mission to preserve what’s human in AI

We’re bridging the gap between AI efficiency and human authenticity and expertise—with a singular framework that respects human endeavour and craft while being the gateway to new possibilities.

Let’s shape this vital intersection – together.

For Developers and Engineering teams:
Get started immediately on GitHub.

For Business, Domain, and Creator Stakeholders:
We’re building human-centric interfaces for data curation, observability and novel granular licensing models. 

More control, more agency and better human + AI synergy.  

Reach out to us to join our early access program.
