How Do I Know Carys Is Accurate?

Carys is built around the Insights Engine, a verification pipeline that keeps the AI away from the arithmetic, re-queries every cited number against your data and refuses to ship claims it cannot back up.

The Problem With AI And Numbers

Large language models are built for language. When asked to calculate, they do not evaluate an equation; they predict what a plausible answer looks like. For multi-step, data-dependent arithmetic across millions of rows, such as aggregations, percentage breakdowns and year-on-year comparisons with shifting denominators, that prediction is not reliable. The resulting figures look authoritative. When they are wrong there is no error message; they simply appear in the report.

Any analytics system that lets its AI generate, infer or re-state numbers in prose without separately verifying those figures against the source data is using the wrong tool for the job.

The Insights Engine: Separate The AI From The Arithmetic

Carys keeps a hard boundary between what the AI is allowed to do and what the data engine is allowed to do. The AI plans the investigation and writes the queries. It never computes a number itself. Every figure that appears in a report was produced by a query run directly against your data, stored as a structured fact with a unique ID and re-verified before it was allowed into the final write-up.

LLM Domain

Language, planning & code authoring

The AI understands your question, plans the investigation and writes the queries and code that will be executed against your data. It reads the results and drafts the analytical conclusions for your report.

The AI authors the queries. It does not run them, and it does not compute the results. Every number it later cites was produced by execution in the Numerical Domain.

Numerical Domain

Data execution & verification

All calculations happen here, by running real queries directly against your data. Results land in the fact store as structured evidence records with unique IDs, which the AI can cite but never re-state.

When the AI drafts a conclusion citing a figure, that figure is re-queried against your data before the claim is accepted. A number can only reach your report by passing through this check.
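The split between the two domains can be sketched in a few lines. This is a minimal illustration, assuming an SQLite connection, a toy `revenue` table and a hypothetical `Fact` record and `run_and_store` helper rather than Carys's actual interfaces: the AI side supplies only SQL text, and the engine executes it and files the scalar result under a freshly generated ID.

```python
import sqlite3
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Hypothetical evidence record: a value and the query that produced it."""
    fact_id: str
    query: str
    value: float


def run_and_store(conn: sqlite3.Connection, sql: str, store: dict[str, Fact]) -> Fact:
    """Execute AI-authored SQL and record its single scalar result as a fact.

    The caller supplies query text only; the number comes from execution.
    """
    row = conn.execute(sql).fetchone()
    if row is None or len(row) != 1 or row[0] is None:
        raise ValueError("expected a single non-null scalar result")
    fact = Fact(fact_id=str(uuid.uuid4()), query=sql, value=float(row[0]))
    store[fact.fact_id] = fact
    return fact


# Toy data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (quarter TEXT, usd REAL)")
conn.executemany("INSERT INTO revenue VALUES (?, ?)",
                 [("Q4 2023", 3.41e6), ("Q4 2024", 4.20e6)])
facts: dict[str, Fact] = {}
fact = run_and_store(conn, "SELECT usd FROM revenue WHERE quarter = 'Q4 2024'", facts)
print(fact.value)  # 4200000.0
```

Storing the query alongside the value is what later makes both re-verification and click-through auditing possible.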

Key Principle

The AI sees numbers but does not produce them. Every figure in your report comes from a query run directly against your data, and every figure cited in a conclusion is re-verified by re-running that same query before the conclusion is accepted.
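A minimal sketch of that re-verification check, assuming a scalar fact and an SQLite backend: the stored query is run again and the fresh result is compared with the figure the draft cites. The function name `reverify` and the `rel_tol` tolerance for rounding in prose are hypothetical choices, not documented Carys behaviour.

```python
import math
import sqlite3


def reverify(conn: sqlite3.Connection, query: str, cited: float,
             rel_tol: float = 0.005) -> bool:
    """Re-run a fact's query and check the cited figure against the fresh result.

    rel_tol (hypothetical) allows for rounding, e.g. "$4.20M" for 4,198,000.
    """
    row = conn.execute(query).fetchone()
    if row is None or row[0] is None:
        return False  # the query no longer yields a value, so the claim cannot stand
    return math.isclose(float(row[0]), cited, rel_tol=rel_tol)


# Toy data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (quarter TEXT, usd REAL)")
conn.execute("INSERT INTO revenue VALUES ('Q4 2024', 4200000)")
query = "SELECT usd FROM revenue WHERE quarter = 'Q4 2024'"
print(reverify(conn, query, 4.20e6))  # True
print(reverify(conn, query, 4.50e6))  # False
```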

How A Report Gets Built

Execution and publication are deliberately separated. Execution steps run against your data and produce evidence: facts, charts, tables and journal entries. Only once that evidence is complete does the report pipeline plan the narrative, write the claims for each section and check them.

Step 1: Plan & query
The AI plans the investigation and authors read-only SQL against your data.

Step 2: Produce evidence
Queries run; results are captured as structured facts with unique IDs, alongside charts and tables.

Step 3: Plan the narrative
The report planner decides the story, assigns evidence to each section and sets the role of every section.

Step 4: Write & verify claims
Each section's claims are written against its assigned facts, then re-verified line by line through a direct query against your data.

Step 5: Cross-check & assemble
A global consistency review catches contradictions across sections. Only verified, consistent claims reach the final report.
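Step 1 describes the AI's SQL as read-only. One way to enforce that in the database itself, rather than by inspecting query text, is SQLite's `PRAGMA query_only`, under which SQLite rejects data changes with a read-only error. The sketch assumes an SQLite backend and a hypothetical `run_authored_sql` helper; it is not a description of Carys's actual sandboxing.

```python
import sqlite3


def run_authored_sql(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Execute AI-authored SQL with writes disabled for the duration of the call.

    While PRAGMA query_only is ON, SQLite rejects CREATE, DELETE, DROP, INSERT
    and UPDATE with a read-only error, so a modifying statement fails instead of running.
    """
    conn.execute("PRAGMA query_only = ON")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.execute("PRAGMA query_only = OFF")


# Toy data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (quarter TEXT, usd REAL)")
conn.execute("INSERT INTO revenue VALUES ('Q4 2024', 4200000)")
conn.commit()

print(run_authored_sql(conn, "SELECT usd FROM revenue"))  # [(4200000.0,)]
try:
    run_authored_sql(conn, "DELETE FROM revenue")
except sqlite3.DatabaseError as exc:
    print("rejected:", exc)  # SQLite reports a read-only error
```

Enforcing the restriction at the database layer means a mistaken or malicious query fails closed, whatever its text looks like.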

What Happens To A Claim That Fails

Every numeric statement in the report is treated as a claim. The verifier re-runs the underlying query, and the claim proceeds only if the data still supports it. Claims that fail but can be corrected by rewording them to match the data are rewritten and verified again. Claims with no supporting evidence are dropped, so they never reach you.
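The accept, rewrite or drop rules in the paragraph above can be expressed as a small decision function. This is a sketch under stated assumptions: `Claim`, `Verdict` and `adjudicate` are hypothetical names, and the `supported` and `reword` callbacks stand in for the re-query and rewriting steps, whose internals are not specified here. A claim citing no facts is treated as unsupported, and any rewrite goes back through the same check.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    ACCEPTED = "accepted"
    REWRITTEN = "rewritten"
    DROPPED = "dropped"


@dataclass
class Claim:
    text: str
    fact_ids: list[str]  # cited facts; an empty list means no supporting evidence


def adjudicate(claim: Claim,
               supported: Callable[[Claim], bool],
               reword: Callable[[Claim], Optional[Claim]]) -> tuple[Verdict, Optional[Claim]]:
    """Accept, rewrite or drop one claim.

    `supported` stands in for re-running the cited queries; `reword` proposes a
    corrected claim, or None when no safe rewording exists.
    """
    if not claim.fact_ids:
        return Verdict.DROPPED, None
    if supported(claim):
        return Verdict.ACCEPTED, claim
    revised = reword(claim)
    # A rewrite must cite evidence and pass the same check before it is kept.
    if revised is not None and revised.fact_ids and supported(revised):
        return Verdict.REWRITTEN, revised
    return Verdict.DROPPED, None


draft = Claim("Customer churn fell by half", ["annual_churn_rate"])
fixed = Claim("Customer churn fell 39%", ["annual_churn_rate"])
# Toy stand-ins: "supported" just checks for the figure the data actually gives.
verdict, final = adjudicate(draft, supported=lambda c: "39%" in c.text,
                            reword=lambda c: fixed)
print(verdict, final.text)  # Verdict.REWRITTEN Customer churn fell 39%
```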

Every numeric claim in a Carys report links to the validator's findings and to the facts that were re-queried against your data. The two examples below show one claim that was verified as written and one that was corrected before the report shipped.

“Premium subscription revenue grew 23.2% year-over-year in Q4 2024, driven primarily by the Enterprise tier.”

Validator finding

The verifier re-ran the underlying aggregation and confirmed Q4 2024 premium revenue of $4.20M versus $3.41M in Q4 2023, a 23.2% increase. The driver attribution is consistent with the tier breakdown, in which the Enterprise tier accounts for 66.9% of Q4 2024 premium revenue.

Related facts

premium_revenue_q4_yoy (Query: analytics.subscription_revenue)

Quarterly premium subscription revenue

Quarter	Revenue (USD)	Growth
Q4 2023	$3.41M	n/a
Q4 2024	$4.20M	+23.2%

premium_tier_split_q4_2024 (Query: analytics.subscription_revenue)

Q4 2024 premium revenue by tier

Tier	Revenue (USD)	Share
Enterprise	$2.81M	66.9%
Business	$1.02M	24.3%
Pro	$0.37M	8.8%
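The arithmetic behind this finding is simple enough to check independently. The snippet below recomputes it from the two facts above in plain Python; it reproduces the published figures for illustration and is not the Carys verifier itself.

```python
def pct_change(old: float, new: float) -> float:
    """Change from `old` to `new` as a percentage of `old`."""
    return (new - old) / old * 100


q4_2023, q4_2024 = 3.41, 4.20  # USD millions, from premium_revenue_q4_yoy
tiers = {"Enterprise": 2.81, "Business": 1.02, "Pro": 0.37}  # from premium_tier_split_q4_2024

print(f"YoY growth: {pct_change(q4_2023, q4_2024):.1f}%")  # YoY growth: 23.2%

# The tier split should reconcile to the quarterly total before shares are quoted.
assert abs(sum(tiers.values()) - q4_2024) < 1e-9
for tier, usd in tiers.items():  # prints Enterprise: 66.9%, then Business: 24.3%, then Pro: 8.8%
    print(f"{tier}: {usd / q4_2024:.1%}")
```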

“Customer churn fell by half compared to the previous year, with retention now above 95% across every segment.”

Validator finding

A re-query against the customer cohort table showed churn fell from 8.4% to 5.1%, a 39% reduction, not 50%. Retention exceeded 95% only in the Enterprise segment (97.1%); Mid-Market retention was 94.6% and SMB retention was 91.3%.

Corrected claim used in the report

“Customer churn fell 39% year-over-year, with retention above 95% in the Enterprise segment.”

Related facts

annual_churn_rate (Query: analytics.customer_cohorts)

Annualised customer churn rate

Period	Churn Rate
FY2023	8.4%
FY2024	5.1%

retention_by_segment_fy2024 (Query: analytics.customer_cohorts)

FY2024 retention rate by segment

Segment	Retention
Enterprise	97.1%
Mid-Market	94.6%
SMB	91.3%
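The correction can be reproduced from the two facts above. The snippet recomputes the relative fall in churn and tests the draft's "every segment" condition; it mirrors the validator's arithmetic for illustration and is not Carys's verifier code.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional fall from `before` to `after`; 0.5 would mean 'fell by half'."""
    return (before - after) / before


churn_fy2023, churn_fy2024 = 8.4, 5.1  # percent, from annual_churn_rate
retention = {"Enterprise": 97.1, "Mid-Market": 94.6, "SMB": 91.3}  # from retention_by_segment_fy2024

reduction = relative_reduction(churn_fy2023, churn_fy2024)
print(f"Churn reduction: {reduction:.0%}")  # Churn reduction: 39%
print(all(r > 95 for r in retention.values()))  # False: the "every segment" claim fails
print([seg for seg, r in retention.items() if r > 95])  # ['Enterprise']
```

Note that 39% is a relative reduction in the churn rate; the absolute drop is 3.3 percentage points.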

Full Traceability, From Report Back To Source

Because every figure is tied to a fact and every fact is tied to the query that produced it, every line of a Carys report is auditable. You can click through a statement to see the validator finding, the related facts, the underlying tables and the exact query Carys ran. Nothing in the report is left as an unverifiable assertion.

Validator findings on the record

For every claim, the validator's verdict and reasoning are kept alongside the original draft and the corrected version, so you can always see what the engine accepted, rewrote or dropped.

Queries you can inspect

The exact query behind a figure is available from the audit panel. Reviewers can verify the methodology independently, with no black boxes between question and number.
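The audit trail described above amounts to a chain of references from claim to facts to queries. A minimal sketch with a hypothetical `AuditRecord` type and `trace` helper; the query string is a placeholder, since the exact queries Carys ran are not reproduced on this page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AuditRecord:
    """Hypothetical audit entry stored for each claim in a report."""
    draft: str                     # the claim as originally written
    verdict: str                   # "accepted", "rewritten" or "dropped"
    reasoning: str                 # the validator's explanation
    final: Optional[str] = None    # text that shipped; None if the claim was dropped
    fact_ids: list[str] = field(default_factory=list)


def trace(record: AuditRecord, fact_queries: dict[str, str]) -> dict[str, str]:
    """Map each fact a claim cites to the query that produced it.

    Raises KeyError if a cited fact has no recorded query, exposing a broken chain.
    """
    return {fid: fact_queries[fid] for fid in record.fact_ids}


record = AuditRecord(
    draft="Customer churn fell by half compared to the previous year, "
          "with retention now above 95% across every segment.",
    verdict="rewritten",
    reasoning="Churn fell from 8.4% to 5.1%, a 39% reduction, not 50%.",
    final="Customer churn fell 39% year-over-year, "
          "with retention above 95% in the Enterprise segment.",
    fact_ids=["annual_churn_rate"],
)
# Placeholder query text for illustration only.
queries = {"annual_churn_rate": "<query against analytics.customer_cohorts>"}
print(trace(record, queries))
```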

CarysBench: Accuracy As A Measured Metric

The Insights Engine controls how each report is built. CarysBench measures whether that process holds up at scale. It is our internal evaluation framework: thousands of benchmark questions run against Carys on a regular schedule and scored against defined metrics, so reliability is something we track rather than something we claim.

Continuous regression detection

CarysBench runs frequently to catch quality regressions when prompts, models or execution logic change, before they reach customers.
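Regression detection of this kind reduces to comparing a candidate run's scores with a baseline run. A minimal sketch, assuming per-metric scores between 0 and 1; the function name, metric names, scores and 0.01 tolerance are illustrative placeholders, not CarysBench's actual metrics or results.

```python
def find_regressions(baseline: dict[str, float], candidate: dict[str, float],
                     tolerance: float = 0.01) -> dict[str, tuple[float, float]]:
    """Return metrics whose score fell by more than `tolerance` versus the baseline.

    A metric missing from the candidate run is treated as scoring 0.0.
    """
    flagged = {}
    for name, base in baseline.items():
        new = candidate.get(name, 0.0)
        if base - new > tolerance:
            flagged[name] = (base, new)
    return flagged


# Illustrative scores only; these are not real CarysBench results.
baseline = {"factual_accuracy": 0.97, "cross_section_consistency": 0.95, "answer_quality": 0.90}
candidate = {"factual_accuracy": 0.93, "cross_section_consistency": 0.95, "answer_quality": 0.91}
print(find_regressions(baseline, candidate))  # {'factual_accuracy': (0.97, 0.93)}
```

A non-empty result would block a prompt, model or execution change until the drop is explained or fixed.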

Objective scoring

Score trends are tracked over time across factual accuracy, consistency between sections and overall answer quality.

Iterative improvement

Results feed directly into prompt updates, verifier improvements and report-flow tuning so the system gets measurably stronger over time.

Safety and quality together

We pair CarysBench tracking with runtime safeguards (sandboxing, access controls, bounded execution) so analysis is both reliable and controlled.