Any analytics system that lets its AI generate, infer or re-state numbers in prose without separately verifying those figures against the source data is using the wrong tool for the job.
How Do I Know Carys Is Accurate?
Carys is built around the Insights Engine, a verification pipeline that keeps the AI away from the arithmetic, re-queries every cited number against your data and refuses to ship claims it cannot back up.
The Problem With AI And Numbers
Large language models are built for language. When asked to calculate, they do not run an equation. They predict what a plausible answer looks like. For multi-step, data-dependent arithmetic across millions of rows, such as aggregations, percentage breakdowns and year-over-year comparisons with shifting denominators, that prediction is not reliable. The figures look authoritative, no error message appears when they are wrong, and they simply show up in the report.
The Insights Engine: Separate The AI From The Arithmetic
Carys keeps a hard boundary between what the AI is allowed to do and what the data engine is allowed to do. The AI plans the investigation and writes the queries. It never computes a number itself. Every figure that appears in a report was produced by a query run directly against your data, stored as a structured fact with a unique ID and re-verified before it was allowed into the final write-up.
The AI understands your question, plans the investigation and writes the queries and code that will be executed against your data. It reads the results and drafts the analytical conclusions for your report.
The AI authors the queries. It does not run them, and it does not compute the results. Every number it later cites was produced by execution in the Numerical Domain.
All calculations happen in the Numerical Domain, by running real queries directly against your data. Results land in the fact store as structured evidence records with unique IDs, ready to be cited by the AI but never re-stated from memory.
When the AI drafts a conclusion citing a figure, that figure is re-queried against your data before the claim is accepted. A number can only reach your report by passing through this check.
The AI sees numbers but does not produce them. Every figure in your report comes from a query run directly against your data, and every figure cited in a conclusion is re-verified by running that same query again before the conclusion is accepted.
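A minimal sketch of that boundary, assuming a SQL-queryable source and using an in-memory SQLite table as a stand-in. The names `Fact`, `record_fact` and `reverify`, and the `revenue` table, are hypothetical illustrations, not Carys's actual API or schema.

```python
import sqlite3
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    fact_id: str  # unique ID the narrative cites instead of restating the number
    query: str    # exact SQL that produced the value
    value: float  # result of executing that SQL


def record_fact(conn, query):
    """Run the query against the data and store the result as a fact."""
    (value,) = conn.execute(query).fetchone()
    return Fact(fact_id=uuid.uuid4().hex, query=query, value=float(value))


def reverify(conn, fact, tolerance=1e-9):
    """Re-run the fact's own query; the fact holds only if the data still agrees."""
    (value,) = conn.execute(fact.query).fetchone()
    return abs(float(value) - fact.value) <= tolerance


# Hypothetical stand-in for customer data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (quarter TEXT, amount_usd REAL)")
conn.executemany(
    "INSERT INTO revenue VALUES (?, ?)",
    [("Q4 2023", 3_410_000.0), ("Q4 2024", 4_200_000.0)],
)

fact = record_fact(
    conn, "SELECT SUM(amount_usd) FROM revenue WHERE quarter = 'Q4 2024'"
)
ok_before = reverify(conn, fact)

# Simulate the underlying data changing after the fact was recorded.
conn.execute(
    "UPDATE revenue SET amount_usd = ? WHERE quarter = ?", (4_000_000.0, "Q4 2024")
)
ok_after = reverify(conn, fact)

print(ok_before, ok_after)
```

In this sketch the model's only contribution is the query string. The value is always whatever execution returns, and a stale fact fails the re-check instead of reaching the write-up.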
How A Report Gets Built
Execution and publication are deliberately separated. Steps run against your data and produce evidence: facts, charts, tables and journal entries. Only once that evidence is complete does the report pipeline plan the narrative, write the claims that go in each section and check them.
What Happens To A Claim That Fails
Every numeric statement in the report is treated as a claim. The verifier re-runs the underlying query, and the claim only proceeds if the data still supports it. Claims that can be safely reworded are rewritten and re-verified. Claims with no supporting evidence are dropped, so they never reach you.
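The three outcomes described above can be sketched as a small triage function. This is an illustration under stated assumptions: the `Verdict` type, the `triage` name and the tolerance value are hypothetical, and "safely reworded" is simplified here to "evidence exists but disagrees with the draft".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    status: str             # "accepted", "rewritten" or "dropped"
    value: Optional[float]  # the figure allowed into the report, if any


def triage(claimed: float, requeried: Optional[float], tolerance: float = 0.05) -> Verdict:
    """Decide what happens to a numeric claim after its query is re-run.

    requeried is None when no supporting evidence exists for the claim.
    """
    if requeried is None:
        return Verdict("dropped", None)
    if abs(claimed - requeried) <= tolerance:
        return Verdict("accepted", claimed)
    # Evidence exists but disagrees: restate the claim with the re-queried value.
    return Verdict("rewritten", requeried)


print(triage(23.2, 23.17))  # draft agrees with the data within tolerance
print(triage(50.0, 39.3))   # evidence disagrees with the draft
print(triage(12.0, None))   # nothing in the fact store backs this up
```

The point of the design is that the draft figure never wins a disagreement: the only values that can leave the function are ones the data produced or confirmed.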
Accepted claim: “Premium subscriptions grew 23.2% year-over-year in Q4 2024, driven primarily by the Enterprise tier.”
Verifier re-ran the underlying aggregation and confirmed Q4 2024 premium revenue of $4.20M versus $3.41M in Q4 2023, a 23.2% increase. Driver attribution matches the Enterprise tier breakdown.
| Quarter | Revenue (USD) | Growth |
|---|---|---|
| Q4 2023 | $3.41M | n/a |
| Q4 2024 | $4.20M | +23.2% |
| Tier | Revenue (USD) | Share |
|---|---|---|
| Enterprise | $2.81M | 66.9% |
| Business | $1.02M | 24.3% |
| Pro | $0.37M | 8.8% |
Draft claim: “Customer churn fell by half compared to the previous year, with retention now above 95% across every segment.”
Verifier finding: a re-query against the customer cohort table showed churn fell from 8.4% to 5.1%, a 39% reduction, not 50%. Retention exceeded 95% only in the Enterprise segment; Mid-Market retention was 94.6% and SMB retention was 91.3%.
Rewritten claim: “Customer churn fell 39% year-over-year, with retention above 95% in the Enterprise segment.”
| Period | Churn Rate |
|---|---|
| FY2023 | 8.4% |
| FY2024 | 5.1% |
| Segment | Retention |
|---|---|
| Enterprise | 97.1% |
| Mid-Market | 94.6% |
| SMB | 91.3% |
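The arithmetic behind both examples is easy to reproduce from the tables above. The short check below uses only the figures shown in this section; the helper name `relative_change` is ours, for illustration.

```python
def relative_change(old: float, new: float) -> float:
    """Percentage change from old to new, relative to old."""
    return (new - old) / old * 100.0


growth = relative_change(3.41, 4.20)      # Q4 2023 -> Q4 2024 revenue, USD millions
churn_change = relative_change(8.4, 5.1)  # FY2023 -> FY2024 churn rate, percent

retention = {"Enterprise": 97.1, "Mid-Market": 94.6, "SMB": 91.3}
above_95 = [segment for segment, rate in retention.items() if rate > 95.0]

print(f"{growth:.1f}")        # 23.2
print(f"{churn_change:.1f}")  # -39.3
print(above_95)               # ['Enterprise']
```

The first draft claim survives because 23.2% is what the data says. The second does not: a fall from 8.4% to 5.1% is a reduction of about 39%, and only one of the three segments clears 95%.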
Full Traceability, From Report Back To Source
Because every figure is tied to a fact and every fact is tied to the query that produced it, every line of a Carys report is auditable. You can click through a statement to see the validator finding, the related facts, the underlying tables and the exact query Carys ran. Nothing in the report is left as an unverifiable assertion.
Validator findings on the record
For every claim, the validator's verdict and reasoning are kept alongside the original draft and the corrected version, so it is always visible what the engine accepted, rewrote or dropped.
Queries you can inspect
The exact query behind a figure is available from the audit panel. Reviewers can verify the methodology independently, with no black boxes between question and number.
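One way to picture the audit trail is as a pair of linked records. The sketch below is hypothetical throughout: the field names, the `customer_cohorts` table, the fact ID and the SQL are assumptions made for illustration, not Carys's actual schema or queries.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FactRecord:
    fact_id: str
    query: str         # exact query that produced the value
    tables: List[str]  # source tables the query reads
    value: float


@dataclass
class AuditEntry:
    draft: str      # the claim as first written
    verdict: str    # "accepted", "rewritten" or "dropped"
    reasoning: str  # the validator's explanation
    final: str      # text that reached the report ("" if dropped)
    facts: List[FactRecord] = field(default_factory=list)


def trace(entry: AuditEntry) -> List[str]:
    """Walk from a report statement back to the queries behind it."""
    return [f"{fact.fact_id}: {fact.query}" for fact in entry.facts]


entry = AuditEntry(
    draft="Customer churn fell by half compared to the previous year.",
    verdict="rewritten",
    reasoning="Churn fell from 8.4% to 5.1%, a 39% reduction, not 50%.",
    final="Customer churn fell 39% year-over-year.",
    facts=[
        FactRecord(
            fact_id="f-001",  # hypothetical ID
            query="SELECT churn_rate FROM customer_cohorts WHERE fiscal_year = 2024",
            tables=["customer_cohorts"],
            value=5.1,
        )
    ],
)

print(trace(entry))
```

Keeping the draft, the verdict and the final text in one record is what makes it possible to see, per claim, what was accepted, rewritten or dropped.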
CarysBench: Accuracy As A Measured Metric
The Insights Engine guarantees how a single report is built. CarysBench measures whether that guarantee holds at scale. It is our internal evaluation framework: thousands of benchmark questions, run against Carys on a regular schedule, scored against defined metrics, so reliability is something we track, not something we claim.
Continuous regression detection
CarysBench runs frequently to catch quality regressions when prompts, models or execution logic change, before they reach customers.
Objective scoring
Score trends are tracked over time across factual accuracy, consistency between sections and overall answer quality.
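As a rough illustration of regression detection over tracked scores, the sketch below compares the latest run with the mean of earlier runs on each metric. The metric keys, the threshold and every score are invented sample values for the example, not CarysBench results or its real scoring method.

```python
from statistics import mean

# Hypothetical metric keys mirroring the three areas named above.
METRICS = ("factual_accuracy", "section_consistency", "answer_quality")


def find_regressions(history, latest, max_drop=0.02):
    """Return metrics where the latest run fell more than max_drop below the historical mean."""
    regressions = {}
    for metric in METRICS:
        baseline = mean(run[metric] for run in history)
        if baseline - latest[metric] > max_drop:
            regressions[metric] = (baseline, latest[metric])
    return regressions


# Invented sample scores, for illustration only.
history = [
    {"factual_accuracy": 0.97, "section_consistency": 0.95, "answer_quality": 0.90},
    {"factual_accuracy": 0.98, "section_consistency": 0.94, "answer_quality": 0.91},
]
latest = {"factual_accuracy": 0.93, "section_consistency": 0.95, "answer_quality": 0.90}

flagged = find_regressions(history, latest)
print(sorted(flagged))
```

With these sample values only `factual_accuracy` is flagged, since it is the only metric that drops by more than the threshold relative to its baseline.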
Iterative improvement
Results feed directly into prompt updates, verifier improvements and report-flow tuning so the system gets measurably stronger over time.
Safety and quality together
We pair CarysBench tracking with runtime safeguards (sandboxing, access controls, bounded execution) so analysis is both reliable and controlled.