How do dbt tests work in practice?

dbt tests are SQL assertions that run against your warehouse tables after each pipeline load. A not_null test runs a query that selects the rows where a column is null; if any rows come back, the test fails. A unique test finds duplicate values. Custom tests run any SQL you write: 'revenue must be positive', 'order date cannot be in the future', 'every order must have a matching customer'. Tests run automatically as part of each pipeline execution, either through dbt build or as a dbt test step after dbt run. When a test fails, the run reports an error with the number of failing records, which can also be stored in the warehouse for inspection, and the data team is alerted before anyone queries the affected table.

What's the difference between data quality testing and monitoring?

Testing (dbt tests, schema validation) checks specific assertions about the data: 'this column cannot be null', 'this value must exist in a reference table'. Monitoring (row count, anomaly detection, freshness) tracks statistical properties of the data over time and alerts when something deviates from the expected pattern, without a pre-defined assertion. Testing catches known failure modes. Monitoring catches unknown failure modes, such as the pipeline that delivers 40% fewer records than usual because a source system had an issue, without failing any specific test. A complete data quality system uses both.

How do we set up freshness SLAs for our pipeline tables?

Freshness SLAs are configured by table: each table gets an expected update frequency (every 1 hour, every 24 hours, every Monday at 6am) and a tolerance window before alerting. dbt's built-in source freshness check compares the maximum timestamp in a configured column against the expected freshness. For tables without a reliable timestamp column, we build a pipeline metadata table that records the completion time of each pipeline run and monitor against that. Alerts go to Slack or PagerDuty depending on the severity of the affected table.
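As a rough illustration of the freshness logic described above, the sketch below classifies a table by the age of its newest timestamp. The function name freshness_status and the 3-hour and 6-hour thresholds are hypothetical; the parameter names warn_after and error_after simply echo the terms dbt uses in source freshness configuration, and the timestamp could come from either a data column or a pipeline metadata table.

```python
from datetime import datetime, timedelta, timezone


def freshness_status(last_loaded_at, now, warn_after, error_after):
    """Classify a table's freshness from the age of its newest record.

    Hypothetical helper: last_loaded_at is the max timestamp in the
    configured column, or the last run completion time recorded in a
    pipeline metadata table.
    """
    age = now - last_loaded_at
    if age > error_after:
        return "error"
    if age > warn_after:
        return "warn"
    return "pass"


# Fixed clock so the example is reproducible.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

# Assumed SLA: warn after 3 hours, error after 6 hours.
status = freshness_status(
    last_loaded_at=datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc),
    now=now,
    warn_after=timedelta(hours=3),
    error_after=timedelta(hours=6),
)
print(status)  # the newest record is 4 hours old, so this prints "warn"
```

In practice the two thresholds would map to different alert channels, with the warning going to Slack and the error to PagerDuty for the most important tables.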

What does data quality monitoring cost?

Adding dbt-based testing and freshness monitoring to an existing dbt project typically runs $8,000 to $20,000. A full data quality platform including anomaly detection, schema change monitoring, lineage tracking, and a quality dashboard typically runs $25,000 to $60,000. Fixed cost agreed before development starts.

Data Quality Management Services | dbt Testing

Bad data in a dashboard doesn't just produce wrong numbers, it produces wrong decisions that nobody traces back to the data.

Data quality problems compound silently. A source system changes a field definition, a pipeline drops null records, an ETL job fails halfway through and loads partial data. The report looks complete. The numbers are wrong. Decisions get made on bad information and the root cause stays buried until something important breaks.

We build data quality monitoring infrastructure that validates data as it moves through pipelines, alerts when anomalies occur before they reach reports, and gives the data team a clear view of what is trustworthy and what isn't. Schema validation, row count monitoring, statistical anomaly detection, and data lineage for the data layer that supports business decisions.

dbt tests for null values, referential integrity, uniqueness, and custom business logic, run on every pipeline execution
Row count and statistical anomaly detection that alerts when a pipeline delivers significantly more or fewer records than expected
Schema change detection that catches source system changes before bad data reaches the warehouse
Data lineage tracking from source to report so root cause investigation takes minutes, not days

Recent outcomes

Voice AI · Research

Text-based interviews converted to automated phone calls

6× deeper insights

AI Automation · Ops

Manual invoice OCR across 40+ gas stations

20k+ txns day one

Loyalty · Retail

SuperValu & Centra loyalty platform with receipt validation

1,062 users in 4 weeks

SaaS · Logistics

Multi-carrier shipping hub for Indonesian eCommerce

2,000+ shipments yr 1

4.9 / 5 on Clutch

Sound familiar?

How long does it take your team to find the root cause when a report shows a number that doesn't look right?
Has a pipeline ever silently loaded wrong data, with no error and no alert, so that the problem was only discovered weeks later in a board meeting?

In short

RaftLabs builds data quality monitoring infrastructure for data engineering pipelines: dbt-based testing, row count and statistical anomaly detection, schema change monitoring, freshness SLAs, and data lineage tracking. Adding dbt testing and freshness monitoring to an existing project costs $8,000 to $20,000. A full data quality platform with anomaly detection, lineage, and a quality dashboard runs $25,000 to $60,000. Most projects deliver in 6 to 10 weeks at a fixed cost.

Data quality problems are expensive precisely because they are invisible until something important breaks. A pipeline that silently drops records when a source field is null delivers a report that looks complete: the row count is close enough, the numbers are plausible, and nobody questions them until a decision made on that data produces a bad outcome. By then the root cause is buried in pipeline logs from three weeks ago, and finding it takes days.

The fix is data quality infrastructure that validates data as it moves through the pipeline, not after the fact, and not only when someone notices a number looks wrong. Schema validation before the load, row count checks after the load, statistical anomaly detection on the delivered data, and freshness SLAs that alert before a stale table reaches a report. We build that infrastructure as a defined engagement scoped to your existing pipeline and warehouse setup.

Capabilities

What we build

dbt data testing

dbt schema tests are written as SQL assertions that run automatically at the end of every pipeline execution, before the transformed data becomes available to analysts or dashboards. Four built-in test types cover the most common structural failures:

not_null catches columns that should always have a value but contain nulls (a customer_id being null on an order row means that order is orphaned and will produce incorrect revenue-by-customer figures)
unique catches duplicate primary key values that would cause double-counting in aggregations
accepted_values validates categorical fields against the agreed value set (an order status field containing 'PENDIG' instead of 'PENDING' is caught at test time, not when a finance team member notices the status filter produces wrong results)
relationships enforces foreign key integrity between fact and dimension tables, catching the case where a fact table references a dimension row that no longer exists

Custom dbt tests are written as SQL SELECT statements for domain-specific assertions: revenue columns must be non-negative; order timestamps cannot be more than one hour in the future; a daily transaction summary count must match the sum of hourly transaction records for the same period; a monthly rollup row must exist for every month in the date range.

Test severity levels are configured per assertion: some tests fail the entire run and block the mart layer from updating (primary key uniqueness violations on core entity tables); others raise a warning and allow the run to complete but alert the data team (row-level checks that catch edge cases rather than systemic failures).

dbt test results are stored in the warehouse alongside the test-run metadata so test pass/fail history is queryable and trendable. The data team can see whether a specific table's quality is improving or degrading over time without digging through CI logs.
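To make the test types above concrete, here is a small sketch that expresses each one as a query selecting the violating rows, which is the general shape of a dbt test. It runs against an in-memory SQLite database rather than a real warehouse, and the table names, columns, and sample rows are invented for illustration. In a dbt project the four built-in tests would be declared in YAML rather than written by hand like this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER);
    CREATE TABLE orders (
        order_id INTEGER, customer_id INTEGER, status TEXT, revenue REAL
    );
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO orders VALUES
        (100, 1,    'PENDING', 25.0),
        (101, NULL, 'SHIPPED', 40.0),  -- null customer_id
        (101, 2,    'PENDIG',  10.0),  -- duplicate order_id, misspelled status
        (102, 9,    'SHIPPED', -5.0);  -- unknown customer, negative revenue
""")

# Each assertion is a query that returns the rows (or keys) violating it.
# An empty result means the test passes.
TESTS = {
    "not_null_customer_id":
        "SELECT * FROM orders WHERE customer_id IS NULL",
    "unique_order_id":
        "SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1",
    "accepted_values_status":
        "SELECT * FROM orders WHERE status NOT IN ('PENDING', 'SHIPPED')",
    "relationships_customer":
        """SELECT o.* FROM orders o
           LEFT JOIN customers c ON o.customer_id = c.customer_id
           WHERE o.customer_id IS NOT NULL AND c.customer_id IS NULL""",
    "custom_revenue_non_negative":
        "SELECT * FROM orders WHERE revenue < 0",
}

failures = {name: len(conn.execute(sql).fetchall()) for name, sql in TESTS.items()}

for name, count in failures.items():
    print(f"{name}: {'PASS' if count == 0 else f'FAIL ({count} result rows)'}")
```

Every test fails once on this sample data, and the same failure counts are what a results table in the warehouse would record per run.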

Row count and freshness monitoring

An expected row count range is established per table per pipeline run based on 60-90 days of historical delivery data: the system calculates the mean and standard deviation of record counts across prior runs for the same load window (daily loads compared to prior daily loads, hourly loads compared to the same hour in prior periods). Alert thresholds are set at configurable sigma values: a 2-sigma deviation triggers a warning; a 3-sigma deviation triggers a critical alert. This catches the partial load failure mode, where a source system has a timeout or a network interruption and the pipeline completes with 60% of the expected records, no pipeline error, and a warehouse table that looks current but is materially incomplete. The alert fires before the data reaches any downstream consumer.

Volume trend monitoring runs alongside the count: a table that normally grows by 5,000 rows per day but only grew by 500 rows on Monday is a different type of problem than a source that just had a slow day. The anomaly detection distinguishes between noise and genuine delivery problems by applying seasonal adjustment (weekends typically have lower volumes than weekdays; month-end typically has higher volumes).

Freshness SLA monitoring uses dbt source freshness: each source table is configured with an expected maximum freshness age (the pipeline for the orders table should produce a row with an updated_at timestamp within the last 3 hours; a daily cost report table should have fresh data by 8am). The dbt source freshness check runs on a separate schedule from the transformation pipeline and alerts via Slack (for example through dbt Cloud job notifications or an incoming webhook) or PagerDuty (via webhook) when a table's data age exceeds the SLA. For tables without a reliable updated_at timestamp, a pipeline metadata table records the completion time of each successful run and freshness is monitored against that timestamp instead of a data column.
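The sigma thresholds described above can be sketched in a few lines. This is a minimal illustration, not the production logic: the function name classify_row_count and the sample history are hypothetical, and a real implementation would use 60-90 days of counts for the matching load window rather than seven values.

```python
from statistics import mean, stdev


def classify_row_count(history, observed, warn_sigma=2.0, critical_sigma=3.0):
    """Compare an observed row count with the mean and sample standard
    deviation of prior runs for the same load window."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No historical variation: any difference at all is suspicious.
        return "ok" if observed == mu else "critical"
    z = abs(observed - mu) / sigma
    if z >= critical_sigma:
        return "critical"
    if z >= warn_sigma:
        return "warning"
    return "ok"


# Hypothetical daily counts: mean 10,000, sample standard deviation about 132.
history = [10000, 10200, 9800, 10100, 9900, 10050, 9950]

print(classify_row_count(history, 10100))  # within 1 sigma
print(classify_row_count(history, 10300))  # between 2 and 3 sigma
print(classify_row_count(history, 6000))   # partial load at 60% of expected
```

The last call is the partial load case: the pipeline finished without error, but the count sits far outside the historical range, so it is classified as critical.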

Statistical anomaly detection

Statistical anomaly detection is applied to the values within pipeline-delivered data: not just the row counts, but the actual metric distributions that the data team cares about. For each monitored metric (daily revenue total, new customer count, transaction success rate, average order value), a baseline is established from 90 days of historical data, with seasonal decomposition applied to separate weekly and monthly patterns from the underlying trend. The anomaly detection model (Z-score for stationary metrics; STL decomposition + residual analysis for metrics with strong seasonality; Isolation Forest for multivariate metrics where multiple correlated signals are monitored together) evaluates each new delivery against the baseline and flags deviations at configurable severity thresholds.

The key distinction from row count monitoring: anomaly detection catches semantic quality problems, not just volume problems. A pipeline that delivers the correct number of rows but with revenue figures that are 40% lower than the expected range has passed every structural test but is delivering wrong data. Statistical anomaly detection catches this before the CFO sees a revenue drop in the dashboard that is a data artifact rather than a business event.

Tool integration: Monte Carlo Data or Anomalo for teams that want a managed anomaly detection product with a UI; Great Expectations for teams that prefer an open-source framework with custom expectation suites; custom SQL-based monitoring queries run on a schedule for teams that want full ownership of the detection logic without a third-party dependency.

Alert routing: warning-severity anomalies go to a Slack data quality channel for the data team; critical-severity anomalies (more than 3-sigma deviation on business-critical metrics) go to PagerDuty with the on-call data engineer. Every alert includes the specific metric, the expected range, the delivered value, and a direct link to the anomaly detail view for rapid triage.
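The idea of scoring a metric against a seasonal baseline can be shown with a deliberately simplified sketch. It is a stand-in for STL decomposition, not an implementation of it: the weekly pattern is approximated by a per-weekday mean, and the new value is scored against the spread of the historical residuals. The function name weekday_zscore and the synthetic revenue history are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def weekday_zscore(history, weekday, value):
    """Score a value against a per-weekday baseline.

    history is a list of (weekday, value) pairs with weekday 0=Mon..6=Sun.
    The weekly pattern is removed by subtracting each weekday's mean, and
    the remaining residuals define the expected spread.
    """
    by_day = defaultdict(list)
    for day, v in history:
        by_day[day].append(v)
    baseline = {day: mean(values) for day, values in by_day.items()}
    residuals = [v - baseline[day] for day, v in history]
    return (value - baseline[weekday]) / stdev(residuals)


# Synthetic 8 weeks of daily revenue: about 100,000 on weekdays and
# 60,000 on weekends, with a small deterministic wobble of +/- 2,000.
history = []
for week in range(8):
    for day in range(7):
        base = 60000 if day >= 5 else 100000
        wobble = ((week * 7 + day) % 5 - 2) * 1000
        history.append((day, base + wobble))

z_saturday = weekday_zscore(history, weekday=5, value=61000)
z_tuesday = weekday_zscore(history, weekday=1, value=60000)

print(f"Saturday at 61,000: z = {z_saturday:.2f}")
print(f"Tuesday at 60,000:  z = {z_tuesday:.2f}")
```

The same figure of 60,000 is ordinary for a Saturday and a severe anomaly for a Tuesday, which is why a plain Z-score without seasonal adjustment would either miss the Tuesday drop or raise false alarms every weekend.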

Schema change detection

Schema change detection runs as a pre-load validation step on each pipeline execution: the current schema of each source table is compared against the expected schema (stored in a schema registry as the authoritative definition) and any discrepancy triggers an alert before the pipeline attempts to load data with the unexpected structure. Three change types are detected and handled differently:

New columns added to a source table: safe to add to the staging layer in the next deploy, but the data team should be aware and decide whether to surface the column downstream.
Columns removed or renamed in the source: dangerous, because the pipeline will fail or silently load nulls if it is not updated to handle the removal.
Data type changes in existing columns: a source changing a column from VARCHAR to INTEGER may succeed or fail depending on the values, and may change sort behaviour in downstream analytics.

Schema change handling for Snowflake and BigQuery: schema evolution policies are configured per table, specifying whether new columns are automatically added to the warehouse table on detection (permissive) or whether the pipeline is blocked and a schema migration is required (strict). Strict mode is applied to fact and mart tables, where unexpected schema changes should always be reviewed before propagating; permissive mode is applied to raw landing tables, where adding a new source column is always safe to capture.

A schema change notification is sent to the data team Slack channel with the exact diff (which columns changed, from what type to what type), the affected source table name, and links to the downstream dbt models that reference the changed table, so the data engineer triaging the alert knows immediately whether downstream models need updating without navigating the dbt DAG manually.

A historical schema change log is queryable in the warehouse: every detected schema change is recorded with timestamp, change description, source system, and resolution action. This is the audit trail that answers "why did this model break six weeks ago?" in under two minutes.
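The comparison step can be sketched as a diff between two column-to-type mappings. The function names diff_schema and should_block and the sample schemas are hypothetical. One assumption goes beyond the text: in permissive mode this sketch still blocks on removed columns and type changes, and only lets new columns through.

```python
def diff_schema(expected, actual):
    """Compare two {column: type} mappings and group the differences
    into the three change types."""
    shared = set(expected) & set(actual)
    return {
        "added": sorted(set(actual) - set(expected)),
        "removed": sorted(set(expected) - set(actual)),
        "type_changed": {
            col: (expected[col], actual[col])
            for col in sorted(shared)
            if expected[col] != actual[col]
        },
    }


def should_block(diff, mode):
    """Decide whether the load is blocked under a schema evolution policy.

    strict: any change blocks the pipeline.
    permissive: new columns pass through; removals and type changes
    still block (an assumption made for this sketch).
    """
    breaking = bool(diff["removed"] or diff["type_changed"])
    if mode == "strict":
        return breaking or bool(diff["added"])
    return breaking


# Hypothetical registry definition versus what the source now delivers.
expected = {"order_id": "INTEGER", "status": "VARCHAR", "amount": "VARCHAR"}
actual = {"order_id": "INTEGER", "amount": "INTEGER", "coupon_code": "VARCHAR"}

diff = diff_schema(expected, actual)
print(diff)
print("blocked in strict mode:", should_block(diff, "strict"))
```

The diff dictionary is also the natural payload for the Slack notification and for a row in the schema change log.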

Data lineage tracking

Column-level lineage is tracked from the raw source table through each transformation stage to the final mart table and the downstream BI tool measures that read from it: the complete chain from a raw API event to a number in an executive dashboard. dbt generates table-level lineage automatically from the model SQL (every ref() call in a dbt model records a directed dependency edge); column-level lineage requires either dbt's column-level lineage feature (available in dbt Cloud) or open-source tooling such as sqlglot, which parses the SQL SELECT statements and maps input columns to output columns, paired with a lineage store such as Marquez.

The lineage graph enables two critical workflows: forward impact analysis (starting from a source column, find every downstream model and report that would be affected if that column changes, which is the blast radius calculation before a source schema change), and backward root cause analysis (starting from a wrong number in a dashboard, trace back through the mart model, the staging model, and the source table to find exactly where the value diverged from expectations).

Integration with data catalog tools: lineage metadata is exported to DataHub, Atlan, or Collibra where the organisation already has a data governance platform, so lineage is visible alongside data definitions and ownership in a single interface. For teams without a data catalog, the dbt-generated documentation site (published as a static web app accessible to the data team) provides the lineage graph as an interactive DAG visualisation with each node linking to the model's test results, column descriptions, and run history.

Cross-project lineage for organisations with multiple dbt projects: the lineage graph spans project boundaries by connecting dbt project artifacts via the dbt Cloud discovery API or by exporting manifest.json files from each project and merging them in a central lineage store.
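Both workflows are graph traversals over the same edges, run in opposite directions. The sketch below uses a small hand-written table-level graph; the node names are hypothetical, and in a real project the edges would be derived from dbt's manifest rather than typed in.

```python
from collections import deque

# Hypothetical lineage edges: each node maps to the nodes that read from it.
EDGES = {
    "raw.orders": ["stg_orders"],
    "raw.customers": ["stg_customers"],
    "stg_orders": ["fct_orders"],
    "stg_customers": ["fct_orders", "dim_customers"],
    "fct_orders": ["revenue_dashboard"],
    "dim_customers": [],
    "revenue_dashboard": [],
}


def reachable(graph, start):
    """Breadth-first search returning every node reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph):
    """Flip every edge so traversal runs from consumers back to sources."""
    flipped = {node: [] for node in graph}
    for src, targets in graph.items():
        for target in targets:
            flipped.setdefault(target, []).append(src)
    return flipped


# Forward impact analysis: what is affected if raw.customers changes?
downstream = reachable(EDGES, "raw.customers")

# Backward root cause analysis: what feeds the dashboard?
upstream = reachable(reverse(EDGES), "revenue_dashboard")

print(sorted(downstream))
print(sorted(upstream))
```

Column-level lineage follows the same pattern with nodes named by table and column instead of by table alone.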

Data quality dashboards and reporting

Centralised data quality dashboard built in the BI tool the data team already uses (Metabase, Looker, Tableau, or Power BI) pulling from a quality metrics table in the warehouse that aggregates test results, anomaly alert history, freshness SLA compliance, and schema change events into a single queryable dataset. The dashboard shows: pipeline run status and test pass rates for the last 24 hours (the view the data team checks every morning before stakeholders start their day); freshness compliance by table with a traffic-light status showing which tables are current and which are stale; anomaly alert volume by day and by table for the trailing 30 days; and test pass rate trend by model over the past 90 days to distinguish temporarily failing tests from models that have degraded systematically.

Ownership layer: each monitored table and model is tagged with a data owner (the person responsible for resolving quality issues on that table), so alerts can be routed to the correct person rather than to a generic data team channel.

SLA compliance reporting: for each table with a freshness SLA, the percentage of time it was within SLA over the past 30 days, the metric that makes data quality visible to the business and gives the data team a defensible record of pipeline reliability.

Weekly data quality summary email generated automatically from the dashboard data and delivered to the data team lead and any business stakeholders who have opted in: test pass rate, anomaly count, SLA compliance, and top-3 recurring issues with their status. This weekly report replaces the ad-hoc "is the data trustworthy?" conversation with evidence, and over time shifts the conversation from reactive firefighting to proactive quality investment based on which tables and models have the highest error rates.
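The SLA compliance figure is simple arithmetic over the freshness check history. In the sketch below, the function names and the traffic-light cut-offs of 99% and 95% are hypothetical, and the sample assumes one freshness check per hour over 30 days.

```python
def sla_compliance(checks):
    """Percentage of freshness checks that were within SLA.

    checks is a list of booleans, True when the table was within SLA
    at the time of the check. Returns None when there is no history.
    """
    if not checks:
        return None
    return round(100.0 * sum(checks) / len(checks), 1)


def traffic_light(pct, green_at=99.0, amber_at=95.0):
    """Map a compliance percentage to a dashboard status.
    The cut-offs are illustrative assumptions."""
    if pct >= green_at:
        return "green"
    if pct >= amber_at:
        return "amber"
    return "red"


# 30 days of hourly checks (720 in total), 20 of them outside the SLA.
checks = [True] * 700 + [False] * 20

pct = sla_compliance(checks)
print(pct, traffic_light(pct))  # 700 / 720 rounds to 97.2, which is amber
```

The same percentage per table feeds both the dashboard tile and the weekly summary email.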

Have a data quality problem?

Tell us your current pipeline setup, what broke last time bad data reached a report, and how long it took to find the root cause. We'll scope the monitoring infrastructure and give you a fixed cost.

Talk about your data quality project

Related data engineering services

Data Engineering Services: full data engineering capability overview
ETL Pipeline Development: pipelines that data quality monitoring validates
Data Warehouse Development: the warehouse layer where data quality is enforced
Real-Time Data Pipelines: streaming pipelines that also require quality monitoring

Related services

Predictive Analytics: ML models that depend on clean, reliable training data
Business Intelligence: dashboards that are only trustworthy if the underlying data is validated
Compliance Automation: regulatory reporting that requires auditable, verified data

How it works

From first call to shipped product: how every build runs.

The same four steps on every engagement. A 6-week voice AI deployment runs the same shape as a 16-week enterprise build.

01 Discover (Week 1)
We spend the first week understanding the problem, not presenting a solution. Discovery session, interviews with the people closest to the work, workflow mapping, and a technical audit of what you already have. You leave knowing exactly what's broken and why previous attempts didn't fix it.

02 Design (Weeks 2–3)
Low-fidelity wireframes before any code is written. You see the product before we build it. Scope, timeline, and fixed price locked at this stage. No surprises after work starts.

03 Build (Weeks 4–12)
Bi-weekly agile sprints. Weekly progress calls. Direct access to the team and project management tools. Working software at the end of every sprint. Not a big-bang delivery at the finish line.

04 Ship (Weeks 12–16)
Production deployment, QA sign-off, load testing, and team handover. You own the full codebase from day one. We stay on for post-launch iteration and support. Nothing gets thrown over the wall.


Work with us

Tell us what you need. We'll tell you what it would take.

We scope Data Quality Management in 30 minutes. You walk away with a clear cost, timeline, and approach. No commitment required.

Scope and cost agreed before work starts. No surprises. No obligation.
Working prototype within 3 weeks of kickoff.
Pay by milestone. You see progress before each invoice.
60-day post-launch warranty. Bug fixes, UI tweaks, and deployment support. No retainer.
All conversations are NDA-protected.
