Talk to us about your CDP project.
Tell us your data sources, your segmentation needs, and what your current stack can't do. We'll scope a platform and give you a fixed cost.
Customer data split across CRM, email platform, analytics, and ecommerce with no single customer record -- marketing sending campaigns to churned customers because the systems don't sync?
Audience segments taking days of data team work for every campaign because your marketing team has no self-service segmentation tool?
One customer record assembled from your CRM, email platform, ecommerce data, and web analytics -- so marketing segments audiences from real data and activates to the right channels without a data team request for each campaign.
We build custom CDPs for MarTech companies building data infrastructure as a product and for businesses that need a unified customer record their marketing tools can actually use.
Identity resolution and unified customer profiles
Real-time event ingestion from web, mobile, and backend
Self-service audience segmentation and activation
Data warehouse and downstream tool sync
A customer data platform (CDP) resolves customer identity across data sources, builds a unified profile from behavioural, transactional, and CRM data, and makes that profile available to marketing tools for segmentation and activation. RaftLabs builds custom CDPs for MarTech companies that need a data product and for businesses that have outgrown point solutions. Most CDP builds are delivered in 14 to 18 weeks at a fixed cost.
When customer data lives in separate systems with no shared identifier, every marketing operation that requires cross-system data becomes a manual process. Churn suppression requires an export from the ecommerce system, a lookup against the email platform, and a manual list upload. Personalisation requires a data analyst to join tables before the campaign team can brief the creative. By the time the data is ready, the moment for the campaign has passed.
A customer data platform makes the unified customer record a live operational asset, not a report that gets built on request. Identity is resolved continuously as new events arrive. The unified profile is updated in real time. Segment membership changes as behaviour changes, not when someone runs a query.
We build CDPs for two audiences: MarTech companies that need a data product at the centre of their platform, and businesses with enough data complexity that assembling it manually is blocking marketing operations.
Event streaming architecture is built on Segment, RudderStack, or Snowplow as the event collection layer -- all three produce a consistent event schema across web, mobile, and server-side sources, and all three support identity stitching across anonymous and known users. Snowplow is preferred when you need full data ownership and a self-hosted pipeline without per-event pricing. RudderStack is preferred when you want warehouse-native activation and your data team works primarily in SQL. Segment is preferred for teams that want a large destination catalogue and a fast integration path.
Web events are collected via a lightweight JavaScript SDK that tracks page views, clicks, form submissions, and custom events. Mobile events are collected via iOS and Android SDKs. Server-side events from your backend (order completions, subscription changes, support ticket creation) are sent via the HTTP tracking API or server-side SDK.
Identity resolution uses deterministic matching on known identifiers: the same email address hash (SHA-256), phone number hash, or customer ID found on two records is enough to merge them. Probabilistic matching on device fingerprints and behavioural patterns flags anonymous sessions that are likely the same person but have not yet shared a known identifier, and stores each candidate link with a confidence score rather than merging it outright. Cross-device stitching merges the anonymous pre-login event history with the known customer record at the point of login, using the login event's user ID as the bridge identifier. The identity graph is maintained as a real-time Kafka topic, so profile merges propagate to the unified profile within seconds of the bridging event arriving, not at the next nightly batch run.
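The deterministic half of identity resolution can be sketched as a union-find over hashed identifiers. This is a minimal in-memory illustration, assuming each source record arrives with a list of `(kind, value)` identifier pairs; the `IdentityGraph` class and its method names are hypothetical, and the Kafka-backed graph described above is not modelled.

```python
import hashlib


def identity_key(kind: str, value: str) -> tuple:
    """Normalise an identifier and return (kind, SHA-256 hex digest)."""
    # Emails are case-insensitive in practice; other identifiers are only trimmed.
    normalised = value.strip().lower() if kind == "email" else value.strip()
    return kind, hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class IdentityGraph:
    """Union-find over record IDs: records that share an identifier key are merged."""

    def __init__(self) -> None:
        self.parent = {}       # record_id -> parent record_id
        self.first_owner = {}  # identifier key -> first record seen with it

    def _find(self, record_id: str) -> str:
        self.parent.setdefault(record_id, record_id)
        while self.parent[record_id] != record_id:
            # Path halving keeps later lookups short.
            self.parent[record_id] = self.parent[self.parent[record_id]]
            record_id = self.parent[record_id]
        return record_id

    def add_record(self, record_id: str, identifiers: list) -> None:
        root = self._find(record_id)
        for kind, value in identifiers:
            key = identity_key(kind, value)
            owner_root = self._find(self.first_owner.setdefault(key, record_id))
            if owner_root != root:
                # A shared hashed identifier is a deterministic match: merge the two trees.
                self.parent[root] = owner_root
                root = owner_root

    def profile_id(self, record_id: str) -> str:
        """The resolved profile a record belongs to (the root of its tree)."""
        return self._find(record_id)
```

Transitive merges fall out naturally: a CRM record and an app record that never share an identifier still resolve to one profile if an ecommerce record shares an email with one and a phone number with the other.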
A single profile record for each resolved customer aggregates behavioural events from your web and mobile tracking, transactional history from your ecommerce platform or order management system, CRM attributes from Salesforce or HubSpot (synced bidirectionally), and computed properties derived from the event and transaction history. The profile schema is designed around your data model rather than a fixed schema that forces your data to conform to the CDP's structure. A custom schema means your product-specific attributes (subscription plan, cohort, referral source, service tier) are first-class profile attributes, not shoehorned into generic fields.
Computed attributes are updated in real time by a stream processing layer (Kafka Streams or Apache Flink) as new events arrive: total lifetime value (LTV) is recalculated on each order event, the last purchase date is updated on each order (so days since last purchase is always current), product category affinity scores are updated as browse and purchase events accumulate, and the engagement score is updated as email open and click events arrive. These computed attributes are available in the profile immediately after each event, so an audience segment that requires "customers with LTV over $500 who haven't purchased in 90 days" reflects the current state, not yesterday's batch.
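A minimal sketch of per-event attribute updates, assuming events arrive as dicts with hypothetical `type`, `revenue`, `date`, and `category` keys. It is plain Python rather than Kafka Streams or Flink code; it only shows the incremental logic such a stream processor would run for each event. The affinity weights are illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical weights: a purchase says more about category affinity than a view.
AFFINITY_WEIGHTS = {"product_viewed": 1.0, "order_completed": 5.0}


@dataclass
class Profile:
    lifetime_value: float = 0.0
    order_count: int = 0
    last_purchase: Optional[date] = None
    category_affinity: dict = field(default_factory=dict)

    def days_since_last_purchase(self, today: date) -> Optional[int]:
        # Derived at read time from the stored date, so it never goes stale between orders.
        if self.last_purchase is None:
            return None
        return (today - self.last_purchase).days


def apply_event(profile: Profile, event: dict) -> Profile:
    """Update computed attributes incrementally from a single event."""
    kind = event["type"]
    if kind == "order_completed":
        profile.lifetime_value += event["revenue"]
        profile.order_count += 1
        # Keep the latest date even if orders arrive out of order.
        if profile.last_purchase is None or event["date"] > profile.last_purchase:
            profile.last_purchase = event["date"]
    category = event.get("category")
    if category and kind in AFFINITY_WEIGHTS:
        profile.category_affinity[category] = (
            profile.category_affinity.get(category, 0.0) + AFFINITY_WEIGHTS[kind]
        )
    return profile


def lapsed_high_value(profile: Profile, today: date) -> bool:
    """The example segment: LTV over $500 and no purchase in the last 90 days."""
    days = profile.days_since_last_purchase(today)
    return profile.lifetime_value > 500 and days is not None and days >= 90
```

Storing the last purchase date and deriving "days since" when the profile is read is one way to keep that attribute correct on days when no order event arrives.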
Customer lifetime value prediction uses an XGBoost model trained on RFM (Recency, Frequency, Monetary) features plus additional behavioural signals. The model generates a predicted 12-month LTV score per profile, updated on a scheduled basis, which feeds into high-value audience segments for acquisition lookalike modelling and retention campaigns. The full event history is retained and queryable for analytical use cases, with a partitioning strategy that keeps queries against large event tables performant. The profile API supports real-time lookup by any downstream system (personalisation engine, customer support tool, or recommendation service) that needs to read current customer attributes without querying the data warehouse directly.
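The RFM inputs to that model can be sketched as a feature function run once per profile. This assumes orders are available as `(order_date, amount)` pairs; `avg_order_value` is an illustrative extra feature, and neither the additional behavioural signals nor the training step (for example with `xgboost.XGBRegressor`) is shown.

```python
import math
from datetime import date


def rfm_features(orders: list, as_of: date) -> dict:
    """Recency, frequency and monetary features for one profile.

    `orders` is a list of (order_date, amount) tuples.
    """
    if not orders:
        # NaN marks recency as missing, which XGBoost treats as a missing value by default.
        return {"recency_days": math.nan, "frequency": 0.0, "monetary": 0.0, "avg_order_value": 0.0}
    last_order = max(order_date for order_date, _ in orders)
    monetary = sum(amount for _, amount in orders)
    return {
        "recency_days": float((as_of - last_order).days),
        "frequency": float(len(orders)),
        "monetary": monetary,
        "avg_order_value": monetary / len(orders),
    }
```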
A drag-and-drop segment builder lets marketing teams define audiences from any profile attribute, behavioural event, or computed property without writing SQL or filing a data team request. Conditions cover event frequency (performed a purchase event at least 3 times), event recency (last session within 14 days), attribute ranges (LTV between $100 and $500), computed property thresholds (engagement score above 0.7), and Boolean combinations of all of the above.
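Behind a drag-and-drop builder, each block can compile to a composable predicate. A minimal sketch, assuming a flat profile dict with hypothetical keys `event_counts`, `days_since_last_session`, `ltv`, and `engagement_score`; a production builder would more likely compile conditions into a query or stream filter than evaluate Python lambdas.

```python
from typing import Callable

# A condition is a predicate over a profile dict (hypothetical profile shape).
Condition = Callable[[dict], bool]


def event_count_at_least(event: str, n: int) -> Condition:
    return lambda p: p["event_counts"].get(event, 0) >= n


def last_session_within(days: int) -> Condition:
    return lambda p: p["days_since_last_session"] <= days


def attribute_between(attr: str, low: float, high: float) -> Condition:
    return lambda p: low <= p[attr] <= high


def attribute_above(attr: str, threshold: float) -> Condition:
    return lambda p: p[attr] > threshold


def all_of(*conditions: Condition) -> Condition:
    return lambda p: all(c(p) for c in conditions)


def any_of(*conditions: Condition) -> Condition:
    return lambda p: any(c(p) for c in conditions)


def negate(condition: Condition) -> Condition:
    return lambda p: not condition(p)


# The four example conditions from the paragraph above, combined with AND.
engaged_mid_value_buyers = all_of(
    event_count_at_least("purchase", 3),
    last_session_within(14),
    attribute_between("ltv", 100, 500),
    attribute_above("engagement_score", 0.7),
)
```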
First-party data collection for segmentation is structured to support cookieless targeting: identity is tied to authenticated events (email, phone, customer ID) and first-party device identifiers rather than third-party cookies, so GDPR and CCPA requirements are designed into the audience infrastructure rather than added afterwards. When building lookalike audiences for paid media, the segment is activated to Meta or Google via hashed email and phone -- no cookies or device IDs cross the platform boundary.
Segment membership updates in real time for event-triggered conditions as the matching events arrive: a customer who completes a purchase exits the "no purchase in 30 days" win-back segment and enters the "recent purchaser -- drive second purchase" segment within seconds. Scheduled refresh (hourly or nightly cron) is available for computationally expensive segments that don't need real-time updates. Segment size estimates are calculated and displayed before the segment is saved so the marketing team knows whether they have built an audience of 50,000 or 500 before designing a campaign around it. Segment overlap analysis shows how much two segments share, preventing the same contacts from receiving conflicting messages from two different campaigns.
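Event-triggered membership can be sketched as re-evaluating each rule against the profile before and after an update and diffing the two results. The segment names and rules below are hypothetical, and measuring overlap as a share of the smaller segment is one reasonable choice among several (Jaccard similarity is another).

```python
from typing import Callable, Iterable

Rule = Callable[[dict], bool]

# Hypothetical segment rules keyed by segment name.
SEGMENTS = {
    "winback_no_purchase_30d": lambda p: p["days_since_purchase"] > 30,
    "recent_purchaser": lambda p: p["days_since_purchase"] <= 7,
}


def membership(profile: dict) -> set:
    return {name for name, rule in SEGMENTS.items() if rule(profile)}


def membership_changes(before: dict, after: dict) -> tuple:
    """Return (entered, exited) segment names for one profile update."""
    was, now = membership(before), membership(after)
    return now - was, was - now


def estimate_size(rule: Rule, profiles: Iterable[dict]) -> int:
    """Count matching profiles so the size is known before the segment is saved."""
    return sum(1 for p in profiles if rule(p))


def overlap_share(a: set, b: set) -> float:
    """Fraction of the smaller segment's members that are also in the other segment."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))
```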
Audience segment activation to paid media platforms uses server-side APIs rather than pixel-based retargeting audiences, which addresses both data accuracy (server-side events aren't blocked by ad blockers or browser privacy restrictions) and compliance (no third-party cookies, hashed identifiers only). Meta segments are synced via the Meta Conversions API and the Custom Audiences API. Google segments use the Google Ads Customer Match API. LinkedIn Matched Audiences and TikTok Custom Audiences are also supported. Match rates on first-party email and phone hashes typically run 50 to 80%, significantly higher than retargeting pixels that depend on third-party cookie availability.
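Before upload, identifiers are normalised and hashed. A minimal sketch for email addresses only, assuming the common rule that Meta and Google both document for email (trim whitespace, lowercase, then SHA-256 as a hex digest). Phone number formatting rules differ between platforms, so check each platform's documentation before hashing phones. `build_upload_batch` is a hypothetical helper, not part of any platform SDK.

```python
import hashlib


def hash_email_for_upload(email: str) -> str:
    """Trim, lowercase, then SHA-256 the address and return the hex digest."""
    normalised = email.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def build_upload_batch(emails: list) -> list:
    """Hash a list of emails, skipping obviously invalid values and duplicates."""
    seen, batch = set(), []
    for email in emails:
        if "@" not in email:
            continue
        hashed = hash_email_for_upload(email)
        if hashed not in seen:
            seen.add(hashed)
            batch.append(hashed)
    return batch
```

Normalising before hashing matters because `Ana@Example.com` and `ana@example.com` produce entirely different SHA-256 digests; skipping this step silently lowers match rates.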
CRM bidirectional sync with Salesforce and HubSpot keeps both systems current: CDP segment membership flows into CRM as contact properties for sales outreach lists, and CRM deal stage and lifecycle changes flow back into the CDP as profile attributes for marketing suppression (suppress paying customers from acquisition campaigns) and re-engagement (trigger a win-back campaign when a deal goes closed-lost). Email platform sync to Klaviyo, Mailchimp, or Braze creates and updates audience lists in real time as segment membership changes, so campaign lists are always current without a scheduled export.
Data residency controls for EU/US data split ensure that profile data for EU residents is stored in EU-region infrastructure (AWS eu-west-1 or equivalent) and is not transferred to US-region systems in a way that violates GDPR Chapter V transfer requirements. US resident data can be processed in US-region infrastructure without those constraints. The residency configuration is enforced at the profile storage and event pipeline level, not just at the UI access layer.
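Residency enforcement at the storage layer can be sketched as a routing function plus a write guard that every writer calls. This is a minimal illustration under stated assumptions: residency is known as an ISO country code, the EU zone covers the EEA (where GDPR also applies), everyone else goes to the US zone, and `us-east-1` is an assumed US region. All names are hypothetical.

```python
# EU member states plus Iceland, Liechtenstein and Norway (EEA), ISO 3166-1 alpha-2.
EEA_COUNTRIES = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR",
    "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK",
    "SI", "ES", "SE", "IS", "LI", "NO",
})

# Hypothetical region mapping; us-east-1 is an assumed US region.
STORAGE_REGIONS = {"eu": "eu-west-1", "us": "us-east-1"}


def residency_zone(country_code: str) -> str:
    """Assumption: every resident outside the EEA is stored in the US zone."""
    return "eu" if country_code.upper() in EEA_COUNTRIES else "us"


def storage_region(country_code: str) -> str:
    return STORAGE_REGIONS[residency_zone(country_code)]


def check_write(country_code: str, target_region: str) -> None:
    """Guard called by profile-store and pipeline writers before persisting a profile."""
    expected = storage_region(country_code)
    if target_region != expected:
        raise PermissionError(
            f"profile for {country_code} must be written to {expected}, not {target_region}"
        )
```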
The event pipeline uses Kafka as the backbone for event streaming between the ingestion layer and the profile processing layer. Each event passes through a schema validation step using an Apache Avro or JSON Schema registry before it reaches the profile update consumers. Malformed events (missing required fields, incorrect data types, out-of-range values) are routed to a dead-letter queue with the validation failure reason attached, so the engineering team can review and replay them after fixing the source. Events that fail schema validation never reach the unified profile store.
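The validate-then-route step can be sketched without a registry client. A minimal illustration, assuming a hand-written schema dict in place of a registered Avro or JSON Schema; `ORDER_COMPLETED_SCHEMA`, `RANGE_RULES`, and `route` are hypothetical names, and in the real pipeline the dead-letter list would be a Kafka topic.

```python
# Hypothetical schema for an "order_completed" event: field -> (type, required).
ORDER_COMPLETED_SCHEMA = {
    "event_id": (str, True),
    "user_id": (str, True),
    "revenue": (float, True),
    "coupon_code": (str, False),
}
# Hypothetical range rules, applied only after the type check passes.
RANGE_RULES = {"revenue": lambda v: v >= 0}


def validation_errors(event: dict, schema: dict) -> list:
    errors = []
    for name, (expected_type, required) in schema.items():
        if name not in event:
            if required:
                errors.append(f"missing required field: {name}")
            continue
        value = event[name]
        if not isinstance(value, expected_type):
            errors.append(f"{name}: expected {expected_type.__name__}, got {type(value).__name__}")
        elif name in RANGE_RULES and not RANGE_RULES[name](value):
            errors.append(f"{name}: out of range ({value!r})")
    return errors


def route(events: list, schema: dict) -> tuple:
    """Split events into (valid, dead_letter); dead-letter entries keep the failure reasons."""
    valid, dead_letter = [], []
    for event in events:
        errors = validation_errors(event, schema)
        if errors:
            dead_letter.append({"event": event, "errors": errors})
        else:
            valid.append(event)
    return valid, dead_letter
```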
Deduplication uses event IDs with an idempotency window to catch duplicate events from retry logic, client-side double-fires on slow connections, and SDK retransmissions after a network interruption. An event with a previously seen event ID within the deduplication window is discarded without being processed again.
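A minimal sketch of windowed deduplication, assuming a single consumer, an in-memory store, and non-decreasing processing timestamps in seconds. A production pipeline would keep this state in a keyed, fault-tolerant state store so it survives restarts. `Deduplicator` is a hypothetical name.

```python
from collections import OrderedDict


class Deduplicator:
    """Drops events whose ID was already seen within the last `window_seconds`."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        # event_id -> first-seen time; insertion order is oldest first
        # because `now` is a non-decreasing processing time.
        self._seen = OrderedDict()

    def _evict_expired(self, now: float) -> None:
        while self._seen:
            _, first_seen = next(iter(self._seen.items()))
            if now - first_seen <= self.window_seconds:
                break
            self._seen.popitem(last=False)

    def is_duplicate(self, event_id: str, now: float) -> bool:
        self._evict_expired(now)
        if event_id in self._seen:
            return True
        self._seen[event_id] = now
        return False
```

Evicting expired IDs bounds memory to the IDs seen in one window; the trade-off is that a retransmission arriving after the window closes is processed again.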
PII detection and masking runs as a stream processing step on the Kafka pipeline. Fields configured as PII (email, phone, name, IP address) are either hashed before storage using SHA-256 (for use as identity resolution keys) or masked in the event log while the raw value is stored only in the identity graph under access control. Data lineage tracking records the source event and timestamp for each profile attribute update so the provenance of any computed value is auditable -- useful for debugging incorrect segment membership and for demonstrating GDPR data minimisation compliance.
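A minimal sketch of the masking step and a lineage record, assuming a hypothetical per-field PII configuration. It shows only what reaches the event log; storing the raw values in the access-controlled identity graph is not shown.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical PII configuration.
HASH_FIELDS = {"email", "phone"}      # hashed: still usable as identity resolution keys
MASK_FIELDS = {"name", "ip_address"}  # masked: not needed in the event log


def sanitise_for_event_log(event: dict) -> dict:
    """Return a copy of the event that is safe to write to the event log."""
    clean = {}
    for key, value in event.items():
        if key in HASH_FIELDS:
            clean[key] = hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()
        elif key in MASK_FIELDS:
            clean[key] = "***"
        else:
            clean[key] = value
    return clean


def lineage_record(profile_id: str, attribute: str, value, source_event_id: str,
                   updated_at: datetime) -> dict:
    """One audit row per profile attribute update, linking the value to its source event."""
    return {
        "profile_id": profile_id,
        "attribute": attribute,
        "value": value,
        "source_event_id": source_event_id,
        "updated_at": updated_at.isoformat(),
    }
```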
Schema evolution is managed through the Avro or JSON Schema registry with forward and backward compatibility enforcement. Adding a new event type or a new optional property does not require a code change or downtime in the consumers. Breaking schema changes are flagged by the registry before they can be pushed, preventing silent data corruption in downstream systems.
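The registry's check can be approximated with a few simplified rules. This sketch assumes schemas are plain dicts of `{field: {"type": ..., "required": ...}}`; it is not Avro's actual schema-resolution algorithm, which reasons about defaults and type promotion, but it shows why adding an optional field passes and adding a required one does not.

```python
def compatibility_problems(old: dict, new: dict) -> list:
    """Simplified forward-and-backward compatibility check between two schema versions."""
    problems = []
    for name, spec in new.items():
        if name not in old:
            if spec["required"]:
                # Backward: new consumers would reject events written with the old schema.
                problems.append(f"added required field {name!r}")
        elif spec["type"] != old[name]["type"]:
            problems.append(f"changed type of {name!r}: {old[name]['type']} -> {spec['type']}")
    for name, spec in old.items():
        if name not in new and spec["required"]:
            # Forward: old consumers would reject events written with the new schema.
            problems.append(f"removed required field {name!r}")
    return problems
```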
Segment size trends track audience growth and decay over time, making it visible when a win-back segment is growing (more customers lapsing) or when a post-purchase segment is shrinking (fewer new buyers entering). Profile coverage metrics show what percentage of resolved profiles have data from each source -- email platform, ecommerce, CRM, web analytics -- so gaps in the identity graph are visible. A profile with no web behaviour data suggests the SDK is missing from specific pages. A profile with no CRM data suggests the Salesforce sync has a mapping gap. These metrics surface data quality problems before they cause incorrect segment membership.
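Profile coverage reduces to a per-source share. A minimal sketch, assuming each resolved profile is summarised as the set of sources that contributed to it; the source names are hypothetical labels for the four systems listed above.

```python
SOURCES = ("email_platform", "ecommerce", "crm", "web_analytics")


def source_coverage(profiles: list) -> dict:
    """Share of resolved profiles with at least one record from each source.

    Each profile is represented by the set of source names that contributed to it.
    """
    total = len(profiles)
    return {
        source: (sum(1 for sources in profiles if source in sources) / total if total else 0.0)
        for source in SOURCES
    }
```

A low `crm` share next to a high `ecommerce` share, for example, points at the CRM sync mapping rather than at the tracking SDK.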
Activation performance by destination reports match rates for ad platform uploads (the percentage of email/phone hashes that matched a known user in Meta or Google), email engagement and deliverability by segment (open rate, click rate, bounce rate), and CRM sync success rates. Low match rates on a Meta Custom Audience upload usually signal that the email addresses in that segment are old or inactive. This data informs how segments are constructed for activation rather than letting poor match rates silently reduce paid media effectiveness.
LTV distribution by segment identifies which audience definitions correlate with high-value customers -- comparing the median and 90th-percentile LTV of a "high-engagement, no purchase in 60 days" segment against a "low-engagement, no purchase in 60 days" segment tells the marketing team which win-back audience is worth more aggressive spend. CLTV prediction scores (generated by the XGBoost RFM model) are available as a dimension in all segment-level reports. Reporting is designed for the marketing team to understand their data and make decisions from it, not for data analysts to diagnose infrastructure health.
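Median and 90th-percentile LTV per segment can be computed with Python's standard `statistics` module. A minimal sketch, assuming each segment's LTV values are already loaded into memory; the segment name in the usage is hypothetical, and `method="inclusive"` is one interpolation choice among several.

```python
import statistics


def ltv_summary(ltvs: list) -> dict:
    """Median and 90th-percentile LTV for one segment's members."""
    if len(ltvs) < 2:
        raise ValueError("need at least two profiles to compute percentiles")
    # n=10 returns the nine decile cut points; index 8 is the 90th percentile.
    deciles = statistics.quantiles(ltvs, n=10, method="inclusive")
    return {"profiles": len(ltvs), "median": statistics.median(ltvs), "p90": deciles[8]}


def compare_segments(segment_ltvs: dict) -> dict:
    return {name: ltv_summary(values) for name, values in segment_ltvs.items()}
```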
Frequently asked questions
A CDP collects customer data from multiple sources (web analytics, mobile apps, ecommerce, CRM, email platform, support tools), resolves identity across those sources using deterministic and probabilistic matching, and makes a unified customer profile available to marketing and analytics tools in real time. The distinction from a data warehouse is that the CDP is operational -- it powers live segmentation and activation -- rather than analytical. The distinction from a CRM is that the CDP ingests behavioural event data at scale and resolves anonymous-to-known identity, which CRMs were not designed for.
You need a CDP when the absence of a unified customer record is causing operational problems. The most common signals: campaigns being sent to churned customers because the email platform and ecommerce system don't sync in real time; personalisation requiring a data analyst to join tables before the campaign team can brief the creative; churn suppression lists that are a week out of date because they are built from a weekly export. If those problems are happening, the CDP addresses the root cause.
If your marketing team can self-service segment and activate audiences from your current stack without a data team request, a CDP is not your immediate priority. If they cannot -- if segmentation requires SQL, if activation requires a manual export-and-upload cycle, if the same customer appears in multiple systems under different records -- a CDP likely is. The build-vs-buy question also matters: Segment and RudderStack are SaaS CDPs that can be deployed quickly but impose usage-based pricing and limited schema flexibility. A custom CDP is the right choice when your data model is complex, your event volume makes SaaS CDP pricing expensive, or you need full data ownership without a third-party processor holding your customer data.
Identity resolution combines deterministic and probabilistic matching. Deterministic matching links records that share a known identifier: the same SHA-256 hash of an email address, phone number, or customer ID found on two anonymous or known user records is enough to merge them. This is the primary resolution mechanism because it is high-confidence -- two records sharing a hashed email are almost certainly the same person.
When an anonymous user on a new device logs in, their pre-login event history on that device is stitched to their known profile using the login event's user ID as the bridge identifier. The pre-login anonymous session events (pages viewed, products browsed, content consumed) are merged into the known profile's event history, so the profile reflects the complete journey from first anonymous touch to the logged-in session without a data gap.
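The stitching step can be sketched as re-keying events. A minimal illustration, assuming events are dicts carrying an SDK-assigned `anonymous_id` and, once known, a `user_id`; the function names are hypothetical, and a real implementation would record the link in the identity graph rather than rewrite stored events.

```python
def stitch_on_login(events: list, login_event: dict) -> list:
    """Attach this device's anonymous pre-login events to the user ID on the login event."""
    anonymous_id = login_event["anonymous_id"]
    user_id = login_event["user_id"]  # the bridge identifier
    stitched = []
    for event in events:
        if event.get("anonymous_id") == anonymous_id and event.get("user_id") is None:
            event = {**event, "user_id": user_id}
        stitched.append(event)
    return stitched


def journey(events: list, user_id: str) -> list:
    """The profile's event history in time order, from first anonymous touch onwards."""
    return sorted((e for e in events if e.get("user_id") == user_id), key=lambda e: e["ts"])
```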
Probabilistic matching uses device fingerprints (browser fingerprint, IP address, screen resolution, user agent string) and behavioural pattern similarity to link anonymous sessions that are likely the same person but have not yet shared a known identifier. Probabilistic matches are lower-confidence than deterministic matches and are stored with a confidence score rather than being treated as definitive merges -- they inform lookalike modelling and session analytics but do not overwrite the identity graph until confirmed by a deterministic signal.
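A confidence-scored candidate link can be sketched with a weighted comparison of device signals. The weights, the 0.6 threshold, and the names are hypothetical, and a simple weighted sum stands in for whatever trained model a production system would use; the point is that the output is a `CandidateLink` stored alongside the graph, not a merge.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical signal weights; a real system would learn these from confirmed matches.
SIGNAL_WEIGHTS = {
    "browser_fingerprint": 0.5,
    "ip_address": 0.2,
    "user_agent": 0.2,
    "screen_resolution": 0.1,
}


@dataclass(frozen=True)
class CandidateLink:
    """A probable match, stored with its confidence instead of being merged."""
    session_a: str
    session_b: str
    confidence: float


def match_confidence(a: dict, b: dict) -> float:
    score = sum(
        weight
        for signal, weight in SIGNAL_WEIGHTS.items()
        if a.get(signal) is not None and a.get(signal) == b.get(signal)
    )
    return round(score, 3)  # rounding hides floating-point noise from summing weights


def propose_link(a: dict, b: dict, threshold: float = 0.6) -> Optional[CandidateLink]:
    confidence = match_confidence(a, b)
    if confidence < threshold:
        return None
    return CandidateLink(a["session_id"], b["session_id"], confidence)
```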
The identity graph is updated continuously via the Kafka event pipeline. A customer who uses three devices and two email addresses ends up as one resolved profile with the full event history from all sessions. Profile merges propagate to all downstream systems -- the segment membership engine, the profile API, and the activation destinations -- within seconds of the bridging event arriving.
Standard activation destinations for paid media use server-side APIs rather than pixel-based uploads. Meta segments are synced via the Meta Conversions API (server-side events) and the Custom Audiences API using SHA-256 hashed email and phone. Google segments use the Google Ads Customer Match API with hashed email, phone, and name. TikTok Custom Audiences, LinkedIn Matched Audiences, and Snapchat Customer Lists are activated using the same hashed identifier approach via their respective server-side APIs. Server-side activation avoids the match rate degradation caused by ad blocker and browser privacy restrictions on pixel-based tracking.
For email platforms, we build real-time sync to Klaviyo (via the Klaviyo Profiles and Lists API), Mailchimp (via the Audiences API), and Braze (via the Users Track API and Segment Membership API). Segment membership changes in the CDP are reflected in the email platform within seconds, so triggered campaign eligibility is always based on current behaviour.
For CRM, we build bidirectional sync with Salesforce (via the Salesforce Bulk API for large-scale updates and the REST API for real-time updates) and HubSpot (via the Contacts API and Lists API). CRM sync enables sales suppression (remove a contact from marketing nurture when a sales deal is created) and re-engagement (trigger a win-back campaign when a Salesforce deal goes closed-lost). For custom destinations, a webhook delivery mechanism sends segment membership add and remove events to any API endpoint. Activation logs record the timestamp and result of every activation event per destination for audit and troubleshooting.
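The custom webhook destination can be sketched with the standard library. A minimal illustration, assuming a JSON payload shape of our own choosing (`segment.add` / `segment.remove`); the payload fields and function names are hypothetical, and the HTTP sender is injectable so delivery and the activation log can be exercised without a network.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Callable


def membership_payload(profile_id: str, segment: str, action: str, occurred_at: datetime) -> dict:
    if action not in ("add", "remove"):
        raise ValueError(f"unknown action: {action}")
    return {
        "type": f"segment.{action}",
        "profile_id": profile_id,
        "segment": segment,
        "occurred_at": occurred_at.isoformat(),
    }


def http_post(url: str, body: bytes) -> int:
    """Default sender; urlopen raises HTTPError (an OSError subclass) for 4xx/5xx responses."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def deliver(payload: dict, url: str, activation_log: list,
            send: Callable[[str, bytes], int] = http_post) -> bool:
    """Send one membership event and append the outcome to the activation log."""
    body = json.dumps(payload).encode("utf-8")
    try:
        status = send(url, body)
        ok, result = 200 <= status < 300, f"HTTP {status}"
    except OSError as exc:
        ok, result = False, f"error: {exc}"
    activation_log.append({
        "destination": url,
        "segment": payload["segment"],
        "profile_id": payload["profile_id"],
        "attempted_at": datetime.now(timezone.utc).isoformat(),
        "ok": ok,
        "result": result,
    })
    return ok
```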
A CRM (Salesforce, HubSpot) is a sales workflow tool that tracks customer relationships, deal stages, and sales team activities. It stores records that a human sales rep maintains: contact details, account history, pipeline stages. It was not designed to ingest millions of behavioural events per day, resolve anonymous-to-known identity, or make real-time segment membership decisions. Attempting to use a CRM as a CDP typically results in a bloated contact database with incomplete data and segmentation that runs on stale weekly exports.
A data warehouse (Snowflake, BigQuery, Redshift) stores historical data from all your systems for analytical SQL queries. It is excellent for answering retrospective questions: what was our revenue by customer segment last quarter? But it is not designed for real-time profile updates, sub-second segment membership evaluation, or pushing audiences to Meta at the moment a customer's behaviour changes. Query latency and the separation between storage and compute make data warehouses unsuitable as the operational layer for live marketing activation.
A CDP sits between these two layers. It ingests events from web, mobile, and server-side sources in real time via a Kafka-backed streaming pipeline. It resolves identity continuously using the identity graph. It maintains a queryable unified profile per customer that is always current. And it activates segments to downstream tools via server-side APIs rather than requiring manual exports. The CDP does not replace the CRM or data warehouse -- it reads from the CRM (for contact attributes and deal stage) and writes to the data warehouse (for historical analysis). It fills the operational gap between the two: the live, resolved, actionable customer record that neither the CRM nor the warehouse provides.
What clients say
Three-year average engagement. Founders and operators describing the work in their own words. No marketing varnish.

RaftLabs was outstanding at addressing our complex platform needs, delivering a stable, high-performance loyalty application that has been genuinely loved by the customers.