How to Build a Messaging App Like WhatsApp: The Engineering Reality

RaftLabs builds messaging MVPs in 10–16 weeks for $60K–$130K using WebSockets, encryption, and push notifications. Hard problems: offline delivery guarantees, key management, push reliability across iOS and Android. Build for compliance-heavy contexts, not to compete with WhatsApp.

Key Takeaways

  • Real-time messaging requires WebSockets, not REST APIs. Each user maintains a persistent WebSocket connection. Redis Pub/Sub routes messages across server instances when users connect to different servers.

  • Message delivery guarantees for offline users are harder to build than real-time messaging. You need a delivery queue, delivery acknowledgments, and push notifications working together reliably.

  • End-to-end encryption is the right choice for privacy-sensitive and regulated use cases. For enterprise tools where IT needs admin access, server-side encryption with controlled visibility is often the better fit.

  • Media handling (photos, videos, voice notes) generates real and growing storage and bandwidth costs. Plan your CDN delivery and compression strategy before launch, not after.

  • Building custom messaging makes sense when you have compliance requirements, deep platform integration needs, or a specific context where generic apps fail. Otherwise, Sendbird or Stream will get you there in days.

You are not building the next WhatsApp. Nobody is — that ship sailed in 2014 when Facebook paid $19 billion for it. What you might be building is a messaging product for a specific context where WhatsApp does not fit. A healthcare platform where HIPAA compliance is mandatory. An enterprise tool where IT needs admin controls. A logistics product where field team communication is broken.

The engineering fundamentals are the same. The product decisions are completely different. This guide covers both.

According to Statista, the global enterprise messaging market is projected to reach $115 billion by 2030, growing at 14.5% annually. Most of that growth will not come from new consumer apps. It will come from embedded messaging in industry-specific platforms.

What Makes Messaging Hard

Consumer messaging apps look deceptively simple. Send a message, see it appear on the other end. The complexity is in what happens in between.

Delivery guarantees: Messages must be delivered reliably, even when recipients are offline, switch networks, or close the app. This requires queuing, acknowledgment protocols, and push notification backup.

Real-time at scale: Persistent WebSocket connections to thousands of simultaneous users require careful server architecture. Each server instance only knows about connections to itself, so you need a message broker (Redis Pub/Sub or a dedicated system) to route messages across server instances.

Ordering: Messages must arrive in the correct order. In distributed systems under load, this is not automatic.

Offline sync: When a user comes back online after being offline for hours, they need to receive all messages they missed, in order, without duplicates.

These are solved problems. Libraries, services, and proven patterns exist for all of them. But you need to make the right architectural choices upfront. These are hard to retrofit.
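As one example of those patterns, ordering and offline sync are often handled together: the server assigns a sequence number per conversation, and each client remembers the last number it has seen. The sketch below is a minimal in-memory illustration under that assumption. `ConversationLog` and `ClientInbox` are hypothetical names, and a real build would back the log with the database.

```typescript
// Hypothetical types: an in-memory stand-in for a per-conversation message table.
type StoredMessage = { seq: number; body: string };

class ConversationLog {
  private messages: StoredMessage[] = [];
  private nextSeq = 1;

  // The server assigns the sequence number, so order does not depend on client clocks.
  append(body: string): StoredMessage {
    const message: StoredMessage = { seq: this.nextSeq, body };
    this.nextSeq += 1;
    this.messages.push(message);
    return message;
  }

  // Everything after the client's cursor, oldest first.
  since(cursor: number): StoredMessage[] {
    return this.messages.filter((m) => m.seq > cursor);
  }
}

class ClientInbox {
  private lastSeq = 0;
  readonly bodies: string[] = [];

  get cursor(): number {
    return this.lastSeq;
  }

  // Safe to call with overlapping or shuffled batches: duplicates are skipped.
  apply(batch: StoredMessage[]): void {
    const ordered = [...batch].sort((a, b) => a.seq - b.seq);
    for (const message of ordered) {
      if (message.seq <= this.lastSeq) continue; // already applied
      this.bodies.push(message.body);
      this.lastSeq = message.seq;
    }
  }
}
```

On reconnect the client would call something like `inbox.apply(log.since(inbox.cursor))`. This sketch assumes each batch is contiguous from the cursor; it does not detect gaps, which a production client would need to handle by re-requesting the missing range.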

Core Features for a Messaging MVP

1-on-1 Messaging

Text messages with delivery status — sent, delivered, read. Reactions and replies are v2. Keep v1 simple: message sent, message delivered, message seen.
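Status updates can arrive late or out of order (a "delivered" receipt after a "read" receipt, for example), so one common approach is to let a message's status only move forward. The sketch below assumes the three states named above; `advanceStatus` is a hypothetical helper, not part of any library.

```typescript
// The three v1 states, in the only order they may progress.
const STATUS_ORDER = ["sent", "delivered", "read"] as const;
type MessageStatus = (typeof STATUS_ORDER)[number];

// Returns the new status, ignoring any update that would move backward.
function advanceStatus(current: MessageStatus, incoming: MessageStatus): MessageStatus {
  const movesForward = STATUS_ORDER.indexOf(incoming) > STATUS_ORDER.indexOf(current);
  return movesForward ? incoming : current;
}
```

With this rule, a stale "delivered" update applied to a message already marked "read" leaves it as "read".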

Group Messaging

Groups with participant management (add/remove members), group name, and group admin controls. Decide your group size limit early. Groups of 10 behave very differently from groups of 1,000 in terms of server load and message ordering.
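A rough way to see why group size matters is to count deliveries: each message in a group has to reach every other member. The helper below is a back-of-the-envelope sketch; the function name and the 20 messages per minute used in the example are illustrative assumptions, not measurements.

```typescript
// Deliveries per minute for one group: every message goes to every other member.
function groupDeliveriesPerMinute(groupSize: number, messagesPerMinute: number): number {
  return messagesPerMinute * Math.max(groupSize - 1, 0);
}

const smallGroup = groupDeliveriesPerMinute(10, 20);   // 20 * 9   = 180
const largeGroup = groupDeliveriesPerMinute(1000, 20); // 20 * 999 = 19,980
```

At the same chat rate, the 1,000-member group produces about 111 times the delivery work of the 10-member group.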

Push Notifications

The most important background feature. When a user is not in the app, they need to receive a push notification for new messages. iOS and Android handle push notifications differently: iOS notifications go through Apple Push Notification service (APNs), which Firebase Cloud Messaging (FCM) relays to on your behalf. FCM gives you one API for both platforms and is the standard choice for v1.
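One way to tie push notifications to delivery guarantees is to keep every message queued until the recipient's device acknowledges it, delivering over the live connection when there is one and falling back to a push otherwise. The sketch below is a minimal in-memory illustration, not production code. `DeliveryQueue`, `deliver`, and `push` are hypothetical names, and the `push` callback stands in for a call to a push provider such as FCM.

```typescript
type OutboundMessage = { id: string; to: string; body: string };
type Sink = (msg: OutboundMessage) => void;

class DeliveryQueue {
  // Messages stay here until the recipient's device acknowledges them.
  private pending = new Map<string, OutboundMessage[]>();
  private online = new Set<string>();
  private deliver: Sink; // assumption: writes to the user's open connection
  private push: Sink;    // assumption: hands off to a push provider

  constructor(deliver: Sink, push: Sink) {
    this.deliver = deliver;
    this.push = push;
  }

  send(msg: OutboundMessage): void {
    const queue = this.pending.get(msg.to) ?? [];
    queue.push(msg);
    this.pending.set(msg.to, queue);
    if (this.online.has(msg.to)) this.deliver(msg);
    else this.push(msg);
  }

  // On reconnect, replay everything that has not been acknowledged.
  connect(userId: string): void {
    this.online.add(userId);
    for (const msg of this.pending.get(userId) ?? []) this.deliver(msg);
  }

  disconnect(userId: string): void {
    this.online.delete(userId);
  }

  // Only an explicit acknowledgment removes a message from the queue.
  ack(userId: string, messageId: string): void {
    const queue = this.pending.get(userId) ?? [];
    this.pending.set(userId, queue.filter((m) => m.id !== messageId));
  }

  pendingCount(userId: string): number {
    return (this.pending.get(userId) ?? []).length;
  }
}
```

This gives at-least-once delivery: a message that was sent but not yet acknowledged is replayed on the next connect, so the client has to ignore message IDs it has already seen.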

Media Sharing

Photos, documents, and voice notes are expected in any modern messaging app. Each introduces complexity: file size limits, CDN delivery, thumbnail generation, audio playback controls. Plan your media storage and delivery architecture before launch rather than retrofitting it afterward.

Contact Discovery

How do users find each other? By phone number (WhatsApp's approach), by username, by email, or through an invitation system? This decision shapes your entire onboarding flow and user network. The answer depends on your product context. An enterprise tool with known users is different from a community platform with open discovery.

What to Skip in v1

  • Voice and video calls (use a third-party service instead — see below)

  • Message forwarding and broadcast lists

  • Status and stories features

  • Desktop and web apps (mobile-first)

  • Rich link previews (use a third-party metadata service instead)

  • End-to-end encryption (unless regulated — complex key management, add in v2 if not required from day one)

RaftLabs has seen this pattern across 12+ messaging builds: teams that try to include all of these in v1 ship nothing usable in the first six months.

The Encryption Decision

If you are building for a regulated industry — healthcare, legal, financial services — end-to-end encryption is not optional. You need it from day one to meet compliance requirements.

If you are building for enterprise teams with IT admin requirements, you may need the opposite: message archiving, admin visibility, and compliance export. These are architecturally incompatible with true end-to-end encryption.

Make this decision before you start. It determines your entire key management and storage architecture.

For most consumer-facing apps: implement the Signal Protocol. It is open source, well-documented, and available as a library.

For enterprise with admin controls: store messages server-side, encrypted at rest, with admin access maintained through key management.

The Architecture Decisions

WebSockets vs. Polling

Real-time messaging requires WebSockets. Polling — asking the server every few seconds "any new messages?" — creates latency and server load that do not scale. WebSockets maintain a persistent connection and deliver messages the instant they arrive.

With each client polling once per second, 10,000 concurrent users generate roughly 600,000 HTTP requests per minute. WebSockets replace that with 10,000 persistent connections. The infrastructure cost difference is significant.
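The 600,000 figure corresponds to every client polling once per second; slower polling lowers the request count but adds latency. The arithmetic can be sketched as follows (the function name is illustrative):

```typescript
// Requests per minute when every connected user polls on a fixed interval.
function pollingRequestsPerMinute(users: number, pollIntervalSeconds: number): number {
  return (users * 60) / pollIntervalSeconds;
}

const everySecond = pollingRequestsPerMinute(10_000, 1);       // 600,000
const everyThreeSeconds = pollingRequestsPerMinute(10_000, 3); // 200,000
```

Even at a three-second interval, where messages can wait up to three seconds before they are seen, the server still handles 200,000 requests per minute, most of which return nothing new.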

Message Storage

Messages need to be stored server-side for delivery to offline users, cross-device sync, and message history. PostgreSQL handles this straightforwardly. The question is how long you retain messages — and for end-to-end encrypted apps, whether the server can read them at all.

Media Storage

Use S3 or equivalent object storage for media files, delivered via a CDN (CloudFront, Cloudflare). Generate thumbnails server-side on upload. Set file size limits early; 10 MB per file is a common starting point.
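Limits like these are easiest to enforce with a check that runs before an upload is accepted. The sketch below assumes a 10 MB cap and a small allow-list of content types. The names (`checkUpload`, `ALLOWED_TYPES`) and the specific types listed are illustrative assumptions, not a recommendation for any particular product.

```typescript
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024; // 10 MB, counted in binary megabytes
const ALLOWED_TYPES = ["image/jpeg", "image/png", "audio/ogg", "application/pdf"];

type UploadRequest = { fileName: string; contentType: string; sizeBytes: number };
type UploadDecision =
  | { ok: true; needsThumbnail: boolean }
  | { ok: false; reason: string };

// Decide whether to accept an upload, and whether a thumbnail job should follow.
function checkUpload(req: UploadRequest): UploadDecision {
  if (req.sizeBytes <= 0) return { ok: false, reason: "empty file" };
  if (req.sizeBytes > MAX_UPLOAD_BYTES) return { ok: false, reason: "file too large" };
  if (!ALLOWED_TYPES.includes(req.contentType)) {
    return { ok: false, reason: "unsupported type" };
  }
  // Only images get server-side thumbnails in this sketch.
  return { ok: true, needsThumbnail: req.contentType.startsWith("image/") };
}
```

A client-declared size and content type can be wrong, so a real service would re-check both on the server once the bytes arrive.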

Horizontal Scaling

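Your chat servers need to scale horizontally. Redis Pub/Sub routes messages between server instances. When user A (connected to server 1) sends a message to user B (connected to server 3), server 1 publishes the message to Redis, and server 3, which subscribes to the same channel, delivers it over B's connection. Redis Pub/Sub does not store messages for subscribers that are not connected, so guaranteed delivery still depends on the queue and acknowledgments described earlier. Without this routing layer, horizontal scaling breaks message delivery.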
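The routing pattern can be illustrated without any infrastructure. In the sketch below, `InMemoryBroker` is a hypothetical stand-in for Redis Pub/Sub (it is not the Redis client API), and each `ChatServer` knows only about its own connections, represented here as plain callbacks instead of real WebSockets.

```typescript
type Envelope = { from: string; to: string; body: string };
type Handler = (msg: Envelope) => void;

// Stand-in for a pub/sub broker: every subscriber on a channel sees every message.
class InMemoryBroker {
  private handlers = new Map<string, Handler[]>();

  subscribe(channel: string, handler: Handler): void {
    const list = this.handlers.get(channel) ?? [];
    list.push(handler);
    this.handlers.set(channel, list);
  }

  publish(channel: string, msg: Envelope): void {
    for (const handler of this.handlers.get(channel) ?? []) handler(msg);
  }
}

class ChatServer {
  // Only the users connected to this instance.
  private sockets = new Map<string, Handler>();
  private broker: InMemoryBroker;

  constructor(broker: InMemoryBroker) {
    this.broker = broker;
    broker.subscribe("chat", (msg) => this.deliverLocal(msg));
  }

  connect(userId: string, socket: Handler): void {
    this.sockets.set(userId, socket);
  }

  // The sending server does not need to know where the recipient is connected.
  send(msg: Envelope): void {
    this.broker.publish("chat", msg);
  }

  // Each server delivers only if the recipient is one of its own connections.
  private deliverLocal(msg: Envelope): void {
    const socket = this.sockets.get(msg.to);
    if (socket) socket(msg);
  }
}
```

Broadcasting every message to every server is the simplest version of the pattern. A larger deployment might publish to per-user or per-server channels instead, so that each instance receives only the traffic it can deliver.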

Tech Stack

| Layer | Choice |
| --- | --- |
| Mobile apps | React Native or Flutter |
| Backend | Node.js with Socket.io |
| Database | PostgreSQL |
| Real-time routing | Redis Pub/Sub |
| Push notifications | Firebase Cloud Messaging |
| Media storage | AWS S3 + CloudFront CDN |
| Encryption | Signal Protocol (libsignal) |
| Hosting | AWS or GCP |

Cost to Build

| Scope | Timeline | Cost |
| --- | --- | --- |
| MVP (1-on-1, groups, media) | 10–16 weeks | $60K–$130K |
| With end-to-end encryption | 14–20 weeks | $100K–$180K |
| With voice and video calling | Add 6–10 weeks | Add $50K–$100K |

Monthly operating costs scale with message volume and media storage. At small scale — under 10,000 active users — $2K–$5K per month is typical. Media-heavy usage (video sharing) can push that to $10K–$15K at the same user volume.

When Building Your Own Makes Sense

Building a custom messaging layer makes sense when:

  • You have compliance requirements (HIPAA, FINRA, legal holds)

  • You need deep integration with your existing platform (embedded chat in your SaaS product)

  • You are building for a specific context where generic apps create friction (healthcare coordination, logistics team communication)

It does not make sense when you just want chat in your app. In that case, Sendbird, Stream, or Twilio Conversations will get you there in days, not months, at a predictable monthly cost. The break-even point between a third-party service and a custom build is roughly 50,000 monthly active users for most pricing models.

Compliance-Specific Considerations

Healthcare (HIPAA)

Any messaging that includes Protected Health Information (PHI) falls under HIPAA. That covers patient names, appointment details, diagnosis references, and medication mentions. HIPAA does not ban messaging. It requires that your architecture, encryption, access controls, and audit logging meet specific standards.

A non-compliant messaging feature in a healthcare product is not just a legal risk. It is a reason enterprise health systems will not sign a contract.

Enterprise with IT Admin Requirements

Enterprise IT teams require message archiving, eDiscovery export, user provisioning via SCIM, single sign-on via SAML, and the ability to remotely wipe messages from a departed employee's device. End-to-end encryption in the Signal Protocol sense is incompatible with these requirements.

The architectural choice here is: server-side encryption at rest with key management that allows authorized admin access, combined with full audit logging of access to that key material.
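The audit-logging half of that choice can be sketched as a wrapper that refuses admin reads without a role and a stated reason, and records each successful read before returning data. Everything here is a hypothetical illustration: `AuditedArchive` and its methods are invented names, storage is in memory, and encryption at rest and key management are deliberately left out.

```typescript
type AuditEntry = { adminId: string; conversationId: string; reason: string; at: string };

class AuditedArchive {
  // conversationId -> message bodies (assumption: already decrypted for this sketch)
  private archive = new Map<string, string[]>();
  private admins: Set<string>;
  readonly auditLog: AuditEntry[] = [];

  constructor(adminIds: string[]) {
    this.admins = new Set(adminIds);
  }

  store(conversationId: string, body: string): void {
    const messages = this.archive.get(conversationId) ?? [];
    messages.push(body);
    this.archive.set(conversationId, messages);
  }

  // The audit entry is written before any data leaves the archive.
  readAsAdmin(adminId: string, conversationId: string, reason: string): string[] {
    if (!this.admins.has(adminId)) throw new Error("not authorized");
    if (reason.trim() === "") throw new Error("a reason is required");
    this.auditLog.push({ adminId, conversationId, reason, at: new Date().toISOString() });
    return [...(this.archive.get(conversationId) ?? [])];
  }
}
```

A production version would also record denied attempts and write the log to append-only storage that admins themselves cannot edit.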

How RaftLabs Approaches This

The first question RaftLabs asks on every messaging build: what is the compliance context? Healthcare, legal, enterprise with IT requirements — each changes the architecture significantly before a line of product code is written.

The second question: what is the network model? Phone-number discovery, username-based, organization/tenant-based? This shapes onboarding, user management, and scalability planning.

RaftLabs builds messaging infrastructure as part of larger platforms — field service apps, healthcare coordination tools, marketplace operator communication, internal enterprise tools. Standalone consumer messaging apps competing with WhatsApp are not a problem RaftLabs solves.

If you are building messaging as a feature of a larger product where generic tools do not fit, we should talk.

Frequently asked questions

How long does it take to build a messaging app?
A messaging MVP with 1-on-1 and group chat, push notifications, and basic media sharing takes 10–16 weeks with a team of 3–5 developers. Adding voice and video calls extends the timeline by 6–10 weeks. Adding end-to-end encryption and compliance features such as message archiving and admin controls adds 4–8 weeks on top of that.

How much does it cost to build and run a messaging app?
MVP development runs $60K–$130K. Monthly operating costs range from $2K–$15K for a small active user base, scaling with message volume and media storage. Photos and videos are the variable costs that grow fastest with user activity. Voice and video calling infrastructure adds significant ongoing cost and is best handled by a third-party service in v1.

How does real-time messaging work?
WebSockets maintain a persistent connection between client and server. When a user sends a message, it pushes through the WebSocket to the server, which immediately pushes it to the recipient's open connection. If the recipient is offline, the message queues and delivers on reconnect, with a push notification triggered in parallel. Servers need Redis Pub/Sub to route messages correctly when users connect to different server instances.

What is the Signal Protocol, and when should you build it in?
The Signal Protocol is an open-source end-to-end encryption protocol used by WhatsApp and Signal. It provides forward secrecy (each message uses a different encryption key) and deniability. Build it in v1 if you are building for healthcare, legal, financial, or any privacy-regulated context. For internal enterprise tools where IT needs admin access, server-side encryption with controlled admin visibility is often a better architectural fit. These two approaches are fundamentally incompatible, so the decision must be made before writing a line of code.

Should you build voice and video calling yourself?
Almost certainly not. WebRTC is complex to build reliably across mobile platforms and network conditions. It adds significant cost and timeline to a v1 build. Use Daily.co, Agora, or Twilio Video for v1. Build native calling only when you have proven demand and the usage volume to justify the engineering investment.
