
WhatsApp Architecture Case Study

System Architecture · Intermediate
Case Study System Design


How WhatsApp handles 100+ billion messages daily with remarkable efficiency.

Architecture Overview

WhatsApp is known for an unusually lean architecture: a small engineering team (about 50 engineers in 2014) operated a service at massive scale.

┌─────────────────────────────────────────────────────────────┐
│                       Mobile Clients                        │
│                (iOS, Android, Web, Desktop)                 │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                       Load Balancers                        │
│                  (Geographic Distribution)                  │
└──────────────────────────────┬──────────────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         ▼                     ▼                     ▼
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│ Connection      │   │ Message         │   │ Media Storage   │
│ Servers         │   │ Routing         │   │ (S3/CDN)        │
│ (XMPP/Noise)    │   │ Servers         │   │                 │
└─────────────────┘   └─────────────────┘   └─────────────────┘

Core Technology Stack

Erlang/OTP

WhatsApp’s backend is primarily built on Erlang, chosen for:

  1. Concurrency: Lightweight processes (millions per server)
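  2. Fault Tolerance: “Let it crash” philosophy (supervisor processes restart a failed process with clean state instead of every process handling every error defensively)
  3. Hot Code Swapping: Update without downtime
  4. Distributed Computing: Built-in distribution (processes on different nodes communicate with the same message passing used locally)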
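%% Example: Erlang process handling (route_message/2 is a stub)
-module(message_handler).
-export([start/0, send_message/3]).

%% Spawn a lightweight process that runs the receive loop.
start() ->
    spawn(fun() -> loop() end).

%% Ask the handler process Pid to route Message to recipient To.
send_message(Pid, Message, To) ->
    Pid ! {send, Message, To},
    ok.

loop() ->
    receive
        {send, Message, To} ->
            route_message(Message, To),
            loop();     %% tail call: wait for the next message
        stop ->
            ok          %% the process exits normally
    end.

%% Stub: a real server would forward to the recipient's connection process.
route_message(Message, To) ->
    io:format("routing ~p to ~p~n", [Message, To]).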
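The “let it crash” idea relies on OTP supervisors restarting a failed process with fresh state rather than repairing it in place. A rough C# analogue follows; `Supervisor.RunAsync` is a hypothetical helper (not an OTP or .NET API), and unlike a real OTP supervisor, which limits restarts within a time window (intensity and period), this sketch only counts total restarts.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper loosely modelled on an OTP one-for-one supervisor.
// Simplification: counts total restarts only, with no time window.
public static class Supervisor
{
    // Runs the worker; if it throws, starts it again from scratch.
    // Returns the number of restarts once the worker completes normally.
    // After maxRestarts restarts, the next failure propagates (escalates) to the caller.
    public static async Task<int> RunAsync(Func<Task> worker, int maxRestarts)
    {
        int restarts = 0;
        while (true)
        {
            try
            {
                await worker();
                return restarts;
            }
            catch (Exception) when (restarts < maxRestarts)
            {
                // "Let it crash": no repair attempt, just a clean restart.
                restarts++;
            }
        }
    }
}
```

Letting the exception escape once the limit is reached mirrors how OTP passes repeated failures up the supervision tree instead of retrying forever.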

FreeBSD Operating System

  • Highly tuned for networking
  • Better performance than Linux for their workload, in the team's experience at the time
  • Custom kernel optimizations

Key Components

1. Connection Management

  • Protocol: Custom protocol based on XMPP (simplified)
  • Encryption: Signal Protocol end to end between users; Noise Pipes for the client-to-server channel
  • Connections: Long-lived TCP connections
  • Encoding: Compact binary format instead of verbose XML text
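To see why a binary encoding beats XML text, here is a minimal C# sketch of dictionary-based token encoding. The token table, the 0xFF literal marker and the `TokenCodec` name are illustrative assumptions, not WhatsApp's actual wire format: frequent protocol strings become a single byte, and anything else is written as a length-prefixed UTF-8 literal.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Illustrative only: the token table and literal marker are assumptions,
// not WhatsApp's real dictionary or wire format.
public static class TokenCodec
{
    // Frequent protocol strings; each is sent as its one-byte index.
    private static readonly string[] Tokens =
        { "message", "from", "to", "id", "type", "text", "receipt", "delivered", "read" };

    private const byte Literal = 0xFF; // marker: length-prefixed UTF-8 string follows

    public static byte[] Encode(IEnumerable<string> parts)
    {
        using var ms = new MemoryStream();
        foreach (string part in parts)
        {
            int index = Array.IndexOf(Tokens, part);
            if (index >= 0)
            {
                ms.WriteByte((byte)index); // known string: 1 byte
                continue;
            }
            byte[] utf8 = Encoding.UTF8.GetBytes(part);
            if (utf8.Length > 255)
                throw new ArgumentException("Literals longer than 255 bytes are not supported in this sketch.");
            ms.WriteByte(Literal);
            ms.WriteByte((byte)utf8.Length);
            ms.Write(utf8, 0, utf8.Length);
        }
        return ms.ToArray();
    }

    public static List<string> Decode(byte[] data)
    {
        var parts = new List<string>();
        int i = 0;
        while (i < data.Length)
        {
            byte b = data[i++];
            if (b == Literal)
            {
                int length = data[i++];
                parts.Add(Encoding.UTF8.GetString(data, i, length));
                i += length;
            }
            else
            {
                parts.Add(Tokens[b]); // one byte expands back to the full string
            }
        }
        return parts;
    }
}
```

Encoding `message`, `to`, `alice@example`, `type`, `text` with this table takes 19 bytes, versus 30 bytes for the same strings as raw UTF-8, before any XML tags are even added.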

2. Message Flow

┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│  Sender  │───▶│  Server  │───▶│  Server  │───▶│ Receiver │
│  Client  │    │  (Home)  │    │  (Dest)  │    │  Client  │
└──────────┘    └──────────┘    └──────────┘    └──────────┘
                     │
                     ▼
              ┌──────────────┐
              │ Mnesia/MySQL │
              │  (Offline)   │
              └──────────────┘

Message States:

  • Single checkmark: Received by the server
  • Double checkmark: Delivered to the recipient's device
  • Blue checkmarks: Read by recipient
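The checkmark states form a one-way progression. A minimal C# sketch, assuming receipts can arrive out of order (for example, a read receipt before the delivery receipt), so the tracker only ever moves forward; `DeliveryState` and `TrackedMessage` are hypothetical names.

```csharp
// Hypothetical names; the states mirror the checkmarks described above.
public enum DeliveryState { Pending = 0, Sent = 1, Delivered = 2, Read = 3 }

public sealed class TrackedMessage
{
    public DeliveryState State { get; private set; } = DeliveryState.Pending;

    // Applies a receipt; returns false if it would move the state backwards
    // (e.g. a "delivered" receipt arriving after "read").
    public bool Apply(DeliveryState receipt)
    {
        if (receipt <= State) return false;
        State = receipt;
        return true;
    }

    // What the sender's UI would show for each state.
    public static string Indicator(DeliveryState state) => state switch
    {
        DeliveryState.Sent => "single checkmark",
        DeliveryState.Delivered => "double checkmark",
        DeliveryState.Read => "blue checkmarks",
        _ => "not yet sent",
    };
}
```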

3. Data Storage

Component             Storage               Purpose
Messages (offline)    Mnesia → MySQL        Stored until delivered
User profiles         MySQL                 Account data
Media files           Amazon S3             Images, videos, documents
Private keys          Local device only     End-to-end encryption keys (public keys are registered with the server)

4. Media Handling

┌────────────┐    ┌────────────┐    ┌────────────┐
│   Client   │───▶│   Upload   │───▶│    S3      │
│  Uploads   │    │   Server   │    │  Storage   │
└────────────┘    └────────────┘    └────────────┘
                                          │
                                          ▼
┌────────────┐    ┌────────────┐    ┌────────────┐
│   Client   │◀───│    CDN     │◀───│  Generate  │
│  Downloads │    │            │    │    URL     │
└────────────┘    └────────────┘    └────────────┘
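The "Generate URL" step is commonly implemented with short-lived signed URLs, so the CDN can check a request without a database lookup. The source does not describe WhatsApp's exact scheme; this C# sketch assumes a hypothetical format `?expires=<unix seconds>&sig=<hex HMAC-SHA256 of path and expiry>` and a secret shared by the URL generator and the CDN.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical URL format: {baseUrl}{path}?expires={unixSeconds}&sig={hex HMAC-SHA256}
public static class SignedUrl
{
    public static string Create(string baseUrl, string path, byte[] secret, DateTimeOffset expiresAt)
    {
        long expires = expiresAt.ToUnixTimeSeconds();
        return $"{baseUrl}{path}?expires={expires}&sig={Sign(path, expires, secret)}";
    }

    // Run by the CDN edge: reject expired links and forged or altered signatures.
    public static bool Verify(string path, long expires, string sig, byte[] secret, DateTimeOffset now)
    {
        if (now.ToUnixTimeSeconds() > expires) return false;
        byte[] expected = Encoding.ASCII.GetBytes(Sign(path, expires, secret));
        byte[] actual = Encoding.ASCII.GetBytes(sig);
        // Constant-time comparison avoids leaking how many characters matched.
        return CryptographicOperations.FixedTimeEquals(expected, actual);
    }

    // Signs path and expiry together so neither can be changed independently.
    private static string Sign(string path, long expires, byte[] secret)
    {
        using var hmac = new HMACSHA256(secret);
        byte[] mac = hmac.ComputeHash(Encoding.UTF8.GetBytes($"{path}\n{expires}"));
        return Convert.ToHexString(mac).ToLowerInvariant();
    }
}
```

Because the expiry is inside the signed data, a client cannot extend a link's lifetime by editing the `expires` parameter.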

Scalability Strategies

1. Server Efficiency

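  • About 2 million concurrent TCP connections per server (Erlang’s strength)
  • Custom memory management
  • Per-process garbage collection: each Erlang process has its own small heap, so collection pauses stay short and local to one process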

2. Database Optimization

  • Read replicas for scaling reads
  • Sharding by user ID
  • Minimal data storage (messages deleted after delivery)
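Sharding by user ID needs a hash that every server computes identically. In .NET Core and later, `string.GetHashCode()` is randomized per process, so this sketch uses 64-bit FNV-1a instead. `ShardRouter` and the plain modulo placement are illustrative assumptions, not WhatsApp's actual scheme.

```csharp
using System;
using System.Text;

// Illustrative shard placement; the modulo scheme is an assumption.
public static class ShardRouter
{
    private const ulong FnvOffsetBasis = 14695981039346656037UL;
    private const ulong FnvPrime = 1099511628211UL;

    // 64-bit FNV-1a over the UTF-8 bytes of the key. Unlike string.GetHashCode(),
    // which .NET Core randomizes per process, this is identical on every server.
    public static ulong StableHash(string key)
    {
        ulong hash = FnvOffsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(key))
        {
            hash ^= b;
            hash = unchecked(hash * FnvPrime); // wrap-around multiplication is intended
        }
        return hash;
    }

    public static int ShardFor(string userId, int shardCount)
    {
        if (shardCount <= 0) throw new ArgumentOutOfRangeException(nameof(shardCount));
        return (int)(StableHash(userId) % (ulong)shardCount);
    }
}
```

Plain modulo remaps most users whenever `shardCount` changes; consistent hashing or a directory of user-to-shard assignments avoids that at the cost of extra machinery.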

3. Caching

┌─────────────┐     ┌─────────────┐
│   Request   │────▶│  Memcached  │ (Hit: Return)
└─────────────┘     └──────┬──────┘
                           │ (Miss)
                           ▼
                    ┌─────────────┐
                    │    MySQL    │
                    └─────────────┘
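The diagram is the cache-aside pattern: read from the cache, fall back to the database on a miss, then populate the cache. In this C# sketch a `ConcurrentDictionary` stands in for Memcached and a caller-supplied loader stands in for the MySQL query; the class name and the TTL policy are assumptions for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Cache-aside: ConcurrentDictionary stands in for Memcached,
// loadFromDb stands in for a MySQL query.
public sealed class CacheAside<TKey, TValue> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, (TValue Value, DateTimeOffset Expires)> _cache = new();
    private readonly Func<TKey, Task<TValue>> _loadFromDb;
    private readonly TimeSpan _ttl;
    private int _misses;

    public CacheAside(Func<TKey, Task<TValue>> loadFromDb, TimeSpan ttl)
    {
        _loadFromDb = loadFromDb;
        _ttl = ttl;
    }

    public int Misses => Volatile.Read(ref _misses);

    public async Task<TValue> GetAsync(TKey key)
    {
        if (_cache.TryGetValue(key, out var entry) && entry.Expires > DateTimeOffset.UtcNow)
        {
            return entry.Value; // hit: no database round trip
        }
        Interlocked.Increment(ref _misses);
        TValue value = await _loadFromDb(key); // miss: read from the database
        _cache[key] = (value, DateTimeOffset.UtcNow + _ttl);
        return value;
    }

    // Call after writing to the database so readers do not see stale data.
    public void Invalidate(TKey key) => _cache.TryRemove(key, out _);
}
```

This sketch has no stampede protection: concurrent misses for the same key can each query the database.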

End-to-End Encryption

Signal Protocol Implementation

┌─────────────────────────────────────────────────────┐
│                 Key Exchange (X3DH)                 │
├─────────────────────────────────────────────────────┤
│  1. Identity Key (long-term)                        │
│  2. Signed Pre-Key (medium-term)                    │
│  3. One-Time Pre-Keys (single use)                  │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│              Double Ratchet Algorithm               │
├─────────────────────────────────────────────────────┤
│  - Forward secrecy                                  │
│  - Break-in recovery                                │
│  - Per-message keys                                 │
└─────────────────────────────────────────────────────┘
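The "per-message keys" and "forward secrecy" items come from the symmetric-key ratchet inside the Double Ratchet. The Signal specification recommends HMAC for this step, with the inputs 0x01 (message key) and 0x02 (next chain key) as its example, and WhatsApp's security whitepaper describes the same derivation. The C# sketch below shows only that chain step; the Diffie-Hellman ratchet (which provides break-in recovery), X3DH and the actual message encryption are omitted, and `SymmetricRatchet` is a hypothetical name.

```csharp
using System;
using System.Security.Cryptography;

// Symmetric-key (KDF chain) step of the Double Ratchet only; no DH ratchet,
// no X3DH and no message encryption. Class name is hypothetical.
public sealed class SymmetricRatchet
{
    private byte[] _chainKey;

    public int Counter { get; private set; }

    public SymmetricRatchet(byte[] initialChainKey)
    {
        if (initialChainKey.Length != 32)
            throw new ArgumentException("Chain key must be 32 bytes.", nameof(initialChainKey));
        _chainKey = (byte[])initialChainKey.Clone(); // keep the caller's buffer untouched
    }

    // Returns the key for the next message and advances the chain.
    public byte[] NextMessageKey()
    {
        using var hmac = new HMACSHA256(_chainKey);
        byte[] messageKey = hmac.ComputeHash(new byte[] { 0x01 });   // message key
        byte[] nextChainKey = hmac.ComputeHash(new byte[] { 0x02 }); // next chain key
        CryptographicOperations.ZeroMemory(_chainKey); // erase the old chain key
        _chainKey = nextChainKey;
        Counter++;
        return messageKey;
    }
}
```

Because HMAC cannot be run backwards and each old chain key is erased, an attacker who captures the current chain key still cannot recompute the keys of earlier messages.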

Group Messaging

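  • Sender Keys for efficiency: the sender encrypts each group message once, and the server fans out the same ciphertext to every member
  • Each member generates its own Sender Key and distributes it to the other members over pairwise encrypted sessions
  • Server cannot decrypt messages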

Performance Metrics

Metric                  Value
Daily Messages          100+ billion
Monthly Active Users    2+ billion
Engineers (2014)        ~50
Servers (2014)          ~550
Messages/second         1+ million
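As a consistency check, the first and last rows agree: spreading the daily volume evenly over a day gives the average rate (peak traffic is higher).

```latex
\frac{100 \times 10^{9}\ \text{messages/day}}{86\,400\ \text{s/day}} \approx 1.16 \times 10^{6}\ \text{messages/s}
```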

Design Principles

1. Simplicity

  • Focus on core messaging functionality
  • Minimal features, maximum reliability
  • Simple user experience

2. Efficiency

  • Binary protocol (not JSON/XML)
  • Minimal server storage
  • Optimized network usage

3. Privacy

  • End-to-end encryption by default
  • Minimal data collection
  • Messages not kept on servers after delivery

4. Reliability

  • Reliable delivery, confirmed by receipts (the checkmark states)
  • Offline message queuing
  • Automatic reconnection
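Automatic reconnection at this scale needs care: if a server restarts, millions of clients retrying at the same moment would overload it again. A common remedy is exponential backoff with full jitter. The source does not describe WhatsApp's client logic, so this C# sketch assumes that policy; `Backoff` and its parameters are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

// Exponential backoff with "full jitter": wait a random time in
// [0, min(maxDelay, baseDelay * 2^attempt)). Names are illustrative.
public static class Backoff
{
    public static TimeSpan NextDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay, Random rng)
    {
        if (attempt < 0) throw new ArgumentOutOfRangeException(nameof(attempt));
        // Cap the exponent so the product stays finite.
        double exponential = baseDelay.TotalMilliseconds * Math.Pow(2, Math.Min(attempt, 30));
        double ceiling = Math.Min(maxDelay.TotalMilliseconds, exponential);
        return TimeSpan.FromMilliseconds(rng.NextDouble() * ceiling);
    }

    // Tries to connect up to maxAttempts times, sleeping between failures.
    public static async Task<bool> ConnectWithRetryAsync(
        Func<Task<bool>> tryConnect, int maxAttempts, TimeSpan baseDelay, TimeSpan maxDelay, Random rng)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            if (await tryConnect()) return true;
            if (attempt < maxAttempts - 1)
                await Task.Delay(NextDelay(attempt, baseDelay, maxDelay, rng));
        }
        return false;
    }
}
```

The random component spreads reconnections out in time, and the cap keeps a client from waiting unreasonably long once the server is back.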

Lessons for Architects

1. Choose the Right Technology

Erlang was a strong fit for WhatsApp’s needs:

  • Concurrent connections
  • Fault tolerance
  • Low latency

2. Optimize Ruthlessly

  • Every byte counts
  • Profile and measure
  • Custom solutions when needed

3. Keep It Simple

  • Fewer features, done well
  • Minimal dependencies
  • Clear architecture

4. Plan for Scale

  • Design for millions from day one
  • Horizontal scaling capability
  • Efficient resource usage

C# Equivalent Patterns

Connection Handling (SignalR)

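using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR;

// SignalR keeps a persistent connection per client (WebSockets when available),
// playing the role of WhatsApp's long-lived connection servers.
public class ChatHub : Hub
{
    // Push a message to every connection of the target user.
    // "user" is matched against SignalR's user identifier
    // (by default the ClaimTypes.NameIdentifier claim).
    public async Task SendMessage(string user, string message)
    {
        await Clients.User(user).SendAsync("ReceiveMessage", message);
    }

    // Add each new connection to an "Online" group for broadcasts and presence.
    public override async Task OnConnectedAsync()
    {
        await Groups.AddToGroupAsync(Context.ConnectionId, "Online");
        await base.OnConnectedAsync();
    }
}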

Message Queue Pattern

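using System.Threading.Tasks;

// Hypothetical abstractions: presence lookup, live delivery, offline store.
public record Message(string SenderId, string RecipientId, string Body);
public interface IMessageQueue { Task EnqueueForDelivery(Message message); }
public interface IPresenceService { Task<bool> IsUserOnline(string userId); }
public interface IDeliveryChannel { Task<bool> TryDeliver(Message message); }

public class MessageService
{
    private readonly IMessageQueue _queue;
    private readonly IPresenceService _presence;
    private readonly IDeliveryChannel _channel;

    public MessageService(IMessageQueue queue, IPresenceService presence, IDeliveryChannel channel)
        => (_queue, _presence, _channel) = (queue, presence, channel);

    // Store-and-forward: deliver live if the recipient is online; queue the
    // message if they are offline or the live delivery attempt fails.
    public async Task SendMessageAsync(Message message)
    {
        if (await _presence.IsUserOnline(message.RecipientId) && await _channel.TryDeliver(message))
        {
            return;
        }
        await _queue.EnqueueForDelivery(message);
    }
}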

