API Gateway Development

When every service exposes its own authentication, rate limiting, and routing logic, you're maintaining the same infrastructure in a dozen different places.

An API gateway is a single entry point in front of multiple backend services that handles the concerns every service shares: authentication and authorisation, rate limiting, request routing, SSL/TLS termination, logging, and observability. Instead of each service implementing these independently, the gateway handles them consistently and the services focus on their business logic. RaftLabs designs and builds API gateways using AWS API Gateway, Kong, and custom Node.js gateway implementations. We work with organisations moving from a monolith to microservices that need traffic management, businesses exposing a public API to third-party developers, and teams with multiple internal services that need a unified authentication layer.

  • Single authentication layer that validates tokens before requests reach any backend service -- no per-service auth implementation
  • Rate limiting and throttling per client, per endpoint, and per API tier with standard rate limit headers
  • Request routing to multiple backend services based on path, host, or header -- with load balancing and health checks
  • Centralised logging with request and response details, correlation IDs, and latency metrics across all services

Recent outcomes

Voice AI · Research

Text-based interviews converted to automated phone calls

6× deeper insights

AI Automation · Ops

Manual invoice OCR across 40+ gas stations

20k+ txns day one

Loyalty · Retail

SuperValu & Centra loyalty platform with receipt validation

1,062 users in 4 weeks

SaaS · Logistics

Multi-carrier shipping hub for Indonesian eCommerce

2,000+ shipments yr 1

4.9 / 5 on Clutch

RaftLabs designs and builds API gateways using AWS API Gateway, Kong, and custom Node.js implementations. We deliver centralised authentication, rate limiting, request routing, and observability for organisations with multiple backend services or a public API product. A focused AWS API Gateway configuration with auth and rate limiting typically costs $8,000 to $20,000. A Kong-based or custom gateway with a full routing layer and developer portal typically costs $25,000 to $70,000. Most projects are delivered in 6 to 12 weeks at a fixed cost.

Trusted by

Vodafone
Aldi
Nike
Microsoft
Heineken
Cisco
Calorgas
Energia Rewards
GE
Bank of America
T-Mobile
Valero
Techstars
East Ventures

As organisations move from a monolith to multiple services, each service inherits the same set of cross-cutting concerns: how to authenticate requests, how to rate limit consumers, how to log what's happening, and how to route traffic. When those concerns are solved independently in each service, the result is inconsistent behaviour -- a token that's invalid according to one service's validation logic is accepted by another's, rate limits differ between endpoints without intent, and debugging requires correlating logs from multiple services without a shared request identifier.

An API gateway centralises these concerns. Authentication is validated once at the gateway before requests reach any service. Rate limits are configured per consumer at the gateway and applied consistently across every endpoint they call. Every request generates a log entry with a correlation ID that flows to the backend service and appears in every log line from that request. The backend services receive pre-authenticated requests and focus on business logic. RaftLabs designs gateway architecture before implementation -- routing topology, authentication model, rate limit configuration, and observability setup -- so the gateway solves the right problems for your service footprint.

Capabilities

What we build

API gateway design and architecture

Gateway architecture begins with topology design rather than implementation -- a gateway configured to solve the wrong problems creates as much operational complexity as it removes. The design phase maps your current service footprint: backend services, their inter-dependencies, traffic origins (external consumers, internal services, mobile clients, partner integrations), and existing authentication mechanisms. From the service map, we recommend the appropriate gateway topology: a single edge gateway for organisations with 3--10 services and a unified consumer base; a BFF (Backend For Frontend) pattern with dedicated gateways per client type (mobile gateway optimised for connection constraints, web gateway optimised for latency, partner gateway with stricter rate limits and contract versioning) for organisations with divergent client requirements; or a mesh-plus-gateway architecture where internal service-to-service traffic uses a service mesh (Istio, Linkerd) and the edge gateway handles only external traffic. Technology selection is based on your existing infrastructure and operational capacity: AWS API Gateway for AWS-native teams who want managed infrastructure with no gateway servers to run; Kong Gateway (open-source or Enterprise) deployed on Kubernetes for teams that need plugin extensibility and infrastructure portability; Nginx with Lua scripting or a custom Node.js/Fastify gateway for organisations with routing logic that declarative configuration cannot express. The architecture document is produced and reviewed before any configuration or code is written.

Authentication and token validation

Authentication is validated once at the gateway before any request reaches a backend service.

  • JWT validation: RS256 or ES256 signature verification using the public key fetched from the issuer's JWKS endpoint (Auth0, Cognito, Keycloak, Firebase Authentication); expiry and not-before claim checking; audience and issuer claim validation to prevent token reuse across services. Validated identity claims (user ID, roles, tenant ID, permissions) are extracted from the JWT payload and forwarded to backend services in X-User-Id, X-User-Role, and X-Tenant-Id request headers -- backend services receive pre-validated caller context and do not repeat signature verification.
  • OAuth 2.0 token introspection for opaque tokens: the gateway calls the authorisation server's introspection endpoint (RFC 7662) and caches the result for a short TTL (typically 30--300 seconds, and never longer than the token's remaining lifetime) to avoid an introspection call on every request.
  • API key authentication for partner and developer integrations: key hashing (SHA-256) before storage, key rotation without service interruption (a grace period during which both old and new keys are valid), and per-key rate limits and scope restrictions.
  • Mutual TLS (mTLS) for service-to-service authentication: client certificate verification at the gateway with certificate pinning for internal service identities, which limits lateral movement from a compromised service.

Route-level scope enforcement at the gateway keeps identity verification (authentication) separate from permission checking (authorisation), so backend services only receive requests that have passed both.
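The claim checks and header forwarding described above can be sketched as follows. This is a minimal illustration, not a complete validator: it assumes signature verification against the issuer's JWKS has already succeeded, and the names `JwtClaims`, `GatewayAuthConfig`, `checkClaims`, the `roles` and `tenant_id` claim names, and the clock-skew setting are all hypothetical.

```typescript
// Claims read by the gateway AFTER the token signature has been verified.
// "roles" and "tenant_id" are assumed claim names; issuers differ.
interface JwtClaims {
  sub: string;
  iss: string;
  aud: string | string[];
  exp: number;  // expiry, seconds since epoch
  nbf?: number; // not-before, seconds since epoch
  roles?: string[];
  tenant_id?: string;
}

interface GatewayAuthConfig {
  expectedIssuer: string;
  expectedAudience: string;
  clockSkewSeconds: number; // tolerance for clock drift (assumed setting)
}

type ClaimCheck =
  | { ok: true; headers: Record<string, string> }
  | { ok: false; reason: string };

function checkClaims(
  claims: JwtClaims,
  cfg: GatewayAuthConfig,
  nowSeconds: number
): ClaimCheck {
  if (claims.iss !== cfg.expectedIssuer) {
    return { ok: false, reason: "issuer mismatch" };
  }
  const audiences = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (audiences.indexOf(cfg.expectedAudience) === -1) {
    return { ok: false, reason: "audience mismatch" };
  }
  if (nowSeconds >= claims.exp + cfg.clockSkewSeconds) {
    return { ok: false, reason: "token expired" };
  }
  if (claims.nbf !== undefined && nowSeconds + cfg.clockSkewSeconds < claims.nbf) {
    return { ok: false, reason: "token not yet valid" };
  }

  // Identity context forwarded to backend services.
  const headers: Record<string, string> = { "X-User-Id": claims.sub };
  if (claims.roles && claims.roles.length > 0) {
    headers["X-User-Role"] = claims.roles.join(",");
  }
  if (claims.tenant_id) {
    headers["X-Tenant-Id"] = claims.tenant_id;
  }
  return { ok: true, headers };
}
```

Whatever the implementation, the gateway should strip any client-supplied X-User-Id, X-User-Role, and X-Tenant-Id headers before setting its own, otherwise a caller could forge the identity context that backend services trust.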

Rate limiting and quota management

Rate limiting enforced at the gateway is the first effective line of defence against abusive or misconfigured clients -- a client sending 10,000 requests per second can exhaust backend service capacity before any service-level limiting engages.

  • Algorithms: token bucket (smooth burst handling) or fixed window counter (simpler, predictable), with limits defined per API key or OAuth client ID, per endpoint, and per IP address for unauthenticated requests.
  • Distributed counters: Redis-backed rate limit counters keep limits consistent across gateway instances in a multi-node deployment -- an API consumer cannot bypass a 1,000 req/min limit by targeting different gateway instances.
  • API tier differentiation: subscription tiers map to rate limit configurations, for example a free tier at 100 req/min per endpoint, a growth tier at 1,000 req/min, and an enterprise tier at 10,000 req/min or custom negotiated limits.
  • Rate limit headers on every response: X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset (the widely used de facto convention), so clients implement adaptive backoff without guessing the reset window.
  • Throttled responses: HTTP 429 Too Many Requests (defined in RFC 6585) with a Retry-After header so well-behaved clients can recover automatically.
  • Burst allowance using a leaky bucket variant for clients with legitimate spike patterns: a batch job that sends 500 requests in 10 seconds and is then idle for 50 seconds averages about 83 requests per 10 seconds over the minute, so it should not be throttled against a 100 req/10s limit for the initial burst.
  • Quota management: monthly and daily quotas with quota exhaustion notifications via webhook to the client's registered callback URL, and a quota top-up API for self-service quota expansion.
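A token bucket with the response headers named above might look like the sketch below. It keeps state in an in-process `Map` purely for illustration; the multi-node deployment described here would hold the same counters in Redis. The class name, the config fields, and the choice to report X-RateLimit-Reset as seconds until the bucket is full again are assumptions, since conventions for that header vary.

```typescript
interface LimitConfig {
  capacity: number;        // maximum burst size, reported as X-RateLimit-Limit
  refillPerSecond: number; // sustained request rate
}

interface BucketState {
  tokens: number;
  lastRefillMs: number;
}

interface LimitDecision {
  allowed: boolean; // when false, the caller responds with HTTP 429
  headers: Record<string, string>;
}

class TokenBucketLimiter {
  private buckets = new Map<string, BucketState>();
  private cfg: LimitConfig;

  constructor(cfg: LimitConfig) {
    this.cfg = cfg;
  }

  // clientKey is an API key ID, OAuth client ID, or IP address.
  // nowMs is passed in so the limiter is deterministic and easy to test.
  check(clientKey: string, nowMs: number): LimitDecision {
    const bucket =
      this.buckets.get(clientKey) ??
      { tokens: this.cfg.capacity, lastRefillMs: nowMs };

    // Refill in proportion to elapsed time, capped at capacity.
    const elapsedSeconds = Math.max(0, nowMs - bucket.lastRefillMs) / 1000;
    bucket.tokens = Math.min(
      this.cfg.capacity,
      bucket.tokens + elapsedSeconds * this.cfg.refillPerSecond
    );
    bucket.lastRefillMs = nowMs;

    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(clientKey, bucket);

    const secondsUntilFull = Math.ceil(
      (this.cfg.capacity - bucket.tokens) / this.cfg.refillPerSecond
    );
    const headers: Record<string, string> = {
      "X-RateLimit-Limit": String(this.cfg.capacity),
      "X-RateLimit-Remaining": String(Math.floor(bucket.tokens)),
      "X-RateLimit-Reset": String(secondsUntilFull),
    };
    if (!allowed) {
      // Seconds until at least one token is available again.
      headers["Retry-After"] = String(
        Math.ceil((1 - bucket.tokens) / this.cfg.refillPerSecond)
      );
    }
    return { allowed, headers };
  }
}
```

A larger `capacity` with the same `refillPerSecond` is one way to grant a burst allowance: the sustained rate is unchanged, but a client that has been idle can spend the accumulated tokens at once.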

Request routing and load balancing

Request routing at the gateway decouples client-facing URL structure from backend service topology -- clients call stable URLs while backend services are split, merged, renamed, or moved without requiring client changes.

  • Path-based routing: /api/v1/users/* routes to the User Service, /api/v1/orders/* routes to the Order Service, /api/v1/payments/* routes to the Payment Service -- configured as routing rules with path matching, HTTP method matching, and optional header matching.
  • Host-based routing for multi-tenant architectures: tenant-a.api.example.com routes to the Tenant A cluster; tenant-b.api.example.com routes to the Tenant B cluster.
  • Weighted routing for canary deployments: 5% of traffic to the new service version, 95% to the current, with the split percentage adjustable without redeployment -- the standard mechanism for validating new versions under real traffic before full rollout.
  • Load balancing: round-robin and least-connections balancing across multiple instances of the same backend service, with active health checks (HTTP GET /health, configurable interval and failure threshold) that remove unhealthy instances from the rotation without manual intervention.
  • Circuit breaker pattern (Hystrix-inspired, configurable failure threshold and recovery timeout): when a backend service's error rate exceeds a threshold such as 50% over a 10-second window, the circuit opens and requests fail fast with a gateway-level response (503 or cached fallback) rather than queuing indefinitely.
  • Automatic retry with exponential backoff (configurable retry count and initial delay) for 502, 503, and 504 responses caused by transient backend failures, applied to requests that are safe to repeat.
  • Response caching at the gateway using Cache-Control headers for read-heavy, low-volatility endpoints (product catalogue, configuration endpoints), which reduces backend load and improves p99 latency.
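As a rough sketch of the path-based and weighted routing described above, the functions below pick the longest matching path prefix and then choose an upstream in proportion to its weight. The type names, the upstream names, and the convention of passing the random roll in as a parameter are illustrative assumptions; Kong and AWS API Gateway express the same ideas through their own configuration.

```typescript
interface UpstreamTarget {
  name: string;   // hypothetical service identifier
  weight: number; // relative share of traffic
}

interface RouteRule {
  pathPrefix: string;
  methods?: string[]; // omitted means any method
  upstreams: UpstreamTarget[]; // assumed non-empty
}

// Longest-prefix match on whole path segments, with optional method matching.
function matchRoute(
  routes: RouteRule[],
  method: string,
  path: string
): RouteRule | undefined {
  let best: RouteRule | undefined;
  for (const rule of routes) {
    const prefixMatches =
      path === rule.pathPrefix || path.startsWith(rule.pathPrefix + "/");
    const methodMatches = !rule.methods || rule.methods.indexOf(method) !== -1;
    if (!prefixMatches || !methodMatches) continue;
    if (!best || rule.pathPrefix.length > best.pathPrefix.length) best = rule;
  }
  return best;
}

// Weighted choice. roll is a number in [0, 1), e.g. Math.random().
function pickUpstream(rule: RouteRule, roll: number): UpstreamTarget {
  const total = rule.upstreams.reduce((sum, u) => sum + u.weight, 0);
  let remaining = roll * total;
  for (const upstream of rule.upstreams) {
    if (remaining < upstream.weight) return upstream;
    remaining -= upstream.weight;
  }
  // Only reached through floating-point rounding at the upper edge.
  return rule.upstreams[rule.upstreams.length - 1];
}

const exampleRoutes: RouteRule[] = [
  { pathPrefix: "/api/v1/users", upstreams: [{ name: "user-service", weight: 1 }] },
  {
    pathPrefix: "/api/v1/orders",
    upstreams: [
      { name: "order-service-current", weight: 95 },
      { name: "order-service-canary", weight: 5 },
    ],
  },
];
```

Because the weights live in routing data rather than in code, shifting the canary from 5% to 25% is a configuration change, which is what makes the split adjustable without redeployment.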

Observability and logging

Observability built into the gateway from the start rather than added as an afterthought means every operational question -- which client is causing the error spike, which endpoint has the highest p99 latency, which backend service is the bottleneck -- has a data-backed answer.

  • Structured access logs for every API request: timestamp (UTC, millisecond precision), client ID, API key identifier, HTTP method, request path, response status code, request body size, response body size, upstream backend service, upstream response time, total gateway time, and correlation ID. Log format is JSON, shipped to CloudWatch Logs, Datadog, Elasticsearch, or Splunk via a log forwarder configured at deployment.
  • Correlation IDs: a UUID v4 generated at the gateway and injected into the X-Correlation-Id request header forwarded to backend services -- when backend service logs include the correlation ID, a single slow request can be traced across the gateway log, the service log, and the database query log with one ID.
  • Distributed tracing integration with OpenTelemetry (trace context propagation via W3C Trace Context headers) for teams using Jaeger, Tempo, or AWS X-Ray.
  • Metrics published per endpoint: request count, error rate (4xx and 5xx separately), p50/p95/p99 latency, and upstream service time.
  • Dashboards in Grafana or CloudWatch with anomaly detection alerts: error rate above baseline, p95 latency exceeding a configured threshold, and sudden traffic volume changes (spikes that indicate abuse and drops that indicate client-side failures).
  • Access log export to a SIEM (Splunk, Elastic SIEM, Microsoft Sentinel) for security monitoring, audit trail, and compliance with PCI DSS, SOC 2, or ISO 27001 access logging requirements.
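The correlation ID and JSON access log described above could be wired together roughly as follows. The field names and function names are illustrative rather than a fixed schema, only a subset of the listed log fields is shown, and the ID generator is passed in as a parameter; in a Node.js gateway, `randomUUID` from `node:crypto` is one way to produce a UUID v4.

```typescript
type HeaderMap = Record<string, string>;

// Subset of the access log fields listed above; names are assumptions.
interface AccessLogFields {
  clientId: string;
  method: string;
  path: string;
  status: number;
  upstream: string;
  upstreamMs: number; // upstream response time
  totalMs: number;    // total time spent in the gateway
}

// Attach a freshly generated correlation ID to the headers sent upstream.
function withCorrelationId(
  incoming: HeaderMap,
  generateId: () => string
): { headers: HeaderMap; correlationId: string } {
  const correlationId = generateId();
  return {
    headers: { ...incoming, "X-Correlation-Id": correlationId },
    correlationId,
  };
}

// One JSON object per request, with a UTC millisecond timestamp.
function formatAccessLog(
  fields: AccessLogFields,
  correlationId: string,
  at: Date
): string {
  return JSON.stringify({
    timestamp: at.toISOString(),
    correlationId,
    ...fields,
  });
}
```

A backend service that copies the X-Correlation-Id value into its own log lines makes the cross-service trace possible without any shared logging library.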

Developer portal and API products

A developer portal is the operational layer that lets you scale an API product beyond internal use without creating a support burden on your engineering team.

  • Self-service credential management: developer sign-up, API key provisioning (key generated and displayed once, hashed before storage), sandbox credential management, and key rotation without engineering involvement.
  • API documentation generated from the gateway's registered OpenAPI 3.0 specifications and rendered in an interactive reference (Redoc or Swagger UI) with try-it-out functionality against the sandbox environment -- developers test real API calls from the documentation without writing code.
  • Usage dashboard per developer account: request count, error count, quota consumed versus remaining, and a per-endpoint breakdown for the last 1, 7, or 30 days.
  • Subscription and tier management: developers select a plan tier, the gateway configuration is updated programmatically, and usage is metered for billing.
  • Versioning and deprecation management: new API versions are published with changelog documentation; sunset notices with end-of-life dates appear on deprecated versions; Sunset headers (RFC 8594) are injected by the gateway on responses from deprecated endpoints with a link to the migration guide.
  • Webhook endpoints for developer-side events: quota warnings at 80% and 100%, key expiry notices, and breaking change announcements.

The portal is built as a standalone web application (Next.js frontend, Node.js API backend) deployable on your domain or as a white-labelled product for API-first businesses offering their API as a commercial product.
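As one concrete illustration of the deprecation handling above, a gateway could build the Sunset header (RFC 8594) and an accompanying Link header like this. The `DeprecationNotice` type, the function name, and the example URL are hypothetical; the header formats follow RFC 8594, which specifies an HTTP-date value for Sunset and registers the `sunset` link relation.

```typescript
interface DeprecationNotice {
  sunsetAt: Date;            // planned end-of-life for the deprecated version
  migrationGuideUrl: string; // where clients can read how to migrate
}

// Headers the gateway adds to responses from a deprecated endpoint.
function sunsetHeaders(notice: DeprecationNotice): Record<string, string> {
  return {
    // Date.prototype.toUTCString() yields the HTTP-date format.
    Sunset: notice.sunsetAt.toUTCString(),
    Link: `<${notice.migrationGuideUrl}>; rel="sunset"`,
  };
}

const v1Notice: DeprecationNotice = {
  sunsetAt: new Date("2026-06-30T00:00:00Z"),
  migrationGuideUrl: "https://developers.example.com/migrate-v1-to-v2",
};
const v1Headers = sunsetHeaders(v1Notice);
```

Because the headers are added at the gateway, the deprecated backend service itself needs no change to start announcing its end-of-life date.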

Have an API gateway project?

Tell us your backend services, your current authentication and routing setup, and what you're trying to consolidate. We'll scope the gateway and give you a fixed cost.

Frequently asked questions

AWS API Gateway is the right choice if you're already on AWS, need minimal operational overhead, and your routing requirements are straightforward -- it handles authentication, rate limiting, and Lambda or HTTP backend routing without additional infrastructure to run. Kong is the choice for organisations that need the flexibility of a plugin ecosystem (custom auth, advanced routing logic, monitoring integrations) and want to run the gateway on their own infrastructure or in Kubernetes. A custom gateway makes sense when your routing logic is complex enough that configuration-driven gateways can't express it, or when you want full control over the gateway's behaviour without configuration API limitations. The recommendation follows your existing infrastructure and operational capacity.

The gateway handles versioning by routing requests based on the URL version prefix or Accept-Version header to different backend service versions. During a transition period, v1 traffic routes to the v1 service and v2 traffic routes to the v2 service -- allowing both to run in parallel while clients migrate. The gateway can also handle version aliasing: if 'latest' always points to the current stable version, the backend routing is updated in one place when a new version is promoted. Sunset headers informing clients of a version's planned end-of-life are injected by the gateway without backend service changes.
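A minimal sketch of that version resolution, assuming a URL layout of `/api/<version>/...`, lower-cased header names (as Node.js provides them), and hypothetical backend names:

```typescript
interface VersionTable {
  aliases: Record<string, string | undefined>;  // e.g. latest -> v2
  backends: Record<string, string | undefined>; // version -> backend service
}

// Resolve the backend for a request: the URL version prefix wins,
// otherwise the Accept-Version header is used. Returns undefined when
// no version is given or the version is unknown.
function resolveBackend(
  path: string,
  headers: Record<string, string | undefined>,
  table: VersionTable
): string | undefined {
  const match = /^\/api\/(v\d+|latest)(\/|$)/.exec(path);
  const requested = match ? match[1] : headers["accept-version"];
  if (!requested) return undefined;
  const version = table.aliases[requested] ?? requested;
  return table.backends[version];
}

const versionTable: VersionTable = {
  aliases: { latest: "v2" },
  backends: { v1: "orders-service-v1", v2: "orders-service-v2" },
};
```

Promoting a new stable version then means changing the single `latest` entry in the alias table; clients that pinned `v1` or `v2` explicitly are unaffected.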

A gateway can handle both external and internal traffic, with different gateway configurations for each traffic type. External consumer traffic uses API key or OAuth 2.0 authentication with public-facing rate limits. Internal service-to-service traffic uses mutual TLS or JWT tokens issued by an internal identity service, with higher or no rate limits compared to external consumers. Separating external and internal gateways -- or using the same gateway with different routes and auth configurations -- keeps external consumer rate limiting from affecting internal service performance during high-traffic periods.

An AWS API Gateway configuration for a small number of services with authentication, rate limiting, and logging typically runs $8,000 to $20,000. A Kong-based or custom gateway with a full routing layer, developer portal, API key management, and observability typically runs $25,000 to $70,000. The cost is fixed and agreed before development starts.

Work with us

Tell us what you need. We'll tell you what it would take.

We scope API Gateway Development in 30 minutes. You walk away with a clear cost, timeline, and approach. No commitment required.

  • Scope and cost agreed before work starts. No surprises. No obligation.
  • Working prototype within 3 weeks of kickoff.
  • Pay by milestone. You see progress before each invoice.
  • 60-day post-launch warranty. Bug fixes, UI tweaks, and deployment support. No retainer.
  • All conversations are NDA-protected.