Scaling SaaS Infrastructure

Meta-Description: Scale SaaS infrastructure: from monolith to microservices, with database sharding, caching strategies, and auto-scaling on Kubernetes.

Keywords: SaaS Scaling, Horizontal Scaling, Database Sharding, Kubernetes, Load Balancing, Caching, CDN, Auto-Scaling, High Availability


Introduction

Scaling SaaS infrastructure enables growth without sacrificing performance. From horizontal scaling and database sharding to Kubernetes auto-scaling, the right architecture determines how far a product can go. This guide walks through proven patterns for growing SaaS products.


Scaling Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│              SAAS SCALING ARCHITECTURE                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │                    CDN LAYER                         │   │
│  │  CloudFlare / AWS CloudFront / Vercel Edge          │   │
│  │  ├── Static assets (JS, CSS, images)               │   │
│  │  ├── Edge caching for API responses                │   │
│  │  └── DDoS protection                               │   │
│  └─────────────────────────────────────────────────────┘   │
│                          │                                  │
│                          ▼                                  │
│  ┌─────────────────────────────────────────────────────┐   │
│  │               LOAD BALANCER                          │   │
│  │  AWS ALB / Nginx / Traefik                          │   │
│  │  ├── SSL termination                               │   │
│  │  ├── Health checks                                 │   │
│  │  ├── Rate limiting                                 │   │
│  │  └── Geographic routing                            │   │
│  └─────────────────────────────────────────────────────┘   │
│                          │                                  │
│         ┌────────────────┼────────────────┐                │
│         ▼                ▼                ▼                │
│  ┌───────────┐    ┌───────────┐    ┌───────────┐          │
│  │  App Pod  │    │  App Pod  │    │  App Pod  │          │
│  │  (API)    │    │  (API)    │    │  (API)    │          │
│  └───────────┘    └───────────┘    └───────────┘          │
│         │                │                │                │
│         └────────────────┼────────────────┘                │
│                          ▼                                  │
│  ┌─────────────────────────────────────────────────────┐   │
│  │                 CACHE LAYER                          │   │
│  │  Redis Cluster / Memcached                          │   │
│  │  ├── Session storage                               │   │
│  │  ├── Query caching                                 │   │
│  │  ├── Rate limit counters                           │   │
│  │  └── Real-time features (pub/sub)                  │   │
│  └─────────────────────────────────────────────────────┘   │
│                          │                                  │
│                          ▼                                  │
│  ┌─────────────────────────────────────────────────────┐   │
│  │               DATABASE LAYER                         │   │
│  │                                                     │   │
│  │  ┌──────────────┐  ┌──────────────┐               │   │
│  │  │   Primary    │──│   Replica    │               │   │
│  │  │   (Write)    │  │   (Read)     │               │   │
│  │  └──────────────┘  └──────────────┘               │   │
│  │         │                                          │   │
│  │         ▼ (at scale)                               │   │
│  │  ┌──────────────────────────────────────────┐     │   │
│  │  │         SHARDED DATABASE                  │     │   │
│  │  │  Shard 1 │ Shard 2 │ Shard 3 │ Shard N  │     │   │
│  │  │  (A-G)   │ (H-N)   │ (O-T)   │ (U-Z)    │     │   │
│  │  └──────────────────────────────────────────┘     │   │
│  │                                                     │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
│  Background Jobs: Redis Queue / BullMQ / SQS               │
│  Search: Elasticsearch / Meilisearch / Algolia             │
│  Storage: S3 / R2 / GCS                                    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Scaling Stages

┌─────────────────────────────────────────────────────────────┐
│              SAAS SCALING JOURNEY                           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  STAGE 1: Single Server (0-1K users)                       │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  App + DB on one server                             │   │
│  │  ✓ Simple deployment                                │   │
│  │  ✗ Single point of failure                          │   │
│  └─────────────────────────────────────────────────────┘   │
│                          │                                  │
│                          ▼                                  │
│  STAGE 2: Separate DB (1K-10K users)                       │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  App Server ──► Managed Database (RDS, PlanetScale) │   │
│  │  ✓ Independent scaling                              │   │
│  │  ✓ Automatic backups                                │   │
│  │  + Add Redis for sessions/caching                   │   │
│  └─────────────────────────────────────────────────────┘   │
│                          │                                  │
│                          ▼                                  │
│  STAGE 3: Horizontal Scaling (10K-100K users)              │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  Load Balancer ──► Multiple App Instances           │   │
│  │                 ──► Read Replicas                   │   │
│  │  ✓ No single point of failure                       │   │
│  │  ✓ Zero-downtime deployments                        │   │
│  │  + CDN for static assets                            │   │
│  └─────────────────────────────────────────────────────┘   │
│                          │                                  │
│                          ▼                                  │
│  STAGE 4: Microservices (100K+ users)                      │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  API Gateway ──► Service Mesh                       │   │
│  │              ──► Kubernetes                         │   │
│  │              ──► Database per service               │   │
│  │  ✓ Independent deployment & scaling                 │   │
│  │  ✓ Technology flexibility                           │   │
│  │  + Database sharding                                │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
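Stage 2's "Redis for sessions" is what makes Stage 3's horizontal scaling possible: once session state lives in a shared store instead of process memory, any instance behind the load balancer can serve any request. A minimal sketch of the pattern (`SessionManager` and `MemoryStore` are illustrative names; in production the store would be Redis, e.g. `SETEX`/`GET` via ioredis — the in-memory stand-in just keeps the sketch self-contained):

```typescript
import { randomUUID } from 'node:crypto';

// The store interface mirrors what a Redis client provides: TTL-bounded
// string values under a key. Swap MemoryStore for a Redis-backed
// implementation in production.
interface SessionStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
  del(key: string): Promise<void>;
}

// In-process stand-in for Redis (sketch only; defeats the purpose at scale)
class MemoryStore implements SessionStore {
  private data = new Map<string, { value: string; expiresAt: number }>();

  async get(key: string) {
    const entry = this.data.get(key);
    if (!entry || entry.expiresAt < Date.now()) return null;
    return entry.value;
  }

  async set(key: string, value: string, ttlSeconds: number) {
    this.data.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }

  async del(key: string) {
    this.data.delete(key);
  }
}

export class SessionManager {
  constructor(private store: SessionStore, private ttlSeconds = 86_400) {}

  // Opaque session id goes in the cookie; the data stays server-side
  async create(data: Record<string, unknown>): Promise<string> {
    const id = randomUUID();
    await this.store.set(`session:${id}`, JSON.stringify(data), this.ttlSeconds);
    return id;
  }

  async load(id: string): Promise<Record<string, unknown> | null> {
    const raw = await this.store.get(`session:${id}`);
    return raw ? JSON.parse(raw) : null;
  }

  async destroy(id: string): Promise<void> {
    await this.store.del(`session:${id}`);
  }
}
```

Because every instance reads the same store, instances can be added, removed, or replaced mid-session without logging users out.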

Database Read Replicas

// lib/db/replica.ts
import { PrismaClient, Prisma } from '@prisma/client';

// Primary for writes
export const primaryDb = new PrismaClient({
  datasources: {
    db: { url: process.env.DATABASE_URL }
  }
});

// Replica for reads
export const replicaDb = new PrismaClient({
  datasources: {
    db: { url: process.env.DATABASE_REPLICA_URL }
  }
});

// Smart routing
interface DbOptions {
  write?: boolean;
}

export function getDb(options: DbOptions = {}): PrismaClient {
  return options.write ? primaryDb : replicaDb;
}

// Usage examples
export async function getUsers() {
  // Read from replica
  return getDb().user.findMany();
}

export async function createUser(data: Prisma.UserCreateInput) {
  // Write to primary
  return getDb({ write: true }).user.create({ data });
}

// Read-after-write consistency helper
export async function createUserWithRead(data: Prisma.UserCreateInput) {
  const user = await primaryDb.user.create({ data });

  // Return the row from the primary directly: reading it back from the
  // replica immediately after the write can miss it, since replica lag
  // is typically <100ms but never zero
  return user;
}

Prisma with Read Replicas Extension

// lib/db/prisma-replicas.ts
import { PrismaClient } from '@prisma/client';
import { readReplicas } from '@prisma/extension-read-replicas';

export const db = new PrismaClient().$extends(
  readReplicas({
    url: process.env.DATABASE_REPLICA_URL!,
    // Optional: multiple replicas for load distribution
    // url: [
    //   process.env.DATABASE_REPLICA_URL_1!,
    //   process.env.DATABASE_REPLICA_URL_2!
    // ]
  })
);

// All reads automatically go to replica
// All writes automatically go to primary

// Force primary for specific reads (read-after-write)
export async function getUserAfterUpdate(userId: string) {
  return db.$primary().user.findUnique({
    where: { id: userId }
  });
}

Caching Strategy

// lib/cache/redis.ts
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!);

interface CacheOptions {
  ttl?: number; // seconds
  tags?: string[];
}

export async function cache<T>(
  key: string,
  fetcher: () => Promise<T>,
  options: CacheOptions = {}
): Promise<T> {
  const { ttl = 3600 } = options;

  // Try cache first
  const cached = await redis.get(key);
  if (cached) {
    return JSON.parse(cached);
  }

  // Fetch fresh data
  const data = await fetcher();

  // Store in cache
  await redis.setex(key, ttl, JSON.stringify(data));

  // Track tags for invalidation
  if (options.tags) {
    for (const tag of options.tags) {
      await redis.sadd(`tag:${tag}`, key);
    }
  }

  return data;
}

export async function invalidateByTag(tag: string): Promise<void> {
  const keys = await redis.smembers(`tag:${tag}`);
  if (keys.length > 0) {
    await redis.del(...keys);
    await redis.del(`tag:${tag}`);
  }
}

// Cache-aside pattern for database queries
export async function cachedQuery<T>(
  queryKey: string,
  query: () => Promise<T>,
  ttl: number = 300
): Promise<T> {
  return cache(queryKey, query, { ttl });
}

// Usage
export async function getProjectsByTenant(tenantId: string) {
  return cachedQuery(
    `tenant:${tenantId}:projects`,
    () => db.project.findMany({
      where: { tenantId },
      orderBy: { updatedAt: 'desc' }
    }),
    600 // 10 minutes
  );
}

Multi-Level Caching

// lib/cache/multi-level.ts
import { LRUCache } from 'lru-cache';
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!);

// L1: In-memory cache (per instance)
const memoryCache = new LRUCache<string, any>({
  max: 1000,
  ttl: 60 * 1000 // 1 minute
});

// L2: Redis (shared)
interface MultiLevelCacheOptions {
  memoryTtl?: number; // ms
  redisTtl?: number;  // seconds
}

export async function multiLevelCache<T>(
  key: string,
  fetcher: () => Promise<T>,
  options: MultiLevelCacheOptions = {}
): Promise<T> {
  const { memoryTtl = 60_000, redisTtl = 300 } = options;

  // L1: Check memory
  const memoryHit = memoryCache.get(key);
  if (memoryHit !== undefined) {
    return memoryHit as T;
  }

  // L2: Check Redis
  const redisHit = await redis.get(key);
  if (redisHit) {
    const data = JSON.parse(redisHit);
    memoryCache.set(key, data, { ttl: memoryTtl });
    return data;
  }

  // L3: Fetch from source
  const data = await fetcher();

  // Populate both caches
  memoryCache.set(key, data, { ttl: memoryTtl });
  await redis.setex(key, redisTtl, JSON.stringify(data));

  return data;
}

// Cache invalidation across instances via pub/sub
const subscriber = new Redis(process.env.REDIS_URL!);
subscriber.subscribe('cache:invalidate');

subscriber.on('message', (channel, key) => {
  if (channel === 'cache:invalidate') {
    memoryCache.delete(key);
  }
});

export async function invalidate(key: string): Promise<void> {
  memoryCache.delete(key);
  await redis.del(key);
  // Notify other instances
  await redis.publish('cache:invalidate', key);
}

Database Connection Pooling

// lib/db/pool.ts
import { Pool, PoolConfig } from 'pg';

const poolConfig: PoolConfig = {
  connectionString: process.env.DATABASE_URL,

  // Pool sizing
  min: 2,
  max: 20, // max connections per instance

  // Timeouts
  connectionTimeoutMillis: 5000,
  idleTimeoutMillis: 30000,

  // Query timeout
  statement_timeout: 30000,

  // Application name for monitoring
  application_name: 'saas-app'
};

export const pool = new Pool(poolConfig);

// Health check
export async function checkDbHealth(): Promise<boolean> {
  try {
    const result = await pool.query('SELECT 1');
    return result.rows.length > 0;
  } catch {
    return false;
  }
}

// Connection pool monitoring
pool.on('connect', () => {
  console.log('New connection established');
});

pool.on('error', (err) => {
  console.error('Unexpected error on idle client', err);
});

// Graceful shutdown
process.on('SIGTERM', async () => {
  await pool.end();
});
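Checked-out clients are the usual pooling pitfall: a client obtained via `pool.connect()` must be released even on error, or the pool slowly drains until every request fails with a connection timeout. A hedged sketch of a transaction helper (`withTransaction` and the `TxClient` interface are illustrative; `TxClient` mirrors just the subset of pg's `PoolClient` used here):

```typescript
// Minimal client interface matching the subset of pg's PoolClient we use
interface TxClient {
  query(sql: string): Promise<unknown>;
  release(): void;
}

// Run fn inside BEGIN/COMMIT, rolling back on error. release() runs in
// finally, so the connection always returns to the pool.
export async function withTransaction<T>(
  connect: () => Promise<TxClient>,
  fn: (client: TxClient) => Promise<T>
): Promise<T> {
  const client = await connect();
  try {
    await client.query('BEGIN');
    const result = await fn(client);
    await client.query('COMMIT');
    return result;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
```

With the pool above this would be called as `withTransaction(() => pool.connect(), async (c) => { /* queries on c */ })`, keeping all statements of the transaction on the same connection.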

Database Sharding

// lib/db/sharding.ts
import { PrismaClient } from '@prisma/client';
import crypto from 'crypto';

interface ShardConfig {
  id: number;
  url: string;
  range: [number, number]; // Hash range
}

const SHARDS: ShardConfig[] = [
  { id: 0, url: process.env.SHARD_0_URL!, range: [0, 63] },
  { id: 1, url: process.env.SHARD_1_URL!, range: [64, 127] },
  { id: 2, url: process.env.SHARD_2_URL!, range: [128, 191] },
  { id: 3, url: process.env.SHARD_3_URL!, range: [192, 255] }
];

// Create client for each shard
const shardClients = new Map<number, PrismaClient>();

for (const shard of SHARDS) {
  shardClients.set(
    shard.id,
    new PrismaClient({
      datasources: { db: { url: shard.url } }
    })
  );
}

// Consistent hashing for shard selection
export function getShardId(tenantId: string): number {
  const hash = crypto
    .createHash('md5')
    .update(tenantId)
    .digest()[0]; // First byte (0-255)

  const shard = SHARDS.find(
    s => hash >= s.range[0] && hash <= s.range[1]
  );

  return shard?.id || 0;
}

export function getShardClient(tenantId: string): PrismaClient {
  const shardId = getShardId(tenantId);
  return shardClients.get(shardId)!;
}

// Usage
export async function getProjectsForTenant(tenantId: string) {
  const client = getShardClient(tenantId);
  return client.project.findMany({
    where: { tenantId }
  });
}

// Cross-shard queries (avoid if possible)
export async function globalSearch(query: string) {
  const results = await Promise.all(
    Array.from(shardClients.values()).map(client =>
      client.project.findMany({
        where: {
          OR: [
            { name: { contains: query, mode: 'insensitive' } },
            { description: { contains: query, mode: 'insensitive' } }
          ]
        },
        take: 10
      })
    )
  );

  return results.flat().slice(0, 50);
}

Background Job Processing

// lib/jobs/queue.ts
import { Queue, Worker, Job } from 'bullmq';
import Redis from 'ioredis';

const connection = new Redis(process.env.REDIS_URL!, {
  maxRetriesPerRequest: null
});

// Define queues
export const emailQueue = new Queue('email', { connection });
export const reportQueue = new Queue('reports', { connection });
export const webhookQueue = new Queue('webhooks', {
  connection,
  // Retry behavior is configured on the queue (or per job in add())
  defaultJobOptions: {
    attempts: 5,
    backoff: {
      type: 'exponential',
      delay: 1000 // 1s, 2s, 4s, 8s, 16s
    }
  }
});

// Email worker
const emailWorker = new Worker(
  'email',
  async (job: Job) => {
    const { to, template, data } = job.data;
    await sendEmail(to, template, data);
  },
  {
    connection,
    concurrency: 10,
    limiter: {
      max: 100,
      duration: 1000 // 100 emails per second
    }
  }
);

// Report worker (CPU-intensive)
const reportWorker = new Worker(
  'reports',
  async (job: Job) => {
    const { tenantId, reportType, dateRange } = job.data;
    return generateReport(tenantId, reportType, dateRange);
  },
  {
    connection,
    concurrency: 2, // Lower concurrency for heavy jobs
    lockDuration: 300000 // 5 minute lock
  }
);

// Webhook worker with retry
const webhookWorker = new Worker(
  'webhooks',
  async (job: Job) => {
    const { url, payload, signature } = job.data;
    const response = await fetch(url, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-Signature': signature
      },
      body: JSON.stringify(payload)
    });

    if (!response.ok) {
      throw new Error(`Webhook failed: ${response.status}`);
    }
  },
  {
    connection,
    concurrency: 20
    // Retries (attempts + backoff) belong in the queue's defaultJobOptions
    // or the add() call; BullMQ's WorkerOptions has no such field
  }
);

// Add jobs
export async function queueEmail(
  to: string,
  template: string,
  data: Record<string, any>
) {
  return emailQueue.add('send', { to, template, data });
}

export async function queueReport(
  tenantId: string,
  reportType: string,
  dateRange: { start: Date; end: Date }
) {
  return reportQueue.add(
    'generate',
    { tenantId, reportType, dateRange },
    { priority: 10 }
  );
}

// Scheduled jobs
export async function setupScheduledJobs() {
  // Daily cleanup
  await emailQueue.add(
    'cleanup',
    {},
    {
      // BullMQ uses `pattern` for cron expressions (older versions: `cron`)
      repeat: { pattern: '0 2 * * *' } // 2 AM daily
    }
  );

  // Hourly metrics
  await reportQueue.add(
    'metrics',
    {},
    {
      repeat: { pattern: '0 * * * *' } // Every hour
    }
  );
}

Kubernetes Auto-Scaling

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: saas-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: saas-api
  template:
    metadata:
      labels:
        app: saas-api
    spec:
      containers:
        - name: api
          image: saas-api:latest
          ports:
            - containerPort: 3000
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          env:
            - name: NODE_ENV
              value: "production"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-secrets
                  key: url
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5

---
# Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: saas-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: saas-api
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max

---
# Pod Disruption Budget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: saas-api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: saas-api

Health Check Endpoints

// app/api/health/route.ts
import { NextResponse } from 'next/server';
import { db } from '@/lib/db';
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!);

// Liveness probe - is the app running?
export async function GET() {
  return NextResponse.json({ status: 'ok' });
}

// app/api/ready/route.ts
// Readiness probe - can the app handle traffic?
export async function GET() {
  const checks = await Promise.allSettled([
    checkDatabase(),
    checkRedis(),
    checkExternalServices()
  ]);

  const results = {
    database: checks[0].status === 'fulfilled',
    redis: checks[1].status === 'fulfilled',
    external: checks[2].status === 'fulfilled'
  };

  const allHealthy = Object.values(results).every(Boolean);

  return NextResponse.json(
    { status: allHealthy ? 'ready' : 'degraded', checks: results },
    { status: allHealthy ? 200 : 503 }
  );
}

async function checkDatabase(): Promise<boolean> {
  try {
    await db.$queryRaw`SELECT 1`;
    return true;
  } catch {
    return false;
  }
}

async function checkRedis(): Promise<boolean> {
  try {
    await redis.ping();
    return true;
  } catch {
    return false;
  }
}

async function checkExternalServices(): Promise<boolean> {
  // Check critical external dependencies
  try {
    const response = await fetch('https://api.stripe.com/v1/health', {
      method: 'HEAD',
      signal: AbortSignal.timeout(5000)
    });
    return response.ok;
  } catch {
    return true; // Don't fail readiness for optional services
  }
}

Rate Limiting at Scale

// lib/rate-limit/distributed.ts
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!);

interface RateLimitConfig {
  windowMs: number;
  maxRequests: number;
}

const TIERS: Record<string, RateLimitConfig> = {
  free: { windowMs: 60000, maxRequests: 60 },      // 60/min
  pro: { windowMs: 60000, maxRequests: 600 },      // 600/min
  enterprise: { windowMs: 60000, maxRequests: 6000 } // 6000/min
};

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetAt: number;
}

export async function checkRateLimit(
  identifier: string,
  tier: string = 'free'
): Promise<RateLimitResult> {
  const config = TIERS[tier] || TIERS.free;
  const key = `ratelimit:${identifier}`;
  const now = Date.now();
  const windowStart = now - config.windowMs;

  // Sliding window using sorted set
  const pipeline = redis.pipeline();

  // Remove old entries
  pipeline.zremrangebyscore(key, 0, windowStart);

  // Count current window
  pipeline.zcard(key);

  // Add current request
  pipeline.zadd(key, now.toString(), `${now}:${Math.random()}`);

  // Set expiry
  pipeline.pexpire(key, config.windowMs);

  const results = await pipeline.exec();
  const currentCount = (results?.[1]?.[1] as number) || 0;

  const allowed = currentCount < config.maxRequests;
  const remaining = Math.max(0, config.maxRequests - currentCount - 1);
  const resetAt = now + config.windowMs;

  return { allowed, remaining, resetAt };
}

// Middleware
export async function rateLimitMiddleware(
  request: Request,
  identifier: string,
  tier: string
): Promise<Response | null> {
  const result = await checkRateLimit(identifier, tier);

  if (!result.allowed) {
    return new Response(
      JSON.stringify({ error: 'Rate limit exceeded' }),
      {
        status: 429,
        headers: {
          'Content-Type': 'application/json',
          'X-RateLimit-Limit': TIERS[tier]?.maxRequests.toString() || '60',
          'X-RateLimit-Remaining': '0',
          'X-RateLimit-Reset': result.resetAt.toString(),
          'Retry-After': Math.ceil((result.resetAt - Date.now()) / 1000).toString()
        }
      }
    );
  }

  return null; // Allow request
}

CDN & Edge Caching

// next.config.js
/** @type {import('next').NextConfig} */
const nextConfig = {
  // Static assets
  images: {
    remotePatterns: [
      { protocol: 'https', hostname: 'cdn.example.com' }
    ],
    minimumCacheTTL: 60 * 60 * 24 * 30 // 30 days
  },

  // Cache headers
  async headers() {
    return [
      {
        source: '/api/:path*',
        headers: [
          {
            key: 'Cache-Control',
            value: 'no-store, must-revalidate'
          }
        ]
      },
      {
        source: '/_next/static/:path*',
        headers: [
          {
            key: 'Cache-Control',
            value: 'public, max-age=31536000, immutable'
          }
        ]
      },
      {
        source: '/images/:path*',
        headers: [
          {
            key: 'Cache-Control',
            value: 'public, max-age=86400, stale-while-revalidate=604800'
          }
        ]
      }
    ];
  }
};

module.exports = nextConfig;

// Edge-cached API response
// app/api/products/route.ts
import { NextResponse } from 'next/server';

export const runtime = 'edge';
export const revalidate = 60; // ISR: revalidate every 60 seconds

export async function GET() {
  const products = await fetchProducts();

  return NextResponse.json(products, {
    headers: {
      'Cache-Control': 'public, s-maxage=60, stale-while-revalidate=300'
    }
  });
}

Monitoring & Observability

// lib/monitoring/metrics.ts
import { Counter, Gauge, Histogram, Registry } from 'prom-client';

export const registry = new Registry();

// Request metrics
export const httpRequestsTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'path', 'status'],
  registers: [registry]
});

export const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration',
  labelNames: ['method', 'path'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5],
  registers: [registry]
});

// Database metrics
export const dbQueryDuration = new Histogram({
  name: 'db_query_duration_seconds',
  help: 'Database query duration',
  labelNames: ['operation', 'table'],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1],
  registers: [registry]
});

// Business metrics (a Gauge, not a Counter: active-user counts go down as well as up)
export const activeUsers = new Gauge({
  name: 'active_users',
  help: 'Currently active users',
  labelNames: ['tier'],
  registers: [registry]
});

// Prometheus endpoint
// app/api/metrics/route.ts
import { registry } from '@/lib/monitoring/metrics';

export async function GET() {
  const metrics = await registry.metrics();

  return new Response(metrics, {
    headers: {
      'Content-Type': registry.contentType
    }
  });
}
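To see how the request metrics get populated, each request has to pass through an instrumentation layer. A hedged sketch of such a wrapper (`withMetrics` and the `record` callback are illustrative; in the setup above, `record` would call `httpRequestsTotal.inc(labels)` and `httpRequestDuration.observe(labels, seconds)`):

```typescript
type Req = { method: string; path: string };
type Res = { status: number };
type Handler = (req: Req) => Promise<Res>;

// Wrap a handler so every request is counted and timed. `record` stands
// in for the prom-client calls; the finally block ensures thrown errors
// are still recorded (as status 500).
export function withMetrics(
  handler: Handler,
  record: (
    labels: { method: string; path: string; status: string },
    seconds: number
  ) => void
): Handler {
  return async (req) => {
    const start = process.hrtime.bigint();
    let status = 500;
    try {
      const res = await handler(req);
      status = res.status;
      return res;
    } finally {
      const seconds = Number(process.hrtime.bigint() - start) / 1e9;
      record(
        { method: req.method, path: req.path, status: String(status) },
        seconds
      );
    }
  };
}
```

One caveat when wiring this up for real: the `path` label should be the route template (`/api/users/:id`), not the raw URL, or the label cardinality explodes.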

Graceful Degradation

// lib/resilience/circuit-breaker.ts
interface CircuitBreakerOptions {
  failureThreshold: number;
  resetTimeout: number;
}

type CircuitState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

export class CircuitBreaker {
  private state: CircuitState = 'CLOSED';
  private failureCount = 0;
  private lastFailureTime?: number;

  constructor(private options: CircuitBreakerOptions) {}

  async execute<T>(fn: () => Promise<T>, fallback?: () => T): Promise<T> {
    if (this.state === 'OPEN') {
      if (this.shouldAttemptReset()) {
        this.state = 'HALF_OPEN';
      } else if (fallback) {
        return fallback();
      } else {
        throw new Error('Circuit breaker is OPEN');
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      if (fallback) {
        return fallback();
      }
      throw error;
    }
  }

  private onSuccess(): void {
    this.failureCount = 0;
    this.state = 'CLOSED';
  }

  private onFailure(): void {
    this.failureCount++;
    this.lastFailureTime = Date.now();

    if (this.failureCount >= this.options.failureThreshold) {
      this.state = 'OPEN';
    }
  }

  private shouldAttemptReset(): boolean {
    return (
      this.lastFailureTime !== undefined &&
      Date.now() - this.lastFailureTime >= this.options.resetTimeout
    );
  }
}

// Usage
const searchCircuit = new CircuitBreaker({
  failureThreshold: 5,
  resetTimeout: 30000 // 30 seconds
});

export async function searchProducts(query: string) {
  return searchCircuit.execute(
    () => elasticSearch.search(query),
    () => db.product.findMany({
      where: { name: { contains: query } },
      take: 20
    })
  );
}

Best Practices

Aspect                Recommendation
Caching               Multi-level: Memory → Redis → DB
Database              Read replicas, then sharding
Stateless             Sessions in Redis, not in process memory
Jobs                  Async for anything >100ms
Monitoring            Metrics, logs, traces
Graceful degradation  Circuit breakers, fallbacks

Conclusion

Scaling SaaS infrastructure requires:

  1. Horizontal first: stateless apps, load balancing
  2. Caching: multi-level, with invalidation strategies
  3. Database: replicas and pooling first, sharding later
  4. Resilience: circuit breakers, fallbacks
  5. Observability: metrics, logging, alerting

Scaling is a continuous process: always address the next bottleneck.

