Scaling SaaS Infrastructure

Meta-Description: Scale SaaS infrastructure: from monolith to microservices, with database sharding, caching strategies, and auto-scaling on Kubernetes.

Keywords: SaaS Scaling, Horizontal Scaling, Database Sharding, Kubernetes, Load Balancing, Caching, CDN, Auto-Scaling, High Availability


Introduction

Scaling SaaS infrastructure enables growth without sacrificing performance. From horizontal scaling and database sharding to Kubernetes auto-scaling, the right architecture determines how far a product can go. This guide walks through proven patterns for growing SaaS products.


Scaling Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│              SAAS SCALING ARCHITECTURE                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │                    CDN LAYER                         │   │
│  │  CloudFlare / AWS CloudFront / Vercel Edge          │   │
│  │  ├── Static assets (JS, CSS, images)               │   │
│  │  ├── Edge caching for API responses                │   │
│  │  └── DDoS protection                               │   │
│  └─────────────────────────────────────────────────────┘   │
│                          │                                  │
│                          ▼                                  │
│  ┌─────────────────────────────────────────────────────┐   │
│  │               LOAD BALANCER                          │   │
│  │  AWS ALB / Nginx / Traefik                          │   │
│  │  ├── SSL termination                               │   │
│  │  ├── Health checks                                 │   │
│  │  ├── Rate limiting                                 │   │
│  │  └── Geographic routing                            │   │
│  └─────────────────────────────────────────────────────┘   │
│                          │                                  │
│         ┌────────────────┼────────────────┐                │
│         ▼                ▼                ▼                │
│  ┌───────────┐    ┌───────────┐    ┌───────────┐          │
│  │  App Pod  │    │  App Pod  │    │  App Pod  │          │
│  │  (API)    │    │  (API)    │    │  (API)    │          │
│  └───────────┘    └───────────┘    └───────────┘          │
│         │                │                │                │
│         └────────────────┼────────────────┘                │
│                          ▼                                  │
│  ┌─────────────────────────────────────────────────────┐   │
│  │                 CACHE LAYER                          │   │
│  │  Redis Cluster / Memcached                          │   │
│  │  ├── Session storage                               │   │
│  │  ├── Query caching                                 │   │
│  │  ├── Rate limit counters                           │   │
│  │  └── Real-time features (pub/sub)                  │   │
│  └─────────────────────────────────────────────────────┘   │
│                          │                                  │
│                          ▼                                  │
│  ┌─────────────────────────────────────────────────────┐   │
│  │               DATABASE LAYER                         │   │
│  │                                                     │   │
│  │  ┌──────────────┐  ┌──────────────┐               │   │
│  │  │   Primary    │──│   Replica    │               │   │
│  │  │   (Write)    │  │   (Read)     │               │   │
│  │  └──────────────┘  └──────────────┘               │   │
│  │         │                                          │   │
│  │         ▼ (at scale)                               │   │
│  │  ┌──────────────────────────────────────────┐     │   │
│  │  │         SHARDED DATABASE                  │     │   │
│  │  │  Shard 1 │ Shard 2 │ Shard 3 │ Shard N  │     │   │
│  │  │  (A-G)   │ (H-N)   │ (O-T)   │ (U-Z)    │     │   │
│  │  └──────────────────────────────────────────┘     │   │
│  │                                                     │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
│  Background Jobs: Redis Queue / BullMQ / SQS               │
│  Search: Elasticsearch / Meilisearch / Algolia             │
│  Storage: S3 / R2 / GCS                                    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Scaling Stages

┌─────────────────────────────────────────────────────────────┐
│              SAAS SCALING JOURNEY                           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  STAGE 1: Single Server (0-1K users)                       │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  App + DB on one server                             │   │
│  │  ✓ Simple deployment                                │   │
│  │  ✗ Single point of failure                          │   │
│  └─────────────────────────────────────────────────────┘   │
│                          │                                  │
│                          ▼                                  │
│  STAGE 2: Separate DB (1K-10K users)                       │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  App Server ──► Managed Database (RDS, PlanetScale) │   │
│  │  ✓ Independent scaling                              │   │
│  │  ✓ Automatic backups                                │   │
│  │  + Add Redis for sessions/caching                   │   │
│  └─────────────────────────────────────────────────────┘   │
│                          │                                  │
│                          ▼                                  │
│  STAGE 3: Horizontal Scaling (10K-100K users)              │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  Load Balancer ──► Multiple App Instances           │   │
│  │                 ──► Read Replicas                   │   │
│  │  ✓ No single point of failure                       │   │
│  │  ✓ Zero-downtime deployments                        │   │
│  │  + CDN for static assets                            │   │
│  └─────────────────────────────────────────────────────┘   │
│                          │                                  │
│                          ▼                                  │
│  STAGE 4: Microservices (100K+ users)                      │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  API Gateway ──► Service Mesh                       │   │
│  │              ──► Kubernetes                         │   │
│  │              ──► Database per service               │   │
│  │  ✓ Independent deployment & scaling                 │   │
│  │  ✓ Technology flexibility                           │   │
│  │  + Database sharding                                │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
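Stage 2's "Redis for sessions" is what makes Stage 3's horizontal scaling possible: once session state lives in a shared store instead of process memory, any instance behind the load balancer can serve any request. A minimal sketch of the pattern (`SessionManager` and `MemoryStore` are illustrative names; in production the store would be Redis, e.g. `SETEX`/`GET` via ioredis — the in-memory stand-in just keeps the sketch self-contained):

```typescript
import { randomUUID } from 'node:crypto';

// The store interface mirrors what a Redis client provides: TTL-bounded
// string values under a key. Swap MemoryStore for a Redis-backed
// implementation in production.
interface SessionStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
  del(key: string): Promise<void>;
}

// In-process stand-in for Redis (sketch only; defeats the purpose at scale)
class MemoryStore implements SessionStore {
  private data = new Map<string, { value: string; expiresAt: number }>();

  async get(key: string) {
    const entry = this.data.get(key);
    if (!entry || entry.expiresAt < Date.now()) return null;
    return entry.value;
  }

  async set(key: string, value: string, ttlSeconds: number) {
    this.data.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }

  async del(key: string) {
    this.data.delete(key);
  }
}

export class SessionManager {
  constructor(private store: SessionStore, private ttlSeconds = 86_400) {}

  // Opaque session id goes in the cookie; the data stays server-side
  async create(data: Record<string, unknown>): Promise<string> {
    const id = randomUUID();
    await this.store.set(`session:${id}`, JSON.stringify(data), this.ttlSeconds);
    return id;
  }

  async load(id: string): Promise<Record<string, unknown> | null> {
    const raw = await this.store.get(`session:${id}`);
    return raw ? JSON.parse(raw) : null;
  }

  async destroy(id: string): Promise<void> {
    await this.store.del(`session:${id}`);
  }
}
```

Because every instance reads the same store, instances can be added, removed, or replaced mid-session without logging users out.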

Database Read Replicas

// lib/db/replica.ts
import { PrismaClient, Prisma } from '@prisma/client';

// Primary for writes
export const primaryDb = new PrismaClient({
  datasources: {
    db: { url: process.env.DATABASE_URL }
  }
});

// Replica for reads
export const replicaDb = new PrismaClient({
  datasources: {
    db: { url: process.env.DATABASE_REPLICA_URL }
  }
});

// Smart routing
interface DbOptions {
  write?: boolean;
}

export function getDb(options: DbOptions = {}): PrismaClient {
  return options.write ? primaryDb : replicaDb;
}

// Usage examples
export async function getUsers() {
  // Read from replica
  return getDb().user.findMany();
}

export async function createUser(data: Prisma.UserCreateInput) {
  // Write to primary
  return getDb({ write: true }).user.create({ data });
}

// Read-after-write consistency helper
export async function createUserWithRead(data: Prisma.UserCreateInput) {
  const user = await primaryDb.user.create({ data });

  // Return the row from the primary directly: reading it back from the
  // replica immediately after the write can miss it, since replica lag
  // is typically <100ms but never zero
  return user;
}

Prisma with Read Replicas Extension

// lib/db/prisma-replicas.ts
import { PrismaClient } from '@prisma/client';
import { readReplicas } from '@prisma/extension-read-replicas';

export const db = new PrismaClient().$extends(
  readReplicas({
    url: process.env.DATABASE_REPLICA_URL!,
    // Optional: multiple replicas for load distribution
    // url: [
    //   process.env.DATABASE_REPLICA_URL_1!,
    //   process.env.DATABASE_REPLICA_URL_2!
    // ]
  })
);

// All reads automatically go to replica
// All writes automatically go to primary

// Force primary for specific reads (read-after-write)
export async function getUserAfterUpdate(userId: string) {
  return db.$primary().user.findUnique({
    where: { id: userId }
  });
}

Caching Strategy

// lib/cache/redis.ts
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!);

interface CacheOptions {
  ttl?: number; // seconds
  tags?: string[];
}

export async function cache<T>(
  key: string,
  fetcher: () => Promise<T>,
  options: CacheOptions = {}
): Promise<T> {
  const { ttl = 3600 } = options;

  // Try cache first
  const cached = await redis.get(key);
  if (cached) {
    return JSON.parse(cached);
  }

  // Fetch fresh data
  const data = await fetcher();

  // Store in cache
  await redis.setex(key, ttl, JSON.stringify(data));

  // Track tags for invalidation
  if (options.tags) {
    for (const tag of options.tags) {
      await redis.sadd(`tag:${tag}`, key);
    }
  }

  return data;
}

export async function invalidateByTag(tag: string): Promise<void> {
  const keys = await redis.smembers(`tag:${tag}`);
  if (keys.length > 0) {
    await redis.del(...keys);
    await redis.del(`tag:${tag}`);
  }
}

// Cache-aside pattern for database queries
export async function cachedQuery<T>(
  queryKey: string,
  query: () => Promise<T>,
  ttl: number = 300
): Promise<T> {
  return cache(queryKey, query, { ttl });
}

// Usage
export async function getProjectsByTenant(tenantId: string) {
  return cachedQuery(
    `tenant:${tenantId}:projects`,
    () => db.project.findMany({
      where: { tenantId },
      orderBy: { updatedAt: 'desc' }
    }),
    600 // 10 minutes
  );
}

Multi-Level Caching

// lib/cache/multi-level.ts
import { LRUCache } from 'lru-cache';
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!);

// L1: In-memory cache (per instance)
const memoryCache = new LRUCache<string, any>({
  max: 1000,
  ttl: 60 * 1000 // 1 minute
});

// L2: Redis (shared)
interface MultiLevelCacheOptions {
  memoryTtl?: number; // ms
  redisTtl?: number;  // seconds
}

export async function multiLevelCache<T>(
  key: string,
  fetcher: () => Promise<T>,
  options: MultiLevelCacheOptions = {}
): Promise<T> {
  const { memoryTtl = 60_000, redisTtl = 300 } = options;

  // L1: Check memory
  const memoryHit = memoryCache.get(key);
  if (memoryHit !== undefined) {
    return memoryHit as T;
  }

  // L2: Check Redis
  const redisHit = await redis.get(key);
  if (redisHit) {
    const data = JSON.parse(redisHit);
    memoryCache.set(key, data, { ttl: memoryTtl });
    return data;
  }

  // L3: Fetch from source
  const data = await fetcher();

  // Populate both caches
  memoryCache.set(key, data, { ttl: memoryTtl });
  await redis.setex(key, redisTtl, JSON.stringify(data));

  return data;
}

// Cache invalidation across instances via pub/sub
const subscriber = new Redis(process.env.REDIS_URL!);
subscriber.subscribe('cache:invalidate');

subscriber.on('message', (channel, key) => {
  if (channel === 'cache:invalidate') {
    memoryCache.delete(key);
  }
});

export async function invalidate(key: string): Promise<void> {
  memoryCache.delete(key);
  await redis.del(key);
  // Notify other instances
  await redis.publish('cache:invalidate', key);
}

Database Connection Pooling

// lib/db/pool.ts
import { Pool, PoolConfig } from 'pg';

const poolConfig: PoolConfig = {
  connectionString: process.env.DATABASE_URL,

  // Pool sizing
  min: 2,
  max: 20, // max connections per instance

  // Timeouts
  connectionTimeoutMillis: 5000,
  idleTimeoutMillis: 30000,

  // Query timeout
  statement_timeout: 30000,

  // Application name for monitoring
  application_name: 'saas-app'
};

export const pool = new Pool(poolConfig);

// Health check
export async function checkDbHealth(): Promise<boolean> {
  try {
    const result = await pool.query('SELECT 1');
    return result.rows.length > 0;
  } catch {
    return false;
  }
}

// Connection pool monitoring
pool.on('connect', () => {
  console.log('New connection established');
});

pool.on('error', (err) => {
  console.error('Unexpected error on idle client', err);
});

// Graceful shutdown
process.on('SIGTERM', async () => {
  await pool.end();
});
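Checked-out clients are the usual pooling pitfall: a client obtained via `pool.connect()` must be released even on error, or the pool slowly drains until every request fails with a connection timeout. A hedged sketch of a transaction helper (`withTransaction` and the `TxClient` interface are illustrative; `TxClient` mirrors just the subset of pg's `PoolClient` used here):

```typescript
// Minimal client interface matching the subset of pg's PoolClient we use
interface TxClient {
  query(sql: string): Promise<unknown>;
  release(): void;
}

// Run fn inside BEGIN/COMMIT, rolling back on error. release() runs in
// finally, so the connection always returns to the pool.
export async function withTransaction<T>(
  connect: () => Promise<TxClient>,
  fn: (client: TxClient) => Promise<T>
): Promise<T> {
  const client = await connect();
  try {
    await client.query('BEGIN');
    const result = await fn(client);
    await client.query('COMMIT');
    return result;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
```

With the pool above this would be called as `withTransaction(() => pool.connect(), async (c) => { /* queries on c */ })`, keeping all statements of the transaction on the same connection.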

Database Sharding

// lib/db/sharding.ts
import { PrismaClient } from '@prisma/client';
import crypto from 'crypto';

interface ShardConfig {
  id: number;
  url: string;
  range: [number, number]; // Hash range
}

const SHARDS: ShardConfig[] = [
  { id: 0, url: process.env.SHARD_0_URL!, range: [0, 63] },
  { id: 1, url: process.env.SHARD_1_URL!, range: [64, 127] },
  { id: 2, url: process.env.SHARD_2_URL!, range: [128, 191] },
  { id: 3, url: process.env.SHARD_3_URL!, range: [192, 255] }
];

// Create client for each shard
const shardClients = new Map<number, PrismaClient>();

for (const shard of SHARDS) {
  shardClients.set(
    shard.id,
    new PrismaClient({
      datasources: { db: { url: shard.url } }
    })
  );
}

// Consistent hashing for shard selection
export function getShardId(tenantId: string): number {
  const hash = crypto
    .createHash('md5')
    .update(tenantId)
    .digest()[0]; // First byte (0-255)

  const shard = SHARDS.find(
    s => hash >= s.range[0] && hash <= s.range[1]
  );

  return shard?.id || 0;
}

export function getShardClient(tenantId: string): PrismaClient {
  const shardId = getShardId(tenantId);
  return shardClients.get(shardId)!;
}

// Usage
export async function getProjectsForTenant(tenantId: string) {
  const client = getShardClient(tenantId);
  return client.project.findMany({
    where: { tenantId }
  });
}

// Cross-shard queries (avoid if possible)
export async function globalSearch(query: string) {
  const results = await Promise.all(
    Array.from(shardClients.values()).map(client =>
      client.project.findMany({
        where: {
          OR: [
            { name: { contains: query, mode: 'insensitive' } },
            { description: { contains: query, mode: 'insensitive' } }
          ]
        },
        take: 10
      })
    )
  );

  return results.flat().slice(0, 50);
}

Background Job Processing

// lib/jobs/queue.ts
import { Queue, Worker, Job } from 'bullmq';
import Redis from 'ioredis';

const connection = new Redis(process.env.REDIS_URL!, {
  maxRetriesPerRequest: null
});

// Define queues
export const emailQueue = new Queue('email', { connection });
export const reportQueue = new Queue('reports', { connection });
export const webhookQueue = new Queue('webhooks', {
  connection,
  // Retry behavior is configured on the queue (or per job in add())
  defaultJobOptions: {
    attempts: 5,
    backoff: {
      type: 'exponential',
      delay: 1000 // 1s, 2s, 4s, 8s, 16s
    }
  }
});

// Email worker
const emailWorker = new Worker(
  'email',
  async (job: Job) => {
    const { to, template, data } = job.data;
    await sendEmail(to, template, data);
  },
  {
    connection,
    concurrency: 10,
    limiter: {
      max: 100,
      duration: 1000 // 100 emails per second
    }
  }
);

// Report worker (CPU-intensive)
const reportWorker = new Worker(
  'reports',
  async (job: Job) => {
    const { tenantId, reportType, dateRange } = job.data;
    return generateReport(tenantId, reportType, dateRange);
  },
  {
    connection,
    concurrency: 2, // Lower concurrency for heavy jobs
    lockDuration: 300000 // 5 minute lock
  }
);

// Webhook worker with retry
const webhookWorker = new Worker(
  'webhooks',
  async (job: Job) => {
    const { url, payload, signature } = job.data;
    const response = await fetch(url, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-Signature': signature
      },
      body: JSON.stringify(payload)
    });

    if (!response.ok) {
      throw new Error(`Webhook failed: ${response.status}`);
    }
  },
  {
    connection,
    concurrency: 20
    // Retries (attempts + backoff) belong in the queue's defaultJobOptions
    // or the add() call; BullMQ's WorkerOptions has no such field
  }
);

// Add jobs
export async function queueEmail(
  to: string,
  template: string,
  data: Record<string, any>
) {
  return emailQueue.add('send', { to, template, data });
}

export async function queueReport(
  tenantId: string,
  reportType: string,
  dateRange: { start: Date; end: Date }
) {
  return reportQueue.add(
    'generate',
    { tenantId, reportType, dateRange },
    { priority: 10 }
  );
}

// Scheduled jobs
export async function setupScheduledJobs() {
  // Daily cleanup
  await emailQueue.add(
    'cleanup',
    {},
    {
      // BullMQ uses `pattern` for cron expressions (older versions: `cron`)
      repeat: { pattern: '0 2 * * *' } // 2 AM daily
    }
  );

  // Hourly metrics
  await reportQueue.add(
    'metrics',
    {},
    {
      repeat: { pattern: '0 * * * *' } // Every hour
    }
  );
}

Kubernetes Auto-Scaling

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: saas-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: saas-api
  template:
    metadata:
      labels:
        app: saas-api
    spec:
      containers:
        - name: api
          image: saas-api:latest
          ports:
            - containerPort: 3000
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          env:
            - name: NODE_ENV
              value: "production"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-secrets
                  key: url
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5

---
# Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: saas-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: saas-api
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max

---
# Pod Disruption Budget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: saas-api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: saas-api

Health Check Endpoints

// app/api/health/route.ts
import { NextResponse } from 'next/server';
import { db } from '@/lib/db';
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!);

// Liveness probe - is the app running?
export async function GET() {
  return NextResponse.json({ status: 'ok' });
}

// app/api/ready/route.ts
// Readiness probe - can the app handle traffic?
export async function GET() {
  const checks = await Promise.allSettled([
    checkDatabase(),
    checkRedis(),
    checkExternalServices()
  ]);

  const results = {
    database: checks[0].status === 'fulfilled',
    redis: checks[1].status === 'fulfilled',
    external: checks[2].status === 'fulfilled'
  };

  const allHealthy = Object.values(results).every(Boolean);

  return NextResponse.json(
    { status: allHealthy ? 'ready' : 'degraded', checks: results },
    { status: allHealthy ? 200 : 503 }
  );
}

async function checkDatabase(): Promise<boolean> {
  try {
    await db.$queryRaw`SELECT 1`;
    return true;
  } catch {
    return false;
  }
}

async function checkRedis(): Promise<boolean> {
  try {
    await redis.ping();
    return true;
  } catch {
    return false;
  }
}

async function checkExternalServices(): Promise<boolean> {
  // Check critical external dependencies
  try {
    const response = await fetch('https://api.stripe.com/v1/health', {
      method: 'HEAD',
      signal: AbortSignal.timeout(5000)
    });
    return response.ok;
  } catch {
    return true; // Don't fail readiness for optional services
  }
}

Rate Limiting at Scale

// lib/rate-limit/distributed.ts
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!);

interface RateLimitConfig {
  windowMs: number;
  maxRequests: number;
}

const TIERS: Record<string, RateLimitConfig> = {
  free: { windowMs: 60000, maxRequests: 60 },      // 60/min
  pro: { windowMs: 60000, maxRequests: 600 },      // 600/min
  enterprise: { windowMs: 60000, maxRequests: 6000 } // 6000/min
};

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetAt: number;
}

export async function checkRateLimit(
  identifier: string,
  tier: string = 'free'
): Promise<RateLimitResult> {
  const config = TIERS[tier] || TIERS.free;
  const key = `ratelimit:${identifier}`;
  const now = Date.now();
  const windowStart = now - config.windowMs;

  // Sliding window using sorted set
  const pipeline = redis.pipeline();

  // Remove old entries
  pipeline.zremrangebyscore(key, 0, windowStart);

  // Count current window
  pipeline.zcard(key);

  // Add current request
  pipeline.zadd(key, now.toString(), `${now}:${Math.random()}`);

  // Set expiry
  pipeline.pexpire(key, config.windowMs);

  const results = await pipeline.exec();
  const currentCount = (results?.[1]?.[1] as number) || 0;

  const allowed = currentCount < config.maxRequests;
  const remaining = Math.max(0, config.maxRequests - currentCount - 1);
  const resetAt = now + config.windowMs;

  return { allowed, remaining, resetAt };
}

// Middleware
export async function rateLimitMiddleware(
  request: Request,
  identifier: string,
  tier: string
): Promise<Response | null> {
  const result = await checkRateLimit(identifier, tier);

  if (!result.allowed) {
    return new Response(
      JSON.stringify({ error: 'Rate limit exceeded' }),
      {
        status: 429,
        headers: {
          'Content-Type': 'application/json',
          'X-RateLimit-Limit': TIERS[tier]?.maxRequests.toString() || '60',
          'X-RateLimit-Remaining': '0',
          'X-RateLimit-Reset': result.resetAt.toString(),
          'Retry-After': Math.ceil((result.resetAt - Date.now()) / 1000).toString()
        }
      }
    );
  }

  return null; // Allow request
}

CDN & Edge Caching

// next.config.js
/** @type {import('next').NextConfig} */
const nextConfig = {
  // Static assets
  images: {
    remotePatterns: [
      { protocol: 'https', hostname: 'cdn.example.com' }
    ],
    minimumCacheTTL: 60 * 60 * 24 * 30 // 30 days
  },

  // Cache headers
  async headers() {
    return [
      {
        source: '/api/:path*',
        headers: [
          {
            key: 'Cache-Control',
            value: 'no-store, must-revalidate'
          }
        ]
      },
      {
        source: '/_next/static/:path*',
        headers: [
          {
            key: 'Cache-Control',
            value: 'public, max-age=31536000, immutable'
          }
        ]
      },
      {
        source: '/images/:path*',
        headers: [
          {
            key: 'Cache-Control',
            value: 'public, max-age=86400, stale-while-revalidate=604800'
          }
        ]
      }
    ];
  }
};

module.exports = nextConfig;

// Edge-cached API response
// app/api/products/route.ts
import { NextResponse } from 'next/server';

export const runtime = 'edge';
export const revalidate = 60; // ISR: revalidate every 60 seconds

export async function GET() {
  const products = await fetchProducts();

  return NextResponse.json(products, {
    headers: {
      'Cache-Control': 'public, s-maxage=60, stale-while-revalidate=300'
    }
  });
}

Monitoring & Observability

// lib/monitoring/metrics.ts
import { Counter, Gauge, Histogram, Registry } from 'prom-client';

export const registry = new Registry();

// Request metrics
export const httpRequestsTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'path', 'status'],
  registers: [registry]
});

export const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration',
  labelNames: ['method', 'path'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5],
  registers: [registry]
});

// Database metrics
export const dbQueryDuration = new Histogram({
  name: 'db_query_duration_seconds',
  help: 'Database query duration',
  labelNames: ['operation', 'table'],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1],
  registers: [registry]
});

// Business metrics (a Gauge, not a Counter: active-user counts go down as well as up)
export const activeUsers = new Gauge({
  name: 'active_users',
  help: 'Currently active users',
  labelNames: ['tier'],
  registers: [registry]
});

// Prometheus endpoint
// app/api/metrics/route.ts
import { registry } from '@/lib/monitoring/metrics';

export async function GET() {
  const metrics = await registry.metrics();

  return new Response(metrics, {
    headers: {
      'Content-Type': registry.contentType
    }
  });
}
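To see how the request metrics get populated, each request has to pass through an instrumentation layer. A hedged sketch of such a wrapper (`withMetrics` and the `record` callback are illustrative; in the setup above, `record` would call `httpRequestsTotal.inc(labels)` and `httpRequestDuration.observe(labels, seconds)`):

```typescript
type Req = { method: string; path: string };
type Res = { status: number };
type Handler = (req: Req) => Promise<Res>;

// Wrap a handler so every request is counted and timed. `record` stands
// in for the prom-client calls; the finally block ensures thrown errors
// are still recorded (as status 500).
export function withMetrics(
  handler: Handler,
  record: (
    labels: { method: string; path: string; status: string },
    seconds: number
  ) => void
): Handler {
  return async (req) => {
    const start = process.hrtime.bigint();
    let status = 500;
    try {
      const res = await handler(req);
      status = res.status;
      return res;
    } finally {
      const seconds = Number(process.hrtime.bigint() - start) / 1e9;
      record(
        { method: req.method, path: req.path, status: String(status) },
        seconds
      );
    }
  };
}
```

One caveat when wiring this up for real: the `path` label should be the route template (`/api/users/:id`), not the raw URL, or the label cardinality explodes.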

Graceful Degradation

// lib/resilience/circuit-breaker.ts
interface CircuitBreakerOptions {
  failureThreshold: number;
  resetTimeout: number;
}

type CircuitState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

export class CircuitBreaker {
  private state: CircuitState = 'CLOSED';
  private failureCount = 0;
  private lastFailureTime?: number;

  constructor(private options: CircuitBreakerOptions) {}

  async execute<T>(fn: () => Promise<T>, fallback?: () => T): Promise<T> {
    if (this.state === 'OPEN') {
      if (this.shouldAttemptReset()) {
        this.state = 'HALF_OPEN';
      } else if (fallback) {
        return fallback();
      } else {
        throw new Error('Circuit breaker is OPEN');
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      if (fallback) {
        return fallback();
      }
      throw error;
    }
  }

  private onSuccess(): void {
    this.failureCount = 0;
    this.state = 'CLOSED';
  }

  private onFailure(): void {
    this.failureCount++;
    this.lastFailureTime = Date.now();

    if (this.failureCount >= this.options.failureThreshold) {
      this.state = 'OPEN';
    }
  }

  private shouldAttemptReset(): boolean {
    return (
      this.lastFailureTime !== undefined &&
      Date.now() - this.lastFailureTime >= this.options.resetTimeout
    );
  }
}

// Usage
const searchCircuit = new CircuitBreaker({
  failureThreshold: 5,
  resetTimeout: 30000 // 30 seconds
});

export async function searchProducts(query: string) {
  return searchCircuit.execute(
    () => elasticSearch.search(query),
    () => db.product.findMany({
      where: { name: { contains: query } },
      take: 20
    })
  );
}

Best Practices

Aspect                Recommendation
Caching               Multi-level: Memory → Redis → DB
Database              Read replicas, then sharding
Stateless             Sessions in Redis, not in process memory
Jobs                  Async for anything >100ms
Monitoring            Metrics, logs, traces
Graceful degradation  Circuit breakers, fallbacks

Conclusion

Scaling SaaS infrastructure requires:

  1. Horizontal first: stateless apps, load balancing
  2. Caching: multi-level, with invalidation strategies
  3. Database: replicas and pooling first, sharding later
  4. Resilience: circuit breakers, fallbacks
  5. Observability: metrics, logging, alerting

Scaling is a continuous process: always address the next bottleneck.

