Deepgram Nova-2 for German Speech Recognition: Production Guide

Meta description: Optimizing Deepgram Nova-2 for German: dialect recognition, keyword boosting for domain vocabulary, streaming integration, and production configuration.

Keywords: Deepgram, Nova-2, German ASR, Speech-to-Text, Deutsche Spracherkennung, STT API, Voice Recognition Germany

Introduction

German is a Tier-1 language at Deepgram, with a Word Error Rate (WER) of 5-10%. Production quality still takes fine-tuning, because German has its quirks: dialects, technical terms, and compound words (Komposita).

Baseline Performance

Metric	Nova-2 (German)	Whisper Large	Google STT
WER	8-10%	12-15%	10-12%
Latency	~150 ms	~2000 ms	~300 ms
Price/min	$0.0043	$0.006	$0.016
Streaming	Yes	No	Yes
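
Since the comparison above hinges on WER, here is a minimal sketch of how that figure is commonly computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The names `tokenize` and `wordErrorRate` and the very simple normalization (lowercase, whitespace split) are assumptions for illustration; published benchmarks usually apply more elaborate text normalization.

```typescript
// Word Error Rate = (substitutions + deletions + insertions) / reference word count.
// Normalization is deliberately simple: lowercase and split on whitespace.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
}

function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    throw new Error('Reference transcript must contain at least one word');
  }

  // prev[j] = edit distance between the reference words seen so far
  // and the first j hypothesis words (classic Levenshtein, row by row).
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) prev.push(j);

  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1;      // reference word missing in hypothesis
      const insertion = curr[j - 1] + 1; // extra word in hypothesis
      curr.push(Math.min(substitution, deletion, insertion));
    }
    prev = curr;
  }

  return prev[hyp.length] / ref.length;
}
```

A split compound is penalized twice: `wordErrorRate('Der Patient hat Rückensyndrom', 'Der Patient hat Rücken Syndrom')` counts one substitution and one insertion against four reference words, giving 0.5. This is one reason compound handling matters so much for German WER.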

Basic Setup for German

// src/services/deepgram-german.ts
import { createClient, LiveTranscriptionEvents } from '@deepgram/sdk';

interface GermanASRConfig {
  model: 'nova-2' | 'nova-3';
  language: 'de' | 'de-DE' | 'de-AT' | 'de-CH';
  smartFormat: boolean;
  punctuate: boolean;
  diarize: boolean;
  keywords: string[];
  endpointing: number;
}

const defaultGermanConfig: GermanASRConfig = {
  model: 'nova-2',
  language: 'de',
  smartFormat: true,
  punctuate: true,
  diarize: false,
  keywords: [],
  endpointing: 500
};

export class GermanASR {
  private client = createClient(process.env.DEEPGRAM_API_KEY!);

  async transcribeStream(
    config: Partial<GermanASRConfig> = {}
  ) {
    const finalConfig = { ...defaultGermanConfig, ...config };

    const connection = this.client.listen.live({
      model: finalConfig.model,
      language: finalConfig.language,
      smart_format: finalConfig.smartFormat,
      punctuate: finalConfig.punctuate,
      diarize: finalConfig.diarize,
      keywords: finalConfig.keywords,
      endpointing: finalConfig.endpointing,

      // German-specific options
      numerals: true,         // "dreiundzwanzig" → "23"
      profanity_filter: false, // Keep the transcript complete
      redact: false,
      replace: [],
      search: [],
      utterance_end_ms: 1000,
      interim_results: true
    });

    return connection;
  }

  async transcribeFile(
    audioBuffer: Buffer,
    config: Partial<GermanASRConfig> = {}
  ) {
    const finalConfig = { ...defaultGermanConfig, ...config };

    const result = await this.client.listen.prerecorded.transcribeFile(
      audioBuffer,
      {
        model: finalConfig.model,
        language: finalConfig.language,
        smart_format: finalConfig.smartFormat,
        punctuate: finalConfig.punctuate,
        diarize: finalConfig.diarize,
        keywords: finalConfig.keywords,

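        // Additional options for batch transcription.
        // Caution: Deepgram's audio-intelligence features (summarize, topics,
        // intents, sentiment) have been documented as English-only; verify
        // current language support before relying on them for German audio.
        paragraphs: true,
        summarize: 'v2',
        topics: true,
        intents: true,
        sentiment: true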
      }
    );

    return result;
  }
}
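
With `interim_results: true` the stream delivers provisional and final segments side by side, so the consumer has to decide what counts as a finished utterance. The sketch below shows one common approach: ignore interim results, collect segments marked `is_final`, and flush when `speech_final` signals an endpoint. The `LiveResult` interface is an assumed minimal subset of the live result payload, and `UtteranceAssembler` is a hypothetical helper, not part of the SDK.

```typescript
// Assumed minimal subset of a live transcription result message.
interface LiveResult {
  is_final: boolean;
  speech_final: boolean;
  channel: { alternatives: { transcript: string; confidence: number }[] };
}

// Hypothetical helper: turns a stream of results into complete utterances.
class UtteranceAssembler {
  private finalizedParts: string[] = [];

  // Returns the complete utterance once an endpoint is detected, otherwise null.
  push(result: LiveResult): string | null {
    // Interim results are superseded by a later message, so skip them here.
    if (!result.is_final) return null;

    const best = result.channel.alternatives[0];
    const text = best ? best.transcript.trim() : '';
    if (text.length > 0) this.finalizedParts.push(text);

    // Keep collecting until the endpoint (end of speech) is signalled.
    if (!result.speech_final) return null;

    const utterance = this.finalizedParts.join(' ');
    this.finalizedParts = [];
    return utterance.length > 0 ? utterance : null;
  }
}
```

In the streaming setup above, an instance could be fed from the connection's transcript event handler, and each non-null return value passed on to post-processing. Interim results remain useful for live captions, but they should not be stored as final text.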

Keyword Boosting for Domain Vocabulary

The Problem with German Technical Terms

Standard ASR:   "Der Patient hat Rücken Syndrom"
With boosting:  "Der Patient hat Rückensyndrom"

Standard ASR:   "Wir nutzen Kühn a tees"
With boosting:  "Wir nutzen Kubernetes"

Implementation

// src/config/german-keywords.ts

// Medical terms
const medicalKeywords = [
  'Rückensyndrom',
  'Bandscheibenvorfall',
  'Computertomografie',
  'Magnetresonanztomografie',
  'Blutdruckmessung',
  'Cholesterinspiegel',
  'Elektrokardiogramm',
  // ... more
];

// Tech terms
const techKeywords = [
  'Kubernetes',
  'TypeScript',
  'PostgreSQL',
  'WebSocket',
  'Microservices',
  'Containerisierung',
  'Deployment',
  'Repository',
  // ... more
];

// E-commerce
const ecommerceKeywords = [
  'Mehrwertsteuer',
  'Rechnungsstellung',
  'Gewährleistung',
  'Widerrufsrecht',
  'Versandkosten',
  'Zahlungsabwicklung',
  // ... more
];

// Base keywords shared by all domains
// (declared before the function that uses them)
const germanBaseKeywords = [
  // Numbers (for better recognition)
  'einundzwanzig',
  'zweiunddreißig',
  'fünfundvierzig',

  // Common company names
  'Deutsche Bahn',
  'Volkswagen',
  'Siemens',
  'Bosch',

  // Abbreviations
  'GmbH',
  'AG',
  'KG',
  'e.V.'
];

// Assemble the keyword list for a domain
export function getKeywordsForDomain(domain: string): string[] {
  switch (domain) {
    case 'medical':
      return [...medicalKeywords, ...germanBaseKeywords];
    case 'tech':
      return [...techKeywords, ...germanBaseKeywords];
    case 'ecommerce':
      return [...ecommerceKeywords, ...germanBaseKeywords];
    default:
      return germanBaseKeywords;
  }
}

Keyword Boosting with Intensifiers

// Deepgram supports keyword intensifiers in the form "term:boost"
import { createClient } from '@deepgram/sdk';

const deepgram = createClient(process.env.DEEPGRAM_API_KEY!);

const boostedKeywords = [
  'Kubernetes:2',      // Strong boost
  'PostgreSQL:2',
  'Deployment:1',      // Normal boost
  'Repository:1',
  'GmbH:3'            // Very strong boost
];

const connection = deepgram.listen.live({
  model: 'nova-2',
  language: 'de',
  keywords: boostedKeywords
});
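
Hand-maintained `term:boost` strings tend to collect duplicates and typos once several domain lists are merged. The following sketch builds the strings from structured entries instead, dropping empty terms and keeping the highest boost when a term appears more than once (compared case-insensitively). `BoostEntry` and `buildBoostedKeywords` are hypothetical names; the output format simply follows the `term:boost` convention used above.

```typescript
interface BoostEntry {
  term: string;
  boost?: number; // falls back to defaultBoost when omitted
}

// Hypothetical helper: produces "term:boost" strings without duplicates.
function buildBoostedKeywords(entries: BoostEntry[], defaultBoost = 1): string[] {
  // lower-cased term → best entry seen so far (Map keeps insertion order)
  const best = new Map<string, { term: string; boost: number }>();

  for (const entry of entries) {
    const term = entry.term.trim();
    if (term.length === 0) continue; // skip empty terms

    const boost = entry.boost !== undefined ? entry.boost : defaultBoost;
    const key = term.toLowerCase();
    const existing = best.get(key);

    // Keep the first spelling, but upgrade to the highest boost requested.
    if (existing === undefined) {
      best.set(key, { term, boost });
    } else if (boost > existing.boost) {
      existing.boost = boost;
    }
  }

  return Array.from(best.values()).map((e) => `${e.term}:${e.boost}`);
}
```

The result can be passed directly as the `keywords` option. Keeping the boosts as numbers until the last moment also makes it easy to tune them per domain or to cap them in one place.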

German Dialects

Regional Adjustments

// src/config/german-dialects.ts
export type GermanDialect =
  | 'hochdeutsch'      // Standard German
  | 'bayerisch'
  | 'schwäbisch'
  | 'sächsisch'
  | 'plattdeutsch'
  | 'österreichisch'
  | 'schweizerdeutsch';

interface DialectConfig {
  // Check Deepgram's language list before use: not every regional code
  // is necessarily available for every model.
  languageCode: string;
  additionalKeywords: string[];
  postProcessing: (text: string) => string;
}

const hochdeutsch: DialectConfig = {
  languageCode: 'de-DE',
  additionalKeywords: [],
  postProcessing: (text) => text
};

// Partial: dialects without an entry fall back to standard German
const dialectConfigs: Partial<Record<GermanDialect, DialectConfig>> = {
  hochdeutsch,

  bayerisch: {
    languageCode: 'de-DE',
    additionalKeywords: [
      'Servus', 'Grüß Gott', 'Pfiat di',
      'Brezn', 'Semmel', 'Weißwurst'
    ],
    postProcessing: (text) => {
      // Optionally normalize Bavarian expressions
      return text
        .replace(/\bfei\b/g, 'wirklich')
        .replace(/\bgell\b/g, 'nicht wahr');
    }
  },

  österreichisch: {
    languageCode: 'de-AT',
    additionalKeywords: [
      'Servus', 'Grüß Gott', 'Baba',
      'Paradeiser', 'Erdapfel', 'Sackerl',
      'Jänner', 'Feber'
    ],
    postProcessing: (text) => text
  },

  schweizerdeutsch: {
    languageCode: 'de-CH',
    additionalKeywords: [
      'Grüezi', 'Merci', 'Ade',
      'Velo', 'Natel', 'Trottoir',
      'Rüebli', 'Zmorge'
    ],
    postProcessing: (text) => text
  },

  // ... further dialects
};

export function getDialectConfig(dialect: GermanDialect): DialectConfig {
  return dialectConfigs[dialect] ?? hochdeutsch;
}
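
The dialect table assigns regional codes such as `de-AT`, but a given model may only accept a subset of them. One defensive option is to resolve the code against a list of supported codes before opening a connection, falling back from the regional variant to the base language. The function `resolveLanguageCode` is a hypothetical helper, and the `supported` list is supplied by the caller; it has to be taken from the provider's current documentation rather than from this sketch.

```typescript
// Hypothetical helper: picks the most specific language code that is supported.
// "supported" must come from the provider's documentation for the chosen model.
function resolveLanguageCode(
  requested: string,
  supported: string[],
  fallback = 'de'
): string {
  // 1. Exact match, e.g. "de-CH"
  if (supported.indexOf(requested) !== -1) return requested;

  // 2. Base language of a regional code, e.g. "de-AT" → "de"
  const base = requested.split('-')[0];
  if (supported.indexOf(base) !== -1) return base;

  // 3. Configured default
  return fallback;
}
```

Used together with `getDialectConfig`, the dialect's `additionalKeywords` and `postProcessing` still apply even when the language code itself has to fall back to plain `de`.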

Compound Word Handling

German compound words (Komposita) are a challenge:

// src/processing/composita-handler.ts

// Problem: "Kraft fahrzeug haft pflicht versicherung"
// Desired: "Kraftfahrzeughaftpflichtversicherung"

export class CompositaProcessor {
  // Common compound parts (lower-cased for case-insensitive matching)
  private parts = [
    'Kraft', 'Fahrzeug', 'Haft', 'Pflicht', 'Versicherung',
    'Daten', 'Schutz', 'Verarbeitung', 'Einwilligung',
    'Geschäfts', 'Führung', 'Bericht', 'Erstattung'
  ].map(p => p.toLowerCase());

  // Known complete compounds (lower-cased)
  private knownComposita = new Set([
    'Kraftfahrzeughaftpflichtversicherung',
    'Datenschutzgrundverordnung',
    'Bundesausbildungsförderungsgesetz',
    'Geschäftsführer',
    'Einwilligungserklärung'
  ].map(c => c.toLowerCase()));

  process(text: string): string {
    // Try to join consecutive words
    const words = text.split(/\s+/).filter(w => w.length > 0);
    const result: string[] = [];

    let i = 0;
    while (i < words.length) {
      let combined = words[i];
      let j = i + 1;

      // Try to combine with the following words
      while (j < words.length) {
        // Later parts are lower-cased: "Daten Schutz" → "Datenschutz"
        const potential = combined + words[j].toLowerCase();

        if (this.isLikelyCompositum(potential)) {
          combined = potential;
          j++;
        } else {
          break;
        }
      }

      result.push(combined);
      i = j;
    }

    return result.join(' ');
  }

  private isLikelyCompositum(word: string): boolean {
    const lower = word.toLowerCase();

    // Known compound?
    if (this.knownComposita.has(lower)) return true;

    // Otherwise the whole word must consist of known parts. A plain
    // prefix-plus-length check would also accept "Datensind" for
    // "Daten sind" and keep swallowing every following word.
    return this.consistsOfParts(lower);
  }

  private consistsOfParts(word: string): boolean {
    if (word.length === 0) return true;
    return this.parts.some(p =>
      word.startsWith(p) && this.consistsOfParts(word.slice(p.length))
    );
  }
}
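
Heuristic joining always carries some risk of false merges. Where precision matters more than coverage, a stricter variant is to join adjacent words only when the joined form is on an explicit allow-list, trying the longest span first. The sketch below illustrates that idea; `joinKnownCompounds` and the `maxParts` limit are assumptions for illustration and not part of the processor above.

```typescript
// Hypothetical, conservative alternative: only join words whose
// concatenation is on an explicit allow-list of compounds.
function joinKnownCompounds(
  text: string,
  knownCompounds: string[],
  maxParts = 6
): string {
  // lower-cased joined form → canonical spelling
  const canonical: Record<string, string> = Object.create(null);
  for (const compound of knownCompounds) {
    canonical[compound.toLowerCase()] = compound;
  }

  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const result: string[] = [];

  let i = 0;
  while (i < words.length) {
    let matched: string | null = null;
    let matchedLength = 0;

    // Longest span first, so the full compound wins over a shorter one.
    const maxEnd = Math.min(words.length, i + maxParts);
    for (let end = maxEnd; end >= i + 2; end--) {
      const key = words.slice(i, end).join('').toLowerCase();
      if (key in canonical) {
        matched = canonical[key];
        matchedLength = end - i;
        break;
      }
    }

    if (matched !== null) {
      result.push(matched);
      i += matchedLength;
    } else {
      result.push(words[i]);
      i++;
    }
  }

  return result.join(' ');
}
```

The trade-off is maintenance: every compound has to be listed, but nothing outside the list is ever altered. The same list can double as the keyword list for boosting.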

Post-Processing Pipeline

// src/processing/german-postprocessor.ts
import { CompositaProcessor } from './composita-handler';
import { getDialectConfig, type GermanDialect } from '../config/german-dialects';

interface PostProcessorConfig {
  normalizeNumbers: boolean;
  fixComposita: boolean;
  correctCommonErrors: boolean;
  dialect: GermanDialect;
}

export class GermanPostProcessor {
  private compositaProcessor = new CompositaProcessor();

  process(text: string, config: PostProcessorConfig): string {
    let result = text;

    // 1. Correct common errors
    if (config.correctCommonErrors) {
      result = this.correctCommonErrors(result);
    }

    // 2. Join compound words
    if (config.fixComposita) {
      result = this.compositaProcessor.process(result);
    }

    // 3. Normalize numbers
    if (config.normalizeNumbers) {
      result = this.normalizeNumbers(result);
    }

    // 4. Dialect-specific adjustments
    const dialectConfig = getDialectConfig(config.dialect);
    result = dialectConfig.postProcessing(result);

    return result;
  }

  private correctCommonErrors(text: string): string {
    const corrections: [RegExp, string][] = [
      [/\beine mail\b/gi, 'eine E-Mail'],
      [/\bwhats app\b/gi, 'WhatsApp'],
      [/\bwifi\b/gi, 'WLAN'],
      [/\bapp\b/gi, 'App'],
      [/\bcloud\b/gi, 'Cloud'],
      [/\bsmart phone\b/gi, 'Smartphone'],
      [/\bhandy\b/gi, 'Handy'],
      [/\bwebsite\b/gi, 'Webseite'],
      [/\bhome office\b/gi, 'Homeoffice'],
    ];

    let result = text;
    for (const [pattern, replacement] of corrections) {
      result = result.replace(pattern, replacement);
    }

    return result;
  }

  private normalizeNumbers(text: string): string {
    // "drei komma fünf prozent" → "3,5 prozent"
    const numberWords: Record<string, string> = {
      'null': '0', 'eins': '1', 'zwei': '2', 'drei': '3',
      'vier': '4', 'fünf': '5', 'sechs': '6', 'sieben': '7',
      'acht': '8', 'neun': '9', 'zehn': '10',
      'elf': '11', 'zwölf': '12', 'dreizehn': '13',
      'zwanzig': '20', 'dreißig': '30', 'vierzig': '40',
      'fünfzig': '50', 'hundert': '100', 'tausend': '1000'
    };

    // Digits stay as they are; known number words are mapped to digits
    const toDigits = (token: string): string | undefined =>
      /^\d+$/.test(token) ? token : numberWords[token.toLowerCase()];

    // Decimal numbers ("komma"). \p{L} with the u flag is required because
    // \w does not match umlauts such as the "ü" in "fünf".
    return text.replace(
      /([\p{L}\d]+)\s+komma\s+([\p{L}\d]+)/giu,
      (match, before: string, after: string) => {
        const num1 = toDigits(before);
        const num2 = toDigits(after);
        // Leave the text untouched unless both sides are numbers
        return num1 !== undefined && num2 !== undefined
          ? `${num1},${num2}`
          : match;
      }
    );
  }
}
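
The number table in the post-processor covers single words only, while spoken German builds two-digit numbers as "unit + und + tens" (for example "dreiundzwanzig"). The sketch below parses number words from 0 to 99 under that rule. `parseGermanNumber` and its lookup tables are hypothetical additions for illustration; larger numbers, ordinals, and inflected forms are out of scope.

```typescript
const BASE_NUMBERS: Record<string, number> = {
  null: 0, eins: 1, zwei: 2, drei: 3, vier: 4, fünf: 5, sechs: 6,
  sieben: 7, acht: 8, neun: 9, zehn: 10, elf: 11, zwölf: 12,
  dreizehn: 13, vierzehn: 14, fünfzehn: 15, sechzehn: 16,
  siebzehn: 17, achtzehn: 18, neunzehn: 19
};

// Unit forms as they appear inside compounds ("ein", not "eins")
const COMPOUND_UNITS: Record<string, number> = {
  ein: 1, zwei: 2, drei: 3, vier: 4, fünf: 5,
  sechs: 6, sieben: 7, acht: 8, neun: 9
};

const TENS: Record<string, number> = {
  zwanzig: 20, dreißig: 30, vierzig: 40, fünfzig: 50,
  sechzig: 60, siebzig: 70, achtzig: 80, neunzig: 90
};

// Own-property lookup, so words like "constructor" never match by accident.
function lookup(table: Record<string, number>, key: string): number | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// Parses a single German number word in the range 0-99; returns null otherwise.
function parseGermanNumber(word: string): number | null {
  const w = word.toLowerCase();

  const base = lookup(BASE_NUMBERS, w);
  if (base !== undefined) return base;

  const tensOnly = lookup(TENS, w);
  if (tensOnly !== undefined) return tensOnly;

  // "<unit>und<tens>", e.g. "drei" + "und" + "zwanzig"
  const match = /^(.+?)und(.+)$/.exec(w);
  if (match) {
    const unit = lookup(COMPOUND_UNITS, match[1]);
    const tens = lookup(TENS, match[2]);
    if (unit !== undefined && tens !== undefined) return tens + unit;
  }

  return null;
}
```

A parser like this could replace the fixed lookup inside `normalizeNumbers`, so that "dreiundzwanzig komma fünf" is also recognized. When `numerals` or `smart_format` already produce digits upstream, this step only serves as a safety net.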

Production Configuration

// src/config/production-german-asr.ts
import { getKeywordsForDomain } from './german-keywords';

export const productionConfig = {
  // Deepgram settings
  deepgram: {
    model: 'nova-2',
    language: 'de',
    smart_format: true,
    punctuate: true,
    diarize: false, // Only when needed (costs extra)
    endpointing: 500,
    interim_results: true,
    utterance_end_ms: 1000,
    vad_events: true,

    // Keywords for the domain
    keywords: getKeywordsForDomain('tech')
  },

  // Post-processing
  postProcessing: {
    normalizeNumbers: true,
    fixComposita: true,
    correctCommonErrors: true,
    dialect: 'hochdeutsch' as const
  },

  // Retry logic
  retry: {
    maxRetries: 3,
    backoffMs: 1000
  },

  // Monitoring
  monitoring: {
    logTranscripts: false, // GDPR (DSGVO)!
    logLatency: true,
    logErrors: true
  }
};
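
The `retry` block only declares `maxRetries` and `backoffMs`; how those values are applied is left open. A minimal sketch, assuming exponential backoff (the delay doubles with each retry), could look as follows. `RetryOptions`, `backoffDelays`, and `withRetry` are hypothetical names, and the doubling strategy is an assumption rather than something the configuration prescribes.

```typescript
interface RetryOptions {
  maxRetries: number;
  backoffMs: number;
}

// Delay before each retry: backoffMs, 2 × backoffMs, 4 × backoffMs, ...
function backoffDelays(options: RetryOptions): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < options.maxRetries; attempt++) {
    delays.push(options.backoffMs * Math.pow(2, attempt));
  }
  return delays;
}

// Runs the operation once, then retries up to maxRetries times on failure.
async function withRetry<T>(
  operation: () => Promise<T>,
  options: RetryOptions
): Promise<T> {
  const delays = backoffDelays(options);
  let lastError: unknown;

  for (let attempt = 0; attempt <= delays.length; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt === delays.length) break; // no retries left
      const delay = delays[attempt];
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }

  throw lastError;
}
```

With the values above (`maxRetries: 3`, `backoffMs: 1000`) a failing batch request would be retried after 1, 2, and 4 seconds. For live streams, retrying usually means reconnecting rather than re-sending, so this wrapper fits `transcribeFile` better than `transcribeStream`.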

Metrics & Monitoring

// src/monitoring/asr-metrics.ts
interface ASRMetrics {
  requestId: string;
  timestamp: Date;

  // Performance
  latencyMs: number;
  audioLengthMs: number;
  processingRatio: number; // audioLengthMs / latencyMs

  // Quality
  wordCount: number;
  confidenceScore: number;
  alternativesCount: number;

  // Errors
  errorType?: string;
  errorMessage?: string;
}

class ASRMonitor {
  async track(metrics: ASRMetrics) {
    // Detect latency anomalies
    if (metrics.latencyMs > 500) {
      console.warn(`High ASR latency: ${metrics.latencyMs}ms`);
    }

    // Flag low confidence
    if (metrics.confidenceScore < 0.7) {
      console.warn(`Low confidence: ${metrics.confidenceScore}`);
    }

    // Store metrics (for dashboards)
    await this.store(metrics);
  }

  // Placeholder sink: replace with your own persistence layer
  private async store(metrics: ASRMetrics): Promise<void> {
    // e.g. write to a time-series database or metrics pipeline
  }
}
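
Warning on single slow requests is noisy; dashboards usually track latency percentiles instead. The sketch below aggregates stored `latencyMs` values into p50, p95, and maximum using the nearest-rank method. `percentile` and `summarizeLatency` are hypothetical helpers and not part of the monitor above.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('No samples');
  if (p <= 0 || p > 100) throw new Error('p must be in (0, 100]');

  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Summary suitable for a dashboard panel or an alert threshold.
function summarizeLatency(latenciesMs: number[]): {
  p50: number;
  p95: number;
  max: number;
} {
  return {
    p50: percentile(latenciesMs, 50),
    p95: percentile(latenciesMs, 95),
    max: percentile(latenciesMs, 100)
  };
}
```

An alert on p95 over a time window reacts to sustained degradation while ignoring a single outlier, which tends to be closer to what users actually experience.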

Conclusion

German speech recognition with Deepgram Nova-2 requires:

Keyword boosting: for domain vocabulary and proper names
Dialect awareness: distinguish de-DE, de-AT, and de-CH
Compound handling: post-processing for compound words
Domain adaptation: keywords chosen per use case

With these optimizations, a WER of 5-8% is achievable even for complex German technical language.
