
[Design] Semantic Caching #30

Open · 4 tasks
missBerg opened this issue Dec 5, 2024 · 1 comment
Labels: api (Control Plane API design), enhancement (New feature or request)

Comments

missBerg (Contributor) commented Dec 5, 2024

Produce a design proposal for the feature Semantic Caching.
Include:

  • Motivation
  • Feature Definition
  • Control Plane API
  • Technical Implementation Proposal

This issue was created from a conversation during the Dec 5th Community meeting:
https://docs.google.com/document/d/10e1sfsF-3G3Du5nBHGmLjXw5GVMqqCvFDqp_O65B0_w/edit?tab=t.0#bookmark=id.l5miyf5qkodx

missBerg added the enhancement (New feature or request) and api (Control Plane API design) labels on Dec 5, 2024
Krishanx92 commented Dec 12, 2024

Motivation

Semantic caching in an LLM Gateway reduces API costs by reusing responses for semantically similar requests, minimizing expensive LLM calls. Traditional exact-match caching fails when queries differ even slightly in wording; semantic caching instead matches on intent using vector embeddings. This significantly improves response times, since cache hits are served in milliseconds rather than waiting on LLM processing. By eliminating redundant LLM requests, semantic caching improves cost efficiency and token usage, yielding higher cache hit rates and making the LLM Gateway more efficient overall.

Key Benefits

  • Latency Reduction: Quicker response times, since LLM inference is bypassed on cache hits.
  • Cost Efficiency: Reduced computation costs for API-based LLMs.
  • Improved Scalability: Handles higher throughput by avoiding redundant computations.
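To make the lookup path concrete, here is a minimal sketch in Go (not part of any proposal in this issue). It assumes request embeddings are already produced by an upstream embedding model; the `SemanticCache` type, the linear scan, and the 0.9 similarity threshold are all illustrative placeholders, and a real gateway would likely use an approximate nearest-neighbor index and a tuned threshold.

```go
package main

import (
	"fmt"
	"math"
)

// entry pairs a request's embedding with its cached LLM response.
type entry struct {
	embedding []float64
	response  string
}

// SemanticCache returns a cached response when a new request's
// embedding is within a similarity threshold of a stored one.
type SemanticCache struct {
	threshold float64
	entries   []entry
}

// cosine computes cosine similarity between two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Lookup scans for the most similar cached embedding. A production
// cache would use an ANN index (e.g. HNSW) instead of a linear scan.
func (c *SemanticCache) Lookup(embedding []float64) (string, bool) {
	best, bestScore := "", -1.0
	for _, e := range c.entries {
		if s := cosine(embedding, e.embedding); s > bestScore {
			best, bestScore = e.response, s
		}
	}
	if bestScore >= c.threshold {
		return best, true // cache hit: skip the LLM call
	}
	return "", false // cache miss: forward to the LLM, then Store
}

// Store records a response under the request's embedding.
func (c *SemanticCache) Store(embedding []float64, response string) {
	c.entries = append(c.entries, entry{embedding, response})
}

func main() {
	cache := &SemanticCache{threshold: 0.9}
	cache.Store([]float64{0.1, 0.9, 0.2}, "Paris is the capital of France.")

	// A semantically similar query yields a nearby embedding,
	// so it hits the cache even though the wording differs.
	if resp, ok := cache.Lookup([]float64{0.12, 0.88, 0.21}); ok {
		fmt.Println("cache hit:", resp)
	} else {
		fmt.Println("cache miss: call the LLM")
	}
}
```

The design choice that matters here is the similarity threshold: set too low, semantically different queries share answers incorrectly; set too high, the cache degenerates into exact matching and the hit rate drops.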

[Diagram attachment: sementic-cache.drawio]

Will provide detailed documentation for this.
