Add Semantic Caching
Showing 11 changed files with 155 additions and 13 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,21 +1,25 @@ | ||
name = "ProToPortal" | ||
uuid = "f9496bd6-a3bb-4afc-927d-7268532ebfa9" | ||
authors = ["J S <[email protected]> and contributors"] | ||
-version = "0.3.0" | ||
+version = "0.4.0" | ||
|
||
[deps] | ||
Base64 = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f" | ||
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a" | ||
GenieFramework = "a59fdf5c-6bf0-4f5d-949c-a137c9e2f353" | ||
GenieSession = "03cc5b98-4f21-4eb6-99f2-22eced81f962" | ||
+HTTP = "cd3eb016-35fb-5094-929b-558a96fad6f3" | ||
PromptingTools = "670122d1-24a8-4d70-bfce-740807c42192" | ||
+SemanticCaches = "03ba8f0e-aaaa-4626-a19b-56297996781b" | ||
|
||
[compat] | ||
Aqua = "0.7" | ||
Dates = "<0.0.1, 1" | ||
GenieFramework = "2.1" | ||
GenieSession = "1" | ||
-PromptingTools = "0.33" | ||
+HTTP = "1" | ||
+PromptingTools = "0.37.1" | ||
+SemanticCaches = "0.2" | ||
Test = "<0.0.1, 1" | ||
julia = "1.10" | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,6 @@ | ||
using Pkg | ||
Pkg.activate(".") | ||
-using GenieFramework | ||
-ENV["GENIE_HOST"] = "127.0.0.1" | ||
-ENV["PORT"] = "8000" | ||
-## ENV["GENIE_ENV"] = "prod" | ||
-include("app.jl") # hack for hot-reloading when fixing things | ||
-Genie.loadapp(); | ||
-up(async = true); | ||
+## Required to support semantic caching | ||
+ENV["DATADEPS_ALWAYS_ACCEPT"] = "true" | ||
+using ProToPortal | ||
+ProToPortal.launch(8000, "0.0.0.0"; async = false, cached = true, cache_verbose = true) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,81 @@ | ||
## Define the new caching mechanism as a layer for HTTP | ||
## See documentation [here](https://juliaweb.github.io/HTTP.jl/stable/client/#Quick-Examples) | ||
""" | ||
CacheLayer | ||
A module providing caching of LLM requests for ProToPortal. | ||
It caches 4 URL paths: | ||
- `/v1/chat/completions` (for OpenAI API) | ||
- `/v1/messages` (for Anthropic API) | ||
- `/v1/embeddings` (for OpenAI API) | ||
- `/v1/rerank` (for Cohere API) | ||
# How to use | ||
You can use the layer directly | ||
`CacheLayer.get(req)` | ||
You can push the layer globally in all HTTP.jl requests | ||
`HTTP.pushlayer!(CacheLayer.cache_layer)` | ||
You can remove the layer later | ||
`HTTP.poplayer!()` | ||
""" | ||
module CacheLayer | ||
|
||
using SemanticCaches, HTTP | ||
using PromptingTools: JSON3 | ||
|
||
const SEM_CACHE = SemanticCache() | ||
const HASH_CACHE = HashCache() | ||
|
||
function cache_layer(handler) | ||
return function (req; kw...) | ||
VERBOSE = Base.get(ENV, "CACHES_VERBOSE", "true") == "true" | ||
if req.method == "POST" && !isempty(req.body) | ||
body = JSON3.read(copy(req.body)) | ||
## chat/completions is for OpenAI, v1/messages is for Anthropic | ||
if occursin("v1/chat/completions", req.target) || | ||
occursin("v1/messages", req.target) | ||
## We're in chat completion endpoint | ||
temperature_str = haskey(body, :temperature) ? body[:temperature] : "-" | ||
cache_key = string("chat-", body[:model], "-", temperature_str) | ||
input = join([m["content"] for m in body[:messages]], " ") | ||
elseif occursin("v1/embeddings", req.target) | ||
cache_key = string("emb-", body[:model]) | ||
## We're in embedding endpoint | ||
input = join(body[:input], " ") | ||
elseif occursin("v1/rerank", req.target) | ||
cache_key = string("rerank-", body[:model], "-", body[:top_n]) | ||
input = join([body[:query], body[:documents]...], " ") | ||
else | ||
## Skip, unknown API | ||
VERBOSE && @info "Skipping cache for $(req.method) $(req.target)" | ||
return handler(req; kw...) | ||
end | ||
## Check the cache | ||
|
||
VERBOSE && @info "Check if we can cache this request ($(length(input)) chars)" | ||
active_cache = length(input) > 5000 ? HASH_CACHE : SEM_CACHE | ||
item = active_cache(cache_key, input; verbose = 2 * VERBOSE) # change verbosity to 0 to disable detailed logs | ||
if !isvalid(item) | ||
VERBOSE && @info "Cache miss! Pinging the API" | ||
# pass the request along to the next layer by calling `cache_layer` arg `handler` | ||
resp = handler(req; kw...) | ||
item.output = resp | ||
# Let's remember it for the next time | ||
push!(active_cache, item) | ||
end | ||
## Return the calculated or cached result | ||
return item.output | ||
end | ||
# pass the request along to the next layer by calling `cache_layer` arg `handler` | ||
# also pass along the trailing keyword args `kw...` | ||
return handler(req; kw...) | ||
end | ||
end | ||
|
||
# Create a new client with the cache layer added | ||
HTTP.@client [cache_layer] | ||
|
||
end # module |
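A minimal sketch of the routing logic in `cache_layer`, written against plain `Dict`s so it runs without HTTP.jl, JSON3, or SemanticCaches. The helper names `cache_key_and_input` and `pick_cache`, and the model name `"gpt-4o"`, are hypothetical and used only for illustration; the endpoint checks, key layout, and 5000-character threshold are copied from the diff above.

```julia
# Hypothetical helper (not part of the commit): maps a request target and an
# already-parsed JSON body to a (cache_key, input) pair, or `nothing` when the
# endpoint is not one the layer knows how to cache.
function cache_key_and_input(target::AbstractString, body::AbstractDict)
    if occursin("v1/chat/completions", target) || occursin("v1/messages", target)
        # Temperature is part of the key, so runs with different settings do not collide
        temperature = get(body, "temperature", "-")
        key = string("chat-", body["model"], "-", temperature)
        input = join((m["content"] for m in body["messages"]), " ")
    elseif occursin("v1/embeddings", target)
        key = string("emb-", body["model"])
        input = join(body["input"], " ")
    elseif occursin("v1/rerank", target)
        key = string("rerank-", body["model"], "-", body["top_n"])
        input = join([body["query"], body["documents"]...], " ")
    else
        return nothing
    end
    return key, input
end

# Mirrors the threshold in the diff: long inputs use the hash cache,
# shorter ones use the semantic cache.
pick_cache(input::AbstractString) = length(input) > 5000 ? :hash : :semantic

# Example request body; "gpt-4o" is an illustrative model name only.
body = Dict("model" => "gpt-4o", "temperature" => 0.7,
    "messages" => [Dict("role" => "system", "content" => "Be brief."),
        Dict("role" => "user", "content" => "Hi!")])
key, input = cache_key_and_input("/v1/chat/completions", body)
```

Because the model and temperature are folded into the key, a cached answer is only considered for requests that share both; the joined message contents are what gets compared within that key.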
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
""" | ||
launch( | ||
port::Union{Int, String} = get(ENV, "PORT", "8000"), | ||
host::String = get(ENV, "GENIE_HOST", "127.0.0.1"); | ||
async::Bool = true, cached::Bool = true, cache_verbose::Bool = true) | ||
Launches ProToPortal in the browser. | ||
Defaults to: `http://127.0.0.1:8000`. | ||
This is a convenience wrapper around `Genie.up`. To customize the server configuration, use `Genie.up()` and `Genie.config`. | ||
# Arguments | ||
- `port::Union{Int, String} = get(ENV, "PORT", "8000")`: The port to launch the server on. | ||
- `host::String = get(ENV, "GENIE_HOST", "127.0.0.1")`: The host to launch the server on. | ||
- `async::Bool = true`: Whether to launch the server asynchronously, i.e., in the background. | ||
- `cached::Bool = true`: Whether to use semantic caching of the requests. | ||
- `cache_verbose::Bool = true`: Whether to print verbose information about the caching process. | ||
If you want to remove the cache layer later, you can use `import HTTP; HTTP.poplayer!()`. | ||
""" | ||
function launch( | ||
port::Union{Int, String} = get(ENV, "PORT", "8000"), | ||
host::String = get(ENV, "GENIE_HOST", "127.0.0.1"); | ||
async::Bool = true, cached::Bool = true, cache_verbose::Bool = true) | ||
## Loads app.jl in the root directory | ||
Genie.loadapp(pkgdir(ProToPortal)) | ||
|
||
## Enables caching | ||
ENV["CACHES_VERBOSE"] = cache_verbose ? "true" : "false" | ||
if cached | ||
@info "Caching enabled globally (for all requests, see `CacheLayer` module for details). Remove with `HTTP.poplayer!()`" | ||
HTTP.pushlayer!(CacheLayer.cache_layer) | ||
end | ||
## Convert to INT | ||
port_ = port isa Integer ? port : tryparse(Int, port) | ||
@assert port_ isa Integer "Port must be an integer. Provided: $port" | ||
up(port_, host; async) | ||
end |
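The port handling at the end of `launch` can be sketched in isolation as follows. `normalize_port` is a hypothetical helper, not part of the commit; it follows the same `tryparse` approach but throws an `ArgumentError` instead of using `@assert`, since the Julia documentation notes that asserts may be disabled at some optimization levels.

```julia
# Hypothetical helper mirroring the port conversion in `launch`.
# Accepts an integer or a string such as the value of ENV["PORT"].
function normalize_port(port::Union{Integer, AbstractString})
    # `tryparse` returns `nothing` when the string is not a valid integer
    port_ = port isa Integer ? port : tryparse(Int, port)
    port_ isa Integer ||
        throw(ArgumentError("Port must be an integer. Provided: $port"))
    return port_
end

normalize_port("8000")  # string input, e.g. read from ENV
normalize_port(8000)    # integer input passes through unchanged
```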
28a5de8
@JuliaRegistrator register
Release notes:
Added `launch` to make it easier to launch the app. Semantic caching is enabled by default; disable it with `cached=false` in the `launch()` function.
28a5de8
Registration pull request created: JuliaRegistries/General/110842
Tagging
After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.
This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the GitHub interface, or via:
Also, note the warning: This looks like a new registration that registers version 0.4.0.
Ideally, you should register an initial release with 0.0.1, 0.1.0 or 1.0.0 version numbers
This can be safely ignored. However, if you want to fix this you can do so. Call register() again after making the fix. This will update the Pull request.