EmbeddedLLM
Repositories
- infinity-executable Public Forked from michaelfeil/infinity
Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
- JamAIBase Public
The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real-time. Work together seamlessly to build and iterate on AI applications.
- SageAttention-rocm Public Forked from thu-ml/SageAttention
ROCm Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
- torchac_rocm Public Forked from LMCache/torchac_cuda
ROCm implementation of torchac_cuda from LMCache.
- LMCache-ROCm Public Forked from LMCache/LMCache
ROCm support for LMCache: ultra-fast and cheaper long-context LLM inference.
- skypilot Public Forked from skypilot-org/skypilot
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.