- CCF list 2019: [chinese] [international]
- The Tail at Scale
- FA2 source code
- (OSDI'20) Serving DNNs like Clockwork: Performance Predictability from the Bottom Up
- (SC'20) BATCH: Machine Learning Inference Serving on Serverless Platforms with Adaptive Batching
- (NSDI'17) Clipper: A Low-Latency Online Prediction Serving System
- (ATC'21) INFaaS: Automated Model-less Inference Serving
- (OSDI'20) AntMan: Dynamic Scaling on GPU Clusters for Deep Learning
- (SoCC'21) Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines
- (OSDI'18) Gandiva: Introspective Cluster Scheduling for Deep Learning
- (Middleware'17) Swayam: Distributed Autoscaling to Meet SLAs of Machine Learning Inference Services with Resource Efficiency
- [partial] Borg; RAS; Traefik; Nginx; blog
- [low priority]: MapReduce; GFS; BigTable
- [low priority]: Spanner (Google); B4; Dynamo
- [book]: Designing Data-Intensive Applications
-
AntMan
-
BATCH
-
INFless
- Problem
- Insights
- Solution
- Other
-
MArk: Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving
-
Problem
-
Insights
-
Solution
-
Other
-
-
[Serverless] 🔖 Cloud Programming Simplified: A Berkeley View on Serverless Computing
- history of cloud computing
- motivations for serverless computing
- limitations of serverless
-
[Serverless] Evaluation of Production Serverless Computing Environments
- evaluates the performance of production serverless computing environments
- "serverless is powered by container technologies which have near zero start-up delay and deleting latency."
- "a container is deployed and terminated within a few milliseconds for the function invocation w/ pre warmup policy"
-
[Serverless] Serverless in the Wild: Characterizing and Optimizing the Serverless Workload at a Large Cloud Provider
- characterize the FaaS workload of Azure Functions
- propose a practical resource management policy to reduce the number of cold starts
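To make the cold-start count concrete, here is a minimal sketch of a generic fixed keep-alive window. It is not the policy the paper proposes. It only illustrates why the length of the keep-alive window drives the number of cold starts. The function name, the timestamps, and the simplification that executions take zero time are all assumptions made for illustration.

```python
def count_cold_starts(arrival_times, keep_alive):
    """Count cold starts for one function under a fixed keep-alive window.

    Assumptions (hypothetical, for illustration only):
    - the instance stays warm for `keep_alive` time units after each invocation
    - execution time is ignored (treated as instantaneous)
    """
    cold_starts = 0
    last_invocation = None
    for t in sorted(arrival_times):
        # The first call, or a call arriving after the window expired, is cold.
        if last_invocation is None or t - last_invocation > keep_alive:
            cold_starts += 1
        last_invocation = t
    return cold_starts


# Hypothetical invocation timestamps (arbitrary time units).
arrivals = [0, 5, 30, 32, 100]
short_window = count_cold_starts(arrivals, keep_alive=10)
long_window = count_cold_starts(arrivals, keep_alive=30)
```

A longer window trades memory held by idle instances for fewer cold starts, which is the tension a resource management policy has to balance.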
-
[Serverless] 🔖 Xanadu: Mitigating cascading cold starts in serverless function chain deployments
- (pipeline) deploy resources for the execution chain before traffic bursts
- cascading cold-start overhead increases linearly with chain length
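The linear growth noted above can be illustrated with a toy latency model. This is a sketch under simple assumptions, not Xanadu's implementation: functions run strictly in sequence, every function pays the same cold-start penalty, and pre-deploying the chain removes that penalty entirely. The function name and all numbers are hypothetical.

```python
def chain_latency_ms(chain_length, exec_ms, cold_start_ms, prewarmed=False):
    """End-to-end latency of a sequential function chain, in milliseconds.

    Assumptions (hypothetical): each function in the chain starts only after
    the previous one finishes, so without pre-warming every function pays
    its own cold start one after another.
    """
    cold_total = 0 if prewarmed else chain_length * cold_start_ms
    return cold_total + chain_length * exec_ms


# Hypothetical numbers: 50 ms execution, 500 ms cold start per function.
for n in (1, 2, 4, 8):
    cold = chain_latency_ms(n, exec_ms=50, cold_start_ms=500)
    warm = chain_latency_ms(n, exec_ms=50, cold_start_ms=500, prewarmed=True)
    print(f"chain length {n}: cold={cold} ms, prewarmed={warm} ms, overhead={cold - warm} ms")
```

In this model the overhead column is `chain_length * cold_start_ms`, so doubling the chain length doubles the cold-start overhead, which is why deploying resources for the whole chain ahead of a burst pays off more for longer chains.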
-
InferLine: Latency-Aware Provisioning and Scaling for Prediction Serving Pipelines
-
Problem
-
Insights
-
Solution
-
Other
-
-
[Spot instance] Tributary: Spot-Dancing for Elastic Services with Latency SLOs
- Transient Instance (AWS Spot Instance)
- Trace: ClarkNet & WITS & ...
-
[Spot instance] Cocktail: A Multidimensional Optimization for Model Serving in Cloud
- Ensemble Learning
- Transient Instance
- "DeepAR-estimator"
- Trace: Wikipedia & Twitter
- Twine: A Unified Cluster Management System for Shared Infrastructure
- Shard Manager: A Generic Shard Management Framework for Geo-distributed Applications
- Autopilot: Workload Autoscaling at Google
- Piccolo: ---
-
Nexus: A GPU Cluster Engine for Accelerating DNN-Based Video Analysis
-
AutoScale: Dynamic, Robust Capacity Management for Multi-Tier Data Centers
- 😏❤️🔖 Template: An example [GPU Scheduling]