It would be interesting to see the effect of scaling to multiple instances behind the same endpoint. How does inference latency change as the endpoint scales out automatically (we could also expose parameters for the scaling policy)? Can we sustain more transactions with auto-scaling instances while keeping latency below a threshold, and what are the cost implications of doing so? This needs to be fleshed out, but it is an interesting area. A sketch of how such a policy could be attached is included below.
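As a rough sketch of the scaling-policy parameters mentioned above: a SageMaker endpoint variant can be registered with Application Auto Scaling and given a target-tracking policy. The endpoint/variant names, capacity bounds, target value, and cooldowns below are hypothetical placeholders; the target value and cooldowns are exactly the knobs a benchmark could sweep.

```python
import boto3

# Hypothetical names for illustration only.
ENDPOINT_NAME = "my-llm-endpoint"
VARIANT_NAME = "AllTraffic"

autoscaling = boto3.client("application-autoscaling")
resource_id = f"endpoint/{ENDPOINT_NAME}/variant/{VARIANT_NAME}"

# Register the variant's instance count as a scalable target (1-4 instances assumed).
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Target-tracking policy: add instances when invocations per instance
# exceed the target. TargetValue and the cooldowns are the scaling-policy
# parameters the benchmark could expose.
autoscaling.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 100.0,  # assumed: invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)
```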
This would also need to include support for the Inference Configuration feature that is now available with SageMaker.
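To check whether latency stays below a threshold while instances scale in and out, the per-variant `ModelLatency` metric (reported in microseconds) could be pulled from CloudWatch. A minimal sketch, again with placeholder endpoint/variant names:

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

# p99 ModelLatency (microseconds) for the variant over the last hour.
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/SageMaker",
    MetricName="ModelLatency",
    Dimensions=[
        {"Name": "EndpointName", "Value": "my-llm-endpoint"},  # hypothetical
        {"Name": "VariantName", "Value": "AllTraffic"},        # hypothetical
    ],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=60,
    ExtendedStatistics=["p99"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["ExtendedStatistics"]["p99"])
```

Joining this series against the scaling events and instance-hours would give the latency-versus-cost picture the issue asks about.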