How to use Cross Region Inference in Amazon Bedrock
What does this feature help with?
Being in AWS Solutions Architecture team, I have the luxury of being exposed to latest in tech. Bedrock team launched a new feature called Cross Region Inference to improve resiliency of LLM based workloads.
So what does this feature help with?
You deploy a workload in a region,say us-east-1, and use Bedrock api for a model in us-east-1 for inferencing, let’s call this the source region. Region in AWS terms is a slice of a physical geographical region. For eg , US has 4 AWS regions(us-east-1, us-east-2, us-west-1, us-west-2), EU has 5 AWS regions (eu-central-1,eu-west-1 and so on) etc.. Usually you will call Bedrock of the AWS region you have deployed the workload to. This region has quota limits for different models, meaning X Tokens Per Minute and Y Requests Per Minute per model. When calling a bedrock API, your workload is consuming from this X and Y. Suppose, your workload doesn’t need the physical regional boundary and would like to take advantage of the capacity located in another region, you can use Cross-Region-Inference. An abstraction layer is built on top of the on-demand pool of compute resources dedicated for LLMS in the AWS regions. In other terms, while using CRIS, your requests will be routed to either us-east-1 OR us-west-2 depending on spare capacity available. Your chances of getting throttled by Server Side congestions are lower in CRIS.