How to use Cross Region Inference in Amazon Bedrock

4 min readSep 17, 2024

What does this feature help with?

Being in AWS Solutions Architecture team, I have the luxury of being exposed to latest in tech. Bedrock team launched a new feature called Cross Region Inference to improve resiliency of LLM based workloads.

So what does this feature help with?

You deploy a workload in a region,say us-east-1, and use Bedrock api for a model in us-east-1 for inferencing, let’s call this the source region. Region in AWS terms is a slice of a physical geographical region. For eg , US has 4 AWS regions(us-east-1, us-east-2, us-west-1, us-west-2), EU has 5 AWS regions (eu-central-1,eu-west-1 and so on) etc.. Usually you will call Bedrock of the AWS region you have deployed the workload to. This region has quota limits for different models, meaning X Tokens Per Minute and Y Requests Per Minute per model. When calling a bedrock API, your workload is consuming from this X and Y. Suppose, your workload doesn’t need the physical regional boundary and would like to take advantage of the capacity located in another region, you can use Cross-Region-Inference. An abstraction layer is built on top of the on-demand pool of compute resources dedicated for LLMS in the AWS regions. In other terms, while using CRIS, your requests will be routed to either us-east-1 OR us-west-2 depending on spare capacity available. Your chances of getting throttled by Server Side congestions are lower in CRIS.

How to use Cross Region Inference in Amazon Bedrock

Written by Aswathy Prasad

No responses yet