Apply Rate Limit Assertion
The Apply Rate Limit assertion allows you to limit the rate of transactions passing through the CA API Gateway for a given user, client IP address, or other identifier. When this limit is reached, the Gateway can either begin throttling requests or it can attempt to delay the requests until the rate falls below the limit. You can also set a maximum concurrency level to prevent a user from monopolizing resources.
Use this assertion only if you need to limit the flow of transactions entering the Gateway.
Understanding the Apply Rate Limit Assertion
The following topics are provided to clarify how the rate limit is applied:
The Token Bucket Algorithm
The Apply Rate Limit assertion uses a token bucket algorithm to shape traffic. To allow a request through, a counter must spend a token from the bucket. Tokens are generated in the bucket and accumulate when there are no requests until a maximum number is reached. The rate at which tokens are generated in the bucket for a given number of seconds is set by the configured rate limit.
Without spreading the limit over time, the token bucket can hold a maximum of 1.5 tokens. If a request is sent through an idle counter, one token is spent and 0.5 tokens remain, so the counter does not have enough tokens to allow a second request until at least half of the rate limit interval (1/limit of a second) has elapsed.
When you spread the limit over time, you enable bursts of traffic because the bucket is allowed to hold more tokens. The number of tokens the bucket can hold depends on the rate limit and the Spread limit over setting. If you are spreading the limit over 5 seconds, then up to 5 seconds' worth of tokens can accumulate in the bucket while a counter is idle.
Rate Limit | Effect | Notes |
Without spreading the limit over time | The Gateway only accepts requests arriving no sooner than 1/limit of a second apart. | For a maximum limit of 10 requests per second, the second request can be sent after half of 1/10th of a second, or 1/20th of a second. Over time, this only allows messages through at a rate equal to the limit. |
Spreading the limit over time | Allows requests to arrive in arbitrary bursts that exceed the Maximum requests per second rate over an X second window. | Recommended. For a maximum limit of 10 requests per second spread over 5 seconds, the bucket can hold up to 50 tokens. In this mode, the counter has enough tokens to allow a burst of 50 requests arriving all at once. After this, the bucket is empty, and if traffic continues to arrive, the bucket behaves as if no Spread Limit is enabled. |
The following graph illustrates the difference between a rate limit with or without a Spread Limit. Spreading over time allows for more traffic and throttles fewer requests.
[Graph: rate limit with and without a Spread Limit]
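The token bucket behavior described above can be sketched in a few lines of Python. This is a minimal illustration only, not the Gateway's implementation: the class name TokenBucket, its parameters, and the assumption that an idle counter starts with a full bucket are all hypothetical. The capacity rule (1.5 tokens without spreading, rate multiplied by window when spreading) is taken from the text.

```python
from typing import Optional


class TokenBucket:
    """Illustrative token bucket; names and structure are assumptions."""

    def __init__(self, rate_per_sec: float, spread_window_sec: Optional[float] = None):
        self.rate = rate_per_sec
        # Capacity as described in the text: 1.5 tokens without spreading,
        # otherwise one window's worth of tokens.
        if spread_window_sec is None:
            self.capacity = 1.5
        else:
            self.capacity = rate_per_sec * spread_window_sec
        self.tokens = self.capacity  # assumption: an idle counter holds a full bucket
        self.last = 0.0

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then try to spend one token."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# 10 requests per second spread over 5 seconds: a burst of 60 simultaneous
# requests finds 50 tokens in the bucket.
burst = TokenBucket(rate_per_sec=10, spread_window_sec=5)
print(sum(burst.allow(now=0.0) for _ in range(60)))  # 50 of the 60 requests pass
```

Without a spread window, the same class admits one request from an idle counter and rejects the next one until roughly half of the 1/limit interval has passed, which matches the first row of the table.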

Concurrency
You can limit concurrency per-counter using the concurrency limit.
The intent of the global ratelimit.maxQueuedThreads setting is to prevent all Gateway transport pool threads from being delayed inside rate limit counters at the same time. If this isn't a concern, you can disable this limit by setting it to a very high value. However, this may cause the Gateway to run out of available threads and stop responding to new requests. If you only want to increase it, consider setting a limit of two-thirds of the Gateway's httpCoreConcurrency.
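As a quick illustration of the two-thirds guideline, the following sketch computes a candidate queued-thread limit from the Gateway's HTTP core concurrency value. The function name and the sample value of 600 are hypothetical; substitute the concurrency configured on your own Gateway.

```python
def suggested_max_queued_threads(http_core_concurrency: int) -> int:
    """Two-thirds of the HTTP core concurrency, rounded down (illustrative only)."""
    return (2 * http_core_concurrency) // 3


# Hypothetical example: a Gateway configured with 600 HTTP core threads.
print(suggested_max_queued_threads(600))  # 400
```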
Applying Rate Limit to Gateway Clusters
If you have a cluster of gateways, the limits entered in this assertion are divided among the number of "up" nodes in the cluster. A node is considered "up" if it has posted its status within the past 8 seconds (configurable via the ratelimit.clusterStatusInterval cluster property). The Apply Rate Limit Assertion checks the status of cluster nodes every 43 seconds (configurable via the ratelimit.clusterPollInterval cluster property).
The Gateway automatically adjusts the rates internally when nodes are added or removed from a cluster. There is no need to modify the values in this assertion. If no authenticated user is established in the policy, then the IP address of the requestor is used instead in the Apply Rate Limit Assertion.
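The division of a limit among "up" nodes can be sketched as follows. This is an illustration under stated assumptions, not the Gateway's code: the function names, the dictionary of last status timestamps, and the fallback to a single node when none are reported as up are all hypothetical. The 8-second status window is the value given above.

```python
from typing import Dict, List


def up_nodes(last_status: Dict[str, float], now: float,
             status_interval: float = 8.0) -> List[str]:
    """Return nodes that posted their status within the last status_interval seconds."""
    return [node for node, posted in last_status.items()
            if now - posted <= status_interval]


def per_node_limit(cluster_limit: float, last_status: Dict[str, float],
                   now: float, status_interval: float = 8.0) -> float:
    """Divide the configured limit among the nodes currently considered up."""
    # Assumption: if no node looks up, treat the local node as the only one.
    count = max(1, len(up_nodes(last_status, now, status_interval)))
    return cluster_limit / count


# Hypothetical cluster: node-c last posted 21 seconds ago, so it is not "up".
statuses = {"node-a": 100.0, "node-b": 95.0, "node-c": 80.0}
print(per_node_limit(100, statuses, now=101.0))  # 50.0
```

Because the share is recomputed from the current set of up nodes, adding or removing a node changes each node's limit without any change to the configured value.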
Configure the Apply Rate Limit Assertion Properties
The Apply Rate Limit assertion is available in the assertions panel.
Drag the Apply Rate Limit Assertion from the assertions panel into a policy, or right-click the Apply Rate Limit assertion in an existing policy and select Rate Limit Properties.

Configure the Apply Rate Limit Properties as follows:
Setting | Description |
Maximum requests per second | Specify how many requests per second should be processed by the Gateway or cluster. You can enter a context variable that resolves to the maximum requests value. The context variable must either be single-value or multivalued with a specific index reference. |
Cluster wide | If the Gateway cluster comprises more than one node, this setting determines whether the value entered in the Maximum requests per second field is split among the nodes or applied to each node. |
Spread limit over X sec window | Determines whether to allow a burst of requests to be spread across a window of time or whether to enforce a hard cap. |
Limit each | Use the drop-down list to indicate how limiting should occur:
This limit breakdown affects both the maximum number of requests per second and the maximum concurrency. For example, if you choose "by client IP address" and set the maximum concurrency to 10 and the maximum number of requests per second to 100, the assertion fails if any single incoming IP address exceeds either the concurrency of 10 or the 100 requests per second. All IP addresses combined are permitted to exceed these limits, however. You can combine multiple instances of this assertion to impose different limits by different breakdown factors, such as "maximum 10 per IP and maximum 100 for all combined". To help you construct a custom format, the entry box displays the actual node identifier and context variable associated with each of the other limit options once you have selected the Custom option. For example, when you first open the Rate Limit Properties, User or client IP is selected by default. If you choose Custom and then reselect User or client IP, you will see that the underlying value is <node identifier>-${request.clientid}. |
When limit exceeded | Specify what happens when the rate limit is exceeded:
The number of threads that can be queued within a node is defined by the ratelimit.maxQueuedThreads cluster property. For more information, see Rate Limit Cluster Properties. |
Maximum concurrent requests | Indicate whether to enforce concurrency limits for a given named rate limiter (as specified by the Limit each setting).
Select this check box to split the value across all the nodes in the cluster. For example, if the maximum is 10, each node in a 5-node cluster has a concurrency limit of 2 requests. Clear this check box to allow the maximum requests value on each node. For example, if the maximum is 10, every node in the cluster is allowed 10 concurrent requests. The concurrency counter is incremented when a request passes through the Apply Rate Limit Assertion (even if the assertion ends up failing). The counter is decremented once the request is completely finished. |
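The per-counter concurrency behavior described in the table can be sketched as follows. This is a simplified illustration, not the Gateway's implementation: the names counter_key, per_node_concurrency, and ConcurrencyLimiter are hypothetical, and the key format simply mirrors the <node identifier>-${request.clientid} pattern shown above.

```python
from collections import defaultdict


def counter_key(node_identifier: str, client_id: str) -> str:
    """Build a counter name in the '<node identifier>-<client id>' style."""
    return f"{node_identifier}-{client_id}"


def per_node_concurrency(maximum: int, node_count: int, split: bool) -> int:
    """Split the maximum across nodes when the check box is selected."""
    return maximum // node_count if split else maximum


class ConcurrencyLimiter:
    """Tracks in-flight requests per counter (illustrative only)."""

    def __init__(self, limit: int):
        self.limit = limit
        self.active = defaultdict(int)

    def enter(self, key: str) -> bool:
        # The counter is incremented even when the check fails,
        # as described in the table above.
        self.active[key] += 1
        return self.active[key] <= self.limit

    def finish(self, key: str) -> None:
        # Called once the request is completely finished.
        self.active[key] -= 1


limit = per_node_concurrency(maximum=10, node_count=5, split=True)  # 2 per node
limiter = ConcurrencyLimiter(limit)
key = counter_key("node1", "10.0.0.1")
print([limiter.enter(key) for _ in range(3)])  # [True, True, False]
```

Each key has its own counter, so a second client IP address is unaffected when the first one reaches its limit.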
Audit Detail Codes
The Gateway audit log displays the codes and messages associated with the Apply Rate Limit assertion.
In the list of message codes, "{0}", "{1}", etc., are placeholders for messages that may vary depending on the context of the audit.
Messages may convert non-identifiable characters into a string literal of their Unicode value. For example, if "null" is being expressed in a message, it will be displayed as "\u0000" (the Unicode representation for null).
6900 WARNING Quota exceeded on counter {0}. Assertion limit is {1} current counter value is {2}
6901 INFO Quota already exceeded on counter {0}
6902 WARNING Invalid Quota Counter ID: {0}
6903 WARNING Configured max quota value {0} is too large. The max value allowed is {1} (where: {0} is the value found at runtime and {1} is the maximum allowed value)
6950 INFO Rate limit exceeded on rate limiter {0}
6951 INFO Unable to further delay request for rate limiter {0}, because maximum delay has been reached
6952 INFO Unable to delay request for rate limiter {0}, because queued thread limit has been reached
6953 INFO Concurrency exceeded on rate limiter {0}.
6954 INFO Rate limit of {0} exceeds maximum rate limit of {1}. Setting maximum limit to {2}
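To show how the numbered placeholders and the Unicode escaping fit together, here is a small Python sketch. It is only an approximation for illustration: the AUDIT_MESSAGES table, the render function, and the use of Python's str.format in place of the Gateway's own message formatting are all assumptions, and the sample rate limiter name is made up.

```python
# A few of the message templates listed above, keyed by audit code.
AUDIT_MESSAGES = {
    6950: ("INFO", "Rate limit exceeded on rate limiter {0}"),
    6953: ("INFO", "Concurrency exceeded on rate limiter {0}."),
    6954: ("INFO", "Rate limit of {0} exceeds maximum rate limit of {1}. "
                   "Setting maximum limit to {2}"),
}


def render(code: int, *args: object) -> str:
    """Fill the {0}, {1}, ... placeholders and escape non-printable characters."""
    level, template = AUDIT_MESSAGES[code]
    text = template.format(*args)
    # Replace characters that cannot be displayed with a literal \uXXXX escape.
    escaped = "".join(
        ch if ch.isprintable() else "\\u{:04x}".format(ord(ch)) for ch in text
    )
    return f"{code} {level} {escaped}"


print(render(6950, "node1-10.0.0.1"))
# 6950 INFO Rate limit exceeded on rate limiter node1-10.0.0.1
```

Passing a value that contains a null character, such as "a\x00b", yields the literal text a\u0000b in the rendered message.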