How to overcome AWS Lambda Function's 100-concurrent-invocation limit
This post has not been updated in 8 years
I have just completed a project using AWS S3 + AWS Lambda to resize users' uploaded images, and discovered some of the limits of this stack.
I'll assume that you all know what AWS S3 is and its capability.
What is AWS Lambda?
AWS Lambda is a compute service where you can upload your code to AWS Lambda and the service can run the code on your behalf using AWS infrastructure. After you upload your code and create what we call a Lambda function, AWS Lambda takes care of provisioning and managing the servers that you use to run the code.
-- From AWS Lambda documentation --
Meaning, you write your script and AWS Lambda executes it using AWS infrastructure, so you don't need to rent and maintain an AWS instance to run it. AWS Lambda scales up with the number of requests.
That sounds really promising for an application that needs images to be resized after uploading.
But AWS Lambda has a deadly limit: concurrent executions are capped at 100.
AWS says you can request a limit increase, but what level is enough? 200? 300? You'll never be satisfied, because a single user uploading 500 images can cause throttled invocations.
Previous implementation
In this implementation, we allow users to upload their files to our FTP server, which uses s3fs to mount an AWS S3 bucket. AWS S3 then sends a notification to AWS Lambda, and AWS Lambda gets the image and starts processing.
This is a Push model. For more information on the Push/Pull model in AWS Lambda, please see their documentation.
But s3fs behaves like a normal fs, so it uploads each file 3 times:
- first time with a file size of 0
- second time with a file size of 0
- third time with the full file size
This means AWS S3 sends 3 notifications to AWS Lambda for every file, and with 500 files it can hit AWS Lambda with up to 700 invocations in one minute.
AWS Lambda will try its best to retry the throttled invocations for up to 6 hours, but that is not reliable enough, so we decided to find a new solution.
Analyze the problem
- Firstly, the number of invocations is 3 times the number of uploaded files. This means that if we can somehow invoke the AWS Lambda function only when a full-size file is uploaded, we can reduce the risk of throttled invocations.
- Secondly, the invocations are not distributed evenly: a user uploads 500 files, and all 500 files invoke AWS Lambda within almost the same time window, causing burst invocations.
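To make the first point concrete, here is a minimal sketch of how the zero-size notifications could be filtered out. It assumes the standard S3 event notification layout (`Records[].s3.object.size`); the function name `is_full_size_upload` and the sample bucket and key are only for illustration, not part of our actual project.

```python
def is_full_size_upload(record):
    """Return True when an S3 event record describes a non-empty object.

    Assumption: a record without a size field is treated as empty.
    """
    size = record.get("s3", {}).get("object", {}).get("size", 0)
    return size > 0


def make_record(key, size):
    """Build a minimal S3 event record (hypothetical sample data)."""
    return {"s3": {"bucket": {"name": "uploads"}, "object": {"key": key, "size": size}}}


# The three notifications s3fs caused for a single file in our case:
# two with size 0, then one with the full size.
notifications = [
    make_record("photo.jpg", 0),
    make_record("photo.jpg", 0),
    make_record("photo.jpg", 48213),
]

worth_processing = [r for r in notifications if is_full_size_upload(r)]
print(len(notifications), "notifications,", len(worth_processing), "worth processing")
```

With a filter like this, only one of every three notifications would lead to real work.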
The solution
AWS S3 event notification
Suppose AWS Lambda is not invoked immediately whenever a new file is uploaded; instead, the upload is queued to be processed later. Luckily, AWS provides a wonderful queue system called AWS SQS, and fortunately AWS S3 can send its notifications to AWS SQS.
Instead of letting user uploads invoke AWS Lambda directly, we use the AWS SQS messages to copy each upload to a new AWS S3 bucket, and this new bucket triggers AWS Lambda.
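When AWS S3 sends a notification to AWS SQS, the message body is the S3 event as JSON. The sketch below shows one way to pull the bucket name and object key out of such a body. The function name `objects_from_sqs_body` and the sample values are assumptions for illustration; note that object keys arrive URL-encoded in S3 events, so they are decoded here.

```python
import json
from urllib.parse import unquote_plus


def objects_from_sqs_body(body):
    """Return (bucket, key) pairs found in an S3 notification stored in an SQS message body."""
    event = json.loads(body)
    pairs = []
    # A body without "Records" simply yields an empty list.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the event, e.g. a space arrives as "+".
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


# Hypothetical message body for one uploaded file.
sample_body = json.dumps(
    {"Records": [{"s3": {"bucket": {"name": "uploads"},
                         "object": {"key": "photos/my+cat.jpg", "size": 48213}}}]}
)
print(objects_from_sqs_body(sample_body))
```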
Why should we have a new AWS S3 bucket to copy files to?
- So that we can validate the uploaded file types and process only valid files (jpg, png)
- So that the original AWS S3 bucket will not trigger AWS Lambda in bursts.
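The file type check from the first point can be as simple as looking at the extension of the object key. This is only a sketch under that assumption (it does not inspect the file contents), and `is_valid_image_key` is a name made up for this example.

```python
import os

# The valid types named above. Adding ".jpeg" would be a reasonable extension,
# but it is not in the original list.
VALID_EXTENSIONS = {".jpg", ".png"}


def is_valid_image_key(key):
    """Return True when the object key ends with one of the accepted image extensions."""
    _, extension = os.path.splitext(key)
    return extension.lower() in VALID_EXTENSIONS


for key in ["albums/2015/photo.JPG", "logo.png", "notes.txt", "archive.png.zip"]:
    print(key, is_valid_image_key(key))
```

Only keys that pass this check would be copied to the new bucket, so AWS Lambda never sees the invalid files.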
AWS SQS polling
AWS SQS supports polling for messages. Our poller is hosted on an AWS EC2 instance.
The poller runs and stops after processing a fixed number of messages, say 100.
After a short period (5 cycles), monit turns the poller on again.
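A poller along these lines could look roughly like the sketch below. It is not our production code: it assumes `sqs` and `s3` are boto3 clients (`boto3.client("sqs")` and `boto3.client("s3")`) passed in by the caller, and `run_poller`, `queue_url` and `target_bucket` are names invented for this example. Skipping zero-size objects inside the poller is also an assumption of this sketch.

```python
import json
from urllib.parse import unquote_plus

MAX_MESSAGES_PER_RUN = 100  # stop after this many messages, as described above


def run_poller(sqs, s3, queue_url, target_bucket, max_messages=MAX_MESSAGES_PER_RUN):
    """Copy uploaded objects to target_bucket, then stop after max_messages messages.

    Returns the number of SQS messages processed in this run.
    """
    processed = 0
    while processed < max_messages:
        # SQS hands out at most 10 messages per receive call.
        batch_size = min(10, max_messages - processed)
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=batch_size,
            WaitTimeSeconds=10,  # long polling, to avoid hammering an empty queue
        )
        messages = response.get("Messages", [])
        if not messages:
            break  # queue is empty; the next run will pick up new messages

        for message in messages:
            event = json.loads(message["Body"])
            for record in event.get("Records", []):
                source_bucket = record["s3"]["bucket"]["name"]
                key = unquote_plus(record["s3"]["object"]["key"])
                # Assumption: ignore the zero-size uploads produced by s3fs.
                if record["s3"]["object"].get("size", 0) > 0:
                    s3.copy_object(
                        Bucket=target_bucket,
                        Key=key,
                        CopySource={"Bucket": source_bucket, "Key": key},
                    )
            # Delete the message so it is not delivered again.
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
            processed += 1
    return processed
```

Because each run handles a bounded number of messages, the new bucket receives files in small batches, which spreads the AWS Lambda invocations out over time instead of firing them all at once.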
What should we do in the future?
We will continue to monitor the new solution and try to improve it.
**P/S: AWS sucks money faster than a black hole**