I have just completed a project using AWS S3 + AWS Lambda to resize users' uploaded images, and discovered some of the limits of this stack.
I'll assume that you all know what AWS S3 is and what it can do.
What is AWS Lambda?
AWS Lambda is a compute service where you can upload your code to AWS Lambda and the service can run the code on your behalf using AWS infrastructure. After you upload your code and create what we call a Lambda function, AWS Lambda takes care of provisioning and managing the servers that you use to run the code.
-- From AWS Lambda documentation --
In other words, you write a script and AWS Lambda executes it on AWS infrastructure, so you don't need to rent and maintain an AWS instance to run it. AWS Lambda scales up with the number of requests.
That sounds really promising for an application that needs to resize images after they are uploaded.
But AWS Lambda has a deadly limit: concurrent executions are capped at 100.
AWS says you can request an increase, but what level is enough? 200? 300? No limit will ever be enough, because a single user uploading 500 images can already cause throttled invocations.
In this implementation, we let users upload their files to our FTP server, which uses s3fs to mount an AWS S3 bucket. AWS S3 then sends a notification to AWS Lambda, and AWS Lambda fetches the image and starts processing it.
This is a push model. For more information on the push/pull models in AWS Lambda, please see their documentation.
s3fs behaves like a normal file system, so it uploads each file 3 times:
- first time with a file size of 0
- second time with a file size of 0
- third time with the full file size
As a result, AWS S3 sends 3 notifications to AWS Lambda for every file, and with 500 files it can heat AWS Lambda up to 700 invocations in one minute.
AWS Lambda will do its best to retry the throttled invocations for up to 6 hours, but that was not reliable enough for us, so we decided to find a new solution.
Analyze the problem
First, the number of invocations is 3 times the number of uploaded files. If we can somehow invoke the AWS Lambda function only when the full-size file is uploaded, we reduce the risk of throttled invocations.
Second, the invocations are not distributed evenly: a user uploads 500 files, and those 500 files invoke AWS Lambda within almost the same time window, causing a burst of invocations.
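The first point can be sketched as a small guard at the top of the Lambda function. This is only an illustration, not the code we ran: `full_size_uploads`, `handler` and `resize_image` are hypothetical names, and the sketch assumes the standard S3 event notification layout (`Records[].s3.object.size`). Note that it does not reduce the number of invocations by itself; it only makes the two empty ones return immediately.

```python
from urllib.parse import unquote_plus


def full_size_uploads(event):
    """Return (bucket, key, size) for every record that describes a non-empty object."""
    uploads = []
    for record in event.get("Records", []):
        s3_info = record.get("s3", {})
        obj = s3_info.get("object", {})
        size = obj.get("size", 0)
        if size <= 0:
            # The first two s3fs writes create zero-byte objects; skip them.
            continue
        bucket = s3_info.get("bucket", {}).get("name")
        # Object keys are URL-encoded in S3 event notifications.
        key = unquote_plus(obj.get("key", ""))
        uploads.append((bucket, key, size))
    return uploads


def resize_image(bucket, key):
    """Placeholder for the actual resize step, which this post does not show."""


def handler(event, context):
    """Hypothetical Lambda entry point: only full-size uploads reach resize_image."""
    uploads = full_size_uploads(event)
    for bucket, key, _size in uploads:
        resize_image(bucket, key)
    return len(uploads)
```

The same filter can be reused outside Lambda, for example in a queue consumer, because it only looks at the event dictionary.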
AWS S3 event notification
Rather than invoking AWS Lambda the moment a new file is uploaded, we want the invocation to be queued. Luckily, AWS provides a wonderful queue system called AWS SQS, and fortunately AWS S3 can send its notifications to AWS SQS.
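To make the S3-to-queue wiring a bit more concrete, here is a minimal sketch. The helper names `queue_notification_config` and `records_from_message` are made up for this example and the queue ARN is whatever your own queue uses; the dictionary follows the shape that S3 expects for a bucket notification configuration.

```python
import json


def queue_notification_config(queue_arn):
    """Build a bucket notification configuration that routes new-object events to SQS."""
    return {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    }


def records_from_message(body):
    """Extract the S3 event records from the JSON body of one SQS message."""
    payload = json.loads(body)
    # S3 sends an "s3:TestEvent" message without "Records" when the
    # notification is first configured, so default to an empty list.
    return payload.get("Records", [])
```

With boto3, the configuration would be passed to `put_bucket_notification_configuration(Bucket=..., NotificationConfiguration=...)`, and the queue's access policy has to allow S3 to send messages to it.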
Instead of letting the user's upload invoke AWS Lambda directly, we use the AWS SQS messages to copy the upload to a new AWS S3 bucket, and this new bucket triggers AWS Lambda.
Why should we have a new AWS S3 bucket to copy the files to?
- So that we can validate the uploaded file types and process only the valid files (jpg, png)
- So that the original AWS S3 bucket will not trigger AWS Lambda
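The validate-then-copy step could look roughly like the sketch below. It is an assumption about the shape of the code rather than what we actually deployed: `is_valid_image` and `copy_if_valid` are hypothetical helpers, the check is by file extension only, and `s3_client` is expected to be a boto3 S3 client (or anything with a compatible `copy_object` method).

```python
import os

# The valid types named above; extend this set if you accept more formats.
VALID_EXTENSIONS = {".jpg", ".png"}


def is_valid_image(key):
    """Return True when the object key ends in one of the accepted extensions."""
    extension = os.path.splitext(key)[1].lower()
    return extension in VALID_EXTENSIONS


def copy_if_valid(s3_client, source_bucket, key, dest_bucket):
    """Copy the object to dest_bucket only if it looks like a supported image."""
    if not is_valid_image(key):
        return False
    s3_client.copy_object(
        Bucket=dest_bucket,
        Key=key,
        CopySource={"Bucket": source_bucket, "Key": key},
    )
    return True
```

Because only valid files ever land in the second bucket, the Lambda function behind it never has to deal with empty or unsupported uploads.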
AWS SQS polling
AWS SQS supports polling for messages. Our poller is hosted on an AWS EC2 instance.
The poller runs and stops after processing a fixed number of messages, say 100.
After a short period (5 cycles), monit starts the poller again.
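A bounded poller along these lines might look like the following sketch. `run_poller` and `handle_message` are hypothetical names, `sqs_client` is assumed to be a boto3 SQS client, and stopping early when the queue is empty is my assumption here rather than something described above.

```python
def run_poller(sqs_client, queue_url, handle_message, max_messages=100):
    """Process up to max_messages SQS messages, then return how many were handled."""
    handled = 0
    while handled < max_messages:
        # SQS returns at most 10 messages per receive call.
        batch_size = min(10, max_messages - handled)
        response = sqs_client.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=batch_size,
            WaitTimeSeconds=20,  # long polling, to avoid hammering an empty queue
        )
        messages = response.get("Messages", [])
        if not messages:
            # Assumption: an empty queue also ends the run, and the
            # supervisor (monit in our case) starts the poller again later.
            break
        for message in messages:
            handle_message(message["Body"])
            # Delete only after handling, so a failed message becomes visible again.
            sqs_client.delete_message(
                QueueUrl=queue_url,
                ReceiptHandle=message["ReceiptHandle"],
            )
            handled += 1
    return handled
```

Capping each run keeps the downstream burst bounded: no matter how many files a user uploads at once, at most `max_messages` copies are made per run.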
What should we do in the future?
We will continue to monitor the new solution and try to improve it.
**P.S. AWS sucks money faster than a black hole**