Serverless computing lets teams build solutions without worrying about the infrastructure that powers them. Cloud providers take care of the underlying infrastructure and manage the allocation of machine resources. This shift changes how a solution is designed and architected. Cloud providers enable serverless computing through a range of services and tools.
The goal of every product engineering team is to focus on the actual product without having to worry about the infrastructure behind it. Moving to serverless is quite easy, but it is crucial to first have a clear foundational understanding of how it works. Some of the caveats with pure serverless models are –
- Vendor lock-in,
- Not an easy migration to other vendors,
- SLA compromises, and so on.
But at the same time, you get benefits like –
- Pay as you go,
- Limitless computing,
- Limitless storage,
- Go global in minutes, and so on.
Serverless computing is offered in 2 different flavors –
- Serverless container-based offerings – In this model, the deployment artifacts are still container images, but the user need not take care of the underlying servers or clusters.
- Container-less or managed services offerings – In this model, the cloud provider offers separate managed services for different needs and functionalities.
There are pros and cons to going serverless either way. Currently, the container-based serverless solution is more mature than the other, for numerous good reasons. However, many teams want to try managed service offerings as their serverless execution model. Having built many applications around this model, I want to share some thoughts that may help others decide whether managed serverless computing is the right way forward for their solutions.
From my experience, here are some of the challenges faced with serverless computing when using the popular or must-use managed service offerings in AWS.
1) API Gateway –
API Gateway is the most common serverless service; it acts as the entry point for accessing many other services in the form of APIs. Regional gateways offer no direct way to copy an API to a different region. To clone an API from one region to another, first export it as Swagger or OpenAPI 3.0 and then import it in the destination region. This has one advantage: if the API needs modifications, they can be made in the exported JSON or YAML file before importing it. However, AWS could add a simple ‘Export or Clone’ function to the console for copying an API from one region to another.
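The export-then-import flow described above can be scripted. The following is a minimal sketch, assuming the two clients are boto3 `apigateway` clients created for the source and destination regions; the function name `clone_rest_api` and the `transform` hook are hypothetical names introduced for illustration, not part of any AWS tooling.

```python
import json
from typing import Any, Callable, Optional


def clone_rest_api(
    source_client: Any,
    target_client: Any,
    rest_api_id: str,
    stage_name: str,
    transform: Optional[Callable[[dict], dict]] = None,
) -> str:
    """Export a REST API as OpenAPI 3.0 in one region and import it in another.

    Both clients are assumed to be boto3 "apigateway" clients, for example
    boto3.client("apigateway", region_name="us-east-1"). Returns the ID of
    the API created in the target region.
    """
    export = source_client.get_export(
        restApiId=rest_api_id,
        stageName=stage_name,
        exportType="oas30",  # "swagger" exports Swagger 2.0 instead
        parameters={"extensions": "apigateway"},  # keep integration settings
        accepts="application/json",
    )
    definition = json.loads(export["body"].read())

    # Optional hook: edit the definition (for example, region-specific ARNs)
    # before it is imported in the target region.
    if transform is not None:
        definition = transform(definition)

    imported = target_client.import_rest_api(
        failOnWarnings=True,
        body=json.dumps(definition).encode("utf-8"),
    )
    return imported["id"]
```

The imported API still has to be deployed to a stage in the target region, and integrations that reference region-specific resources would need to be rewritten in the `transform` step.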
2) Lambda –
Lambda is the serverless compute service offered by AWS. It is very simple to use; however, some guidelines and standards on how to use the service would make it simpler to maintain.
A) Guidelines for Lambda concurrency – Unreserved vs reserved vs provisioned
B) Guidelines for Lambda versioning – Lambda alias vs Lambda versions vs S3 versions
C) Lambda applications – Support to deploy application code from other code repositories apart from CodeCommit and GitHub. S3 may be a good option.
3) Cognito –
Cognito is the identity management solution from AWS. It comes with an authorization server and a resource server. This is helpful when the entire solution is AWS cloud-native. However, to make the managed identity solution more useful, AWS should address the limitations below.
A) A secure way to export the users from Cognito
B) Show logged-in devices for federated users, as is done for Cognito users
C) Support for identity provider (IdP) initiated SSO in case of federated login
4) S3 –
S3 needs more granular control over signed URLs. Currently, the validity of a signed URL is determined by the expires parameter, and this is the only parameter that can be customized. With the growing adoption of S3, it would be nice to attach more parameters to the signed URL, giving users more control to share it with tighter constraints.
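To make the expiry knob concrete, here is a minimal sketch of generating a signed download URL, assuming `s3_client` is a boto3 `s3` client. The helper name `presign_download` and the 15-minute default are assumptions for illustration; the 7-day cap reflects the maximum lifetime of a SigV4 presigned URL.

```python
from typing import Any

# SigV4 presigned URLs cannot be valid for longer than 7 days.
MAX_EXPIRY_SECONDS = 7 * 24 * 60 * 60


def presign_download(
    s3_client: Any, bucket: str, key: str, expires_in: int = 900
) -> str:
    """Return a time-limited GET URL for one object.

    s3_client is assumed to be a boto3 "s3" client. The expiry is the only
    constraint attached to the URL itself.
    """
    if not 0 < expires_in <= MAX_EXPIRY_SECONDS:
        raise ValueError("expires_in must be between 1 second and 7 days")
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

Anyone holding the URL can use it until it expires, which is why a short expiry is the main lever available here.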
5) WAF –
With the recent launch of WAF v2, it is not possible to combine Geo and IP conditions in the same rule. In WAF Classic, this configuration was possible.
6) GuardDuty –
GuardDuty is the managed service from AWS for continuous monitoring and threat detection on an AWS account. An AWS account spans multiple regions, but GuardDuty is a regional service and has to be enabled manually in every region. There is no easy way to enable it in all regions at once, even though doing so is highly recommended. Global events such as IAM-related activities are reported under every region.
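Until a one-click option exists, the per-region enablement can be looped over in a script. This is a minimal sketch, assuming `client_factory` wraps `boto3.client`; the function name, the factory parameter, and the `us-east-1` default are hypothetical choices made for illustration.

```python
from typing import Any, Callable, Dict


def enable_guardduty_everywhere(
    client_factory: Callable[[str, str], Any],
    home_region: str = "us-east-1",
) -> Dict[str, str]:
    """Make sure every enabled region has a GuardDuty detector.

    client_factory(service, region) is assumed to return a boto3 client, e.g.
    lambda service, region: boto3.client(service, region_name=region).
    Returns a mapping of region name to detector ID.
    """
    ec2 = client_factory("ec2", home_region)
    regions = [r["RegionName"] for r in ec2.describe_regions()["Regions"]]

    detectors: Dict[str, str] = {}
    for region in regions:
        guardduty = client_factory("guardduty", region)
        existing = guardduty.list_detectors()["DetectorIds"]
        if existing:
            # A detector already exists in this region; reuse it.
            detectors[region] = existing[0]
        else:
            created = guardduty.create_detector(Enable=True)
            detectors[region] = created["DetectorId"]
    return detectors
```

The sketch does not check whether an existing detector is currently enabled, and findings still have to be viewed per region.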
7) Step Functions –
Step Functions provides an easier way to orchestrate serverless workflows. However, step functions are asynchronous and act as a state machine. Since step functions can be invoked directly from API Gateway, it would be nice if they could return the data after processing. This would come in handy when a REST call through API Gateway relies on two or more Lambdas and other network calls before finally passing the result back to the requestor.
8) VPC – VPN tunnel –
It is common for enterprises to have on-prem data centers, and in some cases cloud services need to talk to on-prem services. Among the many ways to enable cloud <-> on-prem communication, the most common is a VPN tunnel between a virtual private gateway (VPG) attached to the VPC on the cloud side and a customer gateway on-prem. If no traffic flows through the VPN tunnel, it goes into sleep mode because of the idle timeout. This poses a risk since the tunnel does not wake up automatically. To prevent this problem, use some networking or alarm mechanism to generate keepalive pings.
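One way to generate such keepalive traffic is a small probe that runs on a host inside the VPC and periodically opens a TCP connection to an on-prem service. This is only a sketch under stated assumptions: the function names, the 10-second interval, and the choice of a TCP connect rather than ICMP are all mine, not something AWS prescribes.

```python
import socket
import time
from typing import Callable, Optional


def tunnel_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Open and close a TCP connection to an on-prem host through the tunnel."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def keep_tunnel_alive(
    probe: Callable[[], bool],
    interval_seconds: float = 10.0,
    rounds: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run probe() every interval_seconds and return how many probes failed.

    rounds=None loops forever; a finite value is handy for cron-style runs.
    """
    failures = 0
    completed = 0
    while rounds is None or completed < rounds:
        if not probe():
            failures += 1
        completed += 1
        # Do not sleep after the final round.
        if rounds is None or completed < rounds:
            sleep(interval_seconds)
    return failures
```

A typical call would be `keep_tunnel_alive(lambda: tunnel_is_reachable("10.0.0.5", 443))`, where the address is a placeholder for a real on-prem endpoint. A non-zero failure count is a natural input for an alarm.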
9) CloudWatch –
CloudWatch is the managed service used for logs, metrics, and event management of AWS components. One immediate concern is that the default retention period for logs is ‘never expire’. This can be changed per log group, with some limitations –
- Log retention cannot be set based on the size of the log
- No lifecycle management can be set to move the logs to S3 automatically
- Logs can only be pushed from CloudWatch to S3 manually
Also, improvements in the areas below would make for a better application development experience.
A) Better customization of log metrics
B) More options for building graphs
C) Ability to deploy CloudWatch for on-prem services
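The retention and export limitations above can be worked around with a scheduled job. Below is a minimal sketch, assuming `logs_client` is a boto3 `logs` client and that the destination bucket already allows CloudWatch Logs to write to it; the function names and the one-day window are assumptions for illustration.

```python
import time
from typing import Any, Optional

DAY_MS = 24 * 60 * 60 * 1000


def set_retention(logs_client: Any, log_group: str, days: int = 30) -> None:
    """Replace the default 'never expire' retention on one log group."""
    logs_client.put_retention_policy(
        logGroupName=log_group,
        retentionInDays=days,  # must be one of the values CloudWatch accepts
    )


def export_previous_day(
    logs_client: Any,
    log_group: str,
    bucket: str,
    prefix: str,
    now_ms: Optional[int] = None,
) -> str:
    """Start an export of the last 24 hours of a log group to S3.

    Returns the export task ID. Timestamps are milliseconds since the epoch.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    response = logs_client.create_export_task(
        taskName=f"export-{now_ms}",
        logGroupName=log_group,
        fromTime=now_ms - DAY_MS,
        to=now_ms,
        destination=bucket,
        destinationPrefix=prefix,
    )
    return response["taskId"]
```

Run daily from a scheduled rule, this moves logs to S3, where the bucket's own lifecycle policies can take over.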
10) DynamoDB –
DynamoDB with DAX is a very good solution for many of your application's database needs. However, there are some hard limits which cannot be raised with a service ticket (unlike the soft limits) –
- 10 GB limit on the partition key
- Nested attribute depth of 32 levels
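Because the nesting limit is a hard one, it can be worth checking items before writing them. This is a rough sketch that measures how deeply maps and lists are nested inside each top-level attribute; the function names are hypothetical, and the exact way DynamoDB counts levels is an assumption here, so treat the check as an early warning rather than an exact validator.

```python
from typing import Any, Mapping

MAX_NESTING_DEPTH = 32  # hard limit quoted above


def nesting_depth(value: Any) -> int:
    """Count how many levels of maps/lists are nested in value (scalars are 0)."""
    if isinstance(value, Mapping):
        children = list(value.values())
    elif isinstance(value, (list, tuple)):
        children = list(value)
    else:
        return 0
    return 1 + max((nesting_depth(child) for child in children), default=0)


def fits_depth_limit(item: Mapping[str, Any], limit: int = MAX_NESTING_DEPTH) -> bool:
    """Return True if no top-level attribute nests deeper than limit."""
    deepest = max((nesting_depth(v) for v in item.values()), default=0)
    return deepest <= limit
```

For example, an item shaped like `{"pk": "u1", "profile": {"address": {"geo": [1.0, 2.0]}}}` has a deepest attribute of three levels under this counting convention.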
11) CloudFront –
CloudFront is the content delivery network from AWS. It is very common to host Single Page Applications (SPA) using static website hosting in S3 with a CloudFront distribution. However, invalidating the CDN cache whenever new code is deployed to S3 is a little tricky. When deployments are done through a CI/CD pipeline, there is no automatic way to point the ‘Origin path’ in CloudFront’s ‘Origin Settings’ to the new folder (typically the new version of the app). Even though the deployment follows a pipeline, the Origin Path has to be edited manually. If the pipeline has to be completely automated, one workaround is to keep the folder name in the S3 bucket the same, deploy the code to that folder every time a new build is generated, and invalidate the CDN cache using AWS CLI commands in the pipeline.
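The invalidation step of that workaround can also be done from a script instead of the CLI. Here is a minimal sketch, assuming `cloudfront_client` is a boto3 `cloudfront` client; the helper name `invalidate_paths` and the `deploy-<timestamp>` caller reference are assumptions for illustration.

```python
import time
from typing import Any, Optional, Sequence


def invalidate_paths(
    cloudfront_client: Any,
    distribution_id: str,
    paths: Sequence[str] = ("/*",),
    caller_reference: Optional[str] = None,
) -> str:
    """Invalidate cached paths after a deployment and return the invalidation ID.

    caller_reference must be unique per request; a build number from the
    pipeline is a natural choice.
    """
    items = list(paths)
    reference = caller_reference or f"deploy-{int(time.time())}"
    response = cloudfront_client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch={
            "Paths": {"Quantity": len(items), "Items": items},
            "CallerReference": reference,
        },
    )
    return response["Invalidation"]["Id"]
```

Calling this right after the pipeline copies the build to the fixed S3 folder keeps the whole deployment hands-off.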
12) SQS –
AWS SQS has almost everything that a queue service would offer. Some nice to have features –
- Direct integration with CloudWatch logs apart from the CloudWatch metrics
- Retention policies based on the message size
- The 256 KB message size limit could have been a soft limit, raised by opening a service ticket that AWS reviews on a case-by-case basis
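While the size limit stays hard, producers can at least serialize compactly and check the size before sending. This is a small sketch using only the standard library; the helper names are hypothetical, the limit is the 256 KB figure quoted above, and only the message body is measured (message attributes, which also count toward the limit, are ignored).

```python
import json
from typing import Any

SQS_LIMIT_BYTES = 256 * 1024  # figure quoted above; passed as a parameter below


def compact_json(payload: Any) -> str:
    """Serialize without the spaces json.dumps adds after ',' and ':'."""
    return json.dumps(payload, separators=(",", ":"))


def fits_in_sqs(body: str, limit: int = SQS_LIMIT_BYTES) -> bool:
    """Check the UTF-8 encoded size of a message body against the limit."""
    return len(body.encode("utf-8")) <= limit
```

The size is measured in bytes rather than characters, so non-ASCII text reaches the limit sooner than its length suggests.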
13) SES –
Simple Email Service (SES) is a managed email service from AWS. Some of the caveats in this service include –
A) SES accepts email messages of up to 10 MB in size
B) SES can send a message to at most 50 recipients, and that includes all addresses in the “To:,” “CC:,” and “BCC:” fields
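The recipient cap is usually handled by splitting the list into batches. The following is a minimal sketch, assuming `ses_client` is a boto3 `ses` client; the function names are hypothetical, and putting every batch in the To field is a simplification (recipients in a batch can see each other, so BCC may be preferable).

```python
from typing import Any, Iterator, List, Sequence

SES_MAX_RECIPIENTS = 50  # limit quoted above, across To, CC and BCC


def chunk_recipients(
    recipients: Sequence[str], size: int = SES_MAX_RECIPIENTS
) -> Iterator[List[str]]:
    """Yield consecutive batches of at most size recipients."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(recipients), size):
        yield list(recipients[start:start + size])


def send_in_batches(
    ses_client: Any,
    sender: str,
    recipients: Sequence[str],
    subject: str,
    body_text: str,
) -> List[str]:
    """Send one email per batch and return the SES message IDs."""
    message_ids = []
    for batch in chunk_recipients(recipients):
        response = ses_client.send_email(
            Source=sender,
            Destination={"ToAddresses": batch},
            Message={
                "Subject": {"Data": subject},
                "Body": {"Text": {"Data": body_text}},
            },
        )
        message_ids.append(response["MessageId"])
    return message_ids
```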
14) SNS –
SNS would be even better if AWS addressed some of the concerns below –
- More ways (or AWS tools) to customize the email and SMS content
- More ways to debug or troubleshoot SNS
|No.||AWS Component||Concerns/Limitations||Implementations/Workarounds|
|1||API Gateway||Clone or copy an API from one region to another||Export the API definition as Swagger or OpenAPI 3.0 and import it in the other region|
|2||Lambda||A) Concurrency guidelines; B) Versioning guidelines; C) Lambda applications||A) Use unreserved concurrency for most cases; B) Use S3 versions and deploy from S3; C) Use Lambda functions|
|3||Cognito||Granular control on identity management||Use Cognito triggers with Lambda at different stages of identity management|
|4||S3||Granular control on signed URL||Use a CloudFront URL with WAF attached for granular control|
|5||WAF||Geo and IP conditions in a single rule – Limitations on the rule setup||Add Trusted IP lists and threat lists to GuardDuty|
|6||GuardDuty||A) One-click enablement in all regions; B) Unified view to see the findings||A) Manually enable GuardDuty in every region; B) View findings under different regions. Export findings to S3 and have a unified view using BI tools|
|7||Step Functions||Supports only asynchronous orchestration of serverless workflow||Orchestrating many Lambdas, network calls, or database calls fits well with the container-based serverless execution model|
|8||VPC – VPN tunnel||Idle timeout of the VPN tunnel with no traffic||Alarm to generate keepalive pings|
|9||CloudWatch||Log retention and automatic lifecycle management||Use CloudWatch event rules to move the logs to S3 and use the lifecycle management policies from S3|
|10||DynamoDB||Partition key limitations and nested attribute depth of 32 levels||Take more care when designing the DB and selecting the partition key|
|11||CloudFront||Automatic cache invalidation on new app deployment||Invalidate the CloudFront cache when the pipeline copies the new code to the S3 folder|
|12||SQS||A) Logging to CloudWatch; B) 256 KB message size||A) Use Lambda triggers based on the logging needs; B) While using JSON, keep keys and values short|
|13||SES||A) 10 MB email size; B) 50 recipients list||A) Save big attachments in S3 and share just the signed URL, or a URL restricted with CloudFront and WAF; B) Split the recipient list and send emails|
|14||SNS||Better troubleshooting mechanisms||Get the most out of CloudWatch metrics and alarms|
I have discussed some of the common tools that engineering teams predominantly use while moving towards a serverless implementation in AWS. There are a handful of other serverless services, such as IAM, Kinesis Firehose, CloudTrail, X-Ray, KMS, Secrets Manager, CodeCommit, CodeDeploy, and CodePipeline, that can be used for some architectures and solution designs. I did not have any concerns with those tools in our use cases.
Share your thoughts on your journey towards serverless cloud computing.