Category: <span>Launch</span>

AWS Lambda Extensions Are Now Generally Available – Get Started with Your Favorite Operations Tools Today

In October 2020, we announced the preview of AWS Lambda extensions, which you can use to easily integrate Lambda functions with your favorite tools for monitoring, observability, security, and governance.

Today, I’m happy to announce the general availability of AWS Lambda Extensions which comes with new performance improvements and an expanded set of partners. As part of the GA release, we have enabled functions to send responses as soon as the function code is complete without waiting for the included extensions to finish. This enables extensions to perform activities like sending telemetry to a preferred destination after the function’s response has been returned. We also welcome extensions from new partners: Imperva, Instana, Sentry, Site24x7, and the AWS Distro for OpenTelemetry.

You can use Lambda extensions for use cases such as capturing diagnostic information before, during, and after function invocation; automatically instrumenting your code without needing code changes; fetching configuration settings or secrets before the function invocation; detecting and alerting on function activity through security agents; and sending telemetry to custom destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Kinesis, Amazon Elasticsearch Service directly and asynchronously from your Lambda functions.

Customers are drawn to the vision of Serverless. The reduced operational responsibility frees them up to focus on their business problems. To help customers monitor, observe, secure, and govern their functions, AWS Lambda provides native integrations for logs and metrics through Amazon CloudWatch, tracing through AWS X-Ray, tracking configuration changes through AWS Config, and recording API calls through AWS CloudTrail In addition, AWS Lambda partners provide tools for application management, API integration, deployment, monitoring, and security.

AWS Lambda extensions provide a simple way to extend the Lambda execution environment, which is where your function code is executed. AWS customers, partners, and the open source community can use the new Lambda Extensions API to build their own extensions, which are companion processes that augment the capabilities of Lambda functions. To learn how to build your own extensions, see the Building Extensions for AWS Lambda – In preview blog post. The post also includes information about changes to the Lambda lifecycle.

How AWS Lambda Extensions Works
AWS Lambda extensions are designed to be the easiest way to plug in the tools you use today without complex installation or configuration management. You can add tools to your functions using Lambda layers or include them in the image for functions deployed as container images.

Lambda extensions use the Extensions API to register for function and execution environment lifecycle events. In response to these events, extensions can start new processes or run logic. Lambda extensions can also use the Runtime Logs API to subscribe to a stream of the same logs that the Lambda service sends to Amazon CloudWatch directly from the Lambda execution environment. Lambda streams the logs to the extension, and the extension can then process, filter, and send the logs to any preferred destination.

Most customers will use Lambda extensions without needing to know about the capabilities of the Extensions API. You can just consume capabilities of an extension by configuring the options in your Lambda functions.

How to Use Lambda Extensions
You can install and manage extensions using the Lambda console, the AWS Command Line Interface (CLI), or infrastructure as code (IaC) services and tools such as AWS CloudFormation, AWS Serverless Application Model (AWS SAM), and Terraform.

To use Lambda extensions to integrate existing tools with your Lambda functions, choose your a Lambda function and on the Configuration tab, choose Monitoring and Operations tools.

On the Extensions page, you can find available extensions from AWS Lambda partners. Choose an extension to view its installation instructions.

AWS Lambda Extensions Partners
At this launch, Lambda extensions integrate with these AWS Lambda partners who have provided the following information to introduce their extensions. (I am updating this article with links as they are published.)

  • AppDynamics provides end-to-end transaction tracing for AWS Lambda. With the AppDynamics extension, it is no longer mandatory for developers to include the AppDynamics tracer as a dependency in their function code, making tracing transactions across hybrid architectures even simpler.
  • Coralogix is a log analytics and cloud security platform that empowers thousands of companies to improve security and accelerate software delivery, allowing you to get deep insights without paying for the noise. Coralogix can now read Lambda function logs and metrics directly, without using CloudWatch or Amazon S3, reducing the latency, and cost of observability.
  • The Datadog extension brings comprehensive, real-time visibility to your serverless applications. Combined with Datadog’s integration with AWS, you get metrics, traces, and logs to help you monitor, detect, and resolve issues at any scale. The Datadog extension makes it easier than ever to get telemetry from your serverless workloads.
  • The Dynatrace extension makes it even easier to bring AWS Lambda metrics and traces into the Dynatrace platform for intelligent observability and automatic root cause detection. Get comprehensive, end-to-end observability with the flip of a switch and no code changes.
  • Epsagon helps you monitor, troubleshoot, and lower the cost of your Lambda functions. Epsagon’s extension reduces the overhead of sending traces to the Epsagon service, with minimal performance impact to your function.
  • HashiCorp Vault allows you to secure, store, and tightly control access to your application’s secrets and sensitive data. With the Vault extension, you can now authenticate and securely retrieve dynamic secrets before your Lambda function is invoked.
  • Honeycomb is a powerful observability tool that helps you debug your entire production app stack. Honeycomb’s extension decreases the overhead, latency, and cost of sending events to the Honeycomb service, while increasing reliability.
  • Instana Enterprise Observability Platform ingests performance metrics, traces requests, and profiles processes to make observability work for the enterprise. The Instana Lambda extension offers modification-free, low latency tracing of Lambda functions backed by their real-time Enterprise Observability Platform.
  • Imperva Serverless Protection protects organizations from vulnerabilities created by misconfigured apps and code-level security risks in serverless computing environments. The Imperva extension enables customers to easily embed additional security in their DevOps processes for serverless applications without requiring any code changes, leading to faster time to market.
  • Lumigo provides a monitoring and observability platform for serverless and microservices applications. The Lumigo extension enables the new Lumigo Lambda Profiler to see a breakdown of function resources, including CPU, memory, and network metrics. Use the extension to receive actionable insights to reduce Lambda runtime duration and cost, fix bottlenecks, and increase efficiency.
  • Check Point CloudGuard provides full lifecycle security for serverless applications. The CloudGuard extension enables Function Self Protection data aggregation as an out-of-process extension, providing detection and alerting on application layer attacks.
  • New Relic enables you to efficiently monitor, troubleshoot, and optimize your Lambda functions. New Relic’s extension allows you send your Lambda service platform logs directly to New Relic’s unified observability platform, allowing you to quickly visualize data with minimal latency and cost.
  • Thundra provides an application debugging, observability and security platform for serverless, container and virtual machine (VM) workloads. The Thundra extension adds asynchronous telemetry reporting functionality to the Thundra agents, getting rid of network latency.
  • Splunk offers an enterprise-grade cloud monitoring solution for real-time full-stack visibility at scale. The Splunk extension provides a simplified runtime-independent interface to collect high-resolution observability data with minimal overhead. Monitor, manage, and optimize the performance and cost of your serverless applications with Splunk Observability solutions.
  • Sentry’s extension enables developers to monitor code health. From error tracking to performance monitoring, developers can see issues more clearly, solve them quicker, and continuously stay informed about the health of their applications, all without making code changes.
  • Site24x7 provides a performance monitoring solution for DevOps and IT operations. The Site24x7 extension enables real-time observability into your Lambda functions. It enables you to monitor critical Lambda metrics and function executions logs and optimize execution time and performance.
  • The Sumo Logic extension enables you to get instant visibility into the health and performance of your mission-critical applications using AWS Lambda. With this extension and Sumo Logic’s continuous intelligence platform, you can now ensure that all your Lambda functions are running as expected by analyzing function, platform, and extension logs to quickly identify and remediate errors and exceptions.

Here are Lambda extensions from AWS services:

  • AWS AppConfig helps you manage, store, and safely deploy application configurations to your hosts at runtime. The AWS AppConfig extension integrates Lambda and AWS AppConfig seamlessly. Lambda functions have simple access to external configuration settings quickly and easily. Developers can now dynamically change their Lambda function’s configuration safely using robust validation features.
  • Amazon CodeGuru Profiler helps developers improve application performance and reduce costs by pinpointing an application’s most expensive line of code. It provides recommendations for improving code to save money. The Lambda integration removes the need to change any code or redeploy packages.
  • Amazon CloudWatch Lambda Insights enables you to efficiently monitor, troubleshoot, and optimize Lambda functions. The Lambda Insights extension simplifies the collection, visualization, and investigation of detailed compute performance metrics, errors, and logs. You can more easily isolate and correlate performance problems to optimize your Lambda environments.
  • AWS Distro for OpenTelemetry is a secure, production-ready, AWS-supported distribution of the OpenTelemetry project. The Lambda extension runs the OpenTelemetry collector and enables functions to send trace data to AWS monitoring services such as AWS X-Ray and to any destination such as Honeycomb and Lightstep that supports OpenTelemetry Protocol (OTLP) using the OTLP exporter.

To get started with Lambda extensions, use the links provided to install these extensions.

Things to Know
Here are a couple of things to keep in mind:

Pricing: Extensions share the same billing model as Lambda functions and you are charged for compute time used in all phases of the Lambda lifecycle. For function invocations, you pay for requests served and the compute time used to run your code and all extensions, together, in 1ms increments. To learn more about billing for extensions, visit the Lambda FAQs page.

Performance: Lambda extensions might impact the performance of your function because they share resources such as CPU, memory, and storage with the function, and because extensions are initialized before function code. For example, if an extension performs compute-intensive operations, you might see your function’s execution duration increase because the extension and your function code share the same CPU resources.

Because Lambda uses allocates proportional CPU power based on the memory setting, you might see increased execution and initialization duration at lower memory settings as more processes compete for the same CPU resources. You can use CloudWatch metrics such as PostRuntimeExecutionDuration to measure the extra time the extension takes after the function execution and MaxMemoryUsed to measure the increase in memory used.

Available Now
The performance improvements announced as part of GA are currently in US East (N. Virginia), Europe (Ireland), and Europe (Milan) Regions.

You can also build your own extensions. To learn how to build extensions, see the Lambda Extensions API in the AWS Lambda Developer Guide. You can send feedback through the AWS forum for AWS Lambda or through your usual AWS Support contacts.

Channy

Update. Watch a quick introductory video and a deep dive playlist about AWS Lambda Extensions for more information.

New – Create Microsoft SQL Server Instances of Amazon RDS on AWS Outposts

Last year, we announced Amazon Relational Database Service (Amazon RDS) on AWS Outposts, which allows you to deploy fully managed database instances in your on-premises environments. AWS Outposts is a fully managed service that extends AWS infrastructure, AWS services, APIs, and tools to virtually any datacenter, co-location space, or on-premises facility for a truly consistent hybrid experience.

You can deploy Amazon RDS on Outposts to set up, operate, and scale MySQL and PostgreSQL relational databases on premises, just as you would in the cloud. Amazon RDS on Outposts provides cost-efficient and resizable capacity for on-premises databases and automates time-consuming administrative tasks, including infrastructure provisioning, database setup, patching, and backups, so you can focus on your applications.

Today, I am happy to announce support for Microsoft SQL Server on Outposts as a new database engine. You can deploy SQL Server on Outposts for low latency workloads that need to be run in close proximity to your on-premises data and applications. All operations that are currently supported for MySQL and PostgreSQL on RDS on Outposts can be performed with RDS for SQL Server on Outposts.

Creating a SQL Server Instance on Outposts
To get started with Amazon RDS for SQL Server on Outposts, in the Amazon RDS console, choose Create Database. Use the AWS Region that serves as home base for your Outpost.

Creating a SQL Server instance is similar to creating a MySQL or PostgreSQL database engine on Outposts. For Database location, choose On-premises. For On-premises creation method, choose RDS on Outposts, as shown here:

In Engine options, for Engine type, choose Microsoft SQL Server, and then choose your edition (SQL Server Enterprise Edition, Standard Edition, or Web Edition) and version. The latest available minor version of each major release, from SQL Server 2016, 2017 up to the newest, which is 2019. There are plans to add more editions and versions based on your feedback.

RDS for SQL Server on Outposts supports the license-included licensing model. You do not need separately purchase Microsoft SQL Server licenses. The license included pricing is inclusive of software and Amazon RDS management capabilities.

In DB instance class, choose the size of the instance. You can select between Standard classes (db.m5) or Memory Optimized classes (db.r5) for SQL Server.

The process for configuring an Amazon Virtual Private Cloud  (Amazon VPC) subnet for your Outpost, database credentials, and the desired amount of SSD storage is same as creating RDS instances. When everything is ready to go, choose Create database. The stat of your instance starts out as Creating and transitions to Available when your DB instance is ready:

After the DB instance is ready, you simply run a test query to use the new endpoint:

Now Available
Amazon RDS for Microsoft SQL Server on AWS Outposts is now available on your Outpost today. When you use Amazon RDS on Outposts, as with Amazon RDS, you pay only for what you use. There is no minimum fee. For more information, see the RDS on Outposts pricing page.

To learn more, see the product page and Working with Amazon RDS on AWS Outposts in the Amazon RDS User Guide. Please send us feedback either in the AWS forum for Amazon RDS or through your usual AWS support contacts.

Learn more about Amazon RDS for SQL Server on Outposts and get started today.

Channy

AWS Local Zones Are Now Open in Boston, Miami, and Houston

AWS Local Zones place select AWS services (compute, storage, database, and so forth) close to large population, industry, and IT centers. They support use cases such as real-time gaming, hybrid migrations, and media & entertainment content creation that need single-digit millisecond latency for end-users in a specific geographic area.

Last December I told you about our plans to launch a total of fifteen AWS Local Zones in 2021, and also announced the preview of the Local Zones in Boston, Miami, and Houston. Today I am happy to announce that these three Local Zones are now ready to host your production workloads, joining the existing pair of Local Zones in Los Angeles. As I mentioned in my original post, each Local Zone is a child of a particular parent region, and is managed by the control plane in the region. The parent region for all three of these zones is US East (N. Virginia).

Using Local Zones
To get started, I need to enable the Zone(s) of interest. I can do this from the command line (modify-availability-zone-group), via an API call (ModifyAvailabilityZoneGroup), or from within the EC2 Console. From the console, I enter the parent region and click Zones in the Account attributes:

I can see the Local Zones that are within the selected parent region. I click Manage for the desired Local Zone:

I click Enabled and Update zone group, and I am ready to go!

I can create Virtual Private Clouds (VPCs), launch EC2 instances, create SSD-backed EBS volumes (gp2), and set up EKS and ECS clusters in the Local Zone (see the AWS Local Zones Features page for a full list of supported services).

The new Local Zones currently offer the following instances:

Name Category vCPUs Memory
(GiB)
t3.xlarge General Purpose 4 16
c5d.2xlarge Compute Intensive 8 16
g4dn.2xlarge GPU 8 32
r5d.2xlarge Memory Intensive 8 64

I can also use EC2 Auto Scaling, AWS CloudFormation, and Amazon CloudWatch in the parent region to monitor and control these resources.

Local Zones in Action
AWS customers are already making great use of all five operational Local Zones! After talking with my colleagues, I learned that the use cases for Local Zones can be grouped into two categories:

Distributed Edge – These customers want to place selected parts of their gaming, social media, and voice assistant applications in multiple, geographically disparate locations in order to deliver a low-latency experience to their users.

Locality – These customers need access to cloud services in specific locations that are in close proximity to their existing branch offices, data centers, and so forth. In addition to low-latency access, they often need to process and store data within a specific geographic region in order to meet regulatory requirements. These customers often run a third party software VPN appliance on Amazon EC2 instance to connect to the desired Local Zone.

Here are a few examples:

Ambra Health provides a cloud-based medical image management suite that empowers some of the largest health systems, radiology practices, and clinical research organizations. The suite replaces traditional desktop image viewers, and uses Local Zones to provide radiologists with rapid access to high quality images so that they can focus on improving patient outcomes.

Couchbase is an award-winning distributed NoSQL cloud database for modern enterprise applications. Couchbase is using AWS Local Zones to provide low latency and single-digit millisecond data access times for applications, ensuring developers’ apps are always available and fast. Using Local Zones, along with Couchbase’s edge computing capabilities, means that their customers are able to store, query, search, and analyze data in real-time

Edgegap is a game hosting service provider focused on providing the best online experience for their customers. AWS Local Zones gives their customers (game development studios such as Triple Hill Interactive, Agog Entertainment, and Cofa Games) the ability to take advantage of ever-growing list of locations and to deploy games with a simple API call.

JackTrip is using Local Zones to allow musicians in multiple locations to collaboratively perform well-synchronized live music over the Internet.

Masomo is an interactive entertainment company that focuses on mobile games including Basketball Arena and Head Ball 2. They use Local Zones to deploy select, latency-sensitive portions of their game servers close to end users, with the goal of improving latency, reducing churn, and providing players with a great experience.

Supercell deploys game servers in multiple AWS regions, and evaluates all new regions as they come online. They are already using Local Zones as deployment targets and considering additional Local Zones as they become available in order to bring the latency-sensitive portions of game servers closer to more end users.

Takeda (a global biopharmaceutical company) is planning to create a hybrid environment that spans multiple scientific centers in and around Boston, with compute-intensive R&D workloads running in the Boston Local Zone.

Ubitus deploys game servers in multiple locations in order to reduce latency and to provide users with a consistent, high-quality experience. They see Local Zones as a (no pun intended) game-changer, and will use it them to deploy and test clusters in multiple cities in pursuit of that consistent experience.

Learn More
Here are some resources to help you learn more about AWS Local Zones:

Stay Tuned
We are currently working on twelve additional AWS Local Zones (Atlanta, Chicago, Dallas, Denver, Kansas City, Las Vegas, Minneapolis, New York, Philadelphia, Phoenix, Portland, and Seattle) and plan to open them throughout the remainder of 2021. We are also putting plans in place to expand to additional locations, both in the US and elsewhere. If you would like to express your interest in a particular location, please let us know by filling out the AWS Local Zones Interest Form.

Over time we plan to add additional AWS services, including AWS Direct Connect and more EC2 instance types in these new Local Zones, all driven by feedback from our customers.

Jeff;

 

Happy 10th Birthday – AWS Identity and Access Management

Amazon S3 turned 15 earlier this year, and Amazon EC2 will do the same in a couple of months. Today we are celebrating the tenth birthday of AWS Identity and Access Management (IAM).

The First Decade
Let’s take a walk through the last decade and revisit some of the most significant IAM launches:

Am IAM policy in text form, shown in Windows Notepad.May 2011 – We launched IAM, with the ability to create users, groups of users, and to attach policy documents to either one, with support for fifteen AWS services. The AWS Policy Generator could be used to build policies from scratch, and there was also a modest collection of predefined policy templates. This launch set the standard for IAM, with fine-grained permissions for actions and resources, and the use of conditions to control when a policy is in effect. This model has scaled along with AWS, and remains central to IAM today.

August 2011 – We introduced the ability for you to use existing identities by federating into your AWS Account, including support for short-term temporary AWS credentials.

June 2012 – With the introduction of IAM Roles for EC2 instances, we made it easier for code running on an EC2 instance to make calls to AWS services.

February 2015 – We launched Managed Policies, and simultaneously turned the existing IAM policies into first-class objects that could be created, named, and used for multiple IAM users, groups, or roles.

AWS Organizations, with a root account and three accounts inside.February 2017 – We launched AWS Organizations, and gave you the ability to to implement policy-based management that spanned multiple AWS accounts, grouped into a hierarchy of Organizational Units. This launch also marked the debut of Service Control Policies (SCPs) that gave you the power to place guard rails around the level of access allowed within the accounts of an Organization.

April 2017 – Building on the IAM Roles for EC2 Instances, we introduced service-linked roles. This gave you the power to delegate permissions to AWS services, and made it easier for you to work with AWS services that needed to call other AWS services on your behalf.

December 2017 – We introduced AWS Single Sign-On to make it easier for you to centrally manage access to AWS accounts and your business applications. SSO is built on top of IAM and takes advantage of roles, temporary credentials, and other foundational IAM features.

November 2018 – We introduced Attribute-Based Access Control (ABAC) as a complement to the original Role-Based Access Control to allow you to use various types of user, resource, and environment attributes to drive policy & permission decisions. This launch allowed you to tag IAM users and roles, which allowed you to match identity attributes and resource attributes in your policies. After this launch, we followed up with support for the use of ABAC in conjunction with AWS SSO and Cognito.

IAM Access Analyzer, showing some active findingsDecember 2019 – We introduced IAM Access Analyzer to analyze your policies and determine which resources can be accessed publicly or from other accounts.

March 2021 – We added policy validation (over 100 policy checks) and actionable recommendations to IAM Access Analyzer in order to help you to construct IAM policies and SCPs that take advantage of time-tested AWS best practices.

April 2021 – We made it possible for you to generate least-privilege IAM policy templates based on access activity.

Then and Now
In the early days, a typical customer might use IAM to control access to a handful of S3 buckets, EC2 instances, and SQS queues, all in a single AWS account. These days, some of our customers use IAM to control access to billions of objects that span multiple AWS accounts!

Because every call to an AWS API must call upon IAM to check permissions, the IAM team has focused on availability and scalability from the get-go. Back in 2011 the “can the caller do this?” function handled a couple of thousand requests per second. Today, as new services continue to appear and the AWS customer base continues to climb, this function now handles more than 400 million API calls per second worldwide.

As you can see from my summary, IAM has come quite a long way from its simple yet powerful beginnings just a decade ago. While much of what was true a decade ago remains true today, I would like to call your attention to a few best practices that have evolved over time.

Multiple Accounts – Originally, customers generally used a single AWS account and multiple users. Today, in order to accommodate multiple business units and workloads, we recommend the use of AWS Organizations and multiple accounts. Even if your AWS usage is relatively simple and straightforward at first, your usage is likely to grow in scale and complexity, and it is always good to plan for this up front. To learn more, read Establishing Your Best Practice AWS Environment.

Users & SSO – In a related vein, we recommend that you use AWS SSO to create and manage users centrally, and then grant them access to one or more AWS accounts. To learn more, read the AWS Single Sign-On User Guide.

Happy Birthday, IAM
In line with our well-known penchant for Customer Obsession, your feedback is always welcome! What new IAM features and capabilities would you like to see in the decade to come? Leave us a comment and I will make sure that the team sees it.

And with that, happy 10th birthday, IAM!

Jeff;

 

Get Started Using Amazon FSx File Gateway for Fast, Cached Access to File Server Data in the Cloud

As traditional workloads continue to migrate to the cloud, some customers have been unable to take advantage of cloud-native services to host data typically held on their on-premises file servers. For example, data commonly used for team and project file sharing, or with content management systems, has needed to reside on-premises due to issues of high latency, or constrained or shared bandwidth, between customer premises and the cloud.

Today, I’m pleased to announce Amazon FSx File Gateway, a new type of AWS Storage Gateway that helps you access data stored in the cloud with Amazon FSx for Windows File Server, instead of continuing to use and manage on-premises file servers. Amazon FSx File Gateway uses network optimization and caching so it appears to your users and applications as if the shared data were still on-premises. By moving and consolidating your file server data into Amazon FSx for Windows File Server, you can take advantage of the scale and economics of cloud storage, and divest yourself of the undifferentiated maintenance involved in managing on-premises file servers, while Amazon FSx File Gateway solves issues around latency and bandwidth.

Replacing On-premises File Servers
Amazon FSx File Gateway is an ideal solution to consider when replacing your on-premises file servers. Low-latency access ensures you can continue to use latency-sensitive on-premises applications, and caching conserves shared bandwidth between your premises and the cloud, which is especially important when you have many users all attempting to access file share data directly.

You can attach an Amazon FSx file system and present it through a gateway to your applications and users provided they are all members of the same Active Directory domain, and the AD infrastructure can be hosted in AWS Directory Service, or managed on-premises.

Your data, as mentioned, resides in Amazon FSx for Windows File Server, a fully managed, highly reliable and resilient file system, eliminating the complexity involved in setting up and operating file servers, storage volumes, and backups. Amazon FSx for Windows File Server provides a fully native Windows file system in the cloud, with full Server Message Block (SMB) protocol support, and is accessible from Windows, Linux, and macOS systems running in the cloud or on-premises. Built on Windows Server, Amazon FSx for Windows File Server also exposes a rich set of administrative features including file restoration, data deduplication, Active Directory integration, and access control via Access Control Lists (ACLs).

Choosing the Right Gateway
You may be aware of Amazon S3 File Gateway (originally named File Gateway), and might now be wondering which type of workload is best suited for the two gateways:

  • With Amazon S3 File Gateway, you can access data stored in Amazon Simple Storage Service (Amazon S3) as files, and it’s also a solution for file ingestion into S3 for use in running object-based workloads and analytics, and for processing data that exists in on-premises files.
  • Amazon FSx File Gateway, on the other hand, is a solution for moving network-attached storage (NAS) into the cloud while continuing to have low-latency, seamless access for your on-premises users. This includes two general-purpose NAS use-cases that use the SMB file protocol: end-user home directories and departmental or group file shares. Amazon FSx File Gateway supports multiple users sharing files, with advanced data management features such as access controls, snapshots for data protection, integrated backup, and more.

One additional unique feature I want to note is Amazon FSx File Gateway integration with backups. This includes backups taken directly within Amazon FSx and those coordinated by AWS Backup. Prior to a backup starting, Amazon FSx for Windows File Server communicates with each attached gateway to ensure any uncommitted data gets flushed. This helps further reduce your administrative overhead and worries when moving on-premises file shares into the cloud.

Working with Amazon FSx File Gateway
Amazon FSx File Gateway is available using multiple platform options. You can order and deploy a hardware appliance into your on-premises environment, deploy as a virtual machine into your on-premises environment (VMware ESXi, Microsoft Hyper-V, Linux KVM), or deploy in cloud as an Amazon Elastic Compute Cloud (Amazon EC2) instance. The available options are displayed as you start to create a gateway from the AWS Storage Gateway Management Console, together with setup instructions for each option.

Below, I choose to use an EC2 instance for my gateway.

FSx File Gateway platform options

The process of setting up a gateway is pretty straightforward and as the documentation here goes into detail, I’m not going to repeat the flow in this post. Essentially, the steps involved are to first create a gateway, then join it to your domain. Next, you attach an Amazon FSx file system. After that, your remote clients can work with the data on the file system, but the important difference is that they connect using a network share to the gateway instead of to the Amazon FSx file system.

Below is the general configuration for my gateway, created in US East (N. Virginia).

FSx File Gateway Details

And here are the details of my Amazon FSx file system, running in an Amazon Virtual Private Cloud (VPC) in US East (N. Virginia), that will be attached to my gateway.

FSx File System Details

Note that I have created and activated the gateway in the same region as the source Amazon FSx file system, and will manage the gateway from US East (N. Virginia). The gateway virtual machine (VM) is deployed as an EC2 instance running in a VPC in our remote region, US West (Oregon). I’ve also established a peering connection between the two VPCs.

Once I have attached the Amazon FSx file system to my Amazon FSx File Gateway, in the AWS Storage Gateway Management Console I select FSx file systems and then the respective file system instance. This gives me the details of the command needed by my remote users to connect to the gateway.

Viewing the attached Amazon FSx File System

Exploring an End-user Scenario with Amazon FSx File Gateway
Let’s explore a scenario that may be familiar to many readers, that of a “head office” that has moved its NAS into the cloud, with one or more “branch offices” in remote locations that need to connect to those shares and the files they hold. In this case, my head office/branch office scenario is for a fictional photo agency, and is set up so I can explore the gateway’s cache refresh functionality. For this, I’m imagining a scenario where a remote user deletes some files accidentally, and then needs to contact an admin in the head office to have them restored. This is possibly a fairly common scenario, and one I know I’ve had to both request, and handle, in my career!

My head office for my fictional agency is located in US East (N. Virginia) and the local admin for that office (me) has a network share attached to the Amazon FSx file system instance. My branch office, where my agency photographers work, is located in the US West (Oregon) region, and users there connect to my agency’s network over a VPN (an AWS Direct Connect setup could also be used). In this scenario, I simulate the workstations at each office using Amazon Elastic Compute Cloud (EC2) instances.

In my fictional agency, photographers upload images to my agency’s Amazon FSx file system, connected via a network share to the the gateway. Even though my fictional head office, and the Amazon FSx file system itself are resources located on the east coast, the gateway and its cache provide a fast, low latency connection for users in the remote branch office, making it seem as though there is a local NAS. After photographers upload images from their assignments, additional staff in the head office do some basic work on them, and make the partly-processed images available back to the photographers on the west coast via the file share.

The image below illustrates the resource setup for my fictional agency.

My sample head/branch office setup, as AWS resources

I have set up scheduled multiple daily backups for the file system, as you might expect, but I’ve also gone a step further and enabled shadow copies on my Amazon FSx file system. Remember, Amazon FSx for Windows File Server is a Windows File Server instance, it just happens to be running in the cloud. You can find details of how to set up shadow copies (which are not enabled by default) in the documentation here. For the purposes of the fictional scenario in this blog post, I set up a schedule so that my shadow copies are taken every hour.

Back to my fictional agency. One of my photographers on the west coast, Alice, is logged in and working with a set of images that have already had some work done on them by the head office. In this image, it’s apparent Alice is connected and working on her images via the network share IP marked in an earlier image in this post – this is the gateway file share.

Suddenly, disaster strikes and Alice accidentally deletes all of the files in the folder she was working in. Picking up the phone, she calls the admin (me) in the east coast head office and explains the situation, wondering if we can get the files back.

Since I’d set up scheduled daily backups of the file system, I could probably restore the deleted files from there. This would involve a restore to a new file system, then copying the files from that new file system to the existing one (and deleting the new file system instance afterwards). But, having enabled shadow copies, in this case I can restore the deleted files without resorting to the backups. And, because I enabled automated cache refreshes on my gateway, with the refresh period set to every 5 minutes, Alice will see the restored files relatively quickly.

My admin machine (in the east coast office) has a network share to the Amazon FSx file system, so I open an explorer view onto the share, right-click the folder in question, and select Restore previous versions. This gives me a dialog where I can select the most recent shadow copy.

Restoring the file data from shadow copies

I ask Alice to wait 5 minutes, then refresh her explorer view. The changes in the Amazon FSx file system are propagated to the cache on the gateway and sure enough, she sees the files she accidentally deleted and can resume work. (When I saw this happen for real in my test setup, even though I was expecting it, I let out a whoop of delight!). Overall, I hope you can see how easy it is to set up and operate an Amazon FSx File Gateway with an Amazon FSx for Windows File Server.

Get Started Today with Amazon FSx File Gateway
Amazon FSx File Gateway provides a low-latency, efficient connection for remote users when moving on-premises Windows file systems into the cloud. This benefits users who experience higher latencies, and shared or limited bandwidth, between their premises and the cloud. Amazon FSx File Gateway is available today in all commercial AWS regions where Amazon FSx for Windows File Server is available. It’s also available in the AWS GovCloud (US-West) and AWS GovCloud (US-East) regions, and the Amazon China (Beijing), and China (Ningxia) regions.

You can learn more on this feature page, and get started right away using the feature documentation.

AQUA (Advanced Query Accelerator) – A Speed Boost for Your Amazon Redshift Queries

Amazon Redshift already provides up to 3x better price-performance at any scale than any other cloud data warehouse. We do this by designing our own hardware and by using Machine Learning (ML).

For example, we launched the SSD-based RA3 nodes for Amazon Redshift at the end of 2019 (Amazon Redshift Update – Next-Generation Compute Instances and Managed, Analytics-Optimized Storage) and added additional node sizes last April (Amazon Redshift update – ra3.4xlarge Nodes), and last December (Amazon Redshift Launches RA3.xlplus Nodes With Managed Storage). In addition to high-bandwidth networking, RA3 nodes incorporate a sophisticated data management model. As I said when we launched the RA3 nodes:

There’s a cache of large-capacity, high-performance SSD-based storage on each instance, backed by S3, for scale, performance, and durability. The storage system uses multiple cues, including data block temperature, data blockage, and workload patterns, to manage the cache for high performance. Data is automatically placed into the appropriate tier, and you need not do anything special to benefit from the caching or the other optimizations.

Our customers use RA3 nodes to maintain very large data sets and are seeing great results. From digital interactive entertainment to tracking impressions and performance for media buys, Amazon Redshift and RA3 nodes help our customers to store and query data at world scale, with up to 32 PB of data in a single data warehouse.

On the downside, it turns out that advances in storage performance have outpaced those in CPU performance, even as data warehouses continue to grow. The combination of large amounts of data (often accessed by queries that mandate a full scan), and limits on network traffic, can result in a situation where network and CPU bandwidth become limiting factors.

We can do something about that…

Introducing AQUA
Today we are making the ra3.4xl and ra3.16xl nodes even more powerful with the addition of AQUA (Advanced Query Accelerator). Building on the caches that I told you about earlier, and taking advantage of the AWS Nitro System and custom FPGA-based acceleration, AQUA pushes the computation needed to handle reduction and aggregation queries closer to the data. This reduces network traffic, offloads work from the CPUs in the RA3 nodes, and allows AQUA to improve the performance of those queries by up to 10x, at no extra cost and without any code changes. AQUA also makes use of a fast, high-bandwidth connection to Amazon Simple Storage Service (Amazon S3).

You can watch this video to learn a lot more about how AQUA uses the custom-designed hardware in the AQUA nodes to accelerate queries. The benefit comes about in several different ways. Each node performs the reduction and aggregation operations in parallel with the others. In addition to getting the n-fold speedup due to parallelism, the amount of data that must be sent to and processed on the compute nodes is generally far smaller (often just 5% of the original). Here’s a diagram that shows how all of the elements come together to accelerate queries:

If you are already using ra3.4xl or ra3.16xl nodes to host your data warehouse, you can start using AQUA in minutes. You simply enable AQUA for your clusters, restart them, and benefit from vastly improved performance for your reduction and aggregation queries. If you are ready to move into the future with RA3 and AQUA, you can create a new RA3-based cluster from a snapshot of your existing one, or you can use Classic resize to do an in-place upgrade.

Using AQUA
I don’t happen to have a data warehouse! I used a snapshot provided by the Redshift team to create a pair of clusters. The first one (prod-cluster) does not have AQUA enabled, and the second one (test-cluster) does:

To create the AQUA-enabled cluster, I simply choose Turn on on the Cluster configuration page:

My queries will use the lineitem table, which has over 18 billion rows:

I create a session on each cluster and disable the Redshift result cache:

And then I run the same query on both clusters:

select sum(l_orderkey), count(*) from lineitem where
l_comment similar to 'slyly %' or
l_comment similar to 'plant %' or
l_comment similar to 'fina %' or
l_comment similar to 'quick %' or
l_comment similar to 'slyly %' or
l_comment similar to 'quickly %' or
l_comment similar to ' %about%' or
l_comment similar to ' final%' or
l_comment similar to ' %final%' or
l_comment similar to ' breach%' or
l_comment similar to ' egular%' or
l_comment similar to ' %closely%' or
l_comment similar to ' closely%' or
l_comment similar to ' %idea%' or
l_comment similar to ' idea%' ; 

If you take a look at the diagram above (and perhaps watch the video), you can see why AQUA can handle queries of this type very efficiently. Instead of sequentially scanning all 18 billion or so rows on the compute nodes, AQUA distributes the collection of similar to expressions to multiple AQUA nodes where they are run in parallel.

The query on the cluster that has AQUA enabled finishes in less than a minute:

The query on the cluster that does not have AQUA enabled finishes in a little under 4 minutes:

As is always the case with databases, complex data, and equally complex queries, your mileage will vary. For example, you could imagine a query that did a complex JOIN of rows SELECTed from multiple tables, where each SELECT would benefit from AQUA, and the overall speedup could be even greater. As you can see from the simple query that I used for this post, AQUA can dramatically reduce query time and perhaps even enable some new types of somewhat real-time queries that were simply not possible or practical in the past.

Things to Know
Here are a couple of interesting facts about AQUA:

Cluster Version – Your clusters must be running Redshift version 1.0.24421 or later in order to be able to make use of AQUA. To learn more about how to enable and disable AQUA, read Managing an AQUA Cluster.

Relevant Queries – AQUA is designed to deliver up to 10X performance on queries that perform large scans, aggregates, and filtering with LIKE and SIMILAR_TO predicates. Over time we expect to add support for additional queries.

Security – All data cached by AQUA is encrypted using your keys. After performing a filtering or aggregation operation, AQUA compresses the results, encrypts them, and returns them to Redshift.

Regions – AQUA is available today in the US East (N. Virginia), US West (Oregon), US East (Ohio), Europe (Ireland), and Asia Pacific (Tokyo) Regions, and will be coming to Europe (Frankfurt), Asia Pacific (Sydney), and Asia Pacific (Singapore) in the first half of 2021.

Pricing – As I mentioned earlier, there’s no additional charge for AQUA.

Try AQUA Today
If you are using ra3.4xl or ra3.16xl nodes to power your Redshift cluster, you can enable AQUA, restart the cluster, and run some test queries within minutes. Take AQUA for a spin and let me know what you think!

Jeff;