
Step Functions Distributed Map – A Serverless Solution for Large-Scale Parallel Data Processing

I am excited to announce the availability of a distributed map for AWS Step Functions. This flow extends support for orchestrating large-scale parallel workloads such as the on-demand processing of semi-structured data.

Step Functions’ map state executes the same processing steps for multiple entries in a dataset. The existing map state is limited to 40 parallel iterations at a time, which makes it challenging to scale data processing workloads to thousands of items (or more) in parallel. Until today, achieving higher parallelism required building complex workarounds on top of the existing map state.

The new distributed map state allows you to write Step Functions workflows that coordinate large-scale parallel workloads within your serverless applications. You can now iterate over millions of objects, such as logs, images, or CSV files stored in Amazon Simple Storage Service (Amazon S3). The new distributed map state can launch up to ten thousand parallel workflows to process data.

You can process data by composing any service API supported by Step Functions, but typically, you will invoke Lambda functions to process the data with code written in your favorite programming language.
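
To make this concrete, here is a minimal sketch of such a function in Python. The event shape is an assumption for illustration: it presumes the map passes each invocation the bucket name and the key of one object.

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Assumed item shape: {"bucket": "...", "key": "..."} for one S3 object.
    bucket = event["bucket"]
    key = event["key"]

    # Fetch the object and apply whatever per-item processing you need.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    # The return value becomes the output of this child execution.
    return {"key": key, "size": len(body)}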

Step Functions distributed map supports a maximum concurrency of up to 10,000 executions in parallel, which is well above the concurrency supported by many other AWS services. You can use the maximum concurrency feature of the distributed map to ensure that you do not exceed the concurrency of a downstream service. There are two factors to consider when working with other services. First, the maximum concurrency supported by the service for your account. Second, the burst and ramping rates, which determine how quickly you can achieve the maximum concurrency.

Let’s use Lambda as an example. Your functions’ concurrency is the number of instances that serve requests at a given time. The default maximum concurrency quota for Lambda is 1,000 per AWS Region. You can ask for an increase at any time. For an initial burst of traffic, your functions’ cumulative concurrency in a Region can reach an initial level of between 500 and 3,000, which varies per Region. The burst concurrency quota applies to all your functions in the Region.

When using a distributed map, be sure to verify the quota on downstream services. Limit the distributed map maximum concurrency during your development, and plan for service quota increases accordingly.
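
To make that advice concrete, here is a sketch of a distributed map state definition in Amazon States Language, expressed as a Python dictionary, with MaxConcurrency capped well below the default Lambda quota. The bucket, prefix, and function name match the demo later in this post; treat the exact field layout as illustrative rather than authoritative.

import json

distributed_map_state = {
    "Type": "Map",
    "MaxConcurrency": 500,  # stay below downstream (Lambda) concurrency quotas
    "ItemReader": {
        "Resource": "arn:aws:states:::s3:listObjectsV2",
        "Parameters": {"Bucket": "awsnewsblog-distributed-map", "Prefix": "images"},
    },
    "ItemProcessor": {
        "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "EXPRESS"},
        "StartAt": "ProcessImage",
        "States": {
            "ProcessImage": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {
                    "FunctionName": "AWSNewsBlogDistributedMap",
                    "Payload.$": "$",
                },
                "End": True,
            }
        },
    },
    "End": True,
}

print(json.dumps(distributed_map_state, indent=2))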

To compare the new distributed map with the original map state flow, I created this table.

Original map state flow vs. new distributed map flow:

Sub-workflows
  • Original: Runs a sub-workflow for each item in an array. The array must be passed from the previous state. Each iteration of the sub-workflow is called a map iteration, and its events are added to the state machine’s execution history.
  • Distributed: Runs a sub-workflow for each item in an array or Amazon S3 dataset. Each sub-workflow is run as a totally separate child execution, with its own event history.

Parallel branches
  • Original: Map iterations run in parallel, with an effective maximum concurrency of around 40 at a time.
  • Distributed: Can pass millions of items to multiple child executions, with concurrency of up to 10,000 executions at a time.

Input source
  • Original: Accepts only a JSON array as input.
  • Distributed: Accepts an Amazon S3 object list, JSON arrays or files, CSV files, or Amazon S3 inventory as input.

Payload
  • Original: 256 KB.
  • Distributed: Each iteration receives a reference to a file (Amazon S3) or a single record from a file (state input). Actual file processing capability is limited by Lambda storage and memory.

Execution history
  • Original: 25,000 events.
  • Distributed: Each iteration of the map state is a child execution with up to 25,000 events each (Express mode has no limit on execution history).

Sub-workflows within a distributed map work with both Standard workflows and the low-latency, short-duration Express Workflows.

This new capability is optimized to work with S3. I can configure the bucket and prefix where my data are stored directly from the distributed map configuration. The distributed map stops reading after 100 million items and supports JSON or CSV files of up to 10 GB.

When processing large files, think about downstream service capabilities. Let’s take Lambda again as an example. Each input—a file on S3, for example—must fit within the Lambda function execution environment in terms of temporary storage and memory. To make it easier to handle large files, Lambda Powertools for Python introduced a new streaming feature to fetch, transform, and process S3 objects with minimal memory footprint. This allows your Lambda functions to handle files larger than the size of their execution environment. To learn more about this new capability, check the Lambda Powertools documentation.
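
As a rough sketch of what that streaming pattern looks like (based on the Powertools for Python streaming utility; the event fields and per-row logic are placeholders):

from aws_lambda_powertools.utilities.streaming.s3_object import S3Object
from aws_lambda_powertools.utilities.streaming.transformations import CsvTransform

def process(row: dict) -> None:
    print(row)  # placeholder for your per-record logic

def lambda_handler(event, context):
    # Stream the object instead of loading it into memory all at once.
    s3 = S3Object(bucket=event["bucket"], key=event["key"])

    # Parse the stream as CSV and iterate row by row with a small footprint.
    for row in s3.transform(CsvTransform()):
        process(row)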

Let’s See It in Action
For this demo, I will create a workflow that processes one thousand dog images, which are already stored on S3.

➜  ~ aws s3 ls awsnewsblog-distributed-map/images/
2022-11-08 15:03:36      27034 n02085620_10074.jpg
2022-11-08 15:03:36      34458 n02085620_10131.jpg
2022-11-08 15:03:36      12883 n02085620_10621.jpg
2022-11-08 15:03:36      34910 n02085620_1073.jpg
...

➜  ~ aws s3 ls awsnewsblog-distributed-map/images/ | wc -l
    1000

The workflow and the S3 bucket must be in the same Region.

To get started, I navigate to the Step Functions page of the AWS Management Console and select Create state machine. On the next page, I choose to design my workflow using the visual editor. The distributed map works with Standard workflows, and I keep the default selection as-is. I select Next to enter the visual editor.

Distributed Map - create a workflow

In the visual editor, I search for and select the Map component on the left-side pane, and I drag it to the workflow area. On the right side, I configure the component. I choose Distributed as Processing mode and Amazon S3 as Item Source.

Distributed maps are natively integrated with S3. I enter the name of the bucket (awsnewsblog-distributed-map) and the prefix (images) where my images are stored.

In the Runtime Settings section, I choose Express for Child workflow type. I may also decide to restrict the Concurrency limit. This helps ensure I operate within the concurrency quotas of the downstream services (Lambda in this demo) for a particular account or Region.

By default, the output of my sub-workflows will be aggregated as state output, up to 256 KB. To process larger outputs, I may choose to Export map state results to Amazon S3.


Finally, I define what to do for each file. In this demo, I want to invoke a Lambda function for each file in the S3 bucket. The function exists already. I search for and select the Lambda invocation action on the left-side pane. I drag it to the distributed map component. Then, I use the right-side configuration panel to select the actual Lambda function to invoke: AWSNewsBlogDistributedMap in this example.

Distributed Map - add a Lambda invocation

When I am done, I select Next. I select Next again on the Review generated code page (not shown here).

On the Specify state machine settings page, I enter a Name for my state machine and the IAM Permissions to run. Then, I select Create state machine.

Create State Machine - Final Screen

Now I am ready to start the execution. On the State machine page, I select the new workflow and select Start execution. I can optionally enter a JSON document to pass to the workflow. In this demo, the workflow does not handle the input data. I leave it as-is, and I select Start execution.

Start workflow execution

Start workflow execution - pass input data

During the execution of the workflow, I can monitor the progress. I observe the number of iterations, and the number of items successfully processed or in error.

I can drill down on one specific execution to see the details.

Distributed Map - monitor execution details

With just a few clicks, I created a large-scale and heavily parallel workflow able to handle a very large quantity of data.

Which AWS Service Should I Use?
As often happens on AWS, you might observe an overlap between this new capability and existing services such as AWS Glue, Amazon EMR, or Amazon S3 Batch Operations. Let’s try to differentiate the use cases.

In my mental model, data scientists and data engineers use AWS Glue and EMR to process large amounts of data. On the other hand, application developers will use Step Functions to add serverless data processing into their applications. Step Functions is able to scale from zero quickly, which makes it a good fit for interactive workloads where customers may be waiting for the results. Finally, system administrators and IT operation teams are likely to use Amazon S3 Batch Operations for single-step IT automation operations such as copying, tagging, or changing permissions on billions of S3 objects.

Pricing and Availability
AWS Step Functions’ distributed map is generally available in the following ten AWS Regions: US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Singapore, Sydney, Tokyo), Canada (Central), and Europe (Frankfurt, Ireland, Stockholm).

The pricing model for the existing inline map state does not change. For the new distributed map state, we charge one state transition per iteration. Pricing varies between Regions, and it starts at $0.025 per 1,000 state transitions. When you process your data using express workflows, you are also charged based on the number of requests for your workflow and its duration. Again, prices vary between Regions, but they start at $1.00 per 1 million requests and $0.06 per GB-hour (prorated to 100ms).
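
As a rough back-of-the-envelope example at those starting prices (the workload parameters here are my assumptions, not an official quote): a distributed map that fans out one million iterations generates one million state transitions, or about $25. If each iteration runs as a one-second, 128 MB express child workflow, the express charges add roughly $1.00 for the million requests plus about $2.08 for the roughly 35 GB-hours of duration.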

For the same number of iterations, you will observe a cost reduction when using the combination of the distributed map and standard workflows compared to the existing inline map. When you use express workflows, expect the costs to stay the same while getting more value from the distributed map.

I am really excited to discover what you will build using this new capability and how it will unlock innovation. Go build highly parallel serverless data processing workflows today!

— seb

New for Amazon SageMaker – Perform Shadow Tests to Compare Inference Performance Between ML Model Variants

As you move your machine learning (ML) workloads into production, you need to continuously monitor your deployed models and iterate when you observe a deviation in model performance. When you build a new model, you typically start by validating it offline using historical inference request data. But this data sometimes fails to account for current, real-world conditions. For example, new products might start trending that your product recommendation model hasn’t seen yet, or you might experience a sudden spike in the volume of inference requests in production that your model was never exposed to before.

Today, I’m excited to announce Amazon SageMaker support for shadow testing!

Deploying a model in shadow mode lets you conduct a more holistic test by routing a copy of the live inference requests for a production model to the new (shadow) model. Yet, only the responses from the production model are returned to the calling application. Shadow testing helps you build further confidence in your model and catch potential configuration errors and performance issues before they impact end users. Once you complete a shadow test, you can use the deployment guardrails for SageMaker inference endpoints to safely update your model in production.

Get Started with Amazon SageMaker Shadow Testing
You can create shadow tests using the new SageMaker Inference Console and APIs. Shadow testing gives you a fully managed experience for setup, monitoring, viewing, and acting on the results of shadow tests. If you have existing workflows built around SageMaker endpoints, you can also deploy a model in shadow mode using the existing SageMaker Inference APIs.
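
For teams automating this outside the console, here is a hedged boto3 sketch of what deploying a shadow variant through the existing APIs might look like. All names are placeholders, and my reading is that the shadow variant’s weight sets the fraction of production requests copied to it; check the API reference for your use case.

import boto3

sm = boto3.client("sagemaker")

sm.create_endpoint_config(
    EndpointConfigName="my-endpoint-config-with-shadow",  # placeholder
    ProductionVariants=[
        {
            "VariantName": "production-variant",
            "ModelName": "my-production-model",  # placeholder model
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 1.0,
        }
    ],
    ShadowProductionVariants=[
        {
            "VariantName": "shadow-variant",
            "ModelName": "my-candidate-model",  # placeholder model
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.5,  # fraction of requests copied to the shadow
        }
    ],
)

You would then create a new endpoint, or update an existing one, with this endpoint config.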

On the SageMaker console, select Inference and Shadow tests to create, monitor, and deploy shadow tests.

Amazon SageMaker Shadow Tests

To create a shadow test, select an existing (or create a new) SageMaker endpoint and production variant you want to test against.

Amazon SageMaker - Create Shadow Test

Next, configure the proportion of traffic to send to the shadow variant, the comparison metrics you want to evaluate, and the duration of the test. You can also enable data capture for your production and shadow variant.

Amazon SageMaker - Create Shadow Test

That’s it. SageMaker now automatically deploys the new variant in shadow mode and routes a copy of the inference requests to it in real time, all within the same endpoint. The following diagram illustrates this workflow.

Amazon SageMaker - Shadow Testing

Note that only the responses of the production variant are returned to the calling application. You can choose to either discard or log the responses of the shadow variant for offline comparison.

You can also use shadow testing to validate changes you made to any component in your production variant, including the serving container or ML instance. This can be useful when you’re upgrading to a new framework version of your serving container, applying patches, or making sure that there is no impact to latency or error rate due to the change. Similarly, if you’re considering a move to another ML instance type, for example, Amazon EC2 C7g instances based on AWS Graviton processors, or EC2 G5 instances powered by NVIDIA A10G Tensor Core GPUs, you can use shadow testing to evaluate the performance on production traffic prior to rollout.

You can monitor the progress of the shadow test and performance metrics such as latency and error rate through a live dashboard. On the SageMaker console, select Inference and Shadow tests, then select the shadow test you want to monitor.

Amazon SageMaker - Monitor Shadow Test


If you decide to promote the shadow model to production, select Deploy shadow variant and define the infrastructure configuration to deploy the shadow variant.

Amazon SageMaker - Deploy Shadow Variant


You can also use the SageMaker deployment guardrails if you want to add linear or canary traffic shifting modes and auto rollbacks to your update.

Availability and Pricing
SageMaker support for shadow testing is available today in all AWS Regions where SageMaker hosting is available except for the AWS GovCloud (US) Regions and AWS China Regions.

There is no additional charge for SageMaker shadow testing other than usage charges for the ML instances and ML storage provisioned to host the shadow variant. The pricing for ML instances and ML storage dimensions is the same as the real-time inference option. There is no additional charge for data processed in and out of shadow deployments. The SageMaker pricing page has all the details.

To learn more, visit Amazon SageMaker shadow testing.

Start validating your new ML models with SageMaker shadow tests today!

— Antje

AWS Machine Learning University New Educator Enablement Program to Build Diverse Talent for ML/AI Jobs

AWS Machine Learning University is now providing a free educator enablement program. This program provides faculty at community colleges, minority-serving institutions (MSIs), and historically Black colleges and universities (HBCUs) with the skills and resources to teach data analytics, artificial intelligence (AI), and machine learning (ML) concepts to build a diverse pipeline for in-demand jobs of today and tomorrow.

According to the National Science Foundation, Black and Hispanic or Latino students earn bachelor’s degrees in Computer Science—the dominant pathway to AI/ML—at a much lower rate than their white peers, earning less than 11 percent of computer science degrees awarded. However, research shows that having diverse perspectives among skilled practitioners and across the AI/ML lifecycle contributes to the development of AI/ML systems that are safe, trustworthy, and have less bias. 

In 2018, we announced the Machine Learning University (MLU) to share with all developers the same courses that we used to train engineers at Amazon and AWS. This platform offers self-service, self-paced AI/ML digital courses.

Machine Learning University home page

And today, we add this new program to our AI/ML training offering. Although anyone can access MLU’s self-paced learning, it places the burden on the learner to source prerequisite work and solutions. This educator enablement program takes the concepts and lessons developed by MLU and makes them more accessible to educators. It offers a year-round program with lesson planning, course playbooks, and access to free compute resources.

Program Details
Educators are onboarded in small-group cohorts into bootcamps where they will learn the material and deep dive into how to teach it via instructor-led lectures and hands-on projects. Educators who complete the bootcamp can take part in different year-round development opportunities, such as a dedicated Slack channel to share teaching best practices, education topic series and virtual study sessions moderated by MLU instructors, and regional events for continued professional development. Also, they will receive continuing education credits and AWS-provided stipends.

Faculty and students get access to instructional material through Amazon SageMaker Studio Lab. SageMaker Studio Lab, announced last year, is AWS’s free (no credit card required) ML development environment. It provides computing and storage for anybody who wants to learn and experiment with ML. Institutions can unlock additional resources to support their ML programs by registering for AWS Academy, which unlocks all the AWS services for a complete AI/ML program.

Community colleges and universities can integrate this educator enablement program into their computer science, information technology, and business curricula to create an AI/ML course, certificate, or degree. We have worked with educators and education boards such as Houston Community College to create content that is vetted for credit-worthy and degree-earning curricula.

In August 2022, we launched our first educator bootcamp in partnership with The Coding School. The bootcamp was delivered over two weeks, offering lectures, case studies, and hands-on projects. Twenty-five educators completed the Educator Machine Learning Bootcamp, representing 22 US community colleges and universities.

Learn More and Join The Program
During 2023, AWS Machine Learning University will run six educator-enablement cohorts, starting in January. The program will give priority consideration to educators at community colleges, MSIs, and HBCUs, in alignment with the program’s mission to increase access to AI/ML technology for historically underserved and underrepresented students.

If you are a computer science educator or part of a board of educators interested in fostering more depth in your computer science coursework, you should sign up for the educator enablement program.

— Marcia

GC Security and Wazuh sign a partnership agreement

San Jose, California, November 2022. We are delighted to announce that GC Security has signed a partnership agreement with Wazuh. GC Security is a worldwide cybersecurity solutions provider that aims to create safer digital environments to drive innovation, transformation, and value generation.

GC Security uses manual and automated inputs from its risk-centralizing platform to perform context and critical analysis using artificial intelligence, allowing a risk-based approach that employs the insights and tactics used by cybercriminals.

In addition to producing technical outputs, the analyses enable actions to be taken to mitigate risk, execute inputs to measure cybersecurity maturity and engage all levels of the company.

“GC Security has been part of the Wazuh community since the first months of its launch. We are very familiar with the tool and trust its capabilities: collecting, centralizing, and correlating logs to create rules while also being an excellent monitoring solution. As an organic user with a long history of using Wazuh, this partnership is a natural and motivating move for our company to partake in Wazuh’s ongoing growth in our market,” commented Rodrigo Gava, Head of Operations of GC Security.

GC Security works with different segments and company sizes, with numerous cases in industries such as telecommunications, financial institutions, urban mobility, retail, and technology. 

“We are pleased to know that GC Security has been part of the Wazuh community for so long and that they rely on our capabilities to provide their customers with the best assistance. It is a great honor that a company with such global experience in cybersecurity teamed up with Wazuh,” remarked Alberto Gonzalez, COO at Wazuh.

If you want to learn more about GC Security, please visit its official website. For more information on Wazuh Partnerships, please visit our partners’ page.


Getting Started with Amazon ECS Anywhere – Now Generally Available

Since Amazon Elastic Container Service (Amazon ECS) was launched in 2014, AWS has released other options for running Amazon ECS tasks outside of an AWS Region, such as AWS Wavelength, an offering for mobile edge devices, and AWS Outposts, a service that extends AWS infrastructure to customers’ environments using hardware owned and fully managed by AWS.

But some customers have applications that need to run on premises due to regulatory, latency, and data residency requirements or the desire to leverage existing infrastructure investments. In these cases, customers have to install, operate, and manage separate container orchestration software and need to use disparate tooling across their AWS and on-premises environments. Customers asked us for a way to manage their on-premises containers without this added complexity and cost.

Following Jeff’s preannouncement last year, I am happy to announce the general availability of Amazon ECS Anywhere, a new capability in Amazon ECS that enables customers to easily run and manage container-based applications on premises, including virtual machines (VMs), bare metal servers, and other customer-managed infrastructure.

With ECS Anywhere, you can run and manage containers on any customer-managed infrastructure using the same cloud-based, fully managed, and highly scalable container orchestration service you use in AWS today. You no longer need to prepare, run, update, or maintain your own container orchestrators on premises, making it easier to manage your hybrid environment and leverage the cloud for your infrastructure by installing simple agents.

ECS Anywhere provides consistent tooling and APIs for all container-based applications and the same Amazon ECS experience for cluster management, workload scheduling, and monitoring, both in the cloud and on customer-managed infrastructure. You can now enjoy the benefits of reduced cost and complexity by running container workloads (such as data processing) at edge locations on your own hardware while keeping latency low, and in the cloud, all with a single, consistent container orchestrator.

Amazon ECS Anywhere – Getting Started
To get started with ECS Anywhere, register your on-premises servers or VMs (also referred to as External instances) in the ECS cluster. The AWS Systems Manager Agent, Amazon ECS container agent, and Docker must be installed on these external instances. Your external instances require an IAM role that permits them to communicate with AWS APIs. For more information, see Required IAM permissions in the ECS Developer Guide.

To create a cluster for ECS Anywhere, on the Create Cluster page in the ECS console, choose the Networking Only template. This option is for use with either AWS Fargate or external instance capacity. We recommend that you use the AWS Region that is geographically closest to the on-premises servers you want to register.

This creates an empty cluster to register external instances. On the ECS Instances tab, choose Register External Instances to get activation codes and an installation script.

On the Step 1: External instances activation details page, in Activation key duration (in days), enter the number of days the activation key should remain active. The activation key can be used for up to 1,000 activations. In Number of instances, enter the number of external instances you want to register to your cluster. In Instance role, enter the IAM role to associate with your external instances.

Choose Next step to get a registration command.

On the Step 2: Register external instances page, copy the registration command. Run this command on the external instances you want to register to your cluster.

Paste the registration command into your on-premises servers or VMs. Each external instance is then registered as an AWS Systems Manager managed instance, which is in turn registered to your Amazon ECS cluster.
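
If you prefer to script this flow instead of using the console, a rough boto3 sketch of the cluster and activation steps might look like the following. The cluster and role names are placeholders, and the install command you run on each server still comes from the console or the ECS documentation.

import boto3

ecs = boto3.client("ecs")
ssm = boto3.client("ssm")

# Create an empty cluster to register external instances into.
ecs.create_cluster(clusterName="ecs-anywhere-demo")

# Create an SSM activation. The IAM role is a placeholder and must allow
# the external instances to communicate with AWS APIs (see the ECS guide).
activation = ssm.create_activation(
    IamRole="ecsAnywhereRole",
    RegistrationLimit=10,  # how many instances you plan to register
)
print(activation["ActivationId"], activation["ActivationCode"])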

Both x86_64 and ARM64 CPU architectures are supported. The following is a list of supported operating systems:

  • CentOS 7, CentOS 8
  • RHEL 7
  • Fedora 32, Fedora 33
  • openSUSE Tumbleweed
  • Ubuntu 18, Ubuntu 20
  • Debian 9, Debian 10
  • SUSE Enterprise Server 15

When the ECS agent has started and completed the registration, your external instance will appear on the ECS Instances tab.

You can also add your external instances to an existing cluster. In this case, you will see your Amazon EC2 instances and your external instances (whose IDs are prefixed with mi-*) together.

Now that the external instances are registered to your cluster, you are ready to create a task definition. Amazon ECS provides the requiresCompatibilities parameter to validate that the task definition is compatible with the EXTERNAL launch type when creating your service or running your standalone task. The following is an example task definition:

{
	"requiresCompatibilities": [
		"EXTERNAL"
	],
	"containerDefinitions": [{
		"name": "nginx",
		"image": "public.ecr.aws/nginx/nginx:latest",
		"memory": 256,
		"cpu": 256,
		"essential": true,
		"portMappings": [{
			"containerPort": 80,
			"hostPort": 8080,
			"protocol": "tcp"
		}]
	}],
	"networkMode": "bridge",
	"family": "nginx"
}

You can create a task definition in the ECS console. In Task Definition, choose Create new task definition. For Launch type, choose EXTERNAL and then configure the task and container definitions to use external instances.

On the Tasks tab, choose Run new task. On the Run Task page, for Cluster, choose the cluster to run your task definition on. In Number of tasks, enter the number of copies of that task to run with the EXTERNAL launch type.

Or, on the Services tab, choose Create. The Configure service page lets you specify how many copies of your task definition to run and maintain in a cluster. To run your task on the registered external instances, for Launch type, choose EXTERNAL. When you choose this launch type, load balancers, tag propagation, and service discovery integration are not supported.

The tasks you run on your external instances must use the bridge, host, or none network modes. The awsvpc network mode isn’t supported. For more information about each network mode, see Choosing a network mode in the Amazon ECS Best Practices Guide.

Now you can run your tasks and associate a mix of EXTERNAL, FARGATE, and EC2 capacity provider types with the same ECS service and specify how you would like your tasks to be split across them.
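
As a minimal sketch of the equivalent API call for the standalone case (the cluster name is a placeholder, and the task definition family matches the example above):

import boto3

ecs = boto3.client("ecs")

# Run one copy of the nginx task definition on registered external instances.
response = ecs.run_task(
    cluster="ecs-anywhere-demo",  # placeholder cluster name
    taskDefinition="nginx",       # family from the task definition above
    launchType="EXTERNAL",
    count=1,
)
print(response["tasks"][0]["taskArn"])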

Things to Know
Here are a couple of things to keep in mind:

Connectivity: In the event of loss of network connectivity between the ECS agent running on the on-premises servers and the ECS control plane in the AWS Region, existing ECS tasks will continue to run as usual. If tasks still have connectivity with other AWS services, they will continue to communicate with them for as long as the task role credentials are active. If a task launched as part of a service crashes or exits on its own, ECS will be unable to replace it until connectivity is restored.

Monitoring: With ECS Anywhere, you can get Amazon CloudWatch metrics for your clusters and services, use the CloudWatch Logs driver (awslogs) to get your containers’ logs, and access the ECS CloudWatch event stream to monitor your clusters’ events.

Networking: ECS external instances are optimized for running applications that generate outbound traffic or process data. If your application requires inbound traffic, such as a web service, you will need to employ a workaround to place these workloads behind a load balancer until the feature is supported natively. For more information, see Networking with ECS Anywhere.

Data Security: To help customers maintain data security, ECS Anywhere only sends back to the AWS Region metadata related to the state of the tasks or the state of the containers (whether they are running or not running, performance counters, and so on). This communication is authenticated and encrypted in transit through Transport Layer Security (TLS).

ECS Anywhere Partners
ECS Anywhere integrates with a variety of ECS Anywhere partners to help customers take advantage of ECS Anywhere and provide additional functionality for the feature. Here are some of the blog posts that our partners wrote to share their experiences and offerings. (I am updating this article with links as they are published.)

Now Available
Amazon ECS Anywhere is now available in all commercial Regions where ECS is supported, except the AWS China Regions. With ECS Anywhere, there are no minimum fees or upfront commitments. You pay per instance hour for each managed ECS Anywhere task. The ECS Anywhere free tier includes 2,200 instance hours per month for six months per account, across all Regions. For more information, see the pricing page.

To learn more, see ECS Anywhere in the Amazon ECS Developer Guide. Please send feedback to the AWS forum for Amazon ECS or through your usual AWS Support contacts.

Get started with Amazon ECS Anywhere today.

Channy

Update: Watch a cool demo of ECS Anywhere operating a Raspberry Pi cluster in a home office, and read its deep-dive blog post.

AWS Lambda Extensions Are Now Generally Available – Get Started with Your Favorite Operations Tools Today

In October 2020, we announced the preview of AWS Lambda extensions, which you can use to easily integrate Lambda functions with your favorite tools for monitoring, observability, security, and governance.

Today, I’m happy to announce the general availability of AWS Lambda Extensions, which comes with new performance improvements and an expanded set of partners. As part of the GA release, functions can now send responses as soon as the function code is complete, without waiting for the included extensions to finish. This enables extensions to perform activities like sending telemetry to a preferred destination after the function’s response has been returned. We also welcome extensions from new partners: Imperva, Instana, Sentry, Site24x7, and the AWS Distro for OpenTelemetry.

You can use Lambda extensions for use cases such as capturing diagnostic information before, during, and after function invocation; automatically instrumenting your code without needing code changes; fetching configuration settings or secrets before the function invocation; detecting and alerting on function activity through security agents; and sending telemetry to custom destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Kinesis, Amazon Elasticsearch Service directly and asynchronously from your Lambda functions.

Customers are drawn to the vision of serverless. The reduced operational responsibility frees them up to focus on their business problems. To help customers monitor, observe, secure, and govern their functions, AWS Lambda provides native integrations for logs and metrics through Amazon CloudWatch, tracing through AWS X-Ray, tracking configuration changes through AWS Config, and recording API calls through AWS CloudTrail. In addition, AWS Lambda partners provide tools for application management, API integration, deployment, monitoring, and security.

AWS Lambda extensions provide a simple way to extend the Lambda execution environment, which is where your function code is executed. AWS customers, partners, and the open source community can use the new Lambda Extensions API to build their own extensions, which are companion processes that augment the capabilities of Lambda functions. To learn how to build your own extensions, see the Building Extensions for AWS Lambda – In preview blog post. The post also includes information about changes to the Lambda lifecycle.

How AWS Lambda Extensions Work
AWS Lambda extensions are designed to be the easiest way to plug in the tools you use today without complex installation or configuration management. You can add tools to your functions using Lambda layers or include them in the image for functions deployed as container images.

Lambda extensions use the Extensions API to register for function and execution environment lifecycle events. In response to these events, extensions can start new processes or run logic. Lambda extensions can also use the Runtime Logs API to subscribe to a stream of the same logs that the Lambda service sends to Amazon CloudWatch directly from the Lambda execution environment. Lambda streams the logs to the extension, and the extension can then process, filter, and send the logs to any preferred destination.
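
To give a feel for the mechanics, here is a minimal extension skeleton in Python built on the documented Extensions API endpoints. It only registers and drains lifecycle events; real telemetry logic and error handling are omitted, and the extension name must match the executable’s file name.

import json
import os
import urllib.request

BASE_URL = f"http://{os.environ['AWS_LAMBDA_RUNTIME_API']}/2020-01-01/extension"

def register() -> str:
    # Register the extension for INVOKE and SHUTDOWN lifecycle events.
    req = urllib.request.Request(
        f"{BASE_URL}/register",
        data=json.dumps({"events": ["INVOKE", "SHUTDOWN"]}).encode(),
        headers={"Lambda-Extension-Name": "demo-extension"},
        method="POST",
    )
    with urllib.request.urlopen(req) as res:
        return res.headers["Lambda-Extension-Identifier"]

def event_loop(extension_id: str) -> None:
    while True:
        # Block until the Lambda service delivers the next lifecycle event.
        req = urllib.request.Request(
            f"{BASE_URL}/event/next",
            headers={"Lambda-Extension-Identifier": extension_id},
        )
        with urllib.request.urlopen(req) as res:
            event = json.loads(res.read())
        if event["eventType"] == "SHUTDOWN":
            break  # flush any buffered telemetry here, then exit

if __name__ == "__main__":
    event_loop(register())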

Most customers will use Lambda extensions without needing to know about the capabilities of the Extensions API. You can just consume capabilities of an extension by configuring the options in your Lambda functions.

How to Use Lambda Extensions
You can install and manage extensions using the Lambda console, the AWS Command Line Interface (CLI), or infrastructure as code (IaC) services and tools such as AWS CloudFormation, AWS Serverless Application Model (AWS SAM), and Terraform.

To use Lambda extensions to integrate existing tools with your Lambda functions, choose a Lambda function, and on the Configuration tab, choose Monitoring and Operations tools.

On the Extensions page, you can find available extensions from AWS Lambda partners. Choose an extension to view its installation instructions.

AWS Lambda Extensions Partners
At this launch, Lambda extensions integrate with these AWS Lambda partners who have provided the following information to introduce their extensions. (I am updating this article with links as they are published.)

  • AppDynamics provides end-to-end transaction tracing for AWS Lambda. With the AppDynamics extension, it is no longer mandatory for developers to include the AppDynamics tracer as a dependency in their function code, making tracing transactions across hybrid architectures even simpler.
  • Coralogix is a log analytics and cloud security platform that empowers thousands of companies to improve security and accelerate software delivery, allowing you to get deep insights without paying for the noise. Coralogix can now read Lambda function logs and metrics directly, without using CloudWatch or Amazon S3, reducing the latency and cost of observability.
  • The Datadog extension brings comprehensive, real-time visibility to your serverless applications. Combined with Datadog’s integration with AWS, you get metrics, traces, and logs to help you monitor, detect, and resolve issues at any scale. The Datadog extension makes it easier than ever to get telemetry from your serverless workloads.
  • The Dynatrace extension makes it even easier to bring AWS Lambda metrics and traces into the Dynatrace platform for intelligent observability and automatic root cause detection. Get comprehensive, end-to-end observability with the flip of a switch and no code changes.
  • Epsagon helps you monitor, troubleshoot, and lower the cost of your Lambda functions. Epsagon’s extension reduces the overhead of sending traces to the Epsagon service, with minimal performance impact to your function.
  • HashiCorp Vault allows you to secure, store, and tightly control access to your application’s secrets and sensitive data. With the Vault extension, you can now authenticate and securely retrieve dynamic secrets before your Lambda function is invoked.
  • Honeycomb is a powerful observability tool that helps you debug your entire production app stack. Honeycomb’s extension decreases the overhead, latency, and cost of sending events to the Honeycomb service, while increasing reliability.
  • Instana Enterprise Observability Platform ingests performance metrics, traces requests, and profiles processes to make observability work for the enterprise. The Instana Lambda extension offers modification-free, low latency tracing of Lambda functions backed by their real-time Enterprise Observability Platform.
  • Imperva Serverless Protection protects organizations from vulnerabilities created by misconfigured apps and code-level security risks in serverless computing environments. The Imperva extension enables customers to easily embed additional security in their DevOps processes for serverless applications without requiring any code changes, leading to faster time to market.
  • Lumigo provides a monitoring and observability platform for serverless and microservices applications. The Lumigo extension enables the new Lumigo Lambda Profiler to see a breakdown of function resources, including CPU, memory, and network metrics. Use the extension to receive actionable insights to reduce Lambda runtime duration and cost, fix bottlenecks, and increase efficiency.
  • Check Point CloudGuard provides full lifecycle security for serverless applications. The CloudGuard extension enables Function Self Protection data aggregation as an out-of-process extension, providing detection and alerting on application layer attacks.
  • New Relic enables you to efficiently monitor, troubleshoot, and optimize your Lambda functions. New Relic’s extension allows you to send your Lambda service platform logs directly to New Relic’s unified observability platform, allowing you to quickly visualize data with minimal latency and cost.
  • Thundra provides an application debugging, observability and security platform for serverless, container and virtual machine (VM) workloads. The Thundra extension adds asynchronous telemetry reporting functionality to the Thundra agents, getting rid of network latency.
  • Splunk offers an enterprise-grade cloud monitoring solution for real-time full-stack visibility at scale. The Splunk extension provides a simplified runtime-independent interface to collect high-resolution observability data with minimal overhead. Monitor, manage, and optimize the performance and cost of your serverless applications with Splunk Observability solutions.
  • Sentry’s extension enables developers to monitor code health. From error tracking to performance monitoring, developers can see issues more clearly, solve them quicker, and continuously stay informed about the health of their applications, all without making code changes.
  • Site24x7 provides a performance monitoring solution for DevOps and IT operations. The Site24x7 extension enables real-time observability into your Lambda functions. It enables you to monitor critical Lambda metrics and function executions logs and optimize execution time and performance.
  • The Sumo Logic extension enables you to get instant visibility into the health and performance of your mission-critical applications using AWS Lambda. With this extension and Sumo Logic’s continuous intelligence platform, you can now ensure that all your Lambda functions are running as expected by analyzing function, platform, and extension logs to quickly identify and remediate errors and exceptions.

Here are Lambda extensions from AWS services:

  • AWS AppConfig helps you manage, store, and safely deploy application configurations to your hosts at runtime. The AWS AppConfig extension integrates Lambda and AWS AppConfig seamlessly. Lambda functions have simple access to external configuration settings quickly and easily. Developers can now dynamically change their Lambda function’s configuration safely using robust validation features.
  • Amazon CodeGuru Profiler helps developers improve application performance and reduce costs by pinpointing an application’s most expensive line of code. It provides recommendations for improving code to save money. The Lambda integration removes the need to change any code or redeploy packages.
  • Amazon CloudWatch Lambda Insights enables you to efficiently monitor, troubleshoot, and optimize Lambda functions. The Lambda Insights extension simplifies the collection, visualization, and investigation of detailed compute performance metrics, errors, and logs. You can more easily isolate and correlate performance problems to optimize your Lambda environments.
  • AWS Distro for OpenTelemetry is a secure, production-ready, AWS-supported distribution of the OpenTelemetry project. The Lambda extension runs the OpenTelemetry collector and enables functions to send trace data to AWS monitoring services such as AWS X-Ray and to any destination such as Honeycomb and Lightstep that supports OpenTelemetry Protocol (OTLP) using the OTLP exporter.

To get started with Lambda extensions, use the links provided to install these extensions.

Things to Know
Here are a couple of things to keep in mind:

Pricing: Extensions share the same billing model as Lambda functions and you are charged for compute time used in all phases of the Lambda lifecycle. For function invocations, you pay for requests served and the compute time used to run your code and all extensions, together, in 1ms increments. To learn more about billing for extensions, visit the Lambda FAQs page.

Performance: Lambda extensions might impact the performance of your function because they share resources such as CPU, memory, and storage with the function, and because extensions are initialized before function code. For example, if an extension performs compute-intensive operations, you might see your function’s execution duration increase because the extension and your function code share the same CPU resources.

Because Lambda allocates proportional CPU power based on the memory setting, you might see increased execution and initialization duration at lower memory settings as more processes compete for the same CPU resources. You can use CloudWatch metrics such as PostRuntimeExtensionsDuration to measure the extra time an extension takes after the function execution, and MaxMemoryUsed to measure the increase in memory used.
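
For example, a small boto3 sketch for pulling that metric over the last hour (the function name is a placeholder):

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# Average time spent in extensions after each invocation completes.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Lambda",
    MetricName="PostRuntimeExtensionsDuration",
    Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)
print(stats["Datapoints"])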

Available Now
The performance improvements announced as part of GA are currently available in the US East (N. Virginia), Europe (Ireland), and Europe (Milan) Regions.

You can also build your own extensions. To learn how to build extensions, see the Lambda Extensions API in the AWS Lambda Developer Guide. You can send feedback through the AWS forum for AWS Lambda or through your usual AWS Support contacts.

Channy

Update. Watch a quick introductory video and a deep dive playlist about AWS Lambda Extensions for more information.

AWS Local Zones Are Now Open in Boston, Miami, and Houston

AWS Local Zones place select AWS services (compute, storage, database, and so forth) close to large population, industry, and IT centers. They support use cases such as real-time gaming, hybrid migrations, and media & entertainment content creation that need single-digit millisecond latency for end-users in a specific geographic area.

Last December I told you about our plans to launch a total of fifteen AWS Local Zones in 2021, and also announced the preview of the Local Zones in Boston, Miami, and Houston. Today I am happy to announce that these three Local Zones are now ready to host your production workloads, joining the existing pair of Local Zones in Los Angeles. As I mentioned in my original post, each Local Zone is a child of a particular parent region, and is managed by the control plane in the region. The parent region for all three of these zones is US East (N. Virginia).

Using Local Zones
To get started, I need to enable the Zone(s) of interest. I can do this from the command line (modify-availability-zone-group), via an API call (ModifyAvailabilityZoneGroup), or from within the EC2 Console. From the console, I enter the parent region and click Zones in the Account attributes:

I can see the Local Zones that are within the selected parent region. I click Manage for the desired Local Zone:

I click Enabled and Update zone group, and I am ready to go!
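
The programmatic equivalent is a single call. Here is a boto3 sketch, assuming the Boston Local Zone’s group name of us-east-1-bos-1:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Opt in to the Boston Local Zone group (the group name is the zone
# name without the trailing letter).
ec2.modify_availability_zone_group(
    GroupName="us-east-1-bos-1",
    OptInStatus="opted-in",
)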

I can create Virtual Private Clouds (VPCs), launch EC2 instances, create SSD-backed EBS volumes (gp2), and set up EKS and ECS clusters in the Local Zone (see the AWS Local Zones Features page for a full list of supported services).
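
Resources land in a Local Zone the same way they land in any Availability Zone: by targeting the zone name, typically via a subnet. A quick sketch (the VPC ID and CIDR block are placeholders, and I assume the Boston zone name us-east-1-bos-1a):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create a subnet whose resources will live in the Boston Local Zone.
subnet = ec2.create_subnet(
    VpcId="vpc-0123456789abcdef0",        # placeholder VPC ID
    CidrBlock="10.0.128.0/24",            # placeholder CIDR block
    AvailabilityZone="us-east-1-bos-1a",  # Local Zone name
)
print(subnet["Subnet"]["SubnetId"])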

The new Local Zones currently offer the following instances:

Name          Category           vCPUs  Memory (GiB)
t3.xlarge     General Purpose    4      16
c5d.2xlarge   Compute Intensive  8      16
g4dn.2xlarge  GPU                8      32
r5d.2xlarge   Memory Intensive   8      64

I can also use EC2 Auto Scaling, AWS CloudFormation, and Amazon CloudWatch in the parent region to monitor and control these resources.

Local Zones in Action
AWS customers are already making great use of all five operational Local Zones! After talking with my colleagues, I learned that the use cases for Local Zones can be grouped into two categories:

Distributed Edge – These customers want to place selected parts of their gaming, social media, and voice assistant applications in multiple, geographically disparate locations in order to deliver a low-latency experience to their users.

Locality – These customers need access to cloud services in specific locations that are in close proximity to their existing branch offices, data centers, and so forth. In addition to low-latency access, they often need to process and store data within a specific geographic region in order to meet regulatory requirements. These customers often run a third-party software VPN appliance on an Amazon EC2 instance to connect to the desired Local Zone.

Here are a few examples:

Ambra Health provides a cloud-based medical image management suite that empowers some of the largest health systems, radiology practices, and clinical research organizations. The suite replaces traditional desktop image viewers, and uses Local Zones to provide radiologists with rapid access to high quality images so that they can focus on improving patient outcomes.

Couchbase is an award-winning distributed NoSQL cloud database for modern enterprise applications. Couchbase is using AWS Local Zones to provide low latency and single-digit millisecond data access times for applications, ensuring developers’ apps are always available and fast. Using Local Zones, along with Couchbase’s edge computing capabilities, means that their customers are able to store, query, search, and analyze data in real time.

Edgegap is a game hosting service provider focused on providing the best online experience for their customers. AWS Local Zones gives their customers (game development studios such as Triple Hill Interactive, Agog Entertainment, and Cofa Games) the ability to take advantage of ever-growing list of locations and to deploy games with a simple API call.

JackTrip is using Local Zones to allow musicians in multiple locations to collaboratively perform well-synchronized live music over the Internet.

Masomo is an interactive entertainment company that focuses on mobile games including Basketball Arena and Head Ball 2. They use Local Zones to deploy select, latency-sensitive portions of their game servers close to end users, with the goal of improving latency, reducing churn, and providing players with a great experience.

Supercell deploys game servers in multiple AWS regions, and evaluates all new regions as they come online. They are already using Local Zones as deployment targets and considering additional Local Zones as they become available in order to bring the latency-sensitive portions of game servers closer to more end users.

Takeda (a global biopharmaceutical company) is planning to create a hybrid environment that spans multiple scientific centers in and around Boston, with compute-intensive R&D workloads running in the Boston Local Zone.

Ubitus deploys game servers in multiple locations in order to reduce latency and to provide users with a consistent, high-quality experience. They see Local Zones as a (no pun intended) game-changer and will use them to deploy and test clusters in multiple cities in pursuit of that consistent experience.

Learn More
To learn more about AWS Local Zones, visit the AWS Local Zones page.

Stay Tuned
We are currently working on twelve additional AWS Local Zones (Atlanta, Chicago, Dallas, Denver, Kansas City, Las Vegas, Minneapolis, New York, Philadelphia, Phoenix, Portland, and Seattle) and plan to open them throughout the remainder of 2021. We are also putting plans in place to expand to additional locations, both in the US and elsewhere. If you would like to express your interest in a particular location, please let us know by filling out the AWS Local Zones Interest Form.

Over time we plan to add additional AWS services, including AWS Direct Connect and more EC2 instance types in these new Local Zones, all driven by feedback from our customers.

Jeff;

 

Happy 10th Birthday – AWS Identity and Access Management

Amazon S3 turned 15 earlier this year, and Amazon EC2 will do the same in a couple of months. Today we are celebrating the tenth birthday of AWS Identity and Access Management (IAM).

The First Decade
Let’s take a walk through the last decade and revisit some of the most significant IAM launches:

An IAM policy in text form, shown in Windows Notepad.

May 2011 – We launched IAM, with the ability to create users and groups of users and to attach policy documents to either one, with support for fifteen AWS services. The AWS Policy Generator could be used to build policies from scratch, and there was also a modest collection of predefined policy templates. This launch set the standard for IAM, with fine-grained permissions for actions and resources, and the use of conditions to control when a policy is in effect. This model has scaled along with AWS, and remains central to IAM today.
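
As a small aside to illustrate that model, here is a sketch of a policy that combines an action, a resource, and a condition, created as a managed policy with boto3. The bucket name and IP range are placeholders.

import json
import boto3

iam = boto3.client("iam")

# Fine-grained policy: read-only access to one bucket's objects, allowed
# only from a placeholder corporate IP range.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
            "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}},
        }
    ],
}

iam.create_policy(
    PolicyName="ExampleReadOnlyFromOffice",  # placeholder name
    PolicyDocument=json.dumps(policy_document),
)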

August 2011 – We introduced the ability for you to use existing identities by federating into your AWS Account, including support for short-term temporary AWS credentials.

June 2012 – With the introduction of IAM Roles for EC2 instances, we made it easier for code running on an EC2 instance to make calls to AWS services.

February 2015 – We launched Managed Policies, and simultaneously turned the existing IAM policies into first-class objects that could be created, named, and used for multiple IAM users, groups, or roles.

AWS Organizations, with a root account and three accounts inside.

February 2017 – We launched AWS Organizations and gave you the ability to implement policy-based management that spanned multiple AWS accounts, grouped into a hierarchy of Organizational Units. This launch also marked the debut of Service Control Policies (SCPs), which gave you the power to place guardrails around the level of access allowed within the accounts of an Organization.

April 2017 – Building on the IAM Roles for EC2 Instances, we introduced service-linked roles. This gave you the power to delegate permissions to AWS services, and made it easier for you to work with AWS services that needed to call other AWS services on your behalf.

December 2017 – We introduced AWS Single Sign-On to make it easier for you to centrally manage access to AWS accounts and your business applications. SSO is built on top of IAM and takes advantage of roles, temporary credentials, and other foundational IAM features.

November 2018 – We introduced Attribute-Based Access Control (ABAC) as a complement to the original Role-Based Access Control, allowing you to use various types of user, resource, and environment attributes to drive policy and permission decisions. This launch let you tag IAM users and roles, so you could match identity attributes against resource attributes in your policies. After this launch, we followed up with support for the use of ABAC in conjunction with AWS SSO and Cognito.
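
To illustrate the ABAC pattern, here is a sketch of a policy statement that allows an action only when the principal’s tag matches the resource’s tag; the "project" tag key is a placeholder.

# ABAC sketch: a principal may start or stop an EC2 instance only when
# the principal and the instance carry the same "project" tag value.
abac_statement = {
    "Effect": "Allow",
    "Action": ["ec2:StartInstances", "ec2:StopInstances"],
    "Resource": "*",
    "Condition": {
        "StringEquals": {
            "aws:ResourceTag/project": "${aws:PrincipalTag/project}"
        }
    },
}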

IAM Access Analyzer, showing some active findings

December 2019 – We introduced IAM Access Analyzer to analyze your policies and determine which resources can be accessed publicly or from other accounts.

March 2021 – We added policy validation (over 100 policy checks) and actionable recommendations to IAM Access Analyzer in order to help you to construct IAM policies and SCPs that take advantage of time-tested AWS best practices.

April 2021 – We made it possible for you to generate least-privilege IAM policy templates based on access activity.

Then and Now
In the early days, a typical customer might use IAM to control access to a handful of S3 buckets, EC2 instances, and SQS queues, all in a single AWS account. These days, some of our customers use IAM to control access to billions of objects that span multiple AWS accounts!

Because every call to an AWS API must call upon IAM to check permissions, the IAM team has focused on availability and scalability from the get-go. Back in 2011 the “can the caller do this?” function handled a couple of thousand requests per second. Today, as new services continue to appear and the AWS customer base continues to climb, this function now handles more than 400 million API calls per second worldwide.

As you can see from my summary, IAM has come quite a long way from its simple yet powerful beginnings just a decade ago. While much of what was true a decade ago remains true today, I would like to call your attention to a few best practices that have evolved over time.

Multiple Accounts – Originally, customers generally used a single AWS account and multiple users. Today, in order to accommodate multiple business units and workloads, we recommend the use of AWS Organizations and multiple accounts. Even if your AWS usage is relatively simple and straightforward at first, your usage is likely to grow in scale and complexity, and it is always good to plan for this up front. To learn more, read Establishing Your Best Practice AWS Environment.

Users & SSO – In a related vein, we recommend that you use AWS SSO to create and manage users centrally, and then grant them access to one or more AWS accounts. To learn more, read the AWS Single Sign-On User Guide.

Happy Birthday, IAM
In line with our well-known penchant for Customer Obsession, your feedback is always welcome! What new IAM features and capabilities would you like to see in the decade to come? Leave us a comment and I will make sure that the team sees it.

And with that, happy 10th birthday, IAM!

Jeff;

 

Get Started Using Amazon FSx File Gateway for Fast, Cached Access to File Server Data in the Cloud

As traditional workloads continue to migrate to the cloud, some customers have been unable to take advantage of cloud-native services to host data typically held on their on-premises file servers. For example, data commonly used for team and project file sharing, or with content management systems, has needed to reside on-premises due to issues of high latency, or constrained or shared bandwidth, between customer premises and the cloud.

Today, I’m pleased to announce Amazon FSx File Gateway, a new type of AWS Storage Gateway that helps you access data stored in the cloud with Amazon FSx for Windows File Server, instead of continuing to use and manage on-premises file servers. Amazon FSx File Gateway uses network optimization and caching so it appears to your users and applications as if the shared data were still on-premises. By moving and consolidating your file server data into Amazon FSx for Windows File Server, you can take advantage of the scale and economics of cloud storage, and divest yourself of the undifferentiated maintenance involved in managing on-premises file servers, while Amazon FSx File Gateway solves issues around latency and bandwidth.

Replacing On-premises File Servers
Amazon FSx File Gateway is an ideal solution to consider when replacing your on-premises file servers. Low-latency access ensures you can continue to use latency-sensitive on-premises applications, and caching conserves shared bandwidth between your premises and the cloud, which is especially important when you have many users all attempting to access file share data directly.

You can attach an Amazon FSx file system and present it through a gateway to your applications and users, provided they are all members of the same Active Directory domain. The AD infrastructure can be hosted in AWS Directory Service or managed on-premises.

Your data, as mentioned, resides in Amazon FSx for Windows File Server, a fully managed, highly reliable and resilient file system, eliminating the complexity involved in setting up and operating file servers, storage volumes, and backups. Amazon FSx for Windows File Server provides a fully native Windows file system in the cloud, with full Server Message Block (SMB) protocol support, and is accessible from Windows, Linux, and macOS systems running in the cloud or on-premises. Built on Windows Server, Amazon FSx for Windows File Server also exposes a rich set of administrative features including file restoration, data deduplication, Active Directory integration, and access control via Access Control Lists (ACLs).

Choosing the Right Gateway
You may be aware of Amazon S3 File Gateway (originally named File Gateway), and might now be wondering which workloads are best suited to each of the two gateways:

  • With Amazon S3 File Gateway, you can access data stored in Amazon Simple Storage Service (Amazon S3) as files. It is also a solution for ingesting files into S3, whether to run object-based workloads and analytics or to process data that currently exists in on-premises files.
  • Amazon FSx File Gateway, on the other hand, is a solution for moving network-attached storage (NAS) into the cloud while continuing to have low-latency, seamless access for your on-premises users. This includes two general-purpose NAS use cases that rely on the SMB file protocol: end-user home directories and departmental or group file shares. Amazon FSx File Gateway supports multiple users sharing files, with advanced data management features such as access controls, snapshots for data protection, integrated backup, and more.

One additional feature I want to note is Amazon FSx File Gateway’s integration with backups. This includes backups taken directly within Amazon FSx and those coordinated by AWS Backup. Before a backup starts, Amazon FSx for Windows File Server communicates with each attached gateway to ensure that any uncommitted data gets flushed. This helps further reduce your administrative overhead and worries when moving on-premises file shares into the cloud.
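By way of illustration, here’s a minimal sketch of starting an on-demand backup of the file system with AWS Backup, using the AWS SDK for Python (boto3). The vault name and ARNs are placeholders of my own; the pre-backup cache flush I just described happens automatically:

import boto3

backup = boto3.client("backup", region_name="us-east-1")

# Start an on-demand backup of the Amazon FSx file system.
# The vault name, file system ARN, and IAM role ARN are all placeholders.
backup.start_backup_job(
    BackupVaultName="Default",
    ResourceArn="arn:aws:fsx:us-east-1:123456789012:file-system/fs-0123456789abcdef0",
    IamRoleArn="arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole",
)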

Working with Amazon FSx File Gateway
Amazon FSx File Gateway is available using multiple platform options. You can order and deploy a hardware appliance into your on-premises environment, deploy a virtual machine into your on-premises environment (VMware ESXi, Microsoft Hyper-V, Linux KVM), or deploy in the cloud as an Amazon Elastic Compute Cloud (Amazon EC2) instance. The available options are displayed as you start to create a gateway from the AWS Storage Gateway Management Console, together with setup instructions for each option.

Below, I choose to use an EC2 instance for my gateway.

FSx File Gateway platform options

The process of setting up a gateway is straightforward, and since the documentation here goes into detail, I’m not going to repeat the full flow in this post. Essentially, the steps are to first create a gateway, then join it to your domain, and finally attach an Amazon FSx file system. After that, your remote clients can work with the data on the file system; the important difference is that they connect using a network share to the gateway instead of to the Amazon FSx file system.
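If you prefer to script those steps, here’s a minimal sketch of the same flow using the AWS SDK for Python (boto3). The activation key, domain, credentials, and ARNs are all placeholders of my own:

import boto3

sgw = boto3.client("storagegateway", region_name="us-east-1")

# Step 1: activate the gateway as an FSx File Gateway (SMB).
# The activation key is obtained from the gateway VM or appliance.
gateway = sgw.activate_gateway(
    ActivationKey="ACTIVATION-KEY-FROM-APPLIANCE",  # placeholder
    GatewayName="fsx-file-gateway",
    GatewayTimezone="GMT-5:00",
    GatewayRegion="us-east-1",
    GatewayType="FILE_FSX_SMB",
)
gateway_arn = gateway["GatewayARN"]

# Step 2: join the gateway to the Active Directory domain.
sgw.join_domain(
    GatewayARN=gateway_arn,
    DomainName="corp.example.com",      # placeholder domain
    UserName="admin",                   # placeholder credentials
    Password="placeholder-password",
)

# Step 3: attach the Amazon FSx for Windows File Server file system.
sgw.associate_file_system(
    ClientToken="unique-token-123",     # any unique idempotency token
    GatewayARN=gateway_arn,
    LocationARN="arn:aws:fsx:us-east-1:123456789012:file-system/fs-0123456789abcdef0",
    UserName="admin",
    Password="placeholder-password",
)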

Below is the general configuration for my gateway, created in US East (N. Virginia).

FSx File Gateway Details

And here are the details of my Amazon FSx file system, running in an Amazon Virtual Private Cloud (VPC) in US East (N. Virginia), that will be attached to my gateway.

FSx File System Details

Note that I have created and activated the gateway in the same region as the source Amazon FSx file system, and will manage the gateway from US East (N. Virginia). The gateway virtual machine (VM) is deployed as an EC2 instance running in a VPC in our remote region, US West (Oregon). I’ve also established a peering connection between the two VPCs.

Once I have attached the Amazon FSx file system to my Amazon FSx File Gateway, in the AWS Storage Gateway Management Console I select FSx file systems and then the respective file system instance. This gives me the details of the command needed by my remote users to connect to the gateway.

Viewing the attached Amazon FSx File System
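If you’d rather retrieve those details programmatically, a minimal sketch (boto3 again; the gateway ARN is a placeholder) lists the file systems attached to a gateway and their status:

import boto3

sgw = boto3.client("storagegateway", region_name="us-east-1")

# List the file systems attached to the gateway (placeholder ARN).
resp = sgw.list_file_system_associations(
    GatewayARN="arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-12345678"
)
for assoc in resp["FileSystemAssociationSummaryList"]:
    print(assoc["FileSystemAssociationARN"], assoc["FileSystemAssociationStatus"])

Remote users then map their network drives against the gateway’s address rather than against the Amazon FSx endpoint.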

Exploring an End-user Scenario with Amazon FSx File Gateway
Let’s explore a scenario that may be familiar to many readers: a “head office” that has moved its NAS into the cloud, with one or more “branch offices” in remote locations that need to connect to those shares and the files they hold. In this case, my head office/branch office scenario is for a fictional photo agency, and is set up so I can explore the gateway’s cache refresh functionality. I’m imagining a scenario where a remote user accidentally deletes some files and then needs to contact an admin in the head office to have them restored. This is a fairly common scenario, and one I know I’ve had to both request and handle in my career!

The head office for my fictional agency is located in US East (N. Virginia), and the local admin for that office (me) has a network share attached to the Amazon FSx file system instance. My branch office, where my agency photographers work, is located in the US West (Oregon) Region, and users there connect to my agency’s network over a VPN (an AWS Direct Connect setup could also be used). In this scenario, I simulate the workstations at each office using Amazon EC2 instances.

In my fictional agency, photographers upload images to my agency’s Amazon FSx file system, connected via a network share to the gateway. Even though my fictional head office and the Amazon FSx file system itself are located on the east coast, the gateway and its cache provide a fast, low-latency connection for users in the remote branch office, making it seem as though there is a local NAS. After photographers upload images from their assignments, additional staff in the head office do some basic work on them, and make the partly processed images available back to the photographers on the west coast via the file share.

The image below illustrates the resource setup for my fictional agency.

My sample head/branch office setup, as AWS resources

I have scheduled multiple daily backups for the file system, as you might expect, but I’ve also gone a step further and enabled shadow copies on my Amazon FSx file system. Remember, Amazon FSx for Windows File Server is a Windows file server; it just happens to be running in the cloud. You can find details of how to set up shadow copies (which are not enabled by default) in the documentation here. For the purposes of the fictional scenario in this blog post, I set up a schedule so that my shadow copies are taken every hour.

Back to my fictional agency. One of my photographers on the west coast, Alice, is logged in and working with a set of images that have already had some work done on them by the head office. In this image, it’s apparent that Alice is connected and working on her images via the network share IP marked in an earlier image in this post; this is the gateway file share.

Suddenly, disaster strikes and Alice accidentally deletes all of the files in the folder she was working in. Picking up the phone, she calls the admin (me) in the east coast head office and explains the situation, wondering if we can get the files back.

Since I’d set up scheduled daily backups of the file system, I could probably restore the deleted files from there. This would involve a restore to a new file system, then copying the files from that new file system to the existing one (and deleting the new file system instance afterwards). But, having enabled shadow copies, in this case I can restore the deleted files without resorting to the backups. And, because I enabled automated cache refreshes on my gateway, with the refresh period set to every 5 minutes, Alice will see the restored files relatively quickly.
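As an aside, the 5-minute refresh period corresponds to a cache TTL of 300 seconds on the file system association. Here’s a minimal sketch of setting it with boto3 (the association ARN is a placeholder of my own):

import boto3

sgw = boto3.client("storagegateway", region_name="us-east-1")

# Set the automated cache refresh TTL to 5 minutes (300 seconds).
# The file system association ARN is a placeholder.
sgw.update_file_system_association(
    FileSystemAssociationARN=(
        "arn:aws:storagegateway:us-east-1:123456789012:"
        "fs-association/fsa-0123456789abcdef0"
    ),
    CacheAttributes={"CacheStaleTimeoutInSeconds": 300},
)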

My admin machine (in the east coast office) has a network share to the Amazon FSx file system, so I open an Explorer view onto the share, right-click the folder in question, and select Restore previous versions. This gives me a dialog where I can select the most recent shadow copy.

Restoring the file data from shadow copies

I ask Alice to wait five minutes and then refresh her Explorer view. The changes in the Amazon FSx file system are propagated to the cache on the gateway, and sure enough, she sees the files she accidentally deleted and can resume work. (When I saw this happen for real in my test setup, even though I was expecting it, I let out a whoop of delight!) Overall, I hope you can see how easy it is to set up and operate an Amazon FSx File Gateway with an Amazon FSx for Windows File Server.

Get Started Today with Amazon FSx File Gateway
Amazon FSx File Gateway provides a low-latency, efficient connection for remote users when moving on-premises Windows file systems into the cloud. This benefits users who experience higher latencies, and shared or limited bandwidth, between their premises and the cloud. Amazon FSx File Gateway is available today in all commercial AWS Regions where Amazon FSx for Windows File Server is available, as well as in the AWS GovCloud (US-West), AWS GovCloud (US-East), China (Beijing), and China (Ningxia) Regions.

You can learn more on this feature page, and get started right away using the feature documentation.

AQUA (Advanced Query Accelerator) – A Speed Boost for Your Amazon Redshift Queries

Amazon Redshift already provides up to 3x better price-performance at any scale than any other cloud data warehouse. We do this by designing our own hardware and by using Machine Learning (ML).

For example, we launched the SSD-based RA3 nodes for Amazon Redshift at the end of 2019 (Amazon Redshift Update – Next-Generation Compute Instances and Managed, Analytics-Optimized Storage) and added additional node sizes last April (Amazon Redshift update – ra3.4xlarge Nodes), and last December (Amazon Redshift Launches RA3.xlplus Nodes With Managed Storage). In addition to high-bandwidth networking, RA3 nodes incorporate a sophisticated data management model. As I said when we launched the RA3 nodes:

There’s a cache of large-capacity, high-performance SSD-based storage on each instance, backed by S3, for scale, performance, and durability. The storage system uses multiple cues, including data block temperature, data block age, and workload patterns, to manage the cache for high performance. Data is automatically placed into the appropriate tier, and you need not do anything special to benefit from the caching or the other optimizations.

Our customers use RA3 nodes to maintain very large data sets and are seeing great results. From digital interactive entertainment to tracking impressions and performance for media buys, Amazon Redshift and RA3 nodes help our customers to store and query data at world scale, with up to 32 PB of data in a single data warehouse.

On the downside, it turns out that advances in storage performance have outpaced those in CPU performance, even as data warehouses continue to grow. The combination of large amounts of data (often accessed by queries that mandate a full scan) and limits on network traffic can result in a situation where network and CPU bandwidth become the limiting factors.

We can do something about that…

Introducing AQUA
Today we are making the ra3.4xlarge and ra3.16xlarge nodes even more powerful with the addition of AQUA (Advanced Query Accelerator). Building on the caches that I told you about earlier, and taking advantage of the AWS Nitro System and custom FPGA-based acceleration, AQUA pushes the computation needed to handle reduction and aggregation queries closer to the data. This reduces network traffic, offloads work from the CPUs in the RA3 nodes, and allows AQUA to improve the performance of those queries by up to 10x, at no extra cost and without any code changes. AQUA also makes use of a fast, high-bandwidth connection to Amazon Simple Storage Service (Amazon S3).

You can watch this video to learn a lot more about how AQUA uses the custom-designed hardware in the AQUA nodes to accelerate queries. The benefit comes about in several different ways. Each node performs the reduction and aggregation operations in parallel with the others. In addition to getting the n-fold speedup due to parallelism, the amount of data that must be sent to and processed on the compute nodes is generally far smaller (often just 5% of the original). Here’s a diagram that shows how all of the elements come together to accelerate queries:

If you are already using ra3.4xl or ra3.16xl nodes to host your data warehouse, you can start using AQUA in minutes. You simply enable AQUA for your clusters, restart them, and benefit from vastly improved performance for your reduction and aggregation queries. If you are ready to move into the future with RA3 and AQUA, you can create a new RA3-based cluster from a snapshot of your existing one, or you can use Classic resize to do an in-place upgrade.
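If you prefer the AWS SDK to the console, here’s a minimal sketch of the enable-and-restart flow in Python (boto3); the cluster identifier is a placeholder of my own:

import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Turn on AQUA for an existing RA3-based cluster (placeholder identifier).
redshift.modify_aqua_configuration(
    ClusterIdentifier="my-ra3-cluster",
    AquaConfigurationStatus="enabled",
)

# The new setting takes effect once the cluster restarts.
redshift.reboot_cluster(ClusterIdentifier="my-ra3-cluster")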

Using AQUA
I don’t happen to have a data warehouse of my own, so I used a snapshot provided by the Redshift team to create a pair of clusters. The first one (prod-cluster) does not have AQUA enabled, and the second one (test-cluster) does:
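If you want to script the creation of a similar pair of clusters, here’s a minimal sketch that restores both from one snapshot, with AQUA disabled on the first and enabled on the second (the snapshot identifier is a placeholder of my own):

import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Restore two clusters from the same snapshot: one without AQUA, one with.
for name, aqua in [("prod-cluster", "disabled"), ("test-cluster", "enabled")]:
    redshift.restore_from_cluster_snapshot(
        ClusterIdentifier=name,
        SnapshotIdentifier="redshift-demo-snapshot",  # placeholder
        NodeType="ra3.4xlarge",
        AquaConfigurationStatus=aqua,
    )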

To create the AQUA-enabled cluster, I simply choose the Turn on option on the Cluster configuration page:

My queries will use the lineitem table, which has over 18 billion rows:

I create a session on each cluster and disable the Redshift result cache:
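For reference, here’s a minimal sketch of that session setup in Python with psycopg2 (all connection details are placeholders); the SET statement disables the Redshift result cache so that repeated runs measure actual query execution time:

import psycopg2

# Connection details are placeholders for illustration.
conn = psycopg2.connect(
    host="test-cluster.abc123xyz789.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="awsuser",
    password="placeholder-password",
)
cur = conn.cursor()

# Disable the result cache so each query is actually executed.
cur.execute("set enable_result_cache_for_session to off;")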

And then I run the same query on both clusters:

select sum(l_orderkey), count(*) from lineitem where
l_comment similar to 'slyly %' or
l_comment similar to 'plant %' or
l_comment similar to 'fina %' or
l_comment similar to 'quick %' or
l_comment similar to 'quickly %' or
l_comment similar to ' %about%' or
l_comment similar to ' final%' or
l_comment similar to ' %final%' or
l_comment similar to ' breach%' or
l_comment similar to ' egular%' or
l_comment similar to ' %closely%' or
l_comment similar to ' closely%' or
l_comment similar to ' %idea%' or
l_comment similar to ' idea%' ; 

If you take a look at the diagram above (and perhaps watch the video), you can see why AQUA can handle queries of this type very efficiently. Instead of sequentially scanning all 18 billion or so rows on the compute nodes, AQUA distributes the collection of SIMILAR TO expressions to multiple AQUA nodes, where they are run in parallel.

The query on the cluster that has AQUA enabled finishes in less than a minute:

The query on the cluster that does not have AQUA enabled finishes in a little under 4 minutes:

As is always the case with databases, complex data, and equally complex queries, your mileage will vary. For example, you could imagine a query that did a complex JOIN of rows SELECTed from multiple tables, where each SELECT would benefit from AQUA, and the overall speedup could be even greater. As you can see from the simple query that I used for this post, AQUA can dramatically reduce query time, and it may even enable new types of near-real-time queries that were simply not possible or practical in the past.

Things to Know
Here are a few interesting facts about AQUA:

Cluster Version – Your clusters must be running Redshift version 1.0.24421 or later to make use of AQUA. To learn more about how to enable and disable AQUA, read Managing an AQUA Cluster.

Relevant Queries – AQUA is designed to deliver up to 10x performance on queries that perform large scans, aggregates, and filtering with LIKE and SIMILAR TO predicates. Over time we expect to add support for additional types of queries.

Security – All data cached by AQUA is encrypted using your keys. After performing a filtering or aggregation operation, AQUA compresses the results, encrypts them, and returns them to Redshift.

Regions – AQUA is available today in the US East (N. Virginia), US West (Oregon), US East (Ohio), Europe (Ireland), and Asia Pacific (Tokyo) Regions, and will be coming to Europe (Frankfurt), Asia Pacific (Sydney), and Asia Pacific (Singapore) in the first half of 2021.

Pricing – As I mentioned earlier, there’s no additional charge for AQUA.

Try AQUA Today
If you are using ra3.4xlarge or ra3.16xlarge nodes to power your Redshift cluster, you can enable AQUA, restart the cluster, and run some test queries within minutes. Take AQUA for a spin and let me know what you think!

Jeff;