Category: <span>Amazon Machine Learning</span>

New – Ready-to-use Models and Support for Custom Text and Image Classification Models in Amazon SageMaker Canvas

Today AWS announces new features in Amazon SageMaker Canvas that help business analysts generate insights from thousands of documents, images, and lines of text in minutes with machine learning (ML). Starting today, you can access ready-to-use models and create custom text and image classification models alongside previously supported custom models for tabular data, all without requiring ML experience or writing a line of code.

Business analysts across different industries want to apply AI/ML solutions to generate insights from a variety of data and respond to ad-hoc analysis requests coming from business stakeholders. By applying AI/ML in their workflows, analysts can automate manual, time-consuming, and error-prone processes, such as inspection, classification, as well as extraction of insights from raw data, images, or documents. However, applying AI/ML to business problems requires technical expertise and building custom models can take several weeks or even months.

Launched in 2021, Amazon SageMaker Canvas is a visual, point-and-click service that allows business analysts to use a variety of ready-to-use models or create custom models to generate accurate ML predictions on their own.

Ready-to-use Models
Customers can use SageMaker Canvas to access ready-to-use models that can be used to extract information and generate predictions from thousands of documents, images, and lines of text in minutes. These ready-to-use models include sentiment analysis, language detection, entity extraction, personal information detection, object and text detection in images, expense analysis for invoices and receipts, identity document analysis, and more generalized document and form analysis.

For example, you can select the sentiment analysis ready-to-use model and upload product reviews from social media and customer support tickets to quickly understand how your customers feel about your products. Using the personal information detection ready-to-use model, you can detect and redact personally identifiable information (PII) from emails, support tickets, and documents. Using the expense analysis ready-to-use model, you can easily detect and extract data from your scanned invoices and receipts and generate insights about that data.

These ready-to-use models are powered by AWS AI services, including Amazon Rekognition, Amazon Comprehend, and Amazon Textract.

Ready-to-use models available

Custom Text and Image Classification Models
Customers that need custom models trained for their business-specific use-case can use SageMaker Canvas to create text and image classification models. 

You can use SageMaker Canvas to create custom text classification models to classify data according to your needs. For example, imagine that you work as a business analyst at a company that provides customer support. When a customer support agent engages with a customer, they create a ticket, and they need to record the ticket type, for example, “incident”, “service request”, or “problem”. Many times, this field gets forgotten, and so, when the reporting is done, the data is hard to analyze. Now, using SageMaker Canvas, you can create a custom text classification model, train it with existing customer support ticket information and ticket type, and use it to predict the type of tickets in the future when working on a report with missing data.

You can also use SageMaker Canvas to create custom image classification models using your own image datasets. For instance, imagine you work as a business analyst at a company that manufactures smartphones. As part of your role, you need to prepare reports and respond to questions from business stakeholders related to quality assessment and it’s trends. Every time a phone is assembled, a picture is automatically taken, and at the end of the week, you receive all those images. Now with SageMaker Canvas, you can create a new custom image classification model that is trained to identify common manufacturing defects. Then, every week, you can use the model to analyze the images and predict the quality of the phones produced.

SageMaker Canvas in Action
Let’s imagine that you are a business analyst for an e-commerce company. You have been tasked with understanding the customer sentiment towards all the new products for this season. Your stakeholders require a report that aggregates the results by item category to decide what inventory they should purchase in the following months. For example, they want to know if the new furniture products have received positive sentiment. You have been provided with a spreadsheet containing reviews for the new products, as well as an outdated file that categorizes all the products on your e-commerce platform. However, this file does not yet include the new products.

To solve this problem, you can use SageMaker Canvas. First, you will need to use the sentiment analysis ready-to-use model to understand the sentiment for each review, classifying them as positive, negative, or neutral. Then, you will need to create a custom text classification model that predicts the categories for the new products based on the existing ones.

Ready-to-use Model – Sentiment Analysis
To quickly learn the sentiment of each review, you can do a bulk update of the product reviews and generate a file with all the sentiment predictions.

To get started, locate Sentiment analysis on the Ready-to-use models page, and under Batch prediction, select Import new dataset.

Using ready-to-use sentiment analysis with a batch dataset

When you create a new dataset, you can upload the dataset from your local machine or use Amazon Simple Storage Service (Amazon S3). For this demo, you will upload the file locally. You can find all the product reviews used in this example in the Amazon Customer Reviews dataset.

After you complete uploading the file and creating the dataset, you can Generate predictions.

Select dataset and generate predictions

The prediction generation takes less than a minute, depending on the size of the dataset, and then you can view or download the results.

View or download predictions

The results from this prediction can be downloaded as a .csv file or viewed from the SageMaker Canvas interface. You can see the sentiment for each of the product reviews.

Preview results from ready-to-use model

Now you have the first part of your task ready—you have a .csv file with the sentiment of each review. The next step is to classify those products into categories.

Custom Text Classification Model
To classify the new products into categories based on the product title, you need to train a new text classification model in SageMaker Canvas.

In SageMaker Canvas, create a New model of the type Text analysis.

The first step when creating the model is to select a dataset with which to train the model. You will train this model with a dataset from last season, which contains all the products except for the new collection.

Once the dataset has finished importing, you will need to select the column that contains the data you want to predict, which in this case is the product_category column, and the column that will be used as the input for the model to make predictions, which is the product_title column.

After you finish configuring that, you can start to build the model. There are two modes of building:

  • Quick build that returns a model in 15–30 minutes.
  • Standard build takes 2–5 hours to complete.

To learn more about the differences between the modes of building you can check the documentation. For this demo, pick quick build, as our dataset is smaller than 50,000 rows.

Prepare and build your model

When the model is built, you can analyze how the model performs. SageMaker Canvas uses the 80-20 approach; it trains the model with 80 percent of the data from the dataset and uses 20 percent of the data to validate the model.

Model score

When the model finishes building, you can check the model score. The scoring section gives you a visual sense of how accurate the predictions were for each category. You can learn more about how to evaluate your model’s performance in the documentation.

After you make sure that your model has a high prediction rate, you can move on to generate predictions. This step is similar to the ready-to-use models for sentiment analysis. You can make a prediction on a single product or on a set of products. For a batch prediction, you need to select a dataset and let the model generate the predictions. For this example, you will select the same dataset that you selected in the ready-to-use model, the one with the reviews. This can take a few minutes, depending on the number of products in the dataset.

When the predictions are ready, you can download the results as a .csv file or view how each product was classified. In the prediction results, each product is assigned only one category based on the categories provided during the model-building process.

Predict categories

Now you have all the necessary resources to conduct an analysis and evaluate the performance of each product category with the new collection based on customer reviews. Using SageMaker Canvas, you were able to access a ready-to-use model and create a custom text classification model without having to write a single line of code.

Available Now
Ready-to-use models and support for custom text and image classification models in SageMaker Canvas are available in all AWS Regions where SageMaker Canvas is available. You can learn more about the new features and how they are priced by visiting the SageMaker Canvas product detail page.

— Marcia

AWS Machine Learning University New Educator Enablement Program to Build Diverse Talent for ML/AI Jobs

AWS Machine Learning University is now providing a free educator enablement program. This program provides faculty at community colleges, minority-serving institutions (MSIs), and historically Black colleges and universities (HBCUs) with the skills and resources to teach data analytics, artificial intelligence (AI), and machine learning (ML) concepts to build a diverse pipeline for in-demand jobs of today and tomorrow.

According to the National Science Foundation, Black and Hispanic or Latino students earn bachelor’s degrees in Computer Science—the dominant pathway to AI/ML—at a much lower rate than their white peers, earning less than 11 percent of computer science degrees awarded. However, research shows that having diverse perspectives among skilled practitioners and across the AI/ML lifecycle contributes to the development of AI/ML systems that are safe, trustworthy, and have less bias. 

In 2018, we announced the Machine Learning University (MLU) to share with all developers the same courses that we used to train engineers at Amazon and AWS. This platform offers self-service, self-paced, AI/ML digital courses.

Machine Learning University home page

And today, we add this new program to our AI/ML training offering. Although anyone could access the MLU self-paced learning, it places the burden on the learner to source prerequisite work and solutions. This educator enablement program takes the concepts and lessons developed by MLU and makes them more accessible to educators. It offers a year-round educator enablement program with lesson planning, course playbooks, and access to free compute resources.

Program Details
Educators are onboarded in small-group cohorts into bootcamps where they will learn the material and deep dive into how to teach it via instructor-led lectures and hands-on projects. Educators who complete the bootcamp can take part in different year-round development opportunities, such as a dedicated Slack channel to share teaching best practices, education topic series and virtual study sessions moderated by MLU instructors, and regional events for continued professional development. Also, they will receive continuing education credits and AWS-provided stipends.

Faculty and students get access to instructional material through Amazon SageMaker Studio Lab. SageMaker Studio Lab was announced last year and is AWS’s free (no credit card required) ML development environment. It provides computing and storage for anybody that wants to learn and experiment with ML. Institutions can unlock additional resources to support their ML programs by registering for AWS Academy. AWS Academy unlocks all the AWS services for a complete AI/ML program.

Community colleges and universities can integrate this educator enablement program into their computer science, information technology, and business curricula to create an AI/ML course, certificate, or degree. We have worked with educators and education boards such as Houston Community College to create content that is vetted for credit-worthy and degree-earning curricula.

In August 2022, we launched our first educator bootcamp in partnership with The Coding School. The bootcamp was delivered over two weeks, offering lectures, case studies, and hands-on projects. 25 educators completed the Educator Machine Learning Bootcamp, representing 22 US community colleges and universities.

Learn More and Join The Program
During 2023, AWS Machine Learning University will run six educator-enablement cohorts starting in January. The program will give priority consideration to educators at community colleges, MSIs, and HBCUs, in alignment with this program mission to increase access to AI/ML technology to historically underserved and underrepresented students.

If you are a computer science educator or part of a board of educators interested in fostering more depth in your computer science coursework, you should sign up for the educator enablement program.