| Problem type | Example | Another example |
|---|---|---|
| regression | find the housing price based on location | predict the temperature |
| classification | detect if a person is healthy or unhealthy | find the species of an animal |
When we deal with a lot of data in regression or classification, another term plays an important role: factorization machines. Movie recommendation, as on Netflix, is a good example of a use case. A factorization machine is a general supervised learning algorithm that can be used for both classification and regression tasks. It extends a linear model and is designed to capture interactions between features economically in high-dimensional, sparse datasets. Sparse means that most entries of the database or matrix are empty: in a rating database, for example, each user has rated only a few of the thousands of movies available on Netflix, so most entries are null.
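To make the idea of "interactions between features" more concrete, here is a minimal sketch of how a second-order factorization machine could score a single sparse example. It uses the standard formulation (bias, linear weights, and pairwise interactions modelled through latent vectors). The function name `fm_predict` and the toy weights are assumptions for illustration only, not the SageMaker implementation.

```python
def fm_predict(x, w0, w, V):
    """Score one sparse example with a second-order factorization machine.

    x  : dict mapping feature index -> non-zero value (sparse input)
    w0 : global bias
    w  : list of per-feature linear weights
    V  : list of per-feature latent vectors, all of the same length k
    """
    # Linear part: only the non-zero features contribute.
    linear = sum(w[i] * xi for i, xi in x.items())

    # Pairwise part, computed per latent dimension as
    # 0.5 * ((sum of terms)^2 - sum of squared terms),
    # which equals the sum over all pairs i < j of <V[i], V[j]> * x_i * x_j.
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * xi for i, xi in x.items())
        s_sq = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)

    return w0 + linear + pairwise


# Hypothetical toy data: features 0-1 are users, 2-4 are movies (one-hot encoded).
w0 = 0.1
w = [0.2, -0.1, 0.3, 0.0, 0.5]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]
x = {0: 1.0, 3: 1.0}  # "user 0 rates movie 3"; every other entry is zero
score = fm_predict(x, w0, w, V)
```

Because only the non-zero entries of `x` are visited, the cost grows with the number of filled-in features rather than with the full width of the rating matrix, which is the reason this model suits sparse data.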
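Simply put, unsupervised problems arise when the data carries no labels and the computer has to find structure on its own, e.g. the group a data point belongs to. A good example is the K-Means algorithm, which assigns each data point to the cluster with the nearest centre. It is easily confused with the K-Nearest Neighbour algorithm, in which a class assignment is made by looking at the nearest labelled neighbours; because it needs labels, K-Nearest Neighbour is actually a supervised method. Another good example of unsupervised learning is finding anomalies in pictures.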
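Finding the group a data point belongs to without any labels can be sketched with K-Means, a standard clustering algorithm that I am swapping in here as the illustration. This is a simplified version under stated assumptions: 2-D points, a fixed number of iterations, and a naive initialisation that takes the first `k` points as starting centres. The function name `kmeans` and the sample points are made up for this example.

```python
def kmeans(points, k, iterations=10):
    """Cluster 2-D points into k groups; returns (centres, assignments)."""
    centres = [points[i] for i in range(k)]  # naive init: first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        for idx, (px, py) in enumerate(points):
            dists = [(px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centres]
            assignments[idx] = dists.index(min(dists))
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centres[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centres, assignments


# Hypothetical sample: two visibly separate groups of points.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
centres, labels = kmeans(points, k=2)
```

Note that nobody told the algorithm which point belongs to which group; the grouping comes purely from the distances between the points.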
Pictures, however, are an interesting type of data structure. An image is a collection of many pixels, each of which consists of several colour values, e.g. RGB. To process this amount of data, we must compress it somehow in order to find features (e.g. a face) in a picture. Convolutional Neural Networks (CNNs) are often used for this: their convolution and pooling layers summarise a neighbourhood of many pixels into a single value.
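The "many pixels into one" step can be illustrated with 2x2 max pooling, one of the building blocks commonly found in CNNs. This is only a sketch on a tiny single-channel image given as nested lists; a real network would combine it with learned convolution filters. The function name `max_pool_2x2` and the sample image are assumptions for illustration.

```python
def max_pool_2x2(image):
    """Downsample a 2-D grid by keeping the maximum of each 2x2 block.

    A trailing odd row or column is dropped.
    """
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled.append([
            max(image[r][c], image[r][c + 1],
                image[r + 1][c], image[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ])
    return pooled


# Hypothetical 4x4 grayscale image.
image = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool_2x2(image)  # 16 pixels are reduced to 4 values
```

Each output value stands for a whole block of input pixels, so the picture shrinks while its strongest signals survive.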
AWS High-Level Services
Having read my Machine Learning 101, we can now have a look at some awesome AWS products and when it is a good idea to use them.
If you want to ship an application as fast as possible, I recommend using one of the high-level services AWS offers:
Amazon Forecast
Amazon Forecast is a fully managed service for time series forecasting. If you provide Amazon Forecast with historical time series data, it can predict future points in the series. Time series forecasts are useful in domains such as retail, financial planning, supply chain, and healthcare. You can also use Amazon Forecast to forecast operational metrics for inventory management as well as for human resources and resource planning.
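Once a forecast has been trained, reading predictions back could look roughly like the sketch below. It assumes a client that behaves like `boto3.client("forecastquery")` and a forecast created with the default quantiles, so that a `p50` (median) series is present in the response. The helper name `median_forecast`, the ARN, and the item id are hypothetical.

```python
def median_forecast(client, forecast_arn, item_id):
    """Return (timestamp, value) pairs of the p50 forecast for one item.

    `client` is expected to behave like boto3.client("forecastquery").
    """
    response = client.query_forecast(
        ForecastArn=forecast_arn,
        Filters={"item_id": item_id},
    )
    predictions = response["Forecast"]["Predictions"]
    # Assumption: the forecast was created with a 0.5 quantile ("p50").
    return [(p["Timestamp"], p["Value"]) for p in predictions.get("p50", [])]


# Usage with real credentials (not executed here):
#   import boto3
#   client = boto3.client("forecastquery")
#   print(median_forecast(client, "<your-forecast-arn>", "<your-item-id>"))
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before any AWS account is involved.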
Amazon Rekognition
With Amazon Rekognition you can easily build an image classification application without knowing much about the underlying theory. You can simply use its high-level API. With Amazon Rekognition, it is possible to identify objects, people, text, scenes, and activities in images and videos. You can also identify inappropriate content. Amazon Rekognition also provides highly accurate facial analysis.
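To give an impression of how little code the high-level API needs, here is a minimal sketch of labelling a single image. It assumes a client that behaves like `boto3.client("rekognition")` and uses its `detect_labels` call. The helper name `label_image`, the thresholds, and the file name in the usage comment are assumptions for illustration.

```python
def label_image(client, image_bytes, min_confidence=80.0, max_labels=10):
    """Return (label name, confidence) pairs detected in one image.

    `client` is expected to behave like boto3.client("rekognition").
    """
    response = client.detect_labels(
        Image={"Bytes": image_bytes},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]


# Usage with real credentials (not executed here):
#   import boto3
#   client = boto3.client("rekognition")
#   with open("photo.jpg", "rb") as f:   # hypothetical file name
#       print(label_image(client, f.read()))
```

No model training or CNN knowledge is involved on your side; the service returns the labels together with a confidence score for each one.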
Amazon Comprehend
Amazon Comprehend analyses unstructured text data using NLP (Natural Language Processing). The analysis tools of this cloud-based service extract key phrases, recognize the sentiment of a text, and filter out names or places by means of entity recognition. Text processing software often needs sentiment analysis; with Amazon Comprehend you can easily find out whether a comment is positive, negative, neutral, or mixed.
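Checking the sentiment of a comment could be sketched as follows. The helper assumes a client that behaves like `boto3.client("comprehend")` and uses its `detect_sentiment` call; the helper name `comment_sentiment` and the sample text in the usage comment are made up for this example.

```python
def comment_sentiment(client, text, language_code="en"):
    """Return (overall sentiment, per-class scores) for one piece of text.

    `client` is expected to behave like boto3.client("comprehend").
    """
    response = client.detect_sentiment(Text=text, LanguageCode=language_code)
    return response["Sentiment"], response["SentimentScore"]


# Usage with real credentials (not executed here):
#   import boto3
#   client = boto3.client("comprehend")
#   sentiment, scores = comment_sentiment(client, "I love this product!")
#   print(sentiment)  # one of POSITIVE, NEGATIVE, NEUTRAL, MIXED
```

The per-class scores are handy when you want to apply your own threshold instead of trusting the single overall label.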
Low-Level Services (Amazon SageMaker)
If you need more low-level control and you are a more experienced data scientist, you will love Amazon SageMaker. It enables you to set up Jupyter notebooks, import sample notebooks, run your training, and deploy your models without bothering about the hardware components. You pay for the high-performance machine only for the duration of the training, which saves you money.
Lastly, I want to point out the most interesting and common built-in algorithms in SageMaker. The following table shows common problems that can be solved well with the mentioned SageMaker algorithms:
| SageMaker built-in algorithm | Common problem to solve |
|---|---|
| Factorization Machines Algorithm | building recommendation tools when you have a lot of sparse data |
| K-Means | unsupervised clustering problems |
| K-Nearest Neighbours (k-NN) | supervised classification or regression based on the most similar labelled examples |
| Image Classification Algorithm | image recognition and classification |
| XGBoost Algorithm | general supervised regression or classification problems (a lot of competitions are won with this algorithm) |
| BlazingText | natural language processing |




