Implementation of AWS MLOps at a Listed German Retail Group

The client is a wholly owned subsidiary of a German retail group with omni-channel sales structures. As an incubator for technological ideas, the company develops not only products but also technologies and sustainable business models. At the heart of this is the vision of digitizing the industry for the benefit of customers without compromising on quality. Inspired by technological trends, the subsidiary runs its own research, which has produced 14 patents, implements projects independently and also supports start-ups in realizing their ideas.

With over 900 branches in Europe, the parent group operates in the retail trade for optics and medical products. For decades, the group has combined fair prices with high product quality and service. The company is the market leader in Germany and regularly sets standards for new technological developments within the industry.

In This Article You Will Read About:

  • The Challenge:
    • How do you achieve the successful digitization of a highly complex measurement process?
  • The Solution:
    • Establish Machine Learning Operations (MLOps) as a process and use modern cloud infrastructure to improve and accelerate development
  • The Implementation:
    • Automating Docker image builds to ECR
    • Automatic updates to the ECS service
    • Move to a CloudWatch-triggered batch workload
  • The Benefits:
    • Implementation of MLOps
    • Deployment of AWS infrastructure
    • API development
    • Foundation for future projects

The Challenge:

The company conducts its own research and development (R&D) to create innovative technologies and products. Its interdisciplinary team brings together experts in measurement technology, machine learning (ML), front-end and back-end software development, and user experience.

Among other things, our client developed a product that aims to provide an automated online test procedure, independent of location and device, for the individual customization of medical devices. The central aspect of the project is the digitalization of the measurement procedure, which is traditionally performed on site using specialized equipment and trained personnel. PROTOS Technologie supported the company in achieving this goal by implementing a modern cloud infrastructure to improve and accelerate the development process.

Machine Learning Operations (MLOps):

When moving from the experimental development phase to production readiness, it became clear to the customer that machine learning operations (MLOps) in particular would bring great advantages. The central prerequisite for the project was a robust, cost-effective and secure infrastructure that can be continuously adapted to and optimized for new machine learning tasks. Such an infrastructure relieves machine learning engineers of tasks that are often the most time-consuming, such as data preparation and model deployment, and frees their capacity for development work.

Integrating the new AWS cloud-based MLOps functions into the backend required close coordination with the respective development teams. Front-end developers were provided with endpoints for uploading data and for calling machine learning models through well-defined interfaces. In collaboration with machine learning engineers and data scientists, a data environment hosted in AWS and a training environment for ML were built.

A further requirement was the development and maintenance of a scalable CI/CD pipeline that ensures versioning, auditability, and testing, combining classic DevOps practice with the ML-specific requirements of MLOps.

The Solution:

Machine Learning Pipeline on AWS:

PROTOS Technologie supported the customer in implementing MLOps and provided the necessary infrastructure resources and processes. Automation improved, accelerated and secured the development process.

The biggest challenge in machine learning projects is often the enormous amount of time required for data acquisition, data preparation and model training. The biggest advantage of MLOps is that it reduces this workload by automating processes, so that the ML experts can concentrate on optimizing the machine learning project itself.

The large amounts of data involved must be prepared as well as collected and stored. With PROTOS’ expertise, these steps were moved to modern cloud infrastructure, optimized and automated, which contributed substantially to improving the entire machine learning process.

The Implementation in the AWS Cloud:

A data pipeline built on AWS Lambda, Amazon S3, and AWS KMS was implemented in the AWS Cloud. It encrypted and processed the data and supported ongoing data cleaning and data labeling.
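A pipeline step of this kind could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the project's actual code: the bucket name, the KMS key alias, the JSON Lines input format and the cleaning rule (drop records without a "value" field) are all hypothetical. The function is meant to be triggered by an S3 "ObjectCreated" notification and writes the cleaned object to a second bucket with KMS server-side encryption.

```python
import json
import os
from urllib.parse import unquote_plus

# Hypothetical names: replace with the real bucket and KMS key alias.
CLEAN_BUCKET = os.environ.get("CLEAN_BUCKET", "example-clean-data")
KMS_KEY_ID = os.environ.get("KMS_KEY_ID", "alias/example-data-key")


def clean_records(raw: str) -> list:
    """Parse JSON Lines input, skipping blank lines and records without a 'value'."""
    records = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("value") is None:
            continue
        records.append(record)
    return records


def handler(event, context, s3=None):
    """Entry point for S3 'ObjectCreated' notifications.

    The optional s3 argument exists only so that tests can pass in a stub client.
    """
    if s3 is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        s3 = boto3.client("s3")

    written = []
    for item in event["Records"]:
        bucket = item["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(item["s3"]["object"]["key"])
        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        body = "\n".join(json.dumps(r) for r in clean_records(raw))
        target_key = f"clean/{key}"
        s3.put_object(
            Bucket=CLEAN_BUCKET,
            Key=target_key,
            Body=body.encode("utf-8"),
            ServerSideEncryption="aws:kms",  # encrypt at rest with the KMS key
            SSEKMSKeyId=KMS_KEY_ID,
        )
        written.append(target_key)
    return {"written": written}
```

Passing the S3 client as an optional argument keeps the cleaning logic testable without AWS credentials, which fits a CI/CD process built around unit tests.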

With data and compute capacity available on AWS, the machine learning model training could be containerized and moved to the AWS cloud. The advantage was that training could run on the cloud’s scalable and optimized CPUs and GPUs, regardless of location or user. In addition, the trained models could be hosted in production on serverless AWS Lambda in a lightweight, cost-effective, highly available and scalable manner. Because AWS Lambda supports container images as a runtime, even complex machine learning models can be deployed with a modern API-first approach.

To enable interaction with the data pipeline and the ML model results, a secure Amazon API Gateway endpoint was provided to mobile and front-end developers.
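As a rough illustration of how a Lambda function behind such an endpoint might serve predictions, the sketch below handles an API Gateway Lambda proxy request. The request shape (a JSON body with a "features" list) and the placeholder model that simply averages its inputs are assumptions made for this example; the real model and payload are not described in this article.

```python
import base64
import json


def load_model():
    """Placeholder for loading the trained model shipped in the container image."""
    # Assumption: a stand-in that averages the inputs; replace with the real model.
    return lambda features: sum(features) / len(features)


MODEL = load_model()  # loaded once per container and reused across invocations


def _response(status: int, payload: dict) -> dict:
    """Build a response in the format expected by the Lambda proxy integration."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handler(event, context):
    """Handle an API Gateway Lambda proxy request carrying a JSON body."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    try:
        features = json.loads(body)["features"]
        if not features or not all(isinstance(x, (int, float)) for x in features):
            raise ValueError("features must be a non-empty list of numbers")
    except (KeyError, TypeError, ValueError) as exc:
        # Malformed JSON, a missing field or a wrong type all end up here.
        return _response(400, {"error": f"invalid request: {exc}"})
    return _response(200, {"prediction": MODEL(features)})
```

Loading the model at module level rather than inside the handler means the cost is paid once per cold start instead of on every request.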

High security standards and extensive testing (unit tests / integration tests) formed the basis of the implemented CI/CD process, which automates the deployment and testing of the infrastructure running on AWS.

All resources were deployed as infrastructure as code using the AWS Cloud Development Kit (CDK). As a result, the complete infrastructure for running an MLOps-optimized machine learning project could be provisioned with one click, remained easy to understand and customize, and could be operated further by the customer itself.
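A minimal CDK sketch in Python of what such a stack might contain is shown below. It assumes CDK v2 (the aws-cdk-lib and constructs packages) and a local directory named "inference" containing a Dockerfile for the model container. The construct names, memory size and timeout are illustrative choices, not the project's actual configuration.

```python
from aws_cdk import App, Duration, Stack
from aws_cdk import aws_apigateway as apigw
from aws_cdk import aws_kms as kms
from aws_cdk import aws_lambda as lambda_
from aws_cdk import aws_s3 as s3
from constructs import Construct


class MlopsStack(Stack):
    """Illustrative stack: encrypted data bucket, container Lambda, REST API."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Customer-managed KMS key used to encrypt the data at rest.
        data_key = kms.Key(self, "DataKey", enable_key_rotation=True)

        data_bucket = s3.Bucket(
            self,
            "DataBucket",
            encryption=s3.BucketEncryption.KMS,
            encryption_key=data_key,
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
            enforce_ssl=True,
        )

        # Assumption: ./inference holds the Dockerfile for the model container.
        inference_fn = lambda_.DockerImageFunction(
            self,
            "InferenceFunction",
            code=lambda_.DockerImageCode.from_image_asset("inference"),
            memory_size=2048,
            timeout=Duration.seconds(30),
        )
        data_bucket.grant_read(inference_fn)

        # REST API that proxies every request to the inference function.
        apigw.LambdaRestApi(self, "InferenceApi", handler=inference_fn)


app = App()
MlopsStack(app, "MlopsStack")
app.synth()
```

With a stack like this in version control, running `cdk deploy` provisions the whole environment, and every infrastructure change goes through the same review and CI/CD steps as application code.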

The Benefits:

Introduction of MLOps

Data management, training and deployment consume a large share of the time in machine learning projects. MLOps defines the workflows, infrastructure and automation processes necessary to optimize ML development.

Deployment of an AWS Infrastructure

Cloud infrastructure enables scalable and independent operation of a machine learning pipeline and data backup, as well as accelerated training on AWS compute instances.

API Development

Developing and deploying API interfaces using API Gateway allows secure communication with users and enables more functionality to run independently and scalably in the AWS backend.

Foundation for Future Projects

The infrastructure developed via Infrastructure-as-Code and the MLOps processes are the cornerstone of further machine learning projects. With the experience gained, future projects can be developed and integrated more easily and quickly at Fielmann.