The following excerpt is taken from the transcript of a presentation by Seagate at Launchpad 2020.
Speaker: Roger Hall, Principal Engineer, Seagate
Using AI/ML to monitor clean room conditions and production
Artificial intelligence applies machine learning, analytics, inference, and other techniques to solve real problems. The two examples here, particle detection and image anomaly detection, show how edge analytics can be adopted during the manufacturing process. A common problem in clean rooms is spikes in particle counts from particle detectors. With this application, we can provide context for particle events by monitoring the area around the machine and detecting when foreign objects, such as gloves, enter areas where they should not be.
Image anomaly detection at Seagate has historically been done by operators in clean rooms, viewing each image one at a time. Models of various anomalies, built through machine learning, can instead be used to run comparative analyses in a production environment, where outliers are detected through inference in an automated, real-time analytics scenario. Anomaly detection is also frequently used in machine learning to find patterns or unusual events in our data.
The real question is: how do you know what you don't know? The first step in anomaly detection is to use an algorithm to find patterns or relationships in your data. In this case, we are looking at hundreds of variables and finding relationships between them. We can then look at a subset of variables and determine how they behave in relation to each other. We use this baseline to define normal behavior and generate a model of it. In this case, we are building a model with three variables. We can then run this model against new data.
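The baseline-and-model approach described here can be sketched in a few lines. The following is an illustrative example, not Seagate's actual implementation: it fits a model of "normal behavior" (mean and covariance) on baseline data for three variables, then flags new observations whose Mahalanobis distance from that baseline is extreme.

```python
import numpy as np

def fit_baseline(X):
    """Model of normal behavior: mean and inverse covariance of the baseline."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    return mu, np.linalg.inv(cov)

def anomaly_scores(X, mu, cov_inv):
    """Squared Mahalanobis distance of each observation from the baseline."""
    diff = X - mu
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

# Baseline: three variables with learned relationships between them
rng = np.random.default_rng(0)
baseline = rng.multivariate_normal(
    [0, 0, 0],
    [[1.0, 0.8, 0.3], [0.8, 1.0, 0.2], [0.3, 0.2, 1.0]],
    size=500,
)
mu, cov_inv = fit_baseline(baseline)

# Run the model against new data: the second point breaks the relationships
new = np.array([[0.1, 0.2, -0.1], [4.0, -4.0, 3.0]])
scores = anomaly_scores(new, mu, cov_inv)
threshold = np.quantile(anomaly_scores(baseline, mu, cov_inv), 0.99)
flags = scores > threshold  # observations that do not fit the model
```

A real deployment would use far more variables and a trained model rather than a simple covariance fit, but the structure is the same: learn normal behavior, then score new observations against it.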
Observations that do not fit the model are defined as anomalies, and anomalies can be good or bad. It takes a subject matter expert to determine how to classify them; a classification could be "scrap" or "okay to use," for example. The subject matter expert is helping the machine learn the rules. We then update the model with the classified anomalies and start running again.
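This expert-in-the-loop step can be sketched as follows; the function and labels here are hypothetical, chosen only to mirror the "scrap" / "okay to use" example: anomalies the expert clears are folded back into the baseline before the model is refit.

```python
def update_baseline(baseline, anomalies, sme_labels):
    """Return a new baseline that includes anomalies the expert cleared.

    baseline   -- list of observations currently defining normal behavior
    anomalies  -- observations the model flagged
    sme_labels -- expert's classification for each anomaly
    """
    cleared = [obs for obs, label in zip(anomalies, sme_labels)
               if label == "okay to use"]
    return baseline + cleared

# Toy data: two-variable observations
baseline = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9)]
anomalies = [(3.0, 0.1), (1.2, 1.2)]
labels = ["scrap", "okay to use"]  # assigned by the subject matter expert

new_baseline = update_baseline(baseline, anomalies, labels)
# The cleared observation now counts as normal; the "scrap" one does not.
```

Refitting the model on `new_baseline` completes one iteration of the loop: detect, classify with an expert, update, run again.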
Now, Seagate factories generate hundreds of thousands of images every day. Many of these require a human to look at them and make a decision. This is dull, mistake-prone work that is ideal for artificial intelligence. The initiative I am project managing is intended to offer a solution that matches the continually increasing complexity of the products we manufacture and minimizes the need for manual inspection.
Edge RX smart manufacturing
The Edge RX smart manufacturing reference architecture is the initiative that both Hamid and I are working on. I'm sorry to say Hamid isn't here today. As you may have guessed, our goal is to introduce early defect detection at every stage of our manufacturing process through machine learning and real-time analytics via inference. In doing so, we will improve overall product quality, enjoy higher yields with fewer defects, and produce higher margins.
Because this was entirely new territory, we established partnerships two years ago with HPE, NVIDIA, and Docker (now Mirantis) to develop the capability we have today, as we deploy Edge RX to our operations sites on four continents. On the hardware side, HPE and NVIDIA have been capable partners in helping us develop an architecture that we have standardized on. On the software side, Docker has been instrumental in helping us manage a very complex project with a steep learning curve for all concerned.
To further clarify our efforts to enable more AI and ML in factories: the objective was to arrive at an economical edge compute solution with access to the latest AI and ML technology, using a standardized platform across all factories. This included providing an upgrade path that scales while minimizing disruption to existing factory systems and the burden on factory information systems resources.
The compute solution has two parts, shown in the diagram. The gateway device connects to Seagate's existing factory information systems architecture and performs the inference calculations. The second part is a training device for creating and updating models. All factories will need the gateway device and the compute cluster on site. It remains to be seen whether training devices are needed in other locations, but we do know that one device is capable of supporting multiple factories simultaneously. There are also options for training on cloud-based resources.
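The division of labor described here, with gateways running inference on site and a training device pushing updated models to them, can be sketched as below. All class and function names are illustrative, not Seagate's actual interfaces; a stub stands in for a trained network.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class Gateway:
    """Illustrative edge gateway: runs inference locally, accepts model pushes."""
    model: Callable[[bytes], str]  # current model, supplied by the training device

    def process(self, images: Iterable[bytes]) -> List[str]:
        # Inference happens on site; only decisions flow back to factory systems.
        return [self.model(img) for img in images]

    def update_model(self, new_model: Callable[[bytes], str]) -> None:
        # The training device (on site or cloud-based) pushes a retrained model;
        # one training device can serve gateways at multiple factories this way.
        self.model = new_model

# Stub "model" standing in for a trained defect-detection network
v1 = lambda img: "defect" if len(img) > 4 else "ok"

gw = Gateway(model=v1)
decisions = gw.process([b"abc", b"abcdef"])
```

The point of the split is that the latency-sensitive step (inference) stays next to the factory systems, while the compute-heavy step (training) can be centralized or moved to the cloud.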
“To enable more AI and ML in factories, the objective was to determine an economical edge compute that will access the latest AI and ML technology, using a standardized platform across all factories.”
— Roger Hall, Principal Engineer
The stream-scoring appliance consists of a Kubernetes cluster with GPU and CPU worker nodes, as well as master nodes and Docker Trusted Registry (now Mirantis Secure Registry). The GPU nodes are hardware-based, using HPE Edgeline EL4000 systems to balance our virtual machines. For machine learning, we've standardized on both the HPE Apollo 6500 and the NVIDIA DGX-1, each with eight NVIDIA V100 GPUs. Incidentally, the same technology enables augmented and virtual reality.
Deploying Kubernetes clusters to operations sites worldwide
Hardware is only one part of the equation. Our software stack consists of Docker Enterprise (now Mirantis Kubernetes Engine) and Kubernetes, as I've mentioned previously. We've deployed these clusters at all of our operations sites, with specific use cases planned for each site. Mirantis has had a major impact on our ability to develop this capability by offering a stable platform and Universal Control Plane, which provides us with the necessary metrics to determine the health of the Kubernetes cluster, along with Docker Trusted Registry, which lets us maintain a secure repository for containers.
They have been an exceptional partner in our efforts to deploy clusters at multiple sites. At this point in our deployment efforts we are on-prem, but we are exploring cloud service options, including Mirantis' next-generation offering, Mirantis Container Cloud, which includes StackLight (logging, monitoring, and alerting) in conjunction with multi-cluster management. To me, the concept of federation, of multi-cluster management, is a requirement in our case because of the global nature of our business, with operations sites on four continents. StackLight provides the hooks into each cluster that make multi-cluster management an effective solution.
Choosing an enterprise-grade container platform
Open source has been a major part of Project Athena, and there was a debate about using Docker CE versus Docker Enterprise. That decision was actually easy, given the advantages Docker Enterprise would offer, especially during an early phase of development.
Kubernetes was a natural addition to the software stack and has been widely accepted, but we have also been working to adopt open source tools such as RabbitMQ messaging, TensorFlow, and TensorRT, to name three, plus GitLab for development and a number of others, as you can see here. Most of our programming has been in Python.
The results of our efforts so far have been excellent. We are seeing a six-month return on investment from just one of seven clusters, where the hardware and software cost approached $1 million. The performance of this cluster is now over 3 million images processed per day.
Seagate leverages Mirantis Kubernetes Engine to provide AI/ML-powered smart manufacturing at the edge.
The company wanted to adopt AI/ML-based smart manufacturing across four continents.
Seagate is using Mirantis Kubernetes Engine to deploy economical edge compute at production facilities.
The company is using AI/ML to process more than 3 million images daily, improving product quality.