The stream-storing appliance consists of a Kubernetes cluster with GPU and CPU worker nodes, master nodes, and Docker Trusted Registry (now Mirantis Secure Registry). The GPU nodes are hardware-based, built on HPE Edgeline EL4000 systems to balance our virtual machines. For machine learning, we've standardized on both the HPE Apollo 6500 and the NVIDIA DGX-1, each with eight NVIDIA V100 GPUs. Incidentally, the same technology enables augmented and virtual reality.
Deploying Kubernetes clusters to operations sites worldwide
Hardware is only one part of the equation. Our software stack consists of Docker Enterprise (now Mirantis Kubernetes Engine) and Kubernetes, as I've mentioned previously. We've deployed these clusters at all of our operations sites, with specific use cases planned for each site. Mirantis has had a major impact on our ability to develop this capability by offering a stable platform and Universal Control Plane, which provides the metrics we need to determine the health of each Kubernetes cluster, along with Docker Trusted Registry, which maintains a secure repository for containers.
They have been an exceptional partner in our efforts to deploy clusters at multiple sites. At this point in our deployment, we are on-premises, but we are exploring cloud service options, including Mirantis' next-generation offering (Mirantis Container Cloud), which combines StackLight (logging, monitoring, and alerting) with multi-cluster management. To me, federation, or multi-cluster management, is a requirement in our case because of the global nature of our business: our operations sites are on four continents. StackLight provides the hooks into each cluster that make multi-cluster management an effective solution.
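To make the cluster setup above concrete, here is a minimal sketch of how a containerized workload could be scheduled onto the GPU worker nodes and pulled from the private registry. The deployment name, image path, and replica count are all illustrative assumptions, not details from the actual clusters.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: image-inference-worker        # hypothetical workload name
spec:
  replicas: 4
  selector:
    matchLabels:
      app: image-inference-worker
  template:
    metadata:
      labels:
        app: image-inference-worker
    spec:
      containers:
      - name: worker
        # Image served from the private Docker Trusted Registry (hypothetical path).
        image: dtr.example.com/athena/inference-worker:1.0
        resources:
          limits:
            nvidia.com/gpu: 1         # pins each replica to a GPU worker node
```

The `nvidia.com/gpu` resource limit is what steers pods onto the GPU nodes; CPU-only workloads simply omit it and land on the CPU workers.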
Choosing an enterprise-grade container platform
Open source has been a major part of Project Athena, and there was a debate about using Docker CE versus Docker Enterprise. That decision was actually easy, given the advantages Docker Enterprise offered, especially during the early phase of development.
Kubernetes was a natural addition to the software stack and has been widely accepted, but we have also adopted open source tools such as RabbitMQ for messaging, TensorFlow and TensorRT for machine learning, GitLab for development, and a number of others. Most of our programming has been in Python.
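The RabbitMQ-plus-Python combination above implies a producer/consumer pipeline: publishers enqueue image work and workers consume it. The following is a minimal sketch of that pattern using Python's standard-library `queue` as a stand-in for a RabbitMQ broker; all names and the fake "processed" result are illustrative, not from the project.

```python
import queue
import threading

# Stand-in for a RabbitMQ queue; in production this would be a channel
# publishing to a broker. Names here are illustrative assumptions.
tasks = queue.Queue()
results = []

def worker():
    """Consume image IDs and record a (fake) inference result."""
    while True:
        image_id = tasks.get()
        if image_id is None:          # sentinel: shut down the worker
            tasks.task_done()
            break
        results.append((image_id, "processed"))
        tasks.task_done()

# Producer enqueues work, just as a publisher would to RabbitMQ.
t = threading.Thread(target=worker)
t.start()
for image_id in range(5):
    tasks.put(image_id)
tasks.put(None)                       # signal the worker to stop
t.join()
print(len(results))                   # 5 images handled
```

Decoupling producers from consumers this way is what lets the GPU workers scale independently of the ingest side.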
The results of our efforts so far have been excellent. We are seeing a six-month return on investment from just one of our seven clusters, where the hardware and software cost approached $1 million. That cluster now processes over 3 million images per day.
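For a sense of scale, the stated throughput works out as follows. The per-GPU figure assumes the load is spread evenly across eight V100s, which is an illustrative assumption, not a measured number.

```python
# Back-of-the-envelope check of the stated throughput: 3 million images/day.
IMAGES_PER_DAY = 3_000_000
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400

images_per_second = IMAGES_PER_DAY / SECONDS_PER_DAY
print(round(images_per_second, 1))    # ~34.7 images/second cluster-wide

# Assumed even split across eight GPUs (illustrative only).
per_gpu = images_per_second / 8
print(round(per_gpu, 2))              # ~4.34 images/second per GPU
```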