The Mirantis Blog
Kubernetes tutorials, product updates and featured articles
Austin Summit Preview: Architecture and Best Practices to Deploy Hadoop and Spark Clusters with Sahara
While Hadoop is the standard for big data platforms, it isn’t easy to deploy. At the OpenStack Summit in Austin, we’ll share best practices and explain how to deploy Hadoop and Spark clusters with Sahara.
If you’re going to be anywhere near Berlin next week and you have any interest at all in using Big Data effectively, consider heading over to the Berlin Buzzwords conference, to be held May 25-28, 2014. The conference focuses on storing, analyzing, and searching massive amounts of digital data — its tagline is “search, store, scale” — and includes a strong social and informal component that provides ample opportunities for collaboration and unexpected learning.
One of the major bottlenecks in data-intensive computing is cross-switch network traffic. Fortunately, executing map code on the node where the data resides significantly reduces this problem. This technique, called “data locality,” is one of the key advantages of Hadoop MapReduce. In this article, we’ll discuss the requirements for data locality, how the virtualized environment of OpenStack influences the Hadoop cluster topology, and how to achieve data locality using Hadoop with Savanna.
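To make the idea concrete, here is a minimal sketch (in Python, not Hadoop’s actual scheduler code) of locality-aware task placement: given the nodes holding replicas of a task’s input block and the nodes with free slots, prefer node-local placement, then rack-local, and only fall back to off-rack when necessary. All names and the topology map here are hypothetical, for illustration only.

```python
# Illustrative sketch of locality-aware map-task placement.
# Preference order: node-local > rack-local > off-rack,
# since each step up sends more data across switches.

def rack_of(node, topology):
    """Return the rack a node belongs to; topology maps node -> rack."""
    return topology[node]

def place_map_task(replica_nodes, free_nodes, topology):
    """Choose a node for a map task whose input block is replicated
    on replica_nodes. Returns (chosen_node, locality_level)."""
    # 1. Node-local: a free node already holds a replica of the block,
    #    so the map task reads from local disk with no network traffic.
    for node in free_nodes:
        if node in replica_nodes:
            return node, "node-local"
    # 2. Rack-local: a free node shares a rack with some replica,
    #    so traffic stays on the top-of-rack switch.
    replica_racks = {rack_of(n, topology) for n in replica_nodes}
    for node in free_nodes:
        if rack_of(node, topology) in replica_racks:
            return node, "rack-local"
    # 3. Off-rack: data must cross the core switch -- the costly case.
    return free_nodes[0], "off-rack"

topology = {"n1": "rackA", "n2": "rackA", "n3": "rackB", "n4": "rackB"}
# Block replicated on n1 and n3, but only n2 and n4 have free slots:
# no node-local option exists, so a rack-local node is chosen.
print(place_map_task(["n1", "n3"], ["n2", "n4"], topology))
# -> ('n2', 'rack-local')
```

This is exactly why cluster topology matters in a virtualized OpenStack environment: if the scheduler doesn’t know which physical host and rack a VM actually runs on, it can’t distinguish these three cases and loses the locality benefit.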