Tag: hadoop

Austin Summit Preview: Architecture and Best Practices to Deploy Hadoop and Spark Clusters with Sahara

While Hadoop is the standard for big data platforms, it isn’t easy to deploy. At the OpenStack Summit in Austin, we’ll share best practices and explain how to deploy Hadoop and Spark clusters with Sahara.

Sahara updates in the OpenStack Juno release

The Sahara project has several major updates in the Juno release.

Search, Store, Scale: Big Data and Berlin Buzzwords (with Sahara!)

If you’re going to be anywhere near Berlin next week and you have any interest at all in using Big Data effectively, you might want to consider heading over to the Berlin Buzzwords conference, to be held from May 25-28, 2014. The conference concentrates on storing, analyzing, and searching massive amounts of digital data — its tagline is “search, store, scale” — and includes a strong social and informal component, providing plenty of opportunities for collaboration and unexpected learning.

Improving Data Processing Performance with Hadoop Data Locality

One of the major bottlenecks in data-intensive computing is cross-switch network traffic. Fortunately, having map code execute on the node where the data resides significantly reduces this problem. This technique, called “data locality”, is one of the key advantages of Hadoop Map/Reduce. In this article, we’ll discuss the requirements for data locality, how the virtualized environment of OpenStack influences the Hadoop cluster topology, and how to achieve data locality using Hadoop with Savanna.
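To give a flavor of how Hadoop learns the cluster topology that data locality depends on: Hadoop can be pointed (via the real `net.topology.script.file.name` setting) at a script that maps node addresses to rack paths. The sketch below is a minimal, hypothetical version of such a topology script; the IP addresses and rack names are illustrative, and in a real OpenStack deployment the mapping would come from your inventory or cloud metadata rather than a hard-coded table.

```python
#!/usr/bin/env python
# Hypothetical Hadoop rack-awareness topology script (a sketch).
# Hadoop invokes it with one or more node IPs/hostnames as arguments
# and expects one rack path per input, one per line, on stdout.
import sys

# Illustrative mapping of node addresses to rack paths; replace with
# data from your own environment.
RACK_MAP = {
    "10.0.0.11": "/rack1",
    "10.0.0.12": "/rack1",
    "10.0.0.21": "/rack2",
}

def resolve(addresses):
    # Unknown nodes fall back to a default rack, as Hadoop expects
    # the script to always return a path for every argument.
    return [RACK_MAP.get(addr, "/default-rack") for addr in addresses]

if __name__ == "__main__":
    print("\n".join(resolve(sys.argv[1:])))
```

With this mapping in place, the scheduler can prefer launching map tasks on (or near) the rack that holds the relevant HDFS block, cutting cross-switch traffic.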