Planning hardware for your OpenStack cluster: the answers to your questions
My colleague Anne Friend and I recently gave a webinar on "How to get away with hardware planning for your OpenStack cloud". During the webinar, we promised that we'd provide you with the answers to any questions we didn't get to during the live call, and here they are.
You mentioned adding storage in a top-of-the-rack oversubscribed switch. Can you talk about how that might be configured?
A typical top-of-rack (TOR) switch does not have as much bandwidth on its uplinks as it has on its downlinks. For example, a Trident+ switch will typically have 48 10 gig ports for downlinks, or 960 gbits of throughput counting both directions, but only 4x40 gig ports, or 320 gigs, as uplink capacity, so it's roughly 3:1 oversubscribed.
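As a quick check on that ratio, here is a minimal sketch of the arithmetic. The function name is ours, purely for illustration, and the ratio comes out the same whether you count one direction or both.

```python
def oversubscription_ratio(down_ports, down_gbps, up_ports, up_gbps):
    """Return downlink capacity divided by uplink capacity (one direction)."""
    downlink = down_ports * down_gbps
    uplink = up_ports * up_gbps
    return downlink / uplink

# 48 x 10 gig downlinks against 4 x 40 gig uplinks, as in the example above.
ratio = oversubscription_ratio(48, 10, 4, 40)
print(f"{ratio:.0f}:1 oversubscribed")
```

Plug in the port counts of your own switch to see how much east-west traffic you need to keep inside the rack.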
This means that you should limit the traffic traversing the uplinks. There are two ways to handle that. One way is to schedule tenant VMs within the domain described by this TOR to reduce traffic between VMs going across the links.
A second major source of traffic is Cinder traffic between Cinder hosts and Compute hosts. Concentrating this traffic in a single switch will also offload the uplinks. For example, if you are using the simple Cinder OpenStack iSCSI targets, you can provide one or two per rack and make sure that the Cinder scheduler creates volumes from the Cinder target that is located in the same rack as the Compute resources. Both of these approaches require custom filters that have to be created for the Nova and Cinder schedulers. It's not quite "out of the box," but it is an easy mod to make.
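To give a feel for what such a filter decides, here is a standalone sketch of the rack-affinity logic. It is not the Nova or Cinder filter API: the Host class, the rack attribute, and both function names are hypothetical, and in a real deployment this logic would live inside a scheduler filter class and read the rack from whatever metadata you attach to your hosts.

```python
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    rack: str  # hypothetical label, e.g. taken from host metadata

def same_rack_hosts(candidates, required_rack):
    """Keep only the candidates located in the required rack."""
    return [h for h in candidates if h.rack == required_rack]

def pick_cinder_target(targets, compute_host):
    """Prefer a volume target in the compute host's rack; fall back to any."""
    local = same_rack_hosts(targets, compute_host.rack)
    pool = local or targets
    return pool[0] if pool else None

targets = [Host("cinder-a", "rack1"), Host("cinder-b", "rack2")]
chosen = pick_cinder_target(targets, Host("compute-7", "rack2"))
print(chosen.name)
```

The fallback to a remote target is a design choice in this sketch. You might prefer to fail the request instead, so that volume traffic never crosses the uplinks.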
I am trying to understand how we can apply some of the trade-offs you described in more specific terms. Can you give us a numerical example of vCPU/vRAM allocation trade-offs for 2 different use cases, as examples?
There are probably too many use cases to really get into them, but let's look at some actual calculations. For CPU, assume:
- 100 VMs
- 2 EC2 compute units average
- 16 EC2 compute units max
- No Oversubscription
This translates to:
- 200 GHz of CPU capacity (100 VMs x 2 GHz per VM, treating one EC2 compute unit as roughly 1 GHz)
- Max about 5 cores for the largest VM (16 GHz / 2.4 GHz per core / 1.3 for hyperthreading)
Based on calculations of:
- Count hyperthreading as 1.3x
- 10-11 E5 2640 sockets (200 GHz / 2.4 GHz per core / 1.3 for hyperthreading / 6 cores per socket)
- 5-6 dual-socket servers (11 sockets / 2 sockets per server)
- 17 VMs per server (100 VMs / 6 servers)
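The CPU arithmetic above can be sketched as a small function. The function and parameter names are ours, and the defaults simply mirror the numbers in this example (2.4 GHz cores, 6 cores per socket, 2 sockets per server, hyperthreading counted as 1.3x), so treat it as a rough estimator rather than a sizing tool.

```python
import math

def size_cpu(vms, ghz_per_vm, core_ghz=2.4, ht_factor=1.3,
             cores_per_socket=6, sockets_per_server=2):
    """Estimate sockets, servers, and VM density with no oversubscription."""
    total_ghz = vms * ghz_per_vm
    # Hyperthreading is counted as a 1.3x boost to each physical core.
    effective_core_ghz = core_ghz * ht_factor
    cores = total_ghz / effective_core_ghz
    sockets = math.ceil(cores / cores_per_socket)
    servers = math.ceil(sockets / sockets_per_server)
    return {
        "total_ghz": total_ghz,
        "sockets": sockets,
        "servers": servers,
        "vms_per_server": math.ceil(vms / servers),
    }

plan = size_cpu(vms=100, ghz_per_vm=2)
print(plan)
```

Changing ht_factor to 1.0 shows how much the result leans on the hyperthreading assumption.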
For memory, assume:
- 100 VMs
- 4 GB per VM
- Min 512MB, Max 32GB
This translates to:
- 400 GB of total memory (100 VMs * 4 GB per VM)
Based on calculations of:
- Need about 4 128 GB machines (400 GB / 128 GB, rounded up)
- Balanced against CPU, you need 6 machines for CPU capacity
- Reduce per-server memory and go with 6 64 GB or 96 GB machines (6x64 GB is 384 GB, 6x96 GB is 576 GB)
If you'd like to have a bit more memory, then you need to go with 96 GB machines.
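The memory side can be checked the same way. This is a minimal sketch with names of our own choosing. It compares the total memory you get from a fixed server count, here the 6 servers that CPU capacity calls for, across a few per-server memory sizes.

```python
import math

def memory_options(vms, gb_per_vm, servers, sizes_gb=(64, 96, 128)):
    """Return required memory and the total offered by each server size."""
    required = vms * gb_per_vm
    totals = {size: servers * size for size in sizes_gb}
    return required, totals

required, totals = memory_options(vms=100, gb_per_vm=4, servers=6)
# Servers needed if memory were the only constraint, using 128 GB machines.
memory_only_servers = math.ceil(required / 128)

for size, total in totals.items():
    verdict = "covers" if total >= required else "falls short of"
    print(f"6 x {size} GB = {total} GB, {verdict} {required} GB")
```

Keep in mind that the average of 4 GB per VM hides a wide range (512 MB to 32 GB), so some headroom above the computed total is worth having.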
When you say VLAN is good for a small network, how big is small?
A small network has fewer than 4K virtual networks, because a VLAN tag is only 12 bits wide. However, since Quantum allows tenants to have multiple networks, you can't assume that you can host 4K tenants. Also, you will have some static infrastructure needs, so don't forget to reserve tags for those networks.
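To turn that into a tenant budget, here is a rough sketch. The 4094 figure is the number of usable 802.1Q VLAN IDs (the 12-bit tag gives 4096 values, two of which are reserved). The function name and the sample inputs are illustrative assumptions, not recommendations.

```python
USABLE_VLAN_IDS = 4094  # 12-bit tag: 4096 values, with 0 and 4095 reserved

def max_tenants(networks_per_tenant, reserved_infra_vlans):
    """Estimate how many tenants fit once infrastructure tags are set aside."""
    available = USABLE_VLAN_IDS - reserved_infra_vlans
    return available // networks_per_tenant

# Example: 50 tags kept for infrastructure, 3 networks per tenant on average.
tenants = max_tenants(networks_per_tenant=3, reserved_infra_vlans=50)
print(tenants)
```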
How does Fuel help me with automated network configuration?
Fuel can verify the network configuration to ensure that your nodes are connected properly and all appropriate VLAN tags are unblocked at the switch.
Do you think it’s better to use brand-name hardware such as Dell, HP, etc., or do you think we can have the same performance with hardware built by ourselves? How about the Open Compute Platform?
I answered this question in more detail in the webinar, but the short answer is that if you are large enough to support your own hardware, or small enough not to care about downtime during a hardware failure, then you can use ODMs/white boxes. If you are mid-tier, you should use brand names, as onsite repair SLAs are hard to match otherwise.
Open Compute is a promising platform, but it depends on broader hardware support. It's coming, but it's not there yet.
Do you recommend nodes that run specific nova services have different hardware? For example, should the node running nova-api have more memory than a node running glance-api?
At Mirantis we recommend that you consolidate all OpenStack infrastructure services into dedicated hosts called controllers. This kind of architecture makes it easier to ensure high availability.
What about ARM (or Atom) based microservers?
If you have a general-purpose cloud, it would be hard for you to run a CPU-intensive workload on ARM- or Atom-based microservers. Try running Microsoft SQL Server or Oracle on ARM; you are not going to get too far. If you have a special-purpose cloud that fits into the constraints of these CPUs, then by all means, run them. Cloud is not all CPU, though, and a lot of ARM/Atom designs do not have enough bandwidth or disk to make for a good platform.
What about blade servers?
Personally, I prefer regular servers for the cloud. If you need higher density, use sleds (Dell C-class, HP SL-class) instead of blades. A blade server mid-plane typically does not have enough bandwidth to do a good job on the cloud, and blades do not have enough local storage, putting a double hit on your chassis bandwidth needs. Plus, you pay a premium for the blades. One or two blade designs have started to eliminate at least the network bottleneck, but other concerns remain. So check the interconnect speed; YMMV.
Can we have live migration capabilities without shared storage?
You can have live migration without shared storage. It just takes longer to migrate.
For a small private cloud, do you recommend Fibre Channel for shared storage on computes, or just a 1 Gigabit shared file system?
Neither. Use 10 gig and Ceph, or another block store. You don’t need shared FS or the expense of Fibre Channel.
Can you talk a bit more about the 6.5x swift requirement?
This is another question with a more detailed answer in the recorded webinar, but here's a sample calculation:
Assume a replication factor of 3.
Add 2 handoff devices (needed to reserve space for failures).
Finally, if you exceed about 75% of an XFS drive's capacity you begin to have problems, so you wind up with the following calculation:
(3 + 2) / 0.75 ≈ 6.7
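The same calculation as a small sketch, so you can vary the inputs. The function names are ours, and the formula is simply the one above: replicas plus handoff devices, divided by the fill level you are willing to reach.

```python
def swift_raw_multiplier(replicas=3, handoff=2, max_fill=0.75):
    """Raw-to-usable multiplier: (replicas + handoff) / fill threshold."""
    return (replicas + handoff) / max_fill

def raw_capacity_tb(usable_tb, replicas=3, handoff=2, max_fill=0.75):
    """Raw disk capacity to buy for a given amount of usable object storage."""
    return usable_tb * swift_raw_multiplier(replicas, handoff, max_fill)

multiplier = swift_raw_multiplier()
print(f"multiplier: {multiplier:.1f}")
print(f"raw capacity for 100 TB usable: {raw_capacity_tb(100):.0f} TB")
```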
After deployment, what tools do you or have you used to verify CPU & hardware utilization?
At Mirantis, we have used (and had success with) Nagios and Zabbix.
Can OpenStack deployment be done in OCP (Open Compute Platform)?
Yes. Mirantis Fuel is largely independent of hardware architecture.
Where do diskless hypervisors fit into the 'local vs shared vs object' storage equation? Is it possible to operate compute nodes as diskless iSCSI clients without breaking Cinder's ability to connect to iSCSI targets, or is another SAN solution required for bare metal?
Let me turn this around and ask why you'd want to bother with such complexity. With Mirantis Fuel, we deploy the OS for you already. Having a few small drives for the OS will make the setup simpler. We tried this before, but arrays seem to have issues when multiple initiators for the OS and Cinder from the same host want to talk to a single target. Lots of complexity, trouble and cost. Not worth it.
Does Fuel support bonding interfaces?
Yes, but you need to use the command-line interface rather than the web-based UI.
Has there been any work with Illumos-based hypervisors, or anything Illumos for that matter, or has all work been done on Linux only?
ZFS is not so compelling that an odd-ball OS such as Solaris is worth your while. Yes, you can run Xen and KVM, with caveats and restrictions. If you are rich enough to support your own OS team you can do it, but you'll always be behind the feature curve. I've built and run multiple OS teams for various companies, and I can tell you, if you are in the OS-making business, go for it. Otherwise follow the trail; your going will be easier than plowing straight through the jungle.