Some of you may already have noticed, while others just stumbled on this post through a search engine: I have set up an OpenStack private cloud for one of our projects.
We have noticed that the benefits of having a private cloud are spreading through the different teams within the organization, and interest in this flexibility is growing. Since this wasn't the original use case, we are now running into some design issues.
For the original instances the default overcommit ratios are fine. But requests for new machines with other goals are interfering with those original instances, which run in the same default availability zone.
So we are looking to configure some host aggregates and availability zones to keep this under control. As soon as we figure out a workable solution I will write about it in a new blog post.
But during the discussions on a solution, one remark was: couldn't OpenStack itself tackle the issue of a hypervisor with growing load and memory pressure by migrating instances to other hypervisors? That is a valid argument to me. So before even looking into such a solution, live migration itself should work.
Since we aren't using shared storage for our cloud, this could be tricky. So I went to the web to read up on the different options.
I came across some very interesting reads, like the one from Thornelabs on why you shouldn't use shared storage. It lists some real disadvantages next to the benefits. In our use case the benefits don't outweigh the disadvantages. But as I have noticed throughout the whole OpenStack story, there are options for almost every cloud use case, which also explains its complexity. So for many of you shared storage could be a solution.
Another rather interesting one is about live migration as a perk, not a panacea.
Not using shared storage has the consequence that we can only do live migration with KVM block migration, which isn't really supported by the upstream developers and will probably be phased out in the future for something more reliable.
We configured config drives as the default to serve metadata to cloud-init when an instance boots. With the default config drive format (iso9660), live migration runs into a libvirt bug when copying a read-only disk. To work around this we configured the vfat format on all hypervisors.
Unfortunately this still doesn't solve our issue. Apparently, when you use the live migrate option, OpenStack doesn't take the overcommit ratio into account. Since our cloud is already overcommitted, the live migration precheck concludes that we don't have enough resources to move instances around.
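To make the difference between the two checks concrete, here is a minimal sketch in Python. It is not nova's actual code: the class, the function names and the numbers are all made up for illustration, and the ratio of 1.5 is only assumed here because it is the default RAM overcommit ratio as far as I know. It just shows how the same target hypervisor can look fine once the overcommit ratio is applied, yet fail a check that only looks at physical memory.

```python
from dataclasses import dataclass


@dataclass
class Hypervisor:
    """Hypothetical, simplified view of a compute host (not a nova object)."""
    name: str
    total_ram_mb: int       # physical RAM in the host
    allocated_ram_mb: int   # sum of the flavor RAM of all instances placed on it


def free_ram_with_overcommit(host: Hypervisor, ram_allocation_ratio: float = 1.5) -> int:
    """Free RAM when the overcommit ratio is applied to the physical RAM."""
    return int(host.total_ram_mb * ram_allocation_ratio) - host.allocated_ram_mb


def free_ram_without_overcommit(host: Hypervisor) -> int:
    """Free RAM when only physical RAM counts, as the precheck appears to do."""
    return host.total_ram_mb - host.allocated_ram_mb


def fits(free_ram_mb: int, instance_ram_mb: int) -> bool:
    """An instance fits when the free RAM covers its flavor RAM."""
    return free_ram_mb >= instance_ram_mb


# Illustrative numbers: a 64 GB host that already carries 72 GB of flavors.
target = Hypervisor(name="compute-02", total_ram_mb=65536, allocated_ram_mb=73728)
instance_ram_mb = 8192

print(fits(free_ram_with_overcommit(target), instance_ram_mb))     # True
print(fits(free_ram_without_overcommit(target), instance_ram_mb))  # False
```

With these numbers the host still has room for the instance when the ratio is applied, while the check without the ratio sees a negative amount of free memory and refuses, which matches the behaviour we ran into.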
The proposed fix isn't released yet in the RDO Kilo nova packages, and patching a system isn't something I like to do in a semi-production environment.
So as of today, live migration isn't something we have working on our cloud. If you have already solved this on your RDO Kilo cloud, feel free to enlighten me!