Today I thought I would write about the good and the bad about CoreOS so far. Many components are in an alpha or beta state so things may change over the coming months. Also as a disclaimer, views in this post are my own and not necessarily those of HP.
As stated in my blog post yesterday, I have been using CoreOS on my MacBook Pro using Vagrant and VirtualBox. This made it extremely easy to set up the CoreOS cluster on my Mac. I made a minor mistake to start with, which was not correctly configuring the unique URL that Etcd requires. A colleague of mine made the same mistake on his first try, so it is likely to be a common one.
I initially had VirtualBox configured to use a Mac-formatted USB drive I have hooked up. Vagrant tried to create my CoreOS cluster there, and during the setup the Ruby in Vagrant kept spinning on some disk reading routine and never completed. Debug output didn't help find the cause so I switched to using the internal SSD instead.
CoreOS itself appears to be derived from Chrome OS which itself is a fork of Gentoo. It is incredibly minimal; there isn't even a package manager that comes with it. But that is the whole point. It is designed so that Docker containers run on top of it and provide the application support, almost completely isolating the underlying OS from the applications running on top of it. This also provides excellent isolation between, say, MySQL and Apache for a LAMP stack.
It is a clean, fast OS using many modern concepts such as systemd and journald. Some of these are only in the bleeding-edge distributions at the moment so many people may not be familiar with using them. Luckily one of my machines is running Fedora 20 so I've had a play with these technologies before.
CoreOS provides a clustered key/value store called 'etcd'. The name confused many people I spoke to before we tried it. We all assumed it was a clustered file store for the /etc/ path on CoreOS. We were wrong, although that is maybe the direction it will eventually take. Clients actually talk to it through a REST-based interface.
Etcd has been pretty much created as a new project from the ground up by the CoreOS team. The project is written in Go and can be found on GitHub. Creating a reliable clustered key/value store is hard, really hard. There are so many edge cases that can cause horrific problems. I cannot understand why the CoreOS team decided to roll their own instead of using one of the many that have been well-tested.
Under the hood the nodes communicate with each other using what appears to be JSON (REST) for internal admin commands and Google Protobufs over HTTP for the Raft Consensus Algorithm library. Whilst I commend them for using Protobufs in a few places, HTTP and JSON are both bad ideas for what they are trying to achieve. JSON will cause massive headaches for protocol upgrades/downgrades and HTTP really wasn't designed for this purpose.
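To give a feel for what a REST-based key/value store means in practice, here is a minimal Python sketch of setting and reading a key over HTTP. It assumes an etcd node on the local machine at port 4001 exposing a `/v2/keys` path; the address, port, path and the helper names (`build_set_request`, `build_get_request`, `send`) are assumptions for illustration, so check them against the etcd version you are running.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Assumption: a local etcd node on port 4001 exposing a "/v2/keys" path.
ETCD_BASE = "http://127.0.0.1:4001/v2/keys"


def build_set_request(key, value, base=ETCD_BASE):
    """Build (but do not send) an HTTP PUT that stores `value` under `key`."""
    body = urlencode({"value": value}).encode("ascii")
    return Request(
        f"{base}/{quote(key.lstrip('/'))}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def build_get_request(key, base=ETCD_BASE):
    """Build (but do not send) an HTTP GET that reads `key`."""
    return Request(f"{base}/{quote(key.lstrip('/'))}", method="GET")


def send(request, timeout=5):
    """Send a prepared request and return the response body as text."""
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a node running, `send(build_set_request("/message", "hello"))` followed by `send(build_get_request("/message"))` would be the whole round trip. The response body is JSON, so a client would normally pass it through `json.loads`.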
At the moment this appears to be designed more for very small scale installations instead of hundreds to thousands of instances. Hopefully at some point it will gain its own protocol based on Protobufs or similar and have code to work with the many edge cases of weird and wonderful machine and network configurations.
Fleet is another service written in Go and created by the CoreOS team. It is still a very new project aimed at being a scheduler for a CoreOS cluster.
To use Fleet you basically create a systemd configuration file with an optional extra section to tell Fleet what CoreOS instance types it can run on and what it would conflict with. Fleet communicates with Etcd and, via some handshaking, figures out a CoreOS instance to run the service on. A daemon on that instance handles the rest. The general idea is that you use a systemd file to manage Docker instances. There is also a kind-of hack, using a separate systemd file per service, so that it will notify/re-schedule when something has failed.
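As a rough illustration of what such a file looks like, the Python sketch below renders a systemd unit with the optional Fleet section appended. The section name `[X-Fleet]` and the option names `X-ConditionMachineMetadata` and `X-Conflicts` are assumptions based on early Fleet releases and may differ in the version you use; `render_unit` and the `example/mysql` image are hypothetical names made up for this example.

```python
def render_unit(description, exec_start, conflicts=None, machine_metadata=None):
    """Return the text of a systemd unit, with an optional Fleet section.

    Assumption: Fleet reads its scheduling hints from an "[X-Fleet]" section
    using the option names below; verify these against your Fleet version.
    """
    lines = [
        "[Unit]",
        f"Description={description}",
        "After=docker.service",
        "Requires=docker.service",
        "",
        "[Service]",
        f"ExecStart={exec_start}",
    ]

    fleet_options = []
    if machine_metadata:
        # Which kind of CoreOS instance the unit may be scheduled on.
        fleet_options.append(f"X-ConditionMachineMetadata={machine_metadata}")
    if conflicts:
        # Units that must not share an instance with this one.
        fleet_options.append(f"X-Conflicts={conflicts}")

    if fleet_options:
        lines += ["", "[X-Fleet]"] + fleet_options

    return "\n".join(lines) + "\n"
```

Calling `render_unit("MySQL", "/usr/bin/docker run example/mysql", conflicts="mysql.*.service")` gives a unit that asks Fleet not to place two MySQL units on the same instance. Leaving both optional arguments out produces a plain systemd unit with no Fleet section at all.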
Whilst it is quite simple in design it has many flaws and for me was the most disappointing part of CoreOS so far. Fleet breaks, a lot. I must have found half a dozen bugs in it in the last few days, mainly around it getting completely confused as to which service is running in which instance.
Also the way that configurations are expressed to Fleet is totally wrong in my opinion. Say, for example, you want ten MySQL Docker containers across your CoreOS cluster. To express this in Fleet you need to create ten separate systemd files and send them up, even though those files are likely identical.
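To make the duplication concrete, here is a small Python sketch that writes out the ten files you would have to submit. The unit contents, the `mysql.N.service` naming scheme and the `example/mysql` image are all hypothetical; the point is only that ten files end up carrying one distinct body.

```python
import tempfile
from pathlib import Path

# Hypothetical unit body; a real one would carry whatever your service needs.
UNIT_TEMPLATE = """[Unit]
Description=MySQL
After=docker.service

[Service]
ExecStart=/usr/bin/docker run example/mysql
"""


def write_units(directory, name, count):
    """Write `count` copies of the same unit as name.1.service, name.2.service, ..."""
    paths = []
    for index in range(1, count + 1):
        path = Path(directory) / f"{name}.{index}.service"
        path.write_text(UNIT_TEMPLATE)
        paths.append(path)
    return paths


def distinct_contents(paths):
    """Count how many different file bodies there are among `paths`."""
    return len({path.read_text() for path in paths})


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        units = write_units(workdir, "mysql", 10)
        print(len(units), "files,", distinct_contents(units), "distinct body")
```

Every one of those files then has to be sent to Fleet individually, which is exactly the part that does not scale.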
This is how it should work in my opinion: You create a YAML file which specifies what a MySQL docker container is and what an Apache/PHP container is. In this YAML you group these and call them a LAMP stack. Then in the YAML file you specify that your CoreOS cluster needs five LAMP stacks, and maybe two load balancers.
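A sketch of that idea in Python, using a plain dictionary in place of the YAML file so that it runs without extra libraries. Nothing like this exists in Fleet as far as I know; the layout of `CLUSTER_SPEC`, the `expand` function, the instance naming and the `example/...` images are all invented for illustration.

```python
# Hypothetical cluster description; in practice this would be loaded from YAML.
CLUSTER_SPEC = {
    # What each kind of container is.
    "containers": {
        "mysql": {"image": "example/mysql"},
        "apache-php": {"image": "example/apache-php"},
        "load-balancer": {"image": "example/haproxy"},
    },
    # Named groups of containers, such as a LAMP stack.
    "groups": {
        "lamp": ["mysql", "apache-php"],
        "lb": ["load-balancer"],
    },
    # How many of each group the cluster should run.
    "cluster": {"lamp": 5, "lb": 2},
}


def expand(spec):
    """Turn the declarative spec into a flat list of (instance name, image)."""
    instances = []
    for group, count in spec["cluster"].items():
        for number in range(1, count + 1):
            for container in spec["groups"][group]:
                image = spec["containers"][container]["image"]
                instances.append((f"{group}-{number}-{container}", image))
    return instances


if __name__ == "__main__":
    for name, image in expand(CLUSTER_SPEC):
        print(name, image)
```

Five LAMP stacks and two load balancers expand to twelve container instances from one short description, and scaling up means changing a single number instead of writing more files.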
Not only would my method scale a lot better but you can then start to have a front-end interface which would be able to accept customers.
CoreOS is a very ambitious project that in some ways becomes the "hypervisor"/scheduler for a private Docker cloud. It can easily sit on top of a cloud installation or on top of bare metal. It requires a totally different way of thinking and I really think the ideas behind it are the way forward. Unfortunately it is a little too early to be using it on anything more than a few machines in production, and even then it is more work to manage than it should be.