Monday, 15 September 2014

Speaking about libAttachSQL at Percona Live London

As many of you know, I'm actively developing libAttachSQL and am rapidly heading towards the first beta release.  For those who don't, libAttachSQL is a lightweight C connector for MySQL servers with a non-blocking API.  I am developing it as part of my day job for HP's Advanced Technology Group.  It was born in part out of my frustration when dealing with MySQL and eventlet in Python back when I was working on various OpenStack projects.  But there are many reasons why this is a good thing for C/C++ applications as well.

What you may not know is that I will be giving a talk at Percona Live London about libAttachSQL, the technology behind it and the decisions we made to get here.  The event is on the 3rd and 4th of November at the Millennium Gloucester Conference Centre.  I highly recommend attending if you wish to find out more about libAttachSQL or any of the new things going on in the MySQL world.

As for the project itself, I'm currently working on the prepared statement code which I hope to have ready in the next few days.  0.4.0 will certainly be a big release in terms of changes.  There has been feedback from some big companies which is awesome to hear and I have fixed a few problems they have found for 0.4.0.  Hopefully you will be hearing more about that in the future.

For anyone who will be there, I'll be in London from the 2nd to the 5th of November and am happy to meet and chat about the work we are doing.

Monday, 1 September 2014

New version of libAttachSQL, C connector for MySQL released!

It has been just over 2 weeks since the last libAttachSQL version was released.  I had a great vacation in the middle which for once meant that I didn't do any work for the week I was away :)

For those who don't know about it, libAttachSQL is a lightweight, non-blocking C connector for MySQL servers.  It is Apache 2.0 licensed so plays well with both Open Source and Commercially licensed applications.  I have been developing it for 2 months now as part of my work for HP's Advanced Technology Group.  It is hosted on GitHub and uses many freely available tools (such as Travis CI) to host and test various parts of the project.

Once again I thank everyone for the feedback I have received.  You all make it even more awesome to be working on this :)

So, on to the new version 0.3.0 alpha release.  This time round we have been focusing on zlib compression and SSL support.  Both of these features have been added and neither impacts the non-blocking aspect of the library.  The SSL part in particular was quite new to me: I've coded SSL into applications many times in the past, but I've never done it in a non-blocking way before.  It posed some interesting challenges but it was fun and appears to be working great now.

The biggest changes in this release are:

  • Fixes to the test cases and improvements to the CI used
  • Documentation improvements
  • Many minor bug fixes
  • Protocol compression (zlib) support
  • SSL encryption (OpenSSL) support
  • 32-bit compiling works

For more information see the Version History section of the docs.

On to the next release which should complete the biggest pre-release features.  From there we can head towards our first GA release.

If you have any questions, feedback, etc... please feel free to leave comments, email me or open a GitHub issue.

Friday, 15 August 2014

libAttachSQL 0.2.0 alpha released!

Hot on the heels of last week's release we have released version 0.2.0 alpha of libAttachSQL.  For those who have missed my previous blog posts, libAttachSQL is a lightweight C connector for MySQL servers I'm creating with HP's Advanced Technology Group.  It has an Apache 2 license so is good for linking with most Open Source licenses as well as commercial software projects.

Changes in this release:

  • Added support for query result buffering
  • Passive connect on first query is now asynchronous
  • Improved memory handling
  • Many documentation changes, including API examples
  • Many other smaller fixes

For more information see the libAttachSQL documentation.  The release itself can be found on the libAttachSQL website.

We have had some great feedback so far.  Thanks to everyone who has contacted me since the first release.  As always if you have any questions feel free to contact me or file an issue on GitHub.

Sunday, 10 August 2014

libAttachSQL 0.1.0 alpha released!

As I briefly mentioned in my previous post, I have been working on a new project for HP's Advanced Technology Group called libAttachSQL.

libAttachSQL is a lightweight C connector for MySQL servers.  It is Apache 2 licensed (and therefore compatible with many open source licenses as well as commercial use) and has a new asynchronous API.  With the new API you send a command which returns immediately and you poll until the library tells you there are results ready.  This is very useful for applications that have many things going on that you do not want held up while waiting for the MySQL server to process a query.  In later posts I will give usage examples of this.
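The send-then-poll pattern described above can be sketched in a few lines of Python.  This is only an illustration of the idea and does not use libAttachSQL's real API: FakeConnection, PollStatus, send_query, poll and current_row are all hypothetical names, and the "server" here is simulated in memory rather than reached over a socket.

```python
import enum


class PollStatus(enum.Enum):
    """Hypothetical status codes; not libAttachSQL's real return values."""
    NOT_READY = 0
    ROW_READY = 1
    EOF = 2


class FakeConnection:
    """Stand-in for a non-blocking connector; no network I/O is performed."""

    def __init__(self, rows, polls_before_ready=3):
        self._rows = list(rows)
        self._delay = polls_before_ready  # simulated server latency, in polls
        self._current = None
        self._sql = None

    def send_query(self, sql):
        # Returns immediately; the simulated server has not answered yet.
        self._sql = sql

    def poll(self):
        # Each call does a small slice of work and never blocks.
        if self._delay > 0:
            self._delay -= 1
            return PollStatus.NOT_READY
        if self._rows:
            self._current = self._rows.pop(0)
            return PollStatus.ROW_READY
        return PollStatus.EOF

    def current_row(self):
        return self._current


def run_query(conn, sql, do_other_work):
    """Drive the poll loop, doing other work whenever no row is ready."""
    conn.send_query(sql)
    results = []
    while True:
        status = conn.poll()
        if status is PollStatus.EOF:
            return results
        if status is PollStatus.ROW_READY:
            results.append(conn.current_row())
        else:
            do_other_work()
```

The point of the design is the else branch: while the query is still in flight the application gets control back on every poll, so it can service other connections or timers instead of sitting in a blocking read.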

I am a great believer in release early/often so on Friday, 5 weeks after I started writing code (and docs), I released the first alpha version of this connector.  The source of this release can be downloaded here.  For now this is a source-only release just to give a taste of the project so far.  At some point before GA, binary packages will be released too.  Documentation for the library can be found on Read The Docs.

What it can currently do:

  • Compile with Clang and GCC on Linux and Mac
  • Cross-compile for Windows using MinGW64 (in Fedora only)
  • Connect to MySQL servers using TCP or Unix socket file
  • Send basic MySQL queries and retrieve results
  • Using an API similar to prepared statements it can automatically escape and convert data for your queries
  • Not a lot else

As the project progresses we will be adding many more features such as prepared statement support.


This project is completely open, using many available free services as described in my previous blog post.  We welcome people to come and kick the tyres and contribute in as small or large a way as they like.  This can be simply filing a bug or feature request, contributing docs or code, etc...  One thing we could really use right now is someone with Debian/Ubuntu expertise to help us create the Debian package scripts (I'm not an expert at these and am struggling to make it work).  There is a GitHub issue open for this.

If you have any questions about the library feel free to contact me, comment on this blog post, open a GitHub issue or come chat on the #libAttachSQL channel on Freenode.

Friday, 8 August 2014

How cloud hosted services are helping open source

One big project I'm working on for HP's Advanced Technology Group right now is an Apache 2.0 licensed C connector for MySQL servers called libAttachSQL.  The whole process, not just the code itself, is helping us learn about new and current techniques in Open Source development.  Whilst I will be writing many posts about libAttachSQL in the future, today's post is about the free hosted services we are using around it.

GitHub

Almost all of the Open Source projects I have worked on in the past have been hosted on Canonical's Launchpad platform.  Over the last couple of years there has been a shift to using GitHub and almost everything I have worked on at HP has been hosted there.  Now there are many services that hook into GitHub so this seemed like the perfect opportunity to try some of them out.

The libAttachSQL project has its own organisation in GitHub and a couple of trees under this.  The service is fantastic and has grown a lot over the years in features and reliability.  The only thing I don't quite agree with is that they prefer a custom type of Markdown documentation over other formats.  Some reStructuredText support is there but it isn't as good as I would hope yet.  This is a really minor issue though and not something they should be marked down for.

GitHub Pages

GitHub Pages is a relatively new service created by GitHub.  Simply create a tree with a specific name, push some static content, and you are done!  There is also an easy method to get domains pointed to it so we have a GitHub page as the site for libattachsql.org.

Read The Docs

Every Open Source project I have worked on from Drizzle onwards has had its documentation in reStructuredText format which compiles into HTML, PDF and many other formats using a Python based tool called Sphinx (not to be confused with the search server).  In my opinion it is more flexible than Markdown format, especially when documenting APIs.

libAttachSQL's documentation was again written in reStructuredText format and is automatically compiled into HTML and PDF documentation using the free service Read The Docs.  This is hooked into GitHub so on a new push/merge Read The Docs will automatically generate a new version of the documentation.  We have pointed a subdomain to the Read The Docs output so that it can be easily accessed at docs.libattachsql.org.

I am extremely pleased with this service: not only is it free for Open Source projects but it makes documentation even more aesthetically pleasing than the basic Sphinx templates do.

Travis CI

Every source code project needs Continuous Integration.  There are many solutions to this, one of the most popular being Jenkins.  As with the RST documentation format, every project I have worked on from Drizzle onwards has used Jenkins to test every branch before and after merging.  I could have used Jenkins for this project but my goal is not to own the hosting of anything.  So, for libAttachSQL I set up Travis CI.  This is a hosted service that is free for Open Source projects and has a paid-for variant for private projects.

Our Travis setup tests compiling with Clang and GCC on Linux (Travis uses Ubuntu 12.04), running a test suite in each.  Every virtual test host comes with a MySQL server already running for you to use in your tests and it was very simple to set this up.  We also get the Travis tests to build the documentation with nitpick mode and warnings as errors so any minor documentation problems are picked up early.  All this is done with a very simple YAML script (although ours has got a little more complex with adding support for Coverity).

I also want our builds to run Valgrind checks and to run on the provided OS X platform, but I will work on getting those running at a later date.

Travis is a fantastic service and a breeze to set up and use.  The interface shouldn't be too unfamiliar if you have used Jenkins before.  My wish is that it supported more platforms.  I would really love a Fedora based builder, a more up-to-date Ubuntu and possibly Windows builders.  They do have OS X builders, though, which is fantastic.

Coverity Scan

Coverity Scan is a static code analyser which is free for Open Source projects hosted on GitHub.  It also hooks in nicely with Travis CI, with Travis providing the analysis data from the code and builds for the site to break down.  This was the most complex of all of these services to set up but has given some fantastic results so far.  It has found 13 potential bugs in my code that Clang's lint and Valgrind didn't find.  This is really impressive: for starters there are incredibly strict flags set for building the project from git, and there was also only one false positive.  Unfortunately there is a quota limit for Open Source projects so we only run this occasionally rather than every merge.

Conclusion

We have managed to get all of the services that we would otherwise need to set up and manage ourselves provided completely for free, with no hosting for us to manage.  These are all awesome services and most were very quick to set up.  I thank all of the companies providing these services; they have easily shaved a week off my time setting up machines to host our project and many more hours managing the services.

Over the next couple of weeks I will be talking a lot more about the libAttachSQL project, so look out for those posts.

Thursday, 31 July 2014

Tripping up using MinGW

One of the projects I am currently working on for HP's Advanced Technology Group is written in C/C++ and is using MinGW to cross-compile from Linux to Windows.  I have done this in several other projects quite successfully but this week I was tripped up by something when using MinGW in Fedora 20.

The build system I am using is DDM4 which is a collection of m4 files that can be used as a template for any project.  Back when working on libdrizzle 5.x we added support for MinGW cross-compiling to this.  It is designed to enable as many good compiler warnings as possible (and makes them trigger as errors) and if the compiler supports PIE (Position Independent Executable) this will be enabled too.

MinGW's compiler is based on GCC and as such supports PIE to some extent.  Unfortunately I found that the '-pie' compiler option was causing the entry point to the executable to point away from the main() function.  It was hitting MinGW's dummy main() instead and the executable was returning immediately.

I have made a patch to disable PIE for MinGW and upstreamed it to DDM4.  I leave this as a warning for anyone that it hits in the future (including me).

As an extra side note, 64-bit MinGW in Ubuntu seems to be completely broken.  The binary packages only contain a few text files and directories, no actual binaries.

Thursday, 26 June 2014

CoreOS Review

I have spent a few days now playing with CoreOS and helping other members of HP's Advanced Technology Group get it running on their setups.

Today I thought I would write about the good and the bad about CoreOS so far.  Many components are in an alpha or beta state so things may change over the coming months.  Also as a disclaimer, views in this post are my own and not necessarily those of HP.

Installation


As stated in my blog post yesterday, I have been using CoreOS on my MacBook Pro using Vagrant and VirtualBox.  This made it extremely easy to set up the CoreOS cluster on my Mac.  I made a minor mistake to start with, which was not correctly configuring the unique URL required for Etcd.  A colleague of mine made the same mistake on his first try, so it is likely to be a common one.

I initially had VirtualBox configured to use a Mac formatted USB drive I have hooked up.  Vagrant tried to create my CoreOS cluster there and during the setup the Ruby in Vagrant kept spinning on some disk reading routine and never completed the setup.  Debug output didn't help find the cause so I switched to using the internal SSD instead.

CoreOS


CoreOS itself appears to be derived from Chrome OS, which is itself based on Gentoo.  It is incredibly minimal; there isn't even a package manager that comes with it.  But that is the whole point.  It is designed so that Docker containers are run on top of it providing the application support, almost completely isolating the underlying OS from the applications running on top of it.  This also provides excellent isolation between, say, MySQL and Apache for a LAMP stack.

It is a clean, fast OS using many modern concepts such as systemd and journald.  Some of these are only in the bleeding-edge distributions at the moment so many people may not be familiar with using them.  Luckily one of my machines is running Fedora 20 so I've had a play with these technologies before.

Etcd


CoreOS provides a clustered key/value store system called 'etcd'.  The name of this confused many people I spoke to before we tried it.  We all assumed it was a clustered file store for the /etc/ path on CoreOS.  We were wrong, although that is maybe the direction it will eventually take.  It is actually accessed through a REST-based interface.
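As a rough illustration of what talking to a REST-based key/value store looks like, here is a minimal Python sketch that builds (but does not send) the HTTP requests for setting and reading a key.  The address 127.0.0.1:4001 and the /v2/keys path are assumptions based on etcd's defaults at the time of writing and may differ on your setup; build_set_request and build_get_request are hypothetical helper names.

```python
import urllib.parse
import urllib.request

# Assumed client address and API version; adjust for your own cluster.
ETCD_BASE = "http://127.0.0.1:4001/v2/keys"


def build_set_request(key, value):
    """Build (but do not send) a PUT request that stores value under key."""
    # The value travels as a form-encoded body, e.g. b"value=Hello+world".
    body = urllib.parse.urlencode({"value": value}).encode("ascii")
    return urllib.request.Request(
        f"{ETCD_BASE}/{key.lstrip('/')}", data=body, method="PUT"
    )


def build_get_request(key):
    """Build (but do not send) a GET request that reads key."""
    return urllib.request.Request(f"{ETCD_BASE}/{key.lstrip('/')}", method="GET")
```

With a running etcd node, passing either request to urllib.request.urlopen() would send it and return the node's JSON reply.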

Etcd has been pretty much created as a new project from the ground up by the CoreOS team.  The project is written in Go and can be found on GitHub.  Creating a reliable clustered key/value store is hard, really hard.  There are so many edge cases that can cause horrific problems.  I cannot understand why the CoreOS team decided to roll their own instead of using one of the many that have been well-tested.

Under the hood the nodes communicate with each other using what appears to be JSON (REST) for internal admin commands and Google Protobufs over HTTP for the Raft Consensus Algorithm library used.  Whilst I commend them for using Protobufs in a few places, HTTP and JSON are both bad ideas for what they are trying to achieve.  JSON will cause massive headaches for protocol upgrades/downgrades and HTTP really wasn't designed for this purpose.

At the moment this appears to be designed more for very small scale installations instead of hundreds to thousands of instances.  Hopefully at some point it will gain its own protocol based on Protobufs or similar and have code to work with the many edge cases of weird and wonderful machine and network configurations.

Fleet


Fleet is another service written in Go and created by the CoreOS team.  It is still a very new project aimed at being a scheduler for a CoreOS cluster.

To use Fleet you basically create a systemd configuration file with an optional extra section to tell Fleet what CoreOS instance types it can run on and what it would conflict with.  Fleet communicates with Etcd and, via some handshaking, figures out a CoreOS instance to run the service on.  A daemon on that instance handles the rest.  The general idea is that you use this to have a systemd file to manage Docker instances.  There is also a kind-of hack so that it will notify/re-schedule when something has failed using a separate systemd file per service.

Whilst it is quite simple in design it has many flaws and for me was the most disappointing part of CoreOS so far.  Fleet breaks, a lot.  I must have found half a dozen bugs in it in the last few days, mainly around it getting completely confused as to which service is running in which instance.

Also the way that configurations are expressed to Fleet is totally wrong in my opinion.  Say, for example, you want ten MySQL Docker containers across your CoreOS cluster.  To express this in Fleet you need to create ten separate systemd files and send them up, even though those files are likely identical.

This is how it should work in my opinion:  You create a YAML file which specifies what a MySQL docker container is and what an Apache/PHP container is.  In this YAML you group these and call them a LAMP stack.  Then in the YAML file you specify that your CoreOS cluster needs five LAMP stacks, and maybe two load balancers.

Not only would my method scale a lot better but you can then start to have a front-end interface which would be able to accept customers.
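To make the idea a little more concrete, here is a rough Python sketch of the kind of expansion such a tool could perform.  Everything in it is hypothetical: the layout of SPEC (written as a Python dict rather than YAML to keep the example dependency-free), the image names, the haproxy load balancer and the unit template are all invented for illustration and are not part of Fleet.

```python
# Hypothetical unit template; a real one would need restart and dependency rules.
UNIT_TEMPLATE = """[Unit]
Description={name}

[Service]
ExecStart=/usr/bin/docker run --name {name} {image}
ExecStop=/usr/bin/docker stop {name}
"""

# Hypothetical declarative spec: what a container is, how containers group
# into stacks, and how many of each the cluster should run.
SPEC = {
    "containers": {
        "mysql": {"image": "example/mysql"},
        "apache-php": {"image": "example/apache-php"},
        "haproxy": {"image": "example/haproxy"},
    },
    "stacks": {"lamp": ["mysql", "apache-php"]},
    "deploy": {"stacks": {"lamp": 5}, "containers": {"haproxy": 2}},
}


def expand(spec):
    """Return {unit_filename: unit_text} for every container instance requested."""
    units = {}

    def add(container, instance_name):
        image = spec["containers"][container]["image"]
        units[f"{instance_name}.service"] = UNIT_TEMPLATE.format(
            name=instance_name, image=image
        )

    # One unit per container per requested copy of each stack.
    for stack, count in spec["deploy"]["stacks"].items():
        for i in range(1, count + 1):
            for container in spec["stacks"][stack]:
                add(container, f"{stack}-{i}-{container}")

    # Standalone containers, such as the load balancers.
    for container, count in spec["deploy"]["containers"].items():
        for i in range(1, count + 1):
            add(container, f"{container}-{i}")

    return units
```

Calling expand(SPEC) yields twelve unit files (five LAMP stacks of two containers each, plus two load balancers) from a single description, which is the repetition that otherwise has to be written out by hand.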

Conclusion


CoreOS is a very ambitious project that in some ways becomes the "hypervisor"/scheduler for a private Docker cloud.  It can easily sit on top of a cloud installation or on top of bare metal.  It requires a totally different way of thinking and I really think the ideas behind it are the way forward.  Unfortunately it is a little too early to be using it on anything more than a few machines in production, and even then it is more work to manage than it should be.