Friday, 18 February 2011

Last Week in Drizzle

It has been a while since we have done one of these so I thought I might try and resurrect the tradition.  So here is my first "Last Week in Drizzle".

Replication


The original plan for replication was to use a Tungsten Replicator based solution to transfer the transaction logs (similar to MySQL's binary logs).  Unfortunately this can't be completed in time for the GA release so we have switched to a master-slave solution similar to MySQL.  The Tungsten solution is still something we plan to finish though.

Replication events are stored using Google Protocol Buffer messages in an InnoDB table, these events are read by the slave, stored locally and applied.  The advantage of the Google Protocol Buffer messages is a script or program can be knocked up in pretty much any language in minutes to read our replication log.

Unfortunately this sudden change in replication method means we could not complete the slave code in time for the RC release, which in turn means we are creating a second RC release as explained further down in this blog post.

Special thanks goes to David Shrewsbury, Patrick Crews and Joe Daly for making this happen.

New Release


Our first RC has been released this week.  In this release we have:

  • Drizzle server can now fork to background via. --daemon.  This was primarily implemented to help RedHat/Fedora init.d scripts.

  • Implicit Cartesian Joins no longer work this is to prevent runaway queries.

  • Improvements to the replication transaction log.

  • Many other bug fixes and improvements.


Race to GA


Due to the late entry replication code we intend to have one more RC whist we test it to death in as many horrid ways as Patick Crews can find.  So the current release plan for Drizzle7 is now:

RC2 - 28th February 2011
GA - 14th March 2011

New RPM Repository


Derks has created a new RPM repository for us at rpm.drizzle.org, more details on this can be seen here.

Windows Jenkins Slave


Monty Taylor has created a Windows slave for our Jenkins Continuous Integration testing system.  This means we now test every trunk merge for libdrizzle regressions in Windows.

Docs Day


On Monday 21st February we have our docs day, the developers will be reviewing the entire docs site for technical errors and any improvements that can be made (such as missing topics).  We encourage anyone who would like to improve the quality of our docs to join in this effort, contact us on #drizzle on Freenode or file a bug if you spot anything we could improve on.

Update 2011-02-20: We are postponing this until Wednesday 23rd February due to documentation merges which won't have quite hit trunk by Monday.

Final Thoughts


I'm going to try and do one of these every week, so if you have any feedback or topics you would like me to cover, please let me know.

16 comments:

  1. [...] This post was mentioned on Twitter by drizzlenews, Andrew Hutchings. Andrew Hutchings said: Last Week in Drizzle http://j.mp/hrJNzp [...]

    ReplyDelete
  2. Hi, weren't there originally plans to use gearman for replication? Or does this still apply and you only don't mention it? Thanks!

    ReplyDelete
  3. Yes, that is a good point. We were also looking into Gearman and RabitMQ (and I believe we are at least some of the way to having those complete too). I believe we went we Tungsten because one of the developers there is helping us with it.

    David Shrewsbury knows much more about this than I do, he would be a good person to ask about it. I only stepped into our replication recently when fixing the InnoDB transaction log table.

    ReplyDelete
  4. What is the schema for the InnoDB table used to store replication events?

    ReplyDelete
  5. Hi Mark,
    It is stored in InnoDB's internal dictionary and exposed via. DATA_DICTIONARY.

    ReplyDelete
  6. I should also note that the table is called DATA_DICTIONARY.SYS_REPLICATION_LOG which will show when you use --innodb.replication-log=1. In the RC release DDL against the table is blocked but not DML writes (I've literally just pushed up a patch for this which combines with some work Brian has done to stop replication on individual tables).

    So in this RC release DML writes against that table is bad. But in future it can be used to help prune the log.

    There is also a DATA_DICTIONARY.INNODB_REPLICATION_LOG which shows the SYS_REPLICATION_LOG events in a human-readable form.

    ReplyDelete
  7. Hi Pavel,

    There currently are plugins for replicating through Gearman (I haven't used that personally) as well as RabbitMQ supplied by the community. Unless the contributors of these plugins have kept them up to date with the latest development changes, I'm not sure how well those still work.

    We were originally looking to have Tungsten Replicator as our end-to-end solution until we could get a native solution. However, we went ahead and did a major push to get a native solution in place for GA. I expect work on Tungsten support to continue by the community developer leading that effort.

    ReplyDelete
  8. Speaking from the Tungsten dev side, we might want to try to find somebody to sponsor Tungsten support for drizzle.

    One issue with InnoDB that I think you'll run into is that large scale applications will likely be pretty slow because you have a lot of SQL processing and commit overhead. Another problem is that storing large quantities of binary data causes massive tablespace bloat and is quite difficult to manage on live systems. (Example: pruning multi-hundred million line tables is very painful on live systems and can lock the table or worse still cause replicator failures due to lock wait timeouts.) You may have fixes for these issues; just want to let you know about a couple of our scars from earlier versions of Tungsten.

    Meanwhile on the Tungsten side we are working to open up replication fully under GPL V2 which will expose the Tungsten disk log, which is fast, self-managing, and works well when you have billions of transactions in the log. That will be quite beneficial for large-scale usage.

    ReplyDelete
  9. We would welcome any help from the Tungsten side to work with it. I believe we are most of the way there too. I wasn't initially working on replication so I don't know the current state of play.

    Well, we have solved a lot of these problems I believe. For starters the message is committed along with the actual transaction (apart from DDL). Also the transaction message almost always contains row changes rather than SQL (I think ALTER TABLE is currently the only exception). Although tablespace bloat is possible. Pruning possibly could cause deadlocking issues if it isn't done correctly.

    ReplyDelete
  10. Andrew - Can you describe the schema here?

    ReplyDelete
  11. CREATE TABLE `SYS_REPLICATION_LOG` (
    `ID` BIGINT,
    `SEGID` INT,
    `COMMIT_ID` BIGINT,
    `END_TIMESTAMP` BIGINT,
    `MESSAGE_LEN` INT,
    `MESSAGE` BLOB,
    PRIMARY KEY (`ID`,`SEGID`) USING BTREE,
    KEY `COMMIT_IDX` (`COMMIT_ID`,`ID`) USING BTREE
    ) ENGINE=InnoDB COLLATE = binary REPLICATION = FALSE

    So ID is a transaction ID, SEGID is segment ID (when messages are broken up). The rest hopefully should make sense. REPLICATION = FALSE means "don't log changes to this table"

    I had to make a slight change to the InnoDB internal parser to support 8byte INTs to do this.

    ReplyDelete
  12. How is commit_id different from id?

    ReplyDelete
  13. Joe Daly added that so that we could number commit sequence and we could identify commit order.

    ReplyDelete
  14. Hi

    I was close to getting tungsten working a couple of months ago but found myself in a refactoring rabbit hole due to the fact that drizzle can interleave transactions in the log and ran out of time due to day-job-crap

    i'll give it another go, whenever time allows

    ReplyDelete
  15. How does it affect performance to have everyone committing to this table? This seems to offer some opportunities to create serialization points. In Tungsten, we address commit overhead using block commits on updates, but given that you commit with user transactions you won't have that option. Mark C and others can comment from experience; I'm just guessing.

    You might want to run sysbench to put a few hundred million transactions into the system and see how things behave. 50M transactions a day is not unusual in busy systems; if you assume 7 days worth of logs that gets you to 350M rows. Try setting to hold them at 7 days (e.g., with a cron job to trim old xacts) and see (a) how big the InnoDB tablespace/file grows and (b) impact on the system when purges run while the system is fully loaded. I would guess for small systems that have small InnoDB buffer cache sizes you will run out of space for locks. :(

    I'm not shooting arrows BTW. We used InnoDB for the Tungsten store though and it caused us a lot of pain.

    ReplyDelete
  16. Hmm... I don't know if sysbench and our other suites have been run with the transaction log recently. I'll try and get this done on our sysbench machines early next week.

    ReplyDelete