Dreaming up Zotonic Deployment with Github

Zotonic would benefit from a module that allows deployment by pushing to Github. This module could also accommodate periodic database backups to preserve content changes.

Though it may be fun, it is not ideal to dive right into implementing such a beast, so I will start by exploring the design. This module will be called mod_github_sync.

To handle development changes that are pushed to Github effectively, there are two high-level options: polling and event-driven handling. As with most things, it is preferable to do this in an event-driven manner. I have a functioning polling-based synchronization running for Verafin's website. It uses a simple cron-based approach, which suffers from the obvious flaw that it requires admin access. Either way you slice it, you run into issues with operations burden or security gaps. As a result, I will be making mod_github_sync use an event-driven system directly in Zotonic.

How do you go about getting an event trigger when someone pushes to a Github repository? Github supports Post-Receive Hooks, which let a repository owner or administrator set a URL to POST to when new objects are received (basically when a push occurs). The other side of this is a mechanism within Zotonic to accept this POST and trigger a background pull and rebuild of the code and templates. In my experience the best way to do this is with a Webmachine resource. In order to support a similar requirement for PayPal's Instant Payment Notification (IPN) I wrote such a resource in mod_paypal. It seems like a simpler rendition of that resource could work.

What other issues need to be considered? There are certainly some concerns about blocking Github with a slow resource. A clear solution to that is to spawn a process or better yet use Zotonic's event system to trigger the long-running bits.
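To make the spawn option concrete, here is a minimal sketch of a handler that hands the slow work to a new process and returns right away. The module name push_hook_sketch, the function handle_push/2 and the {accepted, Pid} return value are all made up for illustration; none of them come from Zotonic or Webmachine.

```erlang
-module(push_hook_sketch).
-export([handle_push/2]).

%% Hypothetical entry point that the POST-handling resource would call.
%% Payload is whatever the hook sent; SyncFun is a fun/1 that performs
%% the slow pull and rebuild. Both names are assumptions for this sketch.
handle_push(Payload, SyncFun) when is_function(SyncFun, 1) ->
    %% spawn/1 returns immediately, so the HTTP response is not held up
    %% while the pull and rebuild run in the new process.
    Pid = spawn(fun() -> SyncFun(Payload) end),
    {accepted, Pid}.
```

The resource would call something like push_hook_sketch:handle_push(Body, fun do_sync/1) and reply to Github straight away, where do_sync/1 is again a placeholder. Note that a crash in the spawned process goes unnoticed here, which is one reason the event system may be the better home for this.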

Another problem is how to handle the content backup aspect of mod_github_sync. There was some talk on zotonic-developers a while back about a cron-esque system for running scheduled tasks in Zotonic. That could work, but it's a concept, not a product. Another alternative would be to borrow bits of mod_backup, since it does operate on a schedule. Generally speaking, it is fairly easy to do a roughly timed operation in Erlang anyway using a receive ... after expression.
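For reference, a roughly timed loop built on receive with an after clause can be sketched as follows. The module name periodic_sketch and its start/2 and stop/1 functions are hypothetical, not part of Zotonic or mod_backup.

```erlang
-module(periodic_sketch).
-export([start/2, stop/1]).

%% Start a process that runs Task (a fun/0) about every IntervalMs
%% milliseconds until it is told to stop.
start(IntervalMs, Task)
  when is_integer(IntervalMs), IntervalMs > 0, is_function(Task, 0) ->
    spawn(fun() -> loop(IntervalMs, Task) end).

%% Ask the loop to exit the next time it is waiting.
stop(Pid) ->
    Pid ! stop,
    ok.

loop(IntervalMs, Task) ->
    receive
        stop ->
            ok
    after IntervalMs ->
        %% No stop message arrived within the interval, so run the task.
        %% catch keeps one failing run from killing the loop.
        _ = (catch Task()),
        loop(IntervalMs, Task)
    end.
```

An hourly backup would be periodic_sketch:start(60 * 60 * 1000, fun backup/0), with backup/0 standing in for the real work. The timing is only rough: each cycle lasts the interval plus however long the task takes.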

The final consideration I can think of right now is the synchronization point at which the database can be backed up safely. My assumption going in is that the database Zotonic uses is always in a consistent state as viewed by other clients. Based on this assumption I intend to run pg_dump from os:cmd/1 to write the SQL from the database to a tracked file within the site directory.
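A minimal sketch of that call, assuming pg_dump is on the PATH and can connect without prompting for a password. The module name dump_sketch, its functions and the example arguments are invented for illustration. As far as I know pg_dump reads from a single snapshot, so the dump should be consistent even while the site is in use, which fits the assumption above.

```erlang
-module(dump_sketch).
-export([dump_command/2, dump/2]).

%% Build the shell command that writes DbName as SQL to OutFile.
%% Both arguments are plain strings (lists of characters).
dump_command(DbName, OutFile) ->
    lists:flatten(["pg_dump --file=", quote(OutFile), " ", quote(DbName)]).

%% Run the dump. os:cmd/1 returns whatever the command printed as a
%% string; it does not report the exit status.
dump(DbName, OutFile) ->
    os:cmd(dump_command(DbName, OutFile)).

%% Wrap an argument in single quotes for the shell, escaping any single
%% quotes it already contains.
quote(Arg) ->
    "'" ++ lists:flatmap(fun($') -> "'\\''"; (C) -> [C] end, Arg) ++ "'".
```

For example, dump_sketch:dump_command("mysite", "/tmp/site.sql") gives "pg_dump --file='/tmp/site.sql' 'mysite'". Because os:cmd/1 hides the exit status, a real version would need some other check, such as treating any output as a sign of trouble.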

However, there is another obvious risk: running the scheduled backup to Github while processing a Post-Receive and attempting to pull. Git's locking might in theory handle this, but it seems imprudent to rely on it when I could do the synchronization myself. To synchronize in Erlang, both tasks would go through a common process that represents the lock. It seems very likely that Zotonic has some site-based synchronization techniques that can be used.
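As a rough illustration of the common-process idea, the sketch below runs every submitted job inside one process, so the backup and the pull can never overlap. The module name lock_sketch and its functions are hypothetical, and this is not how Zotonic does it; a gen_server handling the jobs in handle_call would serialize them in much the same way.

```erlang
-module(lock_sketch).
-export([start/0, with_lock/2]).

%% Start the process that represents the lock.
start() ->
    spawn(fun loop/0).

%% Run Fun (a fun/0) inside the lock process and wait for its result.
%% Jobs are handled one message at a time, so they cannot overlap.
with_lock(Lock, Fun) when is_function(Fun, 0) ->
    Ref = make_ref(),
    Lock ! {run, self(), Ref, Fun},
    receive
        {Ref, Result} -> Result
    end.

loop() ->
    receive
        {run, From, Ref, Fun} ->
            %% catch turns a crash in Fun into a returned value, so the
            %% lock process survives and the caller is not left waiting.
            From ! {Ref, (catch Fun())},
            loop()
    end.
```

Both the scheduled backup and the Post-Receive handler would then wrap their git work in lock_sketch:with_lock(Lock, Fun). The caller waits forever if the lock process is gone, so a real version would want a timeout or a monitor.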

Alright, it is settled! Next time I get a solid chunk of time to sit down and program, I am going to build mod_github_sync using Post-Receive Hooks to keep in sync with development and receive ... after to schedule hourly content backups. I will run the synchronization issue by zotonic-developers.