Strategy for versioning documents?

I have to build an application that will manage text documents. Versioning those documents would be awesome.

I could implement a naive versioning strategy: a cron job that runs every night, checks whether the current version differs from yesterday's and, if so, saves yesterday's version among the old versions.
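For illustration, a minimal sketch of that nightly check, assuming a simplified variant where the current text is stored as the newest snapshot whenever it differs from the last one. The in-memory `history` object and the name `snapshotIfChanged` are hypothetical stand-ins for whatever storage the application really uses.

```typescript
// Hypothetical in-memory store; a real application would use its database.
interface Snapshot {
  savedAt: Date;
  content: string;
}

const history: Record<string, Snapshot[]> = {};

// Stores a full copy of `content` only when it differs from the latest snapshot.
// Returns true when a new snapshot was written.
function snapshotIfChanged(docId: string, content: string, now: Date = new Date()): boolean {
  const versions = history[docId] || [];
  const latest = versions[versions.length - 1];
  if (latest !== undefined && latest.content === content) {
    return false; // nothing changed since the last snapshot
  }
  versions.push({ savedAt: now, content });
  history[docId] = versions;
  return true;
}
```

A nightly job would call `snapshotIfChanged` once per document; every stored entry is a complete copy, so no version depends on another.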

But I’d love to be able to implement something more Git-like, except I am not sure where to start.

=> Exactly Git-like is not really an option. I need each stored file to be a complete snapshot, not a collection of references to previous states of itself.

Any leads?

I think what you have is actually better than using Git. I believe one Meteor project, at one of the live events, said they were also thinking of using Git but opted to build their own too. [I’ll update this reply once I figure out the name]

The reason I say it is better is that it would likely satisfy your business scenario. Your users are not developers and would not really know what a proper versioning tool does. You should be able to make it more fine-grained and, say, save every 30 minutes. If space is a concern, there’s https://github.com/kpdecker/jsdiff, which may help with diffing and patching text. I haven’t tried it out; I just did a search for "javascript diff". I’d recommend you leave that space optimization for later. I think you can manage the metadata on your own, so @rhywden’s suggestion of flitbit/diff may be a bit overkill for what you need. Then again, you could use it on a sub-document and be a bit more future-proof in case you decide to version more than plain text.

In your update, though, you stated that you want full copies. I agree with you on that; it is simpler. What I would still do is keep incremental diffs at least every 30 minutes, so a user can go back to a previous version within the day, and store a full snapshot at the end of the day.
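A rough sketch of how that mixed policy could be decided on each timer tick. Everything here is an assumption for illustration: the `Revision` shape, the `planRevision` name, and the `MakePatch` signature, which stands in for whichever diff library is chosen rather than any real library's API.

```typescript
// A stored revision is either a full copy or a patch against the previous revision.
type Revision =
  | { kind: "snapshot"; at: Date; content: string }
  | { kind: "patch"; at: Date; patch: string };

// Hypothetical signature for whatever diff library ends up being used.
type MakePatch = (oldText: string, newText: string) => string;

// Decides what to store on a timer tick, or returns null when nothing changed.
// `lastRevisionText` is the text as of the most recent stored revision;
// `lastSnapshotText` is the text of the most recent full snapshot.
function planRevision(
  current: string,
  lastRevisionText: string,
  lastSnapshotText: string,
  isEndOfDay: boolean,
  makePatch: MakePatch,
  at: Date = new Date()
): Revision | null {
  if (isEndOfDay) {
    // End of day: keep a complete copy unless the last snapshot is already identical.
    return current === lastSnapshotText ? null : { kind: "snapshot", at, content: current };
  }
  // During the day: store only the difference from the previous revision.
  return current === lastRevisionText
    ? null
    : { kind: "patch", at, patch: makePatch(lastRevisionText, current) };
}
```

Keeping the diff function as a parameter means the space optimization can be swapped in later without touching the scheduling logic.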

In addition, I would recommend letting users create a version explicitly, but giving them only two options, much like SharePoint:

  • Major (1.9 -> 2.0)
  • Minor (1.10 -> 1.11)

And when the user creates an explicit version, a full copy is stored.
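The major/minor numbering above could be sketched like this. The `bumpVersion` name and the "major.minor" string format are assumptions for illustration; the one thing to watch is that the parts are integers, so 1.9 is followed by 1.10 rather than 2.0.

```typescript
type Bump = "major" | "minor";

// Parses "major.minor" labels such as "1.9"; both parts are integers, not decimals.
function bumpVersion(version: string, bump: Bump): string {
  const match = /^(\d+)\.(\d+)$/.exec(version);
  if (match === null) {
    throw new Error(`Invalid version label: ${version}`);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  // A major bump resets the minor part to 0; a minor bump leaves the major part alone.
  return bump === "major" ? `${major + 1}.0` : `${major}.${minor + 1}`;
}
```

For example, `bumpVersion("1.9", "major")` gives `"2.0"` and `bumpVersion("1.10", "minor")` gives `"1.11"`.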

If you want to store the difference between two (text) files, the fastest, easiest and most likely least bug-prone way would be to use an existing diff tool.

As a last resort, you could use the command-line diff and parse the result.

I would not recommend rolling your own - there’s a metric manureton of stuff you’d probably overlook and which would bite you at some point. Unix vs. Mac vs. Windows line endings would be just one example.

Something like this, for example: https://github.com/flitbit/diff
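To illustrate the line-ending pitfall mentioned above: if the same text arrives once with Windows endings and once with Unix endings, a plain string comparison treats it as changed. A small sketch, with hypothetical helper names, that normalizes before comparing or diffing:

```typescript
// Converts Windows (\r\n) and old Mac (\r) line endings to Unix (\n).
function normalizeLineEndings(text: string): string {
  return text.replace(/\r\n?/g, "\n");
}

// Compares two texts while ignoring differences in line-ending style only.
function sameContent(a: string, b: string): boolean {
  return normalizeLineEndings(a) === normalizeLineEndings(b);
}
```

This only covers one of the many edge cases, which is the argument for leaning on an existing diff library for the rest.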