David Vedvick

Notes

Use Git to Manage Your Blog History!

One of the major problems of rolling your own weblog is properly managing the history of your posts.

The aim of this post is to elucidate how one can easily manage blog history, using only Git.

Why?

The best-known tools for managing the history of text documents have always been terrible. Yes, I'm speaking of WordPress, but also of commercial solutions like SharePoint and the version tracking that has been built into Microsoft Word for the longest time.

Here's a list of cons that I always think of when using these tools:

  1. They track history inconsistently
  2. Sometimes you can leave comments, sometimes you can't
  3. Diffing is usually either unavailable or built on proprietary/internal code that probably doesn't work well
  4. Content is obfuscated behind dense database models, XML formats, and/or binary formats
  5. Third-party tools are difficult to use with them
  6. Content management systems, which is what all blog engines are, need a security layer to control who can manage the blog. These security systems usually come riddled with bugs and security flaws.

Along comes lowly Git, the little DVCS tool that could, which fills the above gaps nicely. Combine it with a plain-text format such as Markdown, and you've got yourself a nice, versioned document management system.

However, it does come with its own set of cons:

  1. The Git learning curve
  2. Git doesn't natively take post metadata
  3. Git is a version control system, and thus doesn't track file metadata either, so the "true" file creation time and last modified time are not available
  4. Wrapping Git commands up in your favorite server-side language can sometimes be tricky
  5. Versioning doesn't happen automatically, but rather on intentional commits

None of this is a show-stopper, however. Yes, Git is ridiculous to learn. Yes, you can't get the "true" file creation time. But none of this bothered me much.

How?

This is how I did it with Node.js:

  1. Create a git repo (git init) where you want your posts to reside.

  2. Use a nice, sane format to store metadata about your posts. I'd personally go with at least a JSON-like format. Mine looks like this:

    title: Use Git to Manage Your Blog History
    author: vedvick
    description:
    ---
    

    The --- signals to the parser that the metadata section is complete.

  3. Grab the posts from a configured or constant location. This is my highly sophisticated version:

    glob(path.join(notesConfig.path, '*.md'), function (err, files) { ... });
    

    Following a simple convention of prefixing filenames with the date the post was created, such as 20151006-use-git-to-manage-your-blog-history.md, the server can then easily and reproducibly sort the files by creation date.

  4. Parsing the notes has a little sophistication to it. Here's the code used on my server in full:

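    var fs = require('fs');
    var path = require('path');
    var exec = require('child_process').exec;
    
    // notesConfig (with its path and gitPath settings) is assumed to be defined elsewhere on the server
    var parseNote = function (file, callback) {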
        parseNote.propMatch = /(^[a-zA-Z_]*)\:(.*)/;
    
        fs.readFile(file, 'utf8', function (err, data) {
            if (err) {
                callback(err);
                return;
            }
    
            var textLines = data.split('\n');
    
            var fileName = path.basename(file, '.md');
            var newNote = {
                created: null,
                pathYear: fileName.substring(0, 4),
                pathMonth: fileName.substring(4, 6),
                pathDay: fileName.substring(6, 8),
                pathTitle: fileName.substring(9)
            };
    
            var lineNumber = 0;
            for (var i = lineNumber; i < textLines.length; i++) {
                lineNumber = i;
                var line = textLines[i];
    
                if (line.trim() === '---') break;
    
                var matches = parseNote.propMatch.exec(line);
                if (!matches) continue;
    
                var propName = matches[1];
                var value = matches[2].trim();
    
                switch (propName) {
                    case 'created_gmt':
                        newNote.created = new Date(value);
                        break;
                    case 'title':
                        newNote.title = value;
                        break;
                }
            }
    
            newNote.text = textLines
                                .slice(lineNumber + 1)
                                // add back in the line returns
                                .join('\n');
    
            if (newNote.created !== null) {
                callback(null, newNote);
                return;
            }
    
            if (!notesConfig.gitPath) {
                // Date months are zero-based, so subtract 1 from the month taken from the file name
                newNote.created = new Date(newNote.pathYear, newNote.pathMonth - 1, newNote.pathDay);
                callback(null, newNote);
                return;
            }
    
            exec('git -C "' + notesConfig.gitPath + '" log HEAD --format=%cD -- "' + file.replace(notesConfig.path + '/', '') + '" | tail -1',
                function (error, stdout, stderr) {
                    if (error !== null) {
                        callback(error);
                        return;
                    }
    
                    newNote.created = new Date(stdout);
                    callback(null, newNote);
                }
            );
        });
    };
    

    The neatest part here (and where Git, or any other version control system, shines) is using it to determine the note's creation date:

    exec('git -C "' + notesConfig.gitPath + '" log HEAD --format=%cD -- "' + file.replace(notesConfig.path + '/', '') + '" | tail -1',
         function (error, stdout, stderr) {
             if (error !== null) {
                 callback(error);
                 return;
             }
    
             newNote.created = new Date(stdout);
             callback(null, newNote);
         }
     );
    

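    Note how this doesn't actually return the true "created" timestamp of the file. git log lists the commits that touched the file newest first, tail -1 keeps the oldest of them, and %cD prints that commit's committer date, so what comes back is the time the post was first committed. In my opinion, that is close enough.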

  5. When drafting a new post, create a new branch so the draft can be worked on in isolation without affecting work on other posts (for example, I published an entirely different post while drafting this one). For branch names I follow another convention: post/<post-name-here>. Of course, the convention is optional, but I think at the very least it encourages consistency.

  6. Finally, merge finished posts into master and push master to a web server. On the server, add a post-receive hook that checks out master to the location determined above: GIT_WORK_TREE=<note-location> git checkout -f
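
The hook in the last step can be any executable, so it could be written in Node.js like the rest of the server. What follows is a minimal sketch, not the hook used on this site: the `NOTE_LOCATION` value, the helper names, and the choice to deploy only when `master` is pushed are all assumptions made for illustration. Git feeds a post-receive hook one line per updated ref on standard input, in the form `<old-sha> <new-sha> <ref-name>`.

```javascript
var execFile = require('child_process').execFile;

// Hypothetical checkout target: stands in for <note-location>.
var NOTE_LOCATION = '/path/to/notes';

// Returns true when the hook input (one "<old> <new> <ref>" line per
// pushed ref) includes an update to the given branch.
function updatesBranch(hookInput, branch) {
    return hookInput.split('\n').some(function (line) {
        var parts = line.trim().split(/\s+/);
        return parts.length === 3 && parts[2] === 'refs/heads/' + branch;
    });
}

// Force-checks out the branch into the work tree, mirroring
// GIT_WORK_TREE=<note-location> git checkout -f
function deploy(workTree, branch, callback) {
    var env = Object.assign({}, process.env, { GIT_WORK_TREE: workTree });
    execFile('git', ['checkout', '-f', branch], { env: env }, callback);
}

// Collects the hook input, then deploys only if master was pushed.
function runHook(stdin, workTree) {
    var input = '';
    stdin.setEncoding('utf8');
    stdin.on('data', function (chunk) { input += chunk; });
    stdin.on('end', function () {
        if (!updatesBranch(input, 'master')) return;
        deploy(workTree, 'master', function (err) {
            if (err) {
                console.error('checkout failed: ' + err.message);
                process.exitCode = 1;
            }
        });
    });
}

// Git exports GIT_DIR to the hooks it runs, so this guard keeps the
// script idle when it is loaded anywhere else.
if (process.env.GIT_DIR) {
    runHook(process.stdin, NOTE_LOCATION);
}
```

To try it, save the script as `hooks/post-receive` in the server-side repository, add a `#!/usr/bin/env node` first line, and mark the file executable. With the `master` check in place, pushing a `post/...` draft branch leaves the published files alone.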

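The filename convention from step 3 carries enough information to sort posts and to derive a fallback creation date without touching Git. The sketch below assumes filenames shaped like `20151006-use-git-to-manage-your-blog-history.md`. The helper names `dateFromFileName` and `sortNewestFirst` are hypothetical and not part of the server code above, and two of the sample filenames are made up. One thing to watch for: JavaScript's `Date` constructor counts months from 0, so the month parsed from the filename has to be reduced by one.

```javascript
var path = require('path');

// Builds a local-time Date from a "YYYYMMDD-title.md" file name.
// Returns null when the name does not start with eight digits and a hyphen.
function dateFromFileName(file) {
    var match = /^(\d{4})(\d{2})(\d{2})-/.exec(path.basename(file));
    if (!match) return null;

    var year = parseInt(match[1], 10);
    var month = parseInt(match[2], 10);
    var day = parseInt(match[3], 10);

    // Date months are zero-based: October is 9, not 10.
    return new Date(year, month - 1, day);
}

// Returns a new array sorted newest first. Because the prefix is a
// fixed-width YYYYMMDD string, plain string comparison matches date order.
function sortNewestFirst(files) {
    return files.slice().sort(function (a, b) {
        var nameA = path.basename(a);
        var nameB = path.basename(b);
        if (nameA === nameB) return 0;
        return nameA < nameB ? 1 : -1;
    });
}

// Sample input; the first and last names are invented for illustration.
var files = [
    'notes/20150102-first-post.md',
    'notes/20151006-use-git-to-manage-your-blog-history.md',
    'notes/20150615-another-post.md'
];

var sorted = sortNewestFirst(files);
var created = dateFromFileName(sorted[0]);
console.log(sorted[0], created.getFullYear(), created.getMonth() + 1, created.getDate());
```

Sorting on the name alone keeps the ordering reproducible across machines, since it does not depend on filesystem timestamps that Git never stores.
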
Note posted on Tuesday, October 6, 2015 7:21 AM CDT