Introduction and A Simple Website Toolchain

After using WordPress as the blogging platform for my Erasmus exchange blog, I got interested in writing the occasional rant and publishing it on the interwebs. So here it is, my new blog on a simple webspace.

WordPress is nice, and I was quite impressed by its administration interface. But for a small blog, it seemed to be just too much. Setting up a WordPress instance and maintaining it to keep it as secure as possible was too much of a hassle, so I implemented my own dumbed-down static HTML toolchain.

I wanted to support only a small set of features and tried to use existing tools as much as possible. My plan was to..

I will explain my small toolchain in this first blog entry and describe its usage. To avoid repeating what has already been written elsewhere on the web, I will only describe what I deem interesting. If you are missing information, write me an email.

Markdown To HTML

The main tool in this toolchain is a simple bash script, which converts a set of Markdown annotated files into HTML using the converter pandoc. To use this script, you can issue a simple call on the command line:

$ *.md

If the current directory contains text files with the file extension .md, this will create an HTML file for each of them. Each generated file contains a navigation section with a link to each of the other generated files, ordered by date. The date must be supplied as the first line of the text file. You can use % to comment out that line if you want to; this shouldn't influence sorting.
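
To make the idea more concrete, here is a minimal sketch of such a conversion loop. It is not the actual script: the function names are made up for this example, the navigation section is left out, and it assumes the dates are written in ISO format (for example 2011-03-14) so that a plain sort orders them correctly.

```bash
#!/usr/bin/env bash
# Sketch only, not the original script. All function names are hypothetical.

# Print the date from the first line of a post, dropping an optional
# leading "%". Assumption: dates look like 2011-03-14.
post_date() {
    head -n 1 "$1" | sed -e 's/^%[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Print the given files, one per line, ordered by date (oldest first).
posts_by_date() {
    local file
    for file in "$@"; do
        printf '%s %s\n' "$(post_date "$file")" "$file"
    done | sort | cut -d ' ' -f 2-
}

# Convert every given Markdown file into an HTML fragment next to it.
convert_all() {
    local file
    for file in "$@"; do
        pandoc --from markdown --to html "$file" > "${file%.md}.html"
    done
}
```

Calling convert_all *.md would then produce one .html file per post, and posts_by_date *.md gives the order in which the navigation links could be emitted.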

Deploying The Files

Now it gets funky. The shell script above is very straightforward, as it only uses some bash magic to wrap the output of pandoc with HTML. To complete a rather simplified CMS, some deployment mechanism is needed. This is where the git version control system comes into play. I use it for my local "blog repository" to keep track of the files, and I added a post-commit hook to automatically deploy the blog as soon as I commit an update to a blog entry.

The directory structure of this website is very simple, but it needs some explanation to make sense of the hook below. The site consists of two directories: a top-level directory containing index.html and other top-level files, and a subdirectory called blog, which contains the blog.

The script that implements this is kind of a hack, and I would appreciate comments on how to simplify it. Its task can be summarized as follows:

  1. Create the HTML pages from the Markdown files
  2. Copy all CSS and JS files into the subdirectory containing the blog. I do this because I need those files in both the top-level directory and the subdirectory, but didn't feel like adding more code to the static_pages script for referencing the files from another directory.
  3. After copying, create a simple .tar.gz archive, upload it using scp, and unpack it using an ssh remote command
  4. Delete the remaining temporaries
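
The copy step in item 2 boils down to a set difference: which CSS and JS files exist in the top-level directory but not in blog/? As one possible simplification, the following sketch computes that list without parsing the output of ls. The function name missing_assets is hypothetical and not part of my hook.

```bash
#!/usr/bin/env bash
# Sketch only: missing_assets is a hypothetical helper, not part of the hook.

# Print the names of all CSS and JS files that exist in the first
# directory but not in the second one.
missing_assets() {
    local top=$1 sub=$2 path name
    for path in "$top"/*.css "$top"/*.js; do
        # An unmatched glob stays a literal pattern, so skip non-files.
        [ -e "$path" ] || continue
        name=$(basename "$path")
        [ -e "$sub/$name" ] || printf '%s\n' "$name"
    done
}
```

With this, COPY=$(missing_assets . blog) would yield the files to copy temporarily and to remove again afterwards.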

Another option would be to post-process the generated files with sed to fix the paths. I didn't implement this, for the sake of learning some more bash scripting.
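
For the curious, the sed variant could look roughly like the sketch below. It assumes that the generated pages reference their assets with plain relative names such as href="style.css" or src="nav.js"; I have not checked this against every page, and the function name fix_asset_paths is made up for this example.

```bash
#!/usr/bin/env bash
# Sketch only: assumes assets are referenced by bare file names,
# e.g. href="style.css" or src="nav.js".

# Read HTML on stdin and prefix local CSS/JS references with "../".
# References containing a slash or a colon (paths, URLs) are left alone.
fix_asset_paths() {
    sed -E 's#(href|src)="([^"/:]+\.(css|js))"#\1="../\2"#g'
}

# Possible usage for a page in blog/ (file name is hypothetical):
#   fix_asset_paths < blog/post.html > blog/post.html.tmp &&
#       mv blog/post.html.tmp blog/post.html
```

This would make the temporary copies unnecessary, at the price of rewriting every generated page in the subdirectory.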

The Hook

This is the script that implements the task described above:

# Create HTML from Markdown files and upload them to my account at the HU Berlin

# run the script twice to generate different navigation menus
/home/evnu/bin/ *.md
cd blog
/home/evnu/bin/ *.md
cd -

# create an upload archive

## copy CSS and JS files that exist in the top directory and that don't exist
## in blog/

COPY=$(comm -13 \
        <(ls -l blog/ | grep -E "\.css|\.js" | awk '{print $9;}' | sort) \
        <(ls -l *.{css,js} | awk '{print $9;}' | sort))

### doesn't overwrite anything.
for file in $COPY; do
    cp "$file" blog
done

## find all files to create the archive
find . -name "*.html" -o -name "*.css" -o -name "*.js" | tar -czf archive.tgz --files-from -

## remove the temporary files
for file in $COPY; do
    rm "blog/$file"
done

# copy to the remote host and unpack
scp archive.tgz mamuelle@<domain>:~/.public_html/
ssh mamuelle@<domain> tar xfz .public_html/archive.tgz -C .public_html

# remove the archives and eat errors. i don't want to see them.
ssh mamuelle@<domain> rm .public_html/archive.tgz 2> /dev/null
rm archive.tgz

# set the rights on the remote host
ssh mamuelle@<domain> chmod a+r -R .public_html

The script could use some revamping, of course: it should provide a trap handler to clean up when the user hits Ctrl+C, some refactoring to avoid using find twice would be nice, and more status output might be convenient. But it does the trick for now, so I am fine with it.
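
The trap handler mentioned above could follow the usual pattern shown in this sketch. It is not part of the hook yet; the function name deploy_with_cleanup is hypothetical, and the upload itself is passed in as a command so that the example stays self-contained.

```bash
#!/usr/bin/env bash
# Sketch only: shows the trap pattern, not the actual upload.

# Run the given command with the path of a temporary archive appended
# as its last argument. The body runs in a subshell, so the traps and
# the exit calls only affect this function.
deploy_with_cleanup() (
    archive=$(mktemp) || exit 1
    # Remove the archive whenever the subshell ends, for whatever reason.
    trap 'rm -f "$archive"' EXIT
    # Turn Ctrl+C (INT) and TERM into a regular exit so the EXIT trap runs.
    trap 'exit 130' INT
    trap 'exit 143' TERM
    "$@" "$archive"
)

# Possible usage, with upload_archive being a hypothetical function
# that fills and uploads the archive it is given:
#   deploy_with_cleanup upload_archive
```

Because the cleanup lives in the EXIT trap, the temporary file disappears after a successful run as well as after an interrupted one.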

Corner Cases, Issues And To Dos

Some issues remain with the toolchain. One important issue is that the script won't work properly with filenames containing whitespace. I didn't fix this, as I never use whitespace in filenames, and you shouldn't either. One useful addition to the hook above would be an implementation that only uploads the files when merging into a deployment branch. This would resemble the "publish" feature known from other CMSs.
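
A rough sketch of how such a branch check might look follows. The branch name "deploy" and the function names are assumptions made for this example; nothing like this exists in my repository yet.

```bash
#!/usr/bin/env bash
# Sketch only: "deploy" is an assumed branch name.
DEPLOY_BRANCH="deploy"

# Succeed only if the given branch name is the deployment branch.
is_deploy_branch() {
    [ "$1" = "$DEPLOY_BRANCH" ]
}

# Print the name of the currently checked-out branch.
current_branch() {
    git symbolic-ref --short HEAD
}

# In a hook such as .git/hooks/post-merge, one could then write:
#   is_deploy_branch "$(current_branch)" || exit 0
#   ... build and upload as before ...
```

Commits on any other branch would then stay local until they are merged into the deployment branch.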