03 Feb 2017, 19:07

Hugo, the static site generator


In my last post, I covered the rationale behind using a static site generator. Static site generators are not just for creating blogs. They can also be used to create online resumes, company sites, online documentation, etc.

The default choice of static site generator is Jekyll, which has the most support, but it’s troublesome to install and use. Hugo is a popular alternative that is easier to install and faster to work with. It’s implemented in Golang, a.k.a. Go. This means it is written in a statically compiled language (The Best Kind) and is completely dependency-free. Dependency hell is the bane of my existence. It’s like work that you have to do before you can start working. Anyway, let’s look at how to get started.

Hugo Install Process (MacOS)

This is so simple, and its simplicity is the reason why I went with Hugo after trying the more popular Jekyll, which was a mess.

brew update && brew install hugo
hugo new site myBlog
cd myBlog
git clone https://github.com/dplesca/purehugo.git themes/purehugo
echo 'theme = "purehugo"' >> config.toml   # TOML string values must be quoted

Creating or customizing themes is beyond the scope of this post, but what we are doing here is “installing” a pre-baked Hugo theme, and then setting it as our default.
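One catch with appending settings via `echo ... >> config.toml`: run a line twice and the key appears twice, and TOML does not allow the same key to be defined twice, so the parser will typically reject the whole file. A minimal sketch of an idempotent alternative, written as a hypothetical shell function `set_config` (not part of Hugo), drops any existing line for the key before appending the new value. It matches keys case-insensitively because, as far as I can tell, Hugo reads config keys that way, so a starter `baseurl` line also gets replaced by `baseURL`.

```sh
#!/bin/sh
# set_config KEY VALUE [FILE]: a hypothetical helper, not a Hugo command.
# Removes any existing top-level "KEY = ..." line from FILE (default: config.toml),
# then appends KEY = "VALUE", so running it twice never leaves a duplicate key.
# Assumes the file has no [table] sections yet: in TOML, anything appended after
# a [table] header belongs to that table, not to the top level.
set_config() {
  key=$1
  value=$2
  file=${3:-config.toml}
  tmp="$file.tmp.$$"
  touch "$file"
  # -v drops matching lines; -i also catches other spellings such as baseurl vs baseURL.
  grep -v -i "^[[:space:]]*$key[[:space:]]*=" "$file" > "$tmp"
  # TOML string values must be quoted.
  printf '%s = "%s"\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}

# Usage, from the site's root directory:
#   set_config theme purehugo
#   set_config publishDir docs
```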

Hugo Workflow: Drafting & Publishing a Post (MacOS)

In order to create a new post for your blog:

cd myBlog
hugo new post/myReviewOfHugo.md
open content/post/myReviewOfHugo.md # write the post in your text editor

# Optional: launch a local webserver, give it a sec, and preview the blog
hugo server -D & sleep 2 && open http://localhost:1313/blog/   # -D also renders posts still marked draft = true
killall hugo # because we left hugo running in the background there

While the server is running, you can keep editing the post in your editor, and the server will live-update the view in your browser. This step is optional, but it lets you verify that everything looks correct before you publish.
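Because `killall hugo` stops every Hugo process on the machine, a slightly tidier variant remembers the PID of the server it started and stops only that one. A minimal sketch, as a hypothetical `preview` shell function; it assumes macOS's `open` command, Hugo's default port 1313, and the /blog/ path from the baseURL used in this post. The `-D` flag (`--buildDrafts`) makes the server also render posts still marked as drafts.

```sh
#!/bin/sh
# preview: serve the site (drafts included), open it in the browser, and stop
# that one server when you press Enter. A hypothetical helper; assumes macOS's
# `open` command and Hugo's default port 1313 with the /blog/ baseURL path.
preview() {
  hugo server -D &          # -D (--buildDrafts) also renders posts marked draft = true
  server_pid=$!             # remember this server's PID instead of using killall
  sleep 2                   # give the server a moment to start
  open "http://localhost:1313/blog/"
  printf 'Press Enter to stop the preview server... '
  read -r _
  kill "$server_pid"        # stops only the server started above
  wait "$server_pid" 2>/dev/null
  echo "preview stopped"
}
```

Run `preview` from the site's root directory, keep editing, and press Enter when you're done.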

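When you’re satisfied, you can generate the actual web content to disk and publish it. One catch: hugo new typically marks a new post as a draft (draft = true in its front matter), and hugo leaves drafts out of the generated site, so change it to draft = false first. The following steps assume you are using GitHub Pages, so publishing is just a git push.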

# You must already have a GitHub project whose Settings page has GitHub Pages set to "master branch /docs folder". In this example, the project name is "blog".

# These are the one-time Hugo steps:
echo 'publishDir = "docs"' >> config.toml
echo 'baseURL = "https://myname.github.io/blog/"' >> config.toml   # if config.toml already has a baseURL line, edit that one instead

# These are the one-time Git steps:
rm -rf themes/purehugo/.git # delete the theme's own git files so they don't interfere
git init  # turn this directory into a git repo
git remote add origin https://github.com/myname/blog.git

# These are the only steps needed every time you publish new content:
hugo  # this generates HTML + JS + CSS under the publishDir (blog/docs/)
git add -A
git commit -m "Add a blog post about whatever."
git push origin master  # a plain "git push" fails until an upstream branch is set

That’s all there is to it, although you can always use a different Git client if you don’t like the command line. I sure as hell don’t like it (I use Atlassian Sourcetree) but it’s up to you.
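The three every-time publishing steps can be wrapped in one shell function. A minimal sketch, as a hypothetical `publish` function; it assumes the one-time setup above (publishDir = "docs", a remote named origin) and that you run it from the site's root directory. It skips the commit and push when regenerating the site changed nothing.

```sh
#!/bin/sh
# publish [MESSAGE]: regenerate the site and push it to GitHub Pages.
# A hypothetical helper; assumes the one-time setup from this post
# (publishDir = "docs", a remote named origin) and that you run it from
# the site's root directory.
publish() {
  msg=${1:-"Update blog"}
  hugo || return 1                     # regenerate docs/ from content/
  git add -A || return 1               # stage new, changed, and deleted files
  if git diff --cached --quiet; then   # nothing staged means nothing changed
    echo "nothing to publish"
    return 0
  fi
  git commit -q -m "$msg" || return 1
  git push -q origin HEAD              # push the current branch (master here) to origin
}
```

Usage: `publish "Add a blog post about whatever."`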

Post Metadata: WTF is “Front Matter”?

In each post (each Markdown file), there is some metadata in a header at the top of the file, called “front matter.” Jekyll was the first to introduce this concept (in name, at least), but it is common across other static generators now. Hugo lets you write front matter in YAML, JSON, or TOML (the default). If you’ve worked in web development, surely you’ve heard of JSON, but now you may be asking: WTF are YAML and TOML?

These are data and config file syntaxes that static site generators have adopted for their settings. It seems to be a case of “reinventing the wheel” of INI files, which have been around for decades. Basically, a config file. Key-value pairs. Associative array. Hash table (please don’t shorten it to just “hash”; words have meanings, know the difference). Dictionary. They’re all basically the same thing. YAML started around 2001 as a minimalist-syntax alternative to XML, and its 1.2 revision in 2009 made it a superset of JSON, another minimalist alternative to XML. We’ll get this right some day.

The CEO of GitHub and inventor of Jekyll, probably high on the smell of his own farts, decided in 2013 that YAML needed to be even more minimal, and named this idea after himself (“TOM”). Thus was born TOML, which, primarily because of the fame of its creator, has now spread to a few other projects. Thus, we have minimalized almost all the way back to INI files (except now it has been “standardized”). Progress.

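Oh, and by the way, none of these are actually markup languages at all. They just aren’t. YAML even admits it: the name officially stands for “YAML Ain’t Markup Language,” and the ML in TOML stands for “Minimal Language.” The insistence on propagating the use of the acronym letters -ML for config file formats is basically an inside joke at this point.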

The takeaway for me is that in the mid-2000s it became fashionable to ditch braces and brackets in all syntax for everything, in favor of careful indentation. Thus returning to the fashion of the 1970s and FORTRAN. You know what’s popular today, though? Look at Go, Rust, and Swift. Yeah, that’s right, compiled languages with curly braces are back again. Urge to kill risinnnnnnng. All right, deep breaths.

Anyway, within this “front matter,” you can define tags and categories, timestamps, and titles for every post. For example, the front matter for this post is defined as follows:

+++
Tags = ["web","blogging","Hugo", "Jekyll", "YAML", "TOML"]
Description = "Initial impressions on the static site generator, Hugo"
date = "2017-02-03T19:07:12-05:00"
title = "Hugo, the static site generator"
Categories = ["web","blogging","Hugo"]
+++
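Front matter is just a delimited block at the top of the Markdown file: a line of +++ opens it, the next +++ closes it, and everything after that is the post body. A minimal sketch that makes the structure concrete, as a hypothetical `front_matter` shell function using awk; it only handles the TOML +++ delimiters (YAML front matter uses --- instead).

```sh
#!/bin/sh
# front_matter FILE: print the TOML front matter of a Hugo post, i.e. the lines
# between the opening and closing "+++" delimiters. A hypothetical helper.
front_matter() {
  awk '
    NR == 1 && $0 != "+++" { exit }           # line 1 must open the front matter
    $0 == "+++" { n++; if (n == 2) exit; next }
    n == 1 { print }                          # inside the block: print the line
  ' "$1"
}
```

Usage: `front_matter content/post/myReviewOfHugo.md`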

You can also set optional variables like a publish date in the future (Hugo will not render the post to the publish directory until that date), or aliases (if you want to forward visitors from another URL to this post instead).
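As a sketch of those optional variables, here is a post whose front matter sets a future publishDate and one alias, written with a shell heredoc. The file name, dates, title, and alias path are made up for illustration; publishDate, aliases, and draft are Hugo front matter variables, and hugo skips future-dated posts unless you run it with --buildFuture.

```sh
#!/bin/sh
# Writes a post that Hugo holds back until its publishDate, and that also
# answers at an older URL via an alias. File name, dates, and alias path
# are made-up examples.
mkdir -p content/post
cat > content/post/scheduledPost.md <<'EOF'
+++
title = "A post from the future"
date = "2017-02-03T19:07:12-05:00"
publishDate = "2017-03-01T09:00:00-05:00"
aliases = ["/old-url-for-this-post/"]
draft = false
+++

Hugo leaves this post out of the generated site until the publishDate has
passed (unless you build with --buildFuture).
EOF
```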

The configuration file for your Hugo site, config.toml, is also in this syntax.

That more or less covers the basics of Hugo, and static site generators like it. My next post will be about Markdown (an actual markup language).

03 Feb 2017, 15:43

How to Blog in 2017

under blogging security web

My first blog, back in the early 2000s, was on a hosted blogging platform known as Blogger. It was simple and convenient: as the admin you just logged into the Blogger service, edited posts in your browser, and hit publish. This is basically how Tumblr still works today, although Tumblr’s innovation was to include media file hosting and allow everyone to repost each other’s content.

But Blogger content was static, and textual. You could post a few paragraphs of text, and embed images if they were hosted elsewhere. Only later did Google buy out the service and integrate it with their photo-hosting service. In the mid-2000s, many geeks wanted more flexibility, like the ability to limit access to members only, integrate their own photo/video/audio collections, and – most importantly – control the appearance of their blog.

So my second blog was generated with a Web Content Management System (CMS) and self-hosted on a home Windows XP PC running the “WAMP” software stack, with a DNS record from a free dynamic DNS service. If you’re a system admin or security expert, you’re probably cringing. I am too. In hindsight, it’s a miracle if that PC was not 0wned by a hacker at some point, but at least I have no evidence to believe it was. But I thought my blog was pretty cool: it had a custom look, a custom domain name, its own forums, file storage, and a weather widget on the sidebar. I believe it was using the Drupal CMS. The 2000s saw the rise of the “web app,” the concept that an application was something that ran in a scripting language on a web server and presented you with a web page as the user interface. As a system programmer who thinks an application is a single self-contained compiled binary, I thought this was anathema. But the rest of the tech world decided otherwise: websites that were not database-backed and server-side-scripted were totally 90s! That meant lame. 90s wasn’t cool again yet.

The reason why the self-hosted CMS approach to blogging is cringey is that it is notoriously difficult to secure a CMS, especially one written in PHP. PHP is now known to be prone to recurring security issues because of flaws in its design (unvalidated input, access control problems, command injection issues, etc.), and the use of a SQL database means fighting a war against SQL injection attacks from anyone who uses your site. Spammers will leave spam comments. You just want to run a blog, but now you’re a system admin for a web server, a database admin for a database, and you have to understand the PHP (or Java, or whatever) that generates your site on the fly every time a visitor loads a page. If you ever want to use a web hosting service for your CMS-based site instead of hosting it at home, you have to pay real money, because supporting and securing Apache, PHP, and MySQL is a full-time job! On top of all of that, all of this script and database stuff makes the site slower to load, and prone to Denial of Service attacks.

This is no way to live. And so, as is typical, the tech community decided that what is old is new again, and that static sites were actually a good idea that should never have been abandoned. Rolling my eyes so hard I went temporarily blind, I actually resisted even caring about the cool way to blog in the 2010s. I used LiveJournal for a bit. I tried a hosted WordPress (WordPress.com) account to blog about game console emulators. I got into using Tumblr, even though (or maybe because) the tech community is not on there. But now I’ve decided to take a fresh look at what’s fresh, and give it a chance.

Here are some things I noticed about the current Preferred Way for Cool Kids to Blog.

If you write any kind of code for a living, you host your blog on a free hosting service in the .io TLD. This is just what is fashionable, and like all fashion choices, it can’t really be explained. “Everyone is doing it”, including this blog. We are not all hosting sites in the British Indian Ocean Territory, but yes, this TLD exists because the UK stole the Chagos Islanders’ land during the Cold War, and its only other claim to fame might be its black site CIA torture prison. How’s that for oblivious Silicon Valley tech privilege!
Because HTML, JS, and CSS are nearly impossible to work in directly anymore (much like assembly code), people write their web page content in a highly simplified markup language, and then run that through a compiler (oh, sorry, static site generator) to produce a web site in actual HTML, JS, and CSS. The output is then posted to a web hosting service. There are some 450 static site generators to choose from. This site uses Hugo, which I’ll talk about in a future post. An even more popular choice is Jekyll, which is fine…for me to poop on.
The simplified markup language of choice currently is Markdown, which will also be the subject of a future post because it is pretty neat.
Because supporting the ability for visitors to post comments would require a dynamic site, static sites have outsourced this responsibility to third-party services. That is, comments are implemented with an embedded JavaScript element that is loaded from a remote service. The dominant choice of service at the moment is Disqus. This and any other user-account-based service that embeds its content on your blog is a privacy problem: it means Disqus is basically assigning you an identifier and following you around to all of the Disqus-enabled sites you visit. Ghostery blocks Disqus by default, for this reason. I suggest using Twitter to reach me if you have a comment.
Because static sites cannot track how many visitors they get and where they visited from, that too has been outsourced. Google Analytics is now more prevalent than HPV and herpes combined. I have had to delete it out of every web-related code repository that I have borrowed to make anything. Even if I’m the last one on Earth who cares about privacy, I will not be including that here. The same goes for social media sharing links. You’re a big boy and/or girl, I bet you’ll figure out how to share a URL yourself!
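The “compiler” step described above is less magical than it sounds: read source files, wrap them in a template, and write plain HTML files that any web host can serve as-is. A toy sketch of that idea, as a hypothetical `build_site` shell function that treats each content/*.txt file as plain paragraphs separated by blank lines. There is no Markdown parsing and no theming here, and this is not how Hugo or Jekyll work internally, only the same compile-once shape.

```sh
#!/bin/sh
# build_site: a toy "static site generator" (hypothetical, for illustration).
# Turns every content/NAME.txt into public/NAME.html by escaping HTML special
# characters and wrapping each blank-line-separated paragraph in <p>...</p>.
build_site() {
  mkdir -p public
  for src in content/*.txt; do
    [ -e "$src" ] || continue              # no matching files: the glob stays literal
    name=$(basename "$src" .txt)
    {
      printf '<!DOCTYPE html>\n<html><head><title>%s</title></head><body>\n' "$name"
      # Escape &, <, > first, then let awk's paragraph mode (RS = "") split on blank lines.
      sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' "$src" |
        awk 'BEGIN { RS = ""; ORS = "\n" } { print "<p>" $0 "</p>" }'
      printf '</body></html>\n'
    } > "public/$name.html"
  done
}
```

The output directory is the whole "site": upload public/ anywhere and it works, with no scripts or database running on the server.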

So there you have it, my take on the Way to Blog in the 2010s for Cool Kids. Thanks for reading. – MM