Hakyll (Optional) PDFs

Written on April 23, 2018
Tags: coding, hakyll, blog

Disclaimer: This post assumes that you know Haskell, and is probably most understandable (and useful) if you have a little bit of experience with Hakyll. If you don’t, go check them out! Haskell is a great programming language, and Hakyll is a great way to make a website.

The Problem

In setting up this blog, I wound up reformatting some old articles I had written so that they’d look nice on the web. I think the original PDFs still look better in a lot of ways, so I wanted to give people access to them.

I’m using Hakyll to generate this website, so the challenge was to have the PDF links show up when there was an associated PDF, and otherwise have nothing show up. This wound up being more difficult than I thought it would be, so I thought I should document the solution in case it’s useful to someone else. (And, honestly, for when I forget what I’ve done.)

The first part is just going to be background for people unfamiliar with Hakyll, so skip that if you already know Hakyll and just want to know how to add optional PDFs.

Hakyll Background

As I write this, the file hierarchy for the blog looks like this:

blog
├── drafts
│   └── hakyll-pdfs.md
├── ideas.md
├── index.md
└── posts
    ├── educating-teams.md
    ├── educating-teams.pdf
    ├── hiring-teams.md
    ├── hiring-teams.pdf
    ├── jealousy.md
    ├── straight-conversion.md
    └── straight-conversion.pdf

As you can see, there is a hiring-teams.pdf, but no jealousy.pdf, and we want this to be dealt with cleanly.

Before doing anything with PDFs, this is what the relevant part of my Hakyll configuration looked like:

main = hakyll $ do
    match "blog/posts/*" $ do
        route $ cleanRoute
        compile $ pandocCompilerWith pandocReadOpts pandocWriteOpts
            >>= loadAndApplyTemplate "templates/post.html"    postCtx
            >>= loadAndApplyTemplate "templates/default.html" postCtx
            >>= relativizeUrls
            >>= cleanIndexUrls

postCtx :: Context String
postCtx =
    dateField "date" "%B %e, %Y" <>
    defaultContext

The first chunk tells Hakyll to use pandoc to compile the original Markdown files into HTML, apply the relevant templates, and do some cleaning up.

There is some extra stuff going on offscreen in cleanRoute so that the post blog/posts/some-post.md gets written to blog/posts/some-post/index.html rather than to blog/posts/some-post.html. This is so that you can access it with the simpler URL blog/posts/some-post.
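The path rewriting described here can be sketched as a pure function. This is only an illustration: the name cleanPath is hypothetical, it is not the definition actually used for this site, and it assumes POSIX-style path separators.

```haskell
import System.FilePath (takeBaseName, takeDirectory, (</>))

-- Hypothetical helper: map a post's source path to a directory-style
-- output path, so the post can be served from a URL without ".html".
cleanPath :: FilePath -> FilePath
cleanPath p = takeDirectory p </> takeBaseName p </> "index.html"

-- In a Hakyll site this could be turned into a route, for example:
--   cleanRoute :: Routes
--   cleanRoute = customRoute (cleanPath . toFilePath)
```

With this sketch, cleanPath "blog/posts/some-post.md" evaluates to "blog/posts/some-post/index.html".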

postCtx tells Hakyll what information we’re going to need for the files. It gives us some standard data, plus the date the post was written. Let’s see how that’s used, in templates/post.html:

<div class="info">
    Written on $date$
    $if(author)$
        by $author$
    $endif$
</div>

$body$

At the top of the page, there’s a bit saying when the post was written; this is where that comes from, so that I don’t have to manually put it into each post. The fields inside dollar signs are substituted by the Hakyll system.
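The format string "%B %e, %Y" given to dateField in postCtx follows the conventions of formatTime from the time package. The standalone sketch below applies the same format string directly, to show what ends up in $date$; the helper name formatPostDate is made up for this example and is not part of Hakyll.

```haskell
import Data.Time.Calendar (fromGregorian)
import Data.Time.Format (defaultTimeLocale, formatTime)

-- Hypothetical helper: render a calendar day with the same format
-- string that postCtx passes to dateField.
formatPostDate :: Integer -> Int -> Int -> String
formatPostDate year month day =
    formatTime defaultTimeLocale "%B %e, %Y" (fromGregorian year month day)
```

Here formatPostDate 2018 4 23 gives "April 23, 2018". Because %e pads the day with a space to two characters, single-digit days come out with two spaces after the month name, which a browser collapses when rendering the HTML.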

The first thing we need to do is to make sure the PDFs end up in the final site:

main = hakyll $ do
    match "blog/posts/*.pdf" $ do
      route extrasRoute
      compile copyFileCompiler

    match "blog/posts/*.md" $ do
        route $ cleanRoute
        compile $ pandocCompilerWith pandocReadOpts pandocWriteOpts
            >>= loadAndApplyTemplate "templates/post.html"    postCtx
            >>= loadAndApplyTemplate "templates/default.html" postCtx
            >>= relativizeUrls
            >>= cleanIndexUrls

This tells Hakyll to copy the PDFs over verbatim.

There’s more extra stuff in extrasRoute to get blog/posts/some-post.pdf to go to blog/posts/some-post/some-post.pdf.
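As a rough idea of what such a route has to compute, here is a pure sketch of the mapping. The name extrasPath is hypothetical, this is not the site's actual extrasRoute, and POSIX-style separators are assumed.

```haskell
import System.FilePath (takeBaseName, takeDirectory, takeFileName, (</>))

-- Hypothetical helper: place an extra file inside the output directory
-- of the post that shares its base name.
extrasPath :: FilePath -> FilePath
extrasPath p = takeDirectory p </> takeBaseName p </> takeFileName p

-- Possible wiring in a Hakyll site:
--   extrasRoute :: Routes
--   extrasRoute = customRoute (extrasPath . toFilePath)
```

With this sketch, extrasPath "blog/posts/some-post.pdf" evaluates to "blog/posts/some-post/some-post.pdf", so the PDF sits next to the post's index.html.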

Note that we need to add .md to the second pattern to make sure it doesn’t match the PDFs.

The Tricky Part

The next step is to get the information about the PDF to the page as a field we can use as $pdf$ to tell us where the PDF is (if it’s there at all).

After a whole bunch of looking through the documentation and trying things that didn’t work, this is what I came up with:

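-- Needs: import System.FilePath (dropExtensions, normalise, (<.>))
postCtx :: Context String
postCtx =
    field' "pdf" (\item -> do
        -- Source path of the post being rendered, e.g. blog/posts/hiring-teams.md
        let fp = toFilePath $ itemIdentifier item
        -- The same path with its extension swapped for .pdf
        let pdfName = ((dropExtensions . normalise) fp <.> "pdf")
        -- Every compiled item with that path: one PDF, or none
        pdf <- loadAll $ fromGlob pdfName
        return $
            ListField (urlField "url" :: Context CopyFile) pdf) <>
    dateField "date" "%B %e, %Y" <>
    defaultContext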

So what’s going on here? field' is a function that I discovered in the internal code of Hakyll. It isn’t exported, so you don’t have access to it by default, and I had to copy its definition into the file:

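-- Needs: import Control.Applicative (empty)
-- Answer requests for the field named key; defer any other field to the rest of the context.
field' :: String -> (Item a -> Compiler ContextField) -> Context a
field' key value = Context $ \k _ i -> if k == key then value i else empty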

You can completely ignore its definition though (I did). The main thing is what it does: it gives you access to the post in question and to the state of the site compiler. The state of the compiler tells us which files have been loaded and where they ended up. Given that information, you have to produce a piece of data to be substituted in for the field (in our case, $pdf$).

So that’s what we do. We extract the file path of our post into fp. We then turn it into the path for the PDF in pdfName. Then we ask the compiler to give us all the PDFs it’s loaded that have that name. In this case there will always be either one or zero. We return that as a list field where each PDF knows its URL.

As for why we have to explicitly mark the type as Context CopyFile: loadAll can load items of any type, and nothing else in the expression says which type we want. The PDFs were compiled with copyFileCompiler, which produces CopyFile items, so that is the type to name.
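To make the path step concrete, here is a standalone sketch of the same computation outside of Hakyll. The names pdfPathFor and pdfPathFor' are hypothetical; the first mirrors the pdfName expression in postCtx, and the second is a variant shown only for comparison.

```haskell
import System.FilePath (dropExtension, dropExtensions, normalise, (<.>))

-- Same expression as pdfName in postCtx: strip every extension from the
-- post's path and append ".pdf".
pdfPathFor :: FilePath -> FilePath
pdfPathFor fp = dropExtensions (normalise fp) <.> "pdf"

-- Hypothetical variant that strips only the final extension, which
-- behaves differently when the file name itself contains dots.
pdfPathFor' :: FilePath -> FilePath
pdfPathFor' fp = dropExtension (normalise fp) <.> "pdf"
```

For the posts listed earlier, pdfPathFor "blog/posts/jealousy.md" gives "blog/posts/jealousy.pdf", which matches no file, so the list comes back empty. One thing to watch for: dropExtensions cuts at the first dot in the file name, so a post called v1.2-notes.md would look for v1.pdf, whereas the dropExtension variant would look for v1.2-notes.pdf.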

Now we can go back to our template and set it up as follows:

<div class="info">
    Written on $date$
    $if(author)$
        by $author$
    $endif$
    <br />
    $for(pdf)$
        Click <a href="$url$">here</a> for a nicely formatted PDF version of this article
    $endfor$
</div>

$body$

We loop through the PDFs (remember, there’s only ever zero or one of them), and for each one we add a link to its URL.

