
Convert Markdown to PDF Tutorial

Posted on September 26, 2019 by CraigParker

How to Convert Markdown to PDF

Back in December of 2018 (it’s September of 2019 now), I wrote about how I created a process for converting Markdown to PDF with Pandoc. Well, the plot thickens… What was really nice, if you read that post, was how fast we could convert. There were some things we were willing to put up with in exchange for such a speedy (a few seconds) turnaround time.

But there were problems. As I said in the other post, I wasn’t happy with the headings. There was an awful lot of vertical white space between things like code blocks and headings. And speaking of headings, LaTeX doesn’t even really have the same ones as HTML does. There are some “sort of” equivalents: instead of <h1> and <h2>, they are section and subsection (and so on with different names down the line), and they only really go as far as <h5>.

On top of that, I was trying to make a study guide for Elle Krout’s Puppet Certification class, and one of the command line boxes had a character (I think a checkmark) that just would not render properly. It was during that ruckus that I started looking for other ways of converting Markdown to PDF.

The better way

What I found was a similar process, but one I was WAY more familiar with. I figured out that we can use a CSS/HTML combination to get from Markdown to PDF, still using Pandoc. Off I went, and the results are beautiful.

We’ve got a script that does it now (one that we’ll be constantly improving), and essentially it runs a pandoc command to create an HTML file from Markdown. Then there’s a program called WeasyPrint that will get us the rest of the way. It creates a PDF based on the HTML and a stylesheet (CSS file).

This is right up my alley. I’ve been creating websites since everyone still used tables for layout (back in the 90s). The trick with this was just figuring out how to style things. And we’re done now. That means that your students get some pretty snazzy looking study guides to help with your certifications.

So how do we convert Markdown to PDF? Well, it’s a fairly simple process. We make a Markdown file, then run a script that executes a couple of commands, after prompting for some input. Here’s the script, as it sits:

The script

#!/bin/bash

# This script lists the current markdown files, prints out their names (sans .md file extensions)
# and asks you to pick one. Highlight the name, paste it into the prompt, and bam.
# Then it asks for which kind of layout you want (landscape or portrait).
# Next, the script loops, and waits for you to run it again, or kill it. It will run on the same
# markdown file, using the same layout, so you don't have to keep messing with copying and pasting commands.

echo " "

# List the Markdown files in the current directory, without the .md extension.
for f in *.md; do
    echo "${f%.md}"
done
read -r -p "Filename: " filename

PS3='Please enter your choice: '
options=("Landscape" "Portrait")
select opt in "${options[@]}"
do
    case $opt in
        "Landscape")
            while true; do
            # Uses the filename entered at the prompt above.
            pandoc -s --template="templates/default.html" -f markdown-smart --toc -c style-landscape.css "$filename.md" -o "$filename.html"
            python3 -m weasyprint "$filename.html" "$filename.pdf"

            echo " "
            echo " "
            read -r -p "Press [Enter] key to make another PDF, or [Ctrl + C] to kill the script"

            done
            ;;
        "Portrait")
            while true; do
            # Uses the filename entered at the prompt above.
            pandoc -s --template="templates/default.html" -f markdown-smart --toc -c style-portrait.css "$filename.md" -o "$filename.html"
            python3 -m weasyprint "$filename.html" "$filename.pdf"

            echo " "
            echo " "
            read -r -p "Press [Enter] key to make another PDF, or [Ctrl + C] to kill the script"

            done
            ;;
        *) echo "invalid option $REPLY";;
    esac

done

The comments in this Bash script explain how it works. It spits out a list of markdown files in the directory where we’re sitting, prompts you to copy and paste one, asks for a layout type, then makes appropriate HTML (with Pandoc) and PDF (using WeasyPrint).

The second part (the weasyprint command) of this is easy. But the first one (the pandoc command) involves a little more oomph. It uses the custom HTML template (--template), turns off Pandoc’s smart punctuation (-f markdown-smart), links the stylesheet for the chosen layout (-c), and creates a table of contents (--toc) based on headings in the Markdown file; H2 and H3 are what show up. Then it spits out a standalone HTML document (-s). The WeasyPrint command just looks at the finished HTML file and CSS, makes the PDF, and drops the mic.
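The two branches of the script differ only in the stylesheet they pass to pandoc, so the two commands can be sketched as one function. This is only a sketch, not the script from the repository: the function name md_to_pdf and the DRY_RUN variable are made up for illustration, while the pandoc and weasyprint arguments are the ones from the script above.

```shell
#!/bin/bash
# Sketch only: md_to_pdf and DRY_RUN are hypothetical names, not part of the
# original script. The pandoc and weasyprint arguments are copied from it.

md_to_pdf() {
    local filename="$1"   # Markdown file name without the .md extension
    local layout="$2"     # "landscape" or "portrait"

    case "$layout" in
        landscape|portrait) ;;
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac

    # With DRY_RUN=1, print the commands instead of running them.
    local run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi

    # Step 1: Markdown to HTML. Step 2: HTML (plus its linked CSS) to PDF.
    $run pandoc -s --template="templates/default.html" -f markdown-smart --toc \
        -c "style-$layout.css" "$filename.md" -o "$filename.html" &&
        $run python3 -m weasyprint "$filename.html" "$filename.pdf"
}
```

Calling DRY_RUN=1 md_to_pdf README landscape prints both commands without running them, which is a cheap way to check the arguments before Pandoc and WeasyPrint get involved.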

Rather than post the whole CSS here, I’ll just highlight some of the things that will make life easier for anyone trying to give this a whirl. There’s also a README.pdf that will explain it too, with visual examples.

The CSS template that takes you from Markdown to PDF

Fonts

Right up near the top of the page, we declare fonts. We use Noto Sans and Source Code Pro at Linux Academy, but feel free to plug in any fonts you want there. Grab the regular, bold, and italic version of each. Just put them in the same directory that we stuck ours in, and refer to them as we did.
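For anyone who hasn’t declared fonts in CSS before, here is a rough idea of what those declarations can look like, written out with a heredoc so it can be pasted into a terminal. The fonts/ directory, the font file names, and the fonts-example.css output file are all assumptions for illustration; the stylesheet in the repository may name things differently.

```shell
# Sketch only: fonts/, the .ttf file names, and fonts-example.css are
# hypothetical. Match them to the directory and files your stylesheet uses.
cat > fonts-example.css <<'EOF'
/* One @font-face rule per file: regular, bold, and italic. */
@font-face {
    font-family: "Noto Sans";
    font-style: normal;
    font-weight: normal;
    src: url("fonts/NotoSans-Regular.ttf");
}
@font-face {
    font-family: "Noto Sans";
    font-style: normal;
    font-weight: bold;
    src: url("fonts/NotoSans-Bold.ttf");
}
@font-face {
    font-family: "Noto Sans";
    font-style: italic;
    font-weight: normal;
    src: url("fonts/NotoSans-Italic.ttf");
}
body {
    font-family: "Noto Sans", sans-serif;
}
EOF
```

The same three rules would be repeated for the monospace font (Source Code Pro in our case) and applied to code and pre elements.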

Pages

Below that, you’ll see a @page :first. This is the cover page. We’ve declared a background image, and in the example PDF, that’s a Pinehead stuffy in the middle of the Kancamagus Highway in New Hampshire. Note the image size, 450×300. If you want a different sized image, you’ll have to dork with the margins. Play with it until it sits where you want it to, then you should probably stick with the same sized images, moving forward. If you don’t, you’ll be messing with margins every time you make a PDF.

Next in line is @page no-chapter. This is the Table of Contents page. We’ve got things set up pretty much the same as on the regular pages. But you can change them here (get rid of the logo and page number in the lower right maybe) and not affect the rest of the document.

Up next is the @page. This affects anything after the Table of Contents page(s) in the PDF. It’s pretty much the same as the TOC, but the opportunity is there to make things a little different.

Finally, there’s @page :blank. To tell you the truth, I can’t actually remember what this is for. Remember though, I was on a marathon “this needs to get done FAST” kind of mission, so some of it is a blur. I apologize… Suffice it to say that if I can’t remember, you’re probably all set not knowing too.
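To make the four page rules easier to picture, here is a stripped-down sketch of how they could fit together, again written out with a heredoc. It is not the stylesheet from the repository: the image name, the margin, the #TOC selector, and the pages-example.css file name are assumptions. For what it’s worth, in the CSS Paged Media specification :blank selects pages left empty by a forced page break.

```shell
# Sketch only: cover.jpg, the margin value, the #TOC selector, and
# pages-example.css are hypothetical, not taken from the repository.
cat > pages-example.css <<'EOF'
@page :first {
    /* Cover page: background image, no page number. */
    background: url("cover.jpg") no-repeat center;
    margin: 0;
}
@page no-chapter {
    /* Table of Contents pages. */
    @bottom-right { content: counter(page); }
}
@page {
    /* Every page after the Table of Contents. */
    @bottom-right { content: counter(page); }
}
@page :blank {
    /* Pages left empty by a forced page break. */
    @bottom-right { content: none; }
}
/* A named page only applies to elements that ask for it. */
#TOC {
    page: no-chapter;
}
EOF
```

The last rule is the piece that is easy to miss: a named page like no-chapter does nothing until some element sets its page property to that name.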

If you know anything about CSS, the rest of the stylesheet should make perfect sense. One thing you may wonder about is the actual title on the title page. This is a bit of a lighter font. If you like it, do what we did: declare a lighter font-face, make sure the lighter font is in the fonts directory, then call it as we did in our CSS’s h1.title declaration. Or just leave it alone. Our designer (the stupendous and never-duplicated Ingrid van Beljon Higgins, on the Content team) specified a light font, so I went with it. You don’t need to though.

Table of Contents Problem

There was one wee little issue, with the table of contents. When a list (the H3 headings are the list items) went over a page break, the items on the first page’s part got bumped up a bit. Check out the README.pdf in the Git repo to see what I’m talking about.

The fix is to edit boxes.py. You’ll have to hunt for it, but it’s sitting in whichever directory WeasyPrint got installed into. Something like /usr/local/lib/python3.6/dist-packages/weasyprint/formatting_structure on an Ubuntu machine, and /usr/local/lib/python3.7/site-packages/weasyprint/formatting_structure/boxes.py here on a Mac.
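Rather than hunting by hand, you can ask Python where the weasyprint package lives. This is a minimal sketch: find_boxes_py is a made-up helper name, and it assumes that python3 is the same interpreter the script uses to run WeasyPrint.

```shell
# Sketch only: find_boxes_py is a hypothetical helper name.
# It prints the expected path of boxes.py, or fails if weasyprint is not
# installed for this python3.
find_boxes_py() {
    python3 - <<'EOF'
import importlib.util
import os
import sys

# Locate the package without importing it.
spec = importlib.util.find_spec("weasyprint")
if spec is None or spec.origin is None:
    sys.exit("weasyprint is not installed for this python3")

package_dir = os.path.dirname(spec.origin)
print(os.path.join(package_dir, "formatting_structure", "boxes.py"))
EOF
}
```

Running find_boxes_py prints a path like the ones above, built from wherever the package actually got installed.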

Line 324 of that file (but this may change in future versions) reads:

if (start or end) and old_style == self.style:

It is essentially saying “if something is equal to something else,” and we need it to say “if something is NOT equal to something else.” We do it by replacing the first of those two equals signs with an exclamation point, like this:

if (start or end) and old_style != self.style:

Rendering should work fine after this change.
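If you’d rather not edit the file by hand, the same change can be sketched with sed. The helper name patch_boxes_py is made up for this example. It matches the text of the line instead of relying on line 324, keeps a backup, and changes nothing if the line isn’t found (which would be the case in a WeasyPrint version where that code has changed).

```shell
# Sketch only: patch_boxes_py is a hypothetical helper; pass it the path to
# boxes.py. Writing into a system-wide install may need sudo.
patch_boxes_py() {
    local file="$1"

    # Keep a backup so the change is easy to undo.
    cp "$file" "$file.bak" || return 1

    # Swap == for != on the one matching line; every other line passes through.
    sed 's/if (start or end) and old_style == self\.style:/if (start or end) and old_style != self.style:/' \
        "$file.bak" > "$file"
}
```

To undo it, copy the .bak file back over boxes.py.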

The HTML

There really isn’t a whole lot else to mess with. In the templates directory, there’s a default.html file where we customized what shows up. We’ve got the title, subtitle, author, email, and date, and you can see where those land on the finished PDF. This is the file to edit if you want something different on your cover page.
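One common way to feed values like these to a Pandoc template is a YAML metadata block at the top of the Markdown file; the template can then refer to them as $title$, $subtitle$, and so on. Whether the repository does it exactly this way is an assumption on my part, and the file name and values below are placeholders.

```shell
# Sketch only: example.md and every value in it are placeholders.
# Pandoc reads the block between the --- lines as document metadata.
cat > example.md <<'EOF'
---
title: Example Study Guide
subtitle: A made-up subtitle
author: Jane Doe
email: jane@example.com
date: September 26, 2019
---

## First Section

Some text for the first section.

### A Subsection

More text.
EOF
```

The H2 and H3 headings in the body are there so the table of contents has something to pick up.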

To summarize

This process is fairly awesome, considering where we started. After getting lots of help from the Pandoc and WeasyPrint communities, we can now whip through making some pretty cool looking study guides for learners. And we’ve realized that this can be used for things besides study guides. We all wanted to share it with anyone who needs a slick method for converting Markdown to PDF easily. It might save someone from banging their head on a desk trying to figure it out from scratch.

Grab the template over at this GitHub repository, give it a whirl, and let us know how you make out with it.
