Making this site, part 2 (Rust backend)

2023-03-13

This is Part 2 of a two-part series about building this site. Part 1 gave an introduction and described how the site is deployed.

This part describes how Markdown and HTML templates are rendered to create pages. It also describes the basic HTTP server (using axum) that powers the site.

Table of Contents

  1. Why’d you do this in Rust?
  2. Rendering pages
  3. Axum web server
  4. Workflow for writing posts
  5. Conclusion
  6. Footnotes

Why’d you do this in Rust?

Good question! Right now, I’m most comfortable and productive in Rust. I considered doing this in Python or JavaScript, but I figured that any productivity gained by using those more established ecosystems could be canceled out by my lack of familiarity. A couple of years ago, I did make a sketchy web app for my lab with Django on the backend and vanilla JS on the frontend, so I do have a rough idea of what that might’ve looked like.

A related question is why I chose axum for the backend framework. Poking around on Reddit and elsewhere, many people seemed to recommend either axum or actix-web. A while ago, I’d investigated tower::Service (a key abstraction used by axum) for another project, so I decided to just go forward with axum.

So far, I’m pretty happy with my choices. However, I’m aware that this site is extremely simple by most standards, and so it’s possible (likely?) that I’ll run into some sharp corners as I try to add functionality in the future. After finishing most of the code, I found this blog post by three Rails developers who used Rust for a hackathon project, and they brought up some rough edges (especially around integration tests with a browser and database stuff) that might be fun to try to learn more about. Another approach would be to port this site over to Ruby/JS/Python to get a better feel for what the Rust ecosystem is missing.

Rendering pages

I figured that the axum-related stuff would be pretty simple, so I started by trying to figure out how to generate HTML for the various pages. Markdown was an easy choice to write posts in, since I’ve used it a lot in my other projects. I briefly considered writing my own parser and HTML generator, which sounded like fun, but I realized it was probably dumb to write another parser (see footnote 1) instead of just using a crate and moving on to more unfamiliar territory.

Markdown → HTML

I decided to use pulldown_cmark based on its popularity on crates.io. Rendering a Markdown String to HTML is really simple:

use pulldown_cmark::{html, Parser};

fn render_markdown(markdown: &str) -> String {
    let parser = Parser::new(markdown);
    let mut output = String::new();
    html::push_html(&mut output, parser);
    output 
}

My actual function for Markdown rendering is only a little more complicated. I used pulldown_cmark::Options to enable some extensions (like footnotes) that aren’t on by default in pulldown_cmark. In addition, I wanted to error on broken links in order to catch them easily in CI. Conveniently, the library allows you to provide a callback that gets invoked on every broken link through the Parser::new_with_broken_link_callback constructor. So my final version of render_markdown ends up looking something like this (see footnote 2):

use color_eyre::eyre::{bail, Result};
use pulldown_cmark::{html, BrokenLink, Options, Parser};

fn render_markdown(markdown: &str) -> Result<String> {
    // Turn broken links into a hard error
    let mut errors = vec![];
    let mut error_on_broken_link = |link: BrokenLink| {
        errors.push(format!(
            "broken {:?} link at reference to {}",
            link.link_type, link.reference
        ));
        None
    };

    let mut output = String::new();
    let parser = Parser::new_with_broken_link_callback(
        markdown, 
        Options::all(), 
        Some(&mut error_on_broken_link)
    );
    html::push_html(&mut output, parser);

    if !errors.is_empty() {
        bail!(errors.join("\n"));
    }

    Ok(output)
}

There are a few new imports for Options and broken link handling. I’m also using color-eyre to get color-coded error messages. eyre::Result<T> is just a type alias for Result<T, eyre::Report>, i.e. a Result that uses eyre’s error type.

The error_on_broken_link closure just takes the link details from BrokenLink and formats them into an error String that’s appended to errors. By returning Some(..) instead of None from the closure, it’s possible to make the Parser replace the broken link with something else, but I’d rather have it just error so I’m forced to fix it. If errors is non-empty after parsing the Markdown, we join all the errors into one big error string and return an Err (eyre::bail! is just shorthand for doing this).
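The collect-then-fail pattern does not depend on pulldown_cmark, so here is a minimal std-only sketch of it. The scan_links function and its known list are hypothetical stand-ins for the parser, and plain String errors stand in for eyre; only the shape of the closure and the errors vector mirrors the code above.

```rust
/// Hypothetical stand-in for a parser: calls `on_broken` for every link
/// that does not appear in `known`.
fn scan_links<F: FnMut(&str)>(links: &[&str], known: &[&str], mut on_broken: F) {
    for &link in links {
        if !known.contains(&link) {
            on_broken(link);
        }
    }
}

/// Collects every broken link, then fails once with all of them listed.
fn check_links(links: &[&str], known: &[&str]) -> Result<(), String> {
    let mut errors = vec![];
    // The closure mutably borrows `errors` for as long as it is alive...
    scan_links(links, known, |link| {
        errors.push(format!("broken link at reference to {}", link));
    });
    // ...and it is gone by this point, so `errors` can be read again.
    if !errors.is_empty() {
        return Err(errors.join("\n"));
    }
    Ok(())
}
```

Reporting all broken links in one error, rather than stopping at the first, means a single CI run shows everything that needs fixing.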

Thanks to pulldown_cmark, this one function is basically all we need to render Markdown pages (aside from post metadata like title, creation date, and tags, which we’ll discuss shortly). But the content of the blog post is only part of the HTML document that gets sent to the viewer. To generate the rest of the page, we’ll use HTML templates with tera.

HTML templates

Tera is a template engine inspired by Jinja2 and Django templates. Here’s what my basic template (base.html) looks like:

<!-- base.html -->
<!DOCTYPE html>
<html> 
<head>
  {% block head %}
  <title>
    Kyle Matsuda - {{ title }}
  </title>
  <!-- Omitted: common stylesheets, etc. -->
  {% endblock head %}
</head>
<body>
  <!-- Omitted: nav bar stuff -->
  <div class="content">
    {% block content %}
      {{ body }}
    {% endblock content %}
  </div>
  <footer>
    {% block footer %}
    {% endblock footer %}
  </footer>
</body>
</html>

(This is a little simplified from the actual template, but displays the important features.)

Here, I’m only using two basic features, variables and blocks. The syntax {{ title }} means “get the value of title from the environment and insert it here”. The blocks head, content, and footer provide places to override or extend the template via inheritance. For example, the template for blog posts (blog.html) is a child of base.html:

<!-- "blog.html" -->
{% extends "base.html" %}
{% block head %}
  {{ super() }}
  <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/styles/atom-one-dark.min.css">
  <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/highlight.min.js"></script>
{% endblock head %}
{% block footer %}
  {{ super() }}
  <script>hljs.highlightAll();</script>
{% endblock footer %}

The main thing we do here is fetch additional JS and CSS for code highlighting with highlight.js, which is needed for the blog posts but not on the “Home” or “About” pages. Above, {{ super() }} just means to include all of the block’s text from the parent template.

Rendering HTML templates

Let’s take a step back and think about the data required to make a simple page from a template like base.html. We probably need at least a title and body text. In Rust code,

#[derive(serde::Serialize)]
pub struct Page {
    pub title: String,
    pub body: String,
}

Here, title will be a plain string, but body will be an HTML String emitted from render_markdown.

Since we derived Serialize, Tera provides a simple interface to feed Page data into the HTML templates. Here’s what this looks like:

use color_eyre::eyre::Result;
use tera::Context;

// Omitted: initialization of TEMPLATES

fn render_html(page: &Page, template_name: &str) -> Result<String> {
    Ok(TEMPLATES.render(
        template_name, 
        &Context::from_serialize(page)?
    )?)
}

Above, the tera::Context holds the environment for inserting variable values. Initializing Context from page brings the struct members title and body into scope for the templates, and we can now render both templates base.html and blog.html shown in the previous section.

I cut out the initialization of TEMPLATES in the previous snippet because it’s a bit ugly. Tera wants you to initialize the templates once at app startup using lazy_static! or similar. Here’s what this looks like for me:

use tera::Tera;

// Compile templates
lazy_static::lazy_static! {
    pub static ref TEMPLATES: Tera = {
        let mut tera = match Tera::new("templates/*") {
            Ok(t) => t,
            Err(e) => {
                println!("Parsing error(s): {}", e);
                ::std::process::exit(1);
            }
        };
        tera.autoescape_on(vec![]);
        tera
    };
}

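This initializes tera with all the templates in my ./templates/ directory. Note that auto-escaping is turned off (by passing an empty list to autoescape_on) since page.body already contains rendered HTML, which would otherwise be escaped and show up as literal markup.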
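To see why escaping would be a problem here, this is roughly what an auto-escaping pass does to already-rendered HTML. The escape_html function below is a simplified, hypothetical stand-in written for illustration; it is not Tera’s implementation, and the exact set of characters Tera escapes may differ.

```rust
/// Simplified illustration of HTML escaping (not Tera's actual code).
fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#x27;"),
            // Everything else passes through unchanged
            _ => out.push(c),
        }
    }
    out
}
```

With escaping applied, the tags emitted by render_markdown would be displayed as text instead of being interpreted by the browser. The trade-off of disabling it for every template is that plain-text values like title are no longer escaped either.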

For completeness, here is what the full code looks like to render a simple page like the “Home” or “About” page:

use color_eyre::eyre::{bail, Result};
use pulldown_cmark::{html, BrokenLink, Options, Parser};
use serde::Serialize;
use std::fs;
use std::path::Path;
use tera::{Context, Tera};

// Compile templates
lazy_static::lazy_static! {
    pub static ref TEMPLATES: Tera = {
        let mut tera = match Tera::new("templates/*") {
            Ok(t) => t,
            Err(e) => {
                println!("Parsing error(s): {}", e);
                ::std::process::exit(1);
            }
        };
        tera.autoescape_on(vec![]);
        tera
    };
}

#[derive(Debug, Serialize)]
pub struct Page {
    pub title: String,
    pub body: String,
}

impl Page {
    pub fn new(title: &str, body_file_path: impl AsRef<Path>) -> Result<Self> {
        let body_raw = fs::read_to_string(body_file_path)?;
        let body = render_markdown(&body_raw)?;
        Ok(Self {
            title: title.to_owned(),
            body,
        })
    }
}

pub fn render_html(page: &Page, template_name: &str) -> Result<String> {
    Ok(TEMPLATES.render(template_name, &Context::from_serialize(page)?)?)
}

pub fn render_markdown(markdown: &str) -> Result<String> {
    // Turn broken links into a hard error
    let mut errors = vec![];
    let mut error_on_broken_link = |link: BrokenLink| {
        errors.push(format!(
            "broken {:?} link at reference to {}",
            link.link_type, link.reference
        ));
        None
    };

    let mut output = String::new();
    let parser = Parser::new_with_broken_link_callback(
        markdown, 
        Options::all(), 
        Some(&mut error_on_broken_link)
    );
    html::push_html(&mut output, parser);

    if !errors.is_empty() {
        bail!(errors.join("\n"));
    }

    Ok(output)
}

Blog post metadata

This simple Page is fine for rendering the “Home” and “About” pages. But a blog post comes with additional information like its creation date and tags. Here’s what this looks like as a Rust struct:

#[derive(Debug, Clone, Serialize)]
pub struct BlogPost {
    #[serde(flatten)]
    pub page: Page,
    pub path: String,
    pub datetime: Datetime,
    pub date_string: String,
    pub tags: Vec<String>,
    pub published: bool,
}

BlogPost contains a Page, which holds its title and body. The #[serde(flatten)] attribute is needed so that we can still access {{ title }} and {{ body }} in the tera templates (instead of {{ page.title }} and {{ page.body }}).

BlogPost also has data like the post’s URL (path), its date (date_string), tags, and whether the post should be displayed (published). I decided to store this data alongside the post by putting a TOML header at the top of each Markdown file:

# making-this-site.md
---
title = 'Making this site'
path = 'making_this_site'
date = 2023-02-19
published = true 
tags = [
    "software", 
    "rust", 
    ...
]
---
# ... rest of the post

The ---’s are just an easy way to demarcate the header. I used the toml crate to parse the header:

use color_eyre::eyre::{self, eyre, Result};

const HEADER_STR: &str = "---";

fn parse_header(contents: &str) -> Result<(Header, String)> {
    let (header, markdown) = extract_header(contents)?;
    let header_data: HeaderData = toml::from_str(header)?;
    let header = header_data.try_into()?;
    Ok((header, markdown.to_owned()))
}

fn extract_header(contents: &str) -> Result<(&str, &str)> {
    let mut it = contents.splitn(3, HEADER_STR).skip(1);
    let header = it.next().ok_or_else(|| eyre!("no header data"))?;
    let markdown = it.next().ok_or_else(|| eyre!("no post data"))?;
    Ok((header, markdown))
}

parse_header uses extract_header to split the header from the rest of the file, then parses it into a Header using toml::from_str and HeaderData::try_into (equivalently Header::try_from).
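To make the splitting step concrete, here is extract_header again as a std-only sketch, with a plain String error swapped in for eyre so that it has no dependencies. The sample inputs in the usage note are made up for illustration.

```rust
const HEADER_STR: &str = "---";

/// Same splitting logic as above, with `String` errors instead of eyre.
fn extract_header(contents: &str) -> Result<(&str, &str), String> {
    // splitn(3, ..) yields at most three pieces:
    //   1. whatever comes before the first "---" (skipped)
    //   2. the header between the first and second "---"
    //   3. everything after the second "---", including any later "---"
    let mut it = contents.splitn(3, HEADER_STR).skip(1);
    let header = it.next().ok_or_else(|| "no header data".to_string())?;
    let markdown = it.next().ok_or_else(|| "no post data".to_string())?;
    Ok((header, markdown))
}
```

For the input "---\ntitle = 'Hi'\n---\n# Body\n---\nmore", this returns the header "\ntitle = 'Hi'\n" and the body "\n# Body\n---\nmore", so a horizontal rule later in the post is left alone. A file with fewer than two delimiters produces an error. One limitation of this approach is that a "---" inside a header value would end the header early.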

Here are the definitions for the two types HeaderData and Header used above:

use color_eyre::eyre;
use serde::Deserialize;
use toml::value::{Date, Datetime, Offset, Time};

#[derive(Deserialize)]
pub struct HeaderData {
    pub title: Option<String>,
    pub path: Option<String>,
    pub date: Option<Datetime>,
    pub tags: Option<Vec<String>>,
    pub published: Option<bool>,
}

impl TryFrom<HeaderData> for Header {
    type Error = eyre::Error;

    fn try_from(value: HeaderData) -> Result<Self, Self::Error> { 
        // Omitted: some validation, 
        // error messages on missing fields
    }
}

#[derive(Debug, Clone)]
pub struct Header {
    pub title: String,
    pub path: String,
    pub datetime: (Date, Option<Time>, Option<Offset>),
    pub date_string: String,
    pub tags: Vec<String>,
    pub published: bool,
}

Having both HeaderData and Header allows a clean separation between TOML parsing and validation. First, we use toml::from_str, which can deserialize into any type that implements Deserialize, to get out a HeaderData. Since the fields of HeaderData are all optional, we only get an error at this stage if the header is actually malformed TOML. Validation of the header happens in the TryFrom<HeaderData> impl, which checks that the required fields are present.

We could almost skip the first step and just deserialize Header straight from TOML, except that toml::value::Datetime looks like this:

pub struct Datetime {
    pub date: Option<Date>,
    pub time: Option<Time>,
    pub offset: Option<Offset>,
}

Blog posts should always have at least a Date, but the date is optional in the TOML datetime type. This is why we need the second layer of validation (and also explains the slightly funky type of Header.datetime).
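Here is a minimal sketch of what that date check could look like. The Date, Time, Offset, and Datetime types below are pared-down local stand-ins for the toml::value types (so the example has no dependencies), and validate_date is a hypothetical helper, not the omitted TryFrom body from the real code.

```rust
// Stand-ins for toml::value::{Date, Time, Offset, Datetime}. The real types
// carry more detail, but the optional-field shape is the same.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Date { year: u16, month: u8, day: u8 }
#[derive(Debug, Clone, Copy, PartialEq)]
struct Time { hour: u8, minute: u8 }
#[derive(Debug, Clone, Copy, PartialEq)]
struct Offset { minutes: i16 }

#[derive(Debug, Clone, Copy, PartialEq)]
struct Datetime {
    date: Option<Date>,
    time: Option<Time>,
    offset: Option<Offset>,
}

/// Turns an optional, possibly date-less datetime into a tuple whose
/// first element is guaranteed to be a date.
fn validate_date(
    date: Option<Datetime>,
) -> Result<(Date, Option<Time>, Option<Offset>), String> {
    // The `date` key may be absent from the header entirely...
    let datetime = date.ok_or_else(|| "header is missing `date`".to_string())?;
    // ...or present but time-only, which is still valid TOML.
    let day = datetime
        .date
        .ok_or_else(|| "`date` must include a calendar date".to_string())?;
    Ok((day, datetime.time, datetime.offset))
}
```

After this step, code that formats or sorts by date never has to handle a missing date again, which is the point of moving the Option out of the first tuple element.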

Yay! We now have post metadata. With a few tweaks to the blog.html template, we can now take the data from a BlogPost and show the date of the post and its tags on the rendered HTML page, just like on the page you’re viewing now. The code to do this isn’t very interesting, so let’s skip on to the HTTP server.

Axum web server

Basic setup

Getting started with axum was really easy, especially with the excellent examples.

Here are the dependencies we’ll need at first:

# Cargo.toml

[dependencies]
axum = "0.6"
color-eyre = "0.6"
hyper = { version = "0.14", features = ["full"] }
tokio = { version = "1", features = ["full"] }
tower = "0.4"
tower-http = { version = "0.3.0", features = ["fs"] }

Here’s a simplified main.rs:

// main.rs

use axum::{
    http::StatusCode,
    response::IntoResponse,
    routing::get,
    Router,
};
use color_eyre::eyre::Result;
use std::net::SocketAddr;

#[tokio::main]
async fn main() -> Result<()> {
    // Set up colorful error messages
    color_eyre::install()?;

    // Set up routes
    let app = Router::new()
        .route("/health", get(health_check));

    // Run it!
    let addr = SocketAddr::from(([0, 0, 0, 0], 8000));
    axum::Server::bind(&addr)
        .serve(app.into_make_service())
        .await?;

    Ok(())
}

async fn health_check() -> impl IntoResponse {
    (StatusCode::OK, "OK")
}

This starts a server listening on port 8000 on all interfaces (0.0.0.0), so it’s reachable at localhost:8000, with a single route /health. To flesh out our server, we’ll just have to create a few more routes that render the various pages of the website. Before we do that, let’s quickly set up serving files (CSS, images, etc.) from the ./assets/ directory using ServeDir from tower-http:

// main.rs

// At top of file, add:
use axum::routing::get_service;
use tower_http::services::ServeDir;

// Change main() to:
#[tokio::main]
async fn main() -> Result<()> {
    // Directory for serving static assets
    let serve_dir = get_service(ServeDir::new("assets"))
        .handle_error(handle_not_found);

    // Set up routes
    let app = Router::new()
        .route("/health", get(health_check))
        .nest_service("/assets", serve_dir.clone())
        .fallback_service(serve_dir);

    // Same code as before to start up the server
    ...
}

// Also add this: 
async fn handle_not_found(_: std::io::Error) -> impl IntoResponse {
    (StatusCode::NOT_FOUND, "Not found")
}

One of the main selling points of axum is the ability to easily add any tower::Service in this manner.

Serving pages

To serve the home page, we add a new route:

// In main():
...

// Set up routes
let app = Router::new()
    .route("/", get(home)) // New!
    .route("/health", get(health_check))
    .nest_service("/assets", serve_dir.clone())
    .fallback_service(serve_dir);

// Start server
...

Now, we just need to define a home function to call on requests to /. We could just construct a new Page on every request, but that requires reading in home.md each time. Instead we’ll read in all of the pages and blog posts at app startup. We’ll keep them in some global state,

use std::collections::HashMap;

pub struct AppState {
    home: Page,
    about: Page,
    posts: Vec<BlogPost>,
    // Maps from url string to index in self.posts
    paths: HashMap<String, usize>, 
    // etc.
}

which we initialize once in main() and give to the Router:

// In main():

// Returns a Result<Arc<AppState>>
let app_state = AppState::initialize()?;

// Set up routes
let app = Router::new()
    .route("/", get(home)) 
    .route("/health", get(health_check))
    .nest_service("/assets", serve_dir.clone())
    .fallback_service(serve_dir)
    .with_state(app_state); // New!

...

After doing this, we can use the State extractor to get access to app_state inside of home,

use axum::extract::State;
use std::sync::Arc;

async fn home(
    State(app_state): State<Arc<AppState>>
) -> impl IntoResponse {
    make_into_response(app_state.home.render_html(HOME_TEMPLATE))
}

Yay!

(If the parameter State(app_state) looks weird, recall that function parameters are patterns in Rust. Since State is a newtype wrapper, this is just using pattern-matching to get the inner value.)
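As a small illustration of that point, here are two equivalent ways of unpacking a newtype. Wrapper is a local stand-in for a tuple struct like State, and both function names are made up for this example.

```rust
/// Local stand-in for a newtype wrapper such as axum's `State`.
struct Wrapper<T>(T);

// Destructuring directly in the parameter list...
fn len_pattern(Wrapper(text): Wrapper<String>) -> usize {
    text.len()
}

// ...is equivalent to taking the wrapper and unpacking it in the body.
fn len_explicit(wrapper: Wrapper<String>) -> usize {
    let Wrapper(text) = wrapper;
    text.len()
}
```

Both functions have the same signature from the caller’s point of view; the pattern form just saves a line.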

Individual blog posts are handled in the same way. For those, we also need to make use of the Path extractor,

// In main():

// Set up routes
let app = Router::new()
    .route("/", get(home)) 
    .route("/health", get(health_check))
    .route("/blog/:name", get(blog_post)) // New!
    ...

The "/blog/:name" tells the Router that we want to match the ":name" part of the path and use it for something. Similar to home with State, blog_post can take a parameter Path(name). Then the blog_post function simply searches app_state.paths for a blog post with a path that matches name:

pub async fn blog_post(
    Path(name): Path<String>,
    State(app_state): State<Arc<AppState>>,
) -> impl IntoResponse {
    if let Some(&index) = app_state.paths.get(&name) {
        let post = &app_state.posts[index];
        if post.is_published() {
            return make_into_response(post.render_html());
        }
    }
    (StatusCode::NOT_FOUND, "Not found").into_response()
}
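The lookup itself can be tried out without axum. Below is a minimal std-only sketch, assuming a pared-down Post type and a Posts container that are hypothetical stand-ins for BlogPost and AppState; building the paths index in new is my guess at what the initialization might do, not the actual code.

```rust
use std::collections::HashMap;

// Pared-down, hypothetical version of a post: just enough for the lookup.
struct Post {
    path: String,
    published: bool,
    html: String,
}

struct Posts {
    posts: Vec<Post>,
    // Maps from URL path to index in `posts`
    paths: HashMap<String, usize>,
}

impl Posts {
    fn new(posts: Vec<Post>) -> Self {
        // Build the path -> index map once, up front
        let paths = posts
            .iter()
            .enumerate()
            .map(|(index, post)| (post.path.clone(), index))
            .collect();
        Self { posts, paths }
    }

    /// Returns the rendered HTML for a published post, or `None` (a 404
    /// in the handler) if the path is unknown or the post is unpublished.
    fn lookup(&self, name: &str) -> Option<&str> {
        let &index = self.paths.get(name)?;
        let post = &self.posts[index];
        if post.published {
            Some(post.html.as_str())
        } else {
            None
        }
    }
}
```

Unknown paths and unpublished posts both come back as None, so a visitor cannot tell whether a draft exists at a given URL.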

In the future, I might want to use a database to store the posts instead of keeping them in AppState. Then we would hold something like a Mutex<DatabaseConnection> in the AppState instead of Vec<BlogPost>.

That’s basically it for serving pages. The index of the blog requires a little more code (mainly for filtering posts by tag), but nothing too interesting or different from what I’ve shown here.

Adding some middleware

One of the best things about axum is how easy it is to add middleware. I don’t know very much at all about this, but from reading fasterthanlime’s post, it seems like it might be good practice to set up a concurrency limit and load shedding. This was mainly done out of curiosity, since I really doubt anyone would take the time to attack my site.

Here’s what this looks like:

// main.rs

use axum::error_handling::HandleErrorLayer;
use std::time::Duration;
use tower::ServiceBuilder;
use tower_http::trace::TraceLayer;
...

// Inside of main():

// Set up routes
let app = Router::new()
    .route("/", get(home)) 
    // Other routes...
    ... 
    // New:
    .layer(
        ServiceBuilder::new()
            .layer(HandleErrorLayer::new(handle_layer_error))
            .layer(TraceLayer::new_for_http())
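            // load_shed has to wrap concurrency_limit: it rejects requests
            // only when the service inside it reports that it is not ready
            .load_shed()
            .concurrency_limit(CONCURRENCY_LIMIT)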
            .timeout(Duration::from_secs(TIMEOUT)),
    )
    .with_state(app_state);

...

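As advertised, this is super easy in axum. The Router::layer call wraps the routes added so far with any tower::Layer you want. (However, if the resulting Service’s error type is not Infallible, you’ll also need to add a HandleErrorLayer as I did here.) Note that these pieces are feature-gated: concurrency_limit, load_shed, and timeout need tower’s limit, load-shed, and timeout features, and TraceLayer needs tower-http’s trace feature, none of which are in the initial Cargo.toml above.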

Workflow for writing posts

Writing posts locally is easy thanks to cargo watch. I run cargo watch -x run to automatically restart the server every time a file in the repo is edited. When only Markdown files change, nothing needs to be recompiled, so the restart is very fast. I do have to refresh my browser to see the changes, but that’s about it.

Conclusion

In this post, I took a deeper dive into the process of building this site. There’s nothing too complicated here, but I found this project to be a fun and gentle introduction to backend web stuff.

Besides the project, I have a few reflections about writing this post.

First, just the act of writing this up was super helpful—I ended up making a few tweaks to the code for clarity and even found a few bugs in the process.

Second, I’m wondering if having so many code snippets makes this post a little hard to read. Probably like most people, I’d really prefer to read Rust code in an IDE with rust-analyzer running. For a future project, I’m thinking of trying to add some kind of minimal IDE functionality (like hover?) to the code snippets to at least make them a little more informative.

Footnotes

1

I’ve written a couple recursive descent parsers between working through Crafting Interpreters, my project pomelo (SML), and a few other little experiments and projects. I’m far from an expert, but I do feel like I’ve written a lot of parsers considering some of the other gaps in my software knowledge (like say, backend stuff…).

2

In the real code, I have a decent amount of logging using tracing that I’ve removed from the snippets in this post.

#async #axum #backend #rust #software