The article is the publishing script?

I used to have a pretty standard Jekyll theme as this web page. Being the Emacs user I am, I'd hacked on it a bit and added support for Org Mode, but after getting it up and running I promptly lost interest. Will the same thing happen to this page? Yeah, probably. Either way, lectures just ended for another semester and I got a spurt of motivation to get something going again, specifically to try out Pollen, which describes itself as

[…] a publishing system that helps authors make functional and beautiful digital books.

At the core of Pollen is an argument:

— Matthew Butterick, Pollen: the book is a program

While it clearly states that it's for digital books, it targets HTML and looked pretty cool, so I gave it a shot. If you've seen the bottom of this page you may notice it was created with org-mode, so clearly the Pollen attempt didn't make it. While it's a very cool system, I had issues with the export behaving oddly: paragraph tags were showing up where they shouldn't have, and it became difficult to fine-tune the export despite having immense control of the output with the Pollen markup format. As a templating system, the Pollen preprocessor could be quite nice. I can see being able to inject arbitrary Racket code into anything text-based as something somebody needs for some reason.

1. So what's so special about this?

Well, not much. I've used a lot of org-mode in the past and even attempted to do this once before. It's quite a nice markup language. I've personally used it for organising notes, managing TODO lists, planning projects, writing assignments, scheduling, agenda views, tagging, and exporting to various formats. It also supports literate programming, tables, and extensive customisation with Emacs Lisp. The literate programming has been quite useful for assignments that require lots of in-document code and outputs.

One thing that org-mode has that I think Markdown and other variants lack is extensibility. Don't get me wrong, there are a ton of tools that can process Markdown and spit out something useful that is plenty configurable. But org-mode is first and foremost a part of Emacs. That means it comes with Emacs' extensibility, which, in my opinion, is light years beyond anything else, and which I'll talk more about later on this page.

2. org-export and org-publish

My lecturers certainly aren't interested in reading my assignments in a plain-text markup language almost exclusively supported by a text editor first released in 1985. Fortunately, org-mode has excellent support for exporting documents to other formats, including HTML and \(\LaTeX\). On top of this, org-publish adds support for processing a set of files, potentially exporting them, and publishing them. These systems are highly configurable and featureful.

All pages here (including the mono-page) are generated from a set of .org files, published via the org-html-publish-to-html function, smeared with a little CSS that isn't mine, then thrown up on a page. All in all it was about an evening's work to learn how the org-publish system worked, experiment, and polish to just what I had in mind.

3. What about the so-called publishing script being an article?

Yeah, so this article isn't really the publishing script. Specifically, the file is tangled to publish.el, which is then executed. Tangling is the process of extracting source code from a document, then expanding, merging, and transforming it. Org then recomposes the extracted blocks into one or more separate files.

But it didn't use to be like that. To bootstrap the page I wrote a bog-standard .el file that could be loaded and executed from the command line to publish the page during testing,

emacs -Q --batch -l publish.el --funcall publish

During the writing of this article, I'm moving everything into here. Then I'll be able to

emacs -Q --batch -l org -f org-babel-tangle -l publish.el --eval '(org-publish "site" t)'

which isn't much of a change to be fair, but it's the principle that matters.
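The tangle-then-load step can also be driven from Lisp. Here is a minimal sketch, assuming the article lives in a file called publish.org (a hypothetical name) and using a made-up helper, my/tangle-and-load, which is not part of this page's actual script. It relies on org-babel-tangle-file from ob-tangle, which returns the list of files it wrote.

```emacs-lisp
(require 'ob-tangle)

(defun my/tangle-and-load (org-file target-file)
  "Tangle the emacs-lisp blocks of ORG-FILE into TARGET-FILE and load the result.
TARGET-FILE is only the default destination; blocks with their own
:tangle header still go where they say.  Return the tangled file names."
  (let ((tangled (org-babel-tangle-file org-file target-file "emacs-lisp")))
    (dolist (file tangled)
      ;; NOERROR is nil so a broken tangled file fails loudly;
      ;; NOMESSAGE is t to keep batch output quiet.
      (load file nil t))
    tangled))

;; Hypothetical usage, with assumed file names:
;; (my/tangle-and-load "publish.org" "publish.el")
;; (org-publish "site" t)
```

Nothing here evaluates the source blocks during tangling; the code only runs once the tangled file is loaded.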

4. The meat of it

4.1. Setup, packages, and variables

First, we'll do some housekeeping. I don't want these packages in my main init.el, so we'll move the package directory and require the built-in things I'll need.

(defvar base-directory (expand-file-name "~/Projects/"))
(defvar public-directory (file-name-concat base-directory "docs"))

(mkdir public-directory t)

(require 'package)
(setq package-user-dir (file-name-concat base-directory ".packages"))

(add-to-list 'package-archives '("melpa" . ""))
(add-to-list 'package-archives '("melpa-stable" . ""))

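;; Activate anything already installed under the relocated package-user-dir.
(package-initialize)
(unless package-archive-contents
  (package-refresh-contents))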

(unless (package-installed-p 'use-package)
  (package-install 'use-package))

(require 'use-package)
(require 'ox-publish)
(require 'cl-lib)

(use-package htmlize
  :ensure t)

I use a few custom variables as well. inline-html-publish-index-subpages is a bit interesting: because I wanted to generate a mono-page as well as the individual pages, I needed a custom publishing function, and I need to store the subpages for addition to the mono-page. The -marker variables mark where the table of contents and the subpages should be inserted. It's a bit of a hack, I know, I know, but it works just fine.

(defvar inline-html-publish-index-subpages nil)
(defvar inline-html-publish-toc-marker "<!-- Inline html toc marker -->")
(defvar inline-html-publish-subpage-marker "<!-- Inline html subpage marker -->")
(defvar inline-html-publish-page-format "<h1 class='subpage-title' id='%s'><a href='%s'>%s</a></h1>")

I also need to configure some generic org-html variables.

(setq org-html-head-include-default-style t
      org-html-preamble t
      org-html-htmlize-output-type 'css
      org-html-postamble t
      org-html-head (concat
                     "<link rel='stylesheet' type='text/css' href=''/>"
                     "<link rel='stylesheet' type='text/css' href='org-htmlize-style.css'/>"
                     "<link rel='stylesheet' type='text/css' href='style.css'/>"))

The real meat of the org-publish system is the org-publish-project-alist variable, which defines what files should be published and how. This one is pretty boring. We only have

  • "contents": recursively publishes any file ending in .org in the base-directory (minus anything matched by :exclude) to the public-directory with the org-html-publish-to-html function.
  • "static": just copies some common resources to the public-directory.
  • "index": is a little more interesting. It's very similar to "contents" but uses my inline-html-publish-index function to publish the file.

(setq org-publish-project-alist
      `(("contents"
         :base-directory ,base-directory
         :base-extension "org"
         :exclude ""
         :publishing-directory ,public-directory
         :recursive t
         :publishing-function org-html-publish-to-html
         :html-postamble ,(concat
                           "<p class=\"author\">%a</p>"
                           "<p class=\"author\">Date: %T</p>"
                           "<p class=\"author\">%c</p>")
         :headline-levels 4
         :auto-preamble t)
        ("static"
         :base-directory ,base-directory
         :base-extension "css\\|js\\|png\\|jpg\\|gif\\|pdf\\|mp3\\|ogg\\|swf"
         :include ("CNAME")
         :exclude "docs"
         :publishing-directory ,public-directory
         :recursive t
         :publishing-function org-publish-attachment)
        ("index"
         :inline-components ("contents")
         :inline-plist (:body-only t :publishing-function inline-html-publish)
         :publishing-function inline-html-publish-index
         :base-directory ,base-directory
         :base-extension "org"
         :exclude ".*"
         :include ("")
         :publishing-directory ,public-directory
         :recursive t
         :headline-levels 4
         :auto-preamble t)
        ("site" :components ("index" "contents" "static"))))

:inline-components and :inline-plist are my own properties. They're used in inline-html-publish-index to configure the mono-page subpage exports.
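The override relies on nothing fancier than property-list lookup order: plist-get returns the first match, so anything placed in front wins. A small sketch of that behaviour, using a hypothetical helper name (my/merge-inline-plist) and trimmed-down plists rather than the real project entries:

```emacs-lisp
(defun my/merge-inline-plist (inline-plist project-plist)
  "Return PROJECT-PLIST with INLINE-PLIST's properties taking precedence.
The result simply has INLINE-PLIST in front, so `plist-get' finds
its values before any duplicates further along."
  (append inline-plist project-plist))

(let ((merged (my/merge-inline-plist
               '(:body-only t :publishing-function inline-html-publish)
               '(:recursive t :publishing-function org-html-publish-to-html))))
  (list (plist-get merged :publishing-function)
        (plist-get merged :body-only)
        (plist-get merged :recursive)))
;; => (inline-html-publish t t)
```

The duplicated :publishing-function key stays in the list, it is just never reached.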

4.2. Functions

I wrote a small helper macro to make writing some HTML tags easier. It's not particularly fancy but it does the job.

(defmacro with-tag (tag attribute-alist &rest body)
  "Insert <TAG `attributes'>, execute BODY, then insert </TAG>.
ATTRIBUTE-ALIST's key-value pairs are converted into HTML attributes."
  (declare (indent 1) (debug t))
  (let ((attribute-list (gensym "attribute-list"))
        (attributes (gensym "attributes")))
    `(let* ((,attribute-list (cl-loop for (prop . val) in ,attribute-alist collect
                                      (concat prop "='" val "'")))
            (,attributes (string-join ,attribute-list " ")))
       (insert (concat "<" ,tag " " ,attributes ">"))
       ;; Run BODY between the opening and closing tags.
       ,@body
       (insert (concat "</" ,tag ">")))))
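To see what a single call amounts to, here is a hand-expanded sketch of (with-tag "div" '(("class" . "subpage")) (insert "hello")), assuming the body runs between the two tag inserts as the docstring describes. The wrapper function my/with-tag-by-hand is a hypothetical name used only for this illustration.

```emacs-lisp
(require 'cl-lib)
(require 'subr-x)

(defun my/with-tag-by-hand ()
  "Return the text one `with-tag' call for a div with class subpage inserts."
  (with-temp-buffer
    (let* ((attribute-list (cl-loop for (prop . val) in '(("class" . "subpage"))
                                    collect (concat prop "='" val "'")))
           (attributes (string-join attribute-list " ")))
      (insert (concat "<" "div" " " attributes ">"))
      (insert "hello")                  ; the BODY of the call
      (insert (concat "</" "div" ">")))
    (buffer-string)))

(my/with-tag-by-hand)
;; => "<div class='subpage'>hello</div>"
```

With a nil attribute alist the opening tag keeps a trailing space, as in "<ul >", which browsers accept.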

The previously mentioned inline-html-publish function works by getting the to-be-published file and exporting it as HTML into a temp buffer, adding some HTML to the top via a format string, then plopping that plus some metadata into the inline-html-publish-index-subpages variable.

(defun inline-html-publish (plist filename pub-dir)
  "Publish FILENAME into `inline-html-publish-index-subpages'.
Stored as (timestamp output-filename title html-string).  Update the
`:inline-components' in PLIST with `:inline-plist'.  PUB-DIR is
unused."
  (let* ((org-inhibit-startup t)
         (visiting (find-buffer-visiting filename))
         (work-buffer (or visiting (find-file-noselect filename)))
         (output-filename (org-publish-file-relative-name
                           (file-name-with-extension filename ".html")
                           plist))
         (temp-buffer (generate-new-buffer " *temp*" t))
         (title nil))
    (with-current-buffer work-buffer
      (setq title (org-get-title))
      (org-export-to-buffer 'html temp-buffer nil nil nil t))

    (with-current-buffer temp-buffer
      (when title
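        (insert (format inline-html-publish-page-format
                        (file-name-sans-extension output-filename)
                        ;; Assumed arguments for the remaining two %s
                        ;; slots: the link target, then the title.
                        output-filename
                        title)))
      ;; Entry layout follows the docstring:
      ;; (timestamp output-filename title html-string).
      (push `(,(org-publish-cache-mtime-of-src filename)
              ,output-filename
              ,title
              ,(buffer-string))
            inline-html-publish-index-subpages))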

    (kill-buffer temp-buffer)
    (unless visiting (kill-buffer work-buffer))))

To "publish" the index file, the "index" project uses my inline-html-publish-index function. It first merges the properties of the components listed in :inline-components with :inline-plist and does some other setup-related tasks. It then publishes the file to the html-buffer buffer, followed by publishing the :inline-components with the now updated project alist.

(defun inline-html-publish-index (plist filename pub-dir)
  "Build a mono-page from FILENAME and `inline-html-publish-index-subpages'.
Merges the `:inline-plist' with the plist of `:inline-components' in PLIST"
  (let* ((org-inhibit-startup t)
         (org-html-preamble-format `(("en" "<h1 class='author'>%a</h1>")))
         (org-html-postamble-format `(("en" ,(concat
                                              "<p class=\"author\">Date: %T</p>"
                                              "<p class=\"author\">%c</p>"))))
         (visiting (find-buffer-visiting filename))
         (work-buffer (or visiting (find-file-noselect filename)))
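         (output-filename (file-name-concat
                           ;; Assumed: PUB-DIR as the leading component
                           ;; and an .html extension on the output file.
                           pub-dir
                           (file-name-with-extension
                            (org-publish-file-relative-name filename plist)
                            ".html")))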
         (html-buffer (find-file-noselect output-filename nil t))
         (inline-plist (plist-get plist :inline-plist))
         (inline-projects (cl-loop
                           for component in (plist-get plist :inline-components)
                           collect
                           (let* ((project (assoc component org-publish-project-alist))
                                  (project-name (car project))
                                  (project-plist (cdr project)))
                             (cons project-name (append inline-plist project-plist))))))
    (with-current-buffer work-buffer
      (org-export-to-buffer 'html html-buffer nil nil nil nil))

    (org-publish-projects inline-projects)

    (setq inline-html-publish-index-subpages
          (mapcar #'cdr
                  (sort inline-html-publish-index-subpages
                        #'(lambda (x y) (not (time-less-p (car x) (car y)))))))

    (with-current-buffer html-buffer
      (goto-char (point-min))
      (search-forward inline-html-publish-toc-marker)
      (with-tag "ul" nil
                (cl-loop for (html-filename title html) in inline-html-publish-index-subpages do
                         (with-tag "li" nil
                                   (with-tag "a" `(("href" . ,(concat "#" (file-name-sans-extension html-filename))))
                                             (insert title)))))

      (goto-char (point-min))
      (search-forward inline-html-publish-subpage-marker)
      (cl-loop for (html-filename title html) in inline-html-publish-index-subpages do
               (with-tag "div" '(("class" . "subpage"))
                         (insert html)))
      ;; Assumed: write the assembled mono-page to disk before the
      ;; buffer is killed.
      (save-buffer))

    (kill-buffer html-buffer)
    (unless visiting (kill-buffer work-buffer))))

The :inline-components projects use the inline-html-publish function, which populates the inline-html-publish-index-subpages variable. That list is sorted and the timestamps stripped out. The function then finds the markers within the HTML buffer and inserts the relevant content.
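The sort-then-strip step is easy to try on its own. Below is a minimal sketch with made-up entries and a hypothetical helper name, my/newest-first. Timestamps are plain integers (seconds) here, which time-less-p accepts, and the comparator is a strict "newer than" test rather than the negated one used above.

```emacs-lisp
(defun my/newest-first (subpages)
  "Sort SUBPAGES, a list of (TIMESTAMP FILE TITLE HTML), newest first.
Return the entries with the timestamp dropped.  SUBPAGES itself is
left untouched because `sort' works on a copy."
  (mapcar #'cdr
          (sort (copy-sequence subpages)
                (lambda (x y) (time-less-p (car y) (car x))))))

(my/newest-first
 '((1000 "old.html" "Old" "<p>old</p>")
   (2000 "new.html" "New" "<p>new</p>")))
;; => (("new.html" "New" "<p>new</p>") ("old.html" "Old" "<p>old</p>"))
```

Each remaining entry has the (html-filename title html) shape that the two cl-loop forms destructure.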

And that's it.

4.3. Wait wait where did those markers come from?

They're just embedded as exported HTML blocks in the file. Here's what that looks like

#+OPTIONS: title:nil
#+title:Jake Moss --- Mono-page

#+begin_export html
<div class='centred-container'>
<div class='abstract'>
  <h2>About me</h2>
  I'm a Computer Science Honours student at the University of Queensland interested in high performance computing, lisps, and functional programming. I mainly write Cython in Emacs for my day job and am working on computer algebra systems for my thesis.
</div>
</div>
#+end_export

This is a mono-page of everything here (although not much). I prefer being able to scroll and search through web pages in their entirety.

Here's a list of things on this page
#+begin_src emacs-lisp :exports results :results value html
  (string-join `("<div>" ,inline-html-publish-toc-marker "</div>") "\n")
#+end_src

#+RESULTS:
#+begin_export html
<div>
<!-- Inline html toc marker -->
</div>
#+end_export

Pages can be narrowed to by clicking on their respective titles.

#+begin_src emacs-lisp :exports results :results value html
  (string-join `("<div>" ,inline-html-publish-subpage-marker "</div>") "\n")
#+end_src

#+RESULTS:
#+begin_export html
<div>
<!-- Inline html subpage marker -->
</div>
#+end_export

Some things in =org-publish= are just hard coded and you cannot configure them, one such thing is the detection of $\LaTeX$ fragments to determine if the =mathjax= scripts should be included. Because I'm inserting the subpages myself the =mathjax= scripts in them don't apply. So there's one down here to make sure they work on the mono-page.

Jake Moss

Date: 2024-05-29 Wed 10:39

Emacs 29.3 (Org mode 9.6.15)