Almost exactly ten years ago on August 14 I wrote on this very blog (albeit in a different language):
So last night I got worked up and decided to try and write [it]. But on the condition of not dragging it on for many days if it didn't work out on the first take; I've got enough on my mind as it is.
It did work out. Which makes August 14 the official birthday of highlight.js! Although it wasn't until 5 days later that the first meaningful commit was recorded. Using any form of source control was only an afterthought for me back then :-)
With the obligatory self-congratulatory stuff out of the way, let me now get to the main purpose of this anniversary post: explaining what makes highlight.js different from other highlighters. I'm not going to talk about the obvious features listed on the front page of highlightjs.org. I'll try to document the philosophy that until now I have only referred to in various places but was never able to put together in one piece.
I'll try to keep it short (otherwise I'll never finish this post!)
It is my deep conviction that highlighting should make code more readable instead of simply making it… fun, for lack of a better word.
Let me explain by example. Here are some things that improve readability when highlighted:
Keywords, because they define the overall structure of the code and because they would otherwise look too much like user variables.
Function and class titles at the place of declaration, because they effectively define a domain-specific language, an API. They have very distinct semantics.
Built-ins and special literals, because it helps to know what in the code belongs to the language and what is defined by the user.
And these are the things that, in my humblest opinion, make no sense to highlight:
CamelCase identifiers, because it's not consistent: you get identifiers of the same nature either highlighted or not simply because they happen to be named differently.
.method() calls, because I, frankly, can't even invent a plausible reason why they should be highlighted in any way.
Punctuation, because it significantly increases the amount of color clutter in any given snippet, which is hard on the eyes.
I have a hypothesis that the only reason these things traditionally get highlighted is that they can easily be picked up by a regexp :-)
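To make the CamelCase complaint concrete, here is a minimal Python sketch of the kind of "easy" regexp rule I mean. This is not highlight.js code: the pattern, the function name and the sample snippet are all made up for illustration.

```python
import re

# Hypothetical "easy" rule: anything that looks CamelCase gets highlighted.
CAMEL_CASE = re.compile(r"\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+\b")


def camel_case_hits(code: str) -> list[str]:
    """Return the identifiers the naive rule would highlight."""
    return CAMEL_CASE.findall(code)


# Two constructor calls of the same nature, named in different styles.
snippet = "a = HttpClient()\nb = socket()"
hits = camel_case_hits(snippet)  # only HttpClient is picked up
```

Both lines construct an object, yet only one of the two names gets colored, purely because of how it is spelled.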
In highlight.js we sometimes go to great lengths to highlight what makes sense instead of what's easy ("semantic highlighting?"). In lisps we highlight the first thing in parentheses, whether or not it is a built-in, and we have special rules to not highlight it in quoted lists and even in argument lists in lambdas in Scheme. In VimScript we try our best to distinguish between strings and line comments even though they seem to be deliberately designed to trip up parsers. And we recognize quite a few ways of spelling out attributes in HTML.
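The lisp rule can be sketched roughly like this in Python. It is an illustration of the idea only, not how highlight.js actually implements it: the tokenizer and the `head_symbols` helper are hypothetical, and the sketch deliberately ignores strings, comments and lambda argument lists.

```python
import re

# Tokens: an opening paren (optionally quoted), a closing paren, or a symbol.
TOKEN = re.compile(r"'?\(|\)|[^\s()']+")


def head_symbols(source: str) -> list[str]:
    """Return the symbols in head position of unquoted lists."""
    heads = []
    quoted_stack = []    # one flag per open list: is it quoted data?
    expect_head = False  # True right after an unquoted "("
    for token in TOKEN.findall(source):
        if token.endswith("("):
            # A list is data if it is quoted itself or nested in quoted data.
            inherited = bool(quoted_stack) and quoted_stack[-1]
            quoted = token.startswith("'") or inherited
            quoted_stack.append(quoted)
            expect_head = not quoted
        elif token == ")":
            if quoted_stack:
                quoted_stack.pop()
            expect_head = False
        else:
            if expect_head:
                heads.append(token)
            expect_head = False
    return heads


heads = head_symbols("(list '(a (b c)) (foo 1))")
```

Here `list` and `foo` come out as heads whether or not they are built-ins, while `a` and `b` are left alone because they sit inside quoted data. Even this toy version needs a stack to get quoting right, which is exactly the kind of thing a single regexp can't do.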
The downside of this is that highlight.js is heavier and probably slower than it could've been. These were the reasons why we recently lost a bid on replacing the incumbent highlighting library on Stack Overflow. I still think they made a mistake :-)
Because quality beats lightness!
Of course, no code base is ideal, especially a 10-year-old one. There's always so much to do! However, since our way of dealing with the stress of Open Source maintenance is to not let it happen to us, the development of highlight.js goes at a rather leisurely pace. Which means we've accumulated quite a few plans without any reasonable expectation of when they might happen.
There's a new exciting parser in the making. We'd like to do an overhaul of our build system and packaging. There are plans to have pluggable renderers in addition to HTML.
You could be the one taking one of those over and covering yourself with great glory! If interested, drop me a line at email@example.com.