In the run-up to the latest release, highlight.js 6.0, I decided (for the first time in more than 4 years) to actually look at other highlighting libraries. Sure, I knew of their existence before, but I never felt compelled to do any serious comparison because highlight.js is a fun project and I'm quite happy with the result. In fact, this comparison was also made more for fun than for anything else. I just wondered how good (or bad) highlight.js actually looked among similar libraries.
I decided not to take into account highly subjective things like visual appeal (I'm not a good judge here), installation simplicity and documentation clarity (I don't know how to measure them). I also didn't evaluate the number of supported languages. While it is a measurable quantity, it doesn't mean much for an end user: if a tool doesn't support the language you need, you don't care about the dozens of others that it does support. Instead I concentrated on universally measurable things that make sense to everyone: size, speed and correctness.
Why "completely unfair" then, you ask? Because I knew who'd win before I even started :-).
Contenders
If you go to the trouble of searching the Internet for "javascript syntax highlighter" you'll inevitably stumble upon hordes of posts, all ingeniously titled along the lines of "N useful/beautiful javascript tools", where N varies from 4 to 20-something. Those have been circulating the network for years but, predictably, aren't a very good source of information because they don't actually evaluate the usefulness or beauty of the solutions they link to.
So I just picked the names that I've gotten used to seeing around in blogs and forums where people try to find such a tool:
- SyntaxHighlighter by Alex Gorbachev, used on MDN and others.
- SHJS — a library built to be compatible with GNU source-highlight language definitions.
- Google Code Prettify — a highlighter used on Google Code and Stack Overflow.
- and finally highlight.js originally written by me, used on a popular Russian tech site Habrahabr.ru and others.
I've compiled an enterprisey-looking matrix of features supported by these libraries. It isn't intended for comparison per se because there are different use cases and sometimes lack of a feature is a feature too. It's here to give you a general idea of what goal each one can serve.
Feature | highlight.js | SyntaxHighlighter | SHJS | Google Code Prettify |
---|---|---|---|---|
User markup in code snippets | yes | no 1) | yes | yes |
Line numbers | no | yes | no | yes |
Striped background | no | yes | no | yes |
Replacing indenting TABs with spaces | yes | yes | no | no |
Language detection | yes | no | no | yes 2) |
Multi-language code | yes | yes 3) | no | yes |
Arbitrary HTML container for code | yes | no | no | no |
HTML5 compatibility 4) | yes | no | no | no |
Notes:
1) SyntaxHighlighter doesn't support arbitrary markup but has two special features that cover some use cases: turning URLs into links and highlighting lines of code that require attention.
2) Prettify doesn't actually do any detection. Instead it employs an interesting approach of generalized highlighting that works independently of the language. This makes it more prone to errors, though, than the heuristic detection mechanism found in highlight.js.
3) I wasn't able to configure SyntaxHighlighter to do this but I attribute it to my lack of persistence. It works fine on the demo page.
4) Surely one couldn't expect to be taken seriously these days without shoving the trendy "HTML5" moniker in somewhere! What it actually means here is that highlight.js automatically recognizes code snippets marked up according to the HTML5 recommendation with <pre><code class="language-something"> .. </code></pre>.
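The HTML5 convention described in the last note above boils down to reading a class prefix. As a minimal sketch (the helper name languageFromClass is hypothetical and not taken from any of the libraries compared here), a highlighter could pick the language out of a class attribute roughly like this:

```javascript
// Hypothetical helper, not part of any library's API: extract the language
// name from a class attribute that follows the "language-xxx" convention.
function languageFromClass(className) {
  const prefix = 'language-';
  // A class attribute is a whitespace-separated list of tokens.
  const classes = (className || '').split(/\s+/).filter(Boolean);
  for (const cls of classes) {
    // Require at least one character after the prefix.
    if (cls.startsWith(prefix) && cls.length > prefix.length) {
      return cls.slice(prefix.length);
    }
  }
  // No explicit language given: a highlighter would have to fall back
  // to detection or to a default here.
  return null;
}
```

For example, languageFromClass('foo language-css') yields 'css', while a class list with no such token yields null.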
Test case
The test page consists of code snippets in 7 popular languages: Python, Ruby, PHP, XML, HTML, CSS and Javascript. The "completely unfair" part of the article shows up here full-scale since those snippets come from highlight.js' own test suite! Anyway, I think it was a good idea to use them because they were designed to be short and to exercise as many features of a language as possible. Here are four versions of the test case using highlight.js, SyntaxHighlighter, SHJS and Google Code Prettify in all their styled-by-default glory.
Size
All the libraries have some way to include only the required language definitions on the page: simple linking to language files, on-demand loading, or packing into a single file. All of them also provide minified/packed production versions of their files. Gzip compression wasn't used, for no specific reason. The following table shows the overall size of all the Javascript needed to highlight the test snippets.
Metric | highlight.js | SyntaxHighlighter | SHJS | Google Code Prettify |
---|---|---|---|---|
Size (KB) | 16.4 | 34.6 | 16.8 | 19.2 |
I didn't include CSS in the calculation because it's not actually required: a site can define the highlighting style within its main stylesheet.
Speed
To be honest, modern browsers have made this test irrelevant. All the highlighters are fast to the point where highlighting is applied instantly. The only exception was SHJS, which was configured to load language files on demand; in a couple of test runs this left raw un-highlighted code visible for a split second. This says nothing bad about the speed of SHJS itself but rather shows that on-demand loading was a bad idea for the task.
I measured the speed of highlighting using Firebug. It wasn't as straightforward as counting size because there are more things to take into account here. After some tinkering I decided on the following method:
- To represent the most common real-world case all files are loaded from cache but the browser still performs DNS lookups and establishes TCP connections for each file.
- Total load time is defined by the DOMContentLoaded event for highlight.js and by the onload event for the rest. This may seem unfair but I just did what the libraries suggest in their docs.
- The time of highlighting itself is measured with Firebug's profiler. Since profiling affects performance this time cannot be simply added to the load time and should be considered separately.
Metric | highlight.js | SyntaxHighlighter | SHJS | Google Code Prettify |
---|---|---|---|---|
Load time (msecs) | 870 | 1394 | 1008 | 1007 |
Highlighting time (msecs) | 55 | 67 | 54 | 72 |
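For readers without Firebug, a rough alternative is plain wall-clock timing. This is not the profiler-based method behind the numbers above, just a sketch of a different technique; measure and fakeHighlight are hypothetical names, and fakeHighlight is a stand-in workload rather than a call into any real highlighter.

```javascript
// Hypothetical helper: run fn several times and report wall-clock
// timings in milliseconds. Date.now() has only millisecond resolution,
// so very fast functions will mostly report 0.
function measure(fn, runs = 20) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    fn();
    samples.push(Date.now() - start);
  }
  samples.sort((a, b) => a - b);
  return {
    runs: samples.length,
    min: samples[0],
    // Middle sample (the upper one when the number of runs is even).
    median: samples[Math.floor(samples.length / 2)],
    max: samples[samples.length - 1],
  };
}

// Stand-in workload (an assumption for illustration only): wraps every
// character of a string in a <span>, loosely imitating markup generation.
function fakeHighlight() {
  return 'x'.repeat(1000).replace(/x/g, '<span>x</span>');
}

const result = measure(fakeHighlight, 10);
console.log(result.runs, result.min, result.median, result.max);
```

Taking the median over several runs keeps one slow outlier (a garbage collection pause, say) from dominating the figure. To time a real library, fakeHighlight would be replaced with whatever entry point that library documents.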
Richness and correctness
Here is where things get interesting. Size and speed turned out not to affect user experience significantly but the difference in richness and correctness is plainly visible. There won't be any numbers though, just some notes.
I should note that the notion of "correctness" differs from library to library. While there are plain bugs there are also missing features that could be left out deliberately. Here I tried to adhere to my personal views on the subject and you may well be in disagreement with me. That's fine!
SyntaxHighlighter doesn't produce very rich highlighting to begin with. No Python decorators, no Javascript regexps, no CSS @-rules etc… Also it seems to be downright unable to highlight things that require more sophisticated parsing than a regular grammar, like names in function and class definitions. This is not bad by itself. The result still looks useful and leaves fewer places to screw up :-). But there are some issues with correctness anyway:
- no multi-line strings in PHP
- value-less attributes in HTML tags aren't recognized
- within CSS @-rules seemingly random words are recognized as "values" (whatever it could mean)
Not much is highlighted in Javascript.
SHJS looked promising since it uses language definitions from the GNU source-highlight project and I thought those guys would do their job rather meticulously. But in practice it mishandled highlighting more than any of the others:
- names of old-style classes in Python aren't highlighted (those of new-style classes are)
- class inheritance in Ruby badly breaks the whole line
- #{} constructs in Ruby strings aren't recognized
- PHP throw keyword is not highlighted
- tags are highlighted inside CDATA-escaped sections in XML
- unquoted attribute values in HTML tags aren't recognized
- @-rules in CSS break the whole highlighting flow
- "$" isn't considered part of identifiers in Javascript
Class inheritance (A < B) in Ruby breaks the whole line.
Google Code Prettify works very well both in terms of richness and correctness. It can highlight CSS and Javascript within HTML, and it recognizes Python decorators and Javascript regexps. Speaking of the latter, it was from Prettify that I borrowed ideas on how to implement those in highlight.js.
I've found very few issues with it:
- tags highlighted inside CDATA-escaped sections in XML
- @font-face in CSS is not recognized as @-rule
- Ruby highlighting is also simplistic but doesn't cause such severe problems as in SHJS
That <not> inside CDATA shouldn't be highlighted as tag.
As for highlight.js, it's pushed down to the end of the comparison for a reason :-). Obviously there won't be any correctness issues since I used code snippets from its own test suite, which it successfully passes. Of course that doesn't in any way mean it's bug-free. But where the library really stands out is highlighting richness. It just knows much more about languages than the others. Here are just those features, visible in this very test case, that are unique to highlight.js:
- raw Python strings
- Ruby inheritance, #{} things, quoted symbols, symbolic function names etc.
- yardoc in Ruby comments
- phpdoc in PHP comments
- classes, ids, tags and attributes in CSS selectors
Some of the recognized features (like variables in PHP) are deliberately not styled to maintain visual sanity. Most of these features (and those in other languages) are the result of the elaborate effort of many highlight.js contributors in defining the most intricate parsing rules (just look at the Perl definition, for example).
HTML with embedded Javascript and CSS. All sorts of ways to define tag attributes are supported.
Completely balanced conclusion
If you need a solid syntax highlighter (and don't care about line numbers or striped backgrounds) use highlight.js. It is small, fast, rich and correct!
And if you don't like something about it — contribute!
Comments: 18
Congratulations on the new release! =)
Okay, I took a look at your source code, and it looks like a few bits of CSS would give you line numbers and alternating row colors:
This is off the top of my head, but that should produce a different set of line numbers per block with alternating colors and only number every 5th line.
I'd be interested in hearing your opinion on Sunlight: http://sunlightjs.com/demo.html.
disclaimer: I'm the author.
And if you think it sucks, I wouldn't mind hearing some bug reports, too :).
Google Code Prettify allows one to specify a language, and the language extensions map to file extensions. So if you know the filename that you're highlighting, you know which language extension to load and which to apply to a given code block. I think it also supports the HTML5 idiom that you use as your definition for HTML 5 compatibility as long as the code element also has the class prettyprint.
David:
David, your solution won't work because <span>s don't correspond to lines. I tried to do this some time ago with CSS counters but it gets harder than it seems pretty quickly.
tmont:
It looks pretty interesting! I'll try to look at it in detail after my vacation. Thanks!
mikesamuel@gmail.com:
Mike, I know, I used it for CSS in the test. It's pretty obvious that no heuristic can be 100% reliable so you have to have some way to explicitly specify the language.
You can look at it that way. But my goal with highlight.js was more practical: I wanted code marked-up according to the spec to be highlighted without additional hassle. And even more practical need was to highlight code in just <pre><code>..</code></pre> without classes because it's what Markdown does for code fragments.
An interesting CSS trick for zebra-striping would be to count the number of matches for newline and then create a background gradient with the appropriate color stops. Example below for 5 lines of code.
.someCodeBlock { background-image: linear-gradient(top, #000, #000 20%, #fff 20%, #fff 40%, #000 40%, #000 60%, #fff 60%, #fff 80%, #000 80%, #000); }
Should be easy enough to build these gradients up programmatically. Might have to do some adjustments for padding, etc.
Oh crap! Wrapping! hm... the plot thickens.
Nevermind, code blocks don't wrap and none of these seem to have wrapping as a feature :) So my suggestion could still work.
Thank you, I've been looking for a syntax highlighter and this looks quite interesting.
@ActionJake: You just need 2 color stops at 1.2em (assuming your line-height is 1.2) and background-size to set the size of the gradient to 2.4em. Then it's repeated and matches every line.
I've used this technique in my blog: http://leaverou.me/
Considering how much smaller highlight.js is compared to SyntaxHighlighter, it might be worth switching to.
Also, +1 to Lea :)
SyntaxHighlighter 3x seems to have improved significantly since this analysis was done.
You can easily select all and copy the code snippet, prettify XML easily, highlight lines of code, etc., so I found it much more user-friendly!
I am a SyntaxHighlighter user and it badly needs an update. The last stable version, 3.0.83, was released in July 2010. Thanks to your review, I find it very plausible to migrate to highlight.js now.