Fresh from the oven, highlight.js now has a pretty cool feature that, to the best of my knowledge, is not supported by any other syntax highlighter. Namely, we can now recognize and highlight the headers of an HTTP request and its body, if the body happens to be code in a language we know. This is intended for all sorts of API docs, which often present an entire HTTP payload carrying some kind of JSON or XML.

Story

The feature was born out of a conversation with a user who asked what seemed a very strange question: "How do I disable highlighting for only a certain part of a code snippet?" I couldn't even imagine why anyone would want more than one language in a single snippet until he provided a simple example with an HTTP prologue and a chunk of JSON payload. The actual problem was that highlight.js simply didn't know JSON at the time. But it was also obvious that even if it had known the language of the body, the headers could either break language detection completely or get random fragments highlighted by the body's rules, such as incidentally matching keywords, numbers, etc.

So I answered the user with apologies that we couldn't help him right now but might look into the problem at some point in the future. It turned out "some point in the future" came later that evening, when I realized that we already had the two key ingredients to solve it: highlighting nested languages (used for JavaScript in HTML, for example) and language detection. It was just a matter of putting them together.
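To make the idea of "putting them together" concrete, here is a minimal sketch in Python. It is not the highlight.js implementation: the function names are made up for illustration, and real detection in highlight.js relies on its own language definitions, whereas this sketch simply assumes that a body is JSON or XML if a standard parser accepts it. The point is only that the head and the body are separated first, and detection then runs on the body alone.

```python
import json
import xml.etree.ElementTree as ET


def split_http_message(text):
    """Split a raw HTTP message into (head, body) at the first blank line."""
    normalized = text.replace("\r\n", "\n")
    head, _, body = normalized.partition("\n\n")
    return head, body


def detect_body_language(body):
    """Guess the language of a body; return None if no candidate fits.

    Hypothetical stand-in for real auto-detection: a candidate language
    "wins" if its parser accepts the whole text.
    """
    stripped = body.strip()
    if not stripped:
        return None
    try:
        json.loads(stripped)
        return "json"
    except ValueError:
        pass
    try:
        ET.fromstring(stripped)
        return "xml"
    except ET.ParseError:
        pass
    return None


message = (
    "POST /task?id=1 HTTP/1.1\n"
    "Host: example.org\n"
    "Content-Type: application/json; charset=utf-8\n"
    "\n"
    '{"status": "ok", "extended": true}'
)

head, body = split_http_message(message)
print(detect_body_language(body))     # json
print(detect_body_language(message))  # None: the headers break detection
```

Running detection on the whole message fails, which is exactly the problem from the user's example; running it on the body alone succeeds.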

Outcome

We now have the language "HTTP" that knows how to highlight request lines with a query string inside them, status lines with a numeric code, and headers with their values.
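As a rough illustration of the three kinds of lines mentioned above, the following Python sketch classifies the lines of an HTTP head with regular expressions. The patterns and names are assumptions made for this example; they are deliberately simplified and are not the rules from the actual "HTTP" language definition.

```python
import re

# Simplified, illustrative patterns (not the highlight.js definitions).
REQUEST_LINE = re.compile(
    r"^(?P<method>[A-Z]+) (?P<target>\S+) (?P<version>HTTP/\d\.\d)$"
)
STATUS_LINE = re.compile(
    r"^(?P<version>HTTP/\d\.\d) (?P<code>\d{3})(?: (?P<reason>.*))?$"
)
HEADER_LINE = re.compile(r"^(?P<name>[^\s:]+):[ \t]*(?P<value>.*)$")


def classify_head_line(line):
    """Return (kind, parts) where kind is request, status, header or unknown."""
    candidates = (
        ("request", REQUEST_LINE),
        ("status", STATUS_LINE),
        ("header", HEADER_LINE),
    )
    for kind, pattern in candidates:
        match = pattern.match(line)
        if match:
            return kind, match.groupdict()
    return "unknown", {}


def split_target(target):
    """Split a request target into (path, query string)."""
    path, _, query = target.partition("?")
    return path, query


for line in (
    "GET /search?q=json HTTP/1.1",
    "HTTP/1.1 404 Not Found",
    "Content-Type: application/json",
):
    print(classify_head_line(line)[0])
```

A highlighter would then wrap each named part (method, status code, header name and so on) in its own markup instead of just returning it.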

We also have a strictly defined "JSON" language that knows pretty much all of JSON. Many thanks to Douglas Crockford for making it so limited and simple to parse. The strict definition makes auto-detection very reliable.
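Why strictness helps detection can be shown with a small Python sketch. It uses Python's standard `json` parser as a stand-in for a strict JSON definition (an assumption for the sake of the example; highlight.js has its own definition), and `is_strict_json` is a made-up helper name. Text that merely looks like JSON, such as a JavaScript object literal, is rejected, so it is unlikely to be mistaken for JSON.

```python
import json


def _reject_constant(name):
    # json.loads accepts NaN and Infinity by default; JSON itself does not.
    raise ValueError(f"{name} is not valid JSON")


def is_strict_json(text):
    """Return True only if the whole text is valid JSON."""
    try:
        json.loads(text, parse_constant=_reject_constant)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


samples = [
    '{"name": "highlight.js", "languages": ["http", "json"]}',  # JSON
    "{name: 'highlight.js'}",        # JavaScript object literal
    'var config = {"debug": true};', # JavaScript statement
    '{"ratio": NaN}',                # not JSON, despite the quotes
]

for sample in samples:
    print(is_strict_json(sample), sample)
```

Only the first sample passes; the other three are rejected even though they contain JSON-like fragments.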

Both languages are now in the so-called "common" set, which means they will be available by default in the CDN-hosted version starting with the next release.

Problems

Since no heuristic is completely reliable, it would be nice to have some way to specify the sub-language inside a snippet, in the same way as is now possible for the whole snippet. The hard part is to invent a way that doesn't suck :-). If you have any ideas, please share!

The other problem is obviously that the code is still very fresh and inevitably contains bugs. So get the source, build it, test it and let us know. Thank you!

I'm not a native English speaker and I'm trying to improve my language skills. Please feel free to use comments to correct any grammatical and spelling errors!

Comments: 9

  1. peter.nguyen1802@gmail.com

    Kudos to you, Ivan! Both for this new feature and for listening to other users' requests.

  2. Olexandr Shalakhin

    Thank you for your work! Highlight.js rocks!

  3. Konstantine Rybnikov

    I would rather call it CURL, but that's just me.

  4. > I would rather call it CURL, but that's just me.

    That would say nothing to most Windows users.

    Anyway, it's a bit odd to use the name of a tool to describe a language. We don't call JavaScript "V8", do we?

  5. http://clickpass.com/public/ash

    How about using Content-Type to find out the sub-language in this case?

  6. Apart from the header possibly being absent or arbitrary, the main problem is that highlight.js has no storage to keep this sort of information for use later in the parsing process. (I wish I knew what this sort of parser is properly called.)

  7. Alex

    Seems that JSON is not included in the custom package download page!

  8. Yes, the new version hasn't been released yet. This post is about the development of the new feature, not a release announcement.

  9. The new unique feature (apparently) among syntax highlighters is highlighting HTTP headers and an arbitrary language in the request body. The most useful languages here are XML and JSON both of which highlight.js does support. Here’s the detailed post about the feature.
