Maniacal weblog » My programs
https://softwaremaniacs.org/blog/category/mysoft/
2023-12-24T15:56:03.986303-08:00
Maniac
Ivan Sagalaev on programming and web development
http://softwaremaniacs.org/media/sm_org/style/photo.jpg
Debounce
2023-12-24T15:56:03.986303-08:00
https://softwaremaniacs.org/blog/2021/10/25/debounce/
<p>When I last <a href="https://softwaremaniacs.org/blog/2021/10/11/nfp/en/">lamented the unoriginality</a> of my tool <code>nfp</code>, I also entertained the idea of salvaging some value from it by extracting the event debouncing part into a stand-alone tool. So that's what I did.</p>
<p>Meet <a href="https://crates.io/crates/debounce">debounce</a>, a Rust library and a prototype command-line tool.</p>
<p><a name=more></a></p>
<h2>Rust</h2>
<p>In all honesty, I didn't really <em>need</em> <code>debounce</code> as a library. <code>nfp</code> was already working fine. But it felt like the Right Thing™ to do, and gave me an excellent opportunity to play with Rust's synchronization primitives.</p>
<h3>Blocking and waiting</h3>
<p>For a clean solution (that is, one without the busy polling I had employed before) I needed two threads: the main one to block and wait forever for external events, and a worker to wait out timeouts and perform specified actions. The worker would also need a mode for when there are no pending events, in which it blocks and waits for the main thread to supply one.</p>
<p>This sounds like a job for a condition variable, and I had hoped Rust would have some idiomatic higher-level wrapper around one. It turned out Rust had three :-)</p>
<ul>
<li><a href="https://doc.rust-lang.org/std/sync/struct.Condvar.html">Condvar</a>, which is exactly the Rusty wrapper around the idea of a condition variable</li>
<li><a href="https://doc.rust-lang.org/std/sync/mpsc/fn.channel.html">channel</a>, a higher-level interface for consumers to wait on data supplied by producers (which I suspect is built on top of <code>Condvar</code>)</li>
<li><a href="https://doc.rust-lang.org/std/thread/fn.park.html">parking</a>, a built-in lightweight ability for a thread to suspend ("park") until another thread wakes it up.</li>
</ul>
<p>Somehow it's very on-brand for a language that gives you 5 kinds of pointers and 4 kinds of strings :-) But this is also what makes it fun! Anyway, I ended up using parking, as it didn't need any extra code and worked well for my use case where I don't mind the worker thread being occasionally randomly woken up out of turn.</p>
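<p>For illustration, here is a minimal sketch of that two-thread arrangement using parking. It is not the crate's code: the names <code>Queue</code>, <code>spawn_worker</code> and <code>push</code>, the fixed delay, and the "return <code>false</code> to stop" convention are assumptions made up for this example. The worker parks indefinitely while the queue is empty and uses <code>park_timeout</code> to wait out the delay of the oldest event. Because every wake-up simply re-checks the queue, an occasional out-of-turn wake-up is harmless.</p>

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical shared queue of (event, arrival time) pairs.
type Queue<T> = Arc<Mutex<VecDeque<(T, Instant)>>>;

// Spawns a worker that hands each event to `action` once `delay` has passed
// since the event arrived. `action` returns false to stop the worker.
fn spawn_worker<T, F>(queue: Queue<T>, delay: Duration, mut action: F) -> thread::JoinHandle<()>
where
    T: Send + 'static,
    F: FnMut(T) -> bool + Send + 'static,
{
    thread::spawn(move || loop {
        // Deadline of the oldest event, if any; the lock is released right away.
        let deadline = queue.lock().unwrap().front().map(|(_, at)| *at + delay);
        match deadline {
            // Nothing to do: sleep until the main thread unparks us.
            None => thread::park(),
            Some(deadline) => {
                let now = Instant::now();
                if deadline > now {
                    // Wait out the remaining time; an early wake-up just loops again.
                    thread::park_timeout(deadline - now);
                } else {
                    // Only this thread pops, so the front element is still there.
                    let (event, _) = queue.lock().unwrap().pop_front().unwrap();
                    if !action(event) {
                        return;
                    }
                }
            }
        }
    })
}

// Called from the main thread: enqueue an event and wake the worker. If the
// worker isn't parked yet, the unpark token makes its next park() return at once.
fn push<T>(queue: &Queue<T>, worker: &thread::Thread, event: T) {
    queue.lock().unwrap().push_back((event, Instant::now()));
    worker.unpark();
}
```

<p>The main thread only ever calls <code>push</code>; everything time-related stays inside the worker.</p>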
<h3>Traits intricacies</h3>
<p>Another purely Rustian puzzle I stumbled upon had to do with polymorphism. I have two kinds of event buffers with an identical interface for getting values out of them. In Rust you express this with a <em>trait</em>, which concrete types then implement in their own way:</p>
<pre><code>pub trait Get<T> {
    fn get(&mut self) -> State<T>;
}
</code></pre>
<p><code>T</code> is the type of data stored in the buffer. </p>
<p class=note><small>Here seasoned Rustians are probably already asking their screens something along the lines of "wait, if your return type strictly depends on what's in the buffer, it doesn't really make sense for it to be a parameter of the trait…" And they are totally correct, but I didn't know that at that point.</small></p>
<p>So far so good. Then I thought that, having a <code>.get()</code>, it should be pretty natural for the buffer to implement a standard <code>Iterator</code> that would call <code>.get()</code> as long as there are items in the buffer in the ready state.</p>
<p>So I wrote the obvious:</p>
<pre><code>impl<T, B> Iterator for B
where
    B: Get<T>,
{
    fn next(&mut self) -> Option<T> {
        todo!();
    }
}
</code></pre>
<p>Which says "this is an implementation of the standard <code>Iterator</code> trait for any type <code>B</code> which implements <code>Get</code>".</p>
<p>This, however, produced a compiler error that proved too intricate for me to understand. So, long story short, I <a href="https://users.rust-lang.org/t/blanket-implementation-of-iterator-for-a-generic-trait/66188">went on the Rust user forum</a>, where over a couple of days nice people imparted to me deep knowledge about traits, blanket implementations and associated types (which I think I finally get). Now my buffers are <a href="https://nest.pijul.com/isagalaev/debounce/changes/AWW3HMEVLO6Y3I5WC6GU2KAC5CPKRXBAYPXCYX3XTCJ5CTIDLXHAC">also iterators</a> and I don't need to repeatedly call <code>.get()</code> in my tests :-)</p>
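<p>To make the associated-type point concrete, here is one way the pieces can fit together so that everything compiles. It is a sketch, not the crate's actual code: the variants of <code>State</code>, the helper <code>next_ready</code> and the toy <code>Immediate</code> buffer are all hypothetical. With <code>Data</code> as an associated type, each buffer fixes its item type once. A blanket implementation of <code>Iterator</code> for every type implementing <code>Get</code> is still ruled out by Rust's orphan rules (a foreign trait can't be implemented for an arbitrary type parameter), so in this sketch each buffer type opts in with a one-line <code>Iterator</code> impl that delegates to a shared generic function.</p>

```rust
use std::collections::VecDeque;
use std::time::Duration;

// Hypothetical stand-in for the buffer state; the post doesn't show the real one.
pub enum State<T> {
    Ready(T),
    Wait(Duration),
    Empty,
}

// The item type is an associated type: the implementing buffer determines it.
pub trait Get {
    type Data;
    fn get(&mut self) -> State<Self::Data>;
}

// Shared `next()` logic: yield items while they are ready, stop otherwise.
pub fn next_ready<B: Get>(buffer: &mut B) -> Option<B::Data> {
    match buffer.get() {
        State::Ready(item) => Some(item),
        State::Wait(_) | State::Empty => None,
    }
}

// Toy buffer (hypothetical): every stored item is immediately ready.
pub struct Immediate<T>(pub VecDeque<T>);

impl<T> Get for Immediate<T> {
    type Data = T;
    fn get(&mut self) -> State<T> {
        match self.0.pop_front() {
            Some(item) => State::Ready(item),
            None => State::Empty,
        }
    }
}

// The per-type opt-in that the orphan rules require.
impl<T> Iterator for Immediate<T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        next_ready(self)
    }
}
```

<p>With that in place, a test can write <code>buffer.collect()</code> instead of calling <code>.get()</code> in a loop.</p>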
<p>Here are a couple of things I had a chance to reflect on after this story:</p>
<ul>
<li>
<p>Having such a go-to place as users.rust-lang.org is exactly what I'm missing while developing for Android. To my knowledge, there just isn't anything like it for that ecosystem, and everyone just shouts into the abyss of Stack Overflow and tries to sort out random pieces of code coming from there.</p>
</li>
<li>
<p>This type system wrangling is one of the things that make dynamically typed languages more productive. And yes, I'm aware of the downsides, so no need to repeat the mantra of "typed languages remove a whole class of bugs" in the comments. Better to think of the whole new class of code structures you need to learn and maintain to do it :-)</p>
</li>
</ul>
<h2>CLI tool</h2>
<p>Rust's packaging tool, <a href="https://doc.rust-lang.org/cargo/">Cargo</a>, has a built-in notion of "examples", where you can implement something working without affecting your library dependencies and have it automatically built alongside the main code.</p>
<p>So I implemented a CLI tool which works exactly in the way I <a href="https://softwaremaniacs.org/blog/2021/10/11/nfp/en/#debouncing">described in the previous post</a>, by removing sequential duplicates from <code>stdin</code> that happen within a specified grace period:</p>
<pre><code>inotifywait -m . | debounce -t 200
</code></pre>
<p>It's very bare-bones, as much as you would expect from a working example. I encourage anyone who needs additional options and features to write their own solution. (Here's a free idea: let the user specify by which part of the string to test equality, either with a regex or a field index, or something.)</p>
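<p>The filtering rule itself is small enough to sketch. The function below is a hypothetical illustration, not the tool's source: it takes lines with arrival timestamps (in milliseconds, passed in so the rule can be tested without a clock) and keeps one line per run of identical consecutive lines whose gaps are shorter than the grace period. Each repeat restarts the grace period, as in classic debouncing. <em>When</em> the surviving line gets printed (immediately, or after the period expires) is a separate concern that this sketch doesn't model.</p>

```rust
// Collapses runs of identical consecutive lines whose gaps are shorter than
// `grace_ms`, keeping one line per run. Timestamps are in milliseconds and
// assumed to be non-decreasing.
fn collapse(events: &[(&str, u64)], grace_ms: u64) -> Vec<String> {
    let mut out = Vec::new();
    let mut last: Option<(&str, u64)> = None;
    for &(line, at) in events {
        let repeat = matches!(last, Some((prev, prev_at))
            if prev == line && at.saturating_sub(prev_at) < grace_ms);
        if !repeat {
            out.push(line.to_string());
        }
        // Every line, kept or dropped, restarts the grace period.
        last = Some((line, at));
    }
    out
}
```

<p>If the <code>-t 200</code> in the example invocation is in milliseconds (an assumption; the post doesn't say), it would correspond to <code>grace_ms = 200</code> here.</p>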
<h2>Open source</h2>
<p>So this technically makes me an open source maintainer. <em>Again</em>. But this time, having 10+ years of experience maintaining <a href="https://highlightjs.org/">highlight.js</a>, I think I'm going to do things differently.</p>
<p>I don't like the default assumptions about what FLOSS maintainers are supposed to do these days. You're supposed to write code, do regular releases (lest your project be pronounced dead), react to issues, review PRs from random people and be extra energized when dealing with anything that has the word "security" attached to it. And as a bonus for particularly good work you'll be rewarded with a Community™, whose self-proclaimed leaders would harass you for being a dictator who should feel guilty about not having the Community's interests, as formulated by the "leaders", in mind every single second of your life.</p>
<p>This is all bullshit, of course. But this is also reality. And I've <a href="https://softwaremaniacs.org/blog/2019/02/25/misconception-about-oss-support/en/">bitched about it before</a>, so there's nothing new here.</p>
<p>So here's what I'm going to do:</p>
<ul>
<li>
<p>I won't develop the code past what I need from it myself. If someone needs more features, they should write their own solution and maintain it (or not maintain it!) in the way they want. The license explicitly allows it.</p>
</li>
<li>
<p>I am interested in what other people would make of it, but I make no promise about accepting all derivative work into my code. As long as you don't forcefully insist on having your PR merged, I remain a nice person and encourage sharing of ideas!</p>
</li>
<li>
<p>I am <em>especially</em> interested in suggestions (in any form) on improving my Rust. This is, after all, what I wrote the thing for!</p>
</li>
</ul>
<p>In this light, my choice of <a href="https://pijul.com/">pijul</a> as a version control system fits in well, as I expect to be somewhat shielded from GitHub's crowd, where that sense of needy entitlement is especially strong.</p>
<p class=note><small>A random recent example is <a href="https://github.com/psf/black/issues/517">this thread</a> where people with Opinions™ have been harassing maintainers of <a href="https://pypi.org/project/black/">black</a> about a minor issue for <em>three years</em>, and not a single one of them thought of volunteering to maintain a fork with the stability guarantees they ostensibly require so hard. Such work is not much fun of course, but they assume the maintainers owe it to them.</small></p>
<p>P.S. I think I should write more about pijul, it's an interesting project!</p>
<p>P.P.S. By the way, check out <a href="https://highlightjs.org/">highlight.js</a>! Since I transferred it to more motivated people, it has become such a powerhouse!</p>
New pet project
2021-08-10T16:27:41.084480-07:00
https://softwaremaniacs.org/blog/2021/03/06/new-pet-project/
<p>So anyway, I'm making a shopping list app for Android. As I understand, "shopping list" is something of a hello-world exercise of Android development, which may explain why there are so many rudimentary ones in Google Play. Only in my case I actually need one, and I know exactly what I want from it.</p>
<p>See, for the past 10 years or so I've been in charge of food supply in our family, which includes everything from grocery shopping logistics, to cooking, to arranging dishes in the dishwasher. And the app is an essential part of the first stage of that chain.</p>
<p><a name=more></a></p>
<h2>Previously</h2>
<p>Up until recently I used <a href="https://www.outofmilk.com/">Out of Milk</a>, which someone suggested to me a long time ago, and at that time it was probably the best choice. I remember being quite happy to pay for the full version. Over time, though, it got a little bloated in ways I didn't need and a little neglected in places I cared about. The UI got very "traditional", requiring fiddly, unnecessary motions for core functionality.</p>
<p>Here's the short list of its wrongs I still remember:</p>
<ul>
<li>
<p>Start-up time of several seconds, sometimes overflowing into dozens. I believe my 4-year-old phone should be perfectly able to load a <em>shopping list</em> in sub-second time.</p>
</li>
<li>
<p>Adding an item when it's already on the list results in two identical items on the list. (Yes, really.)</p>
</li>
<li>
<p>Auto-suggest when adding an item uses an arbitrary ordering and limits the number of displayed results. This meant I could never get "Tomatoes" in there, as they were buried <em>under</em> "Roma tomatoes", "Cherry tomatoes", and a few others with no way to scroll to it.</p>
</li>
<li>
<p>Tiny click target to check an item off the list. I was constantly fat-fingering around those and getting into a different screen.</p>
</li>
<li>
<p>Checking an item off the list puts it into another list below the main one, which you either have to empty all the time, or end up with a huge scroll height. As I understand, the idea was that you could uncheck the items from there to put them back on the list, but that's unrealistic with my catalog of ~ 150 items.</p>
</li>
<li>
<p>"Smart" categorization kept inventing excessively detailed categories leading to several one-item categories clogging up the list.</p>
</li>
<li>
<p>Sometimes unsuccessful synchronization would "forget" added items on the list. Which is funny because I didn't have anything to synchronize with!</p>
</li>
</ul>
<h2>Now what</h2>
<p>I could probably spend some time searching for an app that'd suit me better, but… Look, I'm a programmer. Writing code is what I do! And I've wanted to play with Android development since forever, and the recent <a href="/blog/2020/04/14/on-kotlin/">exposure to Kotlin</a> gave me all the reasons I didn't really need in the first place :-)</p>
<p>Here's a laundry list of what I want from a shopping list:</p>
<ul>
<li>
<p><strong>Automatic ordering</strong> based on the order in which I buy things. I've had this idea ever since I was using <em>Out Of Milk</em>, because ordering manually sucks, and it feels like something computers should be able to do well, right? However, it's really not trivial to implement, if you think about it. So it was my main challenge and a trigger to actually start the project.</p>
</li>
<li>
<p><strong>Fuzzy search</strong> for suggested items. I'm used to typing 3-4 characters in my Sublime Text to go to every file or identifier in a project. I want the same service here.</p>
</li>
<li>
<p><strong>Smart sorting</strong> of suggested items. It could take into account closeness of matching, frequency and recency of buying.</p>
</li>
<li>
<p><strong>Multiple lists</strong> with separate histories. Different stores have different aisle orders, and I buy different things in them. A single list won't cut it.</p>
</li>
<li>
<p><strong>Renaming and annotating items</strong>. I get annoyed by typos and spelling errors, I want to correct them. And sometimes I want to add a short note to an item (like a particular brand of cheese, or a reminder that I need two cartons of milk this time).</p>
</li>
<li>
<p><strong>Color-coded categories</strong>, to give visual aid in scanning what otherwise would be a plain list of strings. They don't have to be terribly detailed.</p>
</li>
<li>
<p><strong>Fewer</strong> buttons, check boxes and dialogs. I want to interact with the content itself as much as possible: swiping items off the list instead of clicking a checkbox, having lists themselves in a carousel instead of choosing their names from a <code><select></code>, etc. Oh, and no settings, if I can get away with it!</p>
</li>
<li>
<p><strong>Undo</strong>. It's really annoying to accidentally swipe off something covered by your thumb only to realize it's not what you intended, and now you have no clue what it was.</p>
</li>
<li>
<p><strong>GPS pinning</strong>. This is one aspirational feature I'll probably tackle last, if ever. I want to pin a list to a particular geo location, so the app would automatically select it when I'm at this store again.</p>
</li>
<li>
<p>Also, no tracking, ads or other such bullshit. Should be self-explanatory :-) Not having some ugly API SDK making network calls at startup should really help with performance.</p>
</li>
</ul>
<h2>Current status</h2>
<p>I actually first started working on it at the end of 2019 and made good progress into 2020… but then something got in the way.</p>
<pre><code>commit 8ca7b341801db3fda2e6fdbb5c1436d2b917b123
Author: Ivan Sagalaev <maniac@softwaremaniacs.org>
Date:   Fri Dec 4 20:43:11 2020 -0800

    Remove .idea/* from under git

commit a7d58b20d051b47cdc79868f578c85ba831c4801
Author: Ivan Sagalaev <maniac@softwaremaniacs.org>
Date:   Sun Jan 26 22:21:46 2020 -0800

    Rename `actualRecency` -> `recencyScore`
</code></pre>
<p>Yeah… Anyway, after making an effort to restart the project I'm making good progress again and actually feel really happy about it all!</p>
<figure class="picture right">
<img src="/media/blog/shopping-list.png">
<figcaption>Swiping right to "Buy" a thing</figcaption>
</figure>
<p>About a month ago I started dogfooding the app and was able to delete Out Of Milk from my phone (so long, and thanks for all the fish!). I've got the first five features mostly done, but nothing beats actually using it: it keeps showing me various edge cases I could never have thought of. I love this process :-)</p>
<p>Crucially, I can now add "Tomatoes" by just typing "t", "m" — and have them as the first suggestion.</p>
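<p>The post doesn't show the app's (presumably Kotlin) matching code, but the basic idea of this kind of fuzzy search can be sketched in a few lines. The sketch is hypothetical: it treats the query as a case-insensitive subsequence of the item name and ranks hits by where the match starts, then by how tightly it is packed, then by name length. The "smart sorting" described above would also weigh in frequency and recency of buying, which this ignores.</p>

```rust
// Leftmost greedy match of `query` as a case-insensitive subsequence of `item`.
// Returns (start, end) char positions with `end` exclusive, or None if no match.
fn match_span(query: &str, item: &str) -> Option<(usize, usize)> {
    let item: Vec<char> = item.to_lowercase().chars().collect();
    let mut start = None;
    let mut pos = 0;
    for q in query.to_lowercase().chars() {
        // Next occurrence of `q` at or after `pos`; `?` gives up if there is none.
        let offset = item[pos..].iter().position(|&c| c == q)?;
        start.get_or_insert(pos + offset);
        pos += offset + 1;
    }
    Some((start.unwrap_or(0), pos))
}

// Keeps matching items, best first: earlier match start, then a tighter match,
// then a shorter name. sort_by_key is stable, so full ties keep input order.
fn suggest<'a>(query: &str, items: &[&'a str]) -> Vec<&'a str> {
    let mut hits: Vec<((usize, usize, usize), &'a str)> = items
        .iter()
        .filter_map(|&item| {
            match_span(query, item).map(|(start, end)| ((start, end - start, item.len()), item))
        })
        .collect();
    hits.sort_by_key(|&(key, _)| key);
    hits.into_iter().map(|(_, item)| item).collect()
}
```

<p>With this rule, "tm" matches "Tomatoes" right at the start, so it ranks above "Roma tomatoes" and "Cherry tomatoes", where the match only begins in the second word.</p>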
<p>The app <em>looks</em> pretty rudimentary, as you'd expect at this stage. But really, this time I want to not just fool around and dump the code somewhere in the open; I actually want to make a finished, sellable product out of it. Going to be a fun adventure! (Technically, my wife and I already tried selling my shareware tools sometime in the previous century, but we only managed to sell about two copies, so it doesn't count.)</p>
<p>Wish me luck :-)</p>
Misconception about OSS support
2019-06-03T15:33:08.462329-07:00
https://softwaremaniacs.org/blog/2019/02/25/misconception-about-oss-support/
<p>You wouldn't think a free syntax highlighting library would be a strong dependency for the development process of a business, and yet I'm waking up on a Monday to a flurry of comments and even one personal email from engineers eager to ask me to work for free for their employers.</p>
<p>So of course I took time to scathingly turn it into a teachable moment.</p>
<p><a name=more></a></p>
<p><cite><a href="https://github.com/highlightjs/highlight.js/issues/1984#issuecomment-466941892">https://github.com/highlightjs/highlight.js/issues/1984#issuecomment-466941892</a></cite>:</p>
<blockquote>
<p>I would like if you revert the change. It is currently blocking a lot of build from other people</p>
</blockquote>
<p>Let me take this as an opportunity to explain something about the current sorry state of relationship between businesses and open source projects. (Yeah, I know, but people still don't get it.)</p>
<p><a href="https://highlightjs.org/">highlight.js</a> is not a business, it's a hobby.</p>
<p>It means that whatever gets pushed to this repository or npm should be assumed to be the result of someone having fooled around and gone away for a weekend with their family. Or for a busy working day at their job.</p>
<p>If a business has made a decision to rely on this artifact for anything requiring any sort of stability (i.e. "blocking a lot of build from other people"), it made a stupid and uninformed decision. Or more realistically, it simply relies on maintainers feeling ashamed enough to quickly fix problems when they happen. Even more realistically, it just accepts the fact that its engineers are going to deal with maintainers by soliciting free support, because <em>it has always worked this way</em>. I, for one, don't feel any urge at all to support someone's misplaced expectations :-)</p>
<p>So, dear fellow engineers, please take this build hiccup as an opportunity to explain to your particular business people that their entire intellectual property is a thin layer on top of a shaky foundation of open-source code lazily maintained by hobbyists or paid for by other businesses having their own goals in mind. Mention the <a href="https://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos/">leftpad story</a> for more effect. </p>
<p class=strong><strong>If they really want stability they have to invest in it.</strong></p>
<p>… by, for example, hiring engineers to deal with a myriad of dependencies, maintain local stable forks, contribute patches upstream, or whatever. The key point is that it should not look like it "just works" on fairy dust.</p>
highlight.js turns 10
2016-08-16T23:27:06.934000-07:00
https://softwaremaniacs.org/blog/2016/08/16/highlight-js-turns-10/
<p>Almost exactly ten years ago on August 14 I <a href="http://softwaremaniacs.org/blog/2006/08/14/highlight-js/">wrote on this very blog</a> (albeit in a different language):</p>
<blockquote>
<p>So on yesterday's night I got worked up and decided to try and write [it]. But
on a condition of not dragging it on for many days if it didn't work out on
the first take, I've got enough on my mind as it is.</p>
</blockquote>
<p>It <strong>did</strong> work out. Which makes August 14 the official birthday of <a href="https://highlightjs.org/">highlight.js</a>! Although it wasn't until 5 days later when the <a href="https://github.com/isagalaev/highlight.js/commit/8e819059934f0e7da6637ace3d699708316a4b0e">first meaningful commit</a> was recorded. Using any form of source control was only an afterthought for me back then :-)</p>
<p><a name=more></a></p>
<h2>Quick flashback</h2>
<ul>
<li>Switched through 3 version control systems (Subversion, Bazaar, Git).</li>
<li>Made 71 (seventy-one!) public releases, with a regular <a href="http://softwaremaniacs.org/blog/2015/09/09/highlight-js-cadence/en/">6 week cadence</a> for the past year.</li>
<li><a href="https://highlightjs.org/static/demo/">166 languages and 77 styles</a> created by <a href="https://github.com/isagalaev/highlight.js/blob/master/AUTHORS.en.txt">216 contributors and 3 core developers</a>.</li>
<li>Accumulated <a href="https://github.com/isagalaev/highlight.js">8062 stars on Github</a>.</li>
<li>Went from being a single .js file to being provided as a custom-built package and a node.js library, and served from two independent CDNs.</li>
<li>Acquired a mighty 490-strong unit test suite.</li>
</ul>
<h2>Identity</h2>
<p>With the obligatory self-congratulatory stuff out of the way, let me now get to the main purpose of this anniversary post: explaining what makes highlight.js different from other highlighters. I'm not going to talk about the obvious features listed on the front page of highlightjs.org. Instead, I'll try to document the philosophy that up until this point I have only referred to in various places, but was never able to put together.</p>
<p>I'll try to keep it short (otherwise I'll never finish this post!)</p>
<p>It is my deep conviction that highlighting should make code more readable instead of simply making it… fun, for the lack of better word.</p>
<p>Let me explain by example. Here are some things that improve readability when highlighted:</p>
<ul>
<li>
<p><em>Keywords</em>, because they define the overall structure of the code, and because without prominent highlighting they look too much like user variables.</p>
</li>
<li>
<p>Function and class <em>titles</em> at the place of declaration, because they effectively define a domain-specific language, an API. They have very distinct semantics.</p>
</li>
<li>
<p><em>Built-ins</em> and <em>special literals</em>, because it helps to know what in the code belongs to the language and what is defined by the user.</p>
</li>
</ul>
<p>And these are the things that make no sense to highlight, in my humblest opinion:</p>
<ul>
<li>
<p>CamelCase identifiers, because it's not consistent: you get identifiers of the same nature either highlighted or not simply because they happen to be named differently.</p>
</li>
<li>
<p><code>.method()</code> calls, because I, frankly, can't even invent a plausible reason for why they should be highlighted in any way.</p>
</li>
<li>
<p>Punctuation, because it significantly increases the amount of color clutter in any given snippet which makes it hard on the eyes.</p>
</li>
</ul>
<p>I have a hypothesis that the only reason why these things get highlighted traditionally is simply due to the fact that they could easily be picked up by a regexp :-)</p>
<p>In highlight.js we sometimes go to great lengths to highlight what makes sense instead of what's easy ("semantic highlighting?"). In lisps we highlight the first thing in parentheses, regardless of whether it's built-in, and we have special rules to <em>not</em> highlight it in quoted lists and even in <a href="https://github.com/isagalaev/highlight.js/commit/197f82a">argument lists in lambdas in Scheme</a>. In VimScript we try our best to <a href="https://github.com/isagalaev/highlight.js/commit/f2e6828">distinguish between strings and line comments</a> even though they seem to be deliberately designed to trip up parsers. And we recognize quite a <a href="https://github.com/isagalaev/highlight.js/blob/master/test/markup/xml/space-attributes.txt">few</a> <a href="https://github.com/isagalaev/highlight.js/blob/master/test/markup/xml/unquoted-attributes.txt">ways</a> of spelling out attributes in HTML.</p>
<p>The downside of this is that highlight.js is heavier and probably slower than it could've been. These were the reasons why we recently <a href="http://meta.stackexchange.com/a/279361">lost a bid</a> on replacing the incumbent highlighting library on Stack Overflow. I still think they made a mistake :-)</p>
<p class=strong><strong>Because quality beats lightness!</strong></p>
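<p>As a toy illustration of the Lisp rule mentioned above (highlight the first thing in parentheses, but not inside quoted lists), here is a small scanner. It is hypothetical and far simpler than highlight.js's actual language definitions, which are written in JavaScript: it ignores strings, comments, square brackets and the Scheme lambda-argument case, and it only recognizes a quote written directly before the opening parenthesis.</p>

```rust
// Returns the first symbol of every parenthesized list, skipping lists that
// are quoted with a leading ' as well as any list nested inside a quoted one.
fn list_heads(src: &str) -> Vec<String> {
    let chars: Vec<char> = src.chars().collect();
    let mut heads: Vec<String> = Vec::new();
    // One entry per currently open list: is it (or an enclosing list) quoted?
    let mut quoted_stack: Vec<bool> = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            '(' => {
                let inside_quote = quoted_stack.last().copied().unwrap_or(false);
                let quoted = inside_quote || (i > 0 && chars[i - 1] == '\'');
                quoted_stack.push(quoted);
                // Skip whitespace, then read the head symbol up to a delimiter.
                let mut j = i + 1;
                while j < chars.len() && chars[j].is_whitespace() {
                    j += 1;
                }
                let start = j;
                while j < chars.len() && !chars[j].is_whitespace() && chars[j] != '(' && chars[j] != ')' {
                    j += 1;
                }
                if !quoted && j > start {
                    heads.push(chars[start..j].iter().collect());
                }
                // Resume at the delimiter after the head; it may open a nested list.
                i = j;
                continue;
            }
            ')' => {
                quoted_stack.pop();
            }
            _ => {}
        }
        i += 1;
    }
    heads
}
```

<p>Note that the head is picked up whether or not it's a built-in: <code>define</code> and a user-defined <code>square</code> are treated alike, while nothing inside <code>'(a b (c d))</code> is.</p>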
<h2>Come join us!</h2>
<p>Of course no code base is ideal, especially a 10 year old one, there's always so much to do! However, since our way of dealing with the stress of Open Source maintenance is to not have it happening to us, the development of highlight.js goes at a rather leisurely pace. Which means we've accumulated quite a few plans without any reasonable expectation of when they might happen.</p>
<p>There's a new exciting parser in the making. We'd like to do an overhaul of our build system and packaging. There are plans to have pluggable renderers in addition to HTML.</p>
<p>You could be the one taking one of those over and covering yourself with great glory! If interested, drop me a line at <a href=mailto:maniac@softwaremaniacs.org>maniac@softwaremaniacs.org</a>.</p>
Cadence for highlight.js
2015-09-09T12:22:55.312000-07:00
https://softwaremaniacs.org/blog/2015/09/09/highlight-js-cadence/
<p>We're now doing releases of <a href="https://highlightjs.org/">highlight.js</a> on a cadence of 6 weeks. The latest release <a href="https://highlightjs.org/#news-87">8.8</a> was the second in a row (which is what technically allows me to write "are now doing").</p>
<p><a name=more></a></p>
<p>The reason for that is we (well, mostly me) had a certain difficulty deciding when to actually release something. We don't develop new grand features on a regular basis, all that's happening is bug fixes, new language definitions and new styles. And releasing a new version for every little change is going to annoy end users and drive downstream maintainers mad. So releases tended to happen pretty much by chance. Like someone would ask on a random GitHub issue when is the next release and I would think, why not right now?</p>
<p>This anarchic approach actually worked for some time while the project wasn't moving too fast. But as this has changed in the recent couple of years, and as I've left users stranded waiting for a new release for months on a couple of occasions, I thought it was time to get more serious.</p>
<p>Our <a href="http://highlightjs.readthedocs.org/en/latest/release-process.html">release process</a> is now quite simple, too. A maintainer only has to document the changes, update the version number and push it all to GitHub. GitHub then pings a certain API handler on highlightjs.org and the site does everything else: </p>
<ul>
<li>updates the code,</li>
<li>builds a CDN package and pushes it to GitHub from where two independent CDN providers pick it up, also automatically,</li>
<li>builds and pushes a package to npmjs.org,</li>
<li>updates the live <a href="https://highlightjs.org/static/demo/">demo</a> and various metadata (version number, language count, etc),</li>
<li>pre-builds site's caches used for dynamic custom builds,</li>
<li>publishes version-related news from the CHANGES file,</li>
<li>restarts itself,</li>
<li>goes on social media and spends a day generating an over-excited buzz about the release (OK, probably not this :-) ).</li>
</ul>
<p>The process is still fragile but bugs are getting fixed and it's anyway immensely simpler than doing it all manually.</p>
<p>See you next on October 20th!</p>
Styles unification: first results
2015-05-06T14:37:53.791000-07:00
https://softwaremaniacs.org/blog/2015/05/06/styles-unification-first-results/
<p>Yesterday I gathered some willpower and began working on a long awaited (by myself, at the least) <a href="https://github.com/isagalaev/highlight.js/issues/348">style unification in highlight.js</a>. Here's the first taste of why I think it is important.</p>
<p><a name=more></a></p>
<p>Let's take one of the recently added styles, "Android Studio", and see how it displays two config languages that happen to not count as "hot" these days: Apache and .Ini:</p>
<p class="picture center"><img src="/media/blog/apache-before.png"></p>
<p class="picture center"><img src="/media/blog/ini-before.png"></p>
<ul>
<li>Section headers, variable expansions, rewrite flags aren't highlighted at all.</li>
<li>Pre-defined literals ("True", "on") are highlighted in .Ini using the same color as directive names in Apache.</li>
</ul>
<p>To fix this particular case I had to <a href="https://github.com/isagalaev/highlight.js/blob/new-styles/docs/css-classes-reference.rst#stylable-classes">define semantics for classes "section", "meta", "variable", "name" and "literal"</a>, and drop all the Apache- and .Ini-specific rules from styles.</p>
<p>Here's how it looks now, nice and consistent:</p>
<p class="picture center"><img src="/media/blog/apache-after.png"></p>
<p class="picture center"><img src="/media/blog/ini-after.png"></p>
<p>There's a looong road ahead, but once it's done, designing a new style will be a matter of using a relatively short list of well-documented classes with a good guarantee that <em>all</em> languages will look decent.</p>
I learned C# in 4 days!
2018-10-23T09:34:10.711863-07:00
https://softwaremaniacs.org/blog/2015/02/06/learned-csharp-4-days/
<p>You know those <em>crazy</em> books, "Learn whatever programming in 21 days"? I mean, who can afford spending that much time, right?</p>
<p><a name=more></a></p>
<h2>Some background</h2>
<p>I have a friend who employs a very particular workflow for dealing with his digital photos. It often involves renaming and merging files from different cameras into a single chronologically ordered event, relying on natural sorting of file names in Windows Explorer. File names are constructed from picture time fields and running counters, like "2015-02-06_001.jpg".</p>
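<p>The naming scheme is easy to sketch. Everything here is an assumption for illustration, since the post doesn't say how the counter is reset or where the timestamps come from: each shot has a "YYYY-MM-DD HH:MM:SS" timestamp (for example from the camera's metadata) and an original file name, all shots are sorted chronologically regardless of camera, and a three-digit counter restarts for every date. The hypothetical <code>plan_renames</code> only computes the new names; it doesn't touch any files.</p>

```rust
// Each shot is (timestamp "YYYY-MM-DD HH:MM:SS", original file name).
// Zero-padded timestamps sort chronologically as plain strings, so one sort
// merges shots from all cameras. Returns (original name, new name) pairs.
fn plan_renames(mut shots: Vec<(String, String)>) -> Vec<(String, String)> {
    shots.sort();
    let mut renames = Vec::with_capacity(shots.len());
    let mut current_date = "";
    let mut counter = 0;
    for (stamp, original) in &shots {
        let date = &stamp[..10]; // "YYYY-MM-DD"; assumes a well-formed timestamp
        // Restart the running counter for each new date.
        if date != current_date {
            current_date = date;
            counter = 0;
        }
        counter += 1;
        renames.push((original.clone(), format!("{}_{:03}.jpg", date, counter)));
    }
    renames
}
```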
<p>This is of course too tedious to do by hand, so he was very happy with a small specialized Windows utility that I wrote for him a few years ago when Windows XP ruled the world and I still programmed in Delphi. The program worked fine until, with the natural flow of time, the world switched to Unicode and newer Windows started to display question marks in place of Cyrillic characters in the program's UI. This made it rather unusable. There were also other small and not so small <em>imperfections</em> about the program that, as I understand, added a considerable irritation factor to the act of processing photos. ("And when it happens upon a panoramic shot you might as well go and pour yourself some coffee because the UI is frozen for minutes while loading the preview…")</p>
<p>So a year ago when we've been visiting his family for Christmas he nagged me, politely but emphatically, about at least making the UI readable again and also, just may be, fixing some of the most outrageous annoyances uncovered over the years of usage. The only problem was… I've lost the source code! I know, it might sound utterly unbelievable these days but it was written in the era before GitHub, and back in those days I've been using — wait for it — <a href="http://en.wikipedia.org/wiki/Zip_drive">Zip drives</a> to store my backups. Which in hindsight turned out to be suboptimal: they fail.</p>
<p>All this, however, provided me with a unique opportunity for making a <em>really good</em> Christmas gift this year…</p>
<p>I suppose there exist people out there who could come up instantly with a perfect gift idea for any of their dozens of friends upon being woken up in the middle of the day, but most of us seem to be destined to endure the agony of scratching the bottom of the void bowl of "what on Earth should we give them this time that won't suck like the last time!" So I was pretty much stoked when some weeks before we were about to leave for the trip it hit me that I actually could <em>write the same program from scratch!</em></p>
<p>And I'm happy to say that ultimately the idea worked out as intended, and at some point it was even declared "the best gift ever!"</p>
<p class="picture center"><img src="/media/blog/pe-screenshot.png"></p>
<p>The best thing though is that now I can actually maintain the code (which I'm doing once a week these days) and not feel sorry for writing another half-working utility. Software is a process, after all.</p>
<h2>The endeavor</h2>
<p>So I had to learn how to write Windows GUI apps, again. Going back to Delphi was pretty much out of the question: even back then it was already losing mind share to the quickly rising C#, and I simply assumed that by now this process had completed. Besides, I actually wanted to learn how Windows GUI programming is "officially" done these days. (Notwithstanding the fact that we're still talking about traditional desktop software, not Metro tiles.)</p>
<p>The lazy evaluation phase took me a couple of weeks, during which I only figured out which three-letter acronyms I needed to know: WPF, MVVM, C#. The actual design and implementation, with ongoing research, took 4 days — literally. The most helpful resources along the way were <a href="http://www.wpf-tutorial.com/">WPF Tutorial</a> and <a href="http://stackoverflow.com/">Stack Overflow</a> (of course).</p>
<p>Most importantly, though, it was rigorous planning and doing the design ahead of coding that allowed me to get the thing done. Here are a few snapshots of my whiteboard with the UI mock-up and current tasks divided by priority:</p>
<p class="picture center">
<img src="/media/blog/pe-plan-1.jpg">
<img src="/media/blog/pe-plan-2.jpg">
</p>
<p>And though this entire article is not of particular practical importance — I'm simply sharing my emotions here — there is one point I'd really like to drive home:</p>
<p class=strong><strong>Planning works. Always.</strong></p>
<p>If you're one of those who don't "believe" in it, and for whom "plans never work", I'd say you're most certainly just doing it wrong, and fixing that is a matter of learning how. Indulge yourself.</p>
<h2>C# and WPF</h2>
<p>I'll say from the get-go that I can't presume to have an accurate opinion about a mainstream language after spending just 4 days with it. This is only my first impression.</p>
<p>It <em>feels</em> to me like a modern Delphi, which is probably not surprising given that both were invented by the same <a href="http://en.wikipedia.org/wiki/Anders_Hejlsberg">Anders Hejlsberg</a>. Type inference makes static typing a lot more palatable; however, the time spent satisfying the compiler's complaints about inconsistent types still feels like time lost to me. I was pleasantly surprised, though, by some nice things making their way into a 10+ year old language: lambdas, <code>+=</code> for registering event listeners, LINQ — this is all very handy.</p>
<p>But overall, for a Pythonista, the language still feels way too verbose and ceremonious. Want to display a regular public attribute in UI? Oh, just turn it into a property with a getter and a setter <em>and</em> an accompanying separate private field of the same type. A dozen or so lines of code to satisfy a convention — not cool.</p>
<p>Likewise, I can't compare WPF to any modern UI framework, as I haven't used any (which is a shame, really). From this position, what immediately feels right about WPF is the data binding concept. Instead of writing disjoint pieces of imperative code that update disjoint pieces of UI, trying to do it in the right order and not forget anything, you now define <em>relations</em> like "this ListView shows this list from my data model" and "this action is enabled when these conditions are met and it is bound to these UI controls". And all the controls' state is updated pretty much automatically. I believe it's that thing they call "reactive programming" these days…</p>
<p>The GUI editor is unusable. It took me probably only half a day to switch completely to editing XAML by hand, and as I understand it, that's how it's done in practice. Here's a simple example of why the editor sucks. XAML layout works best when you divide your window into panels, some of which have a fixed size while others automatically fill the available space. But the GUI editor doesn't do that; instead it gives all panels fixed sizes in <em>pixels</em>, defeating the purpose completely. So, surprisingly, the old Delphi GUI editor remains the best in my limited opinion: it <em>was</em> usable and it did the right thing by default most of the time.</p>
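<p>To make the binding idea concrete outside of WPF, here is a toy Python sketch of the same pattern: a model value that notifies its listeners, and "controls" whose state is declared once as a function of that value. <code>Observable</code>, <code>Widget</code> and the photo list are hypothetical names invented for this illustration; this is not how WPF is implemented, only a picture of declaring relations instead of writing update code.</p>

```python
class Observable:
    """Holds a value and notifies subscribed listeners when it changes.

    A toy stand-in for change notification; not WPF's actual API.
    """

    def __init__(self, value=None):
        self._value = value
        self._listeners = []

    def subscribe(self, listener):
        """Register listener(value) and call it once with the current value."""
        self._listeners.append(listener)
        listener(self._value)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        # Notify only on real changes, so listeners never see redundant updates.
        if new_value != self._value:
            self._value = new_value
            for listener in self._listeners:
                listener(new_value)


class Widget:
    """A fake UI control with just the two attributes the example binds."""

    def __init__(self):
        self.text = ''
        self.enabled = False


# Hypothetical model: the list of photos selected for renaming.
photos = Observable([])
counter_label = Widget()
rename_button = Widget()

# Declare the relations once; no imperative update code afterwards.
photos.subscribe(lambda items: setattr(counter_label, 'text', '%d photos' % len(items)))
photos.subscribe(lambda items: setattr(rename_button, 'enabled', bool(items)))

photos.value = ['IMG_001.jpg', 'IMG_002.jpg']
print(counter_label.text, rename_button.enabled)  # 2 photos True
```

<p>Note that even here the <code>value</code> property needs a getter, a setter and a private field, the same shape the C# convention demands, just with less ceremony.</p>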
<h2>The code</h2>
<p>I haven't published it anywhere yet, but I will once I figure out SSH keys on Windows and choose a proper license. I'm very interested in a code review from someone versed in WPF/C#, but what I <em>don't</em> want to do is maintain it as a proper project with contributions and such; it's just too much hassle.</p>
ijson 2.0
2014-10-12T23:37:27.377000-07:00 https://softwaremaniacs.org/blog/2014/10/11/ijson-20/
<p>Yesterday I released version 2.0 of the streaming JSON parser <a href="https://pypi.python.org/pypi/ijson/">ijson</a>. It mostly includes bug fixes accumulated over the last year and the only reason to change the major part of the version number was that <code>import ijson</code> doesn't do any discovery magic anymore.</p>
<p><a name=more></a></p>
<h2>Import</h2>
<p>Previously, <code>import ijson</code> would first go on a trial-and-error search for the latest version of the C library yajl and, if none was found, fall back to the Python backend. This approach proved to be <a href="https://github.com/isagalaev/ijson/pull/22">buggy</a> and unpredictable: simply moving your app to another environment could change its behavior, like making it significantly slower on a machine without yajl or exposing bugs present in one backend but not the other.</p>
<p>So, following the "<a href="http://legacy.python.org/dev/peps/pep-0020/">explicit is better than implicit</a>" commandment, I dropped the discovery, and <code>import ijson</code> now always loads the safe pure Python backend. You can explicitly import any of the backends with <code>import ijson.backends.&lt;name&gt; as ijson</code>.</p>
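<p>For illustration, trial-and-error discovery of this kind boils down to a loop like the one below. This is a generic Python sketch, not ijson's actual code; the preference list is hypothetical, with the standard <code>json</code> module standing in for the always-available fallback.</p>

```python
import importlib


def discover(candidates):
    """Import and return the first module in `candidates` that is available.

    Which module wins depends on what happens to be installed, which is
    exactly why this kind of discovery behaves differently across machines.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # not installed here: silently try the next one
    raise ImportError('none of these modules is available: %s' % ', '.join(candidates))


# Hypothetical preference list: a fast optional module first, the stdlib last.
backend = discover(['some_fast_json_backend', 'json'])
print(backend.__name__)
```

<p>Note that <code>except ImportError</code> also hides a backend that is installed but fails while importing, one more way this approach can surprise its users.</p>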
<p>You might argue that <code>import ijson</code> is still not explicit enough but I didn't want to force users to always use a full backend name. Because "<a href="http://legacy.python.org/dev/peps/pep-0020/">practicality beats purity</a>".</p>
<h2>Other changes</h2>
<ul>
<li>Fixed breakage when a multi-byte UTF-8 character was split by a buffer boundary.</li>
<li>Python backend now accepts custom buffer size as an argument.</li>
<li>Integer values are now always returned as <code>int</code>, even if spelled like <code>1.0</code> or <code>1E2</code> in JSON.</li>
<li>Use <a href="http://pythonwheels.com/">Wheels</a> as the distribution format.</li>
</ul>
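<p>Two of these changes are easy to illustrate in a few lines of Python. The sketch below reads a byte stream in chunks of a configurable size and decodes it with an incremental decoder, so a character split between two chunks is completed instead of raising an error; it also coerces integral JSON number literals to <code>int</code>. The function names <code>decoded_chunks</code> and <code>json_number</code> are hypothetical, not ijson's API, and keeping non-integral values as <code>Decimal</code> is an assumption of this sketch.</p>

```python
import codecs
import io
from decimal import Decimal


def decoded_chunks(stream, buf_size=16):
    """Yield text read from a binary stream `buf_size` bytes at a time.

    A multi-byte UTF-8 character may be cut in half by a chunk boundary.
    Calling bytes.decode() on each chunk would then raise UnicodeDecodeError;
    an incremental decoder instead keeps the partial sequence and finishes
    it when the next chunk arrives.
    """
    decoder = codecs.getincrementaldecoder('utf-8')()
    while True:
        data = stream.read(buf_size)
        if not data:
            break
        text = decoder.decode(data)
        if text:  # a chunk holding only part of a character yields ''
            yield text
    # final=True raises if the stream ended in the middle of a character.
    tail = decoder.decode(b'', final=True)
    if tail:
        yield tail


def json_number(literal):
    """Convert a JSON number literal, returning int for integral values.

    '1', '1.0' and '1E2' all become int; other values stay Decimal.
    """
    value = Decimal(literal)
    if value == value.to_integral_value():
        return int(value)
    return value


# With buf_size=3 the first Cyrillic letter is split across two chunks.
source = io.BytesIO('{"название": 1.0}'.encode('utf-8'))
text = ''.join(decoded_chunks(source, buf_size=3))
print(text)
print(json_number('1.0'), json_number('1E2'), json_number('2.5'))  # 1 100 2.5
```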
<p>Also, the lexer has been reimplemented as a generator and simplified a little; it's now <a href="https://github.com/isagalaev/ijson/blob/8da3f09151b26d4a754601305a617b7891a9aa39/ijson/backends/python.py#L24-L69">only 46 lines of code</a>. Funny thing, though: this change made it slightly faster on CPython but slightly slower on PyPy. Looks like PyPy really likes objects and doesn't mind all the <code>self.something</code> references and myriads of method calls. Go figure :-).</p>
highlight.js: what's next
2015-08-30T12:39:17.611000-07:00 https://softwaremaniacs.org/blog/2014/07/26/highlight-js-what-next/
<p>This is a loosely ordered dump of ideas about the future of <a href="http://highlightjs.org/">highlight.js</a> presented for purposes of information and discussion. The project is already big enough that the best I can do for it is not writing code but trying to get people interested in joining in. Let's see if I can show you that this project is not just about herding a bunch of regexes :-)</p>
<p><a name=more></a></p>
<h2>Testing</h2>
<p>Our current "<a href="http://highlightjs.org/static/test.html">test suite</a>" is far from adequate. It started its life as a demo page that accidentally took on some rudimentary testing responsibilities along the way. A good demo is short, neat and beautiful, while a good test should be comprehensive. Our suite is unfortunately neither: it's big and ugly, and at the same time it doesn't actually evaluate anything, relying instead on a human to notice that something is wrong with any of those few dozen languages.</p>
<p>So before we go any further we need a test suite that would:</p>
<ul>
<li>test language detection on small and non-obvious fragments</li>
<li>compare produced markup against control samples with <em>all</em> supported language features</li>
<li>perform special tests for different settings and features of the library</li>
</ul>
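<p>As a rough idea of the second point, comparing produced markup against control samples, here is a minimal Python sketch that runs a highlighter over named samples and collects a unified diff for every mismatch. The <code>check_markup</code> function, the sample names and the deliberately incomplete fake highlighter are all hypothetical; the real suite would call highlight.js itself.</p>

```python
import difflib


def check_markup(highlight, samples):
    """Run `highlight` over every sample and diff it against the control markup.

    highlight: callable taking source code and returning HTML markup
               (a hypothetical stand-in for the library under test).
    samples:   mapping of sample name -> (source, expected_markup).
    Returns a dict of sample name -> unified diff, one entry per failure.
    """
    failures = {}
    for name, (source, expected) in sorted(samples.items()):
        actual = highlight(source)
        if actual != expected:
            diff = difflib.unified_diff(
                expected.splitlines(), actual.splitlines(),
                fromfile=name + ' (expected)', tofile=name + ' (actual)',
                lineterm='')
            failures[name] = '\n'.join(diff)
    return failures


# A deliberately incomplete fake highlighter that only knows the keyword "if".
def fake_highlight(code):
    return code.replace('if', '<span class="keyword">if</span>')


samples = {
    'if-statement': ('if x', '<span class="keyword">if</span> x'),
    'for-loop': ('for x', '<span class="keyword">for</span> x'),
}
for name, diff in check_markup(fake_highlight, samples).items():
    print(diff)
```

<p>Unlike the current page, a failure here is reported by the machine, with the exact lines that differ, instead of waiting for a human to spot it.</p>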
<p>I'd say it's a nice big project in its own right!</p>
<h2>Class name unification</h2>
<p>One of the early design principles for highlight.js was having just a few common class names in order to have universal styles that would work for any language. Unfortunately, this principle wasn't strongly enforced. We now have language-specific classes, language-specific style rules and whole language-specific styles. This is a maintenance nightmare, as the number of unique conditions that should be visually tested is the product of the number of unique language features and the number of styles. And since this is an impossible amount of work, we usually test only a small subset of languages and styles and rely on pure luck for the rest, which is the majority.</p>
<p>One way to deal with that is to confine class names to a very generic fixed set and force languages to use only that. Apart from reducing the amount of mess it will also enable an interesting side feature — automatically generated styles. If the semantics of class names is fixed we could intelligently group them and assign to those groups a few distinct font/color combinations provided by a user.</p>
<p>And this is certainly going to be completely backwards incompatible.</p>
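<p>The auto-generated styles idea could look roughly like this, assuming a fixed vocabulary of class names. The class names, their grouping into roles and the <code>generate_style</code> function are all hypothetical, chosen only to show how a handful of user-supplied font/color combinations could cover every language at once.</p>

```python
# Hypothetical fixed vocabulary of generic class names, grouped by role.
GROUPS = {
    'keyword': ['keyword', 'built_in', 'literal'],
    'string': ['string', 'regexp'],
    'comment': ['comment', 'doctype'],
    'name': ['title', 'tag', 'attribute'],
}


def generate_style(palette):
    """Build CSS from a user palette mapping group name -> declarations.

    Groups the user did not mention are left unstyled.
    """
    rules = []
    for group, classes in GROUPS.items():
        if group in palette:
            selector = ', '.join('.' + name for name in classes)
            rules.append('%s { %s }' % (selector, palette[group]))
    return '\n'.join(rules)


print(generate_style({
    'keyword': 'color: #00f; font-weight: bold',
    'comment': 'color: #888; font-style: italic',
}))
```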
<h2>De-specializing keywords</h2>
<p>Currently we parse keywords differently than the rest of language features. They use their own completely independent parsing pass. This gives us speed (which was the main reason for introducing it) and a neat definition syntax at the price of code size and complexity. The speed advantage is largely irrelevant by now, as browsers became much faster than six years ago. So it is a good time to throw that special code away.</p>
<p>The syntax will change, so instead of this:</p>
<pre><code>{
  keywords: 'if for while ... ',
  contains: [
    STRINGS,
    NUMBERS
  ]
}
</code></pre>
<p>We're going to have something like this:</p>
<pre><code>{
  contains: [
    {
      className: 'keyword',
      beginWords: 'if for while ... ',
    },
    STRINGS,
    NUMBERS
  ]
}
</code></pre>
<p>It's not by any means final, it just shows the idea that I want keywords to become a regular parsing mode.</p>
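<p>One way a keyword mode could become an ordinary parsing mode is to compile its word list into a regular <code>begin</code> regex with word boundaries. The Python sketch below represents the mode as a dict; the preprocessing step and <code>compile_words</code> are assumptions for illustration, not the planned implementation.</p>

```python
import re


def compile_words(words):
    """Compile a space-separated keyword list into a single regex."""
    alternatives = '|'.join(re.escape(word) for word in words.split())
    # Word boundaries keep 'for' from matching inside 'format' or 'for_each'.
    return re.compile(r'\b(?:' + alternatives + r')\b')


# A keyword mode shaped like the proposal above, with a hypothetical
# preprocessing step that derives an ordinary 'begin' regex from beginWords.
mode = {'className': 'keyword', 'beginWords': 'if for while'}
mode['begin'] = compile_words(mode['beginWords'])

code = 'for (x of format) if (ready) while_busy()'
print([m.group() for m in mode['begin'].finditer(code)])  # ['for', 'if']
```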
<p><a name=complex-modes></a></p>
<h2>Complex modes</h2>
<p>Our current syntax can't express things defined as a sequence of other things, like this: </p>
<pre><code>function ::= &lt;title&gt; '=' &lt;params&gt; '-&gt;' &lt;body&gt;
</code></pre>
<p>The problem here is that you don't know that you're in a function definition until you get to the <code>-></code> symbol. Our parser can start a new parsing mode based only on a single starting lexeme: an opening quote, a keyword, a number, etc.</p>
<p>To work around that we use a horrible kludge: </p>
<ol>
<li>Match the <em>whole</em> body of the mode with a single regex.</li>
<li>Start a new mode at the beginning of the matched string.</li>
<li>Return the whole thing back to the parser.</li>
<li>Parse it again, by the rules of the new mode.</li>
</ol>
<p>Not only is it ugly, it also works only when we're lucky enough that the whole body can actually be parsed by a regex.</p>
<p>So we need to implement logic that allows the parser to treat anything matching just the first lexeme as the beginning of the mode, start parsing it, and fall back if it doesn't work out.</p>
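<p>The start-then-fall-back idea can be sketched in Python on a token list: any identifier is tentatively treated as the start of a function definition, the matcher reads ahead, and if the sequence doesn't complete, parsing simply resumes at the original position. The grammar here is an assumption (CoffeeScript-like, with optional parenthesized params), and <code>tokenize</code>, <code>match_function</code> and <code>highlight</code> are hypothetical helpers, not highlight.js code.</p>

```python
import re

# Tokens: the '->' arrow, identifiers/numbers, or any other single character.
TOKEN = re.compile(r'\s*(->|\w+|\S)')


def tokenize(code):
    return TOKEN.findall(code)


def match_function(tokens, pos):
    """Try to read  title '=' [ '(' params ')' ] '->'  starting at pos.

    Returns the position just past '->' on success, or None. It never
    modifies any shared state, so failing costs nothing: the caller simply
    carries on from the original position.
    """
    if not tokens[pos].isidentifier():
        return None
    i = pos + 1
    if i >= len(tokens) or tokens[i] != '=':
        return None
    i += 1
    if i < len(tokens) and tokens[i] == '(':
        if ')' not in tokens[i:]:
            return None
        i = tokens.index(')', i) + 1
    if i < len(tokens) and tokens[i] == '->':
        return i + 1
    return None


def highlight(code):
    """Tag function headers; everything else falls back to plain tokens."""
    tokens = tokenize(code)
    result = []
    pos = 0
    while pos < len(tokens):
        end = match_function(tokens, pos)
        if end is not None:
            result.append(('function', ' '.join(tokens[pos:end])))
            pos = end
        else:
            result.append(('text', tokens[pos]))
            pos += 1
    return result


print(highlight('square = (x) -> x * x'))
print(highlight('y = (x) + 1'))  # reads up to ')' before giving up
```

<p>In the second call the matcher consumes <code>y = ( x )</code> before discovering there is no arrow, which is exactly the case a single starting lexeme cannot decide.</p>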
<h2>Pipe dreams</h2>
<p>These I'm posting mostly for fun. There's no plan, not even any certainty that they're actually needed. But, hey, maybe there's something to it, still :-)</p>
<ul>
<li>
<p>A background feedback mechanism for reporting language usage/detection statistics directly from highlighted code on other sites.</p>
</li>
<li>
<p>Balance keywords relevance based on their usage frequency using machine learning instead of human guessing.</p>
</li>
<li>
<p>Split all languages into more categories for more convenient downloading (like "Scripting", "Scientific"… "Toy", "Dead", "Weird" etc.)</p>
</li>
</ul>
<h2>Interested?</h2>
<p>If you're interested in helping with any of these things, please drop a message to our <a href="https://groups.google.com/forum/#!forum/highlightjs">developer discussion group</a>. Thank you!</p>
New life of Marcus
2012-10-21T18:29:01.485000-07:00 https://softwaremaniacs.org/blog/2012/10/21/marcus-new-life/
<p>A while ago I reported on <a href="http://softwaremaniacs.org/blog/2010/07/19/marcus-bilingual-blog/en/">switching this blog to custom software named Marcus</a>. Despite its source code being available in the open, I didn't intend to develop it into a full-blown project, for two reasons: a) maintaining it would have taken much more time than I could afford and b) being completely <em>anal</em> about my own blog software, I didn't want to piss off contributors by constantly rejecting all the features they would propose. Anyway, if someone felt so compelled, they could take the code and start developing it on their own.</p>
<p>Which is exactly what happened. <a href="https://github.com/adw0rd">Mikhail Andreev</a> took my old code, put it on GitHub and, as far as I can see, already added quite a bit to it. The project got a different name (by my own request) — <a href="https://github.com/adw0rd/marcus">django-marcus</a>. It is also <a href="http://pypi.python.org/pypi/django-marcus">available on PyPI</a>.</p>
<p>I'm honored someone deemed my code useful and glad that it won't end up neglected after all. All hail Open Source!</p>
HTTP and JSON in highlight.js
2012-05-10T02:02:40.440000-07:00 https://softwaremaniacs.org/blog/2012/05/09/http-and-json-in-highlight-js/
<p class="picture right"><img src="/media/blog/http-json.png"></p>
<p>Fresh from the oven, highlight.js now has a pretty cool feature that, to the best of my knowledge, is not supported by any other syntax highlighter. Namely, we can now recognize and highlight HTTP request headers <em>and</em> the request body, if it happens to be code in a language we know. This is intended for all sorts of API docs that often present the entire HTTP payload carrying some kind of JSON or XML.</p>
<p><a name=more></a></p>
<h2>Story</h2>
<p>The feature was born out of a conversation with a user asking a very strange question: <a href="https://groups.google.com/d/msg/highlightjs/68R9l7zRWNo/8idom3cQ3fMJ">"How to disable highlighting for only a certain part of the code snippet."</a> I couldn't even imagine why anyone would want to have more than one language in one code snippet until he provided a simple example with an HTTP prologue and a chunk of JSON payload. The actual problem was that highlight.js simply didn't know JSON at that time. But it was also obvious that even if it knew the language of the body, the headers could completely break language detection or just pick up random bits of the body's highlighting, like incidentally matching keywords, numbers etc.</p>
<p>So I answered the user with apologies that we couldn't help him right now but might look into this problem at some point in the future. It turned out "some point in the future" came later that evening, when I realized that we already had two key ingredients to solve it: highlighting nested languages (used for JavaScript in HTML, for example) and language detection. It was just a matter of putting them together.</p>
<h2>Outcome</h2>
<p>We now have the language "<a href="https://github.com/isagalaev/highlight.js/blob/master/src/languages/http.js">HTTP</a>" that knows how to highlight request lines with a query string inside them, status lines with a numeric code, and headers with their values.</p>
<p>We also have a strictly defined "<a href="https://github.com/isagalaev/highlight.js/blob/master/src/languages/json.js">JSON</a>" language that knows pretty much all of JSON. Many thanks to Douglas Crockford for making it so limited and simple to parse. The strict definition makes auto-detection very reliable.</p>
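<p>The overall structure (a head of request/status line plus headers, a blank line, and a body whose language is detected separately) can be sketched in Python. highlight.js does this with its own language definitions and relevance-based detection; the sketch swaps in a much cruder technique, accepting the body as JSON only if Python's <code>json</code> module parses it, purely to show the shape of the problem. <code>analyze_http</code> and the regexes are hypothetical.</p>

```python
import json
import re

REQUEST_LINE = re.compile(r'^([A-Z]+) (\S+) HTTP/\d\.\d$')
STATUS_LINE = re.compile(r'^HTTP/\d\.\d (\d{3})(?: .*)?$')
HEADER = re.compile(r'^([A-Za-z0-9-]+):\s*(.*)$')


def analyze_http(message):
    """Split an HTTP message into its parts and guess the body's language."""
    message = message.replace('\r\n', '\n')
    head, _, body = message.partition('\n\n')  # a blank line ends the head
    first, *header_lines = head.split('\n')

    if REQUEST_LINE.match(first):
        kind = 'request'
    elif STATUS_LINE.match(first):
        kind = 'status'
    else:
        kind = None

    headers = []
    for line in header_lines:
        match = HEADER.match(line)
        if match:
            headers.append(match.groups())

    # Strict detection: the body counts as JSON only if it parses as JSON.
    body_language = None
    if body.strip():
        try:
            json.loads(body)
            body_language = 'json'
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            pass
    return kind, headers, body_language


print(analyze_http(
    'POST /api/items?page=2 HTTP/1.1\r\n'
    'Content-Type: application/json\r\n'
    '\r\n'
    '{"name": "test", "tags": [1, 2]}'
))
```

<p>The headers never reach the body detector, which is the point: they can no longer skew detection of the payload.</p>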
<p>Both languages are now in the so-called "common" set which means they will be available in the CDN-hosted version by default in the next release.</p>
<h2>Problems</h2>
<p>Since no heuristic is completely reliable, it would be nice to have some way to specify the sub-language inside a snippet, the same way it is now possible for the whole snippet. The hard part is inventing a way that doesn't suck :-). If you have any ideas, please share!</p>
<p>The other problem, obviously, is that the code is still very fresh and inevitably contains bugs. So get the source, build it, test it and let us know. Thank you!</p>
Sponsoring in highlight.js
2012-04-11T14:53:29.547000-07:00 https://softwaremaniacs.org/blog/2012/04/11/sponsoring-in-highlight-js/
<p>I want to draw your attention to an interesting offer made by Adam Kennedy from <a href="http://www.kaggle.com/">Kaggle</a> to <a href="https://groups.google.com/d/topic/highlightjs/-ogtOaK-TQY/discussion">sponsor the development of syntax highlighting for the R language</a>:</p>
<blockquote>
<p>highlight.js came to our attention with the addition of MATLAB
support, as it is one of the two dominant languages used by our
community. We plan to switch to highlight.js from prettify.js (and
already have in a dev branch).</p>
<p>Further, we would like to sponsor the addition to highlight.js of the
primary language used by our community, the R statistical computing
language ( <a href="http://www.r-project.org/">http://www.r-project.org/</a> ).</p>
</blockquote>
<p><a name=more></a></p>
<p>I think this is a nice opportunity to help a good project and make some money along the way. I'm sure Adam will be happy to clarify any details, so reply to the group if you're interested.</p>
<p>From the highlight.js part there is a <a href="http://softwaremaniacs.org/wiki/doku.php/highlight.js:language">language definition guide</a> to get you started. Of course I'm always happy to explain how things work in the highlighter in the hopes of getting more contributors on board and sharing maintenance :-).</p>
<p>Also, I'm pretty excited about this thing in general. One of my goals since… well… pretty much since the inception of highlight.js has been encouraging other people to contribute to the library. This is how we got languages as unique among syntax highlighters as MEL, RenderMan and Axapta, to name just a few. This sponsorship is a good precedent, and if it works out to the mutual satisfaction of the parties, I hope it won't be the last.</p>
Rainbow.js — a new kid on the highlighters' block
2012-09-07T01:35:24.504000-07:00 https://softwaremaniacs.org/blog/2012/03/26/rainbow-js/
<p>There was a small spike in my referrer stats that led me to a new JavaScript highlighting library — <a href="http://craig.is/making/rainbows">rainbow.js</a>. And since I love <a href="http://softwaremaniacs.org/blog/2011/05/22/highlighters-comparison/en/">bashing other highlighters</a>, I couldn't resist this time either :-).</p>
<p>Oh, but rest assured that all of this is intended only as constructive criticism!</p>
<p><a name=more></a></p>
<h2>Size claim</h2>
<p>It says upfront that it's 1.2K in size. It isn't. If you include the 5 languages it currently supports, it's 8.1K. Which is still impressive, given that highlight.js is 11.8K with the same languages.</p>
<h2>Features</h2>
<p>It doesn't have any besides highlighting itself. No user markup, no line numbers, no language detection, etc. But as far as I understand, that's a design goal. And I can only wish the author <em>a lot</em> of patience in <del>telling people to shut up</del> <ins>carefully evaluating feature requests</ins>!</p>
<h2>Correctness</h2>
<p>This is where things get ugly, unfortunately. I loaded up my test suite and on the spot found these:</p>
<ul>
<li>prefixed strings in Python (<code>r""</code>, <code>u""</code>) aren't detected</li>
<li>triple-quoted strings in Python are detected wrong (the first two quotes are treated as an empty string)</li>
<li>backslash escapes in strings aren't detected, which can break all subsequent highlighting in cases like <code>"a \" b"</code></li>
<li>names of old-style Python classes are not recognized (because of the lack of parens after the names)</li>
<li>doctype declaration in HTML is treated as a tag</li>
<li>tag attributes in HTML aren't detected reliably, like "checked" here: <code>&lt;input checked type="checkbox"&gt;</code></li>
<li>in the CSS snippet <code>{margin: 1cm 2cm 1.3cm 4cm;}</code> "1." and "4cm" are not recognized as values</li>
<li>in <code>div {width: 100%}</code> "100%" is not recognized as a value</li>
<li>in the selector <code>p[lang=en]</code> all "p", "lang" and "en" are detected as tags</li>
<li>in JavaScript, literal regexps are not distinguished from division operators, which leads to all sorts of breakage</li>
</ul>
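<p>The backslash-escape problem is a classic one, and a generic illustration in Python's <code>re</code> shows why it cascades. This is not rainbow.js's actual pattern; it only contrasts a naive string regex with an escape-aware one on a made-up line of code.</p>

```python
import re

# Naive: a string is a quote, anything but a quote, and a quote.
NAIVE_STRING = re.compile(r'"[^"]*"')
# Escape-aware: inside the quotes, either a character that is neither a
# quote nor a backslash, or a backslash followed by any character.
STRING = re.compile(r'"(?:[^"\\]|\\.)*"')

code = r'x = "a \" b"; y = "c"'
print(NAIVE_STRING.findall(code))  # ['"a \\"', '"; y = "']
print(STRING.findall(code))        # ['"a \\" b"', '"c"']
```

<p>The naive pattern stops at the escaped quote, and from then on everything is out of phase: the code between the two strings gets highlighted as a string.</p>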
<p>I'm sure there are many other bugs because…</p>
<h2>Speculation</h2>
<p>… rainbow.js employs <a href="https://github.com/ccampbell/rainbow/blob/master/js/language/generic.js">generically defined lexing</a> for all supported languages. Which is good for keeping the library fit and slender, but won't work for all the sheer insanity of syntaxes that humanity has cared to invent over the last half-century.</p>
<p>There are backslash escapes and double-quote escapes for strings. PHP, Ruby, Shell all allow embedded code within certain types of strings to a certain extent. JavaScript has literal regexps that clash with division. Pascal has different syntax for hex numbers. Lines starting with # are comments in many languages but in C they're preprocessor directives. And don't even get me started on <a href="https://github.com/isagalaev/highlight.js/blob/master/src/languages/perl.js">Perl</a>…</p>
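<p>As an example of such language-specific logic, the regexp-versus-division clash is usually resolved by looking at the previous token: after an operand (identifier, number, closing paren or bracket) a <code>/</code> divides; elsewhere it starts a regexp literal. Below is a simplified Python sketch of that common heuristic; <code>scan</code> and the keyword set are illustrative, and it ignores comments, template strings, <code>/</code> inside character classes and the ambiguous case after <code>}</code>.</p>

```python
import re

WORD = re.compile(r'[\w$]+')
# A regex literal: '/', a body of non-slash characters or escapes, '/', flags.
# (Simplified: a '/' inside a [...] class would end the literal too early.)
REGEX_LITERAL = re.compile(r'/(?:[^/\\\n]|\\.)+/[a-z]*')
# Keywords after which an expression, not an operator, is expected.
KEYWORDS = {'return', 'typeof', 'instanceof', 'in', 'new', 'delete',
            'void', 'throw', 'case', 'do', 'else'}


def scan(code):
    """Split JavaScript-ish code into tokens, telling regexes from division.

    Heuristic: '/' divides when the previous token ends an operand
    (identifier, number, ')' or ']'); otherwise it starts a regex literal.
    """
    tokens = []
    after_operand = False
    pos = 0
    while pos < len(code):
        char = code[pos]
        if char.isspace():
            pos += 1
            continue
        if char == '/' and not after_operand:
            match = REGEX_LITERAL.match(code, pos)
            if match:
                tokens.append(('regex', match.group()))
                after_operand = True  # a regex literal is itself an operand
                pos = match.end()
                continue
        match = WORD.match(code, pos)
        text = match.group() if match else char
        tokens.append(('division' if text == '/' else 'token', text))
        after_operand = (match is not None and text not in KEYWORDS) or text in (')', ']')
        pos += len(text)
    return tokens


print(scan('a = b / c / d'))
print(scan('x = /ab+c/g.test(s)'))
```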
<p>All in all I think that current design of rainbow.js won't allow it to grow past a family of not-too-conflicting language syntaxes. Which puts it in the same position as <a href="http://code.google.com/p/google-code-prettify/">Google Code Prettify</a>. Which for me means that we don't have to worry about this competition yet. But Google should :-). </p>
<p>Anyway I wish the best of luck to Craig Campbell in his endeavor!</p>
<h2>Envy</h2>
<p>I totally envy their site design!!!</p>
Completely unfair comparison of Javascript syntax highlighters
2012-03-23T00:00:17.503000-07:00 https://softwaremaniacs.org/blog/2011/05/22/highlighters-comparison/
<p>Before the latest release of <a href="http://softwaremaniacs.org/soft/highlight/en/">highlight.js</a> 6.0, I decided, for the first time in more than 4 years, to actually look at other highlighting libraries. Sure, I knew of their existence before, but never felt compelled to do any serious comparison, because highlight.js is a fun project and I'm quite happy with the result. In fact, this comparison has also been made for fun more than anything else. I just wondered how good (or bad) highlight.js actually looked among similar libraries.</p>
<p>I decided not to take into account highly subjective things like visual appeal (I'm not a good judge here), installation simplicity and documentation clarity (I don't know how to measure them). I also didn't evaluate the number of supported languages. While it is a measurable quantity, it doesn't mean much to an end user: if a tool doesn't support the language you need, you don't care about the dozens of others it does support. Instead, I concentrated on universally measurable things that make sense to everyone: size, speed and correctness.</p>
<p>Why "completely unfair" then, you ask? Because I knew who'd win before I even started :-).</p>
<p><a name=more></a></p>
<h2>Contenders</h2>
<p>If you go to the trouble of searching the Internet for "javascript syntax highlighter", you'll inevitably stumble upon hordes of posts, all ingeniously similarly titled "<code>N</code> useful/beautiful javascript tools", where <code>N</code> varies from 4 to 20-something. Those have been circulating the network for years but, predictably, aren't a very good source of information, because they don't actually evaluate the usefulness or beauty of the solutions they link to.</p>
<p>So I just picked the names I've gotten used to seeing around in blogs and forums where people look for such a tool:</p>
<ul>
<li><a href="http://alexgorbatchev.com/SyntaxHighlighter/">SyntaxHighlighter</a> by Alex Gorbatchev, used on <a href="http://developer.mozilla.org/">MDN</a> and others.</li>
<li><a href="http://shjs.sourceforge.net/">SHJS</a> — a library built to be compatible with <a href="http://www.gnu.org/software/src-highlite/">GNU source-highlight</a> language definitions.</li>
<li><a href="http://code.google.com/p/google-code-prettify/">Google Code Prettify</a> — a highlighter used on <a href="http://code.google.com/">Google Code</a> and <a href="http://stackoverflow.com/">Stack Overflow</a>.</li>
<li>and finally <a href="http://softwaremaniacs.org/soft/highlight/en/">highlight.js</a> originally written by me, used on a popular Russian tech site <a href="http://habrahabr.ru/">Habrahabr.ru</a> and others.</li>
</ul>
<p>I've compiled an enterprisey-looking matrix of features supported by these libraries. It isn't intended for comparison per se because there are different use-cases and sometimes lack of features is a feature too. It's here to give you a general idea on what goal each one can serve.</p>
<table>
<tr>
<th> <th>highlight.js <th>SyntaxHighlighter <th>SHJS <th>Google Code Prettify
<tr>
<th>User markup in code snippets <td><b>yes</b> <td>no <sup>1)</sup> <td><b>yes</b> <td><b>yes</b>
<tr>
<th>Line numbers <td>no <td><b>yes</b> <td>no <td><b>yes</b>
<tr>
<th>Striped background <td>no <td><b>yes</b> <td>no <td><b>yes</b>
<tr>
<th>Replacing indenting TABs with spaces <td><b>yes</b> <td><b>yes</b> <td>no <td>no
<tr>
<th>Language detection <td><b>yes</b> <td>no <td>no <td><b>yes</b> <sup>2)</sup>
<tr>
<th>Multi-language code <td><b>yes</b> <td><b>yes</b> <sup>3)</sup> <td>no <td><b>yes</b>
<tr>
<th>Arbitrary HTML container for code<td><b>yes</b> <td>no <td>no <td>no
<tr>
<th>HTML5 compatibility <sup>4)</sup> <td><b>yes</b> <td>no <td>no <td>no
</table>
<p>Notes:</p>
<ol>
<li>
<p>SyntaxHighlighter doesn't support arbitrary markup but has two special features that cover some use-cases: turning URLs into links and highlighting lines of code that require attention.</p>
</li>
<li>
<p>Prettify doesn't actually do any <em>detection</em>. Instead, it employs an interesting approach of generalized highlighting that works independently of the language. This makes it more prone to errors than the heuristic detection mechanism found in highlight.js, though.</p>
</li>
<li>
<p>I wasn't able to configure SyntaxHighlighter to do this but I attribute it to my lack of persistence. It works fine on <a href="http://alexgorbatchev.com/SyntaxHighlighter/manual/demo/html-script.html">the demo page</a>. </p>
</li>
<li>
<p>Surely one couldn't expect to be taken seriously these days without shoving the trendy "HTML5" moniker in <em>somewhere</em>! What it actually means here is that highlight.js automatically recognizes code snippets marked up according to the <a href="http://dev.w3.org/html5/spec/Overview.html#the-code-element">HTML5 recommendation</a> with <code>&lt;pre&gt;&lt;code class="language-something"&gt; .. &lt;/code&gt;&lt;/pre&gt;</code>.</p>
</li>
</ol>
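<p>The HTML5 convention from note 4 boils down to reading the language name out of the class attribute of the inner code element. A minimal Python sketch, with a hypothetical <code>language_from_class</code> helper that takes the attribute value as a string:</p>

```python
def language_from_class(class_attr):
    """Return the language named by a 'language-*' class, or None.

    class_attr is the value of the class attribute on the inner code
    element, e.g. 'language-python' or 'numbered language-css'.
    """
    prefix = 'language-'
    for name in class_attr.split():
        # Ignore a bare 'language-' with nothing after the prefix.
        if name.startswith(prefix) and len(name) > len(prefix):
            return name[len(prefix):]
    return None


print(language_from_class('numbered language-css'))
```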
<h2>Test case</h2>
<p>The test page consists of code snippets in 7 popular languages: Python, Ruby, PHP, XML, HTML, CSS and Javascript. The "completely unfair" part of the article shows up here in full force, since those snippets come from highlight.js' own test suite! Anyway, I think it was a good idea to use them, because they were designed to be short and to exercise as many features of a language as possible. Here are four versions of the test case using <a href="http://softwaremaniacs.org/media/blog/highlighters/highlight.js/test.html">highlight.js</a>, <a href="http://softwaremaniacs.org/media/blog/highlighters/syntaxhighlighter/test.html">SyntaxHighlighter</a>, <a href="http://softwaremaniacs.org/media/blog/highlighters/shjs/test.html">SHJS</a> and <a href="http://softwaremaniacs.org/media/blog/highlighters/google-code-prettify/test.html">Google Code Prettify</a> in all their styled-by-default glory.</p>
<h2>Size</h2>
<p>All libraries have a way to include only the required language definitions on the page: simple linking to language files, on-demand loading, or packing into a single file. All of them also provide minified/packed production versions of their files. Gzip compression wasn't applied, for no particular reason. The following table shows the total size of all Javascript needed to highlight the test snippets.</p>
<table>
<tr>
<th> <th>highlight.js <th>SyntaxHighlighter <th>SHJS <th>Google Code Prettify
<tr>
<th>Size (KB) <td>16.4 <td>34.6 <td>16.8 <td>19.2
</table>
<p>I didn't include CSS in the calculation because it isn't strictly required: a site can define the highlighting style in its main stylesheet.</p>
<h2>Speed</h2>
<p>To be honest, modern browsers have made this test irrelevant: all the highlighters are fast enough that highlighting appears instantly. The only exception was SHJS, which was configured to load language files on demand; in a couple of test runs this led to raw, unhighlighted code being visible for a split second. That says nothing bad about the speed of SHJS itself, but rather shows that on-demand loading was a bad idea for this task.</p>
<p>I measured highlighting speed using Firebug. This wasn't as straightforward as measuring size, because there are more things to take into account. After some tinkering I settled on the following method:</p>
<ul>
<li>To represent the most common real-world case, all files are loaded from cache, but the browser still performs DNS lookups and establishes TCP connections for each file.</li>
<li>Total load time is measured at the <code>DOMContentLoaded</code> event for highlight.js and at the <code>onload</code> event for the rest. This may seem unfair, but I simply did what each library suggests in its docs.</li>
<li>The highlighting time itself is measured with Firebug's profiler. Since profiling affects performance, this time can't simply be added to the load time and should be considered separately.</li>
</ul>
<table>
<tr>
<th> <th>highlight.js <th>SyntaxHighlighter <th>SHJS <th>Google Code Prettify
<tr>
<th>Load time (msecs) <td>870 <td>1394 <td>1008 <td>1007
<tr>
<th>Highlighting time (msecs) <td>55 <td>67 <td>54 <td>72
</table>
<h2>Richness and correctness</h2>
<p>Here is where things get interesting. Size and speed turned out not to affect the user experience significantly, but the difference in richness and correctness is plainly visible. There won't be any numbers, though, just some notes. </p>
<p>I should note that the notion of "correctness" differs from library to library. Besides plain bugs, there are also missing features that may have been left out deliberately. Here I tried to stick to my personal views on the subject, and you may well disagree with me. That's fine!</p>
<p><strong>SyntaxHighlighter</strong> doesn't produce very rich highlighting to begin with: no Python decorators, no Javascript regexps, no CSS @-rules etc. It also seems downright unable to highlight things that require more sophisticated parsing than a regular grammar, like names in function and class definitions. This is not bad in itself. The result still looks useful and leaves fewer places to screw up :-). But there are some correctness issues anyway: </p>
<ul>
<li>no multi-line strings in PHP</li>
<li>value-less attributes in HTML tags aren't recognized</li>
<li>within CSS @-rules, seemingly random words are recognized as "values" (whatever that means)</li>
</ul>
<p class=center>
<a href="http://softwaremaniacs.org/media/blog/highlighters/syntaxhighlighter/test.html"><img src="/media/blog/highlighters/sh.png"></a><br>
<small>Not much is highlighted in Javascript.</small>
</p>
<p><strong>SHJS</strong> looked promising, since it uses language definitions from the GNU source-highlight project, and I thought <em>those</em> guys would have done their job rather meticulously. But in practice it mishandled highlighting more than any of the others:</p>
<ul>
<li>names of old-style classes in Python aren't highlighted (those of new-style classes are)</li>
<li>class inheritance in Ruby badly breaks the whole line</li>
<li><code>#{}</code> constructs in Ruby strings aren't recognized</li>
<li>PHP <code>throw</code> keyword is not highlighted</li>
<li>tags are highlighted inside CDATA-escaped sections in XML</li>
<li>unquoted attribute values in HTML tags aren't recognized</li>
<li>@-rules in CSS break the whole highlighting flow</li>
<li>"$" isn't considered part of identifiers in Javascript</li>
</ul>
<p class=center>
<a href="http://softwaremaniacs.org/media/blog/highlighters/shjs/test.html"><img src="/media/blog/highlighters/shjs.png"></a><br>
<small>Class inheritance (<code>A < B</code>) in Ruby breaks the whole line.</small>
</p>
<p><strong>Google Code Prettify</strong> works very well in terms of both richness and correctness. It can highlight CSS and Javascript within HTML and recognizes Python decorators and Javascript regexps. Speaking of the latter, Prettify is where I borrowed the ideas for implementing them in highlight.js.</p>
<p>I've found very few issues with it:</p>
<ul>
<li>tags highlighted inside CDATA-escaped sections in XML</li>
<li><code>@font-face</code> in CSS is not recognized as @-rule</li>
<li>Ruby highlighting is also simplistic but doesn't cause such severe problems as in SHJS</li>
</ul>
<p class=center>
<a href="http://softwaremaniacs.org/media/blog/highlighters/google-code-prettify/test.html"><img src="/media/blog/highlighters/gcp.png"></a><br>
<small>That <code>&lt;not&gt;</code> inside CDATA shouldn't be highlighted as a tag.</small>
</p>
<p>As for <strong>highlight.js</strong>, it's pushed to the end of the comparison for a reason :-). Obviously there won't be any correctness issues, since I used code snippets from its own test suite, which it passes. Of course, that doesn't mean it's bug-free. But where the library really stands out is highlighting richness: it simply knows much more about languages than the others. Here are the features, visible in this very test case alone, that are unique to highlight.js:</p>
<ul>
<li>raw Python strings</li>
<li>Ruby inheritance, <code>#{}</code> things, quoted symbols, symbolic function names etc.</li>
<li><a href="http://yardoc.org/">yardoc</a> in Ruby comments</li>
<li>phpdoc in PHP comments</li>
<li>classes, ids, tags and attributes in CSS selectors</li>
</ul>
<p>Some of the recognized features (like variables in PHP) are deliberately left unstyled to maintain visual sanity. Most of these features (and those in other languages) are the result of the elaborate effort of many <a href="https://github.com/isagalaev/highlight.js/blob/master/AUTHORS.en.txt">highlight.js contributors</a> in defining intricate parsing rules (just look at the <a href="https://github.com/isagalaev/highlight.js/blob/master/src/languages/perl.js">Perl definition</a>, for example).</p>
<p class=center>
<a href="http://softwaremaniacs.org/media/blog/highlighters/highlight.js/test.html"><img src="/media/blog/highlighters/hljs.png"></a><br>
<small>HTML with embedded Javascript and CSS. All sorts of ways to define tag attributes are supported.</small>
</p>
<h2>Completely balanced conclusion</h2>
<p>If you need a solid syntax highlighter (and don't care about line numbers or striped backgrounds) use <a href="http://softwaremaniacs.org/soft/highlight/en/">highlight.js</a>. It is small, fast, rich and correct!</p>
<p>And if you don't like something about it, <a href="http://softwaremaniacs.org/wiki/doku.php/highlight.js:highlight.js">contribute</a>!</p>
highlight.js 6.0 beta
2011-04-26T02:31:29.684000-07:00https://softwaremaniacs.org/blog/2011/04/24/highlight-js-60-beta/
<p>In a fit of fighting procrastination I took on a task I had been putting off for a long time: refactoring the language definitions in <a href="http://softwaremaniacs.org/soft/highlight/">highlight.js</a> into the new syntax. It went so well that I also knocked out several other small tasks I had planned for version 6.0. So, without further ado, here is a beta of the new major version, and I'm asking you to test it.</p>
<p><a name=more></a></p>
<h2>Links</h2>
<p>Up for testing:</p>
<ul>
<li><a href="https://github.com/isagalaev/highlight.js">Project on GitHub</a>. Sources, tools, tests.</li>
<li><a href="https://github.com/downloads/isagalaev/highlight.js/highlight.full.pack.js">Full packed library</a>. 90 KB, all languages.</li>
<li><a href="https://github.com/downloads/isagalaev/highlight.js/highlight.common.pack.js">Packed version with 12 popular languages</a>. 26 KB, includes HTML/XML, Javascript, CSS, PHP, Ruby, Perl, Python, C++, C#, Java, SQL, Bash.</li>
<li><a href="https://github.com/downloads/isagalaev/highlight.js/styles.zip">Style archive</a>. Packed separately for convenience.</li>
</ul>
<p>Put it on your sites, catch bugs, and write to the <a href="https://groups.google.com/forum/#!forum/highlightjs">mailing list</a> or the <a href="https://github.com/isagalaev/highlight.js/issues">bug tracker</a>.</p>
<h2>Syntax</h2>
<p>The main news in this version concerns developers rather than users of the library. The language definition syntax has become structurally simpler, the defaults have become more logical, and some attributes that used to be needed for handling edge cases are gone. Here is a simplified example for illustration.</p>
<p>Before:</p>
<pre><code class=javascript>defaultMode: {
  contains: ['string'],
  modes: [
    {
      className: 'string',
      begin: '"', end: '"',
      contains: ['escape']
    },
    {
      className: 'escape', noMarkup: true,
      begin: '\\\\.', end: hljs.IMMEDIATE_RE
    }
  ]
}</code></pre>
<p>After:</p>
<pre><code class=javascript>defaultMode: {
  contains: [
    {
      className: 'string',
      begin: '"', end: '"',
      contains: [{begin: '\\\\.'}]
    }
  ]
}</code></pre>
<p>Here's what changed:</p>
<ul>
<li>mode definitions (<code>modes</code>) and their nesting (<code>contains</code>) have merged into a single structure</li>
<li><code>hljs.IMMEDIATE_RE</code> became the default value for regexes, which is why the escape mode no longer needs an explicit <code>end</code></li>
<li>instead of specifying <code>className</code> together with <code>noMarkup</code>, you can now simply leave <code>className</code> out</li>
</ul>
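<p>Why the tiny <code>{begin: '\\\\.'}</code> sub-mode matters: as a regex source, the JS literal <code>'\\\\.'</code> is <code>\\.</code>, a backslash followed by any character, so an escaped quote is consumed before the string mode gets a chance to treat it as its end. A rough sketch of that interplay, using a hypothetical <code>findStringEnd</code> scanner rather than the library's real parser:</p>

```javascript
// Hypothetical scanner, not highlight.js code: find the closing quote of a
// string starting at `start`, letting an "escape" sub-mode consume "\x" pairs.
var ESCAPE_RE = new RegExp('\\\\.', 'y'); // same source string as the mode above
var END_RE = /"/y;                        // the string mode's `end`

function findStringEnd(text, start) {
  var pos = start + 1; // skip the opening quote (the string mode's `begin`)
  while (pos < text.length) {
    ESCAPE_RE.lastIndex = pos;
    if (ESCAPE_RE.test(text)) {          // "\x" belongs to the escape sub-mode
      pos = ESCAPE_RE.lastIndex;
      continue;
    }
    END_RE.lastIndex = pos;
    if (END_RE.test(text)) return pos;   // an unescaped quote ends the string
    pos++;
  }
  return -1; // unterminated string
}

var code = 'x = "a\\"b" + y';
console.log(findStringEnd(code, code.indexOf('"'))); // 9
```

<p>A naive search for the next quote in the same input would stop at index 7, on the escaped quote.</p>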
<p>For the most part the code has become prettier and more readable, though not without flaws: right now the Ruby definition has <em>ten</em> string variables that are <a href="https://github.com/isagalaev/highlight.js/blob/2fb1afe66072f7cd8df8aa8bc7edac0ec15ceebd/src/languages/ruby.js#L131">dragged along throughout the whole file</a> :-).</p>
<p>Converting all the languages to the new syntax was the longest and most tedious task, and that's exactly why I decided to release the new version as a beta first: I don't believe nothing broke, even though the internal tests pass. I'd like to take this opportunity to give special thanks to <a href="https://github.com/vhbit">Valerii Hiora</a> for <a href="https://github.com/isagalaev/highlight.js/commit/bce993c4de5a160d26a46452a1f25f17d8c4fbfb">converting his Objective C definition</a>!</p>
<h2>Tools</h2>
<p>Or rather, now it's a single tool. The two scripts that packed and assembled the languages into the final build <a href="https://groups.google.com/d/topic/highlightjs/FjzV5fMVfyI/discussion">have become one</a>, which is more convenient to use, including for debugging.</p>
<h2>Languages</h2>
<p>This version has 4 new languages:</p>
<ul>
<li>Haskell by <a href="https://github.com/sourrust">Jeremy Hull</a></li>
<li>Erlang in two flavors, module and REPL, jointly written by <a href="http://desh.su/">Nikolay Zakharov</a>, <a href="https://github.com/arhibot">Dmitry Kovega</a> and <a href="https://github.com/ignatov">Sergey Ignatov</a></li>
<li>Objective C by <a href="https://github.com/vhbit">Valerii Hiora</a></li>
<li>Vala by <a href="https://github.com/antono">Antono Vasiljev</a></li>
</ul>
<p>That brings the total number of languages to 40!</p>
<p>Besides that, two old languages, HTML and CSS, have been radically changed. I decided that two separate definitions for HTML and XML make no sense and merged them into one. At the same time I threw out the long keyword lists from HTML and CSS, because the syntax of both languages is designed to be extensible and doesn't depend on specific keywords. Now tag and attribute names are always highlighted, even when they are non-standard.</p>
<p>The nicest part is that dropping the keywords, together with the switch to the new syntax, made the new version of the library <em>smaller</em>, even counting four completely new languages!</p>
<h2>Infrastructure</h2>
<p>The move to <a href="https://github.com/isagalaev/highlight.js">GitHub</a> has fully paid off: <a href="https://github.com/isagalaev/highlight.js/contributors">new contributors</a> have appeared! And as code hosting it's so good that it even sweetens my move to git, a VCS that's new to me.</p>
<p>The <a href="https://groups.google.com/forum/#!forum/highlightjs">discussion group</a> is a more complicated story. It's mostly quiet there, and the discussions that did take place could just as well have been held in private email. Come to think of it, that's not surprising: the highlighter core has had a single author from the very beginning, it has survived several rewrites, and these days probably nobody but me knows that code well. Still, I don't think the group should be abandoned: it costs nothing to keep, and it's better to have it and not need it than to suddenly need it and not have it.</p>
<h2>What's next</h2>
<p>The plan is simple and obvious: wait a week or two for bug reports, fix them (or even better, just merge patches from the reporters themselves) and release the final version.</p>
<p>Also, as I <a href="https://twitter.com/#!/isagalaev/statuses/56073798759350272">mentioned in passing on Twitter</a>, I'd really like to have styles based on the <a href="http://ethanschoonover.com/solarized">Solarized</a> palette. I'm unlikely to take this on myself, so I'm just broadcasting the request here once more. If you like the highlighter and care about details, your contribution would be very valuable to the community!</p>
highlight.js opens up
2011-01-03T00:21:43.808000-08:00https://softwaremaniacs.org/blog/2011/01/02/highlight-js-opens-up/
<p>Although the code of <a href="http://softwaremaniacs.org/soft/highlight/">highlight.js</a> has always been open, the library was never a project in the full sense of the word. There was no common place for developers to talk, no wiki with documentation, and no bug tracking. Instead, I simply accepted new languages and patches by email and answered questions, often very slowly. Despite that, the highlighter managed to become the biggest of my projects by number of contributors!</p>
<p>So I finally decided to stop holding it back and turned it into a proper project.</p>
<p><a name=more></a></p>
<p>The main things: </p>
<ul>
<li><a href="http://softwaremaniacs.org/wiki/doku.php/highlight.js:highlight.js">Developer documentation</a> in a public wiki</li>
<li><a href="https://github.com/isagalaev/highlight.js">Code on GitHub</a></li>
<li>A <a href="https://groups.google.com/group/highlightjs">Google Group</a> for development discussions</li>
</ul>
<p>Although I'm not a big fan of git compared to bzr, I did move the code to GitHub, simply yielding to public opinion. Implicit in this is my long-term goal: to stop writing code for this project myself and hand that job over to an interested community of developers. I'll sit on my throne like a king and just merge patches :-).</p>
<p>The wiki is currently open to everyone, and I'm already suffering from periodic spam. If I can't fight it off effectively, I'll probably have to introduce some kind of registration.</p>
<p>The last unsolved problem is where to do bug tracking. I've started a <a href="https://groups.google.com/d/topic/highlightjs/f3EAxhD5Je8/discussion">discussion in the group</a> about it. The group's language is English.</p>
<p>Join in!</p>
Hosting for highlight.js
2010-09-27T16:05:37.959000-07:00https://softwaremaniacs.org/blog/2010/09/27/hosted-highlight-js/
<p><a href="http://softwaremaniacs.org/soft/highlight/download/">highlight.js</a> is now <a href="http://api.yandex.ru/jslibs/">hosted by Yandex</a>, so you don't have to download it: you can link to it directly from yandex.st. This archive, however, doesn't contain all the languages, since it would then be indecently large. So I picked the most frequently downloaded languages and took as many of them as would keep the final archive under 30K. The ones that made the cut are: HTML/XML, Javascript, CSS, PHP, Ruby, Perl, Python, C++, C#, Java, SQL, Bash (yes, Bash!).</p>
<p>The style themes are hosted there as well, and you can link to them directly too. How to do this is <a href="http://softwaremaniacs.org/soft/highlight/download/">described in the instructions</a>, so I won't repeat it here.</p>
<p><a name=more></a></p>
<p>I hope this helps the highlighter spread on blog hosting services like <a href="http://www.blogger.com/">blogspot.com</a>, where people constantly struggle with where to put a file. And in general, I recommend that everyone for whom these languages are enough switch to the hosted version, to make more effective use of your users' browser cache.</p>
<p>P.S. Somebody on Twitter asked me why not <a href="http://code.google.com/apis/libraries/">Google</a>. It's very simple: I wasn't sure they would want to host me there and, I must admit, I couldn't immediately find on the page whom to ask. And of course it was easier to <a href="http://clubs.ya.ru/jslibs/replies.xml?item_no=94">talk</a> to colleagues at my own company :-). Thanks!</p>
highlight.js 5.9
2010-06-27T15:23:31.564000-07:00https://softwaremaniacs.org/blog/2010/06/17/highlight-js-59/
<p>Better late than never. Last night I finally released a new version of <a href="http://softwaremaniacs.org/soft/highlight/">highlight.js</a> with a lot of nice additions, some of which had been ready for half a year.</p>
<p><a name=more></a></p>
<h2>New languages</h2>
<ul>
<li>Andrew Fedorov described the Lua language</li>
<li>longtime highlighter contributor <a href="http://kung-fu-tzu.ru/">Peter Leonov</a> described the Nginx configuration language</li>
<li><a href="http://fulc.ru/">Vladimir Moskva</a> described TeX</li>
</ul>
<p>So highlight.js now supports no fewer than <a href="http://softwaremaniacs.org/media/soft/highlight/test.html">32 languages</a>!</p>
<h2>Fixes to existing languages</h2>
<p>Existing languages also received a number of bug fixes and improvements. You can read about them in detail in the <a href="http://bazaar.launchpad.net/~isagalaev/+junk/highlight/changes">Bazaar log</a>, but two languages deserve special mention.</p>
<p><a href="http://gnuu.org/">Loren Segal</a> substantially reworked the Ruby definition and added highlighting for inline <a href="http://yardoc.org/">YARD</a> documentation (his own invention). At his suggestion, language modes gained a new attribute, <code>displayClassName</code>, which is written into the generated markup instead of <code>className</code>. The need for it arose when he moved function headers into a separate mode, which had to be given a different name: "ftitle" instead of "title". To avoid breaking the styles, there were two options: either routinely add <code>.ftitle</code> next to every <code>.title</code> selector in all the style files, or make the new mode still show up in the markup as "title". We went with the latter.</p>
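<p>Put as code, the rule is simply "use <code>displayClassName</code> if the mode has one, <code>className</code> otherwise". A minimal sketch with hypothetical helper names (<code>markupClass</code>, <code>wrap</code>); a real highlighter would also escape the text:</p>

```javascript
// Hypothetical sketch, not highlight.js source: the class written into the
// markup is displayClassName when a mode has one, className otherwise.
function markupClass(mode) {
  return mode.displayClassName || mode.className;
}

function wrap(mode, text) {
  return '<span class="' + markupClass(mode) + '">' + text + '</span>';
}

// Illustrative mode objects; only the two class attributes matter here.
var functionTitle = {className: 'ftitle', displayClassName: 'title'};
var plainTitle = {className: 'title'};

console.log(wrap(functionTitle, 'each')); // <span class="title">each</span>
console.log(wrap(plainTitle, 'Foo'));     // <span class="title">Foo</span>
```

<p>This way existing stylesheets keep working with their <code>.title</code> selectors, while the language definition is free to keep the two modes apart internally.</p>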
<p>The second rework concerned the SQL definition, and that's exactly why I stayed up so late last night :-). SQL has always been a very "greedy" language when it comes to autodetection. For example, this innocent Python fragment was detected as SQL:</p>
<pre><code>from django.utils import translation
translation.activate(language)
</code></pre>
<p>A short digression into the highlighter's architecture is in order here.</p>
<p>Since the highlighter doesn't need to execute programs, only to color them, language syntax isn't described completely. From the highlighter's point of view, a language is a large mass of text interspersed with special constructs like strings and comments, which the highlighter calls "modes". Modes define which keywords may occur inside them. In most languages all keywords are defined in the most basic mode, the one representing that large mass of text outside strings, comments and other special things.</p>
<p>SQL is exactly like that. The problem is that it has <em>a lot</em> of keywords: 217 of them (Python, for comparison, has only 37). As a result, SQL keywords easily show up in programs in other languages, where the same words are used as variable names. In the fragment above, the SQL keywords include "from", "translation" and "language", while the only Python keywords are "from" and "import". Since "translation" occurs twice, that's 4 occurrences against 2, and there you have SQL.</p>
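<p>The effect is easy to reproduce with a toy detector that just counts keyword occurrences across the whole text. A sketch under that simplification; the keyword arrays are tiny hypothetical excerpts, and this is not highlight.js' actual relevance code:</p>

```javascript
// Naive detection: count every word of the text that appears in a
// language's keyword list. Both lists are small hypothetical excerpts.
var SQL_KEYWORDS = ['from', 'translation', 'language', 'select', 'where'];
var PYTHON_KEYWORDS = ['from', 'import', 'def', 'class', 'return'];

function countKeywords(code, keywords) {
  var words = code.match(/\w+/g) || [];
  var count = 0;
  for (var i = 0; i < words.length; i++) {
    if (keywords.indexOf(words[i].toLowerCase()) != -1) count++;
  }
  return count;
}

var snippet = 'from django.utils import translation\n' +
              'translation.activate(language)\n';
console.log(countKeywords(snippet, SQL_KEYWORDS));    // 4
console.log(countKeywords(snippet, PYTHON_KEYWORDS)); // 2
```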
<p>To fix this I always use the same technique: change the representation of the language from "a pile of free text with embedded modes" to "the whole language consists of specific modes only". The trick is to do this with minimal effort. After some thought, I concluded that, broadly speaking, SQL consists of only two things: comments and SQL statements, and it's the statements that contain the various keywords. Statements, in turn, begin with a small number of reserved words ("select", "insert", "alter" etc., 22 in total) and end with a semicolon or the end of the file. </p>
<p>With this representation, parsing the fragment above finds no SQL statements at all, because it contains none of the words they start with. So from this version on the highlighter should detect SQL only where it really is SQL. Bugs notwithstanding :-).</p>
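<p>A rough sketch of the statement-based idea: keywords only count inside statements that start with a statement word and run to a semicolon or the end of the text. The word lists are hypothetical excerpts (the real definition uses 22 starting words), and comments are ignored for brevity:</p>

```javascript
// Hypothetical word lists: short excerpts, not the real SQL definition.
var STATEMENT_STARTERS = ['select', 'insert', 'update', 'delete', 'alter', 'create'];
var STATEMENT_KEYWORDS = ['from', 'where', 'into', 'values', 'table', 'translation', 'language'];

// A statement: a starting word, then everything up to ";" or the end of the text.
var STATEMENT_RE = new RegExp(
  '\\b(?:' + STATEMENT_STARTERS.join('|') + ')\\b[^;]*(?:;|$)', 'gi');

function sqlStatements(code) {
  return code.match(STATEMENT_RE) || [];
}

// Keywords are counted only inside statements, never in the text around them.
function countSqlKeywords(code) {
  var count = 0;
  sqlStatements(code).forEach(function (statement) {
    (statement.match(/\w+/g) || []).forEach(function (word) {
      if (STATEMENT_KEYWORDS.indexOf(word.toLowerCase()) != -1) count++;
    });
  });
  return count;
}

var python = 'from django.utils import translation\ntranslation.activate(language)\n';
var sql = 'SELECT name FROM users WHERE id = 1;\nINSERT INTO log VALUES (1);';
console.log(sqlStatements(python).length); // 0
console.log(countSqlKeywords(python));     // 0
console.log(countSqlKeywords(sql));        // 4
```

<p>Some false positives remain possible with a sketch this crude: a Python call like <code>d.update(x)</code> would still open a "statement".</p>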
<h2>Library-friendliness</h2>
<p>I refactored the highlighter's initialization code a bit, and now it's more convenient to use as a library, not just as a standalone application. Previously it insisted on being initialized with a call to <code>initHighlightingOnLoad</code>, which attached itself to the page load event and then found code blocks and highlighted them. This approach had several drawbacks:</p>
<ul>
<li>the question "how do I highlight code loaded later via ajax" kept coming up</li>
<li>code blocks were recognized only in the form <code>&lt;pre&gt;&lt;code&gt;..&lt;/code&gt;&lt;/pre&gt;</code> (although, to be fair, that's a feature, since it's what <a href="http://dev.w3.org/html5/spec/Overview.html#dfnReturnLink-10">HTML5</a> recommends)</li>
<li>the moment of initialization was unpredictable if the page did other initialization with some js framework</li>
</ul>
<p>Now all of this can be controlled. For example, with <a href="http://jquery.com/">jQuery</a> highlighting can look like this:</p>
<pre><code>$(document).ready(function() {
  $('div.pre').each(function(i, e) {hljs.highlightBlock(e, ' ')});
});
</code></pre>
<p>The <code>highlightBlock</code> function takes a DOM element containing the code text and, as an optional second parameter, a replacement string for tab characters. The same function can be used to highlight code at any point in the page's life.</p>
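<p>For the recurring ajax question, this means you can highlight just the freshly inserted fragment. A minimal sketch, assuming a hypothetical helper <code>highlightWithin</code> that receives the container element and a highlighting callback (on a real page something like <code>function (e) { hljs.highlightBlock(e, '    '); }</code>):</p>

```javascript
// Hypothetical helper, not part of highlight.js: highlight only the code
// blocks inside `container`, e.g. right after ajax content was inserted.
// The highlighting callback is passed in, so no global is assumed here.
function highlightWithin(container, highlight, selector) {
  var blocks = container.querySelectorAll(selector || 'pre code');
  for (var i = 0; i < blocks.length; i++) {
    highlight(blocks[i]);
  }
  return blocks.length; // number of blocks handed to the callback
}
```

<p>Passing the container instead of scanning the whole document avoids re-highlighting blocks that were already processed on page load.</p>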
<h2>WordPress plugin</h2>
<p>In this version I dropped support for the <a href="http://wordpress.org/">WordPress</a> plugin, for the simple reason that my blog now runs on its own engine and I have nothing to even debug the plugin on. It would be great if someone took over its maintenance: the <a href="http://bazaar.launchpad.net/~isagalaev/+junk/highlight/annotate/342/src/wp_highlight.js.php">old plugin code</a> is still available in Bazaar and in theory should work as is.</p>
<p>That said, I've been told that the plugin may have a security vulnerability, since it <a href="http://codex.wordpress.org/WordPress_Nonces">doesn't check nonces</a>. I didn't have the energy to figure out what that means, whether there really is a hole, and how to fix it.</p>
highlight.js in IE: asking for help
2010-06-03T00:36:58.854000-07:00https://softwaremaniacs.org/blog/2010/06/03/highlight-js-help-wanted/
<p>Are there any experts in debugging javascript in IE among my readers? People on my forum <a href="http://softwaremaniacs.org/forum/highlightjs/22226/">noticed a bug</a>: IE crashes somewhere when <a href="http://softwaremaniacs.org/soft/highlight/">highlight.js</a> tries to highlight text containing tags. Other browsers work fine. Since I have almost no IE debugging skills, I'm asking for help.</p>
Merging DOM trees in Javascript
2012-11-03T21:16:00.970000-07:00https://softwaremaniacs.org/blog/2009/08/24/merging-dom-trees/
<p>Yesterday I spent half a day implementing a feature for the new version of <a href="http://softwaremaniacs.org/soft/highlight/">highlight.js</a>: merging the user's markup with the highlighting markup of the code. Along the way I came up with a fairly general function for merging DOM trees that I'd like to share.</p>
<p>Actually, I suspect this has already been written somewhere, but yesterday my Google-fu failed me. So I partly hope someone will show me how it's done properly.</p>
<p><strong>Warning!</strong> This is a heavy post for hardcore javascript coders :-).</p>
<p><a name=more></a></p>
<h2>History</h2>
<p>Historically, highlight.js refused to highlight a code fragment that already contained user markup. There were two reasons for this:</p>
<ul>
<li><p>User formatting can conflict with the highlighting. For example, the user wants to make something in the code bold, and the highlighting goes and makes all the keywords bold too.</li>
<li><p>In the general case it's not at all simple. Since user markup can be anything, it may overlap the highlighting in a way that breaks the nesting rules. Here's an example where an illustrative <code>&lt;del&gt;</code> is inserted into CSS, starting in the middle of a number that the highlighting will wrap in <code>&lt;span class="value"&gt;</code>:</p>
<pre><code>.div {
width: 5<del>00px; margin-left: </del>20px;
}</code></pre></li>
</ul>
<p>However, sometime in February <a href="http://dolzhenko.blogspot.com/">Vladimir Dolzhenko</a> <a href="http://softwaremaniacs.org/forum/highlightjs/6612/">convinced me</a> that both problems can be solved. The first is solved by the fact that users already have a way to turn highlighting off with the "no-highlight" class, so by enabling highlighting by default we don't take away anyone's freedom. For the second problem he sent a patch, which sat in my mailbox until yesterday. In the end I didn't take the patch itself, because it meddled quite heavily with the general parsing mechanics and, simply put, I couldn't fit it in my head: the code was getting too complex. So yesterday, inspired by the idea, I wrote a separate post-processing step.</p>
<h2>Merging trees</h2>
<p>So, we are given two arbitrary trees of HTML elements wrapping pieces of the same text. We need to merge them into a single tree with proper element nesting. The only way I know to achieve this at boundaries that violate proper nesting is to close and then reopen the elements that "stick out":</p>
<p class="center picture"><img src="/media/blog/merge-dom-trees.png"></p>
<p>To implement this, I first represented each tree as a flat "event stream":</p>
<ul>
<li>character 5, element <code>&lt;p&gt;</code> starts</li>
<li>character 10, element <code>&lt;em&gt;</code> starts</li>
<li>character 12, element <code>&lt;em&gt;</code> ends</li>
<li>character 25, element <code>&lt;p&gt;</code> ends</li>
</ul>
<p>Within a single DOM tree proper nesting is guaranteed (it is a tree, after all!). The stream is built by this code:</p>
<pre><code>function nodeStream(node) {
  // recursive walk over childNodes,
  // collecting events into the shared array result
  function _(node, result, offset) {
    for (var i = 0; i &lt; node.childNodes.length; i++) {
      var child = node.childNodes[i];
      // text nodes only advance the offset
      if (child.nodeType == 3)
        offset += child.nodeValue.length;
      // &lt;br&gt; counts as a single character (a line break)
      else if (child.nodeName == 'BR')
        offset += 1;
      // every other element produces a start event and a stop event
      else if (child.nodeType == 1) {
        result.push({
          event: 'start',
          offset: offset,
          node: child
        });
        offset = _(child, result, offset);
        result.push({
          event: 'stop',
          offset: offset,
          node: child
        });
      }
      // other node types (e.g. comments) are skipped: they have no text
      // and no attributes to reproduce
    }
    return offset;
  }
  var result = [];
  _(node, result, 0);
  return result;
}
</code></pre>
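<p>To see the event list above come out of such a walk without a browser, here is a DOM-free sketch of the same traversal. It assumes a made-up tree format (a node is either a text string or a hypothetical <code>{tag, children}</code> object) and leaves out <code>&lt;br&gt;</code> handling:</p>

```javascript
// DOM-free sketch: a node is either a string (text) or {tag, children}.
// Plain objects stand in for DOM nodes so this runs outside a browser.
function eventStream(root) {
  var result = [];
  function walk(node, offset) {
    for (var i = 0; i < node.children.length; i++) {
      var child = node.children[i];
      if (typeof child == 'string') { // text only advances the offset
        offset += child.length;
        continue;
      }
      result.push({event: 'start', offset: offset, tag: child.tag});
      offset = walk(child, offset);
      result.push({event: 'stop', offset: offset, tag: child.tag});
    }
    return offset;
  }
  walk(root, 0);
  return result;
}

// 5 characters of text, then a <p> holding 5 characters, an <em> with 2,
// and 13 more characters: the same offsets as in the list above.
var tree = {tag: 'root', children: [
  'abcde',
  {tag: 'p', children: ['fghij', {tag: 'em', children: ['kl']}, 'mnopqrstuvwxy']}
]};
eventStream(tree).forEach(function (e) {
  console.log(e.offset, e.event, e.tag);
});
// 5 start p
// 10 start em
// 12 stop em
// 25 stop p
```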
<p>Besides this function, which extracts the <em>structure</em> from a node, we also need a similar one that extracts only its <em>text</em>. It's even simpler:</p>
<pre><code>function nodeText(node) {
  var result = '';
  for (var i = 0; i &lt; node.childNodes.length; i++) {
    var child = node.childNodes[i];
    if (child.nodeType == 3)          // text node: take its value as is
      result += child.nodeValue;
    else if (child.nodeName == 'BR')  // &lt;br&gt; becomes a newline, one character
      result += '\n';
    else                              // element: recurse into its children
      result += nodeText(child);
  }
  return result;
}
</code></pre>
<p>The two streams are then merged using roughly the following algorithm:</p>
<ol>
<li><p>Of the two streams, pick the event with the smaller offset. At equal offsets, closing tags count as earlier, because it pays to get rid of them as soon as possible.</li>
<li><p>An opening tag is written to the result string and pushed onto the top of the stack.</li>
<li><p>A closing tag:</p>
<ol>
<li><p>Closes all tags from the top of the stack down until it finds its matching opening tag. With proper nesting that tag would be guaranteed to be on top of the stack, but here that's not the case.</li>
<li><p>Removes itself from the stack.</li>
<li><p>Goes back up the stack, reopening along the way all the tags that were nested inside it.</li>
</ol></li>
<li><p>Repeat until both streams are exhausted.</li>
</ol>
<p>And somewhere in the middle of all this, pieces of the original string are added between the tags.</p>
<p>Here's the code:</p>
<pre><code>// takes two event streams and the plain text they annotate;
// escape() is assumed to be highlight.js' own HTML-escaping helper,
// not the browser's global escape(), which does URL encoding
function mergeStreams(stream1, stream2, value) {
  var processed = 0;  // number of characters already copied to the result
  var result = '';    // the resulting HTML string
  var nodeStack = []; // stack of currently open elements
  // helper: pick the stream whose next event comes first
  function selectStream() {
    if (stream1.length && stream2.length) {
      // both streams are non-empty
      if (stream1[0].offset != stream2[0].offset)
        // different offsets: take the smaller one
        return (stream1[0].offset &lt; stream2[0].offset) ? stream1 : stream2;
      else
        // equal offsets: a closing event goes first
        return (stream1[0].event == 'start' && stream2[0].event == 'stop') ? stream2 : stream1;
    } else {
      // one stream is empty: return the other one
      return stream1.length ? stream1 : stream2;
    }
  }
  // helper: build the opening tag for a node, copying its attributes
  function open(node) {
    var result = '&lt;' + node.nodeName.toLowerCase();
    for (var i = 0; i &lt; node.attributes.length; i++) {
      result += ' ' + node.attributes[i].nodeName.toLowerCase() + '="' + escape(node.attributes[i].nodeValue) + '"';
    }
    return result + '&gt;';
  }
  // helper: build the closing tag for a node
  function close(node) {
    return '&lt;/' + node.nodeName.toLowerCase() + '&gt;';
  }
  // main loop: runs while at least one stream still has events
  while (stream1.length || stream2.length) {
    // take the nearest tag event and copy all text before it
    var current = selectStream().shift();
    result += escape(value.substring(processed, current.offset));
    processed = current.offset;
    if (current.event == 'start') {
      // opening tag: emit it and push the element onto the stack
      result += open(current.node);
      nodeStack.push(current.node);
    } else if (current.event == 'stop') {
      // closing tag: walk down the stack closing tags until it finds itself
      var i = nodeStack.length;
      do {
        i--;
        var node = nodeStack[i];
        result += close(node);
      } while (node != current.node);
      // remove itself from the stack, then walk back up reopening closed tags
      nodeStack.splice(i, 1);
      while (i &lt; nodeStack.length) {
        result += open(nodeStack[i]);
        i++;
      }
    }
  }
  // copy whatever text follows the last tag
  return result + escape(value.substring(processed));
}
</code></pre>
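<p>The same algorithm can be tried outside a browser. Below is a condensed sketch, assuming plain <code>{tag}</code> objects instead of DOM nodes and a minimal <code>escapeHTML</code> stand-in (both hypothetical, not highlight.js code), applied to a user <code>&lt;b&gt;</code> that overlaps a highlighting <code>&lt;em&gt;</code>:</p>

```javascript
// Minimal stand-in for an HTML-escaping helper.
function escapeHTML(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Condensed merge over events whose `node` is a plain {tag} object.
// Note: the input streams are consumed (shifted) as they are processed.
function merge(stream1, stream2, text) {
  var processed = 0, result = '', stack = [];
  function next() {
    if (!stream1.length) return stream2;
    if (!stream2.length) return stream1;
    if (stream1[0].offset != stream2[0].offset)
      return stream1[0].offset < stream2[0].offset ? stream1 : stream2;
    // equal offsets: a closing event goes first
    return (stream1[0].event == 'start' && stream2[0].event == 'stop') ? stream2 : stream1;
  }
  function open(n) { return '<' + n.tag + '>'; }
  function close(n) { return '</' + n.tag + '>'; }
  while (stream1.length || stream2.length) {
    var current = next().shift();
    result += escapeHTML(text.substring(processed, current.offset));
    processed = current.offset;
    if (current.event == 'start') {
      result += open(current.node);
      stack.push(current.node);
    } else {
      // close everything down to the matching element...
      var i = stack.length;
      do {
        i--;
        result += close(stack[i]);
      } while (stack[i] !== current.node);
      // ...drop it and reopen the elements that were nested inside it
      stack.splice(i, 1);
      for (; i < stack.length; i++) result += open(stack[i]);
    }
  }
  return result + escapeHTML(text.substring(processed));
}

var b = {tag: 'b'}, em = {tag: 'em'};
var user = [{event: 'start', offset: 0, node: b}, {event: 'stop', offset: 6, node: b}];
var highlight = [{event: 'start', offset: 3, node: em}, {event: 'stop', offset: 9, node: em}];
console.log(merge(user, highlight, 'abcdefghi'));
// <b>abc<em>def</em></b><em>ghi</em>
```

<p>The highlighting element is closed where the user element ends and reopened right after it. The equal-offset rule matters too: with <code>&lt;b&gt;</code> on characters 0 to 3 and <code>&lt;em&gt;</code> on 3 to 6, the stop of <code>&lt;b&gt;</code> is processed first, so no needless close/reopen pair appears.</p>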
<p>That's about it, really. Comments are welcome!