Maniacal weblog » Programming languages
https://softwaremaniacs.org/blog/category/languages/
2023-05-09T10:00:44.797047-07:00
Maniac
Ivan Sagalaev on programming and web development
http://softwaremaniacs.org/media/sm_org/style/photo.jpg
Trie in Python
2023-05-09T09:54:46.475589-07:00 https://softwaremaniacs.org/blog/2022/11/10/python-trie/
<p>A post about <a href="https://mazzo.li/posts/haskell-readability.html">Haskell vs. Python readability</a> came onto my radar the other day. It compares two implementations of a <a href="https://en.wikipedia.org/wiki/Trie">trie structure</a>, and after looking at the Python version I wanted to make my own attempt. I didn't necessarily make it to compare or "battle" against the other solutions; it's more of an exercise in the vein of "how would I do it".</p>
<p><a name=more></a></p>
<h2>The code</h2>
<p>Here's the original code (for easier lookup, as I refer to a few things in it in the notes):</p>
<pre><code>class Trie(object):
    def __init__(self, value=None):
        self.children = {}
        self.value = value
        self.flag = False # Flag to represent that a word ends at this node

    def add(self, char):
        val = self.value + char if self.value else char
        self.children[char] = Trie(val)

    def insert(self, word):
        node = self
        for char in word:
            if char not in node.children:
                node.add(char)
            node = node.children[char]
        node.flag = True

    def find(self, word):
        node = self
        for char in word:
            if char not in node.children:
                return None
            node = node.children[char]
        return node.value

    def all_prefixes(self, wlist):
        results = set()
        if self.flag:
            results.add(self.value)
        if not self.children: return results
        return reduce(lambda a, b: a | b,
                      [node.all_prefixes() for
                       node in self.children.values()]) | results

    def autocomplete(self, prefix):
        node = self
        for char in prefix:
            if char not in node.children:
                return set()
            node = node.children[char]
        return node.all_prefixes()
</code></pre>
<p>My code:</p>
<pre><code>class Trie:
    def __init__(self):
        self.children = {}
        self.is_word_end = False

    def insert(self, word):
        for char in word:
            self = self.children.setdefault(char, Trie())
        self.is_word_end = True

    def words_with(self, prefix):
        if self.is_word_end:
            yield prefix
        for char, node in self.children.items():
            yield from node.words_with(prefix + char)

    def autocomplete(self, prefix):
        try:
            for char in prefix:
                self = self.children[char]
            return list(self.words_with(prefix))
        except KeyError:
            return []
</code></pre>
<p>A few notes:</p>
<ul>
<li>Not storing <code>self.value</code> does seem to reduce complexity, perhaps counter-intuitively.</li>
<li>The oft neglected <code>dict.setdefault</code> allowed me to inline the entire <code>Trie.add</code>.</li>
<li>Another Pythonism, <code>yield</code> and <code>yield from</code>, is a nice pattern for recursive tree walking that would otherwise require temporary containers. It also usually results in tighter code.</li>
<li>I attempted a couple more experiments with making the code more functional, like using <code>reduce</code> and recursions instead of for-loops, but it didn't improve things really. Python is not a functional language in its soul :-)</li>
<li>The idea of re-binding <code>self</code> while tree-walking may scare some people, but I thought doing <code>node = self</code> just to avoid this was a bit silly :-)</li>
</ul>
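<p>For a quick feel of how it behaves, here is a small usage sketch. The class is repeated so the snippet runs on its own, and the word list is made up for illustration:</p>

```python
class Trie:
    def __init__(self):
        self.children = {}
        self.is_word_end = False

    def insert(self, word):
        for char in word:
            self = self.children.setdefault(char, Trie())
        self.is_word_end = True

    def words_with(self, prefix):
        if self.is_word_end:
            yield prefix
        for char, node in self.children.items():
            yield from node.words_with(prefix + char)

    def autocomplete(self, prefix):
        try:
            for char in prefix:
                self = self.children[char]
            return list(self.words_with(prefix))
        except KeyError:
            return []


trie = Trie()
for word in ["car", "cart", "care", "dog"]:   # illustrative words only
    trie.insert(word)

print(trie.autocomplete("car"))   # ['car', 'cart', 'care']
print(trie.autocomplete("x"))     # []
```

<p>Results come out in insertion order because dicts preserve it; an unknown prefix falls into the <code>KeyError</code> branch and yields an empty list.</p>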
<p>P.S. Please don't talk to me about types and dataclasses, I will ridicule you to no end :-)</p>
nfp
2022-01-09T13:13:40.753060-08:00 https://softwaremaniacs.org/blog/2021/10/11/nfp/
<p>So what happened was, I got fed up with manually uploading pictures I export from <a href="https://www.darktable.org/">Darktable</a> to my <a href="https://www.flickr.com/photos/isagalaev/">Flickr photo stream</a> using the browser's file picker. So I decided to do something about it. The initial idea was to craft a FUSE file system which would automatically upload new files, but this turned out to be hard, so I switched to a much simpler solution: a little <a href="https://www.man7.org/linux/man-pages/man7/inotify.7.html">inotify</a> watcher handing over new files to an upload script. I managed to code up a working solution over a weekend!</p>
<p>More interestingly, I made the watcher part — "nfp", for "New File Processor" — as a generic configurable tool which I <a href="https://nest.pijul.com/isagalaev/nfp">published</a>. It was only when I started writing this very blog post that I stumbled upon a standard Linux tool that does it, <a href="https://www.man7.org/linux/man-pages/man1/inotifywait.1.html">inotifywait</a> :-)</p>
<p>Still, I hope there's something to be salvaged from this project. Read on!</p>
<p><a name=more></a></p>
<h2>Darktable</h2>
<p>Darktable is my tool of choice for working with camera RAWs, and I just want to take a moment to share my appreciation for the folks making it. It's a ridiculously advanced, polished photo processor. A real testament to open-source software.</p>
<p>It actually used to have a Flickr export plugin, but it hasn't been working for a while, and got dropped in recent versions. Which is totally fair because it's very much out of scope for a photo editing software. Having a generic solution like <code>nfp</code> makes much more sense because it can connect arbitrary file producers and consumers. It doesn't even have to be about images.</p>
<h2>Rust</h2>
<p>Since "inotify" sounds very "systems" and "core", I immediately took it as an opportunity to play with <a href="https://www.rust-lang.org/">Rust</a> once more. That was the main reason. A nice side effect of it is that it builds into a small self-contained binary which you can bring with you anywhere. As long as it's Linux, anyway :-)</p>
<p>If I had to mention a single gripe with the language during this last foray, that would be implementing an ordered type with the <code>PartialEq</code>/<code>Eq</code>/<code>PartialOrd</code>/<code>Ord</code> trait family. This just feels unnecessarily hard. I still don't get the point of having partial variants, or why things couldn't be inferred from each other. Like, even the <a href="https://doc.rust-lang.org/std/cmp/trait.Ord.html#how-can-i-implement-ord">official docs on <code>Ord</code></a> recommend writing boilerplate for <code>PartialOrd</code> that just calls out to <code>Ord</code>. I'm sure there are Reasons™ for it, but somehow Python can infer <a href="https://docs.python.org/3/library/functools.html#functools.total_ordering">total ordering</a> from just <code>__eq__</code> and <code>__lt__</code>.</p>
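<p>For reference, this is roughly what the Python side looks like. <code>Version</code> is a made-up type used purely for illustration; the only real API here is <code>functools.total_ordering</code>:</p>

```python
from functools import total_ordering


@total_ordering
class Version:
    """Hypothetical example type, ordered by (major, minor)."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()


# <=, > and >= are filled in by the decorator
print(Version(1, 2) <= Version(1, 3))  # True
print(Version(2, 0) > Version(1, 9))   # True
print(Version(1, 0) >= Version(1, 0))  # True
```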
<p><a name=debouncing></a></p>
<h2>Debouncing</h2>
<p>After using the tool for a week I noticed that the uploaded photos didn't have any metadata on them. After some digging this turned out to be due to the way Darktable writes exported files: it does it twice for every file. The second write, I assume, is specifically to add metadata to the already fully written JPEG. The problem was, <code>nfp</code> had been snatching the file away immediately after the first write.</p>
<p>The only way I know how to deal with this problem is "<em>debouncing</em>", a term familiar to programmers working with UI and hardware. Which means, adding a short grace period of waiting until a jittery signal stops appearing on the input or the user stops rapidly clicking a button. Or Darktable stops rapidly overwriting a file.</p>
<p>A quick search for a generic debouncer for Rust turned up only specific solutions tied to mpsc channels, or async streams, or hardware sensors. So <a href="https://nest.pijul.com/isagalaev/nfp:main/EPIHKWJDBRZLG.EQAAA">I wrote my own debounce</a>, which is a passive data structure with a couple of methods. It doesn't want to know anything about where you get the data or what the waiting mechanism is. It just tracks time and removes duplicates.</p>
<p>I may yet turn it into a full-blown crate, and maybe build a unixy-feeling debounce tool along the lines of:</p>
<pre><code>inotifywait -m -e close_write /path | debounce -t 500 | python upload.py
</code></pre>
<p class=note><small><b>Update:</b> this has been <a href="https://softwaremaniacs.org/blog/2021/10/25/debounce/en/">implemented</a>.</small></p>
<p>To do it properly though, I'll have to implement it as a two-threaded process, which will give me an opportunity to play with concurrency in Rust, something I haven't done yet. In <code>nfp</code> I cheated: it waits for new notifications on the same thread that sleeps for debounce timeouts, so it uses an ugly hack of sleeping in short chunks and constantly checking for new events:</p>
<pre><code>loop {
    match debouncer.get() {
        State::Empty => break,
        State::Wait(_) => sleep(Duration::from_millis(50)),
        State::Ready(file) => { ... }
    }
    for event in inotify.read_events(&mut buffer)? {
        debouncer.put(...)
    }
}
</code></pre>
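<p>To make the idea of a "passive" debouncer more concrete, here is a minimal Python sketch. It is not a translation of the Rust code in <code>nfp</code>: the class, its internals and the injectable <code>clock</code> parameter are assumptions for illustration, and only the <code>put</code>/<code>get</code> method names and the three states mirror the loop above.</p>

```python
import time


class Debouncer:
    """Passive debouncer: it tracks time and removes duplicates, nothing else."""

    def __init__(self, delay, clock=time.monotonic):
        self.delay = delay   # grace period in seconds
        self.clock = clock   # injectable, so the logic can be tried without sleeping
        self.pending = {}    # item -> time of its latest put(), oldest first

    def put(self, item):
        # Re-inserting moves the item to the end and restarts its grace period.
        self.pending.pop(item, None)
        self.pending[item] = self.clock()

    def get(self):
        """Return ('empty', None), ('wait', seconds_left) or ('ready', item)."""
        if not self.pending:
            return ('empty', None)
        item, seen = next(iter(self.pending.items()))
        remaining = seen + self.delay - self.clock()
        if remaining > 0:
            return ('wait', remaining)
        del self.pending[item]
        return ('ready', item)


# A fake clock stands in for real time in this demonstration.
now = [0.0]
d = Debouncer(delay=0.5, clock=lambda: now[0])
d.put("a.jpg")
now[0] = 0.3
d.put("a.jpg")        # the second write restarts the wait
now[0] = 0.6
print(d.get()[0])     # wait
now[0] = 0.9
print(d.get())        # ('ready', 'a.jpg')
print(d.get())        # ('empty', None)
```

<p>The caller decides how to wait: a polling loop like the one above, a timer thread, or anything else.</p>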
<h2>Flickr uploader</h2>
<p>The uploader script was a story in itself. Ironically I spent more time trying to make various existing solutions work for me than I did with <code>nfp</code>, but didn't have any luck. So I ended up cobbling together a Python script using <a href="https://stuvel.eu/software/flickrapi/">flickrapi</a>.</p>
<p>The ugly part of all these scripts is <strong>OAuth</strong>. More precisely, its insistence on having to register a client app to get a unique id and secret (apart from the user auth for whoever is going to be using it). It's totally fine for a web service, but in anything distributed to user-owned general-purpose computers it means that a determined user can fish out the client credentials and use them for something else (oh horrors!) I remember dealing with this problem when we worked on an OAuth service for Yandex around 2009, and we didn't come up with a good solution for it. These days I believe client credentials should be optional, akin to the <code>User-Agent</code> header in HTTP, and shouldn't be used for anything outside of coarse statistical data.</p>
<p>Anyway… Since I'm using this script only for myself, I registered it on Flickr, put the credentials in a config and forgot about it :-)</p>
<p>Here's the whole script for posterity:</p>
<pre><code>import argparse
import logging
from pathlib import Path

import toml
import flickrapi

logging.basicConfig(level='INFO')
log = logging.getLogger()


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('filename', type=str)
    args = parser.parse_args()
    filename = Path(args.filename)

    # App and user credentials live in flickr.toml next to the script
    with open(Path(__file__).parent / 'flickr.toml') as f:
        config = toml.load(f)
    token = flickrapi.auth.FlickrAccessToken(**config['user'])
    flickr = flickrapi.FlickrAPI(**{**config['app'], 'token': token})

    log.info(f'Uploading {filename}...')
    # The title is the file name up to the first dot
    title = filename.name.split('.', 1)[0]
    # TODO: use 'Xmp.darktable.colorlabels' to control visibility
    result = flickr.upload(filename, title=title, is_public=0)
    if result.attrib['stat'] != 'ok':
        raise RuntimeError(result)
    log.info(f'Successfully uploaded {filename}')


if __name__ == '__main__':
    main()
</code></pre>
<p>P.S. I love dict destructuring with <code>**</code>!</p>
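<p>In isolation, the trick looks like this. The config values and the <code>connect</code> function are made up for the example and have nothing to do with the real <code>flickrapi</code> signatures:</p>

```python
# Made-up values standing in for the [app] table of a config file
config = {'app': {'api_key': 'KEY', 'secret': 'SECRET'}}


def connect(api_key, secret, token=None):
    """Hypothetical stand-in for a constructor taking keyword arguments."""
    return f'{api_key}/{secret}/{token}'


# Inside a dict literal, ** merges an existing dict with extra keys...
kwargs = {**config['app'], 'token': 'TOKEN'}
# ...and in a call it spreads a dict into keyword arguments.
print(connect(**kwargs))  # KEY/SECRET/TOKEN
```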
<h2>What's next</h2>
<p>I'm not yet sure what to do with <code>nfp</code>. The good sense tells me to extract a debouncer out of it and drop the rest in favor of <code>inotifywait</code>, but it actually does add some extra value: it has a sensible config format and I can modify it further into being able to exec multiple processor scripts in parallel. Although I suspect the latter part can be handled by yet another unix voodoo :-)</p>
<p>And its best feature is that it works for me right now!</p>
On Kotlin
2023-05-09T10:00:44.797047-07:00 https://softwaremaniacs.org/blog/2020/04/14/on-kotlin/
<p>I've been writing code in Kotlin on and off over a few months, and I think I'm now at this unique stage of learning something new when I already have a sense of what's what, but am not yet so far advanced that I no longer remember the beginner's pain points.</p>
<p>Here's a dump of some of my impressions, good and bad.</p>
<p><a name=more></a></p>
<h2>Functional</h2>
<blockquote>
<p>We were not out to win over the Lisp programmers; we were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp.</p>
</blockquote>
<p>— <a href="https://en.wikipedia.org/wiki/Guy_L._Steele_Jr.">Guy Steele</a></p>
<blockquote>
<p>Kotlin drags Java programmers another half of the rest of the way.</p>
</blockquote>
<p>— me</p>
<p>That is to say, Kotlin doesn't feel like a real functional-first language. It's still mostly Java with all its imperativism, mutability and OO, but layered with some (quite welcome) syntactic sugar that makes it less verbose and actually encourages functional style. Where it still feels mostly Java-ish is when you need to work with Java libraries. Which is most of the time, since the absolutely transparent Java interop doesn't make writing Kotlin-flavored libraries a necessity.</p>
<h2>Classes are not required</h2>
<p>For starters, you don't <em>have</em> to put everything in classes with methods any more. Plain top-level functions are perfectly okay.</p>
<p>You also don't need to write/generate a full-blown class if what you really need is a struct/record. Instead you just do:</p>
<pre><code>data class Person(val name: String, val age: Int)
</code></pre>
<p>These have some handy features (like equality comparison) implemented out of the box, which is nice. And then you can pass them to functions as plain arguments, without necessarily having to make them methods on those arguments' classes.</p>
<h2>Extension functions</h2>
<p>Like other newer languages (Swift, Rust) Kotlin allows you to add your own methods to existing classes, even to built-in types. They are neatly scoped to whatever package they're defined in, and don't hijack the type for the entirety of the code in your program. The latter is what happens when you add a new method to a built-in class dynamically in Ruby, and as far as I know, it's a constant source of bad surprises.</p>
<p>It doesn't require any special magic. Just keep in mind that <code>T.func()</code> is not really different from <code>func(T)</code>, only the name of the first parameter is going to be <code>this</code>, and it's going to be available implicitly.</p>
<p>This, I think, is actually a big deal, because looser coupling between types and functions operating on them pushes you away from building rigid hierarchies. And by now I believe most people have realized that inheritance doesn't scale. So these days the only real value in having <code>T.func()</code> over <code>func(T)</code> is the ability to compose functions in the natural direction:</p>
<pre><code>data.prepare().process().finalize()
</code></pre>
<p>… as opposed to</p>
<pre><code>finalize(process(prepare(data)))
</code></pre>
<p class=note><small>Yes, I know your Haskell/OCaml/Clojure have their own way of doing it. Good. Kotlin has chaining.</small></p>
<h2>Immutable declarations</h2>
<p>Kotlin uses <code>val</code> and <code>var</code> for declaring local data as immutable and mutable, respectively. <code>val</code> is encouraged to be used by default, and the compiler will yell at you if you use <code>var</code> without actually needing to mutate the variable.</p>
<p>This is very similar to Rust's <code>let</code> and <code>let mut</code>. Unfortunately however, Kotlin doesn't enforce immutability of a class instance inside its methods, so it's still totally possible to do:</p>
<pre><code>val obj = SomeObject( ... )
obj.someMethod()
</code></pre>
<p>… and have internal state changed unpredictably.</p>
<h2>Expressions</h2>
<p>Kotlin is another new language adopting the "everything is an expression" paradigm. You can assign the result of, say, an <code>if</code> statement to a variable or <code>return</code> it. This plays well with a shortened syntax for functions consisting of a single expression, which doesn't involve curly braces and the <code>return</code> keyword:</p>
<pre><code>fun recencyScore(item: Item): Int =
    if (item.created < LocalDateTime.now().minusDays(RECENT_DAYS)) 1 else 0
</code></pre>
<p>You still need <code>return</code> in imperative functions and for early bail-outs.</p>
<p>This is all good, I don't know of any downsides.</p>
<h2>Lambda syntax</h2>
<p>I think Kotlin has easily the best syntax for nameless in-place functions out of all languages with curly braces:</p>
<ul>
<li>
<p>You put the body of the function within <code>{ .. }</code>, no extra keywords or symbols required.</p>
</li>
<li>
<p>If it has one argument (which is very common), it has an implicit short name, <code>it</code>.</p>
</li>
<li>
<p>This one is really cool: if the lambda is the last argument of the accepting function, you can take it outside the parentheses, and if there are no other arguments, you can omit the parentheses altogether.</p>
</li>
</ul>
<p>So filtering, mapping and reducing a collection looks like:</p>
<pre><code>entries.filter { it < 5 }
       .map { it * 2 }
       .fold(10) { acc, value -> acc + value }
</code></pre>
<p>Note the absence of <code>()</code> after the first two functions. The line with <code>.fold</code> is more complicated because it <em>does</em> have an extra argument, an initial value, which has to go into parentheses, and it also has a two-argument lambda, so it needs to name them.</p>
<h2><code>.let</code></h2>
<p>Many times you can get away with not inventing a name for another temporary variable:</p>
<pre><code>val reader = File(directory, name).let {
    if (!it.exists()) {
        it.createNewFile()
    }
    it.bufferedReader() // last expression returned from `let`
}
</code></pre>
<p><code>.let</code> takes the object on which it was called (<code>File()</code> in this case), passes it as a single argument to its lambda, where you can use it as, well, <code>it</code>, and then returns whatever was returned from the lambda. This makes for succinct, closed pieces of code which otherwise would either bleed their local variables outside the scope, or require a named function.</p>
<p>This reminds me of Clojure's <code>let</code>, and Kotlin also has its own idiom similar to <code>when-let</code> which is a variant that only works when the value is not <code>null</code>:</p>
<pre><code>val something = nullableValue()?.let { ... }
</code></pre>
<p>If the result of <code>nullableValue</code> is <code>null</code> the operator <code>?.</code> would safely short-circuit the whole thing and not call the <code>.let</code> block.</p>
<h2>Friends of <code>.let</code></h2>
<p>Speaking of <code>.let</code>, it's actually one of no fewer than <em>five</em> slight variations of the same idea. They vary by the name under which the object is available inside the lambda block, and by what they return: the object itself or the result of the lambda.</p>
<p>Here they are:</p>
<ul>
<li><code>.apply</code> takes the object as <code>this</code>, returns the object</li>
<li><code>.run</code> takes the object as <code>this</code>, returns the result of the block</li>
<li><code>.also</code> takes the object as <code>it</code>, returns the object</li>
<li><code>.let</code> takes the object as <code>it</code>, returns the result of the block</li>
</ul>
<p>Technically, you can get by with only ever using <code>.let</code>, because you can always return <code>it</code> explicitly, and the difference between <code>it</code> and <code>this</code> is mostly cosmetic: sometimes you can save more characters by omitting typing <code>this.</code>, sometimes you still need it to avoid things like <code>name = name</code>, so you switch to using <code>it</code>.</p>
<p>The real reason for all these variations is they're supposed to convey <a href="https://kotlinlang.org/docs/reference/scope-functions.html">different semantics</a>. In practice I would say it creates more fuss than it helps, but it may be just my lack of habit.</p>
<p>And no, I didn't forget about the fifth one, <code>with</code>, which is just a variant of <code>run</code>, but you pass the object in parentheses instead of putting it in front of a dot:</p>
<pre><code>// Totally equivalent
obj.run {
    someMethod()
}
with(obj) {
    someMethod()
}
</code></pre>
<p>I can probably only justify its existence by a (misplaced) nostalgia for a similar <code>with</code> from Pascal and early JavaScript. And there's a reason nobody uses it anymore: the implicit <code>this</code> was a reliable source of hard-to-spot bugs.</p>
<p>By the way, this sudden language complexity is something that Lisps manage to avoid by simply not having the distinction between "functions" and "methods", and always returning the last expression from a form. "An elegant weapon for a more civilized age", and all that :-)</p>
<h2>Collection transformations and laziness</h2>
<p>That one caught me off guard. Turns out it matters what kind of value you call <code>.map</code>, <code>.filter</code> and such on. Calling them on a <code>List<T></code> does not produce a lazy sequence, it actually produces a concrete list. If you want a lazy result you should first convert the concrete collection to a <code>Sequence<T></code> with <code>.asSequence()</code>:</p>
<pre><code>val names = listOf("john", "mary")
val upperNames = names.map { it.toUpperCase() } // List<String>, eagerly built
val lazyUpperNames = names.asSequence().map { ... } // lazy Sequence<String>
// Or:
fun upper(items: Sequence<String>): Sequence<String> =
    items.map { it.toUpperCase() }
val alsoLazyUpperNames = upper(names.asSequence()) // a List must be converted explicitly
</code></pre>
<p>That's one more gotcha to be aware of if you want to avoid allocating memory for temporary results at every step of your data transformations.</p>
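<p>The same eager-versus-lazy distinction can be observed in Python, which makes for a handy mental model. This is only an analogy, not a description of Kotlin's internals; the <code>calls</code> list is there just to show when the work actually happens:</p>

```python
calls = []


def upper(name):
    calls.append(name)   # record each call to see when it happens
    return name.upper()


names = ["john", "mary"]

eager = [upper(n) for n in names]   # the list is built right away
print(len(calls))                   # 2

calls.clear()
lazy = (upper(n) for n in names)    # generator: nothing is computed yet
print(len(calls))                   # 0
print(next(lazy))                   # JOHN
print(len(calls))                   # 1
```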
<h2>No tuples</h2>
<p>In Python, tuples are a workhorse as much as dicts and lists. One of their underappreciated properties is their natural <em>orderability</em>: as long as corresponding elements of two tuples are comparable with each other, tuples are also comparable, with leftmost elements being the most significant, so you have:</p>
<pre><code>(1, "A", 20) < (1, "B", 10)
</code></pre>
<p>This is <strong>tremendously</strong> convenient when sorting collections of custom elements, because you only need to provide a function mapping your custom value to a tuple:</p>
<pre><code>sorted(autocomplete, key=lambda a: (a.frequency, a.recency, a.title))
</code></pre>
<p>Kotlin doesn't have tuples. It has <a href="https://kotlinlang.org/api/latest/jvm/stdlib/kotlin/-pair/">pairs</a>, but they aren't orderable and, well, sometimes you need three elements. Or four! So when you want to compare custom elements you have two options:</p>
<ul>
<li>
<p>Define comparability for your custom class. Which you do at the class declaration, way too far away from the place where you're sorting them. Or it may not work for you at all if you need to sort these same elements in more than one way.</p>
</li>
<li>
<p>Define a comparator function in place. Kotlin lambdas help here, but since it needs to return a -1/0/1, it's going to be sprawling and repetitive: for all elements, subtract one from another, check for zero, return if not, move to the next element otherwise. Bleh…</p>
</li>
</ul>
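<p>To show what the second option amounts to, here is the same sort written both ways in Python. <code>Item</code> and its sample data are made up for the example; the comparator is the field-by-field routine described above, adapted with <code>functools.cmp_to_key</code>:</p>

```python
from collections import namedtuple
from functools import cmp_to_key

# Hypothetical record type and data, for illustration only
Item = namedtuple('Item', 'frequency recency title')
items = [Item(2, 1, 'b'), Item(1, 5, 'z'), Item(2, 1, 'a')]

# Tuple key: one line
by_key = sorted(items, key=lambda a: (a.frequency, a.recency, a.title))


# Comparator: compare field by field, return at the first difference
def compare(a, b):
    if a.frequency != b.frequency:
        return -1 if a.frequency < b.frequency else 1
    if a.recency != b.recency:
        return -1 if a.recency < b.recency else 1
    if a.title != b.title:
        return -1 if a.title < b.title else 1
    return 0


by_cmp = sorted(items, key=cmp_to_key(compare))
print(by_key == by_cmp)  # True
```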
<h2>Type inference</h2>
<p>It's probably to widespread type inference that we owe the resurgence in popularity of typed languages. It's what makes them palatable. But implementations are not equally capable across the board. I can't claim a lot of cross-language experience here, but one thing I noticed about Kotlin is that it often doesn't go as far as, say, Rust in figuring out what it is that you meant.</p>
<p>For example, Kotlin can't figure out the type of an item of an initially empty list based on what data you're adding to it:</p>
<pre><code>val items = mutableListOf()
items.add("One")
println(items)
// Type inference failed: Not enough information to infer parameter T in inline fun <T> mutableListOf(): MutableList<T>
// Please specify it explicitly.
</code></pre>
<p>Rust does this just fine:</p>
<pre><code>let mut items = vec![];
items.push("One");
println!("{:?}", items);
</code></pre>
<p>It's a contrived example, but in practice I have also stumbled over Kotlin's inability to look into how the type is being used later. This is not a huge problem of course…</p>
<h2>Messy parts</h2>
<p>I'm going to bury the lead here and first give you two examples that look messy (to me) before uncovering the True Source of Evil.</p>
<p>The first thing is the <code>in</code> and <code>out</code> modifiers for type parameters. There is a long detailed article about them in the <a href="https://kotlinlang.org/docs/reference/generics.html">docs about generics</a> which I could only sort of understand after the third time I read it. It all has to do with trying to explain to the compiler the IS-A relationship between containers of sub- and supertypes. Like <code>List<String></code> could be treated as <code>List<Object></code> if you only read items from it, but you obviously can't write a random <code>Object</code> into it. Or something…</p>
<p>The second example is about extension methods (those that you define on some third-party class in your namespace) that <a href="https://kotlinlang.org/docs/reference/extensions.html#extensions-are-resolved-statically">can't be virtual</a>. It may not be immediately apparent why, until you realize that slapping a method on a class is <em>not</em> the same as overriding it in a descendant, but is simply a syntactic sugar for <code>f(this: T, ... )</code>. So when you call <code>T.f()</code> it doesn't actually look into the VMT of <code>T</code>, it looks for a free-standing function in a local namespace.</p>
<p>Now, things like these made me acutely realize how much I appreciate Rust, a language that simply doesn't <em>have</em> inheritance! It still has all the power to express any polymorphic behavior you care about, yet doesn't have any complexity coming with hierarchies of objects! Kick me if you want me to elaborate on that…</p>
I learned C# in 4 days!
2018-10-23T09:34:10.711863-07:00 https://softwaremaniacs.org/blog/2015/02/06/learned-csharp-4-days/
<p>You know those <em>crazy</em> books, "Learn whatever programming in 21 days"? I mean, who can afford spending that much time, right?</p>
<p><a name=more></a></p>
<h2>Some background</h2>
<p>I have a friend who employs a very particular workflow for dealing with his digital photos. It often involves renaming and merging files from different cameras into a single chronologically ordered event, relying on natural sorting of file names in Windows Explorer. File names are constructed of picture time fields and running counters, like "2015-02-06_001.jpg".</p>
<p>This is of course too tedious to do by hand, so he was very happy with a small specialized Windows utility that I wrote for him a few years ago when Windows XP ruled the world and I still programmed in Delphi. The program worked fine until, with the natural flow of time, the world switched to Unicode and newer Windows started to display question marks in place of Cyrillic characters in the program's UI. This made it rather unusable. There were also other small and not so small <em>imperfections</em> about the program that, as I understand, added a considerable factor of irritation to the act of processing photos. ("And when it happens upon a panoramic shot you can as well go and pour yourself some coffee because UI is frozen for minutes while loading the preview…")</p>
<p>So a year ago, when we were visiting his family for Christmas, he nagged me, politely but emphatically, about at least making the UI readable again and also, just maybe, fixing some of the most outrageous annoyances uncovered over the years of usage. The only problem was… I had lost the source code! I know, it might sound utterly unbelievable these days, but it was written in the era before GitHub, and back in those days I was using (wait for it) <a href="http://en.wikipedia.org/wiki/Zip_drive">Zip drives</a> to store my backups. Which in hindsight turned out to be suboptimal: they fail.</p>
<p>All this, however, provided me with a unique opportunity for making a <em>really good</em> Christmas gift this year…</p>
<p>I suppose there exist people out there who could come up instantly with a perfect gift idea for any of their dozens of friends upon being woken up in the middle of the day, but most of us seem to be destined to endure the agony of scratching the bottom of the void bowl of "what on Earth should we give them this time that won't suck like the last time!" So I was pretty much stoked when some weeks before we were about to leave for the trip it hit me that I actually could <em>write the same program from scratch!</em></p>
<p>And I'm happy to say that ultimately the idea did work out as intended and at some point it has even been uttered that it was "the best gift ever!" </p>
<p class="picture center"><img src="/media/blog/pe-screenshot.png"></p>
<p>The best thing though is that now I can actually maintain the code (which I'm doing once a week these days) and not feel sorry for writing another half-working utility. Software is a process, after all.</p>
<h2>The endeavor</h2>
<p>So I had to learn how to write Windows GUI apps, again. Going back to Delphi was pretty much out of the question, as even back then it was already losing mind share to the quickly rising C#, and I simply assumed that by now this process had completed. Besides, I actually wanted to learn how Windows GUI programming is "officially" done these days. (Notwithstanding the fact that we're still talking about traditional desktop software, not Metro tiles.)</p>
<p>The lazy evaluation phase took me a couple of weeks, during which I only figured out which of the three-letter acronyms I need to know: WPF, MVVM, C#. The actual design and implementation with ongoing research took 4 days — literally. The most helpful resources along the way were <a href="http://www.wpf-tutorial.com/">WPF Tutorial</a> and <a href="http://stackoverflow.com/">Stack Overflow</a> (of course).</p>
<p>Most importantly though, it was rigorous planning and doing design ahead of coding that allowed me to get the thing done. Here are a few snapshots of my whiteboard with the UI mock-up and current tasks divided by priority:</p>
<p class="picture center">
<img src="/media/blog/pe-plan-1.jpg">
<img src="/media/blog/pe-plan-2.jpg">
</p>
<p>And though this entire article is not of particular practical importance — I'm simply sharing my emotions here — there is one point I'd really like to drive home:</p>
<p class=strong><strong>Planning works. Always.</strong></p>
<p>If you're one of those who doesn't "believe" in it, and for whom "plans never work", I say you most certainly are just doing it wrong and fixing it is a matter of learning how. Indulge yourself.</p>
<h2>C# and WPF</h2>
<p>I'll say from the get-go that I can't presume to have an accurate opinion about a mainstream language after spending just 4 days with it. This is only my first impression. </p>
<p>It <em>feels</em> to me like a modern Delphi, which is probably not surprising given that both were invented by the same <a href="http://en.wikipedia.org/wiki/Anders_Hejlsberg">Anders Hejlsberg</a>. Type inference makes static typing a lot more palatable, however the time spent on satisfying the compiler's complaints about inconsistent types still feels to me like the time lost. I was pleasantly surprised though by some nice things making their way into a 10+ year old language: lambdas, <code>+=</code> for registering event listeners, LINQ — this is all very handy.</p>
<p>But overall, for a Pythonista, the language still feels way too verbose and ceremonious. Want to display a regular public attribute in UI? Oh, just turn it into a property with a getter and a setter <em>and</em> an accompanying separate private field of the same type. A dozen or so lines of code to satisfy a convention — not cool.</p>
<p>Likewise, I can't compare WPF to any modern UI framework as I didn't use any (which is a shame, really). From this position, what immediately feels right about WPF is the data binding concept. Instead of writing disjoint pieces of imperative code updating disjoint pieces of UI and trying doing it in the right order and not forgetting anything, you now define <em>relations</em> like "this ListView shows this list from my data model" and "this action is enabled when these conditions are met and it is bound to these UI controls". And all the controls' state is updated pretty much automatically. I believe it's that thing they call "reactive programming" these days…</p>
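<p>To make the idea of "defining relations" a bit more concrete, here is a minimal sketch in Python rather than C#. It is not WPF's API: <code>Observable</code>, <code>Label</code>, <code>bind</code> and <code>set</code> are all hypothetical names made up for this illustration, and a real framework does far more (two-way bindings, collections, threading).</p>

```python
class Observable:
    """Hypothetical minimal model object that notifies listeners on change."""

    def __init__(self, **fields):
        self._fields = dict(fields)
        self._listeners = {}  # field name -> list of callbacks

    def bind(self, name, callback):
        """Declare a relation: `callback` always receives the value of `name`."""
        self._listeners.setdefault(name, []).append(callback)
        callback(self._fields[name])  # push the current value right away

    def set(self, name, value):
        """Update a field and refresh everything bound to it."""
        if self._fields.get(name) == value:
            return  # nothing changed, nothing to refresh
        self._fields[name] = value
        for callback in self._listeners.get(name, []):
            callback(value)


class Label:
    """Stand-in for a UI control; it only remembers the text it shows."""

    def __init__(self):
        self.text = ""

    def show(self, value):
        self.text = str(value)


model = Observable(title="untitled", dirty=False)
title_label = Label()
save_state = Label()

# The relations are declared once...
model.bind("title", title_label.show)
model.bind("dirty", lambda dirty: save_state.show("on" if dirty else "off"))

# ...and later updates touch only the model, never the controls.
model.set("title", "report.txt")
model.set("dirty", True)
```

<p>The point of the sketch is the split: code that changes data never mentions the controls, and the order of UI updates is no longer something to get wrong by hand.</p>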
<p>The GUI editor is unusable. It took me probably only half a day before I completely switched to editing XAML by hand, and as I understand it's how it's done in practice. Here's a simple example why the editor sucks. XAML layout works best by dividing your window into panels, some of which are of fixed size while others automatically fill available space. Only the GUI editor doesn't do that, instead it gives all panels fixed sizes in <em>pixels</em>, thus defeating the purpose completely. So, surprisingly the old Delphi GUI editor remains the best in my limited opinion: it <em>was</em> usable and it did the right things by default most of the time.</p>
<h2>The code</h2>
<p>I haven't published it anywhere yet, but I will once I figure out SSH keys on Windows and choose proper licensing. I'm very interested in a code review from someone versed in WPF/C#, but what I <em>don't</em> want to do is maintain it as a proper project with contributions and such; it's just too much hassle.</p>
Memory is slow
2014-06-14T13:33:24.049000-07:00 https://softwaremaniacs.org/blog/2014/06/13/memory-is-slow/
<p>Did you know that memory is slow compared to the CPU? I kinda knew too, but recently I had a revelation from two unrelated sources about how it affects the design of programming languages. I also learned about a new thing called the "value/reference type dichotomy".</p>
<p><a name=more></a></p>
<p>I stumbled upon this term while reading a preliminary <a href="http://graydon2.dreamwidth.org/5785.html">review of Apple's Swift</a> by Graydon Hoare, creator of <a href="http://rust-lang.org/">Rust</a>. I sent him an email asking about it and he replied with a wonderfully detailed explanation covering not only that but also memory, caches, garbage collection and the simplicity of the programming models of various languages. You gotta love Rust simply because of the fact that Graydon is such a terrific guy!</p>
<h2>Allocation models</h2>
<p>In a nutshell, there are two ways in which programming languages allocate variables:</p>
<ul>
<li>
<p>Some use a "uniform representation", where all variables allocated on the stack are of the same size: one machine word. This leads to scalar objects (ints, bools, etc.) being allocated directly on the stack and non-scalar objects being allocated automatically on the heap and represented on the stack by references. This is the value/reference type dichotomy. Java, C# (mostly), Python and Ruby are such languages.</p>
</li>
<li>
<p>Other languages allow objects of different types, sizes and complexity to be allocated directly on the stack and give programmers a choice of where to keep things. C/C++, Go and Rust are such languages.</p>
</li>
</ul>
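<p>Here is a small sketch of what "represented by references" looks like from inside Python. It assumes CPython, which goes a step further than the description above: even floats and ints are heap objects reached through references. The variable names are made up for the illustration, and the standard <code>array</code> module stands in for the "values stored directly" model.</p>

```python
import array
import struct
import sys

values = [1.5, 2.5, 3.5]

# A list holds references: each slot points at a separate float object
# that lives somewhere else on the heap.
boxed = list(values)

# array.array("d") stores the raw C doubles side by side in one buffer.
unboxed = array.array("d", values)

pointer_bytes = struct.calcsize("P")      # one reference slot in a list
double_bytes = unboxed.itemsize           # one raw double inside the array
float_object_bytes = sys.getsizeof(1.5)   # a whole float object, header included

# Assignment copies the reference, not the object behind it.
alias = boxed
alias.append(4.5)   # visible through both names, since there is only one list

print(pointer_bytes, double_bytes, float_object_bytes)
print(boxed is alias, len(boxed))
```

<p>Reading <code>boxed[0]</code> means following a pointer from the list to a float object, while the array keeps the number itself in the slot. That is the extra lookup the next paragraph talks about.</p>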
<p>The catch with uniform-stack languages is that, while they hide a great deal of complexity from the programmer, they pay for it in performance. Instead of manipulating objects on the stack you now <em>always</em> have to dereference pointers and look into main memory. It wasn't obvious to me that the penalty is significant though. "The stack" after all is just a software concept and it resides in the same physical memory as the heap, so theoretically you just have to make two memory lookups instead of one.</p>
<h2>Cache matters</h2>
<p>The practical difference is in cache locality. If you constantly push and pull values on the same stack, it will be in cache pretty much all the time. And some of those values will even already be in the CPU registers thanks to compiler optimizations. On the other hand, when you're constantly allocating and garbage-collecting things in random places in memory, they're pretty much never going to be cached successfully. Graydon quoted a figure of an 80% performance hit for Java code caused by that alone. Yes, that's <strong>eighty percent</strong>!</p>
<p>I got another confirmation of the same idea from the presentation "<a href="http://dave.cheney.net/2014/06/07/five-things-that-make-go-fast">Five things that make Go fast</a>" by Dave Cheney. I don't know how to link to the relevant part within the page so you will have to read it in full :-). Which I recommend doing anyway, it's actually very clear and informative even if you don't know Go.</p>
<p>I'm only going to quote a picture from that presentation showing why this problem has "suddenly" started to bother language designers:</p>
<figure class="picture center">
<img src="/media/blog/memory-cpu-gap.jpg">
<figcaption>Memory/CPU performance gap over the years. (<a href="http://dave.cheney.net/2014/06/07/five-things-that-make-go-fast">Source</a>.)</figcaption>
</figure>
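<p>If you want to poke at cache locality yourself, here is a rough experiment sketch. It walks the same objects once in allocation order and once in shuffled order. The function names are made up, and I'm not claiming any particular numbers: in CPython the interpreter overhead may well hide most of the effect, and the assumption that objects created one after another tend to sit near each other in memory is just that, an assumption.</p>

```python
import random
import time


def total(objects, order):
    """Sum objects[i] for every i in `order`, visiting them in that order."""
    acc = 0
    for i in order:
        acc += objects[i]
    return acc


def time_total(objects, order):
    """Return (sum, seconds) for one pass over `objects` in `order`."""
    start = time.perf_counter()
    result = total(objects, order)
    return result, time.perf_counter() - start


def compare(n=200_000, seed=0):
    """Time a sequential pass and a shuffled pass over the same n objects."""
    # The +1000 offset sidesteps CPython's cache of small ints, so every
    # element is its own heap object, created one after another.
    objects = [i + 1000 for i in range(n)]
    sequential = list(range(n))
    shuffled = sequential[:]
    random.Random(seed).shuffle(shuffled)
    return time_total(objects, sequential), time_total(objects, shuffled)


if __name__ == "__main__":
    (_, seq_time), (_, rnd_time) = compare()
    print(f"sequential: {seq_time:.4f}s  shuffled: {rnd_time:.4f}s")
```

<p>Both passes do exactly the same arithmetic, so whatever difference shows up comes from the order in which memory is touched.</p>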
<h2>How about Python?</h2>
<p>When Alex Gaynor tells us <a href="http://vimeo.com/61044810">why Python is slow</a> he specifically names three main culprits for the performance hit: <a href="https://speakerdeck.com/alex/why-python-ruby-and-javascript-are-slow?slide=26">hash lookups, allocations, copying</a>. When talking about allocations he mostly focuses on the fact that idiomatic Python forces the interpreter to do a lot of them, and the secret to getting better performance is to provide APIs that don't require as many.</p>
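<p>As an illustration of an API that doesn't require as many allocations, here is a sketch built on the standard <code>readinto()</code> method of binary streams. This is my own example rather than one taken from the talk, and the two function names are made up. Both functions compute the same byte sum, but the first gets a fresh <code>bytes</code> object for every chunk while the second refills a single preallocated buffer.</p>

```python
import io


def checksum_allocating(stream, chunk_size=4096):
    """Sum all bytes; every read() call returns a brand-new bytes object."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return total
        total += sum(chunk)


def checksum_reusing(stream, chunk_size=4096):
    """Sum all bytes; readinto() refills one preallocated buffer each time."""
    buffer = bytearray(chunk_size)
    view = memoryview(buffer)  # slicing a memoryview does not copy the bytes
    total = 0
    while True:
        n = stream.readinto(buffer)
        if not n:
            return total
        total += sum(view[:n])


data = bytes(range(256)) * 100
same = checksum_allocating(io.BytesIO(data)) == checksum_reusing(io.BytesIO(data))
print(same)
```

<p>The second version still creates small temporary objects (the slice of the view, the ints), so it's a reduction and not an elimination of allocations.</p>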
<p>However, it turns out that it also matters <em>where</em> all those allocations happen. Alex claims that PyPy in particular has by now learned how to allocate really efficiently, so I wonder if it's by any chance smart enough to do it on the stack?