kivikakk.ee

k6_bytea

I’ve started a project in Erlang recently, and I needed a big block of mutable memory – a massive byte array, buffer, whatever you want to call it. Erlang’s binary strings are immutable and so probably not suitable.

I figured the core distribution must have had something like this … nope. I spent 30 minutes Googling and checking the docs twice and thrice over, but there’s clearly no mutable byte array in Erlang itself.

Is there a hack that approximates this? Search … this StackOverflow answer almost seems hopeful at the end, referencing hipe_bifs:bytearray/2 and hipe_bifs:bytearray_update/3, but alas, they are so named because they are part of the HiPE native compiler, not Erlang itself. (I’m not currently using HiPE, and it would be nice not to be tied to it.)

Then, I thought, surely someone else has extended Erlang to do this. There are several modes of extending Erlang, but native implemented functions (NIFs) would be the obvious candidate for manipulating large chunks of memory.

Googling this yielded complete failure: people just don’t seem to be doing it. (If people are doing it and you know this, please, let me know.)

So I did it: k6_bytea.

The interface is simple and fairly pleasant:

1> Bytea = k6_bytea:from_binary(<<"Hello, world.">>).
<<>>
2> k6_bytea:set(Bytea, 7, <<"Thomas">>).
ok
3> k6_bytea:to_binary(Bytea).
<<"Hello, Thomas">>
4> Gigabyte = k6_bytea:new(1000000000).
<<>>
5> k6_bytea:delete(Gigabyte).
ok
6>

The gigabyte allocation caused a small notch on my memory performance graph:

Screenshot of a GNOME desktop menubar, with a memory performance widget showing a very abrupt increase, then decrease, in memory allocated.

perf

The obvious question remains: how does it perform, vis-à-vis binary strings?

Let’s make a contrived benchmark: initialise a large buffer, write all over it in a deterministic fashion, and copy data all over it. Let’s do it in parallel, too.

Here’s the benchmarking code for your perusal:

-module(k6_bytea_bench).
-export([run/0, binary_strings/1, k6_bytea/1]).

-define(COUNT, 1024).  % Parallel operations.
-define(SIZE, 102400). % Size of buffer to work on.

run() ->
    measure(<<"Binary strings">>, ?MODULE, binary_strings),
    measure(<<"k6_bytea">>, ?MODULE, k6_bytea).

measure(Name, M, F) ->
    {SM, SS, SI} = erlang:now(),
    spawn_many(?COUNT, M, F, [self()]),
    receive_many(?COUNT, done),
    {EM, ES, EI} = erlang:now(),
    Elapsed = ((EM - SM) * 1000000 + (ES - SS)) * 1000 + ((EI - SI) div 1000),
    io:format("~s [~p ms]~n", [Name, Elapsed]),
    ok.

spawn_many(0, _, _, _) -> ok;
spawn_many(N, M, F, A) ->
    spawn(M, F, A),
    spawn_many(N - 1, M, F, A).

receive_many(0, _) -> ok;
receive_many(N, M) -> receive M -> receive_many(N - 1, M) end.

binary_strings(Done) ->
    B0 = <<0:(?SIZE*8)>>,
    B1 = binary_strings_set_bits(B0, lists:seq(0, ?SIZE - 1024, ?SIZE div 1024)),
    _ = binary_strings_copy_bits(B1, lists:seq(0, ?SIZE - 1024, ?SIZE div 1024)),
    Done ! done.

binary_strings_set_bits(B, []) -> B;
binary_strings_set_bits(B, [H|T]) ->
    <<LHS:H/binary, _:1/binary, RHS/binary>> = B,
    binary_strings_set_bits(<<LHS/binary, (H rem 255), RHS/binary>>, T).

binary_strings_copy_bits(B, []) -> B;
binary_strings_copy_bits(B, [H|T]) ->
    <<LHS:H/binary, Bit:1/binary, _:1/binary, RHS/binary>> = B,
    binary_strings_copy_bits(<<LHS/binary, Bit/binary, Bit/binary, RHS/binary>>, T).

k6_bytea(Done) ->
    B = k6_bytea:new(?SIZE),
    k6_bytea_set_bits(B, lists:seq(0, ?SIZE - 1024, ?SIZE div 1024)),
    k6_bytea_copy_bits(B, lists:seq(0, ?SIZE - 1024, ?SIZE div 1024)),
    k6_bytea:delete(B),
    Done ! done.

k6_bytea_set_bits(B, []) -> B;
k6_bytea_set_bits(B, [H|T]) ->
    k6_bytea:set(B, H, <<(H rem 255)>>),
    k6_bytea_set_bits(B, T).

k6_bytea_copy_bits(B, []) -> B;
k6_bytea_copy_bits(B, [H|T]) ->
    Bit = k6_bytea:get(B, H, 1),
    k6_bytea:set(B, H + 1, Bit),
    k6_bytea_copy_bits(B, T).

Over 3 runs, binary_strings averaged 24,015ms, and k6_bytea 198ms (0.83% of the time, or a 121× speedup).

There’s nothing very surprising about this; it’s large, unwieldy immutable data structures vs. one mutable data structure. Granted, I have no idea whether there are better ways to simulate a byte array in Erlang, with binary strings or without!

The binary string-manipulating code above is ugly and error-prone, since this is clearly not what binaries were built for. If it turns out that this really hasn’t been done better by someone else, then I encourage you to look to k6_bytea for this purpose, and to improve it.

Inefficiencies and insufficiencies: C++, others

Lately I’ve been game programming. A few things have been coming to mind:

  • The Sapir–Whorf hypothesis is totally true when it comes to programming languages.
  • Dynamically typed languages encourage a whole lot of waste.
  • There’s some sweet spot of expressivity, low-levelness and non-shoot-yourself-in-the-footness that’s missing from languages today.

I will attempt to expound.

The languages I end up using on a day-to-day basis tend to be higher level. Non-scientific demonstration by way of my GitHub account’s contents at time of writing:

15 Ruby
9 OCaml
6 Go
6 JavaScript
4 C
3 C++
3 Perl
2 Clojure
2 Haskell
...

In my professional career, I’ve concentrated on JavaScript, PHP, Erlang, Python and C#. The lowest level of these is, by far, Erlang. Perhaps it’s fairer to say that Erlang keeps me in check, perf-wise, more than any of the others.

So I’ve been in a fairly high-level sort of mindset. I’ve made occasional trips into lower-level code, but there hasn’t been much of that lately, particularly as I’ve changed jobs and have needed to concentrate solely on work stuff for a while.

Choosing C++ to write a game wasn’t too hard; it’s fairly standard in the industry, and I know it quite well. Bindings to libraries are plentiful, and getting the same codebase compiling on Windows, OS X and Linux is a well-understood problem that’s known to be solvable.

The thing is, C++ makes it abundantly clear when you’re doing something costly. This strikes me particularly now, as I’ve not done this lower-level stuff in a while.

You wrote the copy constructor yourself, so you know exactly how expensive pushing a bunch of objects into a vector is. You chose a vector, and not a list, so you know exactly why you don’t want to call push_front so many times. You’re creating an ostringstream to turn all this junk into a string: it has a cost. Are we putting this on the stack or on the heap? Shall we use reference counting?

You make hundreds of tiny decisions all the time you’re using it, decisions which are usually abstracted away from you in higher-level languages. It’s why they’re higher level.

And that’s basically all I have to say on that: the language makes you feel the cost of what you choose to do. Going to use a pointer? Better be sure the object doesn’t get moved around. Maybe I’ll just store an ID to that thing and store lookups in a map. How costly is the hashing function on the map key? You add such a lookup table in Ruby without a moment’s notice; here, you’re forced to consider your decision a moment. Every time you access the data, you’re reminded as well; it’s not something that ever goes out of mind.

Moreover, the ahead-of-time compilation means you can’t do costly runtime checks or casts unless you really want to (e.g. dynamic_cast), but again, the cost of doing so means you’ll never be caught unaware by slowing performance. In many (dynamic) higher level languages, basically every operation is laced with these.

So it’s suited for games programming, because performance is usually pretty important, and the language keeping performance on your mind means it’s not hard to consistently achieve high performance.

But C++’s deficiencies are also well-documented. It’s awful. It’s waiting to trip you up at every turn. After re-reading those talk slides, I figured I’d just port the code to C – until I remembered how much I used std::string, std::vector, std::list, and moreover, enjoyed the type- and memory-safety they all bring. I’m not particularly fond of giving that up and implementing a bunch of containers myself, or using generic containers and throwing away my type checks.

I think I’m after C with templates for structs (and associated functions), but I’m not sure yet. If you think I want C++, you probably need to re-read those notes.

The other solution is to only use as much of C++ as I like, and that’s basically what I do – but the language is still waiting to trip me up, no matter how much I try not to use it.

Time to think a bit about what the issues at hand really are.

Snapchat: not for state secrets

I use Snapchat. It’s an app where you can take a photo or short (< 10 second) video and send it to your friends who use the service; they’ll then be able to see it, once, before it disappears forever.

Ostensibly, the app is for sexting, because there’s no fear that your photo will get spread around (no forwarding/etc.) or retained for longer than you’d like, but it seems like it’s not as much a sexter’s hangout as the media might want you to think.

My circle of friends use it basically as an extension of weird Twitter – most snaps I send and receive are strange angles of weird objects; the completely mundane but somehow therapeutic (7 seconds of the camera pointed outside the window of a tram, pointed at the ground moving below); or just closeups of Curtis Stone’s face, wherever we see him.

Of course, the promise that they won’t get retained is just that: a promise. Since your phone receives this image and shows it to you at some point, it must be downloaded by your phone. If it can be downloaded by the phone, it can be downloaded by something else. We decided to find out how.


Contrigraph: drawing in GitHub

Here’s Contrigraph, a “data visualisation” (?) created by generating commits to match the Contribution Graph Shirt from GitHub.

It was pretty much a hack; first, I read the colours straight off the shirt, producing a big block of data like

0002342
2223322
2323241
2224333
3322122
2242231
...

Then we read that in one digit at a time, work out what day to start on so everything aligns correctly, and how many commits on a given day produce each gradient of colour. The result isn’t pretty:

start = Time.new 2012, 4, 23, 12, 0, 0, 0
tbl = {"0" => 0, "1" => 0, "2" => 1, "3" => 9, "4" => 14, "5" => 23}
# dbd: the digit block above, flattened to one digit per day.
3.times do
  dbd.each do |n|
    tbl[n].times do
      `echo 1 >> graafik`
      `git commit -a -m 'contrigraph' --date='#{start.to_s[0..9]}T12:00:00'`
    end
    start += 86400
  end
end

It runs three times so this thing will scroll on for the next few years. Other values for tbl would have worked too; I just didn’t bother to do anything better. I’ve written clearer code, but that wasn’t really the point either.

I actually screwed this up twice: first, I forgot to treat the 0 entries correctly (i.e. I should have skipped those days, whereas I ignored them entirely); second, it seemed like I was getting hit by timezone issues where everything was shifted up a block.

In retrospect, I should have first produced a mini contributions-graph generator (i.e. one that takes a Git repository and produces what GitHub would), validated it against an existing user/repo, then used it to make sure this would work the first time. I did a similar thing to check I had the data correct, by producing a graph directly from the data:
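That direct-from-data graph can be sketched roughly like so. The shade characters and the embedded sample are mine, not the original script’s, and I’m assuming each line of the digit block is one week (which matches the script advancing one day per digit):

```ruby
# Render the colour digits as a rough ASCII contributions graph.
# Each line is taken as one week of seven digits; 0–5 maps to a shade.
data = <<~BLOCK
  0002342
  2223322
  2323241
  2224333
BLOCK

SHADES = { "0" => " ", "1" => ".", "2" => ":", "3" => "*", "4" => "#", "5" => "@" }

weeks = data.split("\n").map { |line| line.chars }
# GitHub draws days as rows and weeks as columns, so transpose before printing.
weeks.transpose.each do |day_row|
  puts day_row.map { |d| SHADES.fetch(d, "?") }.join
end
```

Eyeballing that output against the shirt (or against github.com for a known repo) would have caught both of the bugs above before any commits were generated.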

Newlines and regular expressions

This has come up a few times recently, so a little note:

/./ will match any character—except a newline.

/[\s\S]/, /[\w\W]/ and so forth are the only way to portably match every character including newlines.

what about /./m?

/./m does not match a newline (usually—see below).

The m modifier does not alter . in any way (usually—see below).

m changes what ^ and $ match—normally they’ll match only the start and end of the string, but m alters them to also match the start and end of any line, i.e. immediately after and immediately before a newline.

Mnemonic: m for “multiline”.

what about /./s?

Yes, that will do what you want—unless you’re using Ruby or JavaScript.

In Ruby, the s modifier is inexplicably unused—in 1.8, it stays on the Regexp object, but in no way affects matching, and in 1.9, it doesn’t even stay on the object, it just disappears. Note that these behaviours are different to using a complete nonsense modifier, like f, which causes a SyntaxError1.

Even more inexplicably, this feature has been rolled into m instead! So in Ruby, you use /./m to match a newline. (As for ^ and $: in Ruby they match at line boundaries whether or not m is given; \A and \z are its string anchors.)

In JavaScript, the s modifier is absent entirely, and it’s not rolled into anything else. Use /[\s\S]/.

Mnemonic: s for “single line” (the string is treated as one line, in a twisted sense).
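A quick sketch in Ruby makes the difference concrete:

```ruby
p "\n".match(/./)         # => nil: . alone refuses the newline
p "\n".match(/./m)[0]     # => "\n": in Ruby, m is what makes . match it
p "\n".match(/[\s\S]/)[0] # => "\n": works with no modifier at all
p "a\nb".scan(/./).join   # => "ab": the newline simply vanishes
```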

conclusion

Yay, regular expressions. Be sure what you mean.

To match any character including newlines:

  • In JavaScript or any other language lacking the s modifier, use /[\s\S]/.
  • In Ruby, use /./m; there, m affects only . (Ruby’s ^ and $ match at line boundaries regardless). If unsure, use /[\s\S]/.
  • If you have true PCREs, you may safely use /./s.

Note that this is often what you really want—rarely do you want to explicitly match every character except newlines. If you do that on purpose, at least leave a comment to that effect, otherwise your coworkers will just assume you didn’t know what you were doing.

  1. Or just gets thrown away if you use Regexp.new directly.