o_o


I had an amusing thought in bed last night, and I was fortunate enough to remember it:
What if currying wasn’t all good?
# let f x =
    let now = CalendarLib.Time.now () in
    fun y ->
      if CalendarLib.Time.now () <> now then
        x - y (* !! *)
      else
        x + y;;
val f : int -> int -> int = <fun>
#
The type-signature sure looks innocuous.
Hence:
# f 1 2;;
- : int = 3
# f 100 ~-123;;
- : int = -23
# let add100 = f 100;;
val add100 : int -> int = <fun>
# add100 77;;
- : int = 23
# (* !! *)
I don’t know what convinced me to do this, but I was thinking along the lines of: OCaml makes all functions curryable, so someone seeing a type-signature of int -> int -> int is so used to thinking of it as a (curryable) function that “takes two arguments” that they might forget that real things could happen between “fixing” arguments one and two.
This demonstration is somewhat poor (relying on CalendarLib.Time), but it still gives me pause for thought.
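The same trap can be sprung without CalendarLib at all; here’s a minimal sketch using a plain ref (the names g and calls are invented for the example):

```ocaml
(* Sketch: the partially-applied function closes over mutable state,
   so "fixing" the first argument captures a moment in time. *)
let calls = ref 0

let g x =
  incr calls;
  let seen = !calls in
  fun y ->
    if !calls <> seen then x - y (* something happened in between! *)
    else x + y

let () =
  let add100 = g 100 in
  Printf.printf "%d\n" (add100 1);  (* 101: nothing intervened *)
  ignore (g 0);                     (* advance the counter *)
  Printf.printf "%d\n" (add100 1)   (* 99: the "addition" now subtracts *)
```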
After many discussions recently at work with my coworker who does our webdev and DBA about how crap MySQL is, I decided to finally check out the alternative which I’ve known since the start was better (yet never have bothered [dared?] to try): PostgreSQL.
Postgres is a generally better-polished piece of software, with less rubbish surrounding its licensing (fuck you, MySQL AB/Oracle), better extensibility (PL/*), better standards compliance, foreign-key constraints (which in MySQL are often enforced in application software instead), a much richer type set .. sold you yet? You can have frickin’ multi-dimensional arrays as column types, and query on subelements! Amazing.
Of course, if you install the postgresql and postgresql-client packages, you’ll find you have an installation you can do very little with (there’s no obvious way to get in by just invoking psql).
Postgres does its authentication—by default—a bit differently. After installing the server, you’ll have a new postgres user on your system; if you su into it (perhaps via a superuser account), you’ll now be the superuser for your database server, too, by virtue of being logged into that account. Postgres—again, by default—ties the database accounts to the system ones.
We can now use createuser to make an account for our regular user. Since I’m doing this on a development machine (my laptop), I’ll just create a superuser account for it. My account’s called “celtic”, so it’s just a matter of entering createuser -s celtic. Read createuser’s man page for more.
Exiting out of those subshells and back as our regular user, we now can create ourselves a database to toy with, using createdb. All Postgres tools that operate on databases will default to using your username as a database name (i.e. celtic’s default database is named celtic). createdb is no different, so you can just invoke it with no arguments and it’ll make your default database after a short delay.
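Put together, the whole bootstrap might look something like this (the “celtic” account is mine; substitute your own):

```shell
# As a superuser, become the postgres account, then create a DB
# superuser role for your regular login (here, "celtic"):
su - postgres
createuser -s celtic
exit

# Back as your regular user: create your default database
# (named after your username).
createdb
```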
You can now invoke psql, and get a lovely prompt:
(celtic) ~:6320 (0)% psql
psql (8.4.5)
Type "help" for help.
celtic=#
The word celtic at the prompt represents the database I’m using, and the # is because I’m a superuser (non-superusers see >).
Now that we can connect to our database and mess around with stuff, let’s take a look at the SQL understood by Postgres.
It differs in small ways from MySQL. Table and column names lose their capitalisation if you don’t quote them, for example:
celtic=# create table XYZabc ();
CREATE TABLE
celtic=# \d
List of relations
Schema | Name | Type | Owner
--------+--------+-------+--------
public | xyzabc | table | celtic
(1 row)
celtic=# create table "XYZabc" ();
CREATE TABLE
celtic=# \d
List of relations
Schema | Name | Type | Owner
--------+--------+-------+--------
public | XYZabc | table | celtic
public | xyzabc | table | celtic
(2 rows)
celtic=#
Quoting is another point of difference. Backticks aren’t used for identifiers; instead, use double-quotes (!). Single-quotes quote strings. This may take a moment to get used to.
Another thing is Postgres’s explicit use of sequences. In MySQL, if you want a unique numeric ID for a table, you’d probably just add a column like abcId INT PRIMARY KEY NOT NULL AUTO_INCREMENT, with a few caveats: there can be only one AUTO_INCREMENT per table, it must be indexed, they don’t work if you try to use negative numbers with them, and it has to be an integer or floating point number column.
Postgres has a construct that looks quite similar; abcId SERIAL PRIMARY KEY NOT NULL. In this case, SERIAL is the “type” of this column, but what it ends up looking like is quite different:
celtic=# create table abc (abcID serial primary key not null);
NOTICE: CREATE TABLE will create implicit sequence "abc_abcid_seq"
for serial column "abc.abcid"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"abc_pkey" for table "abc"
CREATE TABLE
celtic=# \d
List of relations
Schema | Name | Type | Owner
--------+---------------+----------+--------
public | abc | table | celtic
public | abc_abcid_seq | sequence | celtic
(2 rows)
celtic=# \d abc
Table "public.abc"
Column | Type | Modifiers
--------+---------+-----------------------------------------------------
abcid | integer | not null default nextval('abc_abcid_seq'::regclass)
Indexes:
"abc_pkey" PRIMARY KEY, btree (abcid)
celtic=# \d abc_abcid_seq
Sequence "public.abc_abcid_seq"
Column | Type | Value
---------------+---------+---------------------
sequence_name | name | abc_abcid_seq
last_value | bigint | 1
start_value | bigint | 1
increment_by | bigint | 1
max_value | bigint | 9223372036854775807
min_value | bigint | 1
cache_value | bigint | 1
log_cnt | bigint | 1
is_cycled | boolean | f
is_called | boolean | f
celtic=#
That’s a lot to digest!
When we create the table, PgSQL tells us it’s creating an implicit sequence called abc_abcid_seq for the serial column abc.abcid. Okay. The PRIMARY KEY invocation is also expanded into creation of an index; note that Postgres tells you these things explicitly. This is also a nice reminder that these things can be done manually, too.
When we detail the database with \d, we now see there are two relations: our table, and its sequence. Note that the sequence isn’t really explicitly bound to the table in any way; it’s just named so you can tell which table it belongs to.
Inspecting the table, we can see that abcid is now actually an integer column marked not null. It also has a default value, nextval('abc_abcid_seq'::regclass). This is actually a function call—Postgres supports arbitrary expressions (subject to some caveats) for calculating default values, not just NOW().
I won’t delve into the function call too much at this stage, but nextval() is a Postgres-supplied function which takes a handle to a relation (which should be a sequence), of type regclass, and returns the next value the sequence should generate, advancing its internal pointer so that it’ll return the following number next time.
The paamayim nekudotayim :: is the cast operator, and in this case casts the string-literal 'abc_abcid_seq' to a regclass. Technically it’s not necessary, as it gets cast implicitly for you anyway:
celtic=# select nextval('abc_abcid_seq');
nextval
---------
1
(1 row)
celtic=# select nextval('abc_abcid_seq');
nextval
---------
2
(1 row)
celtic=#
.. but I suppose the system is being explicit, given that it’s the one generating the values.
So that’s how the serial column “type” works in Postgres. You can use these for anything, and not necessarily tied to only one table, or any table at all; perhaps you create a standalone sequence (CREATE SEQUENCE xyzzy), then use it in some functions. There’s a family of functions for manipulating its value manually, so you have complete control of how the sequence is generated. It’s a really neat mechanism, and I’m looking forward to exploiting it in my own applications.
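A sketch of that standalone use, with a couple of the manipulation functions (nextval, currval, setval):

```sql
-- A standalone sequence, not tied to any table.
CREATE SEQUENCE xyzzy;

SELECT nextval('xyzzy');      -- 1: generate a value and advance
SELECT currval('xyzzy');      -- 1: last value generated in this session
SELECT setval('xyzzy', 100);  -- jump the sequence ahead manually
SELECT nextval('xyzzy');      -- 101
```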
Now, let’s say we wanted to use this amazing database system in our very own OCaml application. The first thing I did was look for a Debian package called libpostgresql-ocaml-dev, and lo and behold, there’s one! I installed it, and searching for docs for it, I stumbled upon the homepage of PG’OCaml. Normally I eschew the “non-standard” library, but I was pleasantly surprised to find that this was also in Debian: libpgocaml-ocaml-dev! Let’s install that instead.
Its killer feature? PG’OCaml type-checks your SQL statements at compile-time to make a strong, type-safe SQL library with syntax extensions so you do no manual (and unsafe) casting of data, nor do you accidentally pass the wrong type of data when trying to concatenate a barely-injection-safe query together.
The “downside” (if you can call it that) is that PG’OCaml needs access to your database at compile-time to find out the structure of your database. This is rarely an issue. The amazing thing is that, not only does it make sure that you’re working with the right types and doing all the ugly conversion under-the-table for you, it also carries the amazing feeling of using a safe static-type-inferring language right through to your database layer: make a change to your database (that mangles your code without you realising), and you’ll now get a compile-error when you recompile! Of course, the caveat is that you have to recompile to get the knowledge upfront, but you’ll get runtime errors as well, just like you would with any other database connector.
Let’s begin coding with PG’OCaml:
let report (name, age) =
  Printf.printf "%s is %d years old\n%!" name age

let () =
  let dbh = PGOCaml.connect () in
  let results =
    PGSQL(dbh) "select name, age from tblpeople"
  in
  List.iter report results
This is deceptively simple: we connect to a database (where are the connection settings?!), query it using PGSQL, and then List.iter on the supposedly well-formed and well-typed result?!
The answer, of course, is yes.
To get this running, try compiling it using ocamlfind with a line like this:
ocamlfind ocamlc -linkpkg -package pgocaml.syntax -syntax camlp4o test.ml -o test
Bam! Compile error:
ERROR: 42P01: relation "tblpeople" does not exist
File "test.ml", line 7, characters 2-47:
Camlp4: Uncaught exception: PGOCaml_generic.Make(Thread).PostgreSQL_Error ("ERROR: 42P01: relation \"tblpeople\" does not exist", [(83 | CstTag84, "ERROR"); (67 | CstTag68, "42P01"); (77 | CstTag78, "relation \"tblpeople\" does not exist"); (80 | CstTag81, "23"); (70 | CstTag71, "parse_relation.c"); (76 | CstTag77, "885"); (82 | CstTag83, "parserOpenTable")])
File "test.ml", line 1, characters 0-1:
Error: Preprocessor error
There’s a lengthy error courtesy of camlp4, but the first line is the important one; the table doesn’t exist! We’re informed at compile-time. Saviour!
Let’s fix this in psql:
celtic=# create table tblpeople (name varchar primary key not null, age int not null);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "tblpeople_pkey" for table "tblpeople"
CREATE TABLE
celtic=# insert into tblpeople values ('Mysterious', 21), ('Anneli', 20), ('Bjoerk', 45);
INSERT 0 3
celtic=#
Compiling again gives us a slightly different error:
File "test.ml", line 9, characters 17-24:
Error: This expression has type
(string * int32) list PGOCaml.monad PGOCaml.monad =
(string * int32) list
but an expression was expected of type (string * int) list
Well, you can’t get everything right the first time. We defined tblpeople.age as an int in Postgres, which is 4 bytes long. An OCaml int, on the other hand, is either 31 or 63 bits long (depending on your machine architecture) due to their tagged pointer representation, and PG’OCaml will not sacrifice the accuracy silently.
Since we’re not concerned about the last bit (we hope no one should live that long ..), just fix the report function:
let report (name, age) =
  Printf.printf "%s is %d years old\n%!" name (Int32.to_int age)
(Edit: as Eric pointed out in the comments, an easier way is to just print the int32 itself using %ld in the format string!)
Recompiling gives great success. Now run:
(celtic) pgo:6427 (0)% ./test
Mysterious is 21 years old
Anneli is 20 years old
Bjoerk is 45 years old
(celtic) pgo:6428 (0)%
Just like that, it works! The types were all inferred, the database connection was smooth.. it all just worked.
PG’OCaml looks to your environment to determine which database to use, both when compiling and running. The environment variables are the same ones used by the Postgres tools themselves, which is a great help: set PGDATABASE, PGHOST, PGUSER, etc. if need be.
You can also override options manually in the code itself, and they’re specified separately (and possibly extraneously) for the compile-time stage, and run-time.
For compile-time, you can set options by using connection strings just before a query itself:
let results =
  PGSQL(dbh) "database=mydb" "user=jenkins" "select name, age from tblpeople"
These are honoured at compile-time when type-checking (and type-determining, I suppose) each individual PG’OCaml expression, and override environment variable settings.
These aren’t consulted at run-time—indeed, the information is relevant and necessary only once at runtime: when connecting. For that reason, the PGOCaml.connect function has this type-signature:
val connect : ?host:string ->
?port:int ->
?user:string ->
?password:string ->
?database:string ->
?unix_domain_socket_dir:string ->
unit ->
'a t monad
Thus, to have the program use the same (forced) connection settings at runtime:
let dbh = PGOCaml.connect ~user:"jenkins" ~database:"mydb" ()
There’s obviously an advantage to these being separate, as it means computed/inputted values can be used for the connection at runtime (i.e. in production), a concept which makes little sense at compile-time. I tend to use environment variables to control the compile-time settings.
The module name in OCaml is PGOCaml, not PGOcaml. This tripped me up.
Query parameters are easy! (surprise!) Let’s say you have a value (id : int) which you want to match against an integer column in the database:
let result =
  PGSQL(dbh) "SELECT data FROM tblentities WHERE entityid=$id"
It’s that simple! Right? Wrong! OCaml’s int and Postgres’s integer don’t mix! Simply fixed, however:
let pid = Int32.of_int id in
let result =
  PGSQL(dbh) "SELECT data FROM tblentities WHERE entityid=$pid"
It’s even better if you just agree to use int32s everywhere, though!
The exact same syntax works for INSERTs, too. There’s also a nifty feature: use $?name instead of $name, and it’ll type as an option type—Some x will become x (converted appropriately), but None will render as NULL. Nifty!
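A sketch of the $?name form (the tblnotes table here is invented, with a nullable note column):

```ocaml
(* $?note takes a string option: Some x is converted as usual,
   None renders as NULL. *)
let add_note dbh name (note : string option) =
  PGSQL(dbh) "INSERT INTO tblnotes (name, note) VALUES ($name, $?note)"

(* add_note dbh "Anneli" (Some "likes OCaml")  inserts a note;
   add_note dbh "Bjoerk" None                  inserts NULL. *)
```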
See the PG’OCaml website, and this informative tutorial by Dario Teixeira. I strongly suggest you read both, as there’s a lot more to both PostgreSQL and PG’OCaml.
I’ve finally made my first monad; I’m still not 100% sure if I’ve really come to grips with it entirely. Here’s its definition:
import System.IO
import Control.Applicative

newtype IODirector a =
  IODirector { runIODirector :: (Handle, Handle) -> IO (a, (Handle, Handle)) }

instance Monad IODirector where
  return a = IODirector $ \hs -> return (a, hs)
  m >>= k = IODirector $ \hs -> do
    (a, hs') <- runIODirector m hs
    runIODirector (k a) hs'

class MonadDirectedIO a where
  dPutStr           :: String -> a ()
  dPutStrLn         :: String -> a ()
  dGetLine          :: a String
  dGetChar          :: a Char
  dFlushOut         :: a ()
  dSetBufferingIn   :: BufferMode -> a ()
  dSetBufferingOut  :: BufferMode -> a ()
  dSetBufferingBoth :: BufferMode -> a ()
  dSetEcho          :: Bool -> a ()

instance MonadDirectedIO IODirector where
  dPutStr s = IODirector $ \hs@(_, hOut) -> do
    hPutStr hOut s
    return ((), hs)
  dPutStrLn = dPutStr . (++ "\n")
  dGetLine = IODirector $ \hs@(hIn, _) -> do
    r <- hGetLine hIn
    return (r, hs)
  dGetChar = IODirector $ \hs@(hIn, _) -> do
    r <- hGetChar hIn
    return (r, hs)
  dFlushOut = IODirector $ \hs@(_, hOut) -> do
    hFlush hOut
    return ((), hs)
  dSetBufferingIn m = IODirector $ \hs@(hIn, _) -> do
    hSetBuffering hIn m
    return ((), hs)
  dSetBufferingOut m = IODirector $ \hs@(_, hOut) -> do
    hSetBuffering hOut m
    return ((), hs)
  dSetBufferingBoth = (>>) <$> dSetBufferingIn <*> dSetBufferingOut
  dSetEcho m = IODirector $ \hs@(_, hOut) -> do
    hSetEcho hOut m
    return ((), hs)
It’s basically a glorified State monad, and I bet I could even extend the extant State monad to let me do this if I really wanted to, though I’m not sure about the threading .. more on that in a second.
Its function is to let you specify an input and an output Handle (e.g. (stdin,stdout)) which will be used for I/O within the context of the monad. The benefit is not needing to thread the Handle pair throughout your code yourself; the monad passes it all the way down.
As this is my first monad, I had a bit of difficulty working out what everything should be: there was the new type, IODirector a, which actually represented the stateful computation, then making IODirector (and not IODirector a) an instance of Monad. Must remember that Monad takes a type constructor (* -> *).
Next were the difficulties in threading IO through/in/about IODirector. I knew that I had to do it, I just didn’t know how, or where. Reflecting on the newtype declaration now, it seems obvious that the type it wraps is (Handle,Handle) -> IO (a, (Handle,Handle)), which is just a way of saying α -> IO β for some α, β—essentially, any other I/O action which takes an argument. Next was working out where to return values into the IO monad correctly, but again, on reflection it’s clear that it’s in returning the aforementioned stateful computation’s result that it needs to be wrapped into IO, and indeed, that the entire computation therein is a part of IO. Simple!
The binding rules as a part of IODirector’s Monad instance take care of the threading (and indeed, look exactly like the State monad’s bind/return)—then only the functionality needed to be added. Though not required, I encapsulated the requirement for acting as a “MonadDirectedIO” in a type-class of the same name, and then made an instance of it on IODirector.
I lastly had some difficulty in working out how to correctly define dSetBufferingBoth in point-free form: I was trying all sorts of things involving sequence and various monoid and monad constructs, but was repeatedly running into a brick wall because I never involved (>>)! Duh. Applicative functor to the rescue. I’ve already done this before with more boring functions, so I should’ve realised this a bit earlier (as I’d already defined it as dSetBufferingBoth m = dSetBufferingIn m >> dSetBufferingOut m, so it should have come a bit more quickly …).
Here’s what you can do as a result:
main :: IO ()
main = do
  _ <- runIODirector myIoFunc (stdin, stdout)
  return ()

myIoFunc :: IODirector ()
myIoFunc = do
  n <- myNamePrompt "I am an IO func: "
  dPutStrLn ("Hi " ++ n ++ "!")
  return ()

myNamePrompt :: String -> IODirector String
myNamePrompt n = do
  dPutStr n
  dFlushOut
  r <- dGetLine
  return r
You can see that the type-signatures are exactly the same as they would be for a normal IO-centric process, only IO is now IODirector, and the I/O functions themselves are prefixed with d. It’s also trivial to change the in/out channels, if you so wanted, by adding dPut in the same vein as State’s put.
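Such a dPut would be a one-liner in the same style; a hypothetical sketch (the name follows State’s put, and it assumes the IODirector newtype above):

```haskell
-- Hypothetical: swap in a new (in, out) handle pair, analogous to
-- State's put. Subsequent d* actions then use the new pair.
dPut :: (Handle, Handle) -> IODirector ()
dPut hs' = IODirector $ \_ -> return ((), hs')
```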
This fun was had while writing a small Go game in Haskell, just to get some real practice with the language. Many thanks to Miran Lipovača’s Learn You a Haskell for Great Good!
Onward!
I realised something today: Ruby is my work language. I remember years ago when a group of people were discussing, I think on RubyInside (or maybe just ruby-talk), how to get Ruby into the workplace. There was Ruby Insurgency by Andy Hunt—when I review it now, it’s more or less exactly how I introduced Ruby at my workplace. It started off being used by me to do quick little hacks here and there, prototypes and the like; then I introduced it in more mission-critical places one-by-one until it formed a keystone of many of the pieces of architecture.
We have a sizable amount of code that works in production (our testing framework) entirely written in Ruby—its size is of the same order of magnitude as the software it’s testing. Preprocessors running over all our production code are written in Ruby. The glue that holds our pieces of architecture together is written in Ruby. We’re at the last stage, “Ruby Rules”, of the Ruby Insurgency.
And I can’t enjoy it anymore. Sad, but true. I love OCaml—I don’t think I can actually code in any other language, any more, and have the same feeling of joy that I get with ML, because it’s a very specific one, and Ruby and other dynamic languages can’t deliver it. I don’t think Ruby will be replaced anytime soon as my desktop calculator and ultra-fast script for batch processing and other text analysis, but this basically reduces it to a glorified Perl. Which, I suppose, it is.
I’ve been known to espouse the various joys and delights of Ruby’s object model, but the reality of the matter is that I rarely use it in the use-cases for Ruby that I have left—when I do, it tends to be only because the script is explicitly for one-time or insular use, and thus there’s no harm in bolting a dozen methods onto String or Class or so on. The supposedly great things that I tout the language for are the ones I use least in larger-scale projects.
Of course, this is somewhat a lie—I do use them in larger-scale projects, such as swiftest—but it’s exactly my point that it’s a work thing. While I’ll admit that I don’t have the ML nous to write anything of the complexity or size of swiftest in ML, I’m working my way very readily there in my own spare time (latest achievement: fairly complete (though minimal) lisp interpreter!), and it was only because I had the requisite knowledge on Ruby that I was able to contribute swiftest in Ruby.
That may sound obvious, but the revelation behind it is this: Ruby didn’t let me build swiftest—I did. Anyone could build a similar system in their language of choice, and while naturally the instrumentation of the resulting system and how it would look and feel would sure differ, their end result would be the same, and I daresay a lot of the architecture in any one system would map more-or-less directly onto the architecture in another. I feel the need to draw a parallel with convergent evolution here, though I think one would be neither necessary nor correct, but perhaps you can feel what I’m trying to say: even with a different set of primitives and ideals or motivations at the language level, once you start building greater levels of complexity, you tend to end up with similar abstractions, even if they’re composed in different ways.
You might wonder what draws me to OCaml. It can’t escape all the points made above, right? Well, probably not. But it jives with me even more than Ruby did when I first encountered it—and that’s not to say that Ruby jives with me any ‘less’, per se, but rather I’ve realised that I can jive with a programming language a lot more than I’ve been assuming for a long time. Jive.
I’d like to elaborate on this some more, soon.