There's enough for everyone

गते गते पारगते पारसंगते बोधि स्वाहा (Gone, gone, gone beyond, gone completely beyond, awakening, hail!)

Nicer YAML Deserialisation

Ruby’s current yaml support is good, but under-documented. Syck (the YAML engine in 1.8.7 and 1.9.2, IIRC) had some easily understood and well-documented features for de/serialising. But Psych … well, it works, obviously. But I’ve bashed my head on it a couple of times trying to do stuff that took 5 minutes with Syck.

You can in fact deserialise very nicely and get back a structure of objects instead of a nested hash. Yeah, yeah. I know. Functions + property-structs are all the rage. But as the man said, whenever uptake exceeds understanding you end up with a pop culture (yes, that was an Appeal to Authority).

Let’s say you want to have a nicely human-readable file (why else would you want yaml? It’s slow, old, and unfashionable…) and you want to import it into ruby:

calendar.yml
James:
  Tue, Thu: [13h30, 15h30]
  Fri: [13h00, 17h00]

Fred:
  [Tue, Thu]: [14h00, 15h30]
  Fri: [13h00, 17h00]

If you don’t know why you might want to do something like this, compare to the pure-ruby version:

calendar.rb
calendar_hash = {
  'James' => {
    'Tue, Thu' => %w[13h30 15h30],
    'Fri' => %w[13h00 17h00],
  },

  'Fred' => {
    %w[Tue Thu] => %w[14h00 15h30],
    'Fri' => %w[13h00 17h00],
  },
}
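To check that the two really are the same data, here's a quick sketch using only the standard yaml library: parse the YAML and compare it with calendar_hash. Note that the flow-sequence key [Tue, Thu] comes back as a Ruby Array key, just like %w[Tue Thu].

```ruby
require 'yaml'

calendar_yml = <<~YAML
  James:
    Tue, Thu: [13h30, 15h30]
    Fri: [13h00, 17h00]

  Fred:
    [Tue, Thu]: [14h00, 15h30]
    Fri: [13h00, 17h00]
YAML

calendar_hash = {
  'James' => {
    'Tue, Thu' => %w[13h30 15h30],
    'Fri' => %w[13h00 17h00],
  },
  'Fred' => {
    %w[Tue Thu] => %w[14h00 15h30],
    'Fri' => %w[13h00 17h00],
  },
}

# "13h30" and friends stay Strings; "Tue, Thu" is one String key
parsed = YAML.load(calendar_yml)
puts parsed == calendar_hash
```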

The yaml version is pleasantly free of " and ' and {} and % characters (unless they provide you with a sense of security and comfort…). With a nod to the fashionistas, json can’t come close in readability. Even EDN and s-expressions can’t be much more concise. They are, however, the way to go if you hate : and , characters.

calendar.edn
{ "James" {
    "Tue, Thu" ["13h30" "15h30"]
    "Fri" ["13h00" "17h00"]}

  "Fred" {
    ["Tue" "Thu"] ["14h00" "15h30"]
    "Fri" ["13h00" "17h00"]}}

But I’m digressing.

Sometimes I like to think of objects as convenience wrappers for domain data, aka key-value pairs, aka hashes.

week_days.rb
class WeekDays
  def initialize( *days )
    self.days = *days
  end

  attr_reader :days

  # Handle
  #  days = 'Mon'
  #  days = %w[Mon Tue]
  #  days = 'Mon,Tue'
  #
  # protected so it's effectively a value class.
  protected def days=( str_or_ary )
    # Array() wraps a bare String, so 'Mon,Tue' and ['Mon,Tue'] behave the same
    ary = Array( str_or_ary )
    @days =
      if ary.size == 1
        ary.first.split( /,\s*/ )
      else
        ary
      end
  end
end
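The comment calls WeekDays effectively a value class, and its instances end up as hash keys. Plain Ruby objects compare by identity, though, so two WeekDays built from the same days are two different keys. Here's a minimal sketch of one way round that, assuming equality should depend on the days array alone (my assumption, not something the post states); it adds ==, eql? and hash:

```ruby
# Hypothetical extension, not part of the post: WeekDays with value
# equality, so equal day lists act as the same Hash key.
class WeekDays
  def initialize(*days)
    self.days = days
  end

  attr_reader :days

  def ==(other)
    other.is_a?(WeekDays) && days == other.days
  end
  alias eql? ==

  # objects that are eql? must share a hash code for Hash lookups to work
  def hash
    [WeekDays, days].hash
  end

  protected def days=(str_or_ary)
    ary = Array(str_or_ary)
    @days = ary.size == 1 ? ary.first.split(/,\s*/) : ary
  end
end

schedule = { WeekDays.new('Tue, Thu') => %w[13h30 15h30] }
# a different but equal WeekDays finds the same entry
p schedule[WeekDays.new('Tue', 'Thu')]
```

Comparing the raw days array means 'Tue, Thu' and 'Thu, Tue' are still different keys; sorting in the setter would fix that, if it matters.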

And with that class, the extraction code looks like this:

extraction
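calendar = calendar_hash.map do |name,day_hash|
  day_times = day_hash.map do |days,times|
    # days is 'Tue, Thu' (splats to one String arg) or %w[Tue Thu] (two args)
    [WeekDays.new(*days), times]
  end
  # day_times is an array of [key, value] pairs, so no splat for Hash[]
  [name, Hash[day_times]]
end
calendar = Hash[calendar]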

Which is not terrible. But still. There should be a nicer way to do it. After all, the yaml spec has tags like !!str and !!float to specify types when it’s not obvious from the context.
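For a concrete feel for what those core tags do, here's a short sketch using only the standard yaml library: untagged scalars get their type guessed from the text, and !!str or !!float overrides the guess.

```ruby
require 'yaml'

# untagged: Psych guesses the type from the text
p YAML.load('123')         # => 123
p YAML.load('true')        # => true

# tagged: the core tags say what the type should be
p YAML.load('!!str 123')   # => "123"
p YAML.load('!!str true')  # => "true"
p YAML.load('!!float 1')   # => 1.0
```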

The obvious approach is to use a tag like !ruby/object:WeekDays. But I can’t exactly complain about the , and " and ' and {} and [] and % characters and then accept !ruby/object:WeekDays, can I now?
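For comparison, this is roughly what the !ruby/object route looks like. The sketch uses a stripped-down stand-in for WeekDays with no Psych hooks at all, so Psych dumps its instance variables under a !ruby/object:WeekDays tag. Loading that back needs unsafe loading on Psych 4 and later (Ruby 3.1+), where YAML.load refuses arbitrary classes.

```ruby
require 'yaml'

# Stand-in class: no yaml_tag, no encode_with / init_with
class WeekDays
  attr_reader :days

  def initialize(*days)
    @days = days
  end
end

yaml = YAML.dump(WeekDays.new('Tue', 'Thu'))
puts yaml

# Psych 4+ only loads arbitrary classes through unsafe_load;
# older Psych has no unsafe_load and YAML.load does the job.
loaded = YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(yaml) : YAML.load(yaml)
p loaded.days
```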

Well, there is a way using ruby and Psych and standard yaml tags. It’s not even hard. Just undocumented by Psych:

calendar.yml
James:
  !days Tue, Thu: [13h30, 15h30]
  !days Fri: [13h00, 17h00]

Fred:
  !days [Tue, Thu]: [14h00, 15h30]
  !days Fri: [13h00, 17h00]

The !days tags are the clue. Here’s the Psych side of things:

class WeekDays
  yaml_tag '!days'
end
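A quick way to see what that one line does, and why it isn't enough on its own: this sketch uses a minimal stand-in WeekDays that only calls yaml_tag. The registrations show up in Psych.load_tags and Psych.dump_tags, and loading a !days scalar gives a WeekDays that Psych allocated but never filled in.

```ruby
require 'yaml'

class WeekDays
  attr_reader :days
  yaml_tag '!days'
end

# yaml_tag registers the tag in both directions
p Psych.load_tags['!days']   # class name Psych looks up when loading
p Psych.dump_tags[WeekDays]  # tag Psych writes when dumping

# No init_with yet, so nothing ever sets @days
yml = '!days Tue, Thu'
wd = YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(yml) : YAML.load(yml)
p wd.class
p wd.days
```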

But since I wanted to handle constructing from both strings and arrays, it’s a bit more complex. It’s not hard though, really. Go ahead and read the comments. The ones in the code. Below. They’re important.

class WeekDays
  # yes, you MUST have the leading ! otherwise Psych wraps your tags in <>,
  # so they no longer match the tags it recognizes, and nothing works properly.
  # These end up in Psych.load_tags, which is the first place to check
  # if your classes are not deserialising.
  yaml_tag '!ds'

  # You can actually have several tags here. Psych will use the last one
  # it finds as the one for serialising. They're in Psych.dump_tags
  yaml_tag '!days'

  # For encode_with and init_with, coder will be a Psych::Coder instance.
  #
  # Psych::Coder#methods:
  #  []  []=  add  implicit  implicit=  map  map=
  #  object  object=
  #  represent_map  represent_object  represent_scalar represent_seq
  #  scalar  scalar=  seq  seq=  style  style=  tag  tag=
  #  type

  # serialise to yaml
  def encode_with( coder )
    # This doesn't actually have an effect. Apparently it should.
    # I guess that's a bug.
    coder.style = Psych::Nodes::Mapping::FLOW

    if days && days.size == 1
      # don't set tag explicitly, let Psych figure it out from yaml_tag
      # coder.represent_scalar '!days', days.first
      coder.scalar = days.first
    else
      # don't set tag explicitly, let Psych figure it out from yaml_tag
      # coder.represent_seq '!days', days.to_a
      coder.seq = days.to_a
    end
  end

  # deserialise from yaml
  def init_with( coder )
    case coder.type
    when :scalar
      self.days = [coder.scalar]

    when :seq
      self.days = coder.seq

    else
      raise "Dunno how to handle #{coder.type} for #{coder.inspect}"
    end
  end
end

Now when you say

extraction revisited
YAML.load_file 'calendar.yml'
# Psych 4+ (Ruby 3.1+): YAML.load_file 'calendar.yml', permitted_classes: [WeekDays]

you get the following pry dump (‘#’ removed cos they break the syntax highlighting)

deserialised calendar.yml
{"James"=>
  {<WeekDays @days=["Tue", "Thu"]> => ["13h30", "15h30"],
   <WeekDays @days=["Fri"]> => ["13h00", "17h00"]},
 "Fred"=>
  {<WeekDays @days=["Tue", "Thu"]> => ["14h00", "15h30"],
   <WeekDays @days=["Fri"]> => ["13h00", "17h00"]}}
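The same load can be tried without a file. This is a self-contained sketch with a condensed stand-in for WeekDays (same !days tag, same init_with / encode_with split between scalars and sequences, but my own shortened version). It uses YAML.safe_load with permitted_classes:, which assumes Psych 3.1 or later; on Psych 4 a bare YAML.load would reject the class.

```ruby
require 'yaml'

# Condensed stand-in for the post's WeekDays
class WeekDays
  yaml_tag '!days'
  attr_reader :days

  # deserialise: 'Tue, Thu' arrives as a scalar, [Tue, Thu] as a sequence
  def init_with(coder)
    @days =
      case coder.type
      when :scalar then coder.scalar.split(/,\s*/)
      when :seq    then coder.seq
      else raise "Dunno how to handle #{coder.type} for #{coder.inspect}"
      end
  end

  # serialise: Psych picks the !days tag up from yaml_tag
  def encode_with(coder)
    if days.size == 1
      coder.scalar = days.first
    else
      coder.seq = days
    end
  end
end

calendar_yml = <<~YAML
  James:
    !days Tue, Thu: [13h30, 15h30]
    !days Fri: [13h00, 17h00]

  Fred:
    !days [Tue, Thu]: [14h00, 15h30]
    !days Fri: [13h00, 17h00]
YAML

calendar = YAML.safe_load(calendar_yml, permitted_classes: [WeekDays])

calendar.each do |name, slots|
  slots.each do |week_days, times|
    puts "#{name}: #{week_days.days.join('/')} #{times.join('-')}"
  end
end

# dumping goes back through encode_with and the !days tag
fri = calendar['Fred'].keys.last
puts YAML.dump(fri)
```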

So how about this then:

calendar.yml
James:
  !days Tue, Thu: !at [13h30, 15h30]
  !days Fri: !at [13h00, 17h00]

Fred:
  !days [Tue, Thu]: !at [14h00, 15h30]
  !days Fri: !at [13h00, 17h00, preferred]

And as an extra bonus, if you just parse that without the relevant classes and Psych tags defined, you get back the good ole nested hash of strings ‘n’ things.
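That bonus is easy to check. The sketch below parses the tagged file with nothing registered (no WeekDays, no class for !at, no yaml_tag calls). As far as I know, current Psych releases simply ignore unknown local tags like these and hand back plain strings and arrays, but I haven't checked every version.

```ruby
require 'yaml'

# Psych knows nothing about !days or !at here
tagged_yml = <<~YAML
  James:
    !days Tue, Thu: !at [13h30, 15h30]
    !days Fri: !at [13h00, 17h00]

  Fred:
    !days [Tue, Thu]: !at [14h00, 15h30]
    !days Fri: !at [13h00, 17h00, preferred]
YAML

plain = YAML.load(tagged_yml)
p plain
```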

Whaddya know – optionally self-describing data.
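The post stops short of defining whatever class sits behind !at. Purely as a hypothetical sketch (the class name At, its from/to/flags readers and the preferred? helper are all my inventions), it could follow the same init_with pattern as WeekDays, taking a sequence of start time, end time and optional flags:

```ruby
require 'yaml'

# Hypothetical class for the !at tag; names are illustrative only
class At
  yaml_tag '!at'
  attr_reader :from, :to, :flags

  # !at [13h00, 17h00, preferred] arrives as a :seq coder;
  # anything after the first two elements is kept as flags
  def init_with(coder)
    raise "Dunno how to handle #{coder.type} for !at" unless coder.type == :seq
    @from, @to, *rest = coder.seq
    @flags = rest
  end

  def preferred?
    flags.include?('preferred')
  end
end

slots = YAML.safe_load(<<~YAML, permitted_classes: [At])
  Fri: !at [13h00, 17h00, preferred]
  Tue: !at [14h00, 15h30]
YAML

slots.each do |day, at|
  puts "#{day}: #{at.from} to #{at.to}#{' (preferred)' if at.preferred?}"
end
```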

And if you really must have a schema (implemented in ruby, naturally), check out Kwalify and Yes.
