No!! Well, maybe a little…
In fact I need to receive data from a microcontroller board. It doesn’t have sufficient CPU and memory to have an http stack. So we have to use tcp. However, the person implementing it also wants to use ssl. But not http. Yeah, puzzles me too.
I couldn’t find much by way of documentation, so here’s how to do a server for that using Puma:
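A minimal sketch of the idea, using Ruby's stdlib OpenSSL rather than Puma internals (Puma wraps the same machinery): SSL over a raw TCP socket, no HTTP anywhere. The self-signed cert generated here is just a stand-in so the snippet runs anywhere – in real life you'd load your actual cert and key files.

```ruby
require 'socket'
require 'openssl'

# stand-in for a real cert/key pair loaded from disk
def throwaway_ssl_context
  key  = OpenSSL::PKey::RSA.new(2048)
  name = OpenSSL::X509::Name.parse('/CN=localhost')

  cert = OpenSSL::X509::Certificate.new
  cert.version    = 2
  cert.serial     = 1
  cert.subject    = name
  cert.issuer     = name
  cert.public_key = key.public_key
  cert.not_before = Time.now
  cert.not_after  = Time.now + 3600
  cert.sign(key, OpenSSL::Digest.new('SHA256'))

  ctx = OpenSSL::SSL::SSLContext.new
  ctx.cert = cert
  ctx.key  = key
  ctx
end

# ssl wrapped around a plain tcp socket – no http stack anywhere
def run_echo_server(port)
  tcp = TCPServer.new('127.0.0.1', port)
  ssl = OpenSSL::SSL::SSLServer.new(tcp, throwaway_ssl_context)
  Thread.new do
    client = ssl.accept            # tcp accept plus ssl handshake
    while (line = client.gets)
      client.puts "you said: #{line.chomp}"
    end
    client.close
  end
end
```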
And now (obviously after `gem install puma`, if you go the Puma route) you run the server script with ruby, and you connect to it with a TLS client, for example `openssl s_client -connect host:port`.
Provided your cert and key are correct and you have the CA root certificates in /etc/ssl/certs, that will also verify the cert. Now you say hello, press enter, and the server should tell you what you said.
And nobody else will know. Muahahaha!
As usual, please tell me if you find mistakes.
- a computer with a network interface, and preferably at least one other host on the same subnet
- Ruby >= 2.1 installed (2.0 will probably work too; 1.9 doesn't)
- pry (or irb, but that's not recommended), preferably with a working readline/editline
- some ruby experience
Dataflow concurrency is a simple but powerful idea. It rests on single-assignment variables (sometimes called immutable values) with the addition of blocking semantics, in other words any thread attempting to use an unbound variable will wait until another thread assigns a value to it.
Dataflow variables fall in the same category as I-Vars, Futures and Promises.
We'll look at threads (covering some little-known things about the Thread class). Then we'll see how to use delegators to make dataflow pleasant to use. Then we will implement a single-value, single-assignment data store with blocking semantics in Ruby (these are sometimes called I-Vars or Promises). This will require a mutex and a condition variable, which will also be explained.
Loosely, concurrency means more than one thing happening at the same time. Like in real life. Everybody runs around doing their own thing, and fights erupt whenever we have to share. I know, because I have two sons and only one big tub of lego.
My understanding of dataflow comes primarily from “Concepts Techniques and Models of Computer Programming” by Van Roy and Haridi. Very worthwhile book, I do recommend it. There’s also an edx course.
When you see EXERCISE, it would be best if you try to figure out the answer for yourself, although it’s a line or two further down anyway. Be brave! Give it a try!
Any discussion of threads in Ruby has to mention the GVL/GIL. MRI has it, jruby and rubinius do not. Effectively it means that concurrency in MRI is limited in a particular way. I’ll defer the full explanation until mutexes later on.
The first thing you need to know about threads is that every running program is/has at least one.
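Presumably something along these lines:

```ruby
Thread.current                 # the thread executing this very line
Thread.main                    # the thread the program started with
Thread.current == Thread.main  # => true at the pry prompt
```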
EXERCISE: start up pry, type that in.
A Thread instance is a wrapper around an OS resource / entity. `Thread.list` will show you all live threads.
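For example (the exact contents depend on what pry has running):

```ruby
Thread.list        # => [#<Thread:0x0000... run>, ...]
Thread.list.size   # at least 1: the main thread is always in there
```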
Thread instances outlive their related OS resource. You’ll see why later on.
So, coding time. Pretend that you’re stranded somewhere on a distant server within a heavily fortified private network, with no root access to install new packages. You have only ruby (but you can’t install gems), and a minimal set of system tools. There is another server – you have to talk to it. However, its ip address is from dhcp, and there’s no dns on this network. DHCP reassigns the ip address fairly frequently. You need to use ping to find the missing server. Most of the ip addresses on this private network are not in use. Sometimes the server is down for an unspecified period.
If you haven't met ping, in bash say `ping localhost` and see what happens (Ctrl-C to make it stop). `ping -c1 localhost` will only send and wait for one packet.
In ruby, backticks or `%x{}` will execute a system command, returning a string containing stdout from that process. You can do string interpolation inside of those. The return code from the command will be in `$?` until you execute another command.
This is how to find your ip and subnet in ruby.
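One way to do it, using the stdlib Socket class (the original presumably parsed shell output with backticks, in keeping with the previous paragraph, but this is more portable; the /24 assumption is mine):

```ruby
require 'socket'

# find a non-loopback ipv4 address, falling back to loopback
addr = Socket.ip_address_list.detect { |a| a.ipv4? && !a.ipv4_loopback? } ||
       Socket.ip_address_list.detect(&:ipv4?)

ip     = addr.ip_address          # e.g. "192.168.1.23"
subnet = ip.sub(/\.\d+\z/, '')    # e.g. "192.168.1", assuming a /24
```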
EXERCISE: Find the addresses on your subnet that respond to pings and put them in an array called `results`. There is actually a problem you're going to run into once you get the code working. If you know what the problem is already, just write the code anyway. Ctrl-C will be your friend.
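A sequential answer looks something like this (with `127.0.0` substituted for your real subnet so it runs anywhere – on Linux the whole loopback range answers):

```ruby
subnet  = '127.0.0'   # from the previous step; loopback stands in here
results = []

(1..254).each do |i|
  ip = "#{subnet}.#{i}"
  `ping -c1 #{ip} 2>/dev/null`    # -c1: send and wait for one packet
  results << ip if $?.success?
end
```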
BONUS CHALLENGE: This is purely optional – if you feel like stretching your regexes, `results` should also contain the response time. As a float.
How long does that take? About 254 * 4 = 1016 seconds, more or less 16 minutes. Which is a long time to find a handful of ip addresses. Effectively, it's doing `254.times { sleep 4 }`: each wait only starts once the previous one has finished.
So what happens when you run each of those waits in its own thread?
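Something like this, with `sleep` standing in for ping's four-second no-reply wait:

```ruby
t0 = Time.now
threads = 254.times.map { Thread.new { sleep 4 } }
threads.each(&:join)     # wait for them all – more on join shortly
elapsed = Time.now - t0  # ~4 seconds: all 254 waits overlap
```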
It takes about 4 seconds. How long does it take to boil an egg? 3 minutes. How long does it take to boil 4 eggs? Not 12 minutes.
EXERCISE: Now do the ping code with threads.
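One possible version, again against loopback. Note there is deliberately no waiting here yet – that's the point of the discussion that follows:

```ruby
subnet  = '127.0.0'   # loopback stand-in for your real subnet
results = []

(1..254).each do |i|
  Thread.new do
    ip = "#{subnet}.#{i}"
    `ping -c1 #{ip} 2>/dev/null`
    results << ip if $?.success?   # $? is thread-local, so this is safe
  end
end
# results fills up in whatever order the replies arrive
```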
You’ll notice that the ip addresses don’t come back in order. Why is that?
ANSWERS:
1) Wait time for reply vs non-reply. The ips that are claimed by a host will reply to the ping quickly and will therefore be at the beginning of results. ip addresses that are not claimed will be subject to a wait of about 4 seconds until ping decides it won’t get a reply.
2) Nondeterminism. CTM explains nondeterminism as something that happens which is outside of the control of the programmer. In this case, the thread scheduler is making decisions that the programmer (that’s you and me) has no control over. We also have no control over how long it takes for a server to answer a ping. Note that these ips may be in order anyway – this would be an example of multithreaded code being correct by accident.
Nondeterminism applies even in a really simple case, like reading a variable that another thread may, or may not, have written yet.
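Something like:

```ruby
a = 1
Thread.new { a = 2 }
a   # 1 or 2?
```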
We don’t know. It’s nondeterministic. Try it out a few times in pry. It won’t necessarily be different every time.
Nondeterminism is mostly harmless in this case because it’s easy to restore the lost information (the ordering). But there’s a case where it’s a big problem. First the deterministic program:
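Presumably plain repeated increments, which in a single thread are entirely predictable:

```ruby
a = 0
10.times { a += 1 }
a   # => 10, every single time
```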
And now the nondeterminism from threads:
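A sketch of the idea – the sleep in the middle of each read-modify-write makes the lost updates almost certain:

```ruby
a = 0
10.times do
  Thread.new { old = a; sleep 0.1; a = old + 1 }
end
sleep 1
a   # usually 1, not 10: every thread read 0 before any of them wrote
```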
This is called a race condition. Which is badly named (because it actually originates in logic circuit design). It’s more like an accident at an intersection because none of the cars was willing to wait. Well, I spose you could see that as a race…
QUESTIONS:
1) Why is `a` nil when you execute something like `a = nil; 3.times { Thread.new { a ||= 1 } }; a` in pry?
2) Why is `results` nil, or mostly empty, when you ask for it immediately after spawning the ping threads?
ANSWERS:
1) the threads testing and setting `a` have not yet executed
2) the threads shelling out to ping have not yet put values in `results`.
So the existence of values in `results` depends on when you ask for them. And a partial set of results is not really useful, right? This is one of those basic logic situations: you know when you've found something, but how do you know when you haven't found it? You have to search everything. In this case we're lucky because 'everything' is a relatively small array. So how do we ensure that we have searched everything, before concluding that the server is down?
Think about fetching your kid from school (or being fetched, when you were a kid). What happens when class finishes a few minutes early, or you arrive a few minutes late, or a few minutes early? Or your child is negotiating a play date with friends? Or there was a traffic jam on the way to school? How do we handle those situations? We wait. Hopefully we're not waiting in different places…
The traditional way is `Thread#join`, and I'm going to show you that first, even though it's a bit clunky.
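In its simplest form, something like:

```ruby
t = Thread.new { sleep 1; :done }
t.join       # the calling thread stops here until t finishes
t.alive?     # => false: by the time join returns, t has terminated
```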
In other words, we've brought determinism back, by waiting. The one thread waits for the other thread to join it. The two control flows join together back into one (the one calling `.join`).
EXERCISE: do that in the ping exercise so that by the time we ask for the results, they're guaranteed to all be there. Be careful: if you bring determinism back too early you'll be waiting 16 minutes for your results. Remember that `join` is a method that you call on the thread instance that you want to wait for.
An important question to ask at this point is: what happens when an exception is thrown in the thread block?
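For example:

```ruby
t = Thread.new { raise 'blam' }
sleep 0.1
t.status   # => nil, meaning it died with an exception – silently
```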
Nothing spectacularly continues to happen.
But with this:
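Presumably along these lines (the rescue is only here so the snippet survives being run):

```ruby
t = Thread.new { raise 'blam' }
begin
  t.join
rescue RuntimeError => e
  e.message   # => "blam": the exception crossed over into the waiting thread
end
```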
the exception shows up in the thread calling join (after the thread being waited for has raised the exception, of course).
EXERCISE: Now do a deliberate typo and get exceptions for using pong instead of ping.
ANSWER: Really?
In other words, exceptions do not come out of threads unless you wait for them. This makes sense – in which thread, and when, would the exception be raised? It would be extremely weird if at some nondeterministic time your main thread stopped because of an exception thrown by another thread. So unless you really don’t care what happens to a thread, somewhere you have to wait for it to finish. (Or you have to set abort_on_exception)
So I said I was showing you the traditional way. Well, the exception showing up when you call join is already somewhat non-traditional. The nice way of doing synchronisation uses the ruby idea that everything is an expression, even a thread. You use `Thread#value`:
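Probably the canonical example:

```ruby
Thread.new { 1 + 1 }.value   # => 2: create, run, wait and unwrap, one expression
```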
Notice we don't need a variable anymore. Why bother with `Thread#join` then? It returns the thread instance, and takes an optional timeout parameter.
EXERCISE: Go over the ping code and use `.value` to get rid of unnecessary variables.
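One possible version (loopback again as the stand-in subnet):

```ruby
subnet  = '127.0.0'
results = (1..254).map do |i|
  Thread.new do
    ip = "#{subnet}.#{i}"
    `ping -c1 #{ip} 2>/dev/null`
    ip if $?.success?          # the thread's value: the ip, or nil
  end
end.map(&:value).compact       # .value waits, so no join needed
```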
`.value` and `.join` are the simplest form of synchronisation.
Synchronisation is also badly named (maybe also from circuit design where you have a clock signal?) In programming it makes more sense to think of synchronisation as waiting for something. And you can see how this applies to asynchronous operations – no thread waits. However the logical implication of asynchronous operations is that something has to busy-wait, or do a polling loop, or have some other mechanism to get the value later.
Congratulations. You’ve written a correct program, using threads. All we’ve done so far is learn how to wait. Waiting is also called blocking, or suspending. Here’s a first description of some other ways to wait. Just read over them for now and don’t worry if you feel you don’t understand things fully:
mutex (mutual exclusion) forces other threads to wait. The Mutex class unsurprisingly has a method called synchronize.
condition variable is a way to avoid busy-waiting. It allows a thread to go to sleep and be woken up by a signal. It’s always used in conjunction with a mutex.
monitor is a re-entrant mutex that’s shared across multiple methods often in an object. eg the synchronize keyword in Java. There is a Monitor class in ruby, and it’s useful for ADT style objects, ie stacks, lists, things like that where you have multiple ways of accessing shared state, that must remain consistent, ie not have race conditions. But, it’s quite coarse-grained because it basically locks the whole object.
We all know that a program is essentially a sequence of statements. What I'd like to talk about now is that threads allow parts of a program to execute in a different order than the sequence of statements in the code. In fact we saw this already with race conditions, but I want to show you the useful case.
Think about the ping example: first you need to get your ip and subnet, and only then can you spawn 254 threads to find possible servers in that subnet. In this case you have 2 dependencies – the ip and the subnet.
What happens if you had more dependencies? Then you have a dependency tree (dependency sapling at this point, because it’s so small). Imagine you have a broken bash shell and you have to find the full path of the ping command and also find the ip and subnet. ip address and the bash shell path have no dependencies on one another, so they could execute in separate threads. In sequential code it doesn’t matter which order you do them in. So in concurrent code you might as well just do them at the same time.
From this point of view, code is a linear representation of the tree of dependencies for an expression.
EXERCISE: find the full path of the ping command in a separate thread from the ip address and the concurrent pings. `ENV['PATH']` is `:`-separated.
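A sketch (the variable names are mine; `-W1` caps the per-ping wait on Linux ping):

```ruby
require 'socket'

# no dependency between these two, so they can run concurrently
ping_path_thread = Thread.new do
  dir = ENV['PATH'].split(':').find { |d| File.executable?(File.join(d, 'ping')) }
  dir && File.join(dir, 'ping')
end

subnet_thread = Thread.new do
  addr = Socket.ip_address_list.detect { |a| a.ipv4? && !a.ipv4_loopback? } ||
         Socket.ip_address_list.detect(&:ipv4?)
  addr.ip_address.sub(/\.\d+\z/, '')
end

# the pings depend on both values, so .value synchronises right here
ping   = ping_path_thread.value || 'ping'
subnet = subnet_thread.value

results = (1..254).map do |i|
  Thread.new do
    ip = "#{subnet}.#{i}"
    `#{ping} -c1 -W1 #{ip} 2>/dev/null`
    ip if $?.success?
  end
end.map(&:value).compact
```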
BONUS EXERCISE: search the path concurrently. Is it faster than the equivalent single-threaded code? Why?
Obviously in the code, you have to have path-search before subnet, or vice-versa. But when those blocks execute, that can be concurrent. So can you see how threads decouple execution order from statement order? There's a bit of wriggle room (or maybe a lot) because of nondeterminism. But synchronisation enforces the ordering of the dependencies of an expression. In single-thread code, this ordering is already implicit (dependencies always come before the statement which needs them) and we usually don't think about it.
EXERCISE: Draw yourself a picture of the dependency tree of the ping code. Then turn it upside down a few times and contemplate the dataflow ;–)
Whenever you change your assumptions, there will be consequences. So we need to look at some of the consequences of using threads to decouple order of execution from sequence of statements. We’ve seen one of the classic problems with threading – race conditions which are a form of nondeterminism. We’re going to briefly look at the other, which is deadlock. But lets start with termination.
We've all done this involuntarily at least once: `while true; end`. Fondly known as an "endless loop".
Will this next one terminate: `t = Thread.new { loop { } }`? Execute it in pry and look at top, sorted by CPU usage (and then kill the thread with `t.kill`, or exit pry).
Will this one ever give back a value: `Thread.new { loop { } }.value`?
No – the thread never terminates, and so things waiting for the thread to end will wait forever. So it’s kinda ½ a deadlock.
The easiest way to explain full deadlock is with Queue, which is actually really useful in several other ways too – you can think of it as ruby’s sortof-equivalent of channels in Go.
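Roughly:

```ruby
q = Queue.new
q.push 1
q.push :two
first  = q.pop   # => 1
second = q.pop   # => :two
# a third q.pop would wait here until somebody pushes
```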
So that’s how a queue works. q.pop
will return a value from the queue if
there is one, otherwise it will wait for a value to be pushed. That
will hopefully be fairly intuitive for you by now.
But back to deadlock:
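A sketch, run in a child process here so the resulting fatal error doesn't take your pry session down with it (the original presumably just typed it straight into pry and let it blow up):

```ruby
# main waits for t, and t waits for a push that never comes:
# every thread is blocked, so MRI raises a fatal deadlock error
out = `ruby -e 'q = Queue.new; t = Thread.new { q.pop }; t.join' 2>&1`
puts out   # the error message mentions a deadlock
```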
Lucky for us, ruby is smart enough to detect that this is possibly a deadlock. But lucky for me, ruby also lets me ignore that warning, so I'll do it again: spawn a spare thread that wakes up every second or so (`Thread.new { loop { sleep 1 } }`), then `pop` an empty queue again. The periodic wake-ups are enough to defeat the deadlock detector. Will that ever terminate? Why?
Deadlock occurs whenever you have mutual waiting. Which generalises to a cycle, hence the dining philosophers problem. Such cycles can sometimes be hard to figure out, like when one of the waiting places is not in your code, or when the waiting places are in completely different parts of the codebase.
We’ve seen that the ping code can be non-terminating (when you don’t use -c1). Is there a way to make the ping code deadlock? Why?
You remember I was talking about dependency trees in expressions?
A useful property of a dependency tree is that a tree (by definition in graph theory) has no cycles. So tree-structured concurrency means that deadlock because of mutual waiting cannot happen. Of course it’s still possible for one of the subexpressions in that tree to call some external code which waits forever (eg leaving off -c1 from ping), or other code which deadlocks because of access to shared values. So we don’t magically get deadlock breaking or deadlock avoidance. But it does make a whole class of deadlock-creating situations disappear. And it greatly reduces contention for locks.
We’re mostly done with threads, so here’s a summary:
programs can do (or wait for in the GVL/GIL case) more than one thing at the same time. This is concurrency.
concurrency introduces nondeterminism, because threads allow the order of execution to be decoupled from the sequence of statements. Uncontrolled nondeterminism can result in race conditions.
Since we can’t rely on the sequence of statements to know when results are ready (which is what we do in single-threaded code), we need synchronisation (ie waiting) to enforce the order of dependencies.
cyclical mutual waiting is deadlock, which is also nondeterministic. It does not happen every time you run the code. Which is why it’s hard to debug.
Effectively, `Thread#value` makes a thread behave as an explicit future. It doesn't have a value now, but it will, in the future. Why an explicit future? Because you have to ask for it, by calling `.value`. Calling `.value` will wait until there is a value to return. So futures are: 1) a way of synchronising concurrent operations, and 2) a way of expressing the idea of a thing that does not have a value yet. And not only does a thread instance behave as a future, it also behaves as a single-valued future. It can only ever have one value. So it's immutable.
So I said earlier that I would cover threads first so that I could explain dataflow. Well, we've actually already seen quite a bit of dataflow. Dataflow is effectively blocking semantics for things that don't have a value yet. It's called dataflow because as those things change state from unvalued to valued, the data flows through the dependency tree. So dataflow is a weak form of state (that's straight out of CTM).
We’ve seen how threads decouple the order of execution from the sequence of statements in the code. Dataflow allows the order of execution to be changed without affecting the result of a calculation. It’s deterministic. You can think about the code as if it were sequential, but use threads wherever you want, to make it concurrent. Well, as long as there aren’t other race conditions (ie non-atomic shared writable state). Which is all over the place in your average ruby program. So you have to be careful.
But the dataflow we've used so far is quite far from that ideal, so let's bring it a bit closer. The ping code, again. That `.value` everywhere? Ewww. Say for example you have a complex calculation of some kind (in other words a large dependency tree, like an income tax calculation, or an invoice including discounts and VAT and partial payments and forex), and you'd like to retrofit it with dataflow concurrency without having to go around and add `.value` all over the place. We'd like to just pass in something that will let it continue operating as before, except concurrently. In short, how do we get rid of `.value`?
EXERCISE: Create a class called `Waiter`, which inherits from `Delegator`. You have to implement the `__getobj__` and `__setobj__` methods. `__setobj__` should raise an exception (Why?).
BONUS EXERCISE: `Waiter#initialize` can optionally take a block which gets passed to a new thread instance, or it can take an existing thread instance.
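One way to write it. The `inspect` and `pretty_print` handling is my guess at what the extra lines did – without them, pry would block just trying to display an unfinished Waiter:

```ruby
require 'delegate'

class Waiter < Delegator
  # take an existing thread, or a block to run in a new one
  def initialize(thread = nil, &blk)
    @thread = thread || Thread.new(&blk)
  end

  # every delegated method call waits here for the thread's value
  def __getobj__
    @thread.value
  end

  # single-assignment: the value only ever comes from the thread
  def __setobj__(_obj)
    raise "Waiter is immutable"
  end

  # keep pry from blocking just to display an unfinished Waiter
  def inspect
    @thread.alive? ? "#<Waiter waiting...>" : "#<Waiter #{__getobj__.inspect}>"
  end

  def pretty_print(pp)
    pp.text inspect
  end
end
```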
What happens if you comment out `inspect` and `pretty_print`?
EXERCISE: now redo the full ping example with Waiter.
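Putting it together – a minimal Waiter is repeated here so the snippet stands alone, and loopback stands in for the real subnet again:

```ruby
require 'delegate'
require 'socket'

class Waiter < Delegator   # minimal version from the previous exercise
  def initialize(&blk); @thread = Thread.new(&blk); end
  def __getobj__; @thread.value; end
  def __setobj__(_); raise "Waiter is immutable"; end
end

subnet = Waiter.new do
  addr = Socket.ip_address_list.detect { |a| a.ipv4? && a.ipv4_loopback? }
  addr.ip_address.sub(/\.\d+\z/, '')
end

pings = (1..254).map do |i|
  Waiter.new do
    ip = "#{subnet}.#{i}"          # the interpolation waits for subnet here
    `ping -c1 #{ip} 2>/dev/null`
    $?.success? ? ip : nil
  end
end

# asking each Waiter for its string waits for that ping to finish
results = pings.map(&:to_s).reject(&:empty?)
```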
QUESTION: What happens when you replace all `Waiter.new do` with `begin`? Things should work exactly the same (obviously without concurrency).
And now we have implicit (because we don't have to call `.value`) immutable futures with blocking semantics; that is, they behave just like ordinary old variables or objects, except that any method call will wait until there is a value inside the delegator. Syntactically, delegators are as close as we can get to proper dataflow variables in ruby. However, Ruby works against the idea because it lets us reassign the variable referencing the delegator at any time. So you have to be careful, or refactor to use method calls on frozen objects instead of local variables.
The other place that ruby works against dataflow is the idea of unbound or unassigned variables. What’s the difference between declaring a variable, and assigning it?
You can declare a variable without assignment/binding in (amongst others) Java, Scala, c, c++, Javascript.
And ruby? Local variables? No. Instance variables and globals? Kindof – they’re automatically initialised to nil.
So we don’t have unbound variables in ruby. But as we’ve just seen, threads behave a little bit like a thing that does not yet have a value, and we can make it behave almost exactly the same as an ordinary variable/object by putting a Waiter delegate around it.
However, the one thing you can't do with our thread + delegator approach is declare one of these in one place, and assign it in another. You need to know, at the time you create the thread and its delegator, how to calculate the value. This is not always the case. A not-so-good example is: how would you handle a subnet with 65536 ips? Or 16777216 ips (the 10.x.x.x subnet)? Too many threads. So you need to be able to declare the variables first (to keep the ordering) and assign them later.
So we need something that we can wrap in a delegator, and that wrapped thing must have blocking semantics with single-assignment.
These things, where you can actually set the value instead of it being implicit as part of the creation the way it is with Thread and Waiter, are sometimes called promises (as opposed to futures), but from what I can see there’s a lot of overlap in the terminology.
So essentially we need to implement a single-assignment value store with blocking semantics, to take the place of the Thread inside the delegator. That means:
it can be read by as many threads as necessary, as many times as we like. Can Waiter/Thread do this? Yes.
reading threads will block/wait/suspend until it has a value. Can Waiter/Thread do this? Yes.
the value can be set exactly once. Subsequent attempts should raise an exception. Waiter/Thread cannot do this.
you can set an exception to be raised when the delegator is accessed, like with `Thread#value`.
A minimal implementation of this needs a `value` and a `value=` method. But it must be completely threadsafe.
Do you remember from the different ways of waiting, what we can use here? We have to use a mutex and a condition variable. A condition variable requires a mutex, so let’s do that first.
A mutex (mutual exclusion) is a bit like a cubicle in a public toilet or an airplane. Only one person gets to use the cubicle at one time. Only one thread gets to enter (or occupy) the mutex at any one time. While there is a thread in the mutex, all other threads must wait until the occupying thread exits the mutex. A mutex is also called a lock, although sometimes lock is the name of an operation on a mutex. The GIL/GVL is a mutex around the entire C implementation of MRI, with some unlocks for waiting on IO and other things so that those don’t block threads which can perform useful work.
Mutexes can be used to prevent race conditions. This is the one from earlier:
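That is, the unsynchronised increments:

```ruby
a = 0
10.times { Thread.new { old = a; sleep 0.1; a = old + 1 } }
sleep 1
a   # usually 1: the threads trampled each other's updates
```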
EXERCISE: use a mutex to prevent the race condition.
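With the critical section wrapped in `Mutex#synchronize`:

```ruby
a = 0
m = Mutex.new
threads = 10.times.map do
  Thread.new do
    m.synchronize { old = a; sleep 0.01; a = old + 1 }
  end
end
threads.each(&:join)
a   # => 10, every time: only one thread at a time is in the cubicle
```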
So a mutex groups a sequence of statements together and makes them atomic; in other words, nobody else can use the cubicle while you're busy. It's still nondeterministic because we still don't know what value `a` will have at any given point, but at least now it's atomic. So actually, instead of a 'race condition', calling it an 'atomic failure' would be more accurate. That way, when someone asks why it's taking so long, you can say you're cleaning up the fallout from your atomic failure.
In the context of a SingleAssign store, mutex lets us ensure that the value is only set once. Now, for threads wanting the value when it’s not available yet, we need a way to put threads to sleep and wake them up again. For this we need a ConditionVariable.
Condition variable is also badly named. It’s easier to think of it as fifo list of threads (I’m deliberately not saying queue, because most implementations don’t use an actual queue, they just behave as if they do). And the condition part is actually outside the actual ConditionVariable instance. So now it’s badly named twice. It should have been called a WaitList. “Unfortunately sah, we’re full, but we’ll notify you if we have a cancellation.”
EXERCISE: I can’t come up with a good exercise here, so take a look at the documentation for ConditionVariable, and think about which methods should be used where.
This is a simple implementation that satisfies the requirements above. It's less efficient than it could be, but it's deterministic and it's atomic, and it only allows one assignment to value.
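A sketch that fits the description – `@cubicle` and `@waitlist` are the names the questions below refer to; the method bodies are my reconstruction:

```ruby
class SingleAssign
  def initialize
    @cubicle  = Mutex.new
    @waitlist = ConditionVariable.new
    @assigned = false
  end

  # wait until a value arrives, then return it (or raise it)
  def value
    @cubicle.synchronize do
      @waitlist.wait(@cubicle) until @assigned
      raise @value if @value.is_a?(Exception)
      @value
    end
  end

  # assign exactly once, then wake every thread waiting in #value
  def value=(val)
    @cubicle.synchronize do
      raise "value already assigned" if @assigned
      @value    = val
      @assigned = true
      @waitlist.broadcast
    end
  end
end
```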
It's somewhat inefficient because once the value has been set, the call to `synchronize` and the check for `is_a?(Exception)` are no longer necessary. Fortunately, because of the single-assignment assumption, it's fairly easy to optimise if necessary. Interestingly, also because of the single-assignment assumption, we can mostly ignore the 'condition' part of ConditionVariable (and some of its other complexities).
QUESTIONS:
1) Can `@waitlist` and `@cubicle` be lazy-assigned? Why?
2) How many threads could concurrently access instance variables inside `initialize`?
3) What options are there for making `#value` more efficient?
ANSWERS:
1) `@cubicle` no, because you would get a race condition assigning it. `@waitlist` – well, you could, but it probably wouldn't be worthwhile.
2) `initialize` is always single-threaded – for your code. The GC thread(s) would have access, but presumably that's threadsafe.
3) Options I can think of: different modules that implement `value` and `value=` (ie the State pattern).
EXERCISE: A Delegator for this is almost the same as Waiter, but obviously you'll need to implement `__setobj__` as well.
And now (at least in ruby) the ultimate in messing with order of execution. Obviously a plain local variable won't work – use it before its assignment has happened and you just get a NoMethodError on nil. But a delegator around the single-assignment store will happily be declared in one place, assigned in a second, and used in a third.
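A sketch of the finale. `DataflowVariable` is my name for the wrapper, and assignment goes through `__setobj__`; SingleAssign is repeated in compact form so the snippet stands alone:

```ruby
require 'delegate'

class SingleAssign   # compacted from the previous section
  def initialize
    @cubicle, @waitlist, @assigned = Mutex.new, ConditionVariable.new, false
  end

  def value
    @cubicle.synchronize do
      @waitlist.wait(@cubicle) until @assigned
      @value
    end
  end

  def value=(val)
    @cubicle.synchronize do
      raise "value already assigned" if @assigned
      @value, @assigned = val, true
      @waitlist.broadcast
    end
  end
end

class DataflowVariable < Delegator
  def initialize;      @store = SingleAssign.new; end
  def __getobj__;      @store.value;              end
  def __setobj__(obj); @store.value = obj;        end
end

# declared here, with no clue yet how it gets a value ...
answer = DataflowVariable.new

# ... assigned over there, in some other thread, later ...
Thread.new { sleep 0.5; answer.__setobj__(41) }

# ... and used right here, which simply waits
total = answer + 1   # => 42
```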
Concurrency is when several things are happening at the same time. Threads decouple the order of execution from the sequence of statements. Synchronisation is waiting for various events.
Compared to straight single-threaded code, concurrency always carries the overhead of thread creation and synchronisation, so your performance improvement from doing things at the same time needs to more than offset that.
Dataflow variables are things which can be initially unbound, and then they can be assigned only once. Threads that want their value will have to wait, until the value is assigned. Obviously this only makes sense in a threaded environment. Strictly speaking, dataflow variables are reassignable provided that the new assignment is compatible with the existing assignment. This only really makes sense in languages that use unification (Prolog etc) as opposed to languages that use assignment (as ruby does).
Dataflow is when you have a tree of dependencies (which is anyway just an expression), and the values themselves are the synchronisation points. There is no deadlock because a tree has no cycles. Similarly, there is very little contention for mutexes.
We can emulate dataflow variables in ruby using a delegator, which can contain a thread or a single-assignment store with blocking semantics.
You can in fact deserialise very nicely and get back a structure of objects instead of a nested hash. Yeah, yeah. I know. functions + property-structs are all the rage. But as the man said, whenever uptake exceeds understanding you end up with a pop culture (Yes, that was an Appeal to Authority) .
Let’s say you want to have a nicely human-readable file (why else would you want yaml? It’s slow, old, and unfashionable…) and you want to import it into ruby:
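An invented example of the kind of file we're talking about (the shape of the data is a guess; the point is how clean plain yaml looks):

```yaml
shop:
  name: Corner Cafe
  open_days: [monday, tuesday, wednesday, thursday, friday]
  half_days: [saturday]
```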
If you don’t know why you might want to do something like this, compare to the pure-ruby version:
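For comparison, a hash literal with some invented stand-in data:

```ruby
config = {
  'shop' => {
    'name'      => 'Corner Cafe',
    'open_days' => %w[monday tuesday wednesday thursday friday],
    'half_days' => %w[saturday],
  },
}
```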
The yaml version is pleasant without the " and ' and {} and % characters (unless they provide you with a sense of security and comfort…). With a nod to the fashionistas, json can't come close in readability. Even EDN and s-expressions can't be much more concise. They are, however, the way to go if you hate : and , .
But I’m digressing.
Sometimes I like to think of objects as convenience wrappers for domain data, aka key-value pairs, aka hashes.
And with that class, the extraction code looks like this:
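With an invented `Shop` wrapper standing in for the post's class, the idea looks like:

```ruby
require 'yaml'

# a thin convenience wrapper around the parsed hash
class Shop
  attr_reader :name, :open_days, :half_days

  def initialize(hash)
    @name      = hash['name']
    @open_days = hash['open_days']
    @half_days = hash['half_days']
  end
end

doc = YAML.load(<<~YAML)
  shop:
    name: Corner Cafe
    open_days: [monday, friday]
YAML

shop = Shop.new(doc['shop'])
shop.name        # => "Corner Cafe"
shop.open_days   # => ["monday", "friday"]
```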
Which is not terrible. But still. There should be a nicer way to do it. After all, the yaml spec has tags like !!str and !!float to specify types when it’s not obvious from the context.
The obvious approach is to use a tag like `!ruby/object:WeekDays`. But I can't exactly complain about the , and " and ' and {} and [] and % characters and then accept `!ruby/object:WeekDays`, can I now?
Well, there is a way using ruby and Psych and standard yaml tags. It’s not even hard. Just undocumented by Psych:
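The same kind of file with the custom tag – again invented data, with the tag on both a sequence and a scalar:

```yaml
shop:
  name: Corner Cafe
  open: !days [monday, tuesday, wednesday, thursday, friday]
  half: !days saturday
```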
The `!days` tags are the clue. Following is the Psych interfacing.
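The key call is `Psych.add_tag` (also reachable as `YAML.add_tag`), which maps a tag to a class in both directions – loading and dumping:

```ruby
require 'yaml'

class WeekDays; end   # fleshed out below

# revive !days nodes as WeekDays, and dump WeekDays as !days
YAML.add_tag '!days', WeekDays
```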
But since I wanted to handle constructing from both strings and arrays, it’s a bit more complex. It’s not hard though, really. Go ahead and read the comments. The ones in the code. Below. They’re important.
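A compact sketch of the class (the original was longer; the string format here – comma-separated day names – is my assumption). Note that on Psych 4 (Ruby 3.1+) you'll need `permitted_classes: [WeekDays]` when loading; older rubies load it directly:

```ruby
require 'yaml'

class WeekDays
  attr_reader :days

  def initialize(days = [])
    @days = days
  end

  # Psych calls this instead of initialize when reviving a !days node;
  # coder.type says whether the yaml node was a scalar, seq or map
  def init_with(coder)
    case coder.type
    when :scalar then @days = coder.scalar.split(',').map(&:strip)
    when :seq    then @days = coder.seq.map(&:to_s)
    else raise ArgumentError, "can't build WeekDays from a #{coder.type}"
    end
  end

  # and this when dumping, so the file round-trips
  def encode_with(coder)
    coder.seq = @days
  end
end

YAML.add_tag '!days', WeekDays
```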
Now when you parse that file, the pry dump shows `WeekDays` instances sitting right where the `!days` tags were – real objects, straight out of `YAML.load`.
And round-tripping works too: `YAML.dump` on those objects writes the tagged yaml back out, thanks to `encode_with`.
And as an extra bonus, if you just parse that without the relevant classes and Psych tags defined, you get back the good ole nested hash of strings ‘n’ things.
Whaddya know – optionally self-describing data.
And if you really must have a schema (implemented in ruby, naturally), check out Kwalify and Yes.
For a while now I've been irritated by `@ivars` in views. More recently I realised you can do `class ActionHandler < Module; end` and I've been wondering what one could usefully do with that.
This morning I had yet another scenario where I was adding three extra methods to a controller so that one action method could be nicer. None of the other actions would ever use those 3 methods. And as usual there were `@ants` in my `@pants`, causing the proverbial itch which, according to legend, gives rise to open source.
“Separate controller” says my Refactor Pedant who sits on my right shoulder (my left shoulder being occupied by my Cowboy Coder).
And I thought “Hmm. Well, why not just put them in a module?”
And presently the following code emerged:
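The listings boil down to a trick like the following, stripped of rails specifics. The real `handle` also wires up view accessors and default rendering; this sketch only shows the Module-subclass mechanics:

```ruby
# an ActionHandler instance is an includable module, built from a block
class ActionHandler < Module
  def initialize(action_name, &body)
    super() do
      # runs via module_eval in the new module: define the action method
      define_method(action_name, &body)
    end
  end
end

class ProjectsController   # stand-in for a rails controller
  # handle-style: define the :owner action by including a generated module
  include ActionHandler.new(:owner) { @owner ||= 'the owner' }
end

c = ProjectsController.new
c.owner   # => "the owner"
```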
So in a nutshell, you define the action method in the controller using the `handle :owner` call, which does two things (What!? A method that does TWO things!?!? EEEeeeek. Run awaaaaay…): it defines an `owner` method so that rails will have a method to call and a default view to render; and it defines accessors for the values the handler sets up. So now, in the view, you can refer to `project` and `owner` instead of `@project` and `@owner`. Yes, the accessor for `@owner` will overwrite the `owner` action method.
Eww and Aahh and uuuuh! Well, whaddya expect when you have to work around framework stinkiness!?
In short, a way to get rid of name clashes in the controller, get rid of @ivars in views, and use that wacky inheriting from Module thing. Not bad, eh?
And it would be easy to iterate through all public methods defined in the handler block and add them as helper_methods. Making them automatically available in the view.
- No eval, which is possible because it just uses method objects.
- Tries to use closures wherever possible to minimise lookups, partly driven by the functional workshop after rubyfuza 2014.
- Quite naive because it just raises for methods with parameters, and will more than likely fail with singletons.
- Just uses instance variables, so you can reset them at will.
- Definitely works in 2.1, untested in 2.0.
class Array
def nilify; empty? ? nil : self; end
def unify; (0..1) === size ? first : self; end
end
module Memo
module ClassMethods
def memoed_methods
@memoed_methods ||= {}
end
# provide a legal ivar name from a method name. instance variables
# can't have ? ! and other punctuation. Which isn't handled. Obviously.
def ivar_from( maybe_meth )
"@#{maybe_meth.to_s.tr '?!','pi'}".intern
end
# store the original method, replace it with a method
# that memoizes the result.
def memo( meth )
unbound_previous_method = instance_method meth
raise "can't memo #{meth} with arity #{unbound_previous_method.arity}" if unbound_previous_method.arity != 0
memoed_methods[meth] = unbound_previous_method
ivar = ivar_from meth
define_method meth do |*args, &blk|
if instance_variable_defined? ivar
instance_variable_get ivar
else
# bind the saved method to this instance, call the result ...
to_memo = unbound_previous_method.bind( self ).call( *args, &blk )
# ... memo it and return value
instance_variable_set ivar, to_memo
end
end
end
end
# hook in class methods on include
def self.included( other_module )
other_module.extend ClassMethods
end
# reset some or all memoized variables
# return cleared values
def clear_memos( *requested_meths )
(requested_meths.nilify || self.class.memoed_methods.keys).map do |meth|
if instance_variable_defined? ivar = self.class.ivar_from(meth)
remove_instance_variable ivar
end
end.unify
end
end
class YourFunkyChunkyCode
include Memo
# for 2.1
memo def expensive
# do various things
end
# for 2.0
def even_more_expensive
# do various more things
end
memo :even_more_expensive
end
yfcc = YourFunkyChunkyCode.new
yfcc.expensive
# the currently memo-ed values, and other stuff
ivars = yfcc.instance_variables
# and all the possible memo-ed values (some of which don't exist yet)
memo_ivars = YourFunkyChunkyCode.memoed_methods.keys.map{|meth_name| YourFunkyChunkyCode.ivar_from meth_name}
# only currently memo-ed values
ivars & memo_ivars
require 'memo.rb'
require 'faker'
describe Array do
def random_values( range )
rand(range).times.map{|i| [nil,rand,Faker::Name.name,Object.new].sample}
end
describe '#unify' do
it 'nil for 0 elements' do
[].unify.should == nil
end
it 'atom for 1 element' do
value = rand
[value].unify.should == value
end
it 'array for >=2 elements' do
random_values(2..15).unify.should be_a(Array)
end
end
describe '#nilify' do
it 'nil for 0 elements' do
[].nilify.should == nil
end
it 'array for >=1 elements' do
random_values(1..15).nilify.should be_a(Array)
end
end
end
describe Memo::ClassMethods do
let (:subject) { Object.new.extend(Memo::ClassMethods) }
describe '#ivar_from' do
it 'ivars a normal method to symbol' do
name = Faker::Name.name
subject.ivar_from(name).should == "@#{name}".to_sym
end
it 'ivars a ? method' do
name = "are_you_mad?"
subject.ivar_from(name).should == "@are_you_madp".to_sym
end
it 'ivars a ! method' do
name = "jetais_perdu!"
subject.ivar_from(name).should == "@jetais_perdui".to_sym
end
end
end
describe Memo do
describe '.memo' do
it 'raises on non-zero arity' do
class_def = -> do
Class.new do
include Memo
memo def calc( x )
end
end
end
class_def.should raise_error(/with arity 1/)
end
it 'stores previous method' do
kl = Class.new do
include Memo
memo def calc
rand
end
end
name, body = kl.memoed_methods.first
name.should == :calc
body.name.should == :calc
body.should be_a(UnboundMethod)
end
it 'calls previous method once' do
inst = Class.new do
attr_reader :call_count
include Memo
memo def calc
@call_count ||= 0
@call_count += 1
rand
end
end.new
3.times{inst.calc}
inst.call_count.should == 1
end
it 'memos value' do
inst = Class.new do
include Memo
memo def calc; rand; end
end.new
# value should be the same for all calls, even though it's rand
first = inst.calc
6.times.map{inst.calc}.uniq.tap do |uniq_values|
uniq_values.size.should == 1
uniq_values.unify.should == first
end
end
end
describe '#clear_memos' do
before :each do
@method_names = %i[calc work apply]
@inst = Class.new do
include Memo
memo def calc; rand; end
memo def work; rand; end
memo def apply; rand; end
end.new
method_names.each do |meth|
@inst.send meth
end
end
attr_reader :inst, :method_names
it 'clears all memos' do
method_names.each do |meth|
inst.instance_variable_defined?("@#{meth}").should be_true
inst.instance_variable_get("@#{meth}").should_not be_nil
end
inst.clear_memos
method_names.each do |meth|
inst.instance_variable_defined?("@#{meth}").should be_false
end
end
it 'clears some memos' do
method_names.each do |meth|
inst.instance_variable_defined?("@#{meth}").should be_true
inst.instance_variable_get("@#{meth}").should_not be_nil
end
to_clear = [:apply, :work]
inst.clear_memos( *to_clear )
to_clear.each do |meth|
inst.instance_variable_defined?("@#{meth}").should be_false
end
inst.instance_variable_defined?("@calc").should be_true
end
end
end
instance_eval), or prefixing every call with a local variable
(implemented using yield self).
Turns out there is a way to get the best of both. Which works well, almost all the time. And ends in two rather unexpected places: one is a really odd error; and the other is CoffeeScript-style function definition/call syntax. Sortof.
class DslObject
def initialize &block
evaluate &block if block_given?
end
def evaluate &block
case block.arity
when 0
instance_eval &block
when 1
yield self
else
raise "Too many args for block"
end
end
def do_something_useful rhs
puts rhs
end
end
The instance_eval
vs yield(self)
issue is well known. So this section is for you if you’re
not already clear on that.
Use yield self
and you have to prefix every call with a block variable:
class Other
def surname; 'de la Grace'; end
def yld
DslObject.new do |dsl|
dsl.do_something_useful surname
end
end
end
Other.new.yld
de la Grace
=> #<DslObject:0x9fa554c>
but prefixing every call with a local variable becomes painful in some cases,
for example all the t.
in an ActiveRecord migration.
But in order to make the prefix unnecessary you have to use instance_eval
, and
then code inside the block can’t access methods defined outside the block:
class Other
def surname; 'de la Grace'; end
def inst
DslObject.new do
do_something_useful surname
end
end
end
Other.new.inst
=> undefined local variable or method `surname' for #<DslObject:0x9ea5390> (NameError)
Which is quite a severe limitation.
Use a delegator that knows about both the binding for the block, and the dsl object, and can send method calls to the right place.
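The listing was lost from this page, so below is a sketch of what such a delegator could look like. The names __bound_self__ and Combinder#method_missing come from the error output further down; the rest of the implementation is my assumption.

```ruby
# Sketch only: a delegator that tries the dsl object first, then falls
# back to the block's original self. Not the original Combinder code.
class Combinder < BasicObject
  def initialize(dsl, bound_self)
    @dsl, @bound_self = dsl, bound_self
  end

  def __bound_self__
    @bound_self
  end

  def method_missing(meth, *args, &blk)
    if @dsl.respond_to?(meth)
      @dsl.__send__ meth, *args, &blk
    elsif @bound_self.respond_to?(meth, true)
      @bound_self.__send__ meth, *args, &blk
    else
      super
    end
  end

  def respond_to_missing?(meth, include_all = false)
    @dsl.respond_to?(meth, include_all) || @bound_self.respond_to?(meth, include_all)
  end
end

class DslObject
  def initialize(&block)
    return unless block
    # the object that `self` referred to where the block was written
    bound_self = block.binding.eval('self')
    Combinder.new(self, bound_self).instance_eval(&block)
  end

  def do_something_useful(rhs)
    rhs
  end
end
```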
Now Other.new.inst
will work.
Oops.new.inst
de la Grace
NoMethodError: undefined method `w' for #<Oops:0xa7ac54c>
	from combinder.rb:16:in `method_missing'
Waaaaat!?!
This is caused by the way the ruby interpreter distinguishes between
a method call and a local variable. In this case, the local variable takes_args
is in the binding for the block, so it’s not treated as a method call.
And because that happens in the interpreter, there’s no way to hook into
it and produce a more meaningful error message.
Aside: % is being treated as the sprintf shortcut, and w[one two three] is not syntactically correct. Unless w, one, two and three were defined. And I’ve seen another unexpected syntax error in this situation, when passing a literal symbol. Because : has other meanings in Ruby.
Of course, if you said takes_args( %w[one two three] )
it would all work fine because the (...)
marks takes_args as a method call, and there’s
no ambiguity with the local variable, so it ends up in Combinder#method_missing.
Another workaround is to define methods in Combinder
like this:
class Combinder
def __outside__
__bound_self__
end
def __inside__
# This is a bit harder than __outside__, but can be done
end
end
which would allow explicit access to the binding (__outside__
) and the dsl object (__inside__
),
and those could be used to resolve ambiguous naming.
Squeel
has my{ }
which similarly gets through the instance_eval
block boundary.
So seeing as there are at least 3 workarounds, my opinion is that the weirdness of the error message is the biggest drawback.
This part I discovered by accident. In Combinder
I had some code for accessing the local variables
in the binding. This code turned out to be unnecessary because ruby already accesses those.
But that code sparked off a
realisation that since a method call can be ‘forced’ using ()
, it would be possible
in Combinder#method_missing
to check
if there was a callable object with that name (ie respond_to?( :call ) == true
), and call it.
Resulting in something like this:
fn = ->(*args){ puts "fn gives you: #{args.inspect}" }
functionaliser do
fn(%w[coffee script style])
end
fn gives you: ["coffee", "script", "style"]
=> #<CoffeeDsl:0xdbfe7e0>
So the block inserts indirection into the resolution of names so that it’s possible to treat Procs as methods. I didn’t go any further down that rabbit hole, mainly because right now I don’t have any sensible use cases for something like that.
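The functionaliser code itself was lost from this page; a sketch of the shape it could take follows. Binding#local_variable_defined? and #local_variable_get (Ruby 2.1+) do the lookup in the block's binding; all the names here are assumptions.

```ruby
# Sketch of the functionaliser idea: method_missing looks for a callable
# local variable in the block's binding and calls it. Not the original code.
class CoffeeDsl
  def initialize(&block)
    @binding = block.binding
    instance_eval(&block)
  end

  def method_missing(meth, *args, &blk)
    # fn(...) forces a method call even though fn is a local variable,
    # so it lands here; resolve it against the binding's locals
    if @binding.local_variable_defined?(meth) &&
       (callable = @binding.local_variable_get(meth)).respond_to?(:call)
      callable.call(*args, &blk)
    else
      super
    end
  end

  def respond_to_missing?(meth, include_all = false)
    @binding.local_variable_defined?(meth) || super
  end
end

def functionaliser(&block)
  CoffeeDsl.new(&block)
end
```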
In fact, this applies to any ‘single transaction’ contemplated in the Act.
DISCLAIMER: I am neither lawyer nor estate agent.
We’re in the process of selling our house. We spoke to an estate agent, and gave him a mandate. He then asked for our FICA documentation – ID documents and proof of address (He’s selling the house that we’re currently living in, but nevermind…)
My understanding is that FICA documents are required to prevent money laundering, so I couldn’t see how it was necessary to provide them in order for the estate agent to execute our mandate. No money has changed or will change hands at this stage in the process.
It irks me that these days one has to hand out one’s ID document to all and sundry, progressively reducing the identifying value of said ID document as it becomes progressively more exposed to identity theft. So I went digging and found out the following:
The FINANCIAL INTELLIGENCE CENTRE ACT 38 OF 2001 (also on info.gov.za) Part 1, Section 21, DUTY TO IDENTIFY CLIENTS says
Identification of clients and other persons (1) An accountable institution may not establish a business relationship or conclude a single transaction with a client unless the accountable institution has taken the prescribed steps-
(a) to establish and verify the identity of the client;
[etc, relevant only to proxies] [my emphasis]
The accountable institution in question is the estate agent, and ‘business relationship’ is clearly defined by the Act:
‘business relationship’ means an arrangement between a client and an accountable institution for the purpose of concluding transactions on a regular basis;
By that definition, there is no business relationship between a seller or buyer on the one hand and an estate agent on the other, because there will not be “transactions on a regular basis”, so the ‘single transaction’ clause must then apply. Therefore you are only required to hand over your FICA documentation on concluding the transaction.
However the Estate Agency Affairs Board specimen regulations document says that Estate Agents are required to collect FICA documentation before executing a mandate. This is incorrect.
The specimen regulations seem to be based on FIC guidance PCC 10 which says
- Part 1 of Chapter 3 the FIC Act – The duty to identify clients
2.1 Part 1 of Chapter 3 of the FIC Act, and specifically section 21 of the FIC Act deals with the identification of clients and other persons. Section 21 prohibits accountable institutions from establishing business relationships or entering into single transactions with their clients unless they have established and verified the identities of their clients, or established and verified the identities of persons representing their clients. [my emphasis]
This is incorrect. Section 21 of the FIC Act does not require FICA documentation on “entering into a single transaction” (my emphasis), it requires FICA documentation on concluding a single transaction.
And in fact if you look at the rest of PCC 10, you’ll see there are repeated references to concluding a single transaction, and the fact that FICA documents are required only at that point.
As I said, I’m not a lawyer, so I’m open to correction on this. But it seems clear that there is an error in PCC 10 which has been propagated by the Estate Agency Affairs Board, unnecessarily placing a fairly onerous record-keeping burden on estate agents and annoying buyers and sellers.
And I find it astonishing that this “entering into a single transaction” error is now in documents downloadable from banks and other large financial institutions – even though they would be covered by the ‘business relationship’ clause. Somebody please tell me I’m wrong…
Rails 2.3 breaks under newer versions of RubyGems, which removed Gem.source_index:
Fortunately a fix is relatively easy. This monkey-patch works in at least one real 2.3.17 application, although it ignores vendor gems. It has not been tested extensively and may cause your rails app to turn into a DSL for generating backtraces.
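The 38-line patch itself was lost from this page. As a rough illustration of the shape such a shim can take (a reconstruction sketch, not the original monkey-patch):

```ruby
# Reconstruction sketch, NOT the original patch. Newer RubyGems removed
# Gem.source_index; Rails 2.3 mostly uses it to look gems up by name,
# which Gem::Specification can answer instead.
unless Gem.respond_to?(:source_index)
  class ShimSourceIndex
    # a subset of the old SourceIndex API
    def find_name(name, *requirements)
      Gem::Specification.find_all_by_name(name, *requirements)
    end
  end

  def Gem.source_index
    @shim_source_index ||= ShimSourceIndex.new
  end
end
```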
bundle install
is really slow for me. It
downloads the entire specs file from rubygems.org every time I run it.
Maybe that’s because I’m using rvm, and it can’t find previously
downloaded specs. Or something. I haven’t investigated.
And Gemfile.lock
is constantly causing conflicts when I switch
between branches. Which is a Right Royal PITA.
After some not insignificant frustration, I realised that I could do
something like this in Gemfile
. It is after all just ruby code:
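The Gemfile listing vanished from this page, so here is a sketch of the kind of thing a Gemfile can contain — the cache path, gem names and the sqlite check below are illustrative assumptions, not the original code:

```ruby
# Gemfile sketch (illustrative, not the original): a Gemfile is plain ruby,
# so it can branch on the environment it finds itself in.
gem_cache = '/var/cache/gems'

if File.directory?(gem_cache)
  source "file://#{gem_cache}"     # local cache: no network round-trips
else
  source 'https://rubygems.org'
end

gem 'rails'
# skip sqlite on boxes where it isn't installed and can't be installed
gem 'sqlite3' if system('which sqlite3 >/dev/null 2>&1')
```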
Asidedly, this is also handy because one of the boxes this is deployed to does not have sqlite installed. Which may sound weird, but I don’t have root access on that box. And getting things installed is a lot more painful than just writing some code.
And now all I need is a way to keep /var/cache/gems
up to date and
with the correct files and directories:
Now bundle install
talks to the local cache, and it’s fast.
/^\
to Marc and everyone else.
The written version of the talk as a PDF. Sorry, you had to be there to get chocolates ;–) Do it yerself, Rube!
The Cog on youtube. Worthwhile for the soundtrack.
And the code snippets, which I didn’t have time for:
Fetch a hash from yaml and provide dot-notation access
Allows you to trust your executable configuration files. Either by signing or encrypting them.
How to load from untainted file names
Loading untrusted code.
And some of the things you can and cannot do while running under $SAFE = 4
This is a nice way to see the distinction between class-oriented languages (eg c++,Java) and object oriented languages (eg Ruby) where any instance can have behaviour different to that of the other instances of the class it belongs to.
Also an example of the Class abstraction leaking.
Every time I do a talk, I learn something. In this case it was these global functions. String, Integer, Float, Array are implemented like this. Also Pathname. Obviously not a good thing to do unless you have objects working at that level.
The private
is there because otherwise you could do something
like "banjo".OpenStruct( key: 'value' )
which would work, but
which doesn’t make much sense.
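The snippet itself was lost from this page; the pattern described is roughly this (a sketch, not the original code):

```ruby
require 'ostruct'

# Global conversion function in the style of Kernel#Integer and friends,
# sketched from the description above.
module Kernel
  def OpenStruct(hash)
    OpenStruct.new(hash)
  end
  # private, so "banjo".OpenStruct(key: 'value') raises instead of working
  private :OpenStruct
end
```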
yaml is easy to type and easy to read. json is fiddly and fussy. So why is json in so many places where humans have to edit it?
For example, have you ever tried to put a ssh private key in a json file? (Let’s not, for now, debate the security implications of that, OK?)
Anyway, a couple of reasons that I can think of, in no particular order:
There’s always tension between user-easy and developer-hard. (Nevermind that the users in this case are developers. Just not the developers.) YAML is harder for computers to parse and generate because it’s easier for humans, with our advanced context-dependent grammar parsing abilities.
But since converting from YAML to JSON is really trivial (as @chrismcg pointed out):
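The one-liner itself was lost from this page, but the whole conversion is on the order of the following sketch (yaml_to_json is my name for it):

```ruby
require 'yaml'
require 'json'

# convert a YAML document to JSON; small enough to inline in a shell one-liner
def yaml_to_json(yaml_text)
  YAML.load(yaml_text).to_json
end

# or directly from the shell, something like:
#   ruby -ryaml -rjson -e 'puts YAML.load(ARGF.read).to_json' databag.yaml
```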
there’s really no excuse for using json in places where humans have to edit it and there’s no pressing need for blinding speed. Like in chef databag definitions in your cookbooks.
But then inspiration struck, nigh unto that which striketh a man as he standeth in the shower. Rake tasks, similar to Ruby classes, are open – you can add dependencies to them later. Oh happy day!
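The task listing was lost from this page, but the shape of the idea is roughly this sketch — the data_bags paths and the :upload task name are assumptions, not the original code:

```ruby
# Rakefile sketch: generate .json databags from .yaml sources with file
# tasks, then hook them onto a task defined elsewhere.
require 'yaml'
require 'json'
require 'rake'
include Rake::DSL # not needed inside an actual Rakefile

yaml_sources = FileList['data_bags/**/*.yaml']
json_targets = yaml_sources.ext('.json')

yaml_sources.zip(json_targets).each do |src, dest|
  file dest => src do
    File.write dest, JSON.pretty_generate(YAML.load_file(src))
  end
end

# rake tasks are open, so prerequisites can be added later to a task
# defined elsewhere (the name :upload is an assumption)
task upload: json_targets
```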
For the record, there’s also a knife plugin that does something very similar (I haven’t tried it): Knife plugin to create data bags from YAML files — Gist
And some features that I learned the hard way:
* tables are transferred in chunks of 10000 records, each chunk as a transaction. This keeps memory usage pretty much constant across the lifetime of the transfer.
* use the Sequel::Dataset#import method, which will use bulk insert statements if the underlying DBMS supports that.
* use yield for progress indicator
It’s not very fast – will transfer a 5Gb database in a couple of hours. If you have bigger data you probably also have a bigger budget for serious data transfer software ;–)
etc
So assuming /tmp has a ram fs of some kind mounted on it (tmpfs on linux, OSX has one, I’m not sure what it’s called or how to mount it):
Now tell rails where to find your db. Edit config/database.yml
to have
… and the test setup dance
Run tests fast(er). For fun say sudo iotop in another terminal to check that the
postgres processes do not write data to disks. You might also want to create
log/test.log
as a link to a file on the ramfs:
When you’re finished, say
Now if you have
you can do things like this in the controller, using Sequel’s querying capabilities…
…and naturally this in the view
Drawbacks:
You’ll end up having 2 db connections – one for ActiveRecord and one for Sequel. Which is fine unless you have a transaction and one connection does a write which you’re expecting to show up in the other connection. Like in unit tests. So turn transactional fixtures off for those.
Calling ActiveRecord::Base.instantiate skips callbacks. I’m sure there’s an easy way to hook them back in again.