Thursday, February 28, 2013

Advances In Hirsute

Since I first launched Hirsute, I've been plunking away at it, making little changes here and there. I thought I'd do a quick post about the changes, some of which I'm quite happy with.

Specifying Histograms

One of the main things I wanted out of Hirsute was the ability to generate data based on non-uniform histograms. For instance, if most of your users have 0-10 friends, some other percentage has 10-50, and a small amount has 50-100.

But specifying that distribution was non-intuitive. You had to create an array of probabilities, they had to add up to 1, and they had to be the same length as your buckets.

Pondering how I might make it easier, I realized that what I wanted to do was draw out the histogram and let the system figure it out. So that's what I did.

This is now valid:

star_rankings = <<-HIST
****
********
**
HIST

and then you can add a generator as follows:

one_of([1,2,3],star_rankings)

Histograms no longer have to add up to 1 — the system will scale values appropriately — and they can be different lengths, though a histogram with more probabilities than values will throw an exception, while a histogram that has fewer probabilities will generate a warning.

Ranges As Results

If your generator returns a Ruby Range object, Hirsute will return a random value (based on a uniform distribution) from within that range. That lets you easily construct a script for the friends example above:

one_of([1..10,11..50,51..100],[0.75,0.2,0.05])

MySQL Batching And CSVs

The MySQL outputter now bundles up inserts for faster loading. CSV is now a supported output format.

Post-Generator Blocks Run Within Object

When you attach a block to a generator, the code in that block will run within the context of the generated object. This lets you access existing fields within the newly-minted object.

No comments:

Post a Comment