Specifying Histograms
One of the main things I wanted out of Hirsute was the ability to generate data based on non-uniform histograms. For instance, most of your users might have 0-10 friends, a smaller percentage 10-50, and a small number 50-100.
But specifying that distribution was non-intuitive: you had to create an array of probabilities, the probabilities had to add up to 1, and the array had to be the same length as your list of buckets.
Pondering how I might make it easier, I realized that what I wanted to do was draw out the histogram and let the system figure it out. So that's what I did.
This is now valid:
star_rankings = <<-HIST
****
********
**
HIST
and then you can add a generator as follows:
one_of([1,2,3],star_rankings)
Histograms no longer have to add up to 1, because the system scales the values appropriately. They can also be a different length from the list of values: a histogram with more probabilities than values will raise an exception, while a histogram with fewer probabilities will generate a warning.
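To make the scaling concrete, here is a minimal sketch of how a drawn histogram could be turned into probabilities. The helper name histogram_to_probabilities is hypothetical and this is not Hirsute's actual implementation; it simply assumes each non-empty line is one bucket and that the number of stars on the line is its weight.

```ruby
# Hypothetical helper, for illustration only (not part of Hirsute).
def histogram_to_probabilities(histogram)
  # One bucket per non-empty line; the weight is the number of '*' characters.
  counts = histogram.lines.map(&:strip).reject(&:empty?).map { |line| line.count('*') }
  total = counts.sum
  raise ArgumentError, 'histogram contains no stars' if total.zero?

  # Scale the raw counts so the resulting probabilities add up to 1.
  counts.map { |count| count.to_f / total }
end

star_rankings = <<-HIST
****
********
**
HIST

probabilities = histogram_to_probabilities(star_rankings)
```

With the histogram above, the three bars have 4, 8, and 2 stars, so the weights work out to 4/14, 8/14, and 2/14.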
Ranges As Results
If your generator returns a Ruby Range object, Hirsute will return a random value (based on a uniform distribution) from within that range. That lets you easily construct a script for the friends example above:
one_of([1..10,11..50,51..100],[0.75,0.2,0.05])
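As a rough illustration of what happens behind that line, the sketch below picks a bucket according to its weight and then, if the bucket is a Range, draws a uniform value from it. The names pick_weighted and resolve are hypothetical stand-ins, not Hirsute's internals.

```ruby
# Hypothetical stand-ins for illustration; Hirsute's own code may differ.

# Choose one value, with each value's chance proportional to its weight.
def pick_weighted(values, weights, rng = Random.new)
  roll = rng.rand * weights.sum # tolerate weights that do not add up to 1
  cumulative = 0.0
  values.zip(weights).each do |value, weight|
    cumulative += weight
    return value if roll < cumulative
  end
  values.last # guard against floating-point rounding
end

# If the chosen value is a Range, return a uniformly random member of it.
def resolve(value, rng = Random.new)
  value.is_a?(Range) ? rng.rand(value) : value
end

rng = Random.new(42)
bucket = pick_weighted([1..10, 11..50, 51..100], [0.75, 0.2, 0.05], rng)
friend_count = resolve(bucket, rng)
```

Non-Range values pass through resolve unchanged, so the same path would serve plain lists such as [1,2,3].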
MySQL Batching And CSVs
The MySQL outputter now bundles up inserts for faster loading. CSV is now a supported output format.
Post-Generator Blocks Run Within Object
When you attach a block to a generator, the code in that block will run within the context of the generated object. This lets you access existing fields within the newly-minted object.
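In plain Ruby, this kind of behavior is usually built on instance_exec, which runs a block with self bound to a given object. The sketch below shows the idea under that assumption; the GeneratedUser class and the generate_with_block helper are hypothetical, and Hirsute's real objects and generator API will look different.

```ruby
# Hypothetical generated object, for illustration only.
class GeneratedUser
  attr_accessor :first_name, :last_name, :email
end

# Hypothetical helper: take a generated value and, if a block was attached,
# run that block with `self` set to the generated object.
def generate_with_block(object, value, &block)
  return value unless block

  object.instance_exec(value, &block)
end

user = GeneratedUser.new
user.first_name = 'Ada'
user.last_name = 'Lovelace'

user.email = generate_with_block(user, 'example.com') do |domain|
  # first_name and last_name resolve against the generated object here.
  "#{first_name}.#{last_name}@#{domain}".downcase
end
```

Because the block runs inside the object, it can read fields that were filled in earlier without needing a reference to the object passed in explicitly.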