Evaluate atomic and mutex-based Take implementations.
#15 brought in a new atomic implementation of Take to avoid the starvation seen with the older mutex implementation (a major performance win). Before we cut a new release, let's dig a little deeper into the new implementation's behavior and settle on a single implementation.
The atomic implementation has two warts that make me wary. First, I don't like reaching for unsafe if it's at all avoidable. Second, the fixed padding is architecture-specific. I'm not sure yet, but it may be possible to avoid the unsafe pointer swap. I don't yet know what to do about the padding.
The older mutex implementation held its lock during the blocking sleep call. Holding the lock doesn't appear to be necessary: compute and set last & sleepFor, unlock, then sleep. If that holds, let's use the new benchmark to compare the three implementations (atomic, mutex, and mutex with the lock released before sleeping).
Nothing more planned.