Mechanical Turk

I may be a bit slow on the uptake, but I’ve just discovered Mechanical Turk. It’s a service offered by Amazon, described as “artificial artificial intelligence”. Here’s the blurb off their site:

Get results from Mechanical Turk Workers. Ask workers to complete HITs – Human Intelligence Tasks – and get results using Mechanical Turk…

As a Mechanical Turk Requester you:

  • Have access to a global, on-demand, 24 x 7 workforce
  • Get thousands of HITs completed in minutes
  • Pay only when you’re satisfied with the results

My interpretation is that Amazon have a huge pool of people at their disposal, each of whom is paid a small amount for every task assigned to them.

Could this be used for certain kinds of online usability testing? Has anybody tried using the service for that? I wonder how precisely you can specify which workers are assigned to your HITs?

Sample size and size of effect

I read with interest the latest newsletter from London-based UX agency Foviance, including the article The more the merrier? by Mariana da Silva. (Update: the full article is now on their website; it was not there when I first wrote this.)

Overall this was a fairly well-written discussion of sample sizes in user research, in layman’s terms. But one statement, the last sentence of the passage below, confused me somewhat:

With surveys, sample size estimation is also somewhat less straightforward than with standard usability evaluations. Here, the information being collected is attitudinal data, which by its sheer nature can be slightly fuzzy. It all comes down to the size of the effect you intend to detect. Imagine you wanted to know whether people in London are taller than people in New York. If people in London and people in New York are actually pretty much the same height, you will need to measure a high number of citizens of both cities. If, on the other hand, people in London were particularly tall and people in New York were shorter than average, this will be obvious after measuring just a handful of people.

Now, I’m no statistics whizz, but that last bit doesn’t make sense to me. Wouldn’t this only be true if you knew ahead of time that Londoners were “particularly tall”? Otherwise the handful you measured might just be anomalous.
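To make the question concrete, here is a rough sketch of the textbook sample size formula for comparing two group means, using a normal approximation. None of this comes from the article: the function name, the 7 cm standard deviation for adult height, the 5% significance level and the 80% power are all assumptions I picked for illustration. As I understand it, the significance level is the part that deals with my worry about an anomalous handful: it caps how often pure chance would produce a difference that large. You would still have to guess the size of the effect up front to plan how many people to measure.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate people needed per city to detect a mean difference `delta`.

    Two-sided comparison of two means with a common standard deviation
    `sigma`, using the normal approximation:
        n = 2 * ((z_(1 - alpha/2) + z_power) * sigma / delta) ** 2
    All defaults are illustrative assumptions, not values from the article.
    """
    if delta <= 0 or sigma <= 0:
        raise ValueError("delta and sigma must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # guards against false alarms
    z_power = NormalDist().inv_cdf(power)          # guards against missed effects
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


if __name__ == "__main__":
    SIGMA_CM = 7.0  # assumed spread of adult heights, in centimetres
    for delta_cm in (10.0, 2.0, 0.5):
        n = sample_size_per_group(delta_cm, SIGMA_CM)
        print(f"true difference {delta_cm} cm: about {n} people per city")
```

Under these assumptions a 10 cm gap needs only about 8 people per city, while a 0.5 cm gap needs around 3,000 per city. That seems to be the point the article was making. The normal approximation is optimistic for very small groups, so treat the small numbers as a lower bound rather than a recommendation.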

As I said, I may be missing the point, and this may in fact be an excellent illustration of a fundamental error in my thinking with regard to sample sizes. Feel free to share your thoughts.