What is the right sample size we need to get accurate results in user tests?

Pretend this is the project objective

Find the right price for our new product “chewing gum”

Let’s say we want to sell the product to a population of 3,000,000 people. Ideally, we would ask everyone “how much would you pay?” for maximum accuracy.

Accuracy of 100% = ask the question to the entire population

But because running a test has a dollar cost, we need to sacrifice accuracy by asking the question to fewer people. So how many people do we need?

Here is a method I use:

We start with 3 assumptions:

  1. We want to sell the product to the whole population: 3 million people.
  2. The lower the price, the more people will buy the product.
  3. There is a price limit for the product.

Then we make up some numbers. They will require further validation, but they give us an indication of what we are looking for. For example, I say:

  • 3 million people will happily pay $2.00 for a pack of chewing gum
  • only 250,000 will pay $30.00

Then I fill in all the data in between.

Interestingly, you will see that the median (around 1,500,000 people paying $15 per pack of chewing gum) is the most profitable point, as it gives the biggest revenue.
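The fill-in step above can be sketched in a few lines of code. This is only an illustration: it assumes the two made-up anchor points from the article and fills the gap with straight-line interpolation, which is an assumption, not real demand data, so the exact optimum it finds depends entirely on how the gap is filled.

```python
# Sketch: fill in a demand curve between two assumed anchor points and
# find the price that maximizes revenue. Linear interpolation is an
# assumption; validated demand data would replace it.

ANCHORS = [(2.00, 3_000_000), (30.00, 250_000)]  # (price, expected buyers)

def buyers_at(price):
    """Linearly interpolate expected buyers between the two anchors."""
    (p1, b1), (p2, b2) = ANCHORS
    return b1 + (b2 - b1) * (price - p1) / (p2 - p1)

def revenue_table(step=1.0):
    """Price, buyers, and revenue at each price point in the range."""
    rows = []
    price = ANCHORS[0][0]
    while price <= ANCHORS[1][0]:
        rows.append((price, buyers_at(price), price * buyers_at(price)))
        price += step
    return rows

best = max(revenue_table(), key=lambda row: row[2])
print(f"Best price ≈ ${best[0]:.2f}, "
      f"≈{best[1]:,.0f} buyers, revenue ≈ ${best[2]:,.0f}")
```

With this simple linear fill-in, the most profitable price lands near the middle of the range, which is the intuition behind the article's "median" observation.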

How do we validate these numbers?

There are many ways to do it. A classic one is asking people the question “how much would you pay?” and filling in the grid above, similar to an election poll: how many people would vote for this or that candidate…
But it would be very difficult and costly to ask 3,000,000 people. Therefore we need to sacrifice accuracy and ask only a small sample that will give us an estimated indication.
There are a few calculations we can do to reduce the sample size with the minimum loss in accuracy.

SurveyMonkey offers a very visual and easy-to-use calculator.

This calculator offers 2 parameters to play with:

  1. Confidence level: the probability that your sample reflects the attitudes of your population.
  2. Margin of error: how far the sample’s answer may stray from the true value for the whole population.

The higher the confidence level and the smaller the margin of error, the larger the sample size.
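The calculation behind such calculators can be sketched directly. This is the standard sample-size formula with a finite-population correction; the z-scores and the maximum-variability assumption p = 0.5 are standard statistical defaults, not figures from the article.

```python
import math

# Sketch of the standard sample-size formula (the same kind of calculation
# behind calculators like SurveyMonkey's), assuming p = 0.5 for maximum
# variability.

Z_SCORES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}  # common confidence levels

def sample_size(population, confidence=0.95, margin_of_error=0.05, p=0.5):
    z = Z_SCORES[confidence]
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2  # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)                # finite-population correction
    return math.ceil(n)

print(sample_size(3_000_000))                          # 95% confidence, ±5%
print(sample_size(3_000_000, margin_of_error=0.10))    # 95% confidence, ±10%
```

Note how little the 3,000,000-person population matters: at 95% confidence and a ±5% margin of error, a few hundred respondents are enough, and widening the margin of error shrinks the sample dramatically.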

What is the right sample size then?

Well, it depends on your budget for testing and how much accuracy or “resolution” you need in your results.
A simple analogy can be made with a pixelated image: 

  • Can you recognise the giraffes? – Perhaps yes.
  • Can you recognise the city? – Perhaps not.

This “pixelation” defines how much we can compromise on our sample size.

Let’s do an exercise:

Let’s pretend we have a budget of $70 to solve a problem that affects a population of 100 people:
1 test = $10
100 tests = $1,000

What would be the right compromise to get the best possible accuracy?

We have 2 options:

Increase the margin of error:

or decrease the confidence level:

The compromise generally used is a sample of 5 to 12 people, which looks something like this:
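Plugging the exercise’s numbers into the same statistics shows what that compromise costs. This is a sketch: the 95% confidence level (z = 1.96) and p = 0.5 are assumed defaults, not figures from the article.

```python
import math

# Sketch: what "resolution" does a small sample actually buy? We invert the
# usual margin-of-error formula, assuming 95% confidence (z = 1.96) and
# maximum variability (p = 0.5), for the exercise's population of 100.

def margin_of_error(n, population, z=1.96, p=0.5):
    fpc = math.sqrt((population - n) / (population - 1))  # finite-population correction
    return z * math.sqrt(p * (1 - p) / n) * fpc

# The $70 budget at $10 per test buys 7 tests:
print(f" 7 tests -> ±{margin_of_error(7, 100):.0%} margin of error")

# The commonly used 5-to-12 range:
for n in range(5, 13):
    print(f"{n:2d} tests -> ±{margin_of_error(n, 100):.0%}")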

This method gives a very low “resolution”/accuracy in the results, and it has to be taken as seriously as a pixelated image. User-testing results on small samples are just guidance. Reality is the ultimate fact-checking method to validate an idea, and not even 98% accuracy is a definitive answer.

If you are interested in understanding current methods of data capture and analysis using artificial intelligence, I suggest reading The Age of Surveillance Capitalism by Shoshana Zuboff.