All You Need to Know About Tests
by Dr. Wendell Williams

This article outlines some important concepts you need to know if you use tests. If you don't know these concepts, you will have no idea whether your tests work or not.

A "Good" Test

A good hiring test is not an MBTI profile, an AVA score, a DISC profile, an MMPI, or a social style. It is a predictor of job performance. Let me repeat. A good hiring test predicts specific traits, skills, and knowledge that have been formally documented to be important to job performance. I know this is a simple concept, but it is one that never gets enough air time. I often suggest that clients think of how they would answer the question, "Just what does Henry's XYZ style have to do with job performance and what proof do you have to support your opinion?" If you don't know the answer, you might as well consult the Psychic Hot Line.

Important Test Terms

Validation ... knowing whether a test score predicts anything. Validation usually involves giving employees your test, comparing their scores with performance and seeing if there is any "take it to the bank" relationship.

Performance Criteria ... performance data used in the validation study (and also the data you are trying to predict). Using "soft" performance criteria such as supervisor or team member ratings can produce flaky results. Sometimes this is all you have, but using "hard" measures like units produced, sales dollars, cold calls, customer ratings, and so forth, will provide better results. Using flaky data often yields misleading information.

Sample Size ... the number of people involved in a validation study. You usually need at least 15 people for even the most rudimentary study. Even then, you are making big decisions based on very few people. Results become more trustworthy as you increase the number of people into the hundreds.
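To see why 15 people is so thin, here is a minimal sketch of how uncertain a correlation (the number described under Correlation in this list) is at different sample sizes. It uses the Fisher z-transformation, a standard textbook approximation that assumes roughly bell-shaped data. The function name and the example correlation of .3 are illustrative choices, not figures from any real study.

```python
import math

def correlation_interval(r, n, z_crit=1.96):
    """Approximate 95% interval for a correlation r observed on n people."""
    if n < 4:
        raise ValueError("the approximation needs at least 4 people")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and +1")
    z = math.atanh(r)                        # Fisher z-transformation of r
    half_width = z_crit / math.sqrt(n - 3)   # standard error is 1 / sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

# Same observed correlation, different numbers of people in the study.
for n in (15, 50, 200, 500):
    low, high = correlation_interval(0.3, n)
    print(f"n = {n:3d}: plausible range {low:+.2f} to {high:+.2f}")
```

With 15 people, an observed correlation of .3 is compatible with a true value anywhere from roughly -.25 to +.70, which includes "no relationship at all". With a couple of hundred people the range tightens to something you can actually act on.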

Normally Distributed ... a basic requirement of your data. Most common statistical techniques assume that test scores fall along a bell curve, that job performance ratings fall along a bell curve, and that both curves are roughly the same shape. This is called "normally distributed, randomized data having equal variance". That is a real mouthful of words that roughly translates as, "use any other kind of data and you get 'squirrelly' numbers that cannot be fully trusted".

Correlation ... the strength of the relationship between test scores and job performance. Correlation varies from -1 (scores go up while performance goes down), to 0 (scores and performance are totally unrelated), to +1 (scores and performance go up together). A correlation of +1 or -1 indicates a perfect relationship (often only occurring in your dreams).
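For readers who want to see the arithmetic, the sketch below computes the usual Pearson correlation by hand. The six employees and their numbers are made up purely for illustration, and the function name is my own.

```python
import math

def pearson_r(scores, performance):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(scores) != len(performance) or len(scores) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(performance) / n
    # Cross-products and squared deviations around each mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, performance))
    sxx = sum((x - mean_x) ** 2 for x in scores)
    syy = sum((y - mean_y) ** 2 for y in performance)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list never varies")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: test scores and units produced for six employees.
test_scores = [52, 61, 70, 74, 83, 90]
units_produced = [18, 20, 25, 23, 30, 31]
r = pearson_r(test_scores, units_produced)
print(f"correlation between score and output: {r:+.2f}")
```

Real hiring data almost never lines up this neatly; the made-up numbers were chosen so the upward trend is easy to see by eye.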

This next part is heavy!

Variance ... a number that describes how much of the difference in performance from one person to the next is "explained" by the test score. It helps to think of variance as a dollop of whipped cream floating in a cup of hot chocolate. A variance of 1.0 means the dollop completely covers the surface -- that is, it explains 100% of the performance. Unstructured interview variance is about the size of a mini-marshmallow (or 4%), behavioral interviews are about the size of a nickel (or 10%), mental ability tests are about the size of a silver dollar (or 50%), and simulations are about the size of a floating ice cube (about 80%). By the way, correlation and variance are not equal: the variance explained is the correlation squared. A test score correlation of .3 means that .3 x .3 (.3 squared), or 9%, of performance is explained.
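The squaring step trips up a lot of people, so here is a small sketch that converts in both directions. The function names are my own, and the correlations in the loop are just round numbers for illustration.

```python
import math

def variance_explained(r):
    """Share of performance variance explained by a correlation r (r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and +1")
    return r * r

def correlation_needed(share):
    """Size of the correlation required to explain a given share (0 to 1)."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must lie between 0 and 1")
    return math.sqrt(share)

for r in (0.2, 0.3, 0.5, 0.7):
    print(f"r = {r:.1f} explains {variance_explained(r):.0%} of performance")

print(f"explaining 50% takes r = {correlation_needed(0.50):.2f}")
```

Notice how fast the explained share shrinks: a correlation that sounds respectable, such as .3, leaves more than 90% of performance unexplained.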

p-value ... a number indicating the trustworthiness of the correlation. Smaller is better. It is roughly the probability of seeing a correlation this strong by chance alone if test scores and performance were actually unrelated. You generally want this number to be 5% or less (i.e., a very small chance you are being fooled). Small sample sizes tend to give BIG p-values (not a good thing).
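One way to make "happened by chance" concrete is a shuffle (permutation) test: scramble the performance numbers so they cannot possibly relate to the scores, and count how often the scrambled data produce a correlation as strong as the real one. This is a sketch of that idea, not the method any particular vendor uses; the data are made up, and the function names, trial count, and seed are my own choices.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(scores, performance, trials=2000, seed=0):
    """Estimate how often shuffled data match or beat the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(scores, performance))
    shuffled = list(performance)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real link between score and performance
        if abs(pearson_r(scores, shuffled)) >= observed - 1e-12:
            hits += 1
    # Adding 1 to both counts keeps the estimate from ever being exactly zero.
    return (hits + 1) / (trials + 1)

# Hypothetical study of 6 people, then the same pattern repeated for 60 people.
scores = [50, 55, 60, 65, 70, 75]
performance = [20, 26, 19, 27, 22, 28]
small_p = permutation_p_value(scores, performance)
large_p = permutation_p_value(scores * 10, performance * 10)
print(f"6 people:  p = {small_p:.3f}")
print(f"60 people: p = {large_p:.4f}")
```

Both studies have exactly the same correlation, yet only the larger one clears the 5% bar. That is the small-sample problem in a nutshell.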

Predictive design ... a study where you give the test to every applicant, seal the envelope and put it in a drawer. After about a year you open the envelope, score the tests and compare scores to job performance. Most people don't like to wait, so predictive studies are seldom done.

Concurrent design ... a study where you give the test to every job holder (not just the good ones) and compare test scores to job performance. Most people resist giving employees tests, but current employees are often the best source of information about job performance.

Restriction of range ... Imagine testing members of the PGA for their ability to putt a golf ball. Now imagine going to a mall and giving everyone a putting test. The mall shoppers' results would be all over the place, while the PGA players would show only very small differences. This is called restriction of range. It often happens in a concurrent design, because job holders have already been screened and the weakest have usually moved on. It can make test scores look like they have little or no correlation with performance. An experienced scientist controls for restricted range.
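A quick simulation shows the effect. The sketch below invents 20,000 applicants whose scores and performance truly correlate at .6, then looks only at the top 20% of scorers, as if only they had been hired. It also applies one widely taught adjustment (often called Thorndike's Case II correction), which assumes people were selected directly on the test score. All numbers, names, and the selection rule are assumptions for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def stdev(values):
    """Sample standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

def simulate_applicants(n, true_r, seed=1):
    """Draw n (score, performance) pairs whose underlying correlation is true_r."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        score = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        pairs.append((score, true_r * score + math.sqrt(1 - true_r ** 2) * noise))
    return pairs

def correct_for_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Thorndike Case II correction; assumes direct selection on the test score."""
    u = sd_unrestricted / sd_restricted
    return r_restricted * u / math.sqrt(1 - r_restricted ** 2 + (r_restricted * u) ** 2)

applicants = simulate_applicants(20000, true_r=0.6)
applicants.sort(key=lambda pair: pair[0])
hired = applicants[-4000:]  # keep only the top 20% of test scores

all_scores = [s for s, _ in applicants]
hired_scores = [s for s, _ in hired]
r_all = pearson_r(all_scores, [p for _, p in applicants])
r_hired = pearson_r(hired_scores, [p for _, p in hired])
r_corrected = correct_for_restriction(r_hired, stdev(all_scores), stdev(hired_scores))

print(f"everyone: {r_all:.2f}  hired only: {r_hired:.2f}  corrected: {r_corrected:.2f}")
```

The test did not get worse when we looked only at the hired group; the group simply stopped varying enough for the relationship to show. Real selection is rarely this clean, so treat any correction as an estimate.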

Group Averages ... Some vendors take all the good performers, put them into a bucket (good performers are usually very tiny people) and proclaim their average test scores are a "good" target. This is bad practice because it assumes 1) that all good performers are identical, 2) that poor performers have nothing in common with good performers, and 3) that "good performance" ratings are trustworthy. Test validation must work two ways -- predict high performance AND low performance.
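Here is a tiny sketch of why a "good performer average" proves nothing unless you also look at the poor performers. Both groups of scores are invented, and the function name is my own.

```python
from statistics import mean

def share_at_or_above(scores, target):
    """Fraction of a group scoring at or above the target."""
    return sum(score >= target for score in scores) / len(scores)

# Hypothetical test scores for two groups of current employees.
good_performers = [68, 75, 71, 80, 66, 72]
poor_performers = [70, 64, 77, 69, 73, 67]

# The vendor's "target" is simply the good performers' average score.
target = mean(good_performers)

print(f"target score: {target}")
print(f"good performers meeting it: {share_at_or_above(good_performers, target):.0%}")
print(f"poor performers meeting it: {share_at_or_above(poor_performers, target):.0%}")
```

In this made-up example the target screens out half of the good performers while letting through a third of the poor ones. A target only means something if scores separate the two groups.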

Anecdotal Evidence ... another word for junk science. It means that no one ever did a decent study, but "everyone" knows the test "works".

Well, that's a start. Tests are not for the faint of heart, and there is a lot more where that came from. Unfortunately, knowing some basic stats is the only way you will ever know if your test scores predict anything important -- you know, like job performance.

Dr. Wendell Williams is Managing Director of, LLC. Contact Dr. Williams by telephone: 770.792.6857 and visit .

Copyright 2002 by Dr. Wendell Williams. All rights reserved.
