Jordi Renom «We have found tests in which an unanswered question shows who is best qualified»
At the Department of Social and Quantitative Psychology of the University of Barcelona, Dr. Jordi Renom works on the methodological verification of the quality of examinations as evaluation instruments.
Does the perfect test exist?
We do not use the expression ‘perfect test’, but rather focus on the degree of validity or reliability of a test. Our aim is to evaluate the quality of assessment instruments such as questionnaires, tests, and examinations.
What do you do exactly?
We test the test. If, for example, an entity holds a staff selection process or a competitive examination at an academic level and there is some type of suspicion or problem regarding the quality of that test, we offer an audit service, which consists of a full quality-control process of the guarantees offered by the assessment instrument.
We carry out two types of audit: preventive and reactive. If someone wants to assess the quality of a test before using it, a qualitative audit can be carried out by studying the material and making sure the test works correctly. For example, in 2019 we still find many examinations in which the option “no answer is correct” appears. This option can be contested; instead, one should write “none of the other answers is correct” or “none of the previous answers is correct”. It is a mistake, a silly little thing, but many teachers do not realize it and examinees can pick up on it. Moreover, there are patterns that students easily recognize: if you are not sure which option is correct, choose the longest alternative; do not choose the first options at the top of the page or the last options at the bottom, because test authors tend to place the correct answers as far from the top and the bottom of the page as possible. These are patterns, routines that are perpetuated year after year. In order to detect this kind of thing, along with any possible linguistic problems in the questions, we conduct preventive audits.
Reactive or quantitative audits are carried out once the evaluation is over. On the basis of our analysis, and before the results of the test are released, we can give the authors feedback on whether there are any suspicious or malfunctioning questions, since these could lead to a loss of confidence on the part of examinees.
How do you perform this quantitative analysis?
We basically use the original response matrix provided by the author. From it we can run a series of analyses and simulations that make it possible to verify whether the option the author marked as correct really is correct. When we detect items that do not work psychometrically, we inform the author, which is the most delicate part of our work. Some authors admit that there has been an anomaly; others go ahead with their original test as if nothing had happened.
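By way of illustration only (the interview does not describe the team's actual procedure), a minimal sketch of standard item-analysis statistics computed from a hypothetical response matrix: item difficulty and item-rest discrimination, with a low or negative discrimination flagging the keyed answer as suspect. All names, data, and the 0.10 threshold below are assumptions.

```python
import numpy as np

def item_report(responses, key):
    """responses: examinees x items array of chosen options; key: keyed option per item."""
    scored = (responses == key).astype(float)      # 1 if the keyed option was chosen
    total = scored.sum(axis=1)                     # each examinee's total score
    report = []
    for j in range(responses.shape[1]):
        p = scored[:, j].mean()                    # difficulty: proportion marked "correct"
        rest = total - scored[:, j]                # score on the remaining items
        # item-rest correlation (discrimination); near zero or negative is suspect
        if scored[:, j].std() > 0 and rest.std() > 0:
            r = np.corrcoef(scored[:, j], rest)[0, 1]
        else:
            r = 0.0
        report.append({"item": j, "difficulty": round(p, 2),
                       "discrimination": round(r, 2), "suspect": r < 0.10})
    return report

# Toy usage with made-up data: 5 examinees, 3 items, options A-D
responses = np.array([["A", "C", "B"],
                      ["A", "C", "D"],
                      ["B", "C", "B"],
                      ["A", "D", "B"],
                      ["A", "C", "C"]])
key = np.array(["A", "C", "B"])
for row in item_report(responses, key):
    print(row)
```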
Often, when we conduct an audit, we ask about the situation: what the test is for, what impact it has on the future of the candidates … All this is important, because we do not do an aseptic analysis; we have to contextualize it. That final part, giving meaning to controversial questions, falls to the author.
In some of the courses we taught at the ICE we gave the teachers who attended the opportunity to bring their own exams. These were reviewed jointly by all of us, but there was always a moment of silence: when each teacher became aware that perhaps 20% of the questions of an exam they had written two years earlier did not work. Then they realized that the evaluation had been unfair, and this is hard to accept.
What are the most common problems with tests?
The knowledge a person may have of a given topic is not linked to that person's ability to develop the tools to evaluate that knowledge. There is more sensitivity to this issue now, but for years there was no training in this field. Teachers often reproduce the experiences they had as students, which in turn perpetuates evaluation styles. There is a kind of acquired heritage that often causes problems.
In this sense, we have sometimes found that over half of an official test had serious deficiencies that completely invalidated it. What worries us is that the accuracy the authors implicitly attribute to their tests (you need a score of 5 to pass) means the instrument itself must be that accurate. Instead, we find examinations that have an error of ±2 points out of 10. Thus, sometimes, 20% of the people who failed an examination should perhaps have passed, and perhaps 20% of those who passed should have failed. This is not because they knew more or less, but because the exam had design problems that made the limits of the pass/fail area very blurry.
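An illustrative simulation, not data from the interview: if observed scores carry a measurement error of roughly ±2 points out of 10 around a pass mark of 5, examinees near the cut-off are easily misclassified. The true-score distribution and error model below are assumptions made purely for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
true_scores = rng.uniform(2, 8, size=10_000)            # hypothetical true ability on a 0-10 scale
noise = rng.normal(0, 1.0, size=true_scores.size)       # SEM of about 1 gives roughly +/-2 at 95%
observed = true_scores + noise

should_pass = true_scores >= 5                           # what a perfectly accurate exam would decide
did_pass = observed >= 5                                 # what the noisy exam actually decides
misclassified = (should_pass != did_pass).mean()
print(f"misclassified around the 5-point cut-off: {misclassified:.1%}")
```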
In tests that penalize incorrect answers, when a question is flawed the best option is to leave it blank. That is to say, the most qualified people have seen something strange in the question that those who are less qualified have not, and since they want to avoid the penalty, they leave it blank. We find tests in which leaving the question blank is the correct answer, because it is the one that indicates who is best qualified, and this is a serious problem. Examinees notice this, but they are passive; they trust the examination and its author, and in a way what we do is inject distrust. Because, who guarantees that a test works? Nobody.
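For context, a small sketch of why leaving a dubious question blank can be rational under a typical penalty rule; the specific rule assumed here (+1 for a correct answer, −1/(k−1) for a wrong one, 0 for a blank) is a common convention, not one stated in the interview.

```python
def expected_gain(p_correct: float, k: int = 4) -> float:
    """Expected points from answering a k-option item when you believe
    you are right with probability p_correct, under the assumed penalty rule."""
    return p_correct * 1.0 + (1 - p_correct) * (-1.0 / (k - 1))

for p in (0.25, 0.50, 0.90):
    print(f"p={p:.2f}: expected gain {expected_gain(p):+.2f}")
# With 4 options, blind guessing (p=0.25) yields an expected gain of 0; if a flawed
# item makes every option look wrong, perceived p drops and the blank (gain 0) wins.
```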
Who is interested in this kind of service?
Recently we have signed agreements with private companies that want to guarantee a certain quality in their assessments. They do not have experts, but they want to evaluate the presence of certain traits in their workers (resilience, entrepreneurship, motivational profile, and so on). It’s about making sure that the evaluation lets them achieve everything they want from it.
How many places are there where they think tests are sacred, that the questionnaire is what it is, that it is unquestionable and that it works well by default? In these cases there is no awareness that tests are actually very biased. And if we talk about a test that can affect someone’s future, we must be very careful.
More about Jordi Renom
The best invention in history
The sailing boat.
The future invention you would like to see
The time machine.
The future invention you are most afraid of
The time machine too.
The FBG is…
It’s a gateway to society. It allows us to communicate and get in touch with society at an academic level. For me it’s a very important element, and I think that in this sense we’re very lucky at the UB. Looking at the experiences of colleagues from other entities and universities, we realize that we have a real opportunity here, which is not so common.