Laszlo Erdodi (1980), professor of psychology at the University of Windsor in Canada, gave a lecture on Performance Validity Tests (PVTs) at the VG-dagen in November 2019. TBV took the chance to interview him about his specialty, and the use of PVTs in occupational health and insurance medicine.
Can you tell us something about your background?
I was born and raised in Transylvania, but I’m Hungarian. I left for the US in 2001, and then immigrated to Canada in 2007. I’m a clinical neuropsychologist and Associate Professor at the Department of Psychology at the University of Windsor in Canada. Within the clinical PhD program we have three sub-specialties: Child Clinical Psychology, Adult Clinical Psychology, and Clinical Neuropsychology.
Do you see patients?
Very few, these days. I’m a full-time academic, I teach, I do research and supervision and academic service. However, I do maintain a small private practice, where I see patients from time to time.
What’s the main subject of your research in the last year?
My main interest is performance validity tests (PVTs) and psychogenic interference in cognitive assessment. I never wanted to get into psychological trauma, because I felt it would be too upsetting a topic to study. The irony is that in order to understand the complex etiology of PVT failures, I had to circle back to trauma.
Another important research interest of mine is order effects. We usually have a long battery of cognitive tests: four to six, sometimes up to eight hours. Yet, we never ask the question: does it matter when a test is administered? Last year I published a paper on the topic – the title says it all: Early bird fails the PVT? [1] The short answer is yes, it does. If the PVT was administered early in the morning, at the beginning of the administration, there was a markedly higher failure rate.
Do you have any idea why?
No. I’d rather not speculate (chuckles). My best guess is there was a warm-up effect: early on, patients were tense, didn’t know what to expect, maybe worried about the outcome, so were unable to demonstrate their best ability. This is the phenomenon of psychogenic interference. However, as the testing unfolded, they became comfortable with the situation, developed rapport with the examiner, and were more relaxed. The study was based on a mixed clinical sample that was cognitively higher functioning. They were all from New England, in the north-eastern US. For some reason, the average full-scale IQ for these patients with medically confirmed neurological-psychiatric problems was 100, whereas it is typically around 85 in the rest of the US. It would be interesting to see if the results generalize to other populations.
What’s the main goal that you want to achieve?
I want to push the boundaries of our understanding of PVTs. The standard recommendation is to administer PVTs as early within the battery as possible, because of a natural learning effect: examinees get better at them (i.e., they crack the code) once they have done a few. However, the early-bird study shows that timing influences outcome, to a degree that is hard to ignore. We know there are morning people and evening people, but both types had an equal chance of being in the sample, and the effects were quite pronounced across the whole sample. So we showed a pattern of test performance that wasn’t known before.
What are the latest insights on the interpretation and predictive value of failed performance validity tests?
There’s an emerging understanding that multi-variate models are the way to go. It’s always been broadly accepted that you should administer multiple PVTs, but how do you translate them onto the same scale, how do you combine the cumulative data that they provide? That’s an ongoing area of investigation, where I feel I have contributed a little bit.
In 2009, Pearson, the publisher of the Wechsler IQ and memory scales in the US, introduced a model to compute a multi-variate base rate of failure. They split a given clinical sample into different levels of PVT failures, and then computed the percentage of the sample that failed each combination of PVTs at a given level. That revolutionized the way we combine the evidence from multiple instruments into a single, easy-to-interpret summary score. However, not all tests are created equal, not all scores contain the same amount of information: some have an alpha (prioritizing sensitivity), some have a beta (prioritizing specificity) bias. Pearson’s model doesn’t allow one to combine weak evidence with strong evidence. That’s why I developed a novel method of aggregating multiple individual PVTs into a single composite (i.e., the Erdodi Index) that takes into account both the number and extent of PVT failures.
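As a rough illustration of the two ideas above, here is a minimal sketch. It assumes made-up data and hypothetical function names; it is not Pearson's published procedure nor the actual scoring rules of the Erdodi Index. The first function tabulates what percentage of a sample fails at least k PVTs; the second shows why grading each failure by its extent can separate two examinees who fail the same number of tests.

```python
from collections import Counter

def cumulative_failure_base_rates(failure_counts, n_tests):
    """Percentage of the sample failing at least k PVTs, for k = 0..n_tests.

    failure_counts: one integer per examinee (number of PVTs failed).
    """
    n = len(failure_counts)
    tally = Counter(failure_counts)
    rates = {}
    remaining = n  # examinees with at least k failures
    for k in range(n_tests + 1):
        rates[k] = 100.0 * remaining / n
        remaining -= tally.get(k, 0)
    return rates

def graded_composite(levels):
    """Sum of per-test failure levels (hypothetical coding:
    0 = pass, larger values = more extreme failure)."""
    return sum(levels)

# Made-up sample: number of PVTs failed (out of 3) by ten examinees.
sample = [0, 0, 1, 0, 2, 3, 0, 1, 0, 0]
rates = cumulative_failure_base_rates(sample, n_tests=3)
print(rates)

# Two hypothetical examinees who both fail two of five PVTs,
# one marginally and one by a wide margin.
print(graded_composite([1, 1, 0, 0, 0]))
print(graded_composite([3, 3, 0, 0, 0]))
```

A simple failure count treats the last two examinees as identical, whereas the graded sum does not; which levels and cutoffs to use would have to come from the published validation studies.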
Finally, derivative PVTs offer a unique opportunity to examine the relation between multiple test scores. Take, for example, the two components of the processing speed index on the Wechsler IQ scale, Coding and Symbol Search. They’re scaled the same: 10 is average, standard deviation is 3. They measure essentially the same thing: visuomotor processing speed. So if those two scores are markedly discrepant, that provides indirect evidence that the lower score isn’t reliable: you can’t run a marathon in the morning and then need a wheelchair in the evening – or vice versa. Interestingly, this discrepancy score is robust to genuine neurological impairment: most patients with severe brain injury pass this derivative PVT.
Normally, a neuropsychological assessment includes two or three PVTs. You do many more.
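The discrepancy logic can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the cutoff of 5 scaled-score points is a placeholder chosen for the example, not a validated value. The only figures taken from the interview are the scaling (mean 10, standard deviation 3).

```python
SCALED_SD = 3  # standard deviation of Wechsler scaled scores, per the text

def scaled_score_discrepancy(first_score, second_score):
    """Absolute difference between two subtest scaled scores."""
    return abs(first_score - second_score)

def discrepancy_in_sd(first_score, second_score):
    """The same difference expressed in standard deviation units."""
    return scaled_score_discrepancy(first_score, second_score) / SCALED_SD

def flags_discrepancy(first_score, second_score, cutoff=5):
    """True if the two scores differ by at least `cutoff` points.

    The default cutoff is an assumption for illustration only.
    """
    return scaled_score_discrepancy(first_score, second_score) >= cutoff

# Hypothetical examinee: two processing speed subtests, scaled scores 12 and 4.
print(scaled_score_discrepancy(12, 4))
print(flags_discrepancy(12, 4))
print(flags_discrepancy(9, 8))
```

Any real cutoff would need to be calibrated against the population being assessed, for the reasons given later in the interview.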
Three simply isn’t enough. I prefer a comprehensive assessment. I once got into trouble for doing too much testing, but I showed that the outcome is ambiguous if I only do the minimum requirement. Additional testing can clear up ambiguity, and allow the assessor to make more accurate diagnostic decisions and treatment recommendations.
What’s the value of the outcome if the number of tests is too limited?
You have to be careful. An examiner who knows the tests well can manipulate the outcome through test selection. In a hypothetical situation where someone puts a gun to my head and says ‘make sure this patient comes out as valid’, I can pick two or three PVTs that are super-easy to pass, so that’ll give me ‘objective evidence’ that this patient produced a valid profile. Likewise, I could pick a combination of PVTs that would maximize the likelihood of an invalid profile.
So doing more PVTs would lower the examiner bias. Should we ask which tests will be taken if we outsource PVTs?
Yes, or you may mandate a minimum, say so many PVTs from a list, so they can’t manipulate the outcome just by picking the ones that are likely to produce a desired outcome.
Should patients be instructed before a neuropsychological assessment?
Are people typically honest, or not? If you think people are normally honest, then say nothing.
The clients that we see have been on medical leave for at least two years, and many have other problems as well. The outcome of the test can play an important role in the social benefit they will receive.
The little evidence that is out there, from Merckelbach for example, suggests that moral priming does not change outcome. [2] If you tell people they should do their best, or even if you say you can tell if they don’t, you are in fact complicating the situation. The evidence suggests that people who want to malinger will still malinger, but will be more careful, and therefore, harder to detect.
‘Priming clients does not change outcome’
After introductions and social pleasantries, the opening statement for every assessment is ‘I’m going to do some testing with you. Some of these tests will be easy, some will be difficult, please try your best on all.’ That’s the standardised introduction of the Wechsler IQ scale, which is a good default instruction. I don’t see any reason to go above and beyond that.
Sometimes we tell people that they should do their best, because then we can measure what we need to know.
That’s all right. The outcome of a performance-based test can only be manipulated downward. In other words, you can’t pretend to have better memory than you do. Glenn Larrabee, one of the giants of PVTs in the US, recommends asking patients to report accurately: ‘if you have a lot of symptoms, say so; if you have very few, say so too; don’t exaggerate or minimize your problems, try to be as honest as possible’. Stick to neutral, non-specific, non-leading instructions.
Do you choose your tests differently according to level of education of clients?
No, but pre-morbid functioning is a strong predictor of everything. For example, I had a patient who was a physician. He had noticed small slip-ups at work. Nothing major, but he wanted to be assessed. He said ‘I love my job and I don’t want to retire, but I have enough money to do so, and I don’t want to be one of those people who waited too long to retire, and are now remembered for the mistakes they made in their last years, rather than their illustrious career’. In the whole battery, there was only one score in the Low Average range, on a memory test. Nothing alarming. For anyone else I would have said ‘you’re fine’. But we know that physicians, pre-morbidly, have very good memory. So I suggested to him he should probably retire. For anyone else his age it would be a great outcome, but for him it’s probably a warning sign.
When we design and calibrate PVTs, we go out of our way to make sure that legitimate reasons for low cognitive functioning, such as low level of education, won’t negatively impact test outcome. But in reality, underlying cognitive ability is a significant predictor of outcome. That’s why we should do more research on different populations. That’s another emerging topic: population-specific cutoffs.
Last year, I published a paper about validity cutoffs of PVTs embedded within the Stroop test. [3] My cutoffs were high, even for a high-functioning population, because they were specific to non-credible responding. Now, just recently, a group of researchers published a paper that argued that my cutoffs were too high (i.e., prone to false positives). I read their paper; it was well written and well designed. I found a couple of technical issues that might explain the difference. I evaluated my own cutoffs in a new sample from Windsor, a cognitively lower-functioning population, and I found that the cutoffs that worked for the New England sample don’t work for the Windsor sample. It was a powerful reminder that cutoffs don’t always ‘travel well’, so they should not be assumed to generalize to different populations.
In your lecture, you emphasised that clients who fail a PVT might regard that as an accusation of exaggerating or malingering, and therefore, as a condemnation of their moral character. Should insurance physicians take this risk of self-judging or self-stigmatisation into account?
I don’t know about self-stigmatisation. I do believe you should be more worried about stigmatisation by the evaluator. When patients feel they’re unfairly judged, it can lead to emerging non-credible responding.
I’ll give you an example. One of my patients told me, right out of the gate, that he hated doctors. His neurologist, after taking a quick look at him, had told him ‘what you need is a haircut and a job’. I can understand why that neurologist said that, and he was technically correct, but saying it out loud not only doesn’t help, it’s outright counterproductive. In the first hour of testing, the patient was very guarded, and failed five PVTs, which is more than enough for me to stop testing and conclude that his profile is invalid. However, I continued with the assessment. Soon after the first hour, the patient became more comfortable, let his guard down, and his data were valid from that point on. So I based the report on his test data after the first hour. This is another example of psychogenic interference.
‘How the physician phrases things influences outcome’
How you phrase things does influence outcome. You are the precedent for the next evaluation. If you come out too strong, if you use accusatory language, then the patient will clam up, become defensive and suspicious, and the next evaluator will have a harder time doing the assessment. Often, these complicated patients have been tested over and over and over again, often producing inconclusive results. Naturally, it’s a frustrating experience for them.
During my fellowship, I developed a routine to conclude my reports with ‘if you have any questions about this report, please contact me’. The patients that I suspected of malingering never called me. The patients who really wanted to know what was happening to them, who produced invalid data but without realising it, called me immediately upon receiving the report, demanding answers. So I saw them again to explain more about the results. Some of those sessions were really intense. However, these one-hour feedback sessions were my only opportunity to intervene, and change the course for the better. Many patients were dealing with undiagnosed complex trauma, were spiralling out of control, physicians didn’t understand what was going on and they didn’t know how to deal with it. Functional neurological disorders fall outside the expertise/scope of practice of most medical specialties.
I had a patient, a trial lawyer, with subclinical somatic, conversion-like symptoms. In court, just before delivering her closing argument, she had a complete blackout for a few moments, during which she felt disoriented and out of place. She managed to collect herself, gave her speech and won the case, but she was terrified by this incident. Someone in her family had been recently diagnosed with Alzheimer’s, she had small children, her practice was expanding. She said: ‘I want to know what’s wrong with me, so I can plan accordingly’. She failed one PVT – the rest of her neurocognitive profile was intact. That made me understand why she was worried (i.e., inexplicable, intermittent cognitive failures), but luckily that rare moment of malfunction occurred on a test that has little to do with genuine memory problems. She had told me that she had PTSD, and I advised her to seek help for that. That’s a rare success story and a good example of therapeutic use of PVTs. Unfortunately, the patient-centred use of PVTs is still a radical notion. You don’t use PVTs to unmask a malingerer, but to better understand the patient.
As an example, I had a patient who was the victim of sexual abuse but was high functioning. She claimed a benefit because of severe cognitive failure. PVTs showed that she didn’t have cognitive problems, so her claim on those grounds was rejected, but she might get a benefit because of emotional problems at work due to the abuse.
In that case, I’d recommend therapy. This late in life she has a chance to let go of the impairment, and in the eleventh hour do something with her life – which in my view is much better than to maintain the victim status.
Psychiatrists aren’t so familiar with PVTs or SVTs.
I think that should change. Two or three decades ago, PVTs weren’t common in neuropsychology, and now they’re standard practice. Do psychiatrists have the time, the training and the interest to start doing PVTs? Probably not yet. There are some rare exceptions, of course: I met a psychiatrist in Windsor who knew PVTs and SVTs better than the average neuropsychologist.
Should insurance physicians be trained in PVTs?
Yes, I think they should. A brief, basic level of training would be a good start. As much as a General Practitioner needs to know about cognitive functioning. What’s the standard of care? If the patient, or a family member, reports cognitive concerns, or the physician notices naming problems or disorientation, a GP might do a basic screening test like the Mini-Mental State Exam or the Montreal Cognitive Assessment, and based on the score, refer to a specialist.
One brand new instrument is the Inventory of Problems, the IOP-29. [4] It’s one page of self-administered questions. The patient can answer true or false or ‘Doesn’t make sense’. It takes five minutes to complete. To score, the assessor types the patient’s responses into scoring software and receives a probability estimate of the psychiatric symptoms being credible or not.
You say we should always consider the probable meanings of PVTs.
That’s right: each PVT score adds a little more information to the overall clinical picture. Performance validity exists on a continuum – it’s only for practical reasons that we dichotomize it as Pass/Fail, valid/invalid.
We as insurance physicians have to weigh probabilities and finally decide yes or no. Ethical aspects can play an important role. What’s your idea about that?
You mean how should you handle the constraints of a dichotomous outcome? That’s a tough one. I recently argued for adding a third outcome, the ‘indeterminate range’, as a legitimate outcome to officially recognize and thus legitimize ambiguous cases. [5] Ultimately, it depends on local norms, rules and values. There’s a consensus in the western world that it’s better to let ten guilty people go free, than to lock up one innocent person. That’s why SVTs and PVTs are optimized for specificity. At the end of the day, it’s a value judgement – that ideally carefully weighs the cost, the pros and cons of each type of error – but it’s also a moral judgement.
[1] Zuccato BG, Tyson BT, Erdodi LA. (2018). Early bird fails the PVT? The effects of timing artifacts on performance validity tests. Psychological Assessment, 30(11), 1491-1498. https://doi.org/10.1037/pas0000596
[2] Merckelbach H, Collaris J. (2012). Mother Teresa doesn’t help here: lack of moral priming effects on malingered symptom reports and what we can learn from it. Psychologica Belgica, 52(2-3), 271-285. http://haraldmerckelbach.nl/artikelen_engels/2012/Mother_Teresa.pdf
[3] Erdodi LA, Sagar S, Seke K, Zuccato BG, Schwartz ES, Roth RM. (2018). The Stroop Test as a measure of performance validity in adults clinically referred for neuropsychological assessment. Psychological Assessment, 30(6). http://dx.doi.org/10.1037/pas0000525
[4] Inventory of Problems – 29, an omnibus test designed to assist practitioners assessing malingering of psychiatric or cognitive complaints.
[5] Erdodi LA. (2019). Aggregating validity indicators: The salience of domain specificity and the indeterminate range in multivariate models of performance validity assessment. Applied Neuropsychology: Adult, 26(2), 155-172. https://doi.org/10.1080/23279095.2017.1384925