Throughout this website I emphasise that all decisions regarding the management of driver behaviour should be based on the science of driver behaviour. Solid, practical science, that is. As I see it, any research in the field of human factors that doesn’t generate applications that can improve human performance is just “academic.”
There’s just one problem… While some excellent studies have made valuable contributions to our knowledge of driver behaviour, there’s also a lot of bad science out there. To put it bluntly, most driver behaviour research sucks.
For decades I had an “informal” interest in driver behaviour research, and I have to say that I was unimpressed with most of it. Then, when I entered academia to study for my master’s degree and started ploughing through hundreds of research papers, I found that the situation was actually worse than I thought.
Coming from a background as a practitioner (i.e. someone working in the field and trying to make a difference) I was struck by how many studies employ slapdash scientific method and/or are conducted in a manner that is so far removed from the real world that they are of little or no practical use.
Dirty little secrets
Every field has its dirty little secrets. Practices that everyone knows about and many engage in, but few openly discuss. It’s considered bad form to criticise these practices; it’s not “politically correct” (how I hate that phrase). In academia it goes something like this…
Researchers are expected to conduct and publish research (“publish or perish”). The problem is that most have minuscule budgets to work with. That means they can’t do the research they really want to do, which, in the field of driver behaviour, would be finding out how human beings really function when they’re really driving.
So we find that studies tend to be limited in scope. With few exceptions, studies are conducted on small and rather homogeneous samples of experimental subjects. Often we read in research papers something like, “The study was conducted with a sample of 22 psychology undergraduates, 17 female and 5 male, aged between 18 and 22.”
Such a sample is hardly representative of the driving population as a whole—they just happened to be those people who were conveniently available and amenable to taking part (do this and you might get in the professor’s good books)—therefore it is unreasonable to expect that the study’s findings would be representative of, or applicable to, the larger population. (This is such a common problem that some cynics argue that the field of psychology should, based on the subjects in typical experiments, more accurately be called “Psychology of undergraduates at major research universities.”)
The other common failing is the absence of proper control groups. In much social science research, you want to find out what effect a particular intervention has had, just as you would in medical research. And to do that you need a control group.
So let’s take the example of driver training. It’s not enough to measure the before and after performance of a bunch of people who had driver training. There may have been other influences to which they were exposed that could have brought about, or contributed to, the change in behaviour.
The correct way to do that sort of before-and-after study is to start with a (preferably very large) population of people and then randomly allocate them to two equal-sized groups. Both groups are exposed to exactly the same conditions with just one exception: one group (the experimental group) gets the intervention, e.g. training, and the other (the control group) does not.
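As a rough illustration of the random allocation described above, here is a minimal Python sketch. The function name `randomly_allocate`, the driver IDs and the seed are all made up for the example; a real trial would follow a pre-registered allocation procedure.

```python
import random


def randomly_allocate(participants, seed=None):
    """Shuffle participants and split them into two equal-sized groups."""
    rng = random.Random(seed)          # seeded only so the example is repeatable
    shuffled = list(participants)      # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # With an odd number of participants, the last one is left out
    # so that both groups stay the same size.
    experimental = shuffled[:half]
    control = shuffled[half:2 * half]
    return experimental, control


# Hypothetical participant IDs, purely for illustration.
drivers = [f"driver_{i:03d}" for i in range(1, 201)]
experimental, control = randomly_allocate(drivers, seed=42)
print(len(experimental), len(control))
```

Because chance alone decides who gets the training, any other influences should, on average, affect both groups equally.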
A handful of meta-analyses of the effectiveness of driver training have been done. (A meta-analysis is an overall analysis of the results from a whole load of studies. While one study may find one effect and another study the complete opposite, if you plot the results of many similar studies you can see the overall trend.) Only “proper” scientific studies can be included in a meta-analysis; many driver training studies have to be discarded because they lack control groups—a basic element of the scientific method.
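To give a feel for how results from several studies can be combined, here is a minimal sketch of one standard pooling technique, fixed-effect inverse-variance weighting. I am not claiming the driver training meta-analyses used this exact method, and the effect sizes and standard errors below are invented purely for illustration.

```python
def pooled_effect(effects, standard_errors):
    """Fixed-effect pooled estimate using inverse-variance weights.

    More precise studies (smaller standard error) get more weight.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    weighted_sum = sum(w * e for w, e in zip(weights, effects))
    return weighted_sum / sum(weights)


# Hypothetical study results: some positive, one negative.
study_effects = [0.30, -0.20, 0.05, 0.10]
study_standard_errors = [0.20, 0.25, 0.10, 0.15]

overall = pooled_effect(study_effects, study_standard_errors)
print(f"Pooled effect: {overall:.3f}")
```

Individual studies may point in opposite directions, but the pooled value shows where the weight of the evidence lies.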
The above limitations are common in a lot of academic research. But this article is specifically about why most driver behaviour research sucks.
The insidious flaws of most driver behaviour research
It’s not so much those obvious limitations that bother me but rather the insidious flaws. It’s not unusual to find that researchers employ methods they know are vastly inferior but which they present as acceptable substitutes and normal academic practice.
Cutting to the chase, most driver behaviour research sucks for two main reasons:
- it relies on what people say they do rather than on actual observation and measurement of their behaviour, or
- behaviour is observed under “laboratory” conditions, which are often so far removed from the real driving environment as to be laughable.
And the reason these shortcomings are tolerated? They’re cheap.
Don’t ask a driver what he does
Doing research on the cheap gives us hideous devices like the self-report questionnaire. This is strongly favoured in the social sciences because it’s the cheapest way to collect data (whether those data are of any use is another matter), especially now that it’s common to conduct surveys on-line instead of pushing paper around.
There are three basic but rather important problems with questionnaires:
- you can study only those people who do questionnaires—and they may not be the ones you’re really interested in;
- people tend to lie about themselves—or can’t be bothered to be accurate;
- people are often unable to assess their own behaviour with any degree of accuracy.
The people you really want to study don’t do questionnaires
As an example of the first problem, in the UK there have been two extensive surveys conducted with new drivers, commonly known as Cohort I and Cohort II, intended to gather information about training and practice before obtaining a full driving licence, and driving experience, including crash involvement, thereafter. In order to contribute to the whole survey a new driver would have to complete an extensive questionnaire soon after passing the driving test, another after six months and further questionnaires at 12, 24 and 36 months, making five in all.
Not surprisingly, many of those invited to take part declined, and there was a high drop-out rate amongst respondents. Only a third of the initial 128,000 questionnaires were returned and only 2,765 of the fifth questionnaires were completed—that’s just over two percent of the original group of drivers who were invited to take part.
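The arithmetic behind those figures is easy to check. In this small sketch the two counts come from the text above; the number of first questionnaires returned is only approximated as "a third", so the second percentage is an estimate.

```python
invited = 128_000          # initial questionnaires sent out (from the text)
fifth_completed = 2_765    # fifth questionnaires completed (from the text)

# "Only a third" were returned, so this figure is approximate.
first_returned_estimate = invited / 3

share_of_invited = fifth_completed / invited
share_of_first_returners = fifth_completed / first_returned_estimate

print(f"Completed all five, as a share of those invited: {share_of_invited:.2%}")
print(f"Completed all five, as a share of first responders: {share_of_first_returners:.1%}")
```

The first line prints 2.16%, which is the "just over two percent" quoted above.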
Those who stayed the course were predominantly well-educated young women with a strong sense of social responsibility (as indicated by their willingness to complete five lengthy questionnaires with no other form of compensation). The data they provided were then subjected to all sorts of clever statistical analysis and many fancy graphs were produced.
Just one problem. We now know a good deal about a group of people who are pretty much the exact opposite of the young drivers most likely to be involved in crashes. We know that the real “problem” drivers tend to be poorly-educated young men who don’t give a damn about society. Should we not be including these people in the study? But they don’t respond to questionnaires.
The problem of focusing research only on those people who are willing, or even keen, to engage in the research process is well known in social science. It’s called the self-selection bias. The 223-page Cohort II Main Report does draw attention to this problem but the fact remains that a great deal of time, money and effort was expended on expanding our knowledge of a very small, select group of people while nothing more was learned about those drivers most likely to do harm to themselves and others.
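Here is a small sketch of how self-selection distorts a sample. Every number in it is hypothetical (the group sizes and response rates are not taken from the Cohort studies); the point is only to show the mechanism.

```python
# Hypothetical population of new drivers.
high_risk_drivers = 200
low_risk_drivers = 800

# Hypothetical response rates: the high-risk group rarely responds.
high_risk_response_rate = 0.02
low_risk_response_rate = 0.20

high_risk_respondents = high_risk_drivers * high_risk_response_rate
low_risk_respondents = low_risk_drivers * low_risk_response_rate

share_in_population = high_risk_drivers / (high_risk_drivers + low_risk_drivers)
share_in_sample = high_risk_respondents / (high_risk_respondents + low_risk_respondents)

print(f"High-risk share of population: {share_in_population:.1%}")
print(f"High-risk share of respondents: {share_in_sample:.1%}")
```

With these made-up rates, a group that makes up a fifth of the population all but disappears from the data, however clever the analysis that follows.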
People lie about themselves
Sometimes lying is a deliberate attempt to present an air-brushed image instead of warts-and-all reality. And sometimes respondents are not deliberately trying to be deceptive but they just can’t be bothered to spend the time on giving carefully-considered, accurate responses. They hardly read the questions and slap a few ticks in boxes. Consequently, many of the data they provide are false. So, in effect, they lie.
People are unable to assess their own behaviour
If lying is the most obvious flaw with questionnaires, the other problem of people being unable to assess their own behaviour is probably the greater weakness in the method. In particular, most people don’t recognise their own incompetence. (Conversely, highly competent people tend to underestimate their relative competence because they erroneously assume that lots of other people are just as competent.) This is known as the Dunning-Kruger effect.
Here’s a common example you’re probably familiar with. If you ask everyone in a group of drivers to state whether they consider themselves to have driving ability that is above or below the average standard for the group, at least three-quarters would say they’re above average.
(Actually the above-average bias does vary a little from one country/culture to another and tends to be stronger in men than it is in women, but at least three-quarters of respondents who consider themselves to be above average is pretty typical.)
But if we were to measure their actual ability by some objective means, we must end up with fifty percent above and fifty percent below the average (median) standard. So, this flaw in the data-collection method (or human failing, if you prefer) produces a massive error or bias (the true 50/50 becomes an estimated 75/25).
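The size of that bias can be shown with a few lines of Python. The ability scores are invented (20 drivers with distinct scores); the three-quarters figure is the one quoted above.

```python
from statistics import median

# Hypothetical objective ability scores for a group of 20 drivers.
ability_scores = list(range(1, 21))

group_median = median(ability_scores)
share_above = sum(score > group_median for score in ability_scores) / len(ability_scores)

claimed_share_above = 0.75  # "at least three-quarters" say they are above average
overestimate = claimed_share_above - share_above

print(f"Actually above the median: {share_above:.0%}")
print(f"Claiming to be above average: {claimed_share_above:.0%}")
print(f"Overestimate: {overestimate:.0%}")
```

Whatever the scores are, no more than half the group can sit above its own median, so a quarter of the group must be wrong about themselves.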
Various studies have shown that this error of estimation extends beyond mere self-image of competence. If you ask drivers to estimate some measure within their driving, such as the peak speed they reach when driving in town or the average time gap they leave between their vehicle and the vehicle ahead in traffic, and also record those measures while they actually drive, you find that drivers’ perception of their behaviour and their actual behaviour are rather different.
Now, studies like that have useful applications in the management of driver behaviour. They can be drawn upon when developing programmes that help drivers to recalibrate their perceptions and, thus, to modify their behaviour. But, alas, useful studies like that are in the minority.
It’s much more common for researchers to draw conclusions purely from the “ask the driver” method, even though the flaws in this method are so widely known in psychology. Computer folk have a name for this: GIGO, or Garbage In, Garbage Out.
Drivers are not lab rats
A great deal of scientific research, including that conducted in the fields of psychology and human factors, takes the form of laboratory experiments. These can be extremely valuable, especially in helping us to appreciate isolated aspects of human performance.
Where I feel they are often misused in driver behaviour research is when results obtained from subjects in the laboratory are extrapolated as being indicative of drivers’ performance on the highway. This completely overlooks the fact that our behaviour is dependent on context. An artificial context tends to encourage artificial behaviour.
For the sake of brevity let me cite just a couple of examples in which the laboratory and the highway are poles apart: drivers’ reaction times and the effect of alcohol on driving performance.
How fast can you react?
When novices are learning to drive in the UK, they try to commit to memory the vehicle stopping distances quoted in The Highway Code. These are broken down into “thinking” distance (proportional to the time taken to react to a hazard and to apply pressure to the brake pedal) and braking distance (the distance travelled between the start of braking and coming to a full stop).
For decades, the reaction time that has been used in driver education resources is two-thirds of a second, which conveniently equates to a reaction distance of one foot travelled for every mile per hour of speed. And how was this particular value arrived at? It comes from very rudimentary laboratory experiments of the “when this light comes on, move your foot off this button and stamp on that button” variety.
That’s the typical human reaction time to a simple stimulus (obviously some people react more quickly and some more slowly).
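The "one foot per mile per hour" rule of thumb can be checked with a short calculation. The function name `thinking_distance_ft` is my own; the only inputs are the two-thirds of a second quoted above and the standard conversion of 5,280 feet to the mile.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def thinking_distance_ft(speed_mph, reaction_time_s):
    """Distance in feet covered at a constant speed during the reaction time."""
    feet_per_second = speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR
    return feet_per_second * reaction_time_s


per_mph = thinking_distance_ft(1, 2 / 3)
print(f"{per_mph:.2f} ft of thinking distance per mph")  # 0.98 ft
```

So the rule is a slight rounding up: each mile per hour of speed adds just under one foot of thinking distance.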
And how fast with real driving hazards?
Yet recent studies, conducted either in real driving environments or in pretty realistic simulators, have found that it’s not unusual for two full seconds to elapse between the appearance of a hazard that requires an immediate response and the initiation of a response from the driver.
Why such a difference from the two-thirds of a second that is the “standard” reaction time? In the real world the driver has a mass of different stimuli competing for his attention (and, of course, he may be attending to something other than his driving). Before he can react to any one of them he has to perceive it and then decide that he needs to react. It’s this sifting and processing of information that takes the time. All of this mental work is absent from the simple reaction time experiments.
And in which drivers do we find the greatest difference between “pure” reaction time and “real” reaction time? Young, novice drivers. Young people generally have faster physical reactions than older folk but older drivers have a wealth of experience, which aids faster interpretation of the world around them.
So why are we still holding onto this illusion that drivers react to hazards in two-thirds of a second when it’s more likely (especially with young drivers in a complex traffic environment) to take around two seconds or more? How many collisions and casualties might be traced back to perpetuation of this myth?
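To put the two reaction times side by side, this sketch works out how far a car travels before the driver even begins to respond. The speeds are just typical examples I have chosen, and the calculation assumes constant speed throughout the reaction time.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600

STANDARD_REACTION_S = 2 / 3   # the "textbook" value
REAL_WORLD_REACTION_S = 2.0   # the figure reported for real hazards


def distance_travelled_ft(speed_mph, seconds):
    """Feet covered at a constant speed in the given number of seconds."""
    return speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR * seconds


for speed_mph in (20, 30, 40):  # example speeds, chosen for illustration
    standard = distance_travelled_ft(speed_mph, STANDARD_REACTION_S)
    real_world = distance_travelled_ft(speed_mph, REAL_WORLD_REACTION_S)
    extra = real_world - standard
    print(f"{speed_mph} mph: {standard:.0f} ft vs {real_world:.0f} ft "
          f"(an extra {extra:.0f} ft)")
```

At 30 mph the difference is about 59 feet, all of it covered before the brakes are even touched.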
What are we simulating?
I just mentioned using simulators in research. Most people probably associate the word simulator with the kind of thing that airline pilots train on. Perhaps they’ve seen the €160m Mercedes-Benz simulator that takes whole cars or truck cabs inside it.
You don’t normally find anything like that in academia. Research studies published by the world’s top universities have often been conducted on “simulators” that would be out-classed by a PlayStation.
I’m very sceptical of the value of such low-fidelity simulators. In many cases, they have about as much resemblance to real driving as the old Pong computer game (Atari Inc., 1972) has to playing table tennis.
Watching drunks playing video games teaches us little about drunk drivers
As part of my master’s studies I had to observe an experiment on the effect of alcohol on “driving ability” and then write an assignment on the topic, for which I obviously had to review the relevant research papers.
There were two things that struck me as, frankly, ludicrous. The first was the very crude nature of the experimental methodology, and the second was the preponderance of such methodology in the research literature.
In the experiment, the tasks which the experimental subjects had to perform bore no resemblance to real driving.
They sat in an ordinary chair at a desk, looked at a trio of fairly small monitor screens at a distance which gave an included horizontal angle of view of about 90 degrees and a vertical angle of perhaps 15 degrees (i.e. less than 10 percent of most people’s total field of view) and operated a small video game-style steering wheel and pedals.
Simple measurements of reaction time were taken for emergency braking (such as when a “pedestrian” stepped into the road directly ahead of the “car”) and for recognition of a peripheral stimulus (which was a symbol flashed onto one of the side screens) while watching the main action—a “divided-attention task”.
The graphics were very low fidelity and there was little life-like movement of elements on the screens (for example, pedestrians depicted on footpaths were immobile, like cardboard cut-outs, apart from those which would move into the vehicle’s path—which made such events easy to predict).
The divided-attention tasks used bold symbols that contrasted strongly with the background and appeared in the same place in the driver’s peripheral vision each time (so responding to them was a simple on-off action). They were nothing like the events that may occur in the periphery in a real driving situation.
But it wasn’t so much the crudeness of what was there that destroyed any semblance of reality. It was what wasn’t there. There was a complete absence of two elements that are continually present in driving: movement and the third dimension.
Can you drive without moving?
The simulator I observed was what’s politely termed a “fixed-base” simulator. The “driver” experienced no motion effects whatsoever. Sensations were primarily visual, with a bit of crude audio thrown in.
In contrast, effective simulators (like that expensive one owned by Mercedes-Benz) have a full-motion base which can tilt in all directions to replicate the sensations of acceleration, deceleration and cornering, and the movement of the car on its suspension. Which would make a big difference to how you’d feel after a few drinks.
Furthermore, the visual simulation was entirely two-dimensional. The “driver” looked at flat screens at a fixed distance. There was no sense of depth as would be experienced when moving through a real-world, three-dimensional environment.
As the “lab rats” consumed more and more alcohol some aspects of their “driving” could be seen to deteriorate noticeably. For example, they became more likely to steer off the edge of the road on bends.
Yet—and here’s the funny thing—in those tasks that were being measured their performance hardly deteriorated, if at all.
When I inspected a pile of research papers on the effects of alcohol on drivers’ vision dating back to 1938, I found that only half reported some deterioration in visual acuity from the ingestion of moderate amounts of alcohol while the other half found no effect.
But when I looked at the methods employed I found that they all used very simple, fixed-base simulators or, especially in the earlier studies, even simpler devices than that. The experiment I observed could have shown some effect or no effect, depending on which elements of performance on the simulator were measured.
In fact I found only two relevant studies in which the researchers took their measurements while the experimental subjects were moving. As Frank Schmäl and his colleagues put it in their study published in 2000 on the effect of alcohol on dynamic visual acuity (i.e. the ability to fixate on a target from a moving base):
“One of the most important functions of the vestibulo-ocular system* is the stabilization of visual acuity during motion. Therefore dynamic visual acuity (DVA) is the more natural test condition because during walking, running, riding a bicycle, and driving a vehicle the observer and/or the target will be in motion.”
*The linking of the balance system and the eyes.
What Schmäl and his co-authors are describing is like vision’s automatic gyroscope: the eyes can remain locked on a target even while the eyes themselves are being subjected to quite large and rapid displacements in space (check it for yourself by moving your head around while continuing to read this).
When the head moves up, the eyes move down to compensate; when the head moves left, the eyes move right, etc. This is a completely automatic process.
The researchers had subjects track a target while they were oscillating up and down on a specially-made chair. It was found that even moderate alcohol intoxication caused a delay between body movement and the compensatory eye movement. In other words, accurate tracking was lost.
You drive through a three-dimensional world, don’t you?
Motion parallax, which is the perception of depth from small lateral movements of the eyes, relies on slow eye movement which is affected by alcohol intoxication. Such was the finding of the one and only study I found in which researchers chose to investigate this effect.
The report, by Nawrot, Nordenstrom and Olsen (published in 2004), concluded:
“The effects of ethanol [alcohol] intoxication on eye movement may have an underappreciated impact on the visual mechanisms necessary for successful locomotion through a cluttered and hazard-filled environment. Although the current study suggests that intoxicated drivers may have difficulty determining the relative position of obstacles using motion parallax, this may be only one part of a broader, but poorly understood, set of visual perceptual problems caused by ethanol’s effect on the eye movement system.”
Motion parallax cannot be examined when subjects focus on a flat screen. There is no depth in a two-dimensional image, only the illusion of depth which cannot be altered by the movement of the viewer. Depth perception is an active process, not passive.
(You may be thinking, “What about 3D TV screens or cinema projection systems?” While they have two images corresponding to the positions of two human eyes, those eyes are still fixed in the position the camera was in when it shot the material; the viewer has no means to change the viewpoint at will.)
One last word (rant) on simulators
Earlier I wrote “…behaviour is dependent on context. An artificial context tends to encourage artificial behaviour.” No matter how “realistic” any expensive simulator may be, it always lacks one vital factor found in the context of real-life driving. To the best of my knowledge, nobody has ever been killed in a simulator. In a simulator you can wander into the oncoming traffic lane and slam into a truck with a closing speed of 120 mph and you don’t even get a sore neck.
Even if the experimental subject apparently becomes immersed in the illusion of the virtual reality that has been created, deep down that person knows that, no matter what sort of stupid actions he performs, he will come to no harm. Now this may be entirely at a subconscious level, but it will make a difference.
If you don’t believe me, try this thought experiment. Imagine that I support a stout and rigid wooden beam measuring 5 metres long and 10 centimetres wide (about 16 feet by 4 inches) on platforms at each end that are just high enough to prevent the beam from touching the ground when you stand in the middle and it sags a bit. Then I ask you to walk along the full length of the beam. Unless you have some abnormal balance problem, it’s likely that you would find the task very easy. In fact, you’d probably adopt a pretty casual attitude towards it.
Then I take you up a tower and ask you to walk along an identical beam that spans the gap between this and another tower, 30 metres (nearly 100 feet) off the ground. And let’s assume it’s a totally windless day so you can’t be blown off balance. How do you approach the task now? Still casual? In fact, are you prepared to perform the task at all?
So what’s different? The task is identical. The characteristics of the beam are identical—the same degree of sag and springiness as you walk along it. It’s the context that’s different. And related to this context is the knowledge that if you make a mistake—and there’s no reason that you should; you’ve already performed this task and it was dead easy, remember—you’re probably going to die.
The beam just above ground level was our simulator. What has it taught us about how people walk along beams at high level?
Suppliers in the road-risk management industry may try to persuade you that their interventions are “scientifically proven.” It’s useful to bear in mind that any claims containing the phrase “scientifically proven” are probably hype. Real scientific researchers are exceedingly reluctant to claim they’ve proven anything. Instead, they use phrases like, “There are strong indications that…” or, “Our findings would suggest…”
If a supplier is making such claims, ask to see the original research. If they can’t supply it, either they don’t know what they’re talking about or they’re just bullshitting you. Deal with them accordingly.
If they do provide a research paper or two, you’ll probably find that you can’t understand at least three-quarters of what you read (don’t worry, it’s not you; that’s just the way the damned things are written). The next step is to find someone with an advanced science degree and working knowledge of statistics to translate the gobbledegook into plain English. Having gone to all that trouble, what you’ll find is likely to fall into one of the following three categories.
(1) When you consider the research in the light of what I’ve written in this article, you realise it’s junk research. Therefore it has no useful application in the real world.
(2) Even if the research being quoted is good, solid stuff, a common “scam” is that it isn’t really related to the thing they want you to buy. Somebody whose background is in marketing rather than science either thought that it was related (inadvertent deception) or knew that it wasn’t really related but thought it was close enough to pull the wool over customers’ eyes (deliberate deception).
Or (3) you’ll find that the suppliers have cherry-picked favourable bits from the research but not given you the full picture.
As an example of how cherry-picking works, let’s suppose that a supplier of telemetry makes reference to a research paper in its marketing materials and presents a graph showing that a known risk factor reduces noticeably and rapidly within a few months of using their product.
What they don’t tell you is that they only reproduced the first part of the graph as originally published in the study. The full graph (and the text of the research paper) shows that, after an initial dip, the risk factor gradually returned to the original level over the next year or so. Short-lived improvements that fade like this are common; the pattern is often loosely described as regression to the mean.
The implication in the marketing is that the lower level of risk and its attendant cost savings will be maintained, thus recovering the investment in the product, rather than having just a short-lived improvement. It’s only by checking against the original research that you’ll know if cherry-picking is going on.
The lessons to be learned from all of this are…
Driver behaviour research has revealed a great deal about the abilities, limitations and flaws of human beings as drivers of motor vehicles. Unfortunately, as is often found in the social sciences, the truly valuable research is often buried under piles of junk.
The most common form of junk research is that which measures what is easiest and cheapest to measure rather than what is important and relevant to measure.
Taken in isolation, junk research is just a wasted opportunity to do something useful but, where it is presented as revealing something important about the real world, it is, at best, misleading and, in the worst cases, potentially dangerous.
The next time you see a news story along the lines of “New research indicates that drivers…” I suggest you question how the researchers arrived at their conclusions rather than take them at face value. You’ll find a good deal of what gets reported in the media as research isn’t proper research at all but merely surveys.
Unless you can see that the methods used were truly representative of driving in the real world, it’s quite likely that someone’s peddling junk.