How to Validate a Driver Drowsiness Detection System

From mid-2024, all new models of passenger vehicles in Europe require a driver monitoring system (DMS) that can detect drowsiness. Engineering teams across the automotive industry have scrambled to learn about the science of drowsiness and develop systems that can detect it in drivers. Each of these systems then needs to be validated and shown to accurately detect drowsiness. One recurring problem we see in many teams is erroneous test protocols.

Within Optalert, we are frequently astounded that so many tests are not oriented around the only ground truth that matters: driver impairment.

This article begins with a table listing common mistakes to watch out for when assessing a given test protocol. It then outlines in more detail how to devise a protocol that avoids these traps, in line with decades of meticulous research from sleep science laboratories.

In the longer term, we believe the industry should agree on a standardised test protocol, which would allow the performance of different systems to be compared directly. Until a broader consensus is reached, adhere to the principles outlined below to ensure a system is measuring what it should.

Common mistakes in validation test protocols

What to watch out for: Subjects are not sufficiently sleep deprived. If people are not sleep deprived to the point at which they are impaired and exhibiting an increase in performance failures, you cannot test whether a system detects impairment.
What should be done? Healthy, well-rested subjects must be kept awake to the point at which they exhibit significant impairment.

What to watch out for: There is no safety driver in the passenger seat in an on-road test. Either the test is not pushing subjects to dangerous levels of impairment, or it is an egregious breach of ethics in terms of danger to the driver and other road users.
What should be done? With truly impaired subjects in a driving test, a raft of safety measures is required for when they fail.

What to watch out for: A test validates against a ground truth that is not impairment. A system must detect driver impairment, which is the increase in performance failures. It is not the KSS score or some other proxy metric that correlates loosely with impairment.
What should be done? Performance failures include not stopping in response to a car appearing or wheels drifting out of lane.

A DMS should measure impairment

  • A drowsiness detection system exists to prevent accidents caused by driver drowsiness.
  • To prevent accidents, it must be optimised to detect impairment.
  • Impairment is defined as the increase in relative risk of the driver exhibiting performance failures.
  • Any test to measure the effectiveness of drowsiness detection within a DMS must measure against impairment and not the driver’s subjective feeling of tiredness, as measured by the Karolinska Sleepiness Scale (KSS) or any other subjective measure.

Our recommendations below provide a clear overview of how to thoroughly administer such a validation test, drawing from preeminent research in this field.

Glossary of terms
Performance failure
An instance of a test subject failing to perform the assigned task within a test.
Relative risk
The ratio of the probability of a performance failure at a given level of drowsiness to the probability of a performance failure in an alert state.
Impairment
The relative risk at a given level of drowsiness.
Ground truth
The observation or measurement used to assess performance failure.
Error of omission
Lack of response to a visual stimulus in a psychomotor vigilance test within 2,000 milliseconds.
Prolonged wakefulness
Extension of the time a subject remains awake beyond the usual time they would go to sleep. It is usually deemed to be 18 to 36 hours awake.
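
As a minimal sketch of the relative risk definition in the glossary, assuming failures and trials are counted separately for a drowsy condition and an alert condition (the function and parameter names are illustrative only):

```python
def relative_risk(failures_drowsy, trials_drowsy, failures_alert, trials_alert):
    """Ratio of the failure probability when drowsy to the failure probability when alert."""
    if trials_drowsy <= 0 or trials_alert <= 0:
        raise ValueError("trial counts must be positive")
    p_drowsy = failures_drowsy / trials_drowsy
    p_alert = failures_alert / trials_alert
    if p_alert == 0:
        # With no failures in the alert state the ratio has no finite value.
        raise ValueError("relative risk is undefined when the alert-state failure rate is zero")
    return p_drowsy / p_alert
```

For example, 12 failures in 100 drowsy trials against 2 failures in 100 alert trials gives a relative risk of about 6.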

Test subjects must be pushed to the point of failure

To measure a system’s ability to detect impairment, subjects must actually be impaired. This requires a considerable period of extended wakefulness. In general, relative risk in psychomotor vigilance tests begins to increase after 17 hours of extended wakefulness. Driving tasks usually reveal an increase in relative risk only after 24 hours of extended wakefulness, although this varies across subjects. High-impact research in the field has tended to keep subjects awake for 30 hours, or even 32 to 34 hours, before driving tasks.

If a well-rested person is kept up for a few hours after their bedtime, their ability to drive will likely be unimpaired or only very slightly impaired. They cannot self-assess with KSS or any other subjective measure in this regard. The only relevant ground truth is an increase in performance failures.

One dead giveaway of a test involving unimpaired subjects is if there is no safety driver in the passenger seat. If this is the case, the test is either egregiously unethical or subjects are not truly impaired.

Verify that sleep deprived subjects really have been kept awake

It is critical that a test ensures sleep deprived subjects are indeed kept awake. Carefully controlled research verifies that the impaired participants did not sleep in their extended wakeful period. Methods for verification include wearable accelerometer technology or direct observation by an invigilator. A sleep diary is less reliable because it can be falsified.

Laboratory test versus on-track validation

Two types of testing are employed for calibrating, testing, or validating a drowsiness detection system: laboratory tests and on-track validation. Ordinarily a team runs many more laboratory tests due to the lower cost and effort involved in their administration.

Laboratory test
  • Subjects sit at a computer and undertake a psychomotor vigilance task at regular intervals.
  • What is being measured? Fundamental cognitive impairment.
  • Safety protocols required: Ensure participants are returned home safely (they cannot drive or catch public transport).
  • Can be used to either calibrate or test a DMS.

On-track validation
  • Subjects drive on a test track and their performance failures are recorded.
  • What is being measured? Domain-specific (driving) impairment.
  • Safety protocols required: A safety driver with dual controls should be in the passenger seat. Ensure participants are returned home safely (they cannot drive or catch public transport).
  • Final step to validate or compare systems.

Laboratory tests

The Johns Test of Vigilance (JTV) is the psychomotor vigilance task that correlates most closely with performance failures in driving tasks. Over two decades ago, Optalert’s founder Dr. Murray Johns found that the impairment caused by drowsiness presents far more often as a complete failure to respond to a stimulus than as a gradual increase in response time. Consequently, performance failure in the JTV is defined as non-response within 2,000 milliseconds.

Any laboratory test attempting to simulate driving should adopt this definition of performance failure, as it has been extensively validated across decades of research. Extremely rapid responses should also be excluded, as they are considered anticipatory. The exact cutoff depends on the complexity of the task, but usually sits between 50 and 150 milliseconds.
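
The scoring rules above could be sketched as follows. This is an illustration under stated assumptions, not the JTV implementation: the 100 ms anticipatory cutoff is simply one value inside the 50 to 150 ms range, anticipatory trials are assumed to be dropped from the denominator, and all names are hypothetical.

```python
from typing import Iterable, Optional

OMISSION_LIMIT_MS = 2000      # non-response within 2,000 ms, as defined in the text
ANTICIPATORY_LIMIT_MS = 100   # assumption: one value inside the 50 to 150 ms range


def classify_response(response_time_ms: Optional[float]) -> str:
    """Label one trial; None means the subject never responded."""
    if response_time_ms is None or response_time_ms > OMISSION_LIMIT_MS:
        return "omission"
    if response_time_ms < ANTICIPATORY_LIMIT_MS:
        return "anticipatory"
    return "valid"


def omission_rate(response_times_ms: Iterable[Optional[float]]) -> float:
    """Fraction of scored trials that are errors of omission.

    Assumption: anticipatory trials are excluded from the denominator.
    """
    labels = [classify_response(t) for t in response_times_ms]
    scored = [label for label in labels if label != "anticipatory"]
    if not scored:
        raise ValueError("no scorable trials in this session")
    return scored.count("omission") / len(scored)
```

A session with responses of 350 ms, no response, 60 ms, 400 ms, and 2,500 ms would have four scored trials and two errors of omission, so an omission rate of 0.5.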

Ground truth
Unimpaired: Sessions with fewer than 5% errors of omission
Impaired: Sessions with greater than or equal to 5% errors of omission
Quiet environment with no distractions
10 people diverse across age, sex, stature, skin complexion, and eyelid aperture
Participants to be well rested prior to start of test.
  • 2 × 15-minute vigilance tests in the morning an hour apart
  • 1 × 15-minute vigilance test each hour, starting from 18 hours to 34 hours awake
Study conclusion
When a participant reaches 25% errors of omission or 34 hours of prolonged wakefulness, whichever comes first.
Binary classification (impaired or unimpaired) is compared between the DMS and the ground truth. This yields sensitivity, specificity, and overall accuracy.
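
To illustrate the comparison in the protocol above, here is a small sketch that labels each session from its error-of-omission rate (impaired at 5% or more) and scores a DMS against that label. The input format, a list of (omission rate, DMS flag) pairs, and the function names are assumptions made for the example.

```python
IMPAIRED_THRESHOLD = 0.05  # sessions with >= 5% errors of omission count as impaired


def ground_truth_impaired(omission_rate):
    """Ground-truth label for one vigilance session."""
    return omission_rate >= IMPAIRED_THRESHOLD


def score_dms(sessions):
    """Compare DMS output with ground truth.

    sessions: iterable of (omission_rate, dms_flagged_impaired) pairs.
    Returns sensitivity, specificity, and accuracy as a dict.
    """
    tp = fp = tn = fn = 0
    for omission_rate, dms_flagged in sessions:
        truth = ground_truth_impaired(omission_rate)
        if truth and dms_flagged:
            tp += 1
        elif truth and not dms_flagged:
            fn += 1
        elif not truth and dms_flagged:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no sessions to score")
    return {
        # A ratio is reported as NaN when its class has no sessions.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "accuracy": (tp + tn) / total,
    }
```

Sensitivity here is the share of impaired sessions the DMS flagged, and specificity is the share of unimpaired sessions it left unflagged. Reporting both matters because a system that always raises an alert would score perfect sensitivity and zero specificity.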

On-track validation

When conducting on-track validation of the drowsiness detection within a DMS, numerous definitions of performance failure could be adopted. Practical options include:

  • not braking when a traffic light turns red;
  • not braking or taking evasive action when another vehicle or obstacle appears;
  • drifting out of lane (i.e., not responding to lane markings); or
  • any other failure to respond to a stimulus.

Poor examples of performance failures include:

  • yawning;
  • slumping in the seat;
  • loosening the grip on the steering wheel;
  • reporting feeling subjectively tired;
  • an external “expert” visually assessing the person as drowsy; or
  • any other measure that would not directly relate to the vehicle having an accident in real-world conditions.

The table below shows an example protocol, although, as noted, a range of performance failures could be selected that accurately map to impairment in real-world driving tasks.

Ground truth
Unimpaired: No two lane departures occur within 15 minutes of each other
Impaired: Two or more lane departures occur within 15 minutes of each other
Co-driver for safety, no interaction with driver, maximum speed of 50 km/h
10 people diverse across age, sex, stature, skin complexion, and eyelid aperture
Participants are subjected to 34 hours of prolonged wakefulness.
They then drive a vehicle for one to two hours on a test track.
Lane departures (two wheels out of lane) are recorded as a performance failure.
Study conclusion
A driving session concludes when the co-driver finds it necessary to intervene or the driver complains of feeling too drowsy to continue safely.
A true positive occurs when the DMS sounds an alert no more than 15 minutes prior to the second lane departure in a performance failure event. This yields sensitivity, specificity, and overall accuracy.

Regardless of which performance failure is selected, the DMS must detect impairment before the performance failure occurs. This shows that it could have prevented the performance failure in real-world conditions.

In addition, the driver must not be roused throughout the drive by either the safety co-driver or any devices in-vehicle. The DMS must not sound any alerts that would wake the driver.
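
The on-track ground truth and the true-positive rule from the example protocol could be sketched like this. It assumes that lane departures and DMS alerts are logged as timestamps in seconds from the start of the drive, and that every consecutive pair of departures within 15 minutes counts as its own performance failure event. Both are assumptions for illustration, as are the function names.

```python
WINDOW_S = 15 * 60  # 15 minutes, used for both the event and the alert window


def failure_event_times(departure_times_s):
    """Return the time of each second lane departure in a performance failure event.

    An event is two lane departures within 15 minutes of each other.
    Assumption: every consecutive qualifying pair is treated as an event.
    """
    times = sorted(departure_times_s)
    return [later for earlier, later in zip(times, times[1:])
            if later - earlier <= WINDOW_S]


def is_true_positive(event_time_s, alert_times_s):
    """True if some alert came before the event, and no more than 15 minutes before."""
    return any(0 < event_time_s - alert <= WINDOW_S for alert in alert_times_s)
```

With departures logged at 600 s, 3,000 s, 3,500 s, and 6,000 s, only the pair at 3,000 s and 3,500 s is close enough to form an event. An alert at 3,000 s would count as a true positive for it, while an alert at 2,000 s (too early) or 3,600 s (too late) would not.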


Optalert can help you validate your system

We encourage automotive OEMs and Tier 1 suppliers to contact us for more information on how to rigorously validate the drowsiness detection within a DMS.

We are eager to help the industry employ sound science and to ensure we are doing all we can to keep drivers and other road users safe.