From mid-2024, all new models of passenger vehicles in Europe require a driver monitoring system (DMS) that can detect drowsiness. Engineering teams across the automotive industry have scrambled to learn about the science of drowsiness and develop systems that can detect it in drivers. Each of these systems then needs to be validated and shown to accurately detect drowsiness. One recurring problem we see in many teams is erroneous test protocols.
Within Optalert, we are frequently astounded that so many tests are not oriented around the only ground truth that matters: driver impairment.
This article begins with a table listing common mistakes to watch out for when assessing a given test protocol. It then outlines in more detail how to devise a protocol that avoids these traps, in line with decades of meticulous research from sleep science laboratories.
In the longer term, we believe the industry should agree upon a standardised test protocol. This will allow the performance of different systems to be compared directly. Until a broader consensus is reached, adhere to the principles outlined below to ensure a system is measuring what it should.
What to watch out for | Why? | What should be done? |
---|---|---|
Subjects are not sufficiently sleep deprived. | If people are not sleep deprived to the point at which they are impaired and exhibiting an increase in performance failures, you cannot test whether a system detects impairment. | Healthy, well-rested subjects must be kept awake to the point at which they exhibit significant impairment. |
There is no safety driver in the passenger seat in an on-road test. | Either the test is not pushing subjects to dangerous levels of impairment, or it is an egregious breach of ethics in terms of danger to the driver and other road users. | With truly impaired subjects in a driving test, a raft of safety measures is required for when they fail. |
A test validates against a ground truth that is not impairment. | A system must detect driver impairment, which is the increase in performance failures. It is not the KSS score or some other proxy metric that correlates loosely with impairment. | Validate against performance failures, such as failing to stop in response to a car appearing or wheels drifting out of lane. |
Our recommendations below provide a clear overview of how to thoroughly administer such a validation test, drawing from preeminent research in this field.
Glossary of terms | |
---|---|
Performance failure | An instance of a test subject failing to perform the assigned task within a test. |
Relative risk | The ratio of the probability of a performance failure at a given level of drowsiness to the probability of a performance failure in an alert state. |
Impairment | The relative risk at a given level of drowsiness. |
Ground truth | The observation or measurement used to assess performance failure. |
Error of omission | Lack of response to a visual stimulus in a psychomotor vigilance test within 2,000 milliseconds. |
Prolonged wakefulness | Extension of the time a subject remains awake beyond the usual time they would go to sleep. It is usually deemed to be 18 to 36 hours awake. |
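
The glossary definition of relative risk can be made concrete with a short calculation. The sketch below is illustrative only: the function names and the example counts are hypothetical and do not come from any published study.

```python
def failure_probability(failures: int, trials: int) -> float:
    """Proportion of trials that ended in a performance failure."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return failures / trials


def relative_risk(drowsy_failures: int, drowsy_trials: int,
                  alert_failures: int, alert_trials: int) -> float:
    """Ratio of failure probability when drowsy to failure probability when alert."""
    p_drowsy = failure_probability(drowsy_failures, drowsy_trials)
    p_alert = failure_probability(alert_failures, alert_trials)
    if p_alert == 0:
        # The ratio is undefined without at least one failure in the alert baseline.
        raise ValueError("alert baseline has no failures; relative risk is undefined")
    return p_drowsy / p_alert


# Hypothetical counts: 12 failures in 100 drowsy trials, 2 failures in 100 alert trials.
rr = relative_risk(12, 100, 2, 100)
print(f"relative risk: {rr:.1f}")
```

A relative risk of 1 means the subject fails no more often than when alert; impairment, as defined above, is the amount by which this ratio rises with drowsiness.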
To measure a system’s ability to detect impairment, subjects must be…impaired. This involves a considerable period of extended wakefulness. In general, relative risk in psychomotor vigilance tests begins increasing after 17 hours of extended wakefulness. Driving tasks usually only reveal an increase in relative risk after 24 hours of extended wakefulness, although this varies across subjects. High-impact research in the field has tended to keep subjects awake for 30 hours, or even 32 to 34 hours, before driving tasks.
If a well-rested person is kept up for a few hours after their bedtime, their ability to drive will likely be unimpaired or only very slightly impaired. They cannot self-assess with KSS or any other subjective measure in this regard. The only relevant ground truth is an increase in performance failures.
One dead giveaway of a test involving unimpaired subjects is if there is no safety driver in the passenger seat. If this is the case, the test is either egregiously unethical or subjects are not truly impaired.
It is critical that a test ensures sleep deprived subjects are indeed kept awake. Carefully controlled research verifies that the impaired participants did not sleep in their extended wakeful period. Methods for verification include wearable accelerometer technology or direct observation by an invigilator. A sleep diary is less reliable because it can be falsified.
Two types of testing are employed for calibrating, testing, or validating a drowsiness detection system: laboratory tests and on-track validation. Ordinarily a team runs many more laboratory tests due to the lower cost and effort involved in their administration.
Aspect | Laboratory test | On-track validation |
---|---|---|
Description | Subjects sit at a computer and undertake a psychomotor vigilance task at regular intervals. | Subjects drive on a test track and their performance failures are recorded. |
What is being measured? | Fundamental cognitive impairment | Domain-specific (driving) impairment |
Safety protocols required | Ensure participants are returned home safely (they cannot drive or catch public transport). | A safety driver with dual controls should be in the passenger seat. Ensure participants are returned home safely (they cannot drive or catch public transport). |
Purpose | Can be used to either calibrate or test a DMS. | Final step to validate or compare systems. |
The Johns Test of Vigilance (JTV) is the psychomotor vigilance task that correlates most closely with performance failures in driving tasks. Over two decades ago, Optalert’s founder Dr. Murray Johns found that the impairment caused by drowsiness presents far more often as a simple failure to respond to a stimulus than as a gradual increase in response time. Consequently, performance failure in the JTV is defined as non-response within 2,000 milliseconds.
Any laboratory test attempting to simulate driving should adopt this definition of performance failure, as it has been extensively validated across decades of research. It is also important to note that extremely rapid responses should be excluded as they are considered anticipatory. The exact timing depends on the complexity of the task, but usually sits somewhere between 50 and 150 milliseconds.
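
The scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the JTV's actual implementation: the 100 ms anticipatory cutoff is an assumed value picked from the 50 to 150 ms range mentioned above, and dropping anticipatory trials from the denominator is likewise an assumption.

```python
from typing import Optional, Sequence

OMISSION_LIMIT_MS = 2000.0     # non-response within 2,000 ms is a performance failure
ANTICIPATORY_LIMIT_MS = 100.0  # assumed cutoff; the article gives a 50-150 ms range


def classify_response(response_time_ms: Optional[float],
                      anticipatory_limit_ms: float = ANTICIPATORY_LIMIT_MS) -> str:
    """Label one trial; None means the subject never responded."""
    if response_time_ms is None or response_time_ms > OMISSION_LIMIT_MS:
        return "omission"
    if response_time_ms < anticipatory_limit_ms:
        return "anticipatory"
    return "valid"


def omission_rate(response_times_ms: Sequence[Optional[float]]) -> float:
    """Share of scored trials that were errors of omission.

    Anticipatory trials are excluded from the denominator (an assumption).
    """
    labels = [classify_response(rt) for rt in response_times_ms]
    scored = [label for label in labels if label != "anticipatory"]
    if not scored:
        raise ValueError("no scorable trials in this session")
    return scored.count("omission") / len(scored)


# Hypothetical session: two valid responses, two omissions, one anticipatory press.
session = [350.0, None, 60.0, 400.0, 2500.0]
print(omission_rate(session))
```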
Ground truth | Unimpaired: Sessions with fewer than 5% errors of omission Impaired: Sessions with greater than or equal to 5% errors of omission |
---|---|
Environment | Quiet environment with no distractions |
Participants | 10 people diverse across age, sex, stature, skin complexion, and eyelid aperture |
Regimen | Participants to be well rested prior to start of test. |
Study conclusion | When a participant reaches 25% errors of omission or 34 hours of prolonged wakefulness, whichever comes first. |
Analysis | Binary classification (impaired or unimpaired) is compared between the DMS and the ground truth. This yields sensitivity, specificity, and overall accuracy. |
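
The analysis row of this protocol can be illustrated as follows. This is a sketch under stated assumptions: each session is reduced to one ground-truth label (using the 5% threshold from the table) and one DMS label, and the session data shown is hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

IMPAIRED_THRESHOLD = 0.05  # sessions with >= 5% errors of omission count as impaired


def is_impaired(omission_rate: float) -> bool:
    """Ground-truth label for one session."""
    return omission_rate >= IMPAIRED_THRESHOLD


@dataclass
class Metrics:
    sensitivity: float
    specificity: float
    accuracy: float


def score_sessions(ground_truth: Sequence[bool], dms_output: Sequence[bool]) -> Metrics:
    """Compare per-session DMS labels (True = impaired) against ground truth."""
    if len(ground_truth) != len(dms_output):
        raise ValueError("one DMS label is required per session")
    pairs = list(zip(ground_truth, dms_output))
    tp = sum(1 for g, d in pairs if g and d)
    tn = sum(1 for g, d in pairs if not g and not d)
    fp = sum(1 for g, d in pairs if not g and d)
    fn = sum(1 for g, d in pairs if g and not d)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one impaired and one unimpaired session")
    return Metrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        accuracy=(tp + tn) / len(pairs),
    )


# Hypothetical omission rates for six sessions and the matching DMS labels.
rates = [0.01, 0.03, 0.05, 0.12, 0.20, 0.02]
truth = [is_impaired(r) for r in rates]
dms = [False, True, True, True, False, False]
print(score_sessions(truth, dms))
```

If the sessions contain no impaired cases, sensitivity cannot be computed at all, which is one more reason subjects must be pushed to genuine impairment.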
When conducting on-track validation of the drowsiness detection within a DMS, numerous definitions of performance failure could be adopted. Practical options include:
Poor examples of performance failures include:
The table shows an example of a protocol, although as noted there is a range of performance failures that could be selected that accurately map to impairment in real-world driving tasks.
Ground truth | Unimpaired: No two lane departures occur within 15 minutes of each other Impaired: Two or more lane departures occur within 15 minutes of each other |
---|---|
Environment | Co-driver for safety, no interaction with driver, maximum speed of 50 km/h |
Participants | 10 people diverse across age, sex, stature, skin complexion, and eyelid aperture |
Regimen | Participants are subjected to 34 hours of prolonged wakefulness. They then drive a vehicle for one to two hours on a test track. Lane departures (two wheels out of lane) are recorded as a performance failure. |
Study conclusion | A driving session concludes when the co-driver finds it necessary to intervene or the driver complains of feeling too drowsy to continue safely. |
Analysis | A true positive occurs when the DMS sounds an alert no more than 15 minutes prior to the second lane departure in a performance failure event. This yields sensitivity, specificity, and overall accuracy. |
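
One way to apply the ground truth and true-positive rule in this example protocol is sketched below. Several choices here are assumptions rather than part of the protocol: timestamps are minutes from the start of the drive, each consecutive pair of lane departures within the window counts as its own performance failure event, and an alert must strictly precede the second departure to count. Specificity is not covered, since it also requires a definition of unimpaired driving periods.

```python
from typing import List, Sequence

WINDOW_MIN = 15.0  # window from the example protocol, in minutes


def failure_event_times(departures_min: Sequence[float]) -> List[float]:
    """Times of each lane departure that follows the previous one within the window."""
    times = sorted(departures_min)
    return [later for earlier, later in zip(times, times[1:])
            if later - earlier <= WINDOW_MIN]


def is_true_positive(event_time: float, alerts_min: Sequence[float]) -> bool:
    """True if any alert fell in the 15 minutes before the second departure."""
    return any(0 < event_time - alert <= WINDOW_MIN for alert in alerts_min)


def sensitivity(departures_min: Sequence[float], alerts_min: Sequence[float]) -> float:
    """Share of performance failure events preceded by a timely DMS alert."""
    events = failure_event_times(departures_min)
    if not events:
        raise ValueError("no performance failure events in this drive")
    detected = sum(1 for event in events if is_true_positive(event, alerts_min))
    return detected / len(events)


# Hypothetical drive: lane departures and logged DMS alerts, in minutes.
departures = [20.0, 50.0, 58.0, 95.0, 104.0]
alerts = [47.0, 110.0]
print(failure_event_times(departures))
print(sensitivity(departures, alerts))
```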
Regardless of which performance failure is selected, the DMS must detect impairment before the performance failure occurs. This shows it would have prevented the performance failure in real-world conditions.
In addition, the driver must not be roused throughout the drive by either the safety co-driver or any devices in-vehicle. The DMS must not sound any alerts that would wake the driver.
We encourage automotive OEMs and tier 1s to contact us for more information on how to rigorously validate the drowsiness detection within a DMS.
We are eager to support the industry to employ sound science and ensure we are doing all we can to keep drivers and other road users safe.