©2017 by Consortium for Safer AI.





Does Standards Safety Testing of AI Systems need to become more like Intelligence Tests?

Civilization advances by extending the number of important operations which we can perform without thinking about them.


Alfred North Whitehead, English mathematician



Our ability to design, make, and use tools has been the catalyst for our dominance as a species, enabled by the emergent property of our brain known as intelligence.  Until recently, our tools have mainly complemented, magnified, or supplanted our physical strength and dexterity.  With AI, the intention is to do the same with our intelligence.  Like everything humans do, we move ahead whether or not we fully understand the underlying concepts or the consequences of a new tool, preferring to tout its benefits.  The seemingly exponential pace of our technological development, and of its impact, is only partly due to the scaling efficiencies and power of those technologies.  Most of it is because we choose to go along.  A fundamental understanding of the workings of our intelligence [1], and of the other forms of intelligence designed by it, is critical, because creating and introducing non-human intelligence into our environment carries great potential for unintended consequences.


We use the following definition of intelligence: the ability of an agent to make decisions and act on those decisions in a manner that helps the agent move closer toward achieving a goal.  Of course, part of intelligence is also the selection of a goal [2].  Independent goal selection is not a feature of current AI technologies.  The way that AI is being incorporated into products today is mainly to replace the intelligence of the user of a product (see figure below).  That is the basic concept behind autonomous anything, from vacuum cleaners to vehicles.  Now, the word autonomous is a bit of a misnomer, since it implies independent thinking.  We are unlikely to want a truly autonomous product.  For example, I would not want to take a ride in a truly autonomous vehicle, one that might decide to change the destination from the one I have chosen.
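The decision-and-act loop in this definition can be sketched minimally.  All names here, and the toy one-dimensional "world," are illustrative assumptions rather than a description of any particular AI system; note that the goal is handed to the agent, not chosen by it, mirroring the point that current AI does not select its own goals.

```python
def distance(state: int, goal: int) -> int:
    """A simple measure of how far the agent is from its goal."""
    return abs(goal - state)

def choose_action(state: int, goal: int, actions=(-1, 0, 1)) -> int:
    """Pick the action whose resulting state is nearest the goal."""
    return min(actions, key=lambda a: distance(state + a, goal))

def run_agent(state: int, goal: int, max_steps: int = 100) -> int:
    # The goal is supplied externally; the agent only decides and acts.
    for _ in range(max_steps):
        if state == goal:
            break
        state += choose_action(state, goal)
    return state

print(run_agent(0, 7))  # the agent steps toward and reaches the goal: 7
```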





It is this aspect of AI in our products, reducing the role of the user, which is the basis for claims that AI-enabled products can help advance safety to levels not possible as long as humans are in the loop.  Humans make mistakes, quite often.  At the same time, AI in our products poses a tremendous challenge to traditional safety standards testing.  Let us take the simple example of a vacuum cleaner.  If my wife is sitting in the living room while I operate a traditional vacuum cleaner, it is my responsibility to watch out that I do not hit her toes with the vacuum.  And should I do so, she will blame me, not the vacuum itself (of course, I will curse the vacuum!) or the manufacturer of the vacuum (I will probably curse them, too).  For a traditional product, validation testing - both product development and standards related - need not worry as much about imagining all the possible unsafe scenarios that could be created due to an error by the human user.  Yet, with the introduction of AI into this vacuum cleaner, something that was the responsibility of the user has now shifted to the device.  This shift has a monumental effect on the testing protocols necessary to establish safety levels.  For example, the AI-vacuum may need to be designed to avoid contact with objects or to ensure that speeds are low enough that no one can be injured.  And new safety testing protocols would need to be written to assess the ability of the vacuum to perform under a variety of circumstances in meeting agreed-upon safety levels.
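As a sketch of the kind of design rule mentioned above - a vacuum that caps its speed as a function of distance to the nearest detected object - one could imagine something like the following.  All thresholds and names are invented for illustration; they are not drawn from any real product or standard.

```python
MAX_SPEED = 0.5        # m/s, open-floor cruising speed (assumed)
SAFE_DISTANCE = 0.30   # m, beyond this no slowdown is needed (assumed)
MIN_DISTANCE = 0.05    # m, at or below this the vacuum must stop (assumed)

def safe_speed(nearest_obstacle_m: float) -> float:
    """Scale the allowed speed linearly between MIN_DISTANCE and SAFE_DISTANCE."""
    if nearest_obstacle_m <= MIN_DISTANCE:
        return 0.0
    if nearest_obstacle_m >= SAFE_DISTANCE:
        return MAX_SPEED
    fraction = (nearest_obstacle_m - MIN_DISTANCE) / (SAFE_DISTANCE - MIN_DISTANCE)
    return MAX_SPEED * fraction

# A safety testing protocol can then assert the invariant directly:
assert safe_speed(0.01) == 0.0                    # must stop at a toe
assert safe_speed(1.00) == MAX_SPEED              # full speed on open floor
assert abs(safe_speed(0.175) - MAX_SPEED / 2) < 1e-9  # halfway in between
```

The point of the sketch is that once the safety behavior lives in the device rather than the user, it becomes a testable property of the product.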


So imagine if the standards safety testing for vacuum products included people, the actual users, as part of the test.  The test would be evaluating the combined device-user.  Therefore, the test is not only assessing the safety performance of the device but is now evaluating the intelligence of the user in utilizing the device in a safe manner.  For a very small fraction of traditional products, we do test the user's safety performance, but in a separate and limited situation.  One example is that of a car driver.  The car, as a product, is subjected to pre-defined safety tests before it is allowed to be sold commercially.  The user (driver) must pass a short driving test so that they are granted a license (by a government agency) to operate the commercially available car.  For most products, though, users simply purchase the product and use it without any test of their intelligence in using the product.  This is what is changing as AI is incorporated into more of the products we use.


All this suggests that standard safety tests will need to evolve into intelligence tests of a sort for AI-enabled products.  And herein is the challenge.  As we have been learning over the last few decades, human intelligence has flaws, or biases.  Most of these biases [3] are a result of heuristics that our brain uses to process information more quickly to improve our chances of survival in the wild.  AI is a product of this flawed, yet very effective, human brain.  And like all human-made products, it too will be flawed.  Once you try to evaluate the intelligence of an agent, the key challenge is how thoroughly and practically we can assess the full potential for the agent to make safe decisions in an uncertain, dynamic environment.  For a vacuum cleaner, this may be a practical assessment: a few variables and scenarios, though not exhaustive, may be sufficient to ensure an acceptable level of safety performance.  For autonomous vehicles, however, the challenge is much greater, as the decision-making space is much larger, encompassing a large and possibly impractical number of variables and scenarios for a high-confidence safety assessment.  A large portion of the challenge is assessing the decision-making skills of an intelligent agent in a dynamic, uncertain environment; yet it is also important to assess all the sources of inputs into the intelligent agent (sensors) and the actuation mechanisms for a complete evaluation.
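The gap between the vacuum cleaner's decision space and the vehicle's can be made concrete with a back-of-the-envelope count: even a coarse discretization of the relevant conditions multiplies into a test matrix that quickly outgrows physical testing.  The factor names and counts below are invented for illustration only.

```python
from math import prod

# Hypothetical coarse discretization of test conditions for each product.
vacuum_factors = {
    "floor type": 4,
    "obstacle layout": 10,
    "lighting": 3,
}

vehicle_factors = {
    "road type": 8,
    "weather": 6,
    "lighting": 4,
    "traffic density": 5,
    "pedestrian behavior": 20,
    "other-driver behavior": 20,
    "road surface condition": 5,
}

def scenario_count(factors: dict) -> int:
    """Number of combinations in a full-factorial test matrix."""
    return prod(factors.values())

print(scenario_count(vacuum_factors))   # 120 -- plausibly testable
print(scenario_count(vehicle_factors))  # 1,920,000 -- and still very coarse
```

Each added factor multiplies, rather than adds to, the scenario count, which is why full-coverage testing becomes impractical for products with richer environments.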


So if the nature of traditional product testing must change as products become more intelligent and are required to use this intelligence under a variety of uncertain, dynamic conditions, what are the risk factors?  Some general risk factors associated with assessing the decision-making capabilities of an intelligent agent in products meant for commercialization include:


Decision Space: the decision space could be much larger than what is practically possible to test to establish proper confidence levels in safety performance.


Learning on the Job: the intelligent agent is learning while being used and therefore the underlying decision making algorithms are changing.


Sensing and Perception: the design of the testing protocols must also consider the quality and quantity of inputs that help the intelligent agent assess a situation and make a decision.
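One concrete consequence of the "Learning on the Job" factor above: if the agent's decision-making model keeps changing after deployment, a safety certification tied to the model that was tested no longer describes what is actually running.  A minimal mitigation sketch (all names illustrative) is to fingerprint the model at certification time and refuse to treat any later version as certified:

```python
import hashlib

def fingerprint(model_weights: bytes) -> str:
    """Content hash identifying an exact version of the decision model."""
    return hashlib.sha256(model_weights).hexdigest()

# Recorded once, at the moment the safety tests pass.
certified = fingerprint(b"weights-v1")

def is_certified(model_weights: bytes) -> bool:
    """True only if the running model is byte-identical to the tested one."""
    return fingerprint(model_weights) == certified

assert is_certified(b"weights-v1")
assert not is_certified(b"weights-v1-after-online-learning")
```

This does not solve the problem - it merely makes the drift detectable; deciding whether and how an agent may keep learning after certification remains an open question for testing standards.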


This is a very challenging problem, and it does not even include the risk factors associated with the specific form of the intelligent agent [4].  It is unlikely that any one company or industry can address this challenge on its own [5].  These risks can only be understood and mitigated by a collaboration that goes beyond any one industry, spans private and public agencies, and even crosses national borders.





