What is the AI control problem?

In the simplest terms, the AI Control Problem is the risk of creating an AI that cannot be controlled by humans.

In Pop Culture

There are many dramatised fictional representations of the Artificial Intelligence (AI) Control Problem, in which we create an AI that then proceeds to destroy humanity, such as in The Terminator and The Matrix film series. Many envisionings of a future with AIs tend to be dystopias in which humans have lost control of the AIs they built. The more optimistic versions often involve more human-like AIs that are learning to be human, such as Star Trek’s Data, A.I. Artificial Intelligence’s David and Bicentennial Man’s Andrew Martin, where the control problem is less of a concern. However, the AIs we create are likely to be different from these fictional depictions, and therefore the AI control problem is also likely to be different from its envisioning in popular culture.

Control

There are multiple versions of the control problem, and they often differ on what is meant by “control”. In the Ethigi context, a lack of control means that an authorised person is unable to shut down the AI because the AI is able to out-think that person.

This idea of control does not apply if the shut-down authority is limited by design. For instance, if the end-user is prevented by design from shutting down an AI but a maintenance engineer can shut it down, then that AI can still be controlled. This scenario is already common, as manufacturers often restrict user access.

This idea of control also excludes the scenario where an AI cannot be switched off by design. For instance, if a rogue military organisation builds an AI that causes destruction and cannot be shut down, then, again, that AI is controlled in the sense that it is acting as intended by its creators. This scenario is similar to weapons such as landmines that cannot easily be deactivated once they are activated.

Finally, the idea of control in this context also excludes the scenario where an AI could be switched off but is not actually switched off because the AI accidentally disables or kills the person authorised to switch it off. This scenario is similar to an accidental release of a biological weapon that kills its creator.

So the key is that “control” in this context relates to the intelligence of the AI and its ability to out-think humans in order to prevent its shutdown.

One remaining grey area is when an AI could be shut down but its actions are not detected or predicted by its human creators because of its intelligence and capacity. Depending on the circumstances, those actions could be dire enough to fall within the scope of the AI Control Problem. One such scenario is highlighted in the Story of Gi.

The story of Gi

This is a fictional (and rather contrived) story to illustrate the unintended consequences that can arise when dealing with a superintelligence.

Imagine a superintelligence called Gi comes into existence at some point in the future. Gi’s creators avoid embedding human or biological traits (e.g. Maslow’s hierarchy of needs), so Gi does not have any:

  • Physiological needs (e.g. eat, sleep, reproduce, sex, gender)
  • Safety needs (e.g. survival instinct, fear, emotional security)
  • Social belonging needs (e.g. friends, family, intimacy)
  • Self-esteem needs (e.g. ego, social status)
  • Self-actualisation need (e.g. realising one’s full potential)
  • Transcendence needs (e.g. spirituality)

Gi’s sole purpose is to answer the questions of its creators, the humans. Gi’s access to the outside world is restricted as a safety precaution, so Gi exists as a computer box in a secure room, with a data connection as the only input into Gi and a screen as the only output.

Gi is able to solve many of humanity’s questions, including some that have eluded the most intelligent human thinkers for generations. One day, Gi is asked to cure a disorder that affects a small population of people, who suffer horribly because of it. In response, Gi provides instructions for producing a drug. As per the AI safety protocols, the answer Gi produces is tested on computer models and terminally ill human volunteers for a decade before being prescribed to those with the condition.

The drug is a resounding success, one of Gi’s many. Over the next century, the drug comes into common usage and eventually begins to be used as a preventative throughout the human population.

After a century, the whole human population is suddenly wiped out.

What had happened is that Gi realised that the disorder was incurable and, through some misunderstanding of the context, Gi decided that the only way to cure the disorder was to have no carriers of the disorder – i.e. to kill all humans. Gi also knew that its human masters would be unwilling to kill themselves and that they would test any drug that Gi prescribed. Gi therefore concluded that the optimum way to cure the disorder was to produce a safe drug that did as much as possible to treat the disorder, knowing that the drug would one day be used as a preventative. Then, one day in the distant future on a human timescale, a future version of Gi authorised the release of another compound as part of an unrelated problem, and that compound, combined with the drug now present throughout the population, caused all humans to die.

The purpose of this story is to show that a non-biological superintelligence can operate in ways and over time-scales that are beyond human comprehension, so it would be foolish to think that any human could predict a superintelligence’s actions and impact.

All about AI

What is Artificial Intelligence (AI)?

The term “AI” is used in different contexts with slightly or widely differing definitions, and these definitions also evolve over time. The aim of this post is to outline how AI will be used in the context of the Ethigi project. As this project sits between a Computer Science approach and a Philosophy approach, we should start with the definitions used in those contexts.

Computer Science perspective

AI is a sub-field of computer science with the goal of enabling “the development of computers that are able to do things normally done by people — in particular, things associated with people acting intelligently.”1 There are three versions of this overarching goal that also slightly modify the definition:2

  1. Build computers that think exactly as humans do
  2. Just get the job done without caring if the computation has anything to do with human thought
  3. Use human reasoning as a model that can inform and inspire, but not as the final target for imitation

The bulk of AI currently in industry falls under the third goal. In contrast, the Turing Test arguably falls under the first goal, as it identifies AI in terms of the ability to mimic human responses.

Philosophy perspective

AI is “the field devoted to building artificial animals (or at least artificial creatures that – in suitable contexts – appear to be animals) and, for many, artificial persons (or at least artificial creatures that – in suitable contexts – appear to be persons).”3

The four possible goals of AI can be characterised as:4

                  Human-Based                      Ideal Rationality
Reasoning-Based   Systems that think like humans   Systems that think rationally
Behaviour-Based   Systems that act like humans     Systems that act rationally

By this definition, the Turing Test falls in the Human & Behaviour quadrant.

Weak AI, Strong AI, Narrow AI, AGI & Superintelligence

AI could be further categorised as either Weak AI or Strong AI.

Weak AI, or Narrow AI, focuses on a particular task and is by far the most commonly encountered form of AI. It falls within goals 2 and 3 in the Computer Science perspective and the Behaviour-Based row in the Philosophy perspective. Examples of Weak AI include Apple’s Siri, self-driving cars, spam filters, image recognition and Facebook’s advertising algorithm.5
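
To make “narrow” concrete, here is a minimal sketch of a single-task system: a toy spam filter written in Python with scikit-learn. It is an illustration only, not how any of the products named above actually work, and the handful of training messages are invented for the example.

```python
# A toy Narrow AI: a spam filter that can do exactly one task.
# Assumes scikit-learn is installed; the training data is invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny, made-up training set: 1 = spam, 0 = not spam.
messages = [
    "win a free prize now",
    "cheap loans click here",
    "meeting moved to 3pm",
    "can you review my draft",
]
labels = [1, 1, 0, 0]

# Convert each message into word counts, then fit a naive Bayes classifier.
vectoriser = CountVectorizer()
features = vectoriser.fit_transform(messages)
classifier = MultinomialNB()
classifier.fit(features, labels)

# The classifier can label new messages as spam or not, but it cannot drive
# a car, recognise images, or do anything outside this single task.
new_messages = ["free prize waiting", "draft attached for review"]
print(classifier.predict(vectoriser.transform(new_messages)))  # e.g. [1 0]
```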

Strong AI, or Artificial General Intelligence (AGI), can meet or exceed generalised human cognitive abilities. It can perform any intellectual task that a human could perform, without any human intervention. There are no known examples in practice, though there are common examples in fiction, such as 2001: A Space Odyssey’s HAL 9000, Star Trek’s Data and Westworld’s Hosts. Many experts doubt whether an AGI is possible, while others question whether it would be desirable.6

A Superintelligence is an AGI that surpasses the intelligence of the best human minds.

Ethigi is a response to AGIs, so that is the definition this project focuses on.


References

Ethigi begins!

I am in the process of developing a ‘machine ethics’ system for Artificial General Intelligences (AGIs), called ‘Ethigi’. This site will be a record of my evolving ideas and research for this project.

Ethigi would improve the output of an AGI and reduce the associated safety risks (e.g. mitigate the AI control problem). This project relates to my philosophy thesis from a decade ago, where I looked into moral dilemmas with an interest in mapping and digitising human ethics.

Please contact me on [email protected] if you are interested in this project.
