01 Aug 2020

Many of the facts and events that we observe can have multiple and even inconsistent interpretations yet people are generally able to make sense of it all. So how does an AI like Zoea deal with this kind of messy complexity?

There are a number of different ways that we can draw inferences from our observations. Mathematicians and philosophers often reason about the world using deduction. Deduction predicts an outcome from a set of observations based on existing rules. The classic example is the rule that 'all swans are white'. So if we read about a swan then using this rule we assume it will be white. Deduction guarantees the truth of the conclusion provided our knowledge of the conditions is accurate and complete. This is also the main problem with deduction. In the real world swans aren't always white.

Induction is another way of reasoning where we create new rules from a set of observations. For example, suppose we see a lot of cars and every one of them is black. In this case induction would allow us to create the rule that 'all cars are black'. Induction is how scientists often reason but it can also perform poorly with unexpected observations or outcomes.

Abduction tries to find the best explanation for an observation given a set of existing rules. For example if an electric light stops working then it is most likely to be a fault with the bulb - but it could also be a power cut or a mouse chewing through the wire. Abduction doesn't necessarily dictate a single outcome nor does it guarantee that the chosen explanation is correct. However these apparent drawbacks are also advantages. The real world is messy and there can be multiple possible explanations for a set of observations. This is why abduction is often used by detectives. It involves evidence, clues and conjecture rather than just facts and rules.

When the Zoea compiler generates a program from a set of test cases it is producing an explanation. The generated program is the best hypothesis that Zoea can come up with that accounts for how all of the inputs are transformed into outputs in every test case. In some simple examples the code is the same for every test case but most programs are more complex.

All non-trivial programs have more than one code path. Each test case encodes the mapping from the input to the output for a single code path as well as at least part of the logical condition under which that code path is selected. Zoea analyses each test case to understand what data manipulation is carried out. When it finds a plausible piece of code for a single test case it also tries that code against all of the other cases.

This results in a set of code fragments and for each fragment a list of applicable test cases. The complete program will include the smallest subset of fragments that covers all of the test cases. Finding this subset is what is called a set-cover problem. Set-cover can take a lot of effort if we have many fragments as the number of possible combinations quickly becomes very large. However we can get an answer more quickly if we start with those with the largest applicable lists and then try to fill the gaps.

The code fragments corresponding to different code paths still need to be combined using some conditional logic. This is usually expressed in terms of absolute or relative comparison of inputs or derived values across the different test cases. A very similar set-cover approach also works to identify the simplest logical expressions that select each of the test cases.

Set-cover is used in a number of other places within Zoea. As such it plays a fundamental role in our abductive reasoning approach.