News-220130

30 Jan 2022

Zoea contains a lot of knowledge about programming language concepts and how they can be utilised to produce code. It turns out that the way this knowledge is represented and structured is surprisingly simple.


Coding knowledge in Zoea is organised as a set of software components called knowledge sources. There are many different knowledge sources in Zoea, and each one deals with a particular kind of software element such as a statement, a condition, a control structure or a data structure. Each knowledge source works out for itself where and how it can be applied, and collectively the knowledge sources determine the overall structure of the required program.


Knowledge sources communicate with the user and with each other only through a central shared database called the blackboard. In broad terms, the blackboard is used to track the automatic transformation of a set of test cases into the equivalent code. To produce a solution, different knowledge sources come up with many hypotheses about different fragments of the required code. These hypotheses are expressed and verified through the creation of synthetic test cases. In effect, the knowledge sources break each problem down into many smaller sub-problems that can be solved more easily.
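
As a rough illustration of the idea, the sketch below models a blackboard as a shared store of problems (sets of test cases) and the code proposed for them. It is written in Python purely for illustration; the names Case, Blackboard, post_problem and post_solution are hypothetical and are not taken from Zoea itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Case:
    """One test case: a tuple of inputs and the expected output (hypothetical type)."""
    inputs: tuple
    output: object


@dataclass
class Blackboard:
    """Hypothetical shared store: problems (sets of test cases) and proposed code."""
    problems: dict = field(default_factory=dict)
    solutions: dict = field(default_factory=dict)

    def post_problem(self, cases):
        # User test cases and synthetic test cases are stored in the same way.
        problem_id = len(self.problems)
        self.problems[problem_id] = list(cases)
        return problem_id

    def post_solution(self, problem_id, code):
        # Several knowledge sources may propose code for the same problem.
        self.solutions.setdefault(problem_id, []).append(code)


board = Blackboard()
user_problem = board.post_problem([Case(([1, 2, 3],), 6)])
sub_problem = board.post_problem([Case((1, 2), 3)])  # a synthetic sub-problem
board.post_solution(sub_problem, "a + b")
```

Nothing here passes messages directly between components: a knowledge source only reads problems from the board and writes new problems or candidate code back to it.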


The input to a Zoea knowledge source is always a set of test cases. These might be the test cases provided by the user or synthetic test cases created by another knowledge source - it makes no difference either way. The first thing that every knowledge source does is check whether it is relevant to the current test cases. This step is called the trigger, and at a minimum it involves checking the number and data types of the inputs and outputs in the test cases. For example, a knowledge source that deals with arrays will want to see at least one example of data that looks like an array in the input. If a knowledge source determines that it is not appropriate for the current test cases then it does no further work. However, for any given set of test cases there will always be a subset of knowledge sources that are appropriate.
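
A trigger for the array example could look something like the sketch below. It assumes, for illustration only, that a test case is a pair of (inputs, output) and that "looks like an array" means a Python list or tuple; the function name array_trigger is hypothetical.

```python
def array_trigger(test_cases):
    """Return True if at least one input in any test case looks like an array.

    Assumption for this sketch: each test case is an (inputs, output) pair,
    where inputs is a tuple of values.
    """
    return any(
        isinstance(value, (list, tuple))
        for inputs, _output in test_cases
        for value in inputs
    )


sums_an_array = [(([1, 2, 3],), 6)]   # one input, and it is an array
adds_two_numbers = [((1, 2), 3)]      # two scalar inputs, no array

relevant = array_trigger(sums_an_array)
not_relevant = array_trigger(adds_two_numbers)
```

A check this cheap lets most knowledge sources drop out early, so only the relevant subset does any further work.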


Once a knowledge source has established that it is suitable for a given set of test cases, the next step is to identify one or more combinations of input and output elements to focus on. For example, say we have a knowledge source that accepts two numbers as input and we provide test cases with three numeric inputs. In such cases the knowledge source will consider different combinations of the inputs.
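
For the two-from-three example, the candidate combinations can be enumerated as in the sketch below. Whether Zoea also considers different orderings of the inputs is not stated here, so this sketch assumes unordered combinations; input_combinations and project are hypothetical helper names.

```python
from itertools import combinations


def input_combinations(num_inputs, arity):
    """List the index tuples that a knowledge source of the given arity could try."""
    return list(combinations(range(num_inputs), arity))


def project(test_case, indices):
    """Keep only the chosen inputs of an (inputs, output) test case."""
    inputs, output = test_case
    return (tuple(inputs[i] for i in indices), output)


case = ((2, 3, 10), 5)  # three numeric inputs, one output
candidates = [project(case, idx) for idx in input_combinations(3, 2)]
```

Each candidate is then a separate hypothesis about which inputs the knowledge source should work on.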


Every knowledge source in Zoea corresponds to a piece of code. This code is generally just a fragment of the complete solution for the current set of test cases. It also represents a hypothesis in the sense that it may or may not be the correct piece of code. Assuming it is correct, the knowledge source must also provide a way to generate the rest of the code that is required. This is done by producing a new set of synthetic test cases that are derived from the input test cases. For each knowledge source the way in which this is done is both simple and mechanical. For example, a knowledge source that deals with array inputs might create a new synthetic test case for each element of the array. We call this the synthetic test case mapping. Assuming that the code for this knowledge source is correct, the synthetic test cases represent the problem of generating the rest of the required code.
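
The array example can be sketched as follows. The sketch assumes a hypothetical knowledge source whose hypothesis is "apply the same operation to every element", so it only fits test cases whose output is an array of the same length as the input; the function name and the (inputs, output) representation are assumptions made for illustration.

```python
def map_synthetic_cases(test_cases):
    """Derive one synthetic test case per array element.

    Assumptions for this sketch: each test case is an (inputs, output) pair
    with a single array input, and the hypothesis is an element-wise
    operation. Returns None when the hypothesis cannot fit the test cases.
    """
    synthetic = []
    for (array,), output in test_cases:
        if not isinstance(output, list) or len(array) != len(output):
            return None  # hypothesis rejected: output is not a matching array
        synthetic.extend(((x,), y) for x, y in zip(array, output))
    return synthetic


doubling = [(([1, 2, 3],), [2, 4, 6])]
sub_problem = map_synthetic_cases(doubling)
```

The resulting sub-problem (turn 1 into 2, 2 into 4, 3 into 6) is simpler than the original and contains no arrays at all.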


Synthetic test cases are processed by Zoea in exactly the same way as regular test cases. This means that our synthetic test cases will be processed by all the relevant Zoea knowledge sources and eventually - assuming the current hypothesis is correct - we will see some code solutions for the synthetic test cases appear on the blackboard. When this happens, the last thing that our knowledge source must do is plug the code for the synthetic test cases into the code template for the knowledge source. Often this will involve some transformation of the synthetic test case code, but again this is generally a simple and mechanical process.
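
As an illustration of this last step, the sketch below plugs a sub-solution into a template for an element-wise array operation. The template, the use of Python as the generated language, and the variable renaming are all assumptions made for this example and are not Zoea's actual templates or transformations.

```python
import re

# Hypothetical code template for an element-wise array knowledge source.
TEMPLATE = "[{body} for {var} in {array}]"


def plug_in(sub_code, sub_var, array_name, loop_var="item"):
    """Insert sub-problem code into the template, renaming its variable.

    The renaming stands in for the simple, mechanical transformation that
    the sub-problem code may need before it fits the template.
    """
    body = re.sub(rf"\b{re.escape(sub_var)}\b", loop_var, sub_code)
    return TEMPLATE.format(body=body, var=loop_var, array=array_name)


# Suppose "x * 2" appeared on the blackboard as a solution to the sub-problem.
code = plug_in("x * 2", "x", "values")

# Check the assembled code against the original test case [1, 2, 3] -> [2, 4, 6].
result = eval(code, {"values": [1, 2, 3]})
```

Running the assembled code against the original test cases is what confirms or rejects the hypothesis as a whole.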


We now have a complete piece of code that will turn at least some of the inputs into one of the outputs in at least one of the test cases provided. In itself this usually isn't the complete solution but it is a long way down the road towards achieving it. The rest is another story.