Genome Compiler
Relationship with the state of the art

Genetic Programming (GP) is computationally demanding, so its application to complex problems is limited by the available computational power. It is, therefore, imperative to increase the efficiency of GP.

This has led some researchers to focus on "Low-Cost EC" and to propose a variety of approaches to enhance the speed of GP [Ochoa 97]. However, this speed is frequently achieved at the expense of the quality of the evolved solutions.

Recently, two methods that do not hinder the efficacy of GP were proposed:

- Directly evolving machine-code programs [Nordin 94];

- Online compilation of the individuals [Fukunaga 98].

In spite of the huge speed improvements achievable with these approaches, they have not become popular, mainly because they are difficult to implement and to adapt to different problems. The recent release of Discipulus, a commercial implementation of Nordin's approach, may bring efficient GP to a broader number of researchers, but at the cost of $3995 for a one-year licence.

In standard GP approaches, evaluating an individual implies traversing its tree and calling, for each node, the appropriate function [Koza 92]. Usually these functions are very simple and could be executed through a single machine-code instruction. By directly evolving machine-code programs, Nordin avoids the interpretation stage altogether, making his system about 60 times faster. Fukunaga's system, in spite of the overhead of the compilation stage, is 50 times faster than standard implementations in tasks where the individuals must be evaluated over several fitness cases [Fukunaga 98].
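
The per-node overhead is easy to see in a minimal sketch of such an interpreter (the node layout below is hypothetical and purely illustrative): every node, even a single-instruction addition, pays the full cost of a function call.

```c
typedef enum { OP_ADD, OP_MUL, OP_VAR } OpCode;

typedef struct Node Node;
struct Node {
    OpCode op;
    int    var_index;     /* used when op == OP_VAR */
    Node  *left, *right;  /* NULL for terminal nodes */
};

/* One fitness case = one full recursive traversal: each node triggers a
   function call, even when the operation is a single machine instruction. */
static double eval(const Node *n, const double *vars) {
    switch (n->op) {
    case OP_ADD: return eval(n->left, vars) + eval(n->right, vars);
    case OP_MUL: return eval(n->left, vars) * eval(n->right, vars);
    case OP_VAR: return vars[n->var_index];
    }
    return 0.0; /* unreachable for well-formed trees */
}
```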

The efficiency gains of these systems are mainly due to the removal of unnecessary function calls, not to an increased efficiency in the execution of the f-set operations. Thus, as the complexity of the f-set functions increases, the speed improvements will decrease.
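
A simple cost model (ours, not taken from the cited papers) makes this explicit: if evaluating a node costs t_op for the operation itself plus t_call for the interpretation overhead, then removing that overhead yields a speedup of (t_call + t_op) / t_op, which tends to 1 as t_op grows. For a cheap primitive such as addition the ratio is large; for an expensive user-defined function it is negligible.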

One of the objectives of our project is to overcome this drawback. The basic idea is to decrease the number of operations required to evaluate an individual.

To achieve this, we will resort to a collection of techniques. Some of them are widely known (e.g. code optimisation, intron detection) but have never been applied in this context.
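
As an illustration of the kind of opportunity intron detection exposes, consider the classic multiply-by-zero pattern. The sketch below (hypothetical node layout; not our actual algorithm, which is more general) flags such subtrees so the evaluator can skip them entirely.

```c
#include <stdbool.h>

typedef enum { OP_ADD, OP_MUL, OP_CONST, OP_VAR } OpCode;

typedef struct Node Node;
struct Node {
    OpCode  op;
    double  value;        /* used when op == OP_CONST */
    Node   *left, *right; /* NULL for terminal nodes */
};

/* A subtree multiplied by the constant 0 can never influence the
   individual's output: it is an intron, and the evaluator may return 0
   instead of traversing it. */
static bool is_mul_by_zero(const Node *n) {
    if (n == NULL || n->op != OP_MUL)
        return false;
    return (n->left->op  == OP_CONST && n->left->value  == 0.0) ||
           (n->right->op == OP_CONST && n->right->value == 0.0);
}
```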

In [Machado 99] we proposed caching the results of the execution of the individuals' sub-trees, and developed caching algorithms suited to the demands of GP. When used alone, this approach made a standard C implementation 3 times faster; when used in conjunction with T-functions [Machado 99], the system became 17 times faster.
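
The sketch below conveys the basic idea (the data layout is hypothetical; the actual algorithms are described in [Machado 99]): each sub-tree is mapped to a key, and its vector of results over all fitness cases is stored, so a sub-tree shared by many individuals is evaluated once rather than once per parent.

```c
#include <string.h>

#define N_CASES     100   /* number of fitness cases (illustrative) */
#define CACHE_SLOTS 4096

typedef struct {
    unsigned long key;              /* hash of the sub-tree's structure */
    int           valid;
    double        results[N_CASES]; /* the sub-tree's value per fitness case */
} CacheEntry;

static CacheEntry cache[CACHE_SLOTS];

/* Returns the cached result vector for `key`, or NULL on a miss. */
static const double *cache_lookup(unsigned long key) {
    const CacheEntry *e = &cache[key % CACHE_SLOTS];
    return (e->valid && e->key == key) ? e->results : NULL;
}

/* Stores the result vector of a freshly evaluated sub-tree. */
static void cache_store(unsigned long key, const double *results) {
    CacheEntry *e = &cache[key % CACHE_SLOTS];
    e->key   = key;
    e->valid = 1;
    memcpy(e->results, results, sizeof e->results);
}
```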

Our system has the potential to outperform previous approaches in the situation mentioned above (a complex f-set). We also believe that it will outperform them in complex domains. As noticed in [Banzhaf 97], the number of introns (and thus the number of unnecessary operations) grows exponentially; hence the performance of Fukunaga's and Nordin's systems will degrade, when compared with our system, as the number of generations increases.

It is important to notice that these approaches were not thoroughly tested; for instance, the only performance measures of Fukunaga's system relate to the symbolic regression of f(x)=x^8. We intend to conduct a comprehensive set of experiments in order to assess the performance of our system, identify its strengths and shortcomings, and evaluate its usability.

Our system will allow the inclusion of user-written C functions in the f-set, which is extremely useful both in terms of usability and applicability.
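
As an illustration, a user-written primitive might be plugged in as follows (the registration interface shown is hypothetical; the actual API is part of the work to be done):

```c
#include <math.h>

typedef struct {
    const char *name;                 /* name used in evolved expressions */
    int         arity;
    double    (*fn)(const double *);  /* the user's C implementation */
} FSetEntry;

/* A user-defined primitive written directly in C. */
static double user_sigmoid(const double *args) {
    return 1.0 / (1.0 + exp(-args[0]));
}

/* Extending the f-set amounts to adding one entry to this table. */
static const FSetEntry user_f_set[] = {
    { "sigmoid", 1, user_sigmoid },
};
```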