# Hierarchical Genetic Algorithms

Pascal Francq

June 24, 2013 (May 14, 2010)

Abstract

The Hierarchical Genetic Algorithms (HGA) are a kind of Genetic Algorithms (GA) applicable to solve hierarchical problems.

## 1 Definition

The Hierarchical Genetic Algorithms (HGA) were developed to solve a particular class of hierarchical problems: the tree to build refers to the set theory. In fact, HGA are a genetic framework for placement problems, i.e. each particular problem needs its own customization. As the name suggests, HGA are an extension of the conventional Genetic Algorithms.
Starting from a set of objects, , each object being described by a set a features, the HGA build a hierarchy of nodes where these objects are assigned in order to optimize a given cost function. An example is the building of taxonomy based on concepts extracted from a document corpus.

## 2 Encoding

In the HGA, the chromosomes represents a given tree. The coding scheme includes a node element and an object element. In practice, each chromosomes is represented by:
1. An array of nodes, each node containing the index of the first child node and the number of its child node.
2. An array holds the assignation of each object.
Figure 1↓ proposes an example of a tree defining 6 nodes of 9 objects.
The corresponding chromosome is presented at Figure 2↓. The “left part” of the chromosome represents the description of the hierarchy of nodes. The array begins with the two top nodes ( and ). Each class is described as a pair of . For example, is defined by since the index of the first sub-node () is 1 and it has two child nodes ( and ). The “right part” of the chromosome represents the assignment of the different object. For example, object is assigned to .

## 3 Evaluation

The evaluation of the chromosomes is an important feature of a genetic algorithm. Since the goal is to obtain a balanced tree, an idea is to minimize the average length of the path to reach each object. To compute the length of a given path, I propose to sum the depth of the path (the number of nodes to traverse) with the number of choices at each intermediate node. Let us define as the number of child nodes of ( is the number of top nodes) and as the number of objects assigned to node . Moreover, let us define a cost, , for each node, :
where is a function returning the parent node of and is the root node.
The cost function, , to minimize is defined as:
where is a function returning the node where object is assigned. Figures 3↓ and 4↓ show two examples of tree. When comparing and , it seems clear that is more unbalanced than .
Let us compute the cost function for these trees:
As expected, .

## 4 Heuristic

Since the GA are meta-heuristics, the heuristics play an important role and must be adapted for the particularity of the problem. This section presents the heuristic used by the HGA. Algorithm 1↓ shows the generic idea : the objects are randomly selected and a node in the tree is searched that can hold the current object.
Require:
Tree

Build a random array of unassigned objects of ,
for ():

Algorithm 1 The heuristic, .
Algorithm 2↓ describes the method that searches for a node that will hold an object ( is a function returning the list of attributes of an object or a node). The basic idea is to find the “first highest” node that can hold a given object. The method parses each level in the hierarchy to find a node that can hold a given object. To be selected, a node must share at least one attribute with the object. Three situation may occur:
1. The node has exactly the same attributes as the object : The object will be attached to it.
2. All the attributes of the node are attributes of the object : The method looks further in the branch of the tree corresponding to it.
3. Some attributes of the node are attributes of the object : an intermediate node with the common attributes must be created.
Of course, if no node shares some attributes with the object, a new root node must be created.
Require:
Object to assigned and a tree

{Start at the root node}
loop:
Build a random array containing all the child nodes of ,
False

for ():

if ():
continue
if (): {Good branch was found}
if ():
return

True
break

if ():

if ( True):
break

if (Node=0):
return

{ contains some attributes not representing the object.}

if ( and ()):
return
{Create a new node attached to }
{ has as child}
return { has a new node containing the object}
Algorithm 2 Find a node for object , .

## 5 Crossover

The role of the crossover operator is to construct a new chromosome using two parents. If the operator mix “good patterns” from both parents, the child chromosome should be a “good” one. The crossover proposed for the HGA combines the trees of both parent based on a crossing site defined as two nodes (one in each parent) having the same attributes. It is then possible to replace the branch of one tree with the other. In practice, the operator verifies that an object is not assigned twice and run the heuristic if there remains unassigned objects. Algorithm 3↓ describes the crossover operator ( is a function returning the list of objects attached to a node).
Require:
Two parent trees, and , and a child tree,

Build a random array containing all the nodes of ,
Build a random array containing all the nodes of ,

for ():
for ():
if ():

break

if ():

else:
print “Error: no crossing site”
Algorithm 3 Crossover of the HGA
An important element is that, if a crossing site is not found (there are not two nodes with the same attributes from each tree), the crossover cannot be done.

## 6 Mutation

Te role of the mutation operator is modify a given chromosome to explore new portions of the search space. Most of the time, it is implemented as a little modification of the chromosome. I apply this idea by choosing randomly a node and delete it, and using then the heuristic to attach the unassigned objects (Algorithm 4↓).
Require:
Tree

Choose randomly a node,

Algorithm 4 Mutation of the HGA

## 7 Inversion

The inversion, whose role consists in presenting differently a given chromosome without changing its corresponding information, is not implemented in the HGA, because all the operators are based on the “interpretation” of the chromosomes rather than on the encoding of the chromosomes.