Building GALILEI Platform

Pascal Francq

November 23, 2014 (January 11, 2011)

Abstract

This document describes how to download, compile and install the GALILEI platform.
Several steps must be followed to download, compile and use the GALILEI platform (libraries, plug-ins and applications). All the source code is available through our subversion server svn.otlet-institute.org.

1 The GALILEI Platform

The GALILEI platform is an open source implementation of the GALILEI framework. Currently, the platform is divided into three layers:
  1. The GALILEI library (section 2↓) provides a C++ API to manage objects (such as documents or profiles) and processes (such as document clustering).
  2. Plug-ins (section 3↓) implement several algorithms to solve specific problems (for example particular genetic algorithms for document or profile clustering). Users can develop their own plug-ins to implement new algorithms.
  3. Two applications (section 4↓) are available to pilot the GALILEI platform: one with a graphical user interface (QGALILEI) and one that can be used for batch jobs (UpGALILEI).
All official libraries and applications related to the GALILEI platform are managed through cmake. The GALILEI library classes are defined in the GALILEI namespace.

2 The GALILEI library

The GALILEI library depends only on the libraries of the R Project, so the first step is to compile these libraries. Next, you must build the GALILEI library itself. The Qt toolkit (with its development files) is necessary if you want to compile the graphical widgets. You can download the source code from our subversion server:
svn co svn://svn.otlet-institute.org/home/subversion/galilei/trunk galilei
To build the GALILEI library, go to the checked-out directory, create a sub-directory (for example ’build’) and go into it. In this latter directory, type:
cmake .. [OPTIONS] 
make or make VERBOSE=1
Two interesting options are:
-DCMAKE_BUILD_TYPE=Debug This option generates all the necessary debugging symbols.
-Ddisable-qt=true This option disables the support for Qt.
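The two options can be combined on the same cmake command line. The following is a minimal sketch, not part of the GALILEI distribution: the function name galilei_cmake_args and the variables DEBUG and NO_QT are hypothetical, and only the two options listed above are assumed to exist.

```shell
#!/bin/sh
# Sketch: assemble the cmake options described above.
# galilei_cmake_args, DEBUG and NO_QT are hypothetical names used for illustration.

galilei_cmake_args() {
    args=""
    # Add the debugging symbols when DEBUG=1.
    if [ "${DEBUG:-0}" = "1" ]; then
        args="$args -DCMAKE_BUILD_TYPE=Debug"
    fi
    # Build without Qt support when NO_QT=1.
    if [ "${NO_QT:-0}" = "1" ]; then
        args="$args -Ddisable-qt=true"
    fi
    # Strip the leading space and print the result.
    echo "${args# }"
}

# Typical use, from the root of the checked-out 'galilei' directory:
#   mkdir -p build && cd build
#   cmake .. $(galilei_cmake_args)
#   make VERBOSE=1
```

Keeping the option logic in one function makes it easy to reuse the same settings for the library, the plug-ins and the applications.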
As for the R libraries, you can use the GALILEI library in two different ways:
  1. The library and the include files are installed (for example in /usr/include/r and /usr/lib). This can be done after the compilation with the command:
    sudo make install
    
  2. The library can be used without being installed. In that case, you may create an environment variable GALILEI_LIB pointing to the root directory containing the GALILEI library.
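For the second option, the environment variable can be set in the shell before running cmake on your own application. This is a minimal sketch: the helper name use_galilei_tree is hypothetical, and so is the location of the checkout used in the example comment.

```shell
#!/bin/sh
# Sketch: point GALILEI_LIB at an uninstalled GALILEI source tree.
# use_galilei_tree is a hypothetical helper; the path passed to it is up to you.

use_galilei_tree() {
    root=$1
    # Refuse a path that does not exist, so that a typo is caught early.
    if [ ! -d "$root" ]; then
        echo "GALILEI root directory not found: $root" >&2
        return 1
    fi
    # Store an absolute path so the variable stays valid from any directory.
    GALILEI_LIB=$(cd "$root" && pwd)
    export GALILEI_LIB
}

# Example (hypothetical location of the checkout):
#   use_galilei_tree "$HOME/src/galilei" && echo "$GALILEI_LIB"
```

Because the function exports the variable into the current shell, the file containing it has to be sourced rather than executed as a separate script.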
The simplest method to include the GALILEI library in your application is to follow the steps described for using the R libraries. In practice, you have to add a line in the “prj.cmake” file:
R_LOAD_DEPENDENCY("GALILEI" "GALILEI_LIB" "galilei")
In the code sub-directory of your application, you must specify in the “CMakeLists.txt” file that it must be linked with the GALILEI library:
TARGET_LINK_LIBRARIES(testgalilei rcore rmath roptimization galilei)

3 The Plug-ins

Several plug-in projects are available; each project provides one or several plug-ins. The compilation procedure is identical for all of them. The Qt toolkit is necessary to compile the graphical parts (configuration dialog boxes). To download the source code of a project “prj”:
svn co svn://svn.otlet-institute.org/home/subversion/prj/trunk prj
To build the project, go to its directory, create a sub-directory (for example ’build’) and go into it. In this latter directory, type:
cmake .. [OPTIONS] 
make or make VERBOSE=1
You may specify two options:
-DCMAKE_BUILD_TYPE=Debug This option generates all the necessary debugging symbols.
-Ddisable-qt=true This option disables the support for Qt.
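Since the procedure is the same for every plug-in project, it can be scripted. The sketch below only prints the commands and runs nothing; the helper name plugin_commands is hypothetical, and it assumes that the repository path of each project matches its name, following the “prj” pattern shown above.

```shell
#!/bin/sh
# Sketch: print the checkout and build commands for several plug-in projects.
# plugin_commands is a hypothetical helper; it prints commands and runs nothing.
# Assumption: each repository path matches the project name ("prj" pattern).

SVN_ROOT="svn://svn.otlet-institute.org/home/subversion"

plugin_commands() {
    for prj in "$@"; do
        # Checkout of the project trunk.
        echo "svn co $SVN_ROOT/$prj/trunk $prj"
        # Out-of-source build in a 'build' sub-directory.
        echo "(mkdir -p $prj/build && cd $prj/build && cmake .. && make)"
    done
}

# Example: review the commands first, then pipe them to a shell to run them.
#   plugin_commands filters langs
#   plugin_commands filters langs | sh
```

Printing the commands before executing them makes it possible to check the generated URLs against the server before anything is downloaded.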
Plug-ins do not need to be installed: the GALILEI platform manages a list of directories that it searches recursively for plug-ins. Note, however, that there are some dependencies between plug-ins. Currently, the "official" plug-in projects are:
clustering-evaluation It proposes a plug-in that implements three measures to evaluate the different clustering algorithms for validation purposes: the adjusted Rand index, the recall and the precision. This plug-in is only useful for researchers.
featureseval It provides plug-ins for measures that evaluate the distribution weight of a given concept in a set of objects (documents, profiles, etc.), such as the idf factor. This plug-in requires that a current plug-in be selected for the measure of the category “Features Evaluation” (for example featureeval).
feedback It provides a plug-in that computes the profile descriptions. This plug-in requires that a current plug-in be selected for the measure of the category “Features Evaluation” (for example featureeval).
filters It supplies several plug-ins, each providing a filter (e-mail, HTML, MS-DOC, PDF, PostScript, RTF and plain text) used to build the document descriptions. To compile this project, the libwv2 library must be installed.
gca It provides plug-ins for document and profile clustering. Two plug-ins use the Similarity-based Grouping Genetic Algorithm, and the two others use the Nearest Neighbors Grouping Genetic Algorithm. These plug-ins require that a current plug-in be selected for the measures of the categories “XXX Agreements”, “XXX Disagreements” and “XXX Similarities”, where XXX is an object type that can be clustered (Document, Profile, etc.). The plug-in multi-space implements such measures.
gmysql It proposes a plug-in to manage a MySQL database server. It is the only storage medium currently supported.
gravitation It provides two plug-ins to compute the community and topic descriptions. These plug-ins require that a current plug-in be selected for the measure of the category “Features Evaluation” (for example featureeval).
featureseval It supplies a plug-in that implements two concept weighting models: an adaptation of the tf/idf model and an adaptation of the log/entropy model.
kmeans It provides two plug-ins that implement a k-Means algorithm to cluster documents and profiles.
langs It proposes various plug-ins for different languages (Arabic, German, Danish, English, Spanish, Finnish, French, Hungarian, Italian, Dutch, Norwegian, Portuguese, Romanian, Russian, Swedish and Turkish). Each language plug-in provides a stemming algorithm and a set of stopwords.
metaenginesum This plug-in proposes a simple method to aggregate the results from several engines [A]: the score of a retrieved document fragment is a linear combination of its scores from all search engines.
  [A] In the GALILEI platform, a “search engine” is a plug-in that takes a query as argument and returns a list of fragments, each fragment having a score in [0,1]. One may develop a plug-in that uses a popular online search engine to do searches on the Web, but the developer must handle the results (for example, ensuring that each document retrieved is properly created in the GALILEI platform by the plug-in) and obtain an agreement from the company providing that search engine.
multi-space It supplies several plug-ins, each implementing the similarity measure of the tensor space model for the documents, the profiles and/or the communities. These plug-ins require that a current plug-in be selected for the measure of the category “Features Evaluation” (for example featureeval).
statfeatures It proposes a plug-in to compute some statistics on the concepts extracted from the documents. This plug-in is only useful for researchers. It requires that a current plug-in be selected for the measure of the category “Features Evaluation” (for example featureeval).
statsims It provides a plug-in to compute statistics on the object similarities. This plug-in is only useful for researchers. It requires that a current plug-in be selected for the measure of the category “Features Evaluation” (for example featureeval). It also requires that a current plug-in be selected for the measures of the categories “XXX Agreements”, “XXX Disagreements” and “XXX Similarities”, where XXX is an object type that can be clustered (Document, Profile, etc.). The plug-in multi-space implements such measures.
subslevel It supplies a plug-in that implements a simple method to rank documents and profiles in a community.
sugs It proposes a plug-in that implements a simple document suggestion method for profiles.
textanalyze It proposes different plug-ins that extract tokens from textual content, reduce the indexing space by stemming them and, optionally, filter them. These plug-ins are used when the document descriptions must be computed.
xmlengine The plug-in implements an algorithm that retrieves and ranks a set of fragments from a corpus of structured text (such as XML documents). This plug-in requires that a current plug-in be selected for the measure of the category “Features Evaluation” (for example featureeval).
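The aggregation performed by metaenginesum can be written as a formula. The notation is an assumption made for illustration: s_e(f) in [0,1] denotes the score given to a fragment f by the search engine e, E is the set of search engines, and w_e is a weight associated with engine e. The description above only states that the combination is linear; it does not say how the weights are chosen.

```latex
\[
  s(f) = \sum_{e \in E} w_e \, s_e(f)
\]
```

If, for instance, the weights are non-negative and sum to one, the aggregated score s(f) also stays in [0,1].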

4 The applications

Currently, the GALILEI platform provides two applications:
QGALILEI QGALILEI is a Qt-based application intended to help developers monitor a GALILEI instance. You can get the source code from our subversion server:
  • svn co svn://svn.otlet-institute.org/home/subversion/qgalilei/trunk qgalilei
    
UpGALILEI UpGALILEI is a program that runs a script (using the internal scripting language). It may be useful to launch tests or automatic processes. The source code can be downloaded from our subversion server:
  • svn co svn://svn.otlet-institute.org/home/subversion/upgalilei/trunk upgalilei
    
To build an application, go to its directory, create a sub-directory (for example ’build’) and go into it. In this latter directory, type:
cmake .. [OPTIONS] 
make or make VERBOSE=1
You can add the option -DCMAKE_BUILD_TYPE=Debug to include the debugging symbols in your application. To install the application, run the additional command:
sudo make install
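The whole sequence for one application can be sketched as a shell function. This is an illustration only: build_app, run and DRY_RUN are hypothetical names, and the function uses cmake -S/-B (available in CMake 3.13 or later) and make -C instead of changing into the build directory, which is equivalent to running "cmake .." and "make" from inside it.

```shell
#!/bin/sh
# Sketch: check out, build and install one application (qgalilei or upgalilei).
# build_app, run and DRY_RUN are hypothetical names. With DRY_RUN=1 the
# commands are printed instead of executed.

SVN_ROOT="svn://svn.otlet-institute.org/home/subversion"

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_app() {
    app=$1
    run svn co "$SVN_ROOT/$app/trunk" "$app" || return 1
    run mkdir -p "$app/build" || return 1
    # Configure with debugging symbols; drop the option for a regular build.
    run cmake -S "$app" -B "$app/build" -DCMAKE_BUILD_TYPE=Debug || return 1
    run make -C "$app/build" || return 1
    # Installation needs administrator rights for system directories.
    run sudo make -C "$app/build" install
}

# Example: show what would be done for UpGALILEI without running anything.
#   DRY_RUN=1 build_app upgalilei
```

Each step stops the function on failure, so that the installation is never attempted after a failed build.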