Optimization Problems

Pascal Francq

January 23, 2012 (April 20, 2011)

Abstract

The area of interest of applied computer science is the solving of problems. There are different kinds of problems, but the ones that we are interested in are the optimization problems where the aim is to find the best solution among all possible ones.

1 Definition

An optimization problem consists to find the best solution among all possible ones. For example, in the Bin Packing Problem (BPP) the aim is to find the right number of boxes of a given size to store a set of objects of given sizes; optimization involves, for example, finding the smallest number of boxes.
It is important to make two distinctions. First, between a problem which refers to a general class, e.g. “Bin Packing”, and an instance representing a special type of a problem, e.g. “the Bin Packing problem involving size 5 boxes for 25 objects of different sizes”. Secondly, two categories of problem classes exist: the abstract problem classes and the concrete problem classes. As its name suggests, the second category refers to problems that have a “concrete existing”, i.e. problems for which instances can be created. The BPP corresponds to this category. Together, it is part also of a more abstract class: grouping problems. With abstract problem classes only, it is impossible to define instances. In fact, as shown in Figure 1↓, abstract and concrete problem classes form a hierarchy of optimizing problems.
figs/OptimizationProblems.svg
Figure 1 Some optimization problems.
An optimization problem can be defined as a finite set of variables, where the correct values for the variables specify the optimal solution. If the variables range over real numbers, the problem is called continuous, and if they can only take a finite set of distinct values, the problem is called combinatorial. In the case of the grouping of user’s profiles, we are dealing with combinatorial optimization problems because the number of communities of interests is finite.
A combinatorial optimization problem is defined [1] as the set of all the instances of the problem, with each instance, , being defined by a pair , where is called the search space, i.e. the set of all possible solutions for an instance, and is a cost function calculated for each solution of and used to determine the performances of each solution.

2 Algorithms

To solve problems it is necessary to develop methods, often called algorithms in computer science, that describe the set of actions to be performed under given circumstances. In one of the definitions found in the literature [2], an algorithm is stated as the list of precise rules that specify “what to do” under all possible conditions. This definition includes the one of the Turing Machine [3], which is an abstract representation of a computing device. Another definition is to describe an algorithm as a finite set of instructions (evaluations and assignations), which leads to a solution.
The complexity, , of an algorithm defines a relationship between the size of an instance, such as the number of objects in the Bin Packing Problem, and the necessary resources to solve it, i.e. the amount of memory and number of CPU cycles required. A complexity of for example signifies that the resources required evolute as the square of the size of the instance, i.e. an instance two times larger than another one needs four times more resources.
An important category of problems consists of the NP-hard ones, for which no polynomial time algorithm has been found so far. With these problems, the CPU time increases exponentially with the size of an instance. In other words, when the size of the problem increases, it becomes impossible to compute all the valid solutions. For example, in a Bin Packing Problem involving 5 objects it is possible to compute all the different solutions to determine the best one. Whereas for 500 objects, it is no longer possible. NP-hard problems can only be solved by specific algorithms which try to reach an optimal solution, or at least a solution as close as possible to one optimal in a reasonable time.

3 Heuristics

When dealing with NP-hard problems it is often necessary to use algorithms that do not guarantee an optimal solution. This class of algorithms is known as heuristics.
A heuristic is an “intuitive” way to find a valid and often reasonably good solution for a given problem in a “reasonable” lapse of time, i.e. a heuristic is based on “rules-of-thumb”, ideas that seem to be helpful in some typical instances though, without providing any guarantee of the quality of the solution.
For example, in the Bin Packing Problem, the first-fit descending heuristic which consists in treating objects in descending order of size and putting one into the first group that can take it, is a well-known heuristic giving good results in such cases (depending on the sizes of the bins), though not necessary the optimal one. The biggest problem about heuristics is that it is strongly instance- and problem-dependent, and that the results may be very poor.
When the essential factor is the time of execution, heuristics is the best solution. For example, a first-fit heuristic is used in an operating system to find a memory zone when a program allocates a given amount of data. In this case, the fact that the solution proposed is not the best one is less important than the time needed. But, because of their drawback, heuristics rarely reach the status of being able to provide a solution when the quality of the solution is important, and cannot be considered as a general approach.

4 Meta-heuristics

The two disadvantages of heuristics are that the solutions proposed can often be very low-quality and are strongly instance- and problem-dependent. Computer science has developed several methods to work-around these disadvantages. All these methods use heuristics in some way or another, but enable the entire search space to be search in; this is the reason why the term meta-heuristics is employed. Most meta-heuristics need cost functions which are, most of the time, a mathematical expression that represents the quality of a solution for a given problem. A brief introduction to simulated annealing and tabu search ends this section. Another meta-heuristic known as the genetic algorithms will be presented in the next sections.

5 Multiple Objective Problems

5.1 Definition

Many real-world problems to be optimized depend on multiple objectives to reach. Increasing the performance of a solution for one objective usually leads to a decrease in the performances for the others. In such situations, it is not possible to construct a suitable mathematical model to represent the problem, i.e. it is not possible to find a cost function representing a single measure of quality for a solution. But, a value for each criterion to be optimized can be computed and the difficulty is to choose a solution which “is good for each criterion”. In fact, a solution which optimizes one criterion may be the worst possible one for the others. Moreover, each criterion may have a particular importance which is expressed through a weight assigned to it. These types of problems are known as multi-criteria decision problems.
An example of a multi-criteria problem is the choice of a car. Different criteria are defined, the price (to be minimized), the consumption (to be minimized), the comfort (to be maximized) and the power (to be maximized). Table 1↓ shows the values for different cars; the best value for each criterion is in bold: no solution seems to be better than any other. Moreover, each person may have his or her preferences, for example the power may be very important while the comfort is not.
Price (k€) Cons. (km) Comfort Power (kW
Good
Good
Excellent
Very good
Table 1 Example of a multi-criteria decision problem.

5.2 Exact and Approximate Problems

I define an exact multi-criteria problem as a problem where the aim is characterized by a set of well-defined criteria; each criterion can be accurately expressed by a function and each function must either be maximized or minimized.
But, it is not always possible to express the quality by a set of well-defined measures. As already pointed out by Kleinberg [4], in the field of information retrieval problems, there is a lack in objective functions that are both concretely defined and correspond to human notions of quality. So, I define a,
So, in the case of the problem dealt with in this thesis, we are facing with an approximate multi-criteria problem as a problem where the aim cannot be accurately characterized in any way. A set of criteria is defined to approximate the characteristics of the target without guaranteeing an exact match between the criteria and the characteristics. Each criterion defines a function that must either be maximized or minimized.

5.3 Solve Multiple Objective Problems

A simple approach to solving multiple objective problems is to transform a multi-criteria decision in a single-criterion problem by aggregating the different objectives to form a single one. One of the techniques used is to compute a linear combination of the criteria values. Once this cost function has been constructed, a method is used to solve the problem by optimizing this single function. This type of solution works when the objectives do not compete, i.e. an improvement of one criterion does not lead to the negative influence of any other criteria. Moreover, it is impossible to apply this approach when the criteria do not have the same “physical existence”, i.e. when the price criterion is expressed in k€ and the power criterion in kW.
Another approach widely used in research into multiple objective optimization is due to Vilfredo Pareto [5]: given a set of solutions, a multi-objective decision is used to select a solution in this reduced space search. The concept of the Pareto front is illustrated in Figure 2↓, where five solutions , , , and are represented for a problem with two criteria, and . The solution, , is not dominated by any other solution, i.e. it is not possible to improve one criterion without downgrading another. This solution is called a Pareto optimal. All the Pareto optimum form the so-called Pareto-optimal front.
figs/pareto.svg
Figure 2 The Pareto optimal front (a) and dominance relations in objective space (b).
The PROMETHEE method [6] is another approach to solve multi-criteria problems.

6 Related

References

[1] Christos H. Papadimitriou & Kenneth Steiglitz, Combinatorial Optimization, Algorithms and Complexity, Prentice-Hall, 1982.

[2] Michael Garey & David Johnson, Computers and Intractability — A Guide to the Theory of NP-completeness, W.H. Freeman Co., 1979.

[3] Alan Turing, “On Computable Numbers, With an Application to the Entscheidungsproblem”, In Proceedings of the London Mathematical Society, pp. 230—265, 1936.

[4] Jon Kleinberg, “Authoritative Sources in a Hyperlinked Environment”, Journal of the ACM, 46(5), pp. 604—632, 1998.

[5] Vilfredo Pareto, Cours d’économie politique, F. Rouge, 1988.

[6] Jean-Pierre Brans & Bertrand Mareschal, ”The PROMCALC & GAIA Decision Support System for Multicriteria Decision Aid”, Decision Support Systems, 12(4‑5), pp. 297—310, 1994.