SORT ORDER PROBLEMS IN RELATIONAL DATABASES
Abstract
A relation of degree k can be sorted lexicographically in k! different ways, i.e., according to any one of the possible permutations of the schema of the relation. Such permutations are referred to as sort orders. When evaluating unary and binary relational algebra operators using sort-merge algorithms, sort orders fulfilling the constraints enforced by the operators are chosen for the operand relations. The relations are then sorted according to their assigned sort orders, and the result is obtained by merging. Should the operands already be sorted according to one of the permissible sort orders, then only a merging is required. The sort order of the result will depend on the sort orders of the operands.
When evaluating whole relational algebra expressions, the result of one operation will be used as an operand to the next. It is desirable to choose sort orders in such a way that the result of one operation will automatically fulfill the requirements of the next. In general, one would like to find a minimal number of operators in the expression for which this cannot be obtained, bearing in mind the overall goal of minimizing the total work.
We show that this problem is NP-hard, and that the corresponding decision problem is NP-complete. However, most simplifications of the original problem give rise to efficient algorithms. In fact, most frequently occurring queries can be analyzed in linear time in the size of the query. This is due to the fact that only a very limited number of subsets of all permutations of schemas can be encountered in the algorithms, which means that compact representations for these subsets can be found.