Binary Line Generalisation Tree

The algorithm

The Douglas-Peucker algorithm is the most popular method to reduce the number of vertices in a digital polyline, and the BLG-tree is one way to implement this algorithm. The Douglas-Peucker algorithm calculates the point with the largest distance to a line drawn between the startingpoint and the endingpoint (see figure below), and repeats recursively to both sides with the point with the largest distance to a line between the start-point and the found point and the largest distance to a line between the found point and the endingpoint. In a BLG-tree the result of this algorithm is stored in a binary tree with the root beeing the point with the largest distance to a line between the startingpoint and endingpoint. When the algorithm is completed, all the points are stored in the tree along with their error-value, except for the startingpoint and the endingpoint which are implicit.

Example of a polyline and its BLG-tree:

[a] Polyline

[b] BLG-tree

The most coars approximation of a polyline is the line between the startingpoint and the endingpoint of the polyline. The error of this approximation is determined by the distance between this line and the point in the original polyline that has the largest distance to the line. The next approximation is formed by two lines, one from the startingpoint to the point with the largest distance and one from this point to the endingpoint. This will generally be a more accurate representation.

In a BLG-tree, where all the nodes are stored according to their error-value, an approximation is generated by performing a recursive search until the error is so small that the line segment it represents shall not be included in the result. In a "normal" situation, the error-values in the tree are descending so that all the children of one node have error values that are smaller than their parent's. However, this is not always the case.

Example of a polyline leading to increasing error values in a BLG-tree:

[a] Polyline

[b] BLG-tree

The worst-case complexity of building a BLG-tree is O(N²), where N is the number of points in the polyline. The worst-case occurs when the points with the largest distance to the lines always is at one end of the segment of the polyline, so that all the points end up in one subtree at every level of the BLG-tree. The complexity of creating an approximation of a BLG-tree is O(logN + R), where N is the number of points in the polyline and R is the number of points returned.

When creating approximations of a map consisting of more than one polyline, one BLG-tree must be created for each polyline. This means that when an approximation is to be created, all the BLG-trees must be searched. To make this a bit more efficient, sort the BLG-trees according to the error-value stored in the root of each tree. This means that when a tree has an error value that is too small, it is not necessary to search the rest of the trees.

Implementation

Below is the class diagram of the implementation of the BLG-tree. The BLG class creates a BLG-tree consisting of GeoNodes, and the BLG-test class is, as the name implies, a class to test the implementation of the BLG-tree.

In my implementation, the BLG-tree is based on an ArrayList consisting of GeoNodes which is a class I have written to be able to store and retreive information about the points that are read from the inputfile. The BLG_test class uses the Filehandler class to read the points from the inputfile, creates an ArrayList of eachs polyline in the file, and sends the ArrayList to the BLG class to create a BLg-tree.

The BLG class creates a tree if the ArrayList contains three or more GeoNodes. As the startpoint and the stoppoint of each polyline is implicit, and therefore not included in the BLG-tree, three points is the absolute minimum amount of points needed to be able to create a tree. The actual building of the tree quite simply calculates which point in the polyline that is furthest away from a line drawn between the startpoint and the stoppoint, makes this the root of the tree, and repeats recursively to both sides until all the points have been added to the tree.

To calculate the distance between the line and the point, I have used the following procedure:

The line is decided by two points: P1(x₁,y₁) and P2(x₂,y₂)
The point to which the distance is to be determined: P3(x₃,y₃)
Calculates three constants:
A = (y₂-y₁)/(x₂-x₁)
B = -1
C = (-(A * x₁) + y₁)
The distance is calculated by the following equation:
D = |A*x₃ + B*y₃ + C|/(√(A² + B²))

The approximations are generated by performing a search and comparing the error values in the GeoNodes with a given error value as the tree is traversed. As long as the error values in the GeoNodes are larger than the given error value, the GeoNodes are added to the output list. When an error value is too small the recursion in that subtree stops.

Testing

In this section I provide examples of what the results of this algorithm can look like. This program needs an input file and a value to determine the level of detail in the approximation. The first figure below shows the input data used to create the approximations shown in the following three figures. The input file is 230 KB, and the following table shows the amount of data produced by the approximations: (You can zoom in on details in the images by holding in ctrl and dragging the mouse to a square.)

Approximation	none	0.1	0.5	1.0
KiloBytes	230	200	94	42
Number of points	7714	6903	3202	1379

The input data used to generate the approximations.

Approximation with an error value of 0.1

Approximation with an error value of 0.5

Approximation with an error value of 1.0

Problems

The fact that the error values in the BLG-tree are not necessarily decreasing is the most obvious problem with this algorithm. However, in most cases this does not lead to any fatal visible errors in the output. As any result from this algorithm is an approximation of the original data, whether or not a couple of points that should have been included are left out is in most cases irrelevant.

Another problem, that really is more of an implementation problem than an error in the original algorithm, is how the distance between the lines and the points are calculated. In my implementation, and in many others, when the point is placed away from the line, the distance is calculated to a prolongation of the original line. This problem is illustrated in the figure below.

Error in calculating the distance

But as this error generally makes the distance to the point shorter, this error will not lead to any points beeing excluded that should be included, rather the opposite.

My main project is about creating an efficient method for combining generalisation with orthogonal range search on-the-fly, and the question is whether this generalisation algorithm is useful in that context. I have developed a double tree-structure to order the geographical points, and the objective of this module was to determine whether this type of tree could be included in that structure. However, since we need one such tree for each polyline, and the nodes in my structure are organized according to their x- and y-coordinates, with no regards to polylines at all, I am afraid I don't see how this can be done. Another issue is that with the need for many BLG-trees, this algorithm is not that much more efficient than the one I have already implemented.

	GeoNode.java
	BLG.java	BLG_test.java
	HeapSortAlgorithm.java	Filehandler.java