Range searching

In computer science, the range searching problem consists of processing a set S of objects, in order to determine which objects from S intersect with a query object, called the range. For example, if S is a set of points corresponding to the coordinates of several cities, find the subset of cities within a given range of latitudes and longitudes.

The range searching problem and the data structures that solve it are a fundamental topic of computational geometry. Applications of the problem arise in areas such as geographical information systems (GIS), computer-aided design (CAD) and databases.

Variations edit

There are several variations of the problem, and different data structures may be necessary for different variations.^[1] In order to obtain an efficient solution, several aspects of the problem need to be specified:

Object types: Algorithms depend on whether S consists of points, lines, line segments, boxes, polygons.... The simplest and most studied objects to search are points.
Range types: The query ranges also need to be drawn from a predetermined set. Some well-studied sets of ranges, and the names of the respective problems are axis-aligned rectangles (orthogonal range searching), simplices, halfspaces, and spheres/circles.
Query types: If the list of all objects that intersect the query range must be reported, the problem is called range reporting, and the query is called a reporting query. Sometimes, only the number of objects that intersect the range is required. In this case, the problem is called range counting, and the query is called a counting query. The emptiness query reports whether there is at least one object that intersects the range. In the semigroup version, a commutative semigroup (S,+) is specified, each point is assigned a weight from S, and it is required to report the semigroup sum of the weights of the points that intersect the range.
Dynamic range searching vs. static range searching: In the static setting the set S is known in advance. In dynamic setting objects may be inserted or deleted between queries.
Offline range searching: Both the set of objects and the whole set of queries are known in advance.

Data structures edit

Orthogonal range searching edit

A 2D orthogonal range query. In this case, a range reporting query would return the two circled points, a range counting query would return 2, and an emptiness query would return false.

In orthogonal range searching, the set S consists of $n$ points in $d$ dimensions, and the query consists of intervals in each of those dimensions. Thus, the query consists of a multi-dimensional axis-aligned rectangle. With an output size of $k$ , Jon Bentley used a k-d tree to achieve (in Big O notation) $O(n)$ space and $O{\big (}n^{1-{\frac {1}{d}}}+k{\big )}$ query time.^[2] Bentley also proposed using range trees, which improved query time to $O(\log ^{d}n+k)$ but increased space to $O(n\log ^{d-1}n)$ .^[3] Dan Willard used downpointers, a special case of fractional cascading to reduce the query time further to $O(\log ^{d-1}n+k)$ . ^[4]

While the above results were achieved in the pointer machine model, further improvements have been made in the word RAM model of computation in low dimensions (2D, 3D, 4D). Bernard Chazelle used compress range trees to achieve $O(\log n)$ query time and $O(n)$ space for range counting.^[5] Joseph JaJa and others later improved this query time to $O\left({\dfrac {\log n}{\log \log n}}\right)$ for range counting, which matches a lower bound and is thus asymptotically optimal.^[6]

As of 2015, the best results (in low dimensions (2D, 3D, 4D)) for range reporting found by Timothy M. Chan, Kasper Larsen, and Mihai Pătrașcu, also using compressed range trees in the word RAM model of computation, are one of the following:^[7]

$O(n)$ space, $O(\log ^{\epsilon }n+k\log ^{\epsilon }n)$ query time
$O(n\log \log n)$ space, $O(\log \log n+k\log \log n)$ query time
$O(n\log ^{\epsilon }n)$ space, $O(\log \log n+k)$ query time

In the orthogonal case, if one of the bounds is infinity, the query is called three-sided. If two of the bounds are infinity, the query is two-sided, and if none of the bounds are infinity, then the query is four-sided.

Dynamic range searching edit

While in static range searching the set S is known in advance, dynamic range searching, insertions and deletions of points are allowed. In the incremental version of the problem, only insertions are allowed, whereas the decremental version only allows deletions. For the orthogonal case, Kurt Mehlhorn and Stefan Näher created a data structure for dynamic range searching which uses dynamic fractional cascading to achieve $O(n\log n)$ space and $O(\log n\log \log n+k)$ query time.^[8] Both incremental and decremental versions of the problem can be solved with $O(\log n+k)$ query time, but it is unknown whether general dynamic range searching can be done with that query time.

Colored range searching edit

The problem of colored range counting considers the case where points have categorical attributes. If the categories are considered as colors of points in geometric space, then a query is for how many colors appear in a particular range. Prosenjit Gupta and others described a data structure in 1995 which solved 2D orthogonal colored range counting in $O(n^{2}\log ^{2}n)$ space and $O(\log ^{2}n)$ query time.^[9]

Applications edit

In addition to being considered in computational geometry, range searching, and orthogonal range searching in particular, has applications for range queries in databases. Colored range searching is also used for and motivated by searching through categorical data. For example, determining the rows in a database of bank accounts which represent people whose age is between 25 and 40 and who have between $10000 and $20000 might be an orthogonal range reporting problem where age and money are two dimensions.

References edit

^ Agarwal, P. K.; Erickson, J. (1999), "Geometric Range Searching and Its Relatives", in Chazelle, Bernard; Goodman, Jacob; Pollack, Richard (eds.), Advances in Discrete and Computational Geometry: proceedings of the 1996 AMS-IMS-SIAM joint summer research conference, Discrete and Computational Geometry--Ten Years Later, July 14-18, 1996, Mount Holyoke College, Contemporary Mathematics, vol. 223, American Mathematical Society Press, pp. 1–56
^ Bentley, Jon (1975). "Multidimensional binary search trees used for associative searching". Communications of the ACM. 18 (9): 509–517. doi:10.1145/361002.361007. S2CID 13091446.
^ Bentley, Jon (1980). "Multidimensional divide-and-conquer". Communications of the ACM. 23 (4): 214–229. doi:10.1145/358841.358850. S2CID 3997186.
^ Willard, Dan (1985). "New data structures for orthogonal range queries". SIAM Journal on Computing. 14 (1): 232–253. doi:10.1137/0214019.
^ Chazelle, Bernard (1988). "A functional approach to data structures and its use in multidimensional searching". SIAM Journal on Computing. 17 (3): 427–462. CiteSeerX 10.1.1.133.9153. doi:10.1137/0217026.
^ JaJa, Joseph; Mortensen, Christian; Shi, Qingmin (2005). "Space-efficient and fast algorithms for multidimensional dominance reporting and counting". International Symposium on Algorithms and Computation: 558–568.
^ Chan, Timothy; Larsen, Kasper; Pătrașcu, Mihai (2011). "Orthogonal range searching on the RAM, revisited". Symposium on Computational Geometry: 1–10. arXiv:1103.5510.
^ Mehlhorn, Kurt; Näher, Stefan (1990). "Dynamic fractional cascading" (PDF). Algorithmica. 5 (2): 215–241. doi:10.1007/BF01840386. S2CID 7721690.
^ Gupta, Prosenjit; Janardan, Ravi; Smid, Michiel (1995). "Further results on generalized intersection searching problems: Counting, reporting, and dynamization". Journal of Algorithms. 19 (2): 282–317. doi:10.1006/jagm.1995.1038. hdl:11858/00-001M-0000-0014-B721-F.