Talk:Bitmap index

Latest comment: 2 years ago by Locke Cole in topic Removed content
WikiProject iconDatabases C‑class (inactive)
WikiProject iconThis article is within the scope of WikiProject Databases, a project which is currently considered to be inactive.
CThis article has been rated as C-class on Wikipedia's content assessment scale.

Something to note? edit

Should we note that bitmap indexes are good for equality operations in WHERE clauses, but not so good for less than or greater than operations? - Ta bu shi da yu 15:45, 10 November 2007 (UTC)Reply

Huh, in what way aren't they "good" for < or >? -- intgr [talk] 23:20, 10 November 2007 (UTC)Reply
"Bitmap indexes are also not suitable for columns that are primarily queried with less than or greater than comparisons. For example, a salary column that usually appears in WHERE clauses in a comparison to a certain value is better served with a B-tree index. Bitmapped indexes are only useful with equality queries, especially in combination with AND, OR, and NOT operators." Oracle Concepts document. Reread what they do and I'm sure you'll work out why this is the case. - Ta bu shi da yu 08:42, 11 November 2007 (UTC)Reply
Bitmap indexes can be generated for any logical operation; e.g. you can have a bitmap index for the expression 'salary > 10000' and every time you query WHERE salary > 10000, this bitmap will be useful. I have no idea why Oracle's documentation singles out equality operations as the only use case for bitmap indexes — it isn't. -- intgr [talk] 21:12, 11 November 2007 (UTC)Reply
I suppose that it's to do with a higher cardinality. My understanding of bitmap indexes is that the higher the cardinality (or percentage of unique elements) then the less efficient in storing the data in the index. Could be wrong. - Ta bu shi da yu 02:05, 12 November 2007 (UTC)Reply
Ok I see where's the confusion. When the Oracle documentation talks of bitmap indexes, they actually imply that there will be a separate bitmap for every possible value in an indexed column; for example, one for gender='M', another for gender='F' — for a cardinality of two, 2 bitmaps will be created. When you perform either of these queries, the planner recognizes these expressions and uses the appropriate one of the indexes.
In a similar way, you could create bitmap indexes for two expressions salary > 10000, salary > 20000, and there would be just two bitmaps. However, to use this approach, you have to create indexes for both of the expressions manually, because the DBMS cannot know which expressions your queries will be using. And if the query expression doesn't exactly match any of the index expressions, no indexes can be used. -- intgr [talk] 11:34, 12 November 2007 (UTC)Reply
Ah, I follow. Maybe this is something we should add to the article. - Ta bu shi da yu 12:19, 12 November 2007 (UTC)Reply

Oracle uses BBC to compress bitmap index, it stores each compressed bitmap for each distinct value. This kind of bitmap doesn't support range queries well. Chan had introduced range or interval encoding schemes for range queries and two-sided range queries, which can improve range query performance. In the above example," A>10000", if using range encoding, it only needs two bitmap to answer that query. In oracle it may have to use 10000 bitmaps, so it is slow.

zhuo wang —Preceding unsigned comment added by 218.25.35.20 (talk) 07:43, 10 June 2008 (UTC)Reply

Oracle's compressed bitmap indexes in fact works very well even when the indexed column has many distinct values. This is one of the main points of reference # 1.

Compressed bitmap indexes are very efficient for range queries such as "A > 10000", reference #4 and [1] give proofs in terms of theoretical analysis and timing measurements. In fact, the timing measurements published there are exclusively on range queries (not equality queries). Even though it might involve ORing a lot of bitmaps to answer such a query, each bitmap must be very small (after compression) in this case. Since these bitwise logical OR operations are very efficiently, the answer can be computed quickly.

A historical note: in Dr. O'Neil's paper on Model 204, Query 3 involves a range condition "Ownername > 'Z'" -- even the first commercial implementation of bitmap index did not shy away from range conditions. -- user:oaf2 21:02, 26 August 2008 (UTC)Reply

Usage edit

See Prof. Lemire's comment on the usability of bitmap indices. --Ragib (talk) 15:01, 20 August 2008 (UTC)Reply

I just read the article by Vivek Sharma that Daniel Lemire linked to in his post. I think that anyone who is editing this page would do well to read it. Dtunkelang (talk) 15:19, 20 August 2008 (UTC)Reply

Dubious edit

For a table with n columns, the total number of distinct indexes to satisfy all possible queries is n!

Not that it matters much, but shouldn't this be 2n-1? Set of all possible indexes is the set of all column subsets, i.e. a power set (minus one, because you can't have a zero-column index). GregorB (talk) 15:16, 15 April 2009 (UTC)Reply

You also have to consider the order of the columns. According to the PostgreSQL documentation about multicolumn indexes:
The query planner can use a multicolumn index for queries that involve the leftmost column in the index definition plus any number of columns listed to the right of it, without a gap. For example, an index on (a, b, c) can be used in queries involving all of a, b, and c, or in queries involving both a and b, or in queries involving only a, but not in other combinations.
This means that, to be able to answer every possible query using indexes, you need one for each possible column permutation, that is, n! --Pezezin (talk) 21:30, 5 May 2009 (UTC)Reply


Order of columns is relevant not only for PostgreSQL. You cannot easily use an index that accesses columns A,B,C (in that order) to execute a query that uses B and C: http://richardfoote.wordpress.com/2008/03/10/index-skip-scan-does-index-column-order-matter-any-more-warning-sign/ --194.139.55.62 (talk) 05:52, 12 May 2009 (UTC)Reply

You're correct, e.g. SQL Server and Oracle will use the index for any leading set of columns. However, my line of thinking was this: a WHERE clause can constrain an n-column query result in at most 2n-1 ways. For each of these WHERE clauses you need exactly one index, covering all the columns used by the clause. So, even if databases could not use leading sets of columns (i.e. even if your WHERE clause columns had to match precisely those in the index), 2n-1 indexes would still be enough. GregorB (talk) 15:06, 10 June 2009 (UTC)Reply

Correct number is C(n/2, n). See here and here for the strict proof (in Russian) Abolen (talk) 18:17, 2 July 2009 (UTC)Reply

Thanks! I've expanded the formula in the text - being a little rusty, it wasn't really obvious to me what C(n/2, n) meant, a condition likely shared by many readers. GregorB (talk) 18:51, 2 July 2009 (UTC)Reply

Reasons for removing the formula edit

My point for removing the formula is that it's really irrelevant. What needs to be explained, is that the number of indexes grows very quickly with the number of columns, and this can be explained better in words than with the formula given in the article.

Nobody is going to whip out a calculator and wonder "How many indexes will I need to create in order to satisfy all possible queries to this table?". In practice, the number of different possible queries that make sense is always lower than the theoretical number; for instance, rarely will anyone specify *both* a person's name and email address in a query, because either of these already identifies a person almost uniquely. Even if the user specifies both, an index on either the name or email column will satisfy the query quickly enough because the number of results is so small anyway.

Further, you often have date columns that are exclusively used for range queries -- so they always need to be last in any useful index. Not to mention full text indexes. So the formlua is pretty much useless anyway. -- intgr [talk] 13:47, 13 October 2009 (UTC)Reply

While none will create all possible indexes for 20 columns indeed, this formula is still of interest to many users, since they don't know how big the number is after it had "grown very fast". And yes, many will whip out a calculator and count, just to make sure they don't really want to create this many indexes. This is not a guide to the table indexing, this is an encyclopedia article and it is perfectly valid to cover some theoretical concepts here, even if they are not of immediate practical use. Abolen (talk) 16:56, 13 October 2009 (UTC)Reply


Copyedit edit

 Guild of Copy Editors
 This article was copy edited by lfstevens, a member of the Guild of Copy Editors, on March, 2011.

Enjoy.

Adding information/examples of how the bitmap is utilized, not just its structure. edit

So while the article describes the structure of the bitmap index with an example, as well as many enhancements/variations of it in other sections, it doesn't describe how they are utilized. The analogy would be performing a binary search with a tree structure, really exemplifies what the benefits are of the structure. In other words, there is no example that explains how a database engine might utilize a bitmap index to perform a join.

I tried following this linked reference #2, but the link is broken: http://www.dwoptimize.com/2007/06/101010-answer-to-life-universe-and.html

207.156.50.129 (talk) 23:09, 2 March 2012 (UTC)Reply

External links modified edit

Hello fellow Wikipedians,

I have just modified one external link on Bitmap index. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 09:30, 3 November 2016 (UTC)Reply

Removed content edit

This was added a number of years ago with some odd formatting choices by an editor whose only edit was to add this:

Bitmap based representation can also be used for representing a data structure which is labeled and directed attributed multigraph, used for queries in graph databases.Efficient graph management based on bitmap indices article shows how bitmap index representation can be used to manage large dataset (billions of data points) and answer queries related to graph efficiently.

I'm leaving it here in case there's a way to integrate this into the article better. —Locke Coletc 16:09, 9 May 2021 (UTC)Reply