

Alessandro Colantonio 




CONCISE: COmpressed 'N' Composable Integer SEt


Description
Bit arrays, or bitmaps, are used to significantly speed up set operations
in several areas, such as data warehousing, information retrieval, and data
mining, to cite a few. However, bitmaps usually use a large storage space, thus
requiring compression. Here we offer a Java implementation of
CONCISE (COmpressed 'N'
Composable Integer
SEt), a bitmap compression algorithm that,
when compared to classical compression schemes, trades some space to allow for fast bitwise operations without
first decompressing bitmaps. CONCISE can be efficiently used to manipulate sets of integral
numbers in lieu of wellknown data structures such as arrays, lists, hashtables, and selfbalancing binary search
trees. Click here for more details about
CONCISE.
Java Implementation
The most uptodate Java code is available at the SourceForge website.
Among the classes contained within the package, the most important are:
 IntSet:
an interface that specifies several methods for the manipulation of “int” sets,
such as union, intersection, difference, complement, etc..
 ConciseSet: an implementation of the
IntSet interface internally represented by a
CONCISEcompressed bitmap.
 FastSet: an implementation of the
IntSet interface internally represented by an
uncompressed bitmap. In general, when the integer set is sparse (i.e., the ratio between the set cardinality and the maximum integer is below 0.1),
ConciseSet is always the best choice.
For dense integer sets, FastSet can offer better performances.
 ExtendedSet:
an interface that extends java.util.SortedSet<Integer> by specifying several methods for the manipulation of “Integer” sets,
such as union, intersection, difference, complement, powerset, Jaccard
distance, etc..
 IntegerSet:
a class that implements the ExtendedSet interface and
backed by any class implementing the IntSet interface (e.g.,
ConciseSet,
FastSet)
A description for the remaining classes can be found here.
Documentation
 A document that describes the details of the CONCISE algorithm can be found here.
It also offers a comparative analysis among standard java.util collections.
 A JavaDocbased description of the package can be found here.
To be sure that class descriptions are uptodate, it is always better to download the source code from
the SourceForge website and generate a fresh JavaDoc
with the
javadoc tool provided by JDK.
Support
If you have any question/bug report/suggestion, please send me an email at:
Other Info
To verify the performance of current CONCISE implementation, I adopted a very powerful profiler called YourKit.
YourKit is kindly supporting open source projects with its fullfeatured Java Profiler.
YourKit, LLC is the creator of innovative and intelligent tools for profiling
Java and .NET applications. Take a look at YourKit's leading software products:
YourKit Java Profiler and
YourKit .NET Profiler.

Visitors (since March 2009):



Last update:
20110518
