|
|
| Alessandro Colantonio |
|
|
|
 |
|
| CONCISE: COmpressed 'N' Composable Integer SEt
|
|
|
Description
Bit arrays, or bitmaps, are used to significantly speed up set operations
in several areas, such as data warehousing, information retrieval, and data
mining, to cite a few. However, bitmaps usually use a large storage space, thus
requiring compression. Here we offer a Java implementation of
CONCISE (COmpressed 'N'
Composable Integer
SEt), a bitmap compression algorithm that,
when compared to classical compression schemes, trades some space to allow for fast bitwise operations without
first decompressing bitmaps. CONCISE can be efficiently used to manipulate sets of integral
numbers in lieu of well-known data structures such as arrays, lists, hashtables, and self-balancing binary search
trees. Click
here for more details about
CONCISE.
Java Implementation
The most up-to-date Java code is available at the
SourceForge website.
Among the classes contained within the package, the most important are:
- IntSet:
an interface that specifies several methods for the manipulation of “int” sets,
such as union, intersection, difference, complement, etc..
- ConciseSet: an implementation of the
IntSet interface internally represented by a
CONCISE-compressed bitmap.
- FastSet: an implementation of the
IntSet interface internally represented by an
uncompressed bitmap. In general, when the integer set is sparse (i.e., the ratio between the set cardinality and the maximum integer is below 0.1),
ConciseSet is always the best choice.
For dense integer sets, FastSet can offer better performances.
- ExtendedSet:
an interface that extends java.util.SortedSet<Integer> by specifying several methods for the manipulation of “Integer” sets,
such as union, intersection, difference, complement, power-set, Jaccard
distance, etc..
- IntegerSet:
a class that implements the ExtendedSet interface and
backed by any class implementing the IntSet interface (e.g.,
ConciseSet,
FastSet)
A description for the remaining classes can be found
here.
Documentation
- A document that describes the details of the CONCISE algorithm can be found
here.
It also offers a comparative analysis among standard java.util collections.
- A JavaDoc-based description of the package can be found
here.
To be sure that class descriptions are up-to-date, it is always better to download the source code from
the
SourceForge website and generate a fresh JavaDoc
with the
javadoc tool provided by JDK.
Support
If you have any question/bug report/suggestion, please send me an email at:
Other Info
To verify the performance of current CONCISE implementation, I adopted a very powerful profiler called YourKit.
YourKit is kindly supporting open source projects with its full-featured Java Profiler.
YourKit, LLC is the creator of innovative and intelligent tools for profiling
Java and .NET applications. Take a look at YourKit's leading software products:
YourKit Java Profiler and
YourKit .NET Profiler.
|
|
Visitors (since March 2009):
|
|
|
Last update:
2011-05-18
|