An experimental study of an opportunistic index

Paolo Ferragina, Giovanni Manzini

Risultato della ricerca: Capitolo in libro/report/atti di convegnoContributo a conferenzapeer review

Abstract

The size of electronic data is currently growing at a faster rate than computer memory and disk storage capacities. For this reason compression appears always as an attractive choice, if not mandatory. However space overhead is not the only resource to be optimized when managing large data collections; in fact data turn out to be useful only when properly indexed to support search operations that efficiently extract the user-requested information. Approaches to combine compression and indexing techniques are nowadays receiving more and more attention. A first step towards the design of a compressed full-text index achieving guaranteed performance in the worst case has been recently done in [10]. This index combines the compression algorithm proposed by Burrows and Wheeler [5] with the suffix array data structure [16]. The index is opportunistic in that it takes advantage of the compressibility of the input data by decreasing the space occupancy at no significant asymptotic slowdown in the query performance. In this paper we present an implementation of this index and perform an extensive set of experiments on various text collections. The experiments show that our index is compact (its space occupancy is close to the one achieved by the best known compressors), it is fast in counting the number of pattern occurrences, and the cost of their retrieval is reasonable when they are few (i.e., in case of a selective query). In addition, our experiments show that the FM-index is flexible in that it is possible to trade space occupancy for search time by choosing the amount of auxiliary information stored into it. Covriaht

Lingua originaleInglese
Titolo della pubblicazione ospiteProceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms
Pagine269-278
Numero di pagine10
Stato di pubblicazionePubblicato - 2001
Pubblicato esternamente
Evento2001 Operating Section Proceedings, American Gas Association - Dallas, TX, United States
Durata: 30 apr 20011 mag 2001

Serie di pubblicazioni

NomeProceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms

???event.eventtypes.event.conference???

???event.eventtypes.event.conference???2001 Operating Section Proceedings, American Gas Association
Paese/TerritorioUnited States
CittàDallas, TX
Periodo30/04/011/05/01

Fingerprint

Entra nei temi di ricerca di 'An experimental study of an opportunistic index'. Insieme formano una fingerprint unica.

Cita questo