Thesis Partitioning Strategies in ASAP for Strings with Runs and Large Alphabets (Estrategias de partición en ASAP para Strings con Runs y alfabetos grandes)
Date
2024-06
Program
Ingeniería Civil Informática
Campus
Campus Casa Central Valparaíso
Abstract
Data compression and retrieval are critical today, particularly for long, repetitive texts, which are the focus of this thesis. We implement and evaluate several partitioning strategies, each applied to a range of combinations of compressed data structures built on the ASAP structure. Among them, strategy A4 stands out: it uses "dense partitioning" together with the head of each run of the text for the structure that handles the mapping 𝑚(𝛼). This combination outperformed the current baseline, improving both space usage and query speed for the compressed structures tested. In particular, the ASAP RLMN(INT) RLE structure performed best.
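The abstract does not define "dense partitioning" or say exactly how run heads feed the mapping 𝑚(𝛼), so the following is only an illustrative sketch of one possible reading, not the thesis's implementation. It assumes that a run head is the first symbol of each maximal run, and that the dense rule assigns the symbol with 1-based frequency rank r to class ⌊log₂ r⌋. The function names `run_heads` and `dense_classes` are hypothetical.

```python
from collections import Counter
from itertools import groupby


def run_heads(text):
    """Return the first symbol of each maximal run of equal symbols."""
    return [symbol for symbol, _ in groupby(text)]


def dense_classes(symbols):
    """Map each symbol to a class index under the assumed dense rule.

    Symbols are ranked by decreasing frequency (ties broken by symbol
    order so the result is deterministic). The symbol with 1-based
    rank r is assigned class floor(log2(r)), so class l holds at most
    2**l symbols.
    """
    freq = Counter(symbols)
    ranked = sorted(freq, key=lambda s: (-freq[s], s))
    # For r >= 1, r.bit_length() - 1 equals floor(log2(r)).
    return {s: rank.bit_length() - 1 for rank, s in enumerate(ranked, start=1)}


text = "aaaabbbaaccccdaab"
heads = run_heads(text)   # ['a', 'b', 'a', 'c', 'd', 'a', 'b']
m = dense_classes(heads)  # {'a': 0, 'b': 1, 'c': 1, 'd': 2}
```

In this toy example the mapping is computed over the 7 run heads rather than over all 17 symbols of the text, which is the kind of saving one would expect on highly repetitive inputs. Whether strategy A4 derives frequencies from run heads in exactly this way is an assumption here.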
Keywords
Alphabet Partitioning, Partitioning Strategy, Wavelet Tree, ASAP, Data Compression