Thesis: Estructura de datos comprimidas para búsquedas en textos muy repetitivos y alfabetos grandes (Compressed data structure for searching highly repetitive texts with large alphabets)
Date
2022
Program
Ingeniería Civil Informática
Campus
Campus Santiago San Joaquín
Abstract
Handling texts or sequences is fundamental in many areas today. These texts vary in their characteristics: length, repetitiveness, alphabet size, and so on. This thesis proposes a data structure that works efficiently with texts that are both highly repetitive and drawn from large alphabets. The proposed structure builds on known techniques, namely the Burrows-Wheeler Transform (BWT) and ASAP; the latter is modified so that it works efficiently on sequences of runs. The structure is compared with other known representations that support texts with these properties. The experimental results indicate that the solution reduces search times in exchange for a slight amount of extra memory.
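As a rough illustration of why the BWT and "sequences of runs" go together, the sketch below computes the BWT of a short string by sorting its rotations and then groups the result into runs of equal symbols. It is not the structure proposed in the thesis and does not implement ASAP; the function names `bwt` and `runs`, the `$` sentinel, and the sample strings are assumptions made for this example only.

```python
from itertools import groupby


def bwt(text, sentinel="$"):
    """Burrows-Wheeler Transform by sorting all rotations.

    Quadratic in time and space, so suitable only for small
    illustrative inputs. The sentinel is assumed to be absent from
    the text and smaller than every symbol in it.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def runs(s):
    """Group a string into maximal runs as (symbol, length) pairs."""
    return [(symbol, len(list(group))) for symbol, group in groupby(s)]


if __name__ == "__main__":
    for sample in ("banana", "abcabcabcabc"):
        transformed = bwt(sample)
        print(sample, transformed, runs(transformed))
```

On the repetitive sample `abcabcabcabc`, the transformed string collapses into a handful of long runs, which is the property that run-based representations exploit.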
Keywords
Storage systems, Data structures, Structured programming, Computational algorithms
