Thesis: Analysis and evaluation of Open Source ETL tools (Análisis y evaluación de herramientas ETL Open Source)
Date
2015-11
Program
Department of Informatics (Departamento de Informática). Computer Engineering (Ingeniería Civil Informática)
Campus
Campus Santiago San Joaquín
Abstract
An important part of implementing a Data Warehouse is the ETL (Extract, Transform and Load) process, which, broadly speaking, extracts data from one or more sources, transforms it into a common format, and finally loads the processed data into the destination store. Good ETL performance is essential for obtaining accurate data, in a suitable format, at the right time. This thesis analyzes and evaluates a set of open source ETL tools in order to support a better choice among them in terms of efficiency, usability and functionality. To this end, different ways of measuring process efficiency are studied, and test cases are modeled and then implemented in each tool under study. Finally, the results obtained in the tests are compared to determine the strengths and weaknesses of each tool.
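The abstract describes ETL as three steps and mentions measuring process efficiency. As a minimal, hypothetical illustration (not one of the thesis's test cases and not the behavior of any evaluated tool), the Python sketch below extracts records from two made-up sources with different layouts, transforms them into a common (id, ISO date, amount) format, loads them into an in-memory SQLite table, and reports elapsed time and throughput in rows per second. The source data, the `sales` table and the function names are assumptions chosen for illustration, and throughput is only one of several possible efficiency measures.

```python
import csv
import io
import sqlite3
import time

# Hypothetical source 1: CSV text with DD/MM/YYYY dates and amounts as strings.
CSV_SOURCE = """id,date,amount
1,05/11/2015,10.50
2,06/11/2015,7.25
"""

# Hypothetical source 2: records from another system with different field names.
API_SOURCE = [
    {"order_id": 3, "order_date": "2015-11-07", "total_cents": 1999},
]


def extract():
    """Extract: read raw records from each source, tagging their origin."""
    for row in csv.DictReader(io.StringIO(CSV_SOURCE)):
        yield "csv", row
    for row in API_SOURCE:
        yield "api", row


def transform(records):
    """Transform: map every record to a common (id, ISO date, amount) tuple."""
    for origin, row in records:
        if origin == "csv":
            day, month, year = row["date"].split("/")
            yield int(row["id"]), f"{year}-{month}-{day}", float(row["amount"])
        else:
            yield row["order_id"], row["order_date"], row["total_cents"] / 100


def load(conn, rows):
    """Load: insert the transformed rows into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, date TEXT, amount REAL)")
    cur = conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return cur.rowcount  # total rows inserted by executemany


def run_pipeline():
    """Run extract -> transform -> load once and measure elapsed time and throughput."""
    conn = sqlite3.connect(":memory:")
    start = time.perf_counter()
    loaded = load(conn, transform(extract()))
    elapsed = time.perf_counter() - start
    throughput = loaded / elapsed if elapsed > 0 else float("inf")
    return conn, loaded, elapsed, throughput


if __name__ == "__main__":
    conn, loaded, elapsed, throughput = run_pipeline()
    print(f"{loaded} rows in {elapsed:.6f} s ({throughput:.0f} rows/s)")
```

Because the steps are chained generators, rows stream from extraction through transformation into the load step without materializing intermediate lists; a real comparison between tools would use much larger inputs and repeated runs so that timing noise does not dominate.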
Keywords
Data storage, Database administration, Database design