Thesis: NORMALIZACIÓN DE DIRECCIONES PARA GEORREFERENCIACIÓN UTILIZANDO NLP Y MACHINE LEARNING (Address Normalization for Georeferencing Using NLP and Machine Learning)
Date
2018-08
Campus
Campus San Joaquín, Santiago
Abstract
This document proposes a solution to a problem faced by the Chilean company SimpliRoute, which offers a platform for creating and optimizing delivery routes. One of the key steps in the platform is loading the addresses from which the routes are built. The geographic coordinates of each address are used to estimate distances and delivery times during route generation. Any error in interpreting the position of an address can affect the resulting routes, and such errors occur easily because of the ambiguity with which addresses are written and interpreted. Factors such as abbreviations, differing formats, synonyms, and missing key elements make it harder to find an address's geographic coordinates.

To address this problem, the document proposes an address-processing system that runs before the geographic coordinates are computed. The system cleans each address, recognizes its relevant elements, and standardizes it, with the goal of reducing the number of addresses whose positions cannot be found because of the errors mentioned above.

The system consists of three stages. The first is a cleaning stage based on a system of rules. The second detects the elements within an address using supervised machine learning with a linear-chain Conditional Random Fields model. The third standardizes the detected elements using postal address templates.

The proposed system is designed, built, and evaluated, and then compared with SimpliRoute's current solution.
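The three stages named in the abstract (rule-based cleaning, element detection, template-based standardization) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abbreviation rules, the label names, the template, and the function names are hypothetical, and the `tag` function is a hand-written stand-in for the thesis's trained linear-chain Conditional Random Fields model, not a CRF itself.

```python
import re

# Hypothetical cleaning rules; the thesis's actual rule set is not given in the abstract.
ABBREVIATIONS = {
    r"\bav\b\.?": "avenida",
    r"\bpje\b\.?": "pasaje",
    r"\bdpto\b\.?": "departamento",
}
STREET_TYPES = {"avenida", "pasaje", "calle"}
UNIT_TYPES = {"departamento", "oficina"}

# Hypothetical postal template: comma-separated segments, each a sequence of labels.
TEMPLATE = (("STREET_TYPE", "STREET_NAME", "NUMBER"), ("UNIT_TYPE", "UNIT"), ("CITY",))


def clean(raw: str) -> str:
    """Stage 1: lowercase, expand abbreviations, drop punctuation, collapse spaces."""
    text = raw.lower().strip()
    for pattern, full in ABBREVIATIONS.items():
        text = re.sub(pattern, full, text)
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tag(tokens: list[str]) -> list[tuple[str, str]]:
    """Stage 2 stand-in: assign a label to each token.

    A trained linear-chain CRF would take the place of these hand-written rules.
    """
    tagged, previous, seen_number = [], None, False
    for token in tokens:
        if token in STREET_TYPES:
            label = "STREET_TYPE"
        elif token in UNIT_TYPES:
            label = "UNIT_TYPE"
        elif token.isdigit():
            label = "UNIT" if previous == "UNIT_TYPE" else "NUMBER"
            seen_number = seen_number or label == "NUMBER"
        else:
            label = "CITY" if seen_number else "STREET_NAME"
        tagged.append((token, label))
        previous = label
    return tagged


def standardize(tagged: list[tuple[str, str]]) -> str:
    """Stage 3: fill the template with the detected elements, skipping missing ones."""
    fields: dict[str, list[str]] = {}
    for token, label in tagged:
        fields.setdefault(label, []).append(token)
    joined = {label: " ".join(tokens) for label, tokens in fields.items()}
    parts = []
    for segment in TEMPLATE:
        values = [joined[label] for label in segment if label in joined]
        if values:
            parts.append(" ".join(values))
    return ", ".join(parts).title()


def normalize(raw: str) -> str:
    """Run the three stages in order on one raw address string."""
    return standardize(tag(clean(raw).split()))


print(normalize("  Av. Providencia #1234, Dpto 56,  Santiago "))
```

Keeping the stages as separate functions mirrors the structure the abstract describes: the cleaning rules and the template can be changed independently, and the tagging step can be swapped for a learned sequence model without touching the other two.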
Description
Cataloged from the PDF version of the thesis.
Keywords
CRF (CONDITIONAL RANDOM FIELDS), GEOREFERENCING, MACHINE LEARNING