Thesis REDES DE PALABRAS EN TEXTOS GENERADOS POR GRAMÁTICAS ESTOCÁSTICAS (Word Networks in Texts Generated by Stochastic Grammars)
Date
2011
Campus
Universidad Técnica Federico Santa María UTFSM. Campus San Joaquín
Abstract
Word networks are complex structures built from a text, in which the nodes are the words of the text and the edges represent the type of relationship between them. Major studies of word networks in literary texts suggest the presence of a universal mechanism of grammar. This work uses networks whose edges represent the position of words in the text. To study the influence that grammars exert on a text and on its underlying word network, we use stochastic grammars, which allow random texts to be generated. First, we determine whether these texts satisfy Zipf's law, an empirical property observed in most real texts, even after a random process. We then perform a structural analysis of the random word networks to determine whether their properties are influenced by the grammars that generate them. We also carry out a comparative study of different types of grammars from the Chomsky hierarchy to determine whether any of the network properties reflects grammatical complexity. Finally, the grammar of the Pascal language is trained in order to generate random code, whose word networks are compared with those of the original code to determine whether any properties are preserved in the network after training.
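The pipeline outlined in the abstract (generate random text from a stochastic grammar, build a word network from word positions, inspect the rank-frequency distribution) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the toy grammar `GRAMMAR`, the function names, and the choice to link words that appear next to each other in the text as one possible reading of "edges represent the position of words".

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy stochastic context-free grammar (not the thesis's grammar).
# Each nonterminal maps to a list of (probability, right-hand side) pairs.
GRAMMAR = {
    "S": [(1.0, ["NP", "VP"])],
    "NP": [(0.6, ["Det", "N"]), (0.4, ["Det", "Adj", "N"])],
    "VP": [(0.5, ["V", "NP"]), (0.5, ["V"])],
    "Det": [(0.7, ["the"]), (0.3, ["a"])],
    "Adj": [(0.5, ["red"]), (0.5, ["old"])],
    "N": [(0.5, ["dog"]), (0.3, ["cat"]), (0.2, ["tree"])],
    "V": [(0.6, ["sees"]), (0.4, ["likes"])],
}


def generate(symbol, rng):
    """Expand a symbol into a list of words by sampling rules by probability."""
    if symbol not in GRAMMAR:  # terminal symbol: it is a word
        return [symbol]
    rules = GRAMMAR[symbol]
    weights = [prob for prob, _ in rules]
    _, rhs = rng.choices(rules, weights=weights, k=1)[0]
    words = []
    for child in rhs:
        words.extend(generate(child, rng))
    return words


def random_text(n_sentences, seed=0):
    """Concatenate n_sentences random sentences into one list of words."""
    rng = random.Random(seed)
    text = []
    for _ in range(n_sentences):
        text.extend(generate("S", rng))
    return text


def adjacency_network(words):
    """Undirected network linking words that are adjacent in the text.

    Assumption: adjacency is used as the positional relation, and sentence
    boundaries are ignored because the text is one flat word sequence.
    """
    neighbors = defaultdict(set)
    for a, b in zip(words, words[1:]):
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def rank_frequency(words):
    """Return (rank, word, frequency) triples, most frequent word first."""
    counts = Counter(words)
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(counts.most_common(), start=1)]


if __name__ == "__main__":
    text = random_text(500, seed=42)
    network = adjacency_network(text)
    for rank, word, freq in rank_frequency(text):
        print(rank, word, freq, "degree:", len(network[word]))
```

Under Zipf's law, frequency falls off roughly as a power of rank, so plotting the `rank_frequency` output on log-log axes would give an approximately straight line. A vocabulary this small is far too limited to test that meaningfully; the sketch only shows the shape of the computation.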
Description
Digitized from the print version
Keywords
STOCHASTIC GRAMMARS, SEMANTIC WEB, WORD NETWORKS