Thesis DISTRIBUTED MACHINE LEARNING WITH CONTEXT AWARENESS
Date
2014
Publisher
Universidad Técnica Federico Santa María
Abstract
In this thesis, a Distributed Machine Learning framework is presented for modeling distributed data with different contexts in regression tasks. A different context is defined as a change in the underlying laws of probability across the distributed sources. Most state-of-the-art methods do not take different contexts into account and assume that all the data comes from the same statistical distribution. We propose an aggregation scheme for models that lie in the same neighborhood in terms of similarity, by means of clustering algorithms, feedforward neural networks, stacked generalization models and ensemble approaches. Two proposals are presented. The first relies on the theoretical statistical distribution that the different data sets could have and, with a hypothesis test based on divergence measures, is able to create neighborhoods of similar distributed sources. The second does not rely on a statistical distribution beforehand and creates neighborhoods using well-known distance metrics and clustering algorithms over a discrete representation of the underlying law of probability. Both proposals respect the most important restrictions of Distributed Learning problems: "raw" data is not shared between distributed sites, and the data does not have to be uploaded to a central site. Experiments with 5 synthetic and 7 real data sets were conducted to validate the proposals. In most cases, the proposed algorithms outperform other models that follow a traditional approach.
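The neighborhood idea behind the second proposal can be pictured with a small sketch. This is only an illustration under stated assumptions, not the algorithm from the thesis: the histogram grid, the Hellinger distance, the threshold-based single-linkage grouping and every function name below are hypothetical choices. The point it shows is that each site shares only a discrete summary of its data, never the raw observations.

```python
import math
import random


def histogram(values, lo, hi, bins):
    """Discrete summary of one site's data: normalised counts on a shared grid."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
    total = len(values)
    return [c / total for c in counts]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2)


def neighborhoods(hists, threshold):
    """Group sites whose summaries are closer than `threshold` (single linkage via union-find)."""
    parent = list(range(len(hists)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(hists)):
        for j in range(i + 1, len(hists)):
            if hellinger(hists[i], hists[j]) < threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(hists)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Assumed toy setup: four sites, two contexts (different underlying distributions).
rng = random.Random(0)
sites = [
    [rng.gauss(0.0, 1.0) for _ in range(500)],
    [rng.gauss(0.0, 1.0) for _ in range(500)],
    [rng.gauss(5.0, 1.0) for _ in range(500)],
    [rng.gauss(5.0, 1.0) for _ in range(500)],
]

# Only these summaries would leave each site; the raw samples stay local.
summaries = [histogram(s, lo=-4.0, hi=9.0, bins=20) for s in sites]
groups = neighborhoods(summaries, threshold=0.5)
print(groups)
```

In a setting like the one the abstract describes, locally trained regression models would then be aggregated only within each resulting group; that aggregation step is left out of this sketch.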
Description
Cataloged from the PDF version of the thesis
Campus
Casa Central, Valparaíso