dc.contributor.advisor: Creixell, Werner
dc.contributor.author: FREDES FRANCO, NICOLÁS IGNACIO
dc.coverage.spatial: Casa Central Valparaíso [es_CL]
dc.date.accessioned: 2021-12-01T17:38:43Z
dc.date.available: 2021-12-01T17:38:43Z
dc.date.issued: 2021-08
dc.identifier.uri: https://hdl.handle.net/11673/52567
dc.description.abstract: Since the development of high-throughput sequencing methods, a vast gap has opened between the number of available protein sequences and the number of proteins whose functions have been experimentally annotated. Bioinformatics has traditionally addressed this asymmetry mainly with BLAST-based algorithms. More recently, deep learning architectures have been developed to predict a protein's Gene Ontology (GO) annotations either solely from its amino acid sequence or from the sequence complemented with additional information such as protein-protein interactions (PPI). Sequence-only models perform worse than those that use this extra information. However, features such as PPI must be determined through in vitro or in vivo procedures, which limits their applicability. Furthermore, existing deep learning approaches have not exploited the hierarchical structure of GO through hyperbolic neural networks, a framework well suited to this kind of data. This thesis proposes HyperGO, a novel hyperbolic deep learning architecture that predicts a protein's GO terms from its amino acid sequence alone. An algorithm based on AlphaFold preprocessing is applied to the protein sequences to enrich the protein representation. We hypothesize that this preprocessing adds context to the amino acid sequence used as HyperGO's input while keeping the method applicable to completely uncharacterized proteins. A transformer encoder captures global patterns in the preprocessed protein representation. Its output is then reshaped and passed to a hyperbolic network that operates in the Poincaré ball and exploits the hierarchical nature of GO to predict protein functions. HyperGO is evaluated on a subset of SwissProt 2019 using the CAFA metrics (Fmax and Smin) and AUPR.
The results are compared with several traditional bioinformatics methods and DeepGOPlus: HyperGO achieves better Smin and AUPR scores for every subontology and a better Fmax for molecular function (MFO). [es_CL]
dc.format.extent: 32 H [es_CL]
dc.subject: GENE ONTOLOGY [es_CL]
dc.subject: HYPERBOLIC SPACES [es_CL]
dc.subject: PROTEIN FUNCTION PREDICTION [es_CL]
dc.title: GENE ONTOLOGY PREDICTION ON NON-EUCLIDEAN DOMAIN [es_CL]
dc.type: Tesis de Postgrado (postgraduate thesis)
dc.description.degree: INGENIERO CIVIL ELECTRÓNICO [es_CL]
dc.description.degree: MAGISTER EN CIENCIAS DE LA INGENIERIA ELECTRONICA [es_CL]
dc.contributor.department: Universidad Técnica Federico Santa María. Departamento de Electrónica [es_CL]
dc.description.program: DEPARTAMENTO DE ELECTRÓNICA. INGENIERÍA CIVIL ELECTRÓNICA [es_CL]
dc.description.program: DEPARTAMENTO DE ELECTRÓNICA. MAGÍSTER EN CIENCIAS DE LA INGENIERÍA ELECTRÓNICA (MS) [es_CL]
dc.identifier.barcode: 187837839UTFSM [es_CL]
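The abstract relies on two well-defined ingredients: distances in the Poincaré ball, where the hyperbolic part of HyperGO operates, and the protein-centric Fmax score used in CAFA evaluations. The Python sketch below illustrates both under stated assumptions. It uses the standard unit Poincaré ball (curvature -1) and the usual CAFA definition of Fmax (precision averaged over proteins with at least one prediction above the threshold, recall averaged over all proteins). The function names `expmap0`, `poincare_distance` and `fmax` are hypothetical illustrations, not code from the HyperGO implementation.

```python
import numpy as np


def expmap0(v, eps=1e-9):
    """Exponential map at the origin of the unit Poincaré ball (curvature -1).

    Sends a Euclidean vector (e.g. a transformer feature) to a point strictly inside the ball.
    """
    v = np.asarray(v, dtype=float)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(norm) * v / norm


def poincare_distance(x, y, eps=1e-9):
    """Geodesic distance between points x and y of the unit Poincaré ball."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    sq_diff = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - np.sum(x ** 2, axis=-1)) * (1.0 - np.sum(y ** 2, axis=-1))
    return np.arccosh(1.0 + 2.0 * sq_diff / np.maximum(denom, eps))


def fmax(y_true, y_score, thresholds=None):
    """Protein-centric Fmax in the style of CAFA (hypothetical helper).

    y_true:  (n_proteins, n_terms) binary matrix of true GO annotations.
    y_score: (n_proteins, n_terms) predicted scores in [0, 1].
    Both are assumed to be already propagated to ancestor GO terms.
    """
    if thresholds is None:
        thresholds = np.linspace(0.01, 1.0, 100)
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    n_true = np.maximum(y_true.sum(axis=1), 1)  # guard against unannotated rows
    best = 0.0
    for t in thresholds:
        pred = y_score >= t
        tp = np.sum(pred & y_true, axis=1)
        n_pred = pred.sum(axis=1)
        covered = n_pred > 0  # precision averages only over proteins with >= 1 prediction
        if not covered.any():
            continue
        precision = np.mean(tp[covered] / n_pred[covered])
        recall = np.mean(tp / n_true)  # recall averages over all proteins
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

For a tangent vector `v` at the origin, `poincare_distance(np.zeros(2), expmap0(v))` equals `2 * np.linalg.norm(v)`, and distances grow without bound as points approach the boundary of the ball. This extra room near the boundary is the usual argument for embedding tree-like hierarchies such as GO in hyperbolic space.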
