GENE ONTOLOGY PREDICTION ON NON-EUCLIDEAN DOMAIN

FREDES FRANCO, NICOLÁS IGNACIO


Files

m19487799-4.pdf (2.19 MB)

Date

2021-08

Authors

FREDES FRANCO, NICOLÁS IGNACIO

Abstract

Since the development of massive sequencing methods, there has been a vast gap between the available protein sequence data and the corresponding experimentally annotated protein functions. Bioinformatics has traditionally addressed this asymmetry mainly with BLAST-based algorithms. Recently, deep learning architectures have been developed to predict a protein's Gene Ontology (GO) annotations either solely from its amino acid sequence or from the sequence complemented with additional information such as Protein-Protein Interaction (PPI). The former approach performs worse than the latter, which uses the extra information. However, features such as PPI need to be determined using in vitro or in vivo procedures, which limits their applicability. Furthermore, existing deep learning approaches have ignored the possibility of leveraging the hierarchical structure of GO with a hyperbolic neural network, a framework well suited to this kind of data. This thesis proposes a novel hyperbolic deep learning architecture called HyperGO, which predicts a protein's GO terms from its amino acid sequence alone. An algorithm based on AlphaFold preprocessing is applied to the protein sequences to enrich the protein representation. We hypothesize that this preprocessing can provide context information for the amino acid sequence data used as the HyperGO input, while keeping the method applicable to completely unknown proteins. A transformer encoder computes the global patterns of the preprocessed protein representation. The transformer output is then reshaped and processed by a hyperbolic network that operates in the Poincaré ball and exploits the hierarchical nature of GO to predict the protein functions. HyperGO performance is evaluated on a subset of SwissProt 2019 using the CAFA scores (Fmax and Smin) and AUPR. The results are compared with some traditional bioinformatics methods and with DeepGOPlus. HyperGO achieves better Smin and AUPR scores for each sub-ontology, and a better Fmax for the molecular function ontology (MFO).
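As an illustration of why the Poincaré ball suits hierarchical data such as GO, the following minimal Python sketch computes the standard geodesic distance in the unit Poincaré ball (curvature -1). It is not taken from the thesis: the function name `poincare_distance` and the example points `root`, `leaf_a` and `leaf_b` are hypothetical, and HyperGO's actual layers may be implemented differently.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincaré ball.

    Uses the standard closed form for curvature -1:
    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# Hypothetical 2-D embedding: a general term near the origin and two
# specific terms near the boundary of the ball.
root = (0.0, 0.0)
leaf_a = (0.9, 0.0)
leaf_b = (0.0, 0.9)

d_root_leaf = poincare_distance(root, leaf_a)
d_leaf_leaf = poincare_distance(leaf_a, leaf_b)
```

In this toy embedding, the distance between the two boundary points is much larger than their Euclidean separation and close to the length of the path through the origin. Distances in a tree behave the same way, since the path between two leaves passes through their common ancestor.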

Keywords

GENE ONTOLOGY, HYPERBOLIC SPACES, PROTEIN FUNCTION PREDICTION

URI

https://hdl.handle.net/11673/52567

Collections

Arq_paso
