Thesis: Comparative analysis of deep learning models versus machine learning algorithms for spam email detection
dc.contributor.correferente | Ñanculef Alegría, Juan Ricardo | |
dc.contributor.department | Departamento de Informática | |
dc.contributor.guia | Valle Vidal, Carlos Antonio | |
dc.coverage.spatial | Campus Casa Central Valparaíso | |
dc.creator | Vega Rivera, Paulina Valeria | |
dc.date.accessioned | 2025-08-04T14:15:14Z | |
dc.date.available | 2025-08-04T14:15:14Z | |
dc.date.issued | 2025-07 | |
dc.description.abstract | Spam email detection is a critical task in cybersecurity, complicated by constantly evolving spam tactics designed to evade filtering systems. This study compares five machine learning models and nine deep learning models, evaluated with eight performance metrics on a combined dataset of 5,500 emails. Statistical tests were applied to the top-performing models to assess statistical significance. Results show that RoBERTa consistently achieves the highest F1 score among all deep learning models, while the fine-tuned GPT models, treated as a special case because they were trained on significantly smaller datasets, still perform competitively. Among the traditional machine learning models, Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest (RF) achieved the highest scores; however, they still performed worse than the five Transformer-based models. Overall, the study aims to provide a comprehensive benchmark of traditional and modern approaches to spam detection under practical constraints. | en |
dc.description.abstract | Spam email detection is a critical task in cybersecurity, complicated by the constant evolution of techniques designed to evade filters. This study compares five machine learning models and nine deep learning models, evaluated with eight metrics on a set of 5,500 emails. Statistical tests were applied to the best models to assess their significance. RoBERTa consistently obtained the highest F1 score, while the fine-tuned GPT models, trained with less data, also achieved competitive results. Among the traditional models, SVM, NB, and RF were the most effective, although they were outperformed by the five Transformer-based models. The overall objective is to offer a broad benchmark of traditional and modern approaches to spam detection under practical conditions. | en |
dc.description.program | Ingeniería Civil Informática | |
dc.format.extent | 63 pages | |
dc.identifier.barcode | 3560900288139 | |
dc.identifier.uri | https://repositorio.usm.cl/handle/123456789/75865 | |
dc.language.iso | en | |
dc.publisher | Universidad Técnica Federico Santa María | |
dc.rights | Attribution 4.0 International | en |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | |
dc.subject | Deep learning | |
dc.subject | Machine learning | |
dc.subject | Spam detection | |
dc.subject | Cybersecurity | |
dc.subject.ods | 4 Quality Education | |
dc.subject.ods | 9 Industry, Innovation and Infrastructure | |
dc.subject.ods | 16 Peace, Justice and Strong Institutions | |
dc.title | Comparative analysis of deep learning models versus machine learning algorithms for spam email detection | |
dspace.entity.type | Thesis | |