dc.contributor.advisor | Sánchez-Montañés Isla, Manuel Antonio | |
dc.contributor.advisor | Martín, Alejandro | |
dc.contributor.author | Huertas García, Álvaro | |
dc.contributor.other | UAM. Departamento de Ingeniería Informática | es_ES |
dc.date.accessioned | 2021-04-30T10:16:09Z | |
dc.date.available | 2021-04-30T10:16:09Z | |
dc.date.issued | 2021-02 | |
dc.identifier.uri | http://hdl.handle.net/10486/695067 | en_US |
dc.description | Master's Thesis in Bioinformatics and Computational Biology | en_US |
dc.description.abstract | Information quality in social media is an increasingly important issue, and the misinformation problem has become even more critical during the COVID-19 pandemic, leaving people exposed to false and potentially harmful claims and rumours. Organizations such as the World Health Organization have issued a global call for action to promote access to health information and mitigate harm from health misinformation. Consequently, this project aims to counter the spread of the COVID-19 infodemic and its potential health hazards.
In this work, we give an overview of the models and methods that have been employed in the NLP field, from its foundations to the latest state-of-the-art approaches. Focusing on deep learning methods, we propose applying multilingual Transformer models based on siamese networks, also called bi-encoders, combined with ensemble and PCA dimensionality reduction techniques. The goal is to counter COVID-19 misinformation by analyzing the semantic similarity between a claim and tweets from a collection gathered from official fact-checkers verified by the International Fact-Checking Network of the Poynter Institute.
The number of Internet users increases every year, and the language a user speaks determines their access to information online. For this reason, we make a special effort to apply multilingual models to tackle misinformation across the globe. Regarding semantic similarity, we first evaluate these multilingual ensemble models and improve the results on the STS-Benchmark compared to monolingual and single models. Second, we enhance the interpretability of the models’ performance through the SentEval toolkit. Lastly, we compare these models’ performance against biomedical models in TREC-COVID task round 1, using the Okapi BM25 ranking method as the baseline. Moreover, we are interested in understanding the ins and outs of misinformation. For that purpose, we extend interpretability using machine learning and deep learning approaches for sentiment analysis and topic modelling. Finally, we develop a dashboard to ease visualization of the results.
In our view, the results obtained in this project constitute an excellent initial step toward incorporating multilingualism and will assist researchers and the general public in countering COVID-19 misinformation. | en_US |
dc.format.extent | 76 p. | es_ES |
dc.format.mimetype | application/pdf | en_US |
dc.language.iso | eng | en_US |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | |
dc.subject.other | Natural Language Processing | en_US |
dc.subject.other | Machine Learning | en_US |
dc.subject.other | Deep Learning | en_US |
dc.title | Automatic information search for countering covid-19 misinformation through semantic similarity | en_US |
dc.type | masterThesis | en_US |
dc.subject.eciencia | Computer Science | en_US |
dc.rights.cc | Attribution – NonCommercial – NoDerivatives | en_US |
dc.rights.accessRights | openAccess | en_US |
dc.facultadUAM | Escuela Politécnica Superior | |