Automatic information search for countering covid-19 misinformation through semantic similarity

Huertas García, Álvaro

UAM_Biblioteca

dc.contributor.advisor	Sánchez-Montañés Isla, Manuel Antonio
dc.contributor.advisor	Martín, Alejandro
dc.contributor.author	Huertas García, Álvaro
dc.contributor.other	UAM. Departamento de Ingeniería Informática	es_ES
dc.date.accessioned	2021-04-30T10:16:09Z
dc.date.available	2021-04-30T10:16:09Z
dc.date.issued	2021-02
dc.identifier.uri	http://hdl.handle.net/10486/695067	en_US
dc.description	Trabajo Fin de Máster en Bioinformática y Biología Computacional	es_ES
dc.description.abstract	Information quality in social media is an increasingly important issue, and the misinformation problem has become even more critical during the COVID-19 pandemic, exposing people to false and potentially harmful claims and rumours. Organizations such as the World Health Organization have issued a global call for action to promote access to health information and mitigate harm from health misinformation. Consequently, this project aims to counter the spread of the COVID-19 infodemic and its potential health hazards. In this work, we give an overview of the models and methods that have been employed in the NLP field, from its foundations to the latest state-of-the-art approaches. Focusing on deep learning methods, we propose applying multilingual Transformer models based on siamese networks, also called bi-encoders, combined with ensemble and PCA dimensionality reduction techniques. The goal is to counter COVID-19 misinformation by analyzing the semantic similarity between a claim and tweets from a collection gathered from official fact-checkers verified by the International Fact-Checking Network of the Poynter Institute. The number of Internet users increases every year, and the language a user speaks determines which information they can access online. For this reason, we make a special effort to apply multilingual models to tackle misinformation across the globe. Regarding semantic similarity, we first evaluate these multilingual ensemble models and improve on the results of monolingual and single models on the STS-Benchmark. Second, we enhance the interpretability of the models' performance through the SentEval toolkit. Lastly, we compare these models' performance against biomedical models in round 1 of the TREC-COVID task, using the Okapi BM25 ranking method as the baseline. Moreover, we are interested in understanding the ins and outs of misinformation. For that purpose, we extend interpretability using machine learning and deep learning approaches for sentiment analysis and topic modelling. Finally, we developed a dashboard to ease visualization of the results. In our view, the results obtained in this project constitute an excellent initial step toward incorporating multilingualism and will assist researchers and the public in countering COVID-19 misinformation.	en_US
dc.format.extent	76 p.	es_ES
dc.format.mimetype	application/pdf	en_US
dc.language.iso	eng	en_US
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject.other	Natural Language Processing	en_US
dc.subject.other	Machine Learning	en_US
dc.subject.other	Deep Learning	en_US
dc.title	Automatic information search for countering covid-19 misinformation through semantic similarity	en_US
dc.type	masterThesis	en_US
dc.subject.eciencia	Informática	es_ES
dc.rights.cc	Reconocimiento – NoComercial – SinObraDerivada	es_ES
dc.rights.accessRights	openAccess	en_US
dc.facultadUAM	Escuela Politécnica Superior

Files in this item

Name: huertas_garcia_alvaro_tfm.pdf
Size: 10.54 MB
Format: PDF

This item appears in the following Collection(s)

Trabajos de estudiantes (tesis doctorales, TFMs, TFGs, etc.) [19985]

Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-nd/4.0/

Related items

Analyzing two automatic Latent Semantic Analysis (LSA) assessment methods (Inbuilt Rubric vs. Golden Summary) in summaries extracted from expository texts

Lung transplantation from uncontrolled and controlled donation after circulatory death: similar outcomes to brain death donors

Automatic personality assessment through movement analysis