Application of Multimodal Machine Learning to Visual Question Answering
Advisor: Fiérrez Aguilar, Julián
Entity: UAM. Departamento de Tecnología Electrónica y de las Comunicaciones
Subjects: Visual question answering; Computer vision; Natural language processing; Telecommunications
Note: Master’s Degree in ICT Research and Innovation (i2-ICT)
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.
Due to the great advances achieved in Natural Language Processing and Computer Vision in recent years with neural networks and attention mechanisms, great interest in Visual Question Answering (VQA) has been awakened, to the point that it has come to be considered a ”Visual Turing Test” for modern AI systems: answering a question about an image requires the system to learn to understand and reason about both the image and the question it is shown. One of the main reasons for this interest is the large number of potential applications these systems enable, such as medical diagnosis from an image, assistants for blind people, e-learning applications, etc.

In this Master’s thesis, a study of the state of the art in VQA is presented, covering both existing techniques and datasets. A development is then carried out to reproduce state-of-the-art results with the latest VQA models, with the aim of applying them and experimenting on new datasets.

Accordingly, experiments were first carried out with MoViE+MCAN (winner of the 2020 VQA Challenge). After observing that it was not viable due to resource constraints, we switched to LXMERT, a model pre-trained on 5 subtasks that can then be fine-tuned on several downstream tasks, in this specific case the VQA task on the VQA v2.0 dataset.

As the main result of this thesis, we experimentally show that LXMERT, starting from the pre-trained model provided in its GitHub repository, yields results similar to those of MoViE+MCAN (the best-known method for VQA) on the most recent and demanding benchmarks while using fewer resources.
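The LXMERT fine-tuning setup described above can be sketched with the Hugging Face `transformers` implementation of the model. This is a minimal illustration, not code from the thesis: a small randomly initialized config is used so the example runs without downloading checkpoints, and random tensors stand in for the Faster R-CNN region features that a real pipeline would supply; the thesis itself starts from the pre-trained weights in the official LXMERT GitHub repository, and VQA v2.0 fine-tuning uses a much larger answer vocabulary than the toy 10 labels here.

```python
# Sketch: a toy forward pass through LXMERT's question-answering head.
# Assumptions (not from the thesis): tiny random config, random "visual"
# features in place of Faster R-CNN detections, 10 candidate answers.
import torch
from transformers import LxmertConfig, LxmertForQuestionAnswering

config = LxmertConfig(
    vocab_size=100,       # toy vocabulary
    hidden_size=64,
    num_attention_heads=4,
    intermediate_size=128,
    l_layers=2,           # language-encoder layers
    x_layers=1,           # cross-modality layers
    r_layers=1,           # object-relationship (visual) layers
    visual_feat_dim=16,   # real LXMERT uses 2048-dim detector features
    visual_pos_dim=4,     # normalized bounding-box coordinates
    num_qa_labels=10,     # toy answer set
)
model = LxmertForQuestionAnswering(config)
model.eval()

batch, seq_len, num_boxes = 2, 8, 5
input_ids = torch.randint(0, config.vocab_size, (batch, seq_len))
attention_mask = torch.ones(batch, seq_len, dtype=torch.long)
visual_feats = torch.randn(batch, num_boxes, config.visual_feat_dim)
visual_pos = torch.rand(batch, num_boxes, config.visual_pos_dim)

with torch.no_grad():
    out = model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        visual_feats=visual_feats,
        visual_pos=visual_pos,
    )

# One score per candidate answer; fine-tuning trains this head on VQA labels.
print(out.question_answering_score.shape)  # torch.Size([2, 10])
```

In the actual experiments, the language inputs would come from the LXMERT tokenizer and the visual inputs from a pre-trained object detector, with the head trained against the VQA v2.0 answer annotations.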
Google Scholar: Galvé Mateo, Carlos
Showing items related by title, author, creator and subject.
Sociodemographic indicators of health status using a machine learning approach and data from the english longitudinal study of aging (ELSA) Engchuan, Worrawat; Dimopoulos, Alexandros C.; Tyrovolas, Stefanos; Caballero, Francisco Félix; Sanchez-Niubo, Albert; Arndt, Holger; Ayuso-Mateos, Jose Luis; Haro, Josep Maria; Chatterji, Somnath; Panagiotakos, Demosthenes B.
Use of machine-learning and load–velocity profiling to estimate 1-Repetition maximums for two variations of the bench-press exercise Balsalobre-Fernández, Carlos; Kipp, Kristof