Desarrollo de un algoritmo Ant Colony Optimization para tareas de clustering en Apache Spark

Ortiz Martín, Alejandro

UAM_Biblioteca

dc.contributor.advisor	González-Pardo, Antonio
dc.contributor.author	Ortiz Martín, Alejandro
dc.contributor.other	UAM. Departamento de Ingeniería Informática	es_ES
dc.date.accessioned	2016-09-28T07:53:08Z
dc.date.available	2016-09-28T07:53:08Z
dc.date.issued	2016-02
dc.identifier.uri	http://hdl.handle.net/10486/673609
dc.description	Master en Ingeniería Informática	es_ES
dc.description.abstract	En los últimos años se ha producido un incremento en la cantidad de datos generada por las redes sociales, logs de software, dispositivos móviles y sensores, entre otros. Dicha cantidad de datos es de tal magnitud que se requieren de nuevos paradigmas de computación para el correcto análisis de la información contenida en ellos. En este entorno ha surgido el área de Big Data que se usa para hacer referencia a los desafíos y ventajas derivadas de la recolección y procesado de grandes cantidades de datos [1]. De una manera más formal, el Big Data se define como la cantidad de datos que exceden las capacidades de cómputo de un determinado sistema en términos de consumo de memoria y/o tiempo[2]. La computación distribuida permite contar con múltiples ordenadores interconectados entre sí formando clusters, consiguiendo una capacidad conjunta mayor que con un único ordenador más potente. En la actualidad existen varios frameworks para el análisis de Big Data que han atraído el interés tanto de la comunidad científica como de la industria. El primer framework es Apache Hadoop [3], desarrollado por Google y que se basa en el enfoque de MapReduce[4]. Sin embargo, el nuevo framework Apache Spark[5], desarrollado por la universidad de Berkeley, se está haciendo bastante popular. Hacer un buen uso de estos frameworks requiere adaptar los algoritmos que se quieran usar a las características del sistema sobre el cual se vayan a desplegar, encontrando puntos de paralelización óptimos que aprovechen las fortalezas de dichos frameworks. La correcta adaptación de los algoritmos a la plataforma de Big Data es un aspecto crucial ya que repercutirá en el rendimiento de dicho algoritmo. Este Trabajo de Fin de Máster se centrará en el estudio y desarrollo algoritmo de clusterización de Ant Colony Optimization (ACOC)[6, 7] sobre la plataforma Apache Spark. 
Para la correcta validación del sistema desarrollado, se realizarán tareas de clustering sobre varios conjuntos de pruebas sencillos. Una vez que el sistema esté validado y si se dispone de tiempo suficiente, se estudiará el rendimiento del sistema ante un problema de Big Data como pueden ser las tareas de clustering sobre datos de redes sociales como Twitter.	es_ES
dc.description.abstract	In recent years there has been an increase in the amount of data generated by social networks, software logs, mobile devices and sensors, among others. The volume of data is such that new computing paradigms are required to properly analyze the information it contains. In this context the field of Big Data has emerged; the term refers to the challenges and benefits of collecting and processing large amounts of data [1]. More formally, Big Data is defined as an amount of data that exceeds the computing capabilities of a given system in terms of memory consumption and/or time [2]. Distributed computing allows multiple interconnected computers to form clusters, achieving a greater combined capacity than a single, more powerful computer. There are currently several frameworks for Big Data analysis that have attracted the interest of both the scientific community and industry. The first is Apache Hadoop [3], which is based on the MapReduce approach [4] introduced by Google. However, the newer framework Apache Spark [5], developed at the University of California, Berkeley, is becoming quite popular. Making good use of these frameworks requires adapting the algorithms to the characteristics of the system on which they will be deployed, finding optimal parallelization points that leverage the strengths of these frameworks. Correctly adapting an algorithm to the Big Data platform is crucial, as it affects the algorithm's performance. This Master's Thesis focuses on the study and development of the Ant Colony Optimization clustering algorithm (ACOC) [6, 7] on the Apache Spark platform. To validate the developed system, clustering tasks will be performed on several simple test datasets. Once the system is validated, and if time permits, its performance will be studied on a Big Data problem, such as clustering social network data from Twitter.	en_US
dc.format.extent	58 pág.	es_ES
dc.format.mimetype	application/pdf	en_US
dc.language.iso	spa	en_US
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title	Desarrollo de un algoritmo Ant Colony Optimization para tareas de clustering en Apache Spark	es_ES
dc.type	masterThesis	en_US
dc.subject.eciencia	Informática	es_ES
dc.rights.cc	Reconocimiento – NoComercial – SinObraDerivada	es_ES
dc.rights.accessRights	openAccess	en_US
dc.facultadUAM	Escuela Politécnica Superior

Files in this item

Name: Ortiz_Martin_Alejandro_tfm.pdf
Size: 907.4Kb
Format: PDF

This item appears in the following Collection(s)

Trabajos de estudiantes (tesis doctorales, TFMs, TFGs, etc.) [19966]

Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-nd/4.0/

Related items

Diseño de un algoritmo de arrecifes de corales para la resolución de pantallas del videojuego Lemmings

A method for approximating optimal statistical significances with machine-learned likelihoods

Home and Ambulatory Artificial Nutrition (NADYA) group report. Home parenteral nutrition in Spain, 2018