Analysis of flow cytometry data with domain-adversarial autoencoders
Title (trans.)
Análisis de datos de citometría de flujo mediante el uso de domain-adversarial autoencoders
Author
Dorado Alfaro, Sara
Entity
UAM. Departamento de Ingeniería Informática
Date
2020-09
Subjects
Flow cytometry; Batch effects; Autoencoders; Unsupervised learning; Domain adaptation; Clustering; Dimensionality reduction; Computer science
Note
Master's thesis in Bioinformatics and Computational Biology
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.
Abstract
Machine Learning is a field of Artificial Intelligence focused on automatic data analysis. In
the era of big data, new algorithms make it possible to analyze large quantities of data
efficiently, incorporating more knowledge into our studies. One of the main fields of application
for these algorithms is bioinformatics, where large amounts of high-dimensional data are typically
analyzed. However, one of the main difficulties in the automatic analysis of data with a
biological origin is the inevitable variation that occurs in the experimental conditions, causing
the well-known batch effects. These effects make it difficult to integrate data that come from different
experimental sources, which limits the ability to analyze them jointly and causes relevant
biological information to be lost.
Focused on flow cytometry data, in this work we propose a new algorithm in the context of
unsupervised learning with the aim of smoothing the influence of batch effects simultaneously
under an arbitrary number of experimental conditions. Applying state-of-the-art techniques in
Machine Learning, such as domain adaptation and adversarial learning, we present the domain-adversarial
autoencoder (DAE). To validate the DAE as a domain adaptation or batch
normalization algorithm, we carry out experiments with three datasets. The first
two are simple, artificial datasets composed of beads that have been passed through the cytometer
in a controlled environment. In one of them, the clogging or misalignment of the cytometer is
artificially simulated. In the other, we have the same data analyzed on two different machines.
The third example is a real dataset with dendritic cells of mice that have also been collected on
two different cytometers.
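As a hedged illustration of the idea behind a domain-adversarial autoencoder, the objective can be written in the general form used in domain-adversarial training. This is a sketch under assumptions, not the formulation confirmed by this record: the encoder $E$, decoder $D$, domain classifier $C$, the trade-off weight $\lambda$, and the choice of squared-error and cross-entropy losses are all introduced here for illustration.

```latex
% Assumed notation: x_i is a measured event, d_i its batch (domain) label,
% E = encoder, D = decoder, C = domain classifier, \lambda > 0 a trade-off weight.
\begin{align}
  \mathcal{L}_{\mathrm{rec}}(\theta_E,\theta_D)
    &= \frac{1}{n}\sum_{i=1}^{n} \bigl\lVert x_i - D\bigl(E(x_i)\bigr) \bigr\rVert^2 \\
  \mathcal{L}_{\mathrm{dom}}(\theta_E,\theta_C)
    &= -\frac{1}{n}\sum_{i=1}^{n} \log C\bigl(E(x_i)\bigr)_{d_i} \\
  (\hat\theta_E,\hat\theta_D)
    &= \arg\min_{\theta_E,\theta_D}\;
       \mathcal{L}_{\mathrm{rec}}(\theta_E,\theta_D)
       - \lambda\,\mathcal{L}_{\mathrm{dom}}(\theta_E,\hat\theta_C) \\
  \hat\theta_C
    &= \arg\min_{\theta_C}\; \mathcal{L}_{\mathrm{dom}}(\hat\theta_E,\theta_C)
\end{align}
```

Under this reading, the classifier tries to identify the batch from the latent code while the encoder is rewarded for making that identification hard, so the latent representation keeps what is needed for reconstruction and discards batch-specific variation.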
Firstly, we show how these batch effects influence the analysis typically applied by flow
cytometry users, such as clustering with Phenograph or visualization with t-SNE. Secondly,
we see how the DAE manages to efficiently alleviate the batch effects in these examples and
improve the clustering results, achieving a notable increase in the F1-score after the correction.
In addition, we provide a visual evaluation of the representations in two-dimensional spaces
learnt with a standard autoencoder (SAE), t-SNE and a DAE.
Additionally, in this work we present a novel method to evaluate the quality of the batch
normalization of data using statistical distances. In particular, we use the multidimensional version
of the Kolmogorov-Smirnov distance between distributions. We show that the distribution
of the data in the latent representation of the DAE is very similar when the data comes from
different experiments, presenting a smaller distance than in the case of the SAE, where we do
not provide the algorithm with domain information in the training step.
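The distance-based evaluation described above can be sketched for a two-dimensional latent space. The abstract does not say which multidimensional Kolmogorov-Smirnov variant is used, so the code below is a simplified sketch in the spirit of the two-sample Fasano-Franceschini statistic; the function names `quadrant_fractions` and `ks2d_distance` and the toy data are hypothetical.

```python
import random


def quadrant_fractions(origin, sample):
    """Fraction of `sample` points in each of the four quadrants around `origin`."""
    x0, y0 = origin
    counts = [0, 0, 0, 0]
    for x, y in sample:
        if x > x0 and y > y0:
            counts[0] += 1      # upper right
        elif x <= x0 and y > y0:
            counts[1] += 1      # upper left
        elif x <= x0 and y <= y0:
            counts[2] += 1      # lower left (includes the origin itself)
        else:
            counts[3] += 1      # lower right
    n = len(sample)
    return [c / n for c in counts]


def ks2d_distance(a, b):
    """Simplified two-sample 2D KS-style distance (assumption, see lead-in).

    Every point of each sample is used as a quadrant origin. For each origin,
    the largest absolute difference between the quadrant fractions of the two
    samples is recorded; the result averages the maxima found with origins
    taken from `a` and from `b`.
    """
    def max_gap(origins):
        best = 0.0
        for origin in origins:
            fa = quadrant_fractions(origin, a)
            fb = quadrant_fractions(origin, b)
            best = max(best, max(abs(p - q) for p, q in zip(fa, fb)))
        return best

    return 0.5 * (max_gap(a) + max_gap(b))


if __name__ == "__main__":
    random.seed(0)
    # Toy stand-ins for 2D latent codes of two batches (not real cytometry data).
    batch_1 = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100)]
    batch_2 = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100)]
    shifted = [(x + 3.0, y + 3.0) for x, y in batch_2]
    print("same distribution:", ks2d_distance(batch_1, batch_2))
    print("shifted batch:    ", ks2d_distance(batch_1, shifted))
```

A smaller value means the two batches are distributed more similarly in the latent space, which is the sense in which a DAE representation would be compared against an SAE representation.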
Therefore, this work allows us to conclude that domain adaptation in flow cytometry data
opens a new line of research focused on developing tools for the integration of data
from different experiments.
Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-nd/4.0/
Related items
Showing items related by title, author, creator and subject.
- Análisis e implementación de diferentes medidas de similitud para un algoritmo global de selección de variables
  Dorado Alfaro, Sara
  2017-05
- Low-Rank Approximation and Difusion Maps
  Dorado Alfaro, Sara
  2018-09