Factor analysis of Internet traffic destinations from similar source networks
Entity
UAM. Departamento de Tecnología Electrónica y de las Comunicaciones
Publisher
Emerald Group Publishing Limited
Date
2012
Citation
Internet Research 22.1 (2012): 29 - 56
ISSN
1066-2243
DOI
10.1108/10662241211199951
Funded by
This work has been partially funded by the Spanish Ministry of Education and Science under project ANFORA (TEC2009-13385), the European Union CELTIC initiative program under project TRAMMS, the European Union project OneLab, and the F.P.U. and F.P.I. Research Fellowship programs of Spain. The authors would also like to thank the anonymous reviewers who helped us improve the quality of the paper.
Project
info:eu-repo/grantAgreement/EC/FP7/224263
Editor's Version
http://dx.doi.org/10.1108/10662241211199951
Subjects
Content distribution networks; Factor analysis; Geographical characterization; Geography; Heavy-hitters; Internet remote host location; Zipf-Mandelbrot; Telecommunications
Note
This article is (©) Emerald Group Publishing and permission has been granted for this version to appear here (http://www.emeraldinsight.com/doi/full/10.1108/10662241211199951). Emerald does not grant permission for this article to be further copied/distributed or hosted elsewhere without the express permission from Emerald Group Publishing Limited.
Rights
© 2012 Emerald Group Publishing
Abstract
Purpose – This study aims to assess whether similar user populations on the Internet produce similar geographical patterns of traffic destinations on a per-country basis.
Design/methodology/approach – We collected a country-wide NetFlow trace spanning four months and covering the entire Spanish academic network, which comprises more than 350 institutions and one million users. The trace includes several campus networks that are similar in population size and structure. To compare their behaviors, we propose a mixture model based primarily on the Zipf-Mandelbrot power law, which captures the heavy-tailed nature of the per-country traffic distribution. We then perform factor analysis to understand the relation between the response variable (number of bytes or packets per day) and explanatory factors such as the source IP network, traffic direction, and country.
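The Zipf-Mandelbrot law on which the model is primarily based gives the country of rank k (countries ordered by decreasing traffic) a probability proportional to (k + q)^(-s). The Python sketch below covers only that single component, not the authors' full mixture model. It fits s and q to rank-ordered per-country volumes by maximizing a multinomial log-likelihood over a coarse grid. The function names, the grid-search estimator, the grid ranges, and the treatment of daily byte volumes as multinomial counts are all illustrative assumptions, not the paper's fitting procedure.

```python
import math


def zm_pmf(n_ranks: int, s: float, q: float) -> list[float]:
    """Zipf-Mandelbrot probabilities for ranks 1..n_ranks: p(k) is proportional to (k + q)^(-s)."""
    weights = [(k + q) ** (-s) for k in range(1, n_ranks + 1)]
    total = sum(weights)  # normalizing constant (generalized harmonic number)
    return [w / total for w in weights]


def zm_log_likelihood(counts: list[float], s: float, q: float) -> float:
    """Multinomial log-likelihood, up to an additive constant, of rank-ordered counts."""
    probs = zm_pmf(len(counts), s, q)
    return sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)


def fit_zm(counts: list[float], s_grid: list[float], q_grid: list[float]) -> tuple[float, float]:
    """Grid-search maximum-likelihood estimate of (s, q).

    counts: traffic volume per country (assumed here to be e.g. bytes per day).
    Countries are ranked by decreasing volume, so rank 1 is the top destination.
    """
    ranked = sorted(counts, reverse=True)
    return max(
        ((s, q) for s in s_grid for q in q_grid),
        key=lambda sq: zm_log_likelihood(ranked, sq[0], sq[1]),
    )


if __name__ == "__main__":
    # Hypothetical synthetic data: 50 countries with volumes proportional to
    # ZM(s=1.2, q=2.0), rounded to integers.
    true_probs = zm_pmf(50, 1.2, 2.0)
    counts = [round(1_000_000 * p) for p in true_probs]
    s_grid = [i / 10 for i in range(1, 31)]  # s in 0.1 .. 3.0
    q_grid = [j / 2 for j in range(0, 21)]   # q in 0.0 .. 10.0
    print(fit_zm(counts, s_grid, q_grid))
```

The grid search keeps the sketch dependency-free and easy to verify. A practical fit would more likely run a numerical optimizer over continuous (s, q) and would still need the remaining mixture components to describe the measured traffic.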
Findings – Surprisingly, the results show that the geographical distribution depends strongly on the source IP network. Furthermore, even though a typical campus network has thousands of users, the aggregation level required to observe a stable geographical pattern turns out to be larger still. Consequently, convergence to the domain of attraction of the model is slow: at least 35 days' worth of data are needed for the model's estimated parameters to stabilize.
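The abstract does not state how parameter stability was judged. One hypothetical criterion, sketched below in Python, re-estimates a parameter (for example the Zipf-Mandelbrot exponent) on cumulative windows of 1, 2, ..., n days and reports the smallest window length after which every estimate stays within a relative tolerance of the estimate from the full trace. The function name, the cumulative-window scheme, and the 5% default tolerance are assumptions for illustration only.

```python
from typing import Optional


def days_to_stability(estimates: list[float], rel_tol: float = 0.05) -> Optional[int]:
    """Smallest window length n (in days) such that the estimates obtained from
    n, n+1, ..., len(estimates) days all lie within rel_tol of the final estimate.

    estimates[i] is the parameter estimated from the first i + 1 days of traffic
    (cumulative windows). Returns None when no estimates are given.
    """
    if not estimates:
        return None
    reference = estimates[-1]  # estimate from the longest available window
    tolerance = rel_tol * abs(reference)
    n_stable = len(estimates)
    # Walk backwards while the estimates stay inside the tolerance band.
    for n in range(len(estimates) - 1, 0, -1):
        if abs(estimates[n - 1] - reference) > tolerance:
            break
        n_stable = n
    return n_stable


if __name__ == "__main__":
    # Hypothetical cumulative-window estimates of a ZM exponent, one per day.
    history = [1.5, 1.4, 1.3, 1.25, 1.21, 1.2, 1.2]
    print(days_to_stability(history, rel_tol=0.05))
```

For the hypothetical history above, the function returns 4 with a 5% tolerance, since every estimate from day 4 onward lies within 0.06 of the final value 1.2. Anchoring on the full-trace estimate is a simplification: it assumes the longest window has itself converged, which a trace of insufficient length cannot guarantee.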
Practical implications – Based on these findings, conclusions drawn for one network cannot be directly extrapolated to others. Therefore, ISPs' traffic measurement campaigns should include an extensive set of networks to cope with spatial diversity, and should also span a significant period of time because of the long transient.
Originality/value – The current state of the art includes some analyses of geographical patterns, but no comparisons between networks with similar populations. Such comparisons can be useful for the design of Content Distribution Networks and for the cost optimization of peering agreements.
Authors
Mata Marcos, Felipe
García Dorado, José Luis
Aracil, Javier
López de Vergara Méndez, Jorge Enrique