Factor analysis of Internet traffic destinations from similar source networks
Entity: UAM. Departamento de Tecnología Electrónica y de las Comunicaciones
Publisher: Emerald Group Publishing Limited
DOI: 10.1108/10662241211199951
Citation: Internet Research 22.1 (2012): 29-56
Funded by: This work has been partially funded by the Spanish Ministry of Education and Science under project ANFORA (TEC2009-13385), European Union CELTIC initiative program under project TRAMMS, European Union project OneLab, and the F.P.U. and F.P.I. Research Fellowship programs of Spain. The authors would also like to thank the anonymous reviewers who helped them improve the quality of the paper.
Subjects: Content distribution networks; Factor analysis; Geographical characterization; Geography; Heavy-hitters; Internet remote host location; Zipf-Mandelbrot; Telecommunications
Note: This article is (©) Emerald Group Publishing and permission has been granted for this version to appear here (http://www.emeraldinsight.com/doi/full/10.1108/10662241211199951). Emerald does not grant permission for this article to be further copied/distributed or hosted elsewhere without the express permission from Emerald Group Publishing Limited.
Rights: © 2012 Emerald Group Publishing
Purpose – This study aims to assess whether similar user populations in the Internet produce similar geographical traffic destination patterns on a per-country basis. Design/methodology/approach – We collected a country-wide NetFlow trace over four months covering the whole Spanish academic network, which comprises more than 350 institutions and one million users. The trace includes several campus networks that are similar in population size and structure. To compare their behaviors, we propose a mixture model based primarily on the Zipf-Mandelbrot power law, which captures the heavy-tailed nature of the per-country traffic distribution. Factor analysis is then performed to understand the relation between the response variable (number of bytes or packets per day) and explanatory factors such as the source IP network, traffic direction, and country. Findings – Surprisingly, the results show that the geographical distribution is strongly dependent on the source IP network. Furthermore, even though a typical campus network has thousands of users, the aggregation level required to observe a stable geographical pattern is even larger. Consequently, our results show a slow convergence rate to the domain of attraction of the model: at least 35 days' worth of data are necessary for the model's estimated parameters to stabilize. Practical implications – Based on these findings, conclusions drawn for one network cannot be directly extrapolated to others. Therefore, ISPs' traffic measurement campaigns should include an extensive set of networks to cope with spatial diversity, and should also span a significant period of time because of the long transient period. Originality/value – The current state of the art includes some analysis of geographical patterns, but no comparisons between networks with similar populations.
Such a comparison can be useful for the design of Content Distribution Networks and for the cost optimization of peering agreements.
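The abstract does not spell out the mixture model, so the sketch below only illustrates its main ingredient: the standard Zipf-Mandelbrot rank-frequency law, in which the share of traffic sent to the country of rank k is proportional to 1/(k + q)^s, normalized over N ranked countries. This is a minimal Python sketch under stated assumptions: the function names, the maximum-likelihood grid-search fit, and the synthetic counts are hypothetical illustrations, not the authors' estimation procedure.

```python
import math

def zipf_mandelbrot_pmf(n_ranks, q, s):
    """Return P(rank = k) for k = 1..n_ranks under the Zipf-Mandelbrot law."""
    weights = [1.0 / (k + q) ** s for k in range(1, n_ranks + 1)]
    total = sum(weights)  # generalized harmonic number H_{N,q,s}
    return [w / total for w in weights]

def log_likelihood(counts, q, s):
    """Multinomial log-likelihood (up to a constant) of rank-ordered counts,
    largest first, under Zipf-Mandelbrot(q, s)."""
    pmf = zipf_mandelbrot_pmf(len(counts), q, s)
    return sum(c * math.log(p) for c, p in zip(counts, pmf))

def fit_grid(counts, q_grid, s_grid):
    """Hypothetical crude maximum-likelihood fit: exhaustive search over (q, s)."""
    candidates = ((q, s) for q in q_grid for s in s_grid)
    return max(candidates, key=lambda params: log_likelihood(counts, *params))

if __name__ == "__main__":
    # Synthetic per-country byte counts (assumption): 50 ranked countries,
    # 1e6 bytes in total, generated from known parameters q=2.0, s=1.5.
    true_pmf = zipf_mandelbrot_pmf(50, 2.0, 1.5)
    counts = [round(p * 1_000_000) for p in true_pmf]
    q_hat, s_hat = fit_grid(counts,
                            [x / 2 for x in range(0, 11)],    # q in 0.0..5.0
                            [x / 10 for x in range(5, 31)])   # s in 0.5..3.0
    print(f"fitted q={q_hat}, s={s_hat}")
    print(f"share of top 5 countries: {sum(true_pmf[:5]):.3f}")
```

Run on these noise-free synthetic counts, the grid search should land on or next to the generating parameters. With real traces, the slow convergence reported above would show up as fitted (q, s) values that keep drifting as more days of data are added; a finer grid or a numerical optimizer would replace the exhaustive search in practice.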
Authors: Mata Marcos, Felipe; García Dorado, José Luis; Aracil, Javier; López de Vergara Méndez, Jorge Enrique