Biblos-e Archivo

Natural language processing for web browsing analytics: Challenges, lessons learned, and opportunities

Author
Perdices Burrero, Daniel; Ramos, Javier; García Dorado, José Luis; González, Iván; López de Vergara Méndez, Jorge Enrique
Entity
UAM. Departamento de Tecnología Electrónica y de las Comunicaciones
Publisher
Elsevier
Date
2021-10-24
Citation
Computer Networks 198 (2021): 108357
ISSN
1389-1286
DOI
10.1016/j.comnet.2021.108357
Funded by
This research has been partially funded by the Spanish State Research Agency under the project AgileMon (AEI PID2019-104451RBC21) and by the Spanish Ministry of Science, Innovation and Universities under the program for the training of university lecturers (Grant number: FPU19/05678)
Project
Gobierno de España. PID2019-104451RBC21
Editor's Version
https://doi.org/10.1016/j.comnet.2021.108357
Subjects
Deep learning; Internet monitoring; Natural language processing; Traffic monetization; Users analytics; Web browsing; Telecommunications
URI
http://hdl.handle.net/10486/700706
Rights
© 2021 The Authors

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.

Abstract

In an Internet arena where the revenues of search engines and other digital marketing firms peak, other actors still have open opportunities to monetize their users' data. After suitable anonymization, aggregation, and agreement, the set of websites that users visit may become exploitable data for ISPs. Uses range from assessing the reach of advertising campaigns to reinforcing user loyalty, among other marketing approaches, as well as security issues. However, sniffers based on HTTP, DNS, TLS, or flow features do not suffice for this task. Modern websites are designed to preload and prefetch some content, in addition to embedding banners, social network links, images, and scripts from other websites. This self-triggered traffic makes it hard to assess which websites users visited on purpose. Moreover, DNS caches prevent some queries for actively visited websites from even being sent. Given this limited input, we propose to handle such domains as words and the sequences of domains as documents. In this way, it is possible to identify the visited websites by translating the problem into a text classification context and applying the most promising techniques from the natural language processing and neural network fields. After applying different representation methods such as TF–IDF, Word2vec, Doc2vec, and custom neural networks in diverse scenarios and with several datasets, we can identify websites visited on purpose with accuracy figures over 90%, with peaks close to 100%, through processes that are fully automated and free of any human parametrization.
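
The "domains as words, sequences of domains as documents" idea from the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain TF-IDF representation and swaps in a nearest-neighbour match by cosine similarity in place of the classifiers evaluated in the paper. All domain names, labels, and function names below are hypothetical.

```python
import math
from collections import Counter

def fit_idf(documents):
    """Compute a smoothed IDF weight for every domain seen in training."""
    n_docs = len(documents)
    # Document frequency: number of sequences in which each domain appears.
    doc_freq = Counter(d for doc in documents for d in set(doc))
    return {d: math.log((1 + n_docs) / (1 + df)) + 1.0
            for d, df in doc_freq.items()}

def vectorize(doc, idf):
    """Map a domain sequence to a sparse TF-IDF dict, ignoring unseen domains."""
    counts = Counter(d for d in doc if d in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {d: (c / total) * idf[d] for d, c in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(doc, train_vectors, labels, idf):
    """Return the label of the most similar training sequence."""
    query = vectorize(doc, idf)
    best = max(range(len(train_vectors)),
               key=lambda i: cosine(query, train_vectors[i]))
    return labels[best]

# Hypothetical training data: each "document" is the sequence of domains
# observed around a visit; the label is the website visited on purpose.
train_docs = [
    ["news.example", "cdn.news.example", "ads.example", "fonts.example"],
    ["news.example", "img.news.example", "ads.example"],
    ["shop.example", "cdn.shop.example", "pay.example", "ads.example"],
    ["shop.example", "pay.example", "fonts.example"],
]
labels = ["news.example", "news.example", "shop.example", "shop.example"]

idf = fit_idf(train_docs)
train_vectors = [vectorize(doc, idf) for doc in train_docs]

# The main domain is absent from this query (for example, because its DNS
# answer was cached), yet the embedded domains still point to the site.
observed = ["cdn.news.example", "ads.example", "fonts.example"]
predicted = classify(observed, train_vectors, labels, idf)
```

Here `predicted` is `"news.example"`: the rare embedded domain `cdn.news.example` carries a high IDF weight, while the ubiquitous `ads.example` contributes little, which is the intuition behind treating self-triggered traffic as context rather than noise.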

Files in this item

Name
natural_perdices_comput.netw_2021.pdf
Size
1.520Mb
Format
PDF

This item appears in the following Collection(s)

  • Producción científica en acceso abierto de la UAM [16606]

Related items

Showing items related by title, author, creator and subject.

  • FlexiTop: A flexible and scalable network monitoring system for Over-The-Top services 

    Perdices Burrero, Daniel; López de Vergara Méndez, Jorge Enrique; Roquero, Paula; Vega, Carlos; Aracil, Javier
    2017-12-31
  • FlexiTop: Sistema escalable y flexible de medidas de calidad para servicios Over-The-Top 

    Perdices Burrero, Daniel; López de Vergara Méndez, Jorge Enrique; Roquero, Paula; Vega, Carlos; Aracil, Javier
    2017
  • Towards the Automatic and Schedule-Aware Alerting of Internetwork Time Series 

    Perdices Burrero, Daniel; García Dorado, José Luis; Ramos, Javier; De Pool, Rodrigo; Aracil, Javier
    2021-04-15
All the documents from Biblos-e Archivo are protected by copyrights. Some rights reserved.
Universidad Autónoma de Madrid. Biblioteca