
dc.contributor.author: Perdices Burrero, Daniel
dc.contributor.author: Ramos, Javier
dc.contributor.author: García Dorado, José Luis
dc.contributor.author: González, Iván
dc.contributor.author: López de Vergara Méndez, Jorge Enrique
dc.contributor.other: UAM. Departamento de Tecnología Electrónica y de las Comunicaciones [es_ES]
dc.date.accessioned: 2022-03-11T11:44:19Z
dc.date.available: 2022-03-11T11:44:19Z
dc.date.issued: 2021-10-24
dc.identifier.citation: Computer Networks 198 (2021): 108357 [en_US]
dc.identifier.issn: 1389-1286 [es_ES]
dc.identifier.uri: http://hdl.handle.net/10486/700706
dc.description.abstract: In an Internet arena where the revenues of search engines and other digital marketing firms are peaking, other actors still have open opportunities to monetize their users' data. After suitable anonymization, aggregation, and user agreement, the set of websites that users visit may become exploitable data for ISPs. Uses range from assessing the scope of advertising campaigns to reinforcing user loyalty, among other marketing approaches, as well as security applications. However, sniffers based on HTTP, DNS, TLS, or flow features do not suffice for this task. Modern websites are designed to preload and prefetch some content, in addition to embedding banners, social network links, images, and scripts from other websites. This self-triggered traffic makes it difficult to assess which websites users visited on purpose. Moreover, DNS caches prevent some queries for actively visited websites from even being sent. Given this limited input, we propose to treat domains as words and sequences of domains as documents. In this way, the visited websites can be identified by translating the problem into a text classification context and applying the most promising techniques from the natural language processing and neural network fields. After applying different representation methods such as TF-IDF, Word2vec, Doc2vec, and custom neural networks in diverse scenarios and with several datasets, we can identify websites visited on purpose with accuracy above 90%, with peaks close to 100%, using processes that are fully automated and require no human parametrization. [en_US]
dc.description.sponsorship: This research has been partially funded by the Spanish State Research Agency under the project AgileMon (AEI PID2019-104451RBC21) and by the Spanish Ministry of Science, Innovation and Universities under the program for the training of university lecturers (Grant number: FPU19/05678) [en_US]
dc.format.extent: 14 pag. [es_ES]
dc.format.mimetype: application/pdf [es_ES]
dc.language.iso: eng [en]
dc.publisher: Elsevier [en_US]
dc.relation.ispartof: Computer Networks [en_US]
dc.rights: © 2021 The Authors [en_US]
dc.subject.other: Deep learning [en_US]
dc.subject.other: Internet monitoring [en_US]
dc.subject.other: Natural language processing [en_US]
dc.subject.other: Traffic monetization [en_US]
dc.subject.other: Users analytics [en_US]
dc.subject.other: Web browsing [en_US]
dc.title: Natural language processing for web browsing analytics: Challenges, lessons learned, and opportunities [en_US]
dc.type: article [en_US]
dc.subject.eciencia: Telecomunicaciones [es_ES]
dc.relation.publisherversion: https://doi.org/10.1016/j.comnet.2021.108357 [es_ES]
dc.identifier.doi: 10.1016/j.comnet.2021.108357 [es_ES]
dc.identifier.publicationfirstpage: 108357-1 [es_ES]
dc.identifier.publicationlastpage: 108357-14 [es_ES]
dc.identifier.publicationvolume: 198 [es_ES]
dc.relation.projectID: Gobierno de España. PID2019-104451RBC21 [es_ES]
dc.type.version: info:eu-repo/semantics/publishedVersion [en]
dc.rights.cc: Reconocimiento – NoComercial – SinObraDerivada (Attribution – NonCommercial – NoDerivatives)
dc.rights.accessRights: openAccess [es_ES]
dc.authorUAM: Ramos De Santiago, Fco. Javier (261890)
dc.authorUAM: García Dorado, José Luis (261729)
dc.authorUAM: López De Vergara Méndez, Jorge Enrique (261085)
dc.facultadUAM: Escuela Politécnica Superior

