The strange case of reproducibility versus representativeness in contextual suggestion test collections

Samar, Thaer; Bellogin Kouki, Alejandro; De Vries, Arien P.

UAM_Biblioteca

dc.contributor.author	Samar, Thaer
dc.contributor.author	Bellogin Kouki, Alejandro
dc.contributor.author	De Vries, Arien P.
dc.contributor.other	UAM. Departamento de Ingeniería Informática	es_ES
dc.date.accessioned	2016-10-25T13:05:09Z
dc.date.available	2016-10-25T13:05:09Z
dc.date.issued	2016-06-01
dc.identifier.citation	Information Retrieval Journal 19.3 (2016): 230-255	en_US
dc.identifier.issn	1386-4564 (print)	en_US
dc.identifier.issn	1573-7659 (online)	en_US
dc.identifier.uri	http://hdl.handle.net/10486/674485
dc.description	The final publication is available at Springer via http://dx.doi.org/10.1007/s10791-015-9276-9	en_US
dc.description.abstract	The most common approach to measuring the effectiveness of Information Retrieval systems is to use test collections. The Contextual Suggestion (CS) TREC track provides an evaluation framework for systems that recommend items to users given their geographical context. The specific nature of this track allows the participating teams to identify candidate documents either from the Open Web or from the ClueWeb12 collection, a static version of the web. In the judging pool, documents from the Open Web and from the ClueWeb12 collection are distinguished. Hence, each system submission should be based on only one resource, either the Open Web (identified by URLs) or ClueWeb12 (identified by ids). To achieve reproducibility, ranking web pages from ClueWeb12 should be the preferred method for scientific evaluation of CS systems, but it has been found that systems that build their suggestion algorithms on top of input taken from the Open Web consistently achieve higher effectiveness. Because most of the systems take a rather similar approach to making contextual suggestions, this raises the question of whether systems built by researchers on top of ClueWeb12 are still representative of those that would work directly on industry-strength web search engines. Do we need to sacrifice reproducibility for the sake of representativeness? We study the difference in effectiveness between Open Web systems and ClueWeb12 systems by analyzing the relevance assessments of documents identified from both the Open Web and ClueWeb12. Then, we identify documents that overlap between the relevance assessments of the Open Web and ClueWeb12, observing that the relevance assessments depend on whether the document was taken from the Open Web or from ClueWeb12. After that, we identify documents from the relevance assessments of the Open Web that exist in the ClueWeb12 collection but do not exist in the ClueWeb12 relevance assessments. We use these documents to expand the ClueWeb12 relevance assessments. Our main findings are twofold. First, our empirical analysis of the relevance assessments of two years of the CS track shows that Open Web documents receive better ratings than ClueWeb12 documents, especially if we look at the documents in the overlap. Second, our approach for selecting candidate documents from the ClueWeb12 collection based on information obtained from the Open Web is a step towards partially bridging the gap in effectiveness between Open Web and ClueWeb12 systems, while at the same time we achieve reproducible results on a well-known representative sample of the web.	en_US
dc.description.sponsorship	This research was supported by the Netherlands Organization for Scientific Research (WebART Project, NWO CATCH #640.005.001). Part of this work was supported by the Spanish Ministry of Science and Innovation (TIN2013-47090-C3-2). This work was carried out on the Dutch national e-infrastructure with the support of SURF Foundation.	en_US
dc.format.extent	24 pag.	es_ES
dc.format.mimetype	application/pdf	en
dc.language.iso	eng	en
dc.publisher	Springer Netherlands	en_US
dc.relation.ispartof	Information Retrieval Journal	en_US
dc.rights	© The Author(s) 2015	en_US
dc.subject.other	Reproducibility	en_US
dc.subject.other	Contextual suggestion	en_US
dc.subject.other	Open vs archived web	en_US
dc.subject.other	Test collections evaluation	en_US
dc.subject.other	Filtering and recommendation	en_US
dc.subject.other	Web IR and social media search	en_US
dc.title	The strange case of reproducibility versus representativeness in contextual suggestion test collections	en_US
dc.type	article	en_US
dc.subject.eciencia	Informática	es_ES
dc.relation.publisherversion	http://dx.doi.org/10.1007/s10791-015-9276-9
dc.identifier.doi	10.1007/s10791-015-9276-9
dc.identifier.publicationfirstpage	230
dc.identifier.publicationissue	3
dc.identifier.publicationlastpage	255
dc.identifier.publicationvolume	19
dc.relation.projectID	Gobierno de España. TIN2013-47090-C3-2	es_ES
dc.type.version	info:eu-repo/semantics/acceptedVersion	en
dc.contributor.group	Recuperación de información (ING EPS-008)	es_ES
dc.rights.accessRights	openAccess	en_US
dc.facultadUAM	Escuela Politécnica Superior

Files in this item

Name: strange_samar_IRJ_2016_ps.pdf
Size: 5.118Mb
Format: PDF

This item appears in the following Collection(s)

Producción científica en acceso abierto de la UAM [20391]

Related items

Better contextual suggestions in ClueWeb12 using domain knowledge inferred from the open web

Improving Contextual Suggestions using Open Web Domain Knowledge

Workshop on reproducibility and replication in recommender systems evaluation - RepSys
