`capreolus.benchmark.covid`¶

Module Contents¶

Classes¶

`COVID`	Ongoing TREC-COVID benchmark from https://ir.nist.gov/covidSubmit that uses documents from CORD, the COVID-19 Open Research Dataset (https://www.semanticscholar.org/cord19).
`CovidQA`	Base class for Benchmark modules. The purpose of a Benchmark is to provide the data needed to run an experiment, such as queries, folds, and relevance judgments.

Attributes¶

`logger`
`PACKAGE_PATH`

capreolus.benchmark.covid.logger[source]¶

capreolus.benchmark.covid.PACKAGE_PATH[source]¶

class capreolus.benchmark.covid.COVID(config=None, provide=None, share_dependency_objects=False, build=True)[source]¶

Bases: capreolus.benchmark.Benchmark

Ongoing TREC-COVID benchmark from https://ir.nist.gov/covidSubmit that uses documents from CORD, the COVID-19 Open Research Dataset (https://www.semanticscholar.org/cord19).

module_name = 'covid'[source]¶

dependencies[source]¶

data_dir[source]¶

topic_url = 'https://ir.nist.gov/covidSubmit/data/topics-rnd%d.xml'[source]¶

qrel_url_v1 = 'https://ir.nist.gov/covidSubmit/data/qrels-rnd%d.txt'[source]¶

qrel_url_v2 = 'https://ir.nist.gov/covidSubmit/data/qrels-covid_d%d_j0.5-%d.txt'[source]¶

lastest_round = 5[source]¶
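
The URL attributes above are printf-style templates. The following is a minimal sketch of how they might be expanded for one round. It assumes that each %d placeholder takes the round number; for qrel_url_v2 this is consistent with the round 3 filename qrels-covid_d3_j0.5-3.txt named in the prep_backward_compatible_qrels description. Which rounds use which qrels template is not stated on this page, so the sketch expands all three.

```python
# Templates copied from the COVID class attributes.
topic_url = "https://ir.nist.gov/covidSubmit/data/topics-rnd%d.xml"
qrel_url_v1 = "https://ir.nist.gov/covidSubmit/data/qrels-rnd%d.txt"
qrel_url_v2 = "https://ir.nist.gov/covidSubmit/data/qrels-covid_d%d_j0.5-%d.txt"

round_number = 3  # illustrative choice, not a library default

topics_for_round = topic_url % round_number
qrels_v1_for_round = qrel_url_v1 % round_number
# Assumption: both placeholders in the v2 template are the round number.
qrels_v2_for_round = qrel_url_v2 % (round_number, round_number)

for url in (topics_for_round, qrels_v1_for_round, qrels_v2_for_round):
    print(url)
```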

query_type = 'title'[source]¶

config_spec[source]¶

build()[source]¶

download_if_missing()[source]¶

prep_backward_compatible_qrels(tmp_dir, prev_qrels_fn, tgt_qrel_fn)[source]¶

Prepare a round 3 qrels file that is compatible with previous rounds:

convert the new docids in qrels-covid_d3_j0.5-3.txt back to their old ids
remove judgements that already existed in round 1 and round 2

Warning: do not use this function when searching or training on a collection released in round 4 or later, where the docids are already updated.

Parameters

tmp_dir – pathlib.Path object, the directory in which to store downloaded files
prev_qrels_fn – qrels file that stores the qrels from previous rounds (round 1 and round 2)
tgt_qrel_fn – path at which to store the processed round 3 qrels file
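
The two steps described for this method (mapping docids back and dropping already-judged pairs) can be illustrated with a small, self-contained sketch. This is not the library's implementation: the function name make_backward_compatible, the new_to_old mapping, and the sample lines are all hypothetical, and the sketch assumes four whitespace-separated columns per qrels line (qid, iteration, docid, label).

```python
def make_backward_compatible(new_qrels_lines, prev_qrels_lines, new_to_old):
    """Return new qrels lines rewritten to old docids, minus earlier judgements.

    new_to_old is a hypothetical dict mapping updated docids to original docids.
    """
    # Collect (qid, docid) pairs that were already judged in earlier rounds.
    already_judged = set()
    for line in prev_qrels_lines:
        qid, _iteration, docid, _label = line.split()
        already_judged.add((qid, docid))

    kept = []
    for line in new_qrels_lines:
        qid, iteration, docid, label = line.split()
        # Step 1: convert the new docid back to its old id when a mapping exists.
        docid = new_to_old.get(docid, docid)
        # Step 2: skip judgements that already existed in previous rounds.
        if (qid, docid) in already_judged:
            continue
        kept.append(" ".join([qid, iteration, docid, label]))
    return kept


# Made-up sample data, for illustration only.
prev_lines = ["1 1 oldA 2"]
new_lines = ["1 3 newA 2", "1 3 newC 1", "2 3 docB 0"]
mapping = {"newA": "oldA", "newC": "oldC"}

result = make_backward_compatible(new_lines, prev_lines, mapping)
print(result)
```

In this sample, the first new line maps back to a pair that was already judged and is dropped, while the other two lines are kept with old docids where a mapping exists.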

xml2trectopic(xmlfile)[source]¶

class capreolus.benchmark.covid.CovidQA(config=None, provide=None, share_dependency_objects=False, build=True)[source]¶

Bases: capreolus.benchmark.Benchmark

Base class for Benchmark modules. The purpose of a Benchmark is to provide the data needed to run an experiment, such as queries, folds, and relevance judgments.

Modules should provide:

a topics dict mapping query ids (qids) to queries
a qrels dict mapping qids to docids and relevance labels
a folds dict mapping a fold name to training, dev (validation), and testing qids
if these can be loaded from files in standard formats, they can be specified by setting the topic_file, qrel_file, and fold_file, respectively, rather than by setting the above attributes directly
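
As a rough illustration of the three structures listed above, the sketch below builds toy topics, qrels, and folds dicts and checks that every qid named in a fold has a query and judgements. All ids, query text, and labels are made up, and the inner fold keys ("train", "dev", "test") and the helper missing_qids are assumptions for illustration; the exact nesting used by the library may differ.

```python
# topics: qid -> query text (made-up values).
topics = {"1": "example query one", "2": "example query two", "3": "example query three"}

# qrels: qid -> {docid: relevance label} (made-up values).
qrels = {
    "1": {"docA": 2, "docB": 0},
    "2": {"docC": 1},
    "3": {"docA": 1, "docD": 0},
}

# folds: fold name -> training, dev (validation), and testing qids.
# The key names below are hypothetical.
folds = {"fold1": {"train": ["1"], "dev": ["2"], "test": ["3"]}}


def missing_qids(topics, qrels, folds):
    """Return the sorted qids used in any fold that lack a topic or a qrels entry."""
    missing = set()
    for split in folds.values():
        for qids in split.values():
            for qid in qids:
                if qid not in topics or qid not in qrels:
                    missing.add(qid)
    return sorted(missing)


print(missing_qids(topics, qrels, folds))
```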

module_name = 'covidqa'[source]¶

dependencies[source]¶

url = 'https://raw.githubusercontent.com/castorini/pygaggle/master/data/kaggle-lit-review-%s.json'[source]¶

available_versions = ['0.1', '0.2'][source]¶
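
The url attribute is a template with a single %s placeholder. A minimal sketch of how it might be combined with available_versions follows; the assumption that the placeholder takes the version string is inferred from the two attributes and is not stated on this page.

```python
# Values copied from the CovidQA class attributes.
url = "https://raw.githubusercontent.com/castorini/pygaggle/master/data/kaggle-lit-review-%s.json"
available_versions = ["0.1", "0.2"]

# Assumption: the %s placeholder is filled with a version string.
version_urls = {version: url % version for version in available_versions}

for version, expanded in version_urls.items():
    print(version, expanded)
```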

datadir[source]¶

config_spec[source]¶

build()[source]¶

download_if_missing()[source]¶
