diffir.run
Module Contents
- class diffir.run.MainTask(dataset='none', queries='none', measure='topk', metric='weighted_tau', topk=3, weight={})
- create_query_objects(self, run_1, run_2, qids, qid2diff, metric_name, dataset, qid2qrelscores=None)
TODO: Need a better name.
Takes two runs and a set of qids, and constructs a dict for each qid (format specified below).
- Parameters
run_1 – TREC run of the format {qid: {docid: score}, …}
run_2 – TREC run of the format {qid: {docid: score}, …}
qids – A list of qids (strings)
dataset – Instance of an ir-datasets object
- Returns
A list of dicts. Each dict has the following format:
{
  "fields": {"query_id": "qid", "title": "Title query", "desc": "Can be empty", … everything else in the ir-dataset query},
  "run_1": [
    {
      "doc_id": "id of the doc",
      "score": <score>,
      "relevance": <comes from qrels>,
      "weights": [
        [field, start, stop, weight]
      ]
    }
  ],
  "run_2": <same format as run_1>
}
The format of the "weights" entries needs more clarity; an empty list is returned for now.
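The query-object layout described above can be illustrated with a small standalone sketch. It does not call diffir; `build_query_object`, its `query_fields` and `qrels` arguments, and the `topk` cut-off are hypothetical names chosen for illustration, and unjudged documents are assumed to get a relevance of `None`.

```python
def build_query_object(qid, query_fields, run_1, run_2, qrels=None, topk=3):
    """Hypothetical helper: assemble one query dict in the documented layout."""
    qrels = qrels or {}

    def ranked_docs(run):
        # Sort this query's documents by descending score and keep the top k.
        ranking = sorted(run.get(qid, {}).items(), key=lambda kv: kv[1], reverse=True)
        return [
            {
                "doc_id": doc_id,
                "score": score,
                "relevance": qrels.get(qid, {}).get(doc_id),  # assumption: None if unjudged
                "weights": [],  # left empty, as the docstring says
            }
            for doc_id, score in ranking[:topk]
        ]

    return {
        "fields": {"query_id": qid, **query_fields},
        "run_1": ranked_docs(run_1),
        "run_2": ranked_docs(run_2),
    }


# Toy runs in the {qid: {docid: score}} format.
run_1 = {"q1": {"d1": 2.5, "d2": 1.0}}
run_2 = {"q1": {"d2": 3.0, "d3": 0.5}}
qrels = {"q1": {"d1": 1}}
query_object = build_query_object(
    "q1", {"title": "example query", "desc": ""}, run_1, run_2, qrels
)
```

Each run contributes its own ranked list, so the same document can appear under both "run_1" and "run_2" with different scores.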
- find_snippet(self, weights, doc)
- Parameters
weights – A dict of the form {<field_1>: [(start, end, weight), (start, end, weight), …], <field_2>: …}. Fields are document fields from ir_datasets, e.g. 'text'. 'start' and 'end' are character offsets into the doc.
doc – A large string representing the doc
- Returns
A dict {'field': <field_name>, 'start': <start>, 'stop': <end>, 'weights': <weights>}
- create_doc_objects(self, query_objects, dataset)
TODO: Need a better name.
From the given query objects, fetches the documents for the docids they use from the ir-dataset object.
- Parameters
query_objects – The return value of create_query_objects()
dataset – An instance of an ir-datasets object
- Returns
A dict of the form:
{
  <doc_id>: {"doc_id": <doc_id>, "text": <content of the doc>, "url": <url>, … rest from ir-dataset},
  …
}
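The mapping from query objects to doc objects can be sketched without ir_datasets installed. Here `collect_doc_objects` and `doc_lookup` are hypothetical names, and a plain dict of namedtuples stands in for the dataset's document store; with a real ir_datasets dataset the lookup would presumably go through its docs store instead.

```python
from collections import namedtuple

# Stand-in for an ir_datasets document record (those are namedtuples as well).
Doc = namedtuple("Doc", ["doc_id", "text", "url"])


def collect_doc_objects(query_objects, doc_lookup):
    """Hypothetical helper: gather every doc_id used by either run, then fetch it."""
    doc_ids = set()
    for query in query_objects:
        for run_key in ("run_1", "run_2"):
            for entry in query.get(run_key, []):
                doc_ids.add(entry["doc_id"])
    # Convert each record to a plain dict keyed by doc_id; skip ids with no record.
    return {
        doc_id: dict(doc_lookup[doc_id]._asdict())
        for doc_id in sorted(doc_ids)
        if doc_id in doc_lookup
    }


doc_lookup = {
    "d1": Doc("d1", "first document", "http://example.com/1"),
    "d2": Doc("d2", "second document", "http://example.com/2"),
}
query_objects = [
    {
        "fields": {"query_id": "q1"},
        "run_1": [{"doc_id": "d1", "score": 2.5, "relevance": 1, "weights": []}],
        "run_2": [{"doc_id": "d2", "score": 3.0, "relevance": None, "weights": []}],
    }
]
doc_objects = collect_doc_objects(query_objects, doc_lookup)
```

Collecting the ids into a set first means a document retrieved by both runs is fetched only once.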
- json(self, run_1_fn, run_2_fn=None)
Represent the data to be visualized in JSON format. The format is specified here: https://github.com/capreolus-ir/diffir-private/issues/5
- Parameters
run_1_fn, run_2_fn – Two TREC runs. These are dicts of the form {qid: {docid: score}}
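The {qid: {docid: score}} shape can be obtained from the text of a standard TREC run file, where each line has six whitespace-separated columns (query id, the literal Q0, document id, rank, score, run tag). The sketch below is a generic parser, not diffir's own loader; `parse_trec_run` is a hypothetical name, and whether json() expects file names or already-parsed dicts is not settled by the documentation above.

```python
from collections import defaultdict


def parse_trec_run(lines):
    """Hypothetical parser: TREC run lines -> {qid: {docid: score}}."""
    run = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _q0, doc_id, _rank, score, _tag = parts
        run[qid][doc_id] = float(score)
    return dict(run)


sample_lines = [
    "q1 Q0 d1 1 12.5 my_run",
    "q1 Q0 d2 2 11.0 my_run",
    "",
    "q2 Q0 d3 1 7.25 my_run",
]
run = parse_trec_run(sample_lines)
```

Passing an open file object instead of a list works the same way, since the function only iterates over lines.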