diffir.run

Module Contents

Classes

MainTask

Functions

main()

diff(runs, config, cli, web, print_html=True)

diffir.run.main()
diffir.run.diff(runs, config, cli, web, print_html=True)
class diffir.run.MainTask(dataset='none', queries='none', measure='topk', metric='weighted_tau', topk=3, weight={})
module_type = task
module_name = main
compute_qrel_metrics(self)
create_query_objects(self, run_1, run_2, qids, qid2diff, metric_name, dataset, qid2qrelscores=None)

TODO: Need a better name.

This method takes in 2 runs and a set of qids, and constructs a dict for each qid (format specified below).

Parameters

run_1 – TREC run of the format {qid: {docid: score}, …}

run_2 – TREC run of the format {qid: {docid: score}, …}

qids – A list of qids (strings)

dataset – Instance of an ir-datasets object

Returns

A list of dicts. Each dict has the following format:

    {
        "fields": {"query_id": "qid", "title": "Title query", "desc": "Can be empty", … everything else in ir-dataset query},
        "run_1": [
            {
                "doc_id": "id of the doc",
                "score": <score>,
                "relevance": <comes from qrels>,
                "weights": [
                    [field, start, stop, weight]
                ]
            }
        ],
        "run_2": <same format as run 1>
    }

Note on "weights": the [field, start, stop, weight] layout needs more clarity. Return an empty list for now.
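The layout above can be illustrated with a small, self-contained sketch. This is not diffir's implementation: `build_query_objects` is a hypothetical helper, and plain dicts (`queries`, `qrels`) stand in for the ir-datasets object and the qrel lookups. The `qid2diff` and `metric_name` arguments of the real method are not modeled here.

```python
def build_query_objects(run_1, run_2, qids, queries, qrels=None):
    """Hypothetical helper: assemble one dict per qid in the documented layout."""
    qrels = qrels or {}

    def ranked_docs(run, qid):
        # Sort the documents of one query by descending score.
        ranking = sorted(run.get(qid, {}).items(), key=lambda kv: kv[1], reverse=True)
        return [
            {
                "doc_id": doc_id,
                "score": score,
                "relevance": qrels.get(qid, {}).get(doc_id),  # None if unjudged
                "weights": [],  # left empty, as the docstring suggests
            }
            for doc_id, score in ranking
        ]

    return [
        {
            "fields": {"query_id": qid, **queries.get(qid, {})},
            "run_1": ranked_docs(run_1, qid),
            "run_2": ranked_docs(run_2, qid),
        }
        for qid in qids
    ]


# Toy inputs in the {qid: {docid: score}} run format.
run_1 = {"q1": {"d1": 2.0, "d2": 3.5}}
run_2 = {"q1": {"d2": 1.0, "d3": 4.0}}
queries = {"q1": {"title": "Title query", "desc": ""}}
qrels = {"q1": {"d2": 1}}

query_objects = build_query_objects(run_1, run_2, ["q1"], queries, qrels)
```

Each entry of `query_objects` then carries the query fields plus one ranked list per run, which is the shape the later display methods appear to consume.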

create_summary(self, run1_ranked_docs, run2_ranked_docs)
merge_weights(self, run1_for_query, run_2_for_query)
find_snippet(self, weights, doc)
Parameters

weights – A dict of the form {<field_1>: [(start, end, weight), (start, end, weight), …], <field_2>: …}. Fields are document fields from ir_datasets, e.g. 'text'. 'start' and 'end' are character offsets into the doc.

doc – A large string representing the doc

Returns

A dict {'field': <field_name>, 'start': <start>, 'stop': <end>, 'weights': <weights>}
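The input and output shapes described here can be made concrete with a short sketch. The selection rule below (a fixed-size window centred on the single heaviest span) is an assumption made for illustration. The source does not say how find_snippet actually chooses a snippet, and `pick_snippet` and its `window` parameter are hypothetical names.

```python
def pick_snippet(weights, doc, window=50):
    """Hypothetical stand-in for find_snippet: a window around the heaviest span."""
    best = None  # (weight, field, start, end)
    for field, spans in weights.items():
        for start, end, weight in spans:
            if best is None or weight > best[0]:
                best = (weight, field, start, end)

    if best is None:
        # No weights at all: fall back to the beginning of the document.
        return {"field": None, "start": 0, "stop": min(window, len(doc)), "weights": []}

    _, field, start, end = best
    centre = (start + end) // 2
    snippet_start = max(0, centre - window // 2)
    snippet_stop = min(len(doc), snippet_start + window)

    # Keep only the spans of the chosen field that lie fully inside the window.
    inside = [
        (s, e, w)
        for s, e, w in weights[field]
        if s >= snippet_start and e <= snippet_stop
    ]
    return {"field": field, "start": snippet_start, "stop": snippet_stop, "weights": inside}


doc = "x" * 40 + "neural ranking" + "y" * 146
weights = {"text": [(0, 5, 0.1), (40, 54, 0.9)]}
snippet = pick_snippet(weights, doc, window=50)
```

Offsets in the returned dict stay absolute character offsets into `doc`, so `doc[snippet["start"]:snippet["stop"]]` yields the snippet text.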

create_doc_objects(self, query_objects, dataset)

TODO: Need a better name.

From the given query objects, fetch the used docids from the ir-dataset object.

Parameters

query_objects – The return value of create_query_objects()

dataset – An instance of ir_datasets

Returns

A dict of the form:

    {
        <doc_id>: {"doc_id": <doc_id>, "text": <content of the doc>, "url": <url>, … rest from ir-dataset},
        …
    }
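A minimal sketch of the lookup described above, under stated assumptions: `collect_doc_objects` is a hypothetical helper, and an in-memory dict of namedtuples (`doc_lookup`) stands in for the ir-dataset document store. The actual method may fetch documents differently.

```python
from collections import namedtuple

# Stand-in for an ir-dataset document record (field names are illustrative).
Doc = namedtuple("Doc", ["doc_id", "text", "url"])


def collect_doc_objects(query_objects, doc_lookup):
    """Hypothetical helper: map every doc_id used by either run to its fields."""
    doc_ids = set()
    for query in query_objects:
        for run_key in ("run_1", "run_2"):
            entries = query.get(run_key) or []  # run_2 may be missing or None
            doc_ids.update(entry["doc_id"] for entry in entries)

    # Convert each record to a plain dict keyed by doc_id; skip unknown ids.
    return {
        doc_id: dict(doc_lookup[doc_id]._asdict())
        for doc_id in sorted(doc_ids)
        if doc_id in doc_lookup
    }


doc_lookup = {
    "d1": Doc("d1", "first document", "http://example.com/1"),
    "d2": Doc("d2", "second document", "http://example.com/2"),
    "d3": Doc("d3", "unused document", "http://example.com/3"),
}
example_queries = [
    {
        "fields": {"query_id": "q1"},
        "run_1": [{"doc_id": "d1", "score": 2.0}],
        "run_2": [{"doc_id": "d2", "score": 1.0}, {"doc_id": "d1", "score": 0.5}],
    }
]
doc_objects = collect_doc_objects(example_queries, doc_lookup)
```

Only documents that appear in at least one run are returned, so `d3` is left out of `doc_objects`.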

json(self, run_1_fn, run_2_fn=None)

Represent the data to be visualized in JSON format. The format is specified here: https://github.com/capreolus-ir/diffir-private/issues/5

Parameters: two TREC runs. These are dicts of the form {qid: {docid: score}}
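The {qid: {docid: score}} run layout used throughout this module can be built from a standard six-column TREC run file (qid, Q0, docid, rank, score, tag). The following is a minimal sketch of that conversion. `parse_trec_run` is a hypothetical helper, not a diffir function, and the JSON layout produced by json() itself is not reproduced here because it is defined in the linked issue.

```python
from collections import defaultdict


def parse_trec_run(lines):
    """Hypothetical helper: turn TREC run lines into {qid: {docid: score}}."""
    run = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _q0, doc_id, _rank, score, _tag = parts
        run[qid][doc_id] = float(score)
    return dict(run)


sample = [
    "q1 Q0 d2 1 3.5 bm25",
    "q1 Q0 d1 2 2.0 bm25",
    "",
    "q2 Q0 d7 1 1.25 bm25",
]
run = parse_trec_run(sample)
```

With a real file, the same function can be fed an open file handle, since iterating over a text file yields its lines.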

print_query_to_console(self, q, console)
render_snippet_for_cli(self, doc_id, snp, docs)
cli_display_one_query(self, console, q, start_idx, end_idx, docs, run_1_name)
cli_compare_one_query(self, console, q, start_idx, end_idx, docs, run1_name, run2_name)
cli(self, runs)
web(self, runs)
make_rel_colors(self, dataset)