capreolus.extractor.bertpassage

Module Contents

Classes

BertPassage

Extracts passages from the document to be later consumed by a BERT-based model.

Attributes

logger

capreolus.extractor.bertpassage.logger
class capreolus.extractor.bertpassage.BertPassage(config=None, provide=None, share_dependency_objects=False, build=True)

Bases: capreolus.extractor.Extractor, capreolus.extractor.common.SingleTrainingPassagesMixin

Extracts passages from the document to be later consumed by a BERT-based model. Does NOT use all the passages. The first passage is always used. Use the prob config to control the probability of a passage being selected.

Gotcha: In TensorFlow, the train tfrecords have the shape (batch_size, maxseqlen), while the dev tfrecords have the shape (batch_size, num_passages, maxseqlen). This is because, during inference, we want to pool over the scores of the passages belonging to a doc.
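The selection rule described above can be illustrated with a minimal sketch. This is not the Capreolus implementation: the function name select_passages and its parameters are hypothetical, and it assumes that prob is applied independently to every passage after the first.

```python
import random


def select_passages(passages, prob, rng=None):
    """Keep the first passage; keep each later passage with probability `prob`.

    Hypothetical helper for illustration only, not part of Capreolus.
    """
    rng = rng or random.Random()
    if not passages:
        return []
    selected = [passages[0]]  # the first passage is always used
    for passage in passages[1:]:
        # random() returns a float in [0.0, 1.0), so prob=0.0 never selects
        # a later passage and prob=1.0 always selects it
        if rng.random() < prob:
            selected.append(passage)
    return selected


passages = ["p0", "p1", "p2", "p3"]
only_first = select_passages(passages, prob=0.0)
all_passages = select_passages(passages, prob=1.0)
sampled = select_passages(passages, prob=0.5, rng=random.Random(0))
```

With prob=0.0 only the first passage survives, and with prob=1.0 every passage is kept; values in between give a random subset that always starts with the first passage.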

module_name = bertpassage
dependencies
config_spec
config_keys_not_in_path = ['usecache']
build()
load_state(qids, docids)
cache_state(qids, docids)
get_tf_feature_description()
create_tf_dev_feature(sample)

Unlike the train feature, the dev set uses all passages. Both the input and the output are dicts whose entries have the shape [batch_size, num_passages, maxseqlen].
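To show why the dev features keep a num_passages axis, the sketch below builds a dummy input with the shape [batch_size, num_passages, maxseqlen] and pools per-passage scores into one score per document. The helper pool_passage_scores and the choice of max pooling are assumptions made for illustration; the source only states that scores are pooled over the passages of a doc.

```python
def pool_passage_scores(passage_scores, pool=max):
    """Reduce scores of shape (batch_size, num_passages) to (batch_size,).

    Hypothetical helper; `pool` may be any reduction over a list, e.g. max.
    """
    return [pool(doc_scores) for doc_scores in passage_scores]


batch_size, num_passages, maxseqlen = 2, 3, 4

# Dummy dev input with the shape [batch_size, num_passages, maxseqlen]
dev_input = [
    [[0] * maxseqlen for _ in range(num_passages)]
    for _ in range(batch_size)
]

# One made-up score per passage, as a model might produce at inference time
passage_scores = [
    [0.1, 0.7, 0.3],
    [0.9, 0.2, 0.4],
]

doc_scores = pool_passage_scores(passage_scores)  # one score per document
```

Each document ends up with a single score regardless of how many passages it was split into, which is what ranking at inference time requires.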

parse_tf_dev_example(example_proto)
exist()
preprocess(qids, docids, topics)
id2vec(qid, posid, negid=None, label=None, *args, **kwargs)

See the parent class for the docstring.