Extracts passages from the document to be later consumed by a BERT-based model.
- class capreolus.extractor.pooled_bertpassage.PooledBertPassage(config=None, provide=None, share_dependency_objects=False, build=True)
Extracts passages from the document to be later consumed by a BERT-based model. Unlike BertPassage, all the passages from a document “stick together” during training: the resulting feature always has the shape (batch, num_passages, maxseqlen), which allows the reranker to pool over passages from the same document during training.
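To illustrate what pooling over passages from the same document means, here is a minimal sketch in plain Python. The function name `pool_passage_scores`, the example scores, and the choice of max and mean pooling are assumptions made for illustration; they are not part of the Capreolus API, and the pooling a given reranker applies may differ.

```python
from typing import List


def pool_passage_scores(scores: List[List[float]], mode: str = "max") -> List[float]:
    """Reduce per-passage scores of shape (batch, num_passages) to shape (batch,).

    Hypothetical helper: it only works because passages of one document
    stay grouped along the second dimension.
    """
    if mode == "max":
        return [max(doc_scores) for doc_scores in scores]
    if mode == "mean":
        return [sum(doc_scores) / len(doc_scores) for doc_scores in scores]
    raise ValueError(f"unknown pooling mode: {mode}")


# Made-up scores for 2 documents with 3 passages each.
scores = [[0.1, 0.7, 0.3], [0.5, 0.2, 0.4]]
pooled = pool_passage_scores(scores)  # one score per document
```

If passages from different documents were interleaved in the batch, a reduction along the passage dimension like this would not be possible without extra bookkeeping.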
- module_name = pooledbertpassage
Returns a set of features from a doc. Of the num_passages passages present in a document, only a subset is used. params: sample - A dict where each entry has the shape [batch_size, num_passages, maxseqlen].
Returns a list of features. Each feature is a dict, and each value in the dict has the shape [batch_size, maxseqlen]. The output shape differs from the input shape because passages are sampled.
Unlike the train feature, the dev set uses all passages. Both the input and the output are dicts whose entries have the shape [batch_size, num_passages, maxseqlen].
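The shape change on the training side can be sketched with nested lists standing in for tensors. Everything named here is hypothetical: `sample_train_features`, the `input_ids` key, and the use of a seeded random choice of passage indices are assumptions for illustration, since this page does not say how the subset of passages is chosen.

```python
import random
from typing import Dict, List

Tensor3D = List[List[List[int]]]  # [batch_size, num_passages, maxseqlen]
Tensor2D = List[List[int]]        # [batch_size, maxseqlen]


def sample_train_features(
    sample: Dict[str, Tensor3D], passage_indices: List[int]
) -> List[Dict[str, Tensor2D]]:
    """Return one feature dict per selected passage index.

    Each returned value drops the num_passages dimension, so its shape is
    [batch_size, maxseqlen]. Hypothetical helper, not the library's method.
    """
    features = []
    for p in passage_indices:
        features.append(
            {name: [doc[p] for doc in value] for name, value in sample.items()}
        )
    return features


batch_size, num_passages, maxseqlen = 2, 4, 3
# Made-up token ids that encode (document, passage, position) for readability.
sample = {
    "input_ids": [
        [[d * 100 + p * 10 + t for t in range(maxseqlen)] for p in range(num_passages)]
        for d in range(batch_size)
    ]
}

# Assumption: pick 2 of the 4 passages at random (seeded for repeatability).
indices = sorted(random.Random(0).sample(range(num_passages), 2))
features = sample_train_features(sample, indices)
```

Under these assumptions the training path yields a list with one dict per selected passage, while the dev path would pass `sample` through with its [batch_size, num_passages, maxseqlen] shape unchanged.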
- id2vec(qid, posid, negid=None, label=None, *args, **kwargs)
See parent class for docstring