capreolus.extractor.common

Module Contents

Classes

MultipleTrainingPassagesMixin

Prepare and parse TF training features that contain multiple passages per query.

SingleTrainingPassagesMixin

Prepare and parse TF training features that contain a single passage per query.

Functions

load_pretrained_embeddings(embedding_name)

load_vocab_file(fn)

save_vocab_file(itos, fn)

Attributes

logger

embedding_paths

pad_tok

capreolus.extractor.common.logger

capreolus.extractor.common.embedding_paths

capreolus.extractor.common.pad_tok = '<pad>'

capreolus.extractor.common.load_pretrained_embeddings(embedding_name)

capreolus.extractor.common.load_vocab_file(fn)

capreolus.extractor.common.save_vocab_file(itos, fn)

class capreolus.extractor.common.MultipleTrainingPassagesMixin

Prepare and parse TF training features that contain multiple passages per query. That is, the “pos_bert_input” features prepared by the extractor’s id2vec() function should have 3 dimensions.

create_tf_train_feature(sample)

Returns a set of features from a document. Of the num_passages passages present in a document, only a subset is used. params: sample - A dict where each entry has the shape [batch_size, num_passages, maxseqlen]

Returns a list of features. Each feature is a dict, and each value in the dict has the shape [batch_size, maxseqlen]. The output shape differs from the input shape because passages are sampled from the document.
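The shape change described for create_tf_train_feature can be illustrated with a small standalone sketch. This is not the capreolus implementation: the helper names `shape` and `sample_passages`, the `num_to_keep` and `seed` parameters, and the uniform random choice of passages are all assumptions made for illustration, and plain nested lists stand in for the real tensors.

```python
import random


def shape(nested):
    """Return the dimensions of a regular, non-empty nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return dims


def sample_passages(sample, num_to_keep, seed=0):
    """Hypothetical helper: pick passage indices and emit one feature dict per pick.

    The selection rule (uniform, without replacement) is an assumption;
    the documentation above only says that a subset of passages is used.
    """
    rng = random.Random(seed)
    num_passages = len(next(iter(sample.values()))[0])
    chosen = sorted(rng.sample(range(num_passages), num_to_keep))
    features = []
    for idx in chosen:
        # Selecting one passage drops the passage axis:
        # [batch_size, num_passages, maxseqlen] -> [batch_size, maxseqlen]
        features.append(
            {name: [row[idx] for row in value] for name, value in sample.items()}
        )
    return features


batch_size, num_passages, maxseqlen = 2, 4, 8
sample = {
    "pos_bert_input": [
        [[0] * maxseqlen for _ in range(num_passages)] for _ in range(batch_size)
    ],
}

features = sample_passages(sample, num_to_keep=2)
print(len(features), shape(features[0]["pos_bert_input"]))  # 2 [2, 8]
```

Each returned dict keeps the original keys, but every value has lost its num_passages axis, which is why the output shape differs from the input shape.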

parse_tf_train_example(example_proto)

class capreolus.extractor.common.SingleTrainingPassagesMixin

Prepare and parse TF training features that contain a single passage per query. That is, the “pos_bert_input” features prepared by the extractor’s id2vec() function should have 2 dimensions.

create_tf_train_feature(sample)

Returns a set of features from a document. Of the num_passages passages present in a document, only a subset is used. params: sample - A dict where each entry has the shape [batch_size, num_passages, maxseqlen]

Returns a list of features. Each feature is a dict, and each value in the dict has the shape [batch_size, maxseqlen]. The output shape differs from the input shape because passages are sampled from the document.

parse_tf_train_example(example_proto)