`capreolus.extractor.common`

Module Contents

Classes

`MultipleTrainingPassagesMixin`	Prepare and parse TF training features that contain multiple passages per query.
`SingleTrainingPassagesMixin`	Prepare and parse TF training features that contain a single passage per query.

Functions

`load_pretrained_embeddings`(embedding_name)
`load_vocab_file`(fn)
`save_vocab_file`(itos, fn)

Attributes

`logger`
`embedding_paths`
`pad_tok`

capreolus.extractor.common.logger

capreolus.extractor.common.embedding_paths

capreolus.extractor.common.pad_tok = '<pad>'
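The module defines `pad_tok = '<pad>'` as its padding token. As a rough illustration of how such a token is typically used, the sketch below truncates or right-pads a token list to a fixed length. The helper `pad_to_length` and its `maxseqlen` argument are hypothetical and not part of this module.

```python
from typing import List

PAD_TOK = "<pad>"  # mirrors capreolus.extractor.common.pad_tok


def pad_to_length(tokens: List[str], maxseqlen: int, pad: str = PAD_TOK) -> List[str]:
    """Hypothetical helper: truncate to maxseqlen, then right-pad with the pad token."""
    truncated = tokens[:maxseqlen]
    return truncated + [pad] * (maxseqlen - len(truncated))


print(pad_to_length(["hello", "world"], 4))
# ['hello', 'world', '<pad>', '<pad>']
```

Right-padding is shown here only as a common convention; the extractors in this package may pad differently.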

capreolus.extractor.common.load_pretrained_embeddings(embedding_name)

capreolus.extractor.common.load_vocab_file(fn)

capreolus.extractor.common.save_vocab_file(itos, fn)
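The on-disk vocabulary format is not documented here. A minimal sketch, assuming a plain-text file with one token per line where the line number is the token id (`itos`, "index to string"), could look like the following. `save_vocab_sketch` and `load_vocab_sketch` are hypothetical stand-ins, not the module's own functions.

```python
import os
import tempfile
from typing import List


def save_vocab_sketch(itos: List[str], fn: str) -> None:
    """Hypothetical: write one token per line; line i holds the token with id i."""
    with open(fn, "w", encoding="utf-8") as outf:
        for token in itos:
            outf.write(token + "\n")


def load_vocab_sketch(fn: str) -> List[str]:
    """Hypothetical: read the tokens back in id order."""
    with open(fn, "r", encoding="utf-8") as inf:
        return [line.rstrip("\n") for line in inf]


itos = ["<pad>", "the", "query", "passage"]
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "vocab.txt")
    save_vocab_sketch(itos, path)
    restored = load_vocab_sketch(path)

stoi = {token: idx for idx, token in enumerate(restored)}  # string-to-index lookup
print(restored == itos, stoi["query"])
# True 2
```

A line-based format like this one cannot represent tokens that themselves contain newlines; a real implementation may use a different encoding.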

class capreolus.extractor.common.MultipleTrainingPassagesMixin

Prepare and parse TF training features that contain multiple passages per query. That is, the “pos_bert_input” features prepared by the extractor’s id2vec() function should have three dimensions.

create_tf_train_feature(sample)

Returns a set of features from a document. Of the num_passages passages present in a document, only a subset is used. Parameters: sample, a dict where each entry has the shape [batch_size, num_passages, maxseqlen].

Returns a list of features. Each feature is a dict, and each value in the dict has the shape [batch_size, maxseqlen]. Note that the output shape differs from the input shape because passages are sampled from the input.
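The shape change described above can be sketched in plain Python. For illustration only, this assumes the subset is drawn uniformly at random without replacement and that each sampled passage becomes one output feature. The function name `sample_passage_features` and its `num_to_use` and `seed` parameters are hypothetical, and the sampling strategy capreolus actually uses may differ.

```python
import random
from typing import Dict, List

Nested3D = List[List[List[int]]]  # [batch_size][num_passages][maxseqlen]
Nested2D = List[List[int]]        # [batch_size][maxseqlen]


def sample_passage_features(
    sample: Dict[str, Nested3D], num_to_use: int, seed: int = 0
) -> List[Dict[str, Nested2D]]:
    """Hypothetical sketch: pick num_to_use passage indices (without replacement)
    and emit one feature dict per picked passage."""
    any_value = next(iter(sample.values()))
    num_passages = len(any_value[0])
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(num_passages), num_to_use))

    features = []
    for p in chosen:
        # Drop the passage axis: each value goes from 3D to [batch_size][maxseqlen].
        features.append({key: [doc[p] for doc in value] for key, value in sample.items()})
    return features


# batch_size=2, num_passages=3, maxseqlen=4
sample = {
    "pos_bert_input": [
        [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
        [[4, 4, 4, 4], [5, 5, 5, 5], [6, 6, 6, 6]],
    ]
}
feats = sample_passage_features(sample, num_to_use=2)
print(len(feats), len(feats[0]["pos_bert_input"]), len(feats[0]["pos_bert_input"][0]))
# 2 2 4
```

Every output dict keeps the batch axis intact and loses only the passage axis, which is why the output rank is one lower than the input rank.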

parse_tf_train_example(example_proto)
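`tf.train.Example` protos store each feature as a flat list (for integer ids, an `Int64List`), so a multi-dimensional input such as a [num_passages, maxseqlen] array is commonly flattened when written and reshaped after parsing, for example with `tf.io.parse_single_example` followed by `tf.reshape`. Whether `parse_tf_train_example` does exactly this is not stated here. The pure-Python sketch below only shows the flatten/reshape round trip; both helper names are hypothetical.

```python
from typing import List


def flatten_2d(matrix: List[List[int]]) -> List[int]:
    """Hypothetical: row-major flattening, as one would do before building an Int64List."""
    return [value for row in matrix for value in row]


def reshape_2d(flat: List[int], num_rows: int, num_cols: int) -> List[List[int]]:
    """Hypothetical: inverse of flatten_2d; the shape must be known at parse time."""
    if len(flat) != num_rows * num_cols:
        raise ValueError(f"expected {num_rows * num_cols} values, got {len(flat)}")
    return [flat[r * num_cols:(r + 1) * num_cols] for r in range(num_rows)]


passages = [[101, 7, 102], [101, 9, 102]]  # [num_passages=2][maxseqlen=3]
flat = flatten_2d(passages)
print(flat)
# [101, 7, 102, 101, 9, 102]
print(reshape_2d(flat, num_rows=2, num_cols=3) == passages)
# True
```

Because the serialized list carries no shape information, the parser has to know the target shape in advance, typically from the same configuration that produced the features.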

class capreolus.extractor.common.SingleTrainingPassagesMixin

Prepare and parse TF training features that contain a single passage per query. That is, the “pos_bert_input” features prepared by the extractor’s id2vec() function should have two dimensions.
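The two mixins differ in the expected rank of the “pos_bert_input” feature: three dimensions for MultipleTrainingPassagesMixin and two for SingleTrainingPassagesMixin. A small sketch that measures the nesting depth of a list-based feature can make this expectation explicit. `nested_depth` and `pick_mixin_name` are illustrative names only, and the sketch assumes features are plain nested Python lists rather than tensors.

```python
from typing import Any


def nested_depth(value: Any) -> int:
    """Hypothetical: number of nested list levels, following the first element."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        if not value:
            break
        value = value[0]
    return depth


def pick_mixin_name(pos_bert_input: Any) -> str:
    """Hypothetical: map the rank of pos_bert_input to the matching mixin name."""
    depth = nested_depth(pos_bert_input)
    if depth == 3:
        return "MultipleTrainingPassagesMixin"
    if depth == 2:
        return "SingleTrainingPassagesMixin"
    raise ValueError(f"unexpected pos_bert_input rank: {depth}")


print(pick_mixin_name([[[1, 2], [3, 4]]]))  # three levels of nesting
# MultipleTrainingPassagesMixin
print(pick_mixin_name([[1, 2], [3, 4]]))
# SingleTrainingPassagesMixin
```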

create_tf_train_feature(sample)

Returns a set of features from a document. Of the num_passages passages present in a document, only a subset is used. Parameters: sample, a dict where each entry has the shape [batch_size, num_passages, maxseqlen].

Returns a list of features. Each feature is a dict, and each value in the dict has the shape [batch_size, maxseqlen]. Note that the output shape differs from the input shape because passages are sampled from the input.

parse_tf_train_example(example_proto)
