Augmentation
- class Manteia.Augmentation.Augmentation(documents=[], labels=[], strategy='daia', verbose=True)
This class performs data augmentation on a list of documents and their labels.
Args:
- documents (list, optional, defaults to []): A list of documents.
- labels (list, optional, defaults to []): A list of labels.
- dataset_name (string, optional, defaults to ''): Name of the dataset.
- path (string, optional, defaults to ''): Path to save the report.
Example:
    from Manteia.Augmentation import Augmentation
    documents = ['a text', 'text b']
    labels = ['a', 'b']
    Augmentation(documents, labels)
Attributes:
- documents (
- class Manteia.Augmentation.EfficientRandomGen
A base class that generates multiple random numbers at the same time.
- get_random_prob(): Get a random number.
- get_random_token(): Get a random token.
- reset_random_prob(): Generate many random numbers at the same time and cache them.
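The caching idea behind these methods can be illustrated with a minimal sketch. This is not Manteia's implementation: the class name `CachedRandomGen`, the `cache_len` and `seed` parameters, and the use of the standard-library `random` module are all assumptions made for illustration. With NumPy, the batch would typically be drawn in a single vectorized call; the standard library is used here only to keep the example dependency-free.

```python
import random


class CachedRandomGen:
    """Hypothetical sketch: draw random numbers in batches and serve them from a cache."""

    def __init__(self, cache_len=1000, seed=None):
        self.cache_len = cache_len  # assumed batch size, not Manteia's value
        self._rng = random.Random(seed)
        self.reset_random_prob()

    def reset_random_prob(self):
        """Generate many random numbers at once and cache them."""
        self._cache = [self._rng.random() for _ in range(self.cache_len)]
        self._ptr = self.cache_len - 1

    def get_random_prob(self):
        """Return one cached number, refilling the cache when it runs out."""
        value = self._cache[self._ptr]
        self._ptr -= 1
        if self._ptr < 0:
            self.reset_random_prob()
        return value


gen = CachedRandomGen(cache_len=4, seed=0)
draws = [gen.get_random_prob() for _ in range(10)]  # spans several refills
```

Callers only ever see `get_random_prob()`; the refill happens transparently once the cache pointer runs past the start of the batch.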
- class Manteia.Augmentation.TfIdfWordRep(token_prob, data_stats)
TF-IDF based word replacement.
- get_replace_prob(all_words): Compute the probability of replacing tokens in a sentence.
- replace_tokens(word_list, replace_prob): Replace tokens in a sentence.
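As a rough illustration of how TF-IDF based replacement can work, the sketch below gives words with a low TF-IDF score a higher chance of being replaced, then swaps them for random vocabulary tokens. The function names, the `idf` dictionary, the `vocab` and `rng` parameters, and the exact scaling are assumptions for this example; Manteia's own methods may compute these quantities differently.

```python
import random


def replace_prob_sketch(all_words, idf, token_prob):
    """Hypothetical sketch: give low TF-IDF words a higher replacement probability."""
    if not all_words:
        return []
    n = len(all_words)
    # TF-IDF of each word within this sentence.
    tf = {w: all_words.count(w) / n for w in set(all_words)}
    scores = [tf[w] * idf.get(w, 0.0) for w in all_words]
    # Invert the scores so that uninformative words get the largest weight.
    weights = [max(scores) - s for s in scores]
    total = sum(weights)
    if total == 0:
        # All words are equally informative: fall back to a uniform probability.
        return [token_prob] * n
    # Scale so the probabilities average to token_prob (before the cap at 1).
    return [min(1.0, w / total * token_prob * n) for w in weights]


def replace_tokens_sketch(word_list, replace_prob, vocab, rng):
    """Replace each word by a random vocabulary token with its own probability."""
    return [rng.choice(vocab) if rng.random() < p else w
            for w, p in zip(word_list, replace_prob)]


idf = {"the": 0.1, "cat": 1.2, "sat": 0.9}  # made-up IDF values
words = ["the", "cat", "sat"]
probs = replace_prob_sketch(words, idf, token_prob=0.3)
augmented = replace_tokens_sketch(words, probs, vocab=list(idf), rng=random.Random(0))
```

In this toy sentence the most informative word ("cat") receives probability 0 and is always kept, while "the" is the most likely to be replaced.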
- Manteia.Augmentation.get_data_stats(texts)
Compute the IDF score for each word, then compute the TF-IDF score.
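The two steps described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `data_stats_sketch`, whitespace tokenization, the `log(N / df)` form of IDF, and the returned dictionary layout are hypothetical and are not taken from Manteia's source.

```python
import math
from collections import defaultdict


def data_stats_sketch(texts):
    """Hypothetical sketch: IDF and corpus-level TF-IDF for whitespace-tokenized texts."""
    tokenized = [text.split() for text in texts]

    # Document frequency: number of texts that contain each word.
    doc_freq = defaultdict(int)
    for words in tokenized:
        for word in set(words):
            doc_freq[word] += 1

    # IDF: log(N / df), where N is the number of texts.
    n_docs = len(tokenized)
    idf = {word: math.log(n_docs / df) for word, df in doc_freq.items()}

    # TF-IDF accumulated over the corpus: each occurrence adds idf / len(text).
    tf_idf = defaultdict(float)
    for words in tokenized:
        for word in words:
            tf_idf[word] += idf[word] / len(words)

    return {"idf": idf, "tf_idf": dict(tf_idf)}


stats = data_stats_sketch(["a text", "text b", "a b c"])
```

Words that appear in every text get an IDF of 0 under this formula, so they contribute nothing to the TF-IDF totals.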
- Manteia.Augmentation.pyramid(documents, labels, level)
This function computes DAIA.
Args:
- documents
- labels
- level
Returns:
- documents_augmented
- labels_augmented
Example: