Augmentation

class Manteia.Augmentation.Augmentation(documents=[], labels=[], strategy='daia', verbose=True)

Class that performs data augmentation on a list of documents and their labels.

Args:

documents (list, optional, defaults to []):

A list of documents.

labels (list, optional, defaults to []):

A list of labels.

dataset_name (string, optional, defaults to ‘’):

Name of the dataset.

path (string, optional, defaults to ‘’):

Path to save the report.

Example:

from Manteia.Augmentation import Augmentation
documents = ['a text', 'text b']
labels = ['a', 'b']
Augmentation(documents, labels)

Attributes:

class Manteia.Augmentation.EfficientRandomGen

A base class that generates multiple random numbers at the same time.

get_random_prob()

Get a random number.

get_random_token()

Get a random token.

reset_random_prob()

Generate many random numbers at the same time and cache them.
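
The caching idea can be sketched as follows. This is a minimal illustration, not Manteia's implementation: the class name `CachedRandom`, the `cache_len` and `seed` parameters, and the use of the standard-library `random` module are assumptions made for the example.

```python
import random


class CachedRandom:
    """Hypothetical sketch: draw random numbers in batches, serve them one at a time."""

    def __init__(self, cache_len=1000, seed=None):
        self.cache_len = cache_len  # assumed batch size
        self._rng = random.Random(seed)
        self.reset_random_prob()

    def reset_random_prob(self):
        # Generate many random numbers at once and cache them.
        self._cache = [self._rng.random() for _ in range(self.cache_len)]
        self._ptr = self.cache_len - 1

    def get_random_prob(self):
        # Serve one cached value; refill the cache once it is exhausted.
        value = self._cache[self._ptr]
        self._ptr -= 1
        if self._ptr < 0:
            self.reset_random_prob()
        return value
```

Batching pays off mainly when the numbers come from a vectorized generator such as NumPy's; the standard library is used here only to keep the sketch free of dependencies.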

class Manteia.Augmentation.TfIdfWordRep(token_prob, data_stats)

TF-IDF Based Word Replacement.

get_replace_prob(all_words)

Compute the probability of replacing tokens in a sentence.

replace_tokens(word_list, replace_prob)

Replace tokens in a sentence.
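
A minimal sketch of how the two methods could fit together. It assumes, and this page does not confirm, that tokens with a low TF-IDF score in the sentence are the ones most likely to be replaced, and that `token_prob` is the expected fraction of tokens to replace. The function names and the `idf`, `vocab`, and `rng` arguments are hypothetical.

```python
import random
from collections import Counter


def replace_probabilities(words, idf, token_prob):
    """Hypothetical: per-token replacement probability, higher for low TF-IDF tokens."""
    if not words:
        return []
    counts = Counter(words)
    # TF-IDF of each token within this sentence.
    scores = [counts[w] / len(words) * idf.get(w, 0.0) for w in words]
    # Invert the scores: the most informative token gets weight 0.
    weights = [max(scores) - s for s in scores]
    total = sum(weights)
    if total == 0:
        return [0.0] * len(words)  # all tokens equally informative: replace none
    # Scale so the expected number of replacements is about
    # token_prob * len(words), capping each probability at 1.
    return [min(1.0, w / total * token_prob * len(words)) for w in weights]


def replace_tokens(words, probs, vocab, rng):
    """Replace each token by a random vocabulary token with its own probability."""
    return [rng.choice(vocab) if rng.random() < p else w
            for w, p in zip(words, probs)]
```

For example, `replace_tokens(words, replace_probabilities(words, idf, 0.3), vocab, random.Random(0))` returns a copy of `words` in which low-information tokens may have been swapped for tokens drawn from `vocab`.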

Manteia.Augmentation.get_data_stats(texts)

Compute the IDF score for each word. Then compute the TF-IDF score.
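
To make the two steps concrete, here is a small sketch using a common definition, idf(w) = log(N / df(w)), where N is the number of documents and df(w) is the number of documents containing w. The function name `idf_and_tf_idf`, the whitespace tokenization, and the layout of the returned dictionary are assumptions for illustration; the actual `get_data_stats` may differ.

```python
import math
from collections import Counter, defaultdict


def idf_and_tf_idf(texts):
    """Hypothetical sketch: corpus-level IDF and accumulated TF-IDF per word."""
    docs = [t.split() for t in texts]  # assumed: whitespace tokenization
    # Document frequency: number of documents containing each word.
    doc_freq = Counter(w for doc in docs for w in set(doc))
    idf = {w: math.log(len(docs) / df) for w, df in doc_freq.items()}
    # TF-IDF summed over documents, with TF = count / document length.
    tf_idf = defaultdict(float)
    for doc in docs:
        for w, c in Counter(doc).items():
            tf_idf[w] += c / len(doc) * idf[w]
    return {"idf": idf, "tf_idf": dict(tf_idf)}
```

With this definition, a word that occurs in every document has an IDF of 0 and therefore contributes nothing to the TF-IDF totals.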

Manteia.Augmentation.pyramid(documents, labels, level)

This function computes DAIA.

Args:

documents

labels

level

Returns:

documents_augmented

labels_augmented

Example: