Preprocess

class Manteia.Preprocess.Preprocess(documents=[], labels=[], percentage=1.0, nb_sample=0, path='./Document/', lang='english', preprocess=True, verbose=True)

This class preprocesses text before an NLP task.

Args:


documents (list, optional, defaults to []):

A list of documents.

labels (list, optional, defaults to []):

A list of labels.

percentage (float, optional, defaults to 1.0):

Percentage of the data to keep when the dataset is reduced.

size_by_nb_sample (bool, optional, defaults to False):

Type of reduction: by number of samples or by percentage.

nb_sample (int, optional, defaults to 0):

Number of samples after reduction.

path (string, optional, defaults to './Document/'):

Path where the data object is saved.

lang (string, optional, defaults to 'english'):

Language of the stop words.

preprocess (bool, optional, defaults to True):

Whether to run preprocessing at initialization.
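
The two reduction arguments, percentage and nb_sample, can be pictured with a small stand-alone sketch. It does not use Manteia: the helper reduce_data and its rule (a positive nb_sample takes precedence, otherwise percentage is applied) are assumptions made for illustration, not the library's implementation.

```python
import random


def reduce_data(documents, labels, percentage=1.0, nb_sample=0, seed=0):
    """Hypothetical helper: keep a random subset of (document, label) pairs."""
    if len(documents) != len(labels):
        raise ValueError("documents and labels must have the same length")
    if nb_sample > 0:
        # Assumed rule: an explicit sample count wins over the percentage.
        size = min(nb_sample, len(documents))
    else:
        size = int(len(documents) * percentage)
    rng = random.Random(seed)
    # Sort the sampled indices so the original order is preserved.
    indices = sorted(rng.sample(range(len(documents)), size))
    return [documents[i] for i in indices], [labels[i] for i in indices]


docs = ["a text", "text b", "text c", "d text"]
labs = ["x", "y", "x", "y"]
half_docs, half_labs = reduce_data(docs, labs, percentage=0.5)
three_docs, three_labs = reduce_data(docs, labs, nb_sample=3)
print(len(half_docs), len(three_docs))
```

Sampling indices rather than documents keeps each document paired with its label, which is the property any reduction step of this kind needs to preserve.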

Example:

from Manteia.Preprocess import *

# Initialize a list of texts
documents = ['a text', 'text b']
# Build the preprocessing object from the documents
pp = Preprocess(documents)
pp.load()
# Clean the loaded documents
pp.df_documents = clean(pp.df_documents)
print(pp.df_documents.head())
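
As a rough picture of what a cleaning step driven by a stop-word language might involve, here is a stand-alone sketch. The function clean_text and the tiny STOP_WORDS table are hypothetical and exist only for this illustration; Manteia's own clean function may behave differently.

```python
import re

# Hypothetical, deliberately tiny stop-word table keyed by language name.
STOP_WORDS = {"english": {"a", "an", "the", "is", "of"}}


def clean_text(text, lang="english"):
    """Lowercase the text, keep alphanumeric tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    stop = STOP_WORDS.get(lang, set())
    return " ".join(t for t in tokens if t not in stop)


cleaned = [clean_text(d) for d in ["A text", "The text B!"]]
print(cleaned)
```

An unknown language falls back to an empty stop-word set here, so the text is still normalized but nothing is filtered out.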

Attributes: