Preprocess

class Manteia.Preprocess.Preprocess(documents=[], labels=[], percentage=1.0, nb_sample=0, path='./Document/', lang='english', preprocess=True, verbose=True)

This class preprocesses text before an NLP task.

Args:

documents (list, optional, defaults to []):
A list of documents.

labels (list, optional, defaults to []):
A list of labels.

percentage (float, optional, defaults to 1.0):
Fraction of the data to keep when reducing the dataset.

size_by_nb_sample (bool, optional, defaults to False):
Whether to reduce the data by number of samples instead of by percentage.

nb_sample (int, optional, defaults to 0):
Number of samples to keep after reduction.

path (string, optional, defaults to './Document/'):
Path where the data object is saved.

lang (string, optional, defaults to 'english'):
Language of the stop words.

preprocess (bool, optional, defaults to True):
Whether to run the preprocessing during initialization.
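
The percentage and nb_sample arguments both describe a reduction of the dataset. The sketch below does not call Manteia and does not claim to show how the library does it. It is one plausible reading in plain Python, using a hypothetical helper named reduce_dataset and assuming that a positive nb_sample takes precedence over percentage and that documents and labels are sampled together.

```python
import random


def reduce_dataset(documents, labels, percentage=1.0, nb_sample=0, seed=0):
    """Hypothetical helper (not part of Manteia): keep a random subset of
    paired documents and labels."""
    if len(documents) != len(labels):
        raise ValueError("documents and labels must have the same length")

    # Assumption: a positive nb_sample wins over percentage.
    if nb_sample > 0:
        target = min(nb_sample, len(documents))
    else:
        target = int(len(documents) * percentage)

    # Sample indices once so that each document keeps its own label,
    # then sort them to preserve the original order.
    rng = random.Random(seed)
    kept = sorted(rng.sample(range(len(documents)), target))
    return [documents[i] for i in kept], [labels[i] for i in kept]


documents = ["a text", "text b", "c text", "text d"]
labels = ["x", "y", "x", "y"]

docs_half, labels_half = reduce_dataset(documents, labels, percentage=0.5)
docs_three, labels_three = reduce_dataset(documents, labels, nb_sample=3)
print(len(docs_half), len(docs_three))  # prints: 2 3
```

With the default values (percentage=1.0, nb_sample=0) this helper returns the data unchanged, which matches the defaults in the signature above.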

Example:

from Manteia.Preprocess import *
import pandas as pd
# Initialize a list of texts
documents = ['a text', 'text b']
# Initialize the preprocessing object
pp = Preprocess(documents)
pp.load()
pp.df_documents=clean(pp.df_documents)
print(pp.df_documents.head())

Attributes: