Preprocess
class Manteia.Preprocess.Preprocess(documents=[], labels=[], percentage=1.0, nb_sample=0, path='./Document/', lang='english', preprocess=True, verbose=True)
This class preprocesses text before an NLP task.
Args:
- documents (list, optional, defaults to []): A list of documents.
- labels (list, optional, defaults to []): A list of labels.
- percentage (float, optional, defaults to 1.0): Proportion of the data kept after reduction.
- size_by_nb_sample (bool, optional, defaults to False): Type of reduction: by number of samples or by percentage.
- nb_sample (int, optional, defaults to 0): Number of samples after reduction.
- path (string, optional, defaults to './Document/'): Path where the data object is saved.
- lang (string, optional, defaults to 'english'): Language of the stop words.
- preprocess (bool, optional, defaults to True): Whether to run the preprocessing at initialization.
Example:
    from Manteia.Preprocess import *
    import pandas as pd

    # Initializing a list of texts,labels
    documents=['a text','text b']

    # Initializing preprocess configuration
    pp=Preprocess(documents)
    pp.load()
    pp.df_documents=clean(pp.df_documents)
    print(pp.df_documents.head())
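The percentage and nb_sample arguments describe a reduction of the dataset, but this page does not show how the reduction is carried out. The following is a minimal sketch of one plausible reading, written in plain Python and independent of Manteia. The helper name reduce_dataset, the seed parameter, and the rule that a positive nb_sample takes priority over percentage are all assumptions made for illustration, not the library's actual behavior.

```python
import random


def reduce_dataset(documents, labels, percentage=1.0, nb_sample=0, seed=0):
    """Hypothetical helper: keep a random subset of (document, label) pairs."""
    if len(documents) != len(labels):
        raise ValueError("documents and labels must have the same length")

    # Assumption: a positive nb_sample takes priority over percentage.
    if nb_sample > 0:
        target = min(nb_sample, len(documents))
    else:
        target = int(len(documents) * percentage)

    # Sample indices rather than items so each document stays paired with its label.
    rng = random.Random(seed)
    kept = sorted(rng.sample(range(len(documents)), target))
    return [documents[i] for i in kept], [labels[i] for i in kept]


documents = [f"text {i}" for i in range(10)]
labels = [i % 2 for i in range(10)]

# Keep half of the data, then keep exactly three samples.
docs_half, labels_half = reduce_dataset(documents, labels, percentage=0.5)
docs_three, labels_three = reduce_dataset(documents, labels, nb_sample=3)
print(len(docs_half), len(docs_three))
```

Sampling indices instead of documents keeps the two lists aligned, and the fixed seed makes the reduced subset reproducible between runs.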
Attributes:
- documents (