Dataset

class Manteia.Dataset.Dataset(name='20newsgroups', train=True, test=False, dev=False, classe=True, desc=False, path='./dataset', verbose=True)

Loads one of the datasets listed below and exposes its documents and labels as attributes.

  • name - name of the dataset to load (str). Default: ‘20newsgroups’.

  • train - load the training split. Default: ‘True’.

  • test - load the test split. Default: ‘False’.

  • dev - load the dev split. Default: ‘False’.

  • desc - load the dataset description. Default: ‘False’.

  • verbose - print explanatory messages while loading. Default: ‘True’.

  • path - path to the dataset files. Default: ‘./dataset’.
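
Which attributes are populated depends on the train, test and dev flags. The helper below is a minimal sketch for inspecting a loaded object. `summarize_splits` is a hypothetical name, not part of Manteia, and it assumes the `documents_<split>` / `labels_<split>` attribute naming used in the examples on this page, with attributes possibly absent for splits that were not requested.

```python
from types import SimpleNamespace


def summarize_splits(ds):
    """Return {split: (n_documents, n_labels)} for each split found on ds."""
    summary = {}
    for split in ("train", "test", "dev"):
        docs = getattr(ds, f"documents_{split}", None)
        labels = getattr(ds, f"labels_{split}", None)
        if docs is None:
            continue  # split not loaded, or attribute absent
        summary[split] = (len(docs), len(labels) if labels is not None else 0)
    return summary


# Stand-in object so the sketch runs without downloading anything;
# with Manteia installed, a Dataset instance would be passed instead.
fake = SimpleNamespace(
    documents_train=["a", "b", "c"], labels_train=[0, 1, 0],
    documents_dev=["d"], labels_dev=[1],
)
print(summarize_splits(fake))
```

Using `getattr` with a default keeps the helper safe whether or not the unrequested splits exist as attributes.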

del_dir(name)

Delete the files of the dataset.

load_20newsgroups()
Loads the 20newsgroups dataset.

The labels include:

  • 0 : sci.crypt.

  • 1 : sci.electronics.

  • 2 : sci.med.

  • 3 : sci.space.

  • 4 : rec.autos.

  • 5 : rec.sport.baseball.

  • 6 : rec.sport.hockey.

  • 7 : talk.politics.guns.

  • 8 : talk.politics.mideast.

  • 9 : talk.politics.misc.

  • 10 : talk.religion.misc.

from Manteia.Dataset import Dataset

ds=Dataset('20newsgroups')

print('Train : ')
print(ds.documents_train[:5])
print(ds.labels_train[:5])
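
The numbered label list above can be turned into a lookup table. This is a sketch under the assumption that the labels are stored as the integer ids shown in the list; if they are already stored as newsgroup names, no decoding is needed. `NEWSGROUP_LABELS` and `decode_labels` are hypothetical names, not part of Manteia.

```python
# Label names in the order given by the list above (index = label id).
NEWSGROUP_LABELS = [
    "sci.crypt", "sci.electronics", "sci.med", "sci.space", "rec.autos",
    "rec.sport.baseball", "rec.sport.hockey", "talk.politics.guns",
    "talk.politics.mideast", "talk.politics.misc", "talk.religion.misc",
]


def decode_labels(label_ids):
    """Map integer label ids to their newsgroup names."""
    return [NEWSGROUP_LABELS[int(i)] for i in label_ids]


print(decode_labels([0, 3, 10]))
```
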
load_Amazon_Review_Full()
Loads the Amazon Review Full dataset (star ratings).

The labels include:

  • 1 - 5 : rating classes (5 is the highest rating, highly recommended).

from Manteia.Dataset import Dataset

ds=Dataset('Amazon Review Full',test=True,desc=True)

print('Train : ')
print(ds.documents_train[:5])
print(ds.labels_train[:5])

print('Test : ')
print(ds.documents_test[:5])
print(ds.labels_test[:5])

print('Description :')
print(ds.description)
load_Amazon_Review_Polarity()
Loads the Amazon Review Polarity dataset.

The labels include:

  • 1 : Negative polarity.

  • 2 : Positive polarity.

from Manteia.Dataset import Dataset

ds=Dataset('Amazon Review Polarity',test=True,desc=True)

print('Train : ')
print(ds.documents_train[:5])
print(ds.labels_train[:5])
print(ds.documents_test[:5])
print(ds.labels_test[:5])
print(ds.description)
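
Many binary classifiers expect labels 0 and 1 rather than 1 and 2. The sketch below remaps the polarity labels listed above. `to_binary` is a hypothetical helper, and it assumes the labels are stored as the numbers 1 and 2 (as integers or numeric strings).

```python
def to_binary(labels):
    """Map polarity labels 1 (negative) / 2 (positive) to 0 / 1."""
    mapping = {1: 0, 2: 1}
    return [mapping[int(label)] for label in labels]


print(to_binary([1, 2, "2", 1]))
```
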
load_DBPedia()
Loads the DBPedia dataset.

The labels include:

  • Company

  • EducationalInstitution

  • Artist

  • Athlete

  • OfficeHolder

  • MeanOfTransportation

  • Building

  • NaturalPlace

  • Village

  • Animal

  • Plant

  • Album

  • Film

  • WrittenWork

from Manteia.Dataset import Dataset

ds=Dataset('DBPedia',test=True,desc=True,classe=True)

print('Train : ')
print(ds.documents_train[:5])
print(ds.labels_train[:5])

print('Test : ')
print(ds.documents_test[:5])
print(ds.labels_test[:5])

print('Description :')
print(ds.description)

print('List labels :')
print(ds.list_labels)
load_SST_2()
Loads the SST-2 dataset.

The labels include:

  • Negative polarity.

  • Positive polarity.

from Manteia.Dataset import Dataset

ds=Dataset('SST-2')

print('Train : ')
print(ds.documents_train[:5])
print(ds.labels_train[:5])
load_SST_5()
Loads the SST-5 dataset.

The labels include:

  • very negative.

  • negative.

  • neutral.

  • positive.

  • very positive.

from Manteia.Dataset import Dataset

ds=Dataset('SST-5',dev=True)

print('Dev : ')
print(ds.documents_dev[:5])
print(ds.labels_dev[:5])
load_Short_Jokes()

Loads the Short_Jokes dataset.

from Manteia.Dataset import Dataset

# Dataset name assumed to match the loader name.
ds=Dataset('Short_Jokes')

print('Train : ')
print(ds.documents_train[:5])
print(ds.labels_train[:5])
load_Tweeter_Airline_Sentiment()
Loads the Twitter Airline Sentiment dataset (named 'Tweeter Airline Sentiment' in the library).

The labels include:

  • positive.

  • neutral.

  • negative.

from Manteia.Dataset import Dataset

ds=Dataset('Tweeter Airline Sentiment')

print('Train : ')
print(ds.documents_train[:5])
print(ds.labels_train[:5])
load_Yahoo_Answers()
Loads the Yahoo! Answers dataset.

The labels include:

  • Society & Culture

  • Science & Mathematics

  • Health

  • Education & Reference

  • Computers & Internet

  • Sports

  • Business & Finance

  • Entertainment & Music

  • Family & Relationships

  • Politics & Government

from Manteia.Dataset import Dataset

ds=Dataset('Yahoo! Answers',test=True,desc=True)

print('Train : ')
print(ds.documents_train[:5])
print(ds.labels_train[:5])

print('Test : ')
print(ds.documents_test[:5])
print(ds.labels_test[:5])

print('Description :')
print(ds.description)

print('List labels :')
print(ds.list_labels)
load_Yelp_Review_Full()
Loads the Yelp Review Full dataset (star ratings).

The labels include:

  • 1 - 5 : rating classes (5 is the highest rating, highly recommended).

from Manteia.Dataset import Dataset

ds=Dataset('Yelp Review Full',test=True,desc=True)

print('Train : ')
print(ds.documents_train[:5])
print(ds.labels_train[:5])

print('Test : ')
print(ds.documents_test[:5])
print(ds.labels_test[:5])

print('Description :')
print(ds.description)
load_Yelp_Review_Polarity()
Loads the Yelp Review Polarity dataset.

The labels include:

  • 1 : Negative polarity.

  • 2 : Positive polarity.

from Manteia.Dataset import Dataset

ds=Dataset('Yelp Review Polarity',test=True,desc=True)

print('Train : ')
print(ds.documents_train[:5])
print(ds.labels_train[:5])
print(ds.documents_test[:5])
print(ds.labels_test[:5])
print(ds.description)
load_agnews()
Loads the AG News dataset.

The labels include:

  • 0 : World

  • 1 : Sports

  • 2 : Business

  • 3 : Sci/Tech

from Manteia.Dataset import Dataset

ds=Dataset('agnews')

print('Train : ')
print(ds.documents_train[:5])
print(ds.labels_train[:5])
load_drugscom()
Loads the Drugs.com dataset.

The labels include:

  • 0 - 9 : rating classes (9 is the highest rating).

from Manteia.Dataset import Dataset

ds=Dataset('drugscom')

print('Train : ')
print(ds.documents_train[:5])
print(ds.labels_train[:5])
load_pubmed_rct20k()
Loads the PubMed RCT20k dataset.

The labels include:

  • BACKGROUND.

  • CONCLUSIONS.

  • METHODS.

  • OBJECTIVE.

  • RESULTS.

from Manteia.Dataset import Dataset

ds=Dataset('pubmed_rct20k')

print('Train : ')
print(ds.documents_train[:5])
print(ds.labels_train[:5])
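
Before training on a multi-class dataset such as this one, it can help to check how balanced the classes are. The sketch below computes the share of each label. `label_distribution` is a hypothetical helper, and the sample list stands in for `ds.labels_train` so that the code runs without downloading anything.

```python
from collections import Counter


def label_distribution(labels):
    """Return {label: fraction of examples}, most frequent label first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.most_common()}


# Stand-in for ds.labels_train.
sample = ["METHODS", "RESULTS", "METHODS", "BACKGROUND"]
print(label_distribution(sample))
```
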
load_trec()
Loads the TREC dataset.

The labels include:

  • ABBREVIATION

  • ENTITY

  • DESCRIPTION

  • HUMAN

  • LOCATION

  • NUMERIC

from Manteia.Dataset import Dataset

# Dataset name assumed to match the loader name.
ds=Dataset('trec')

print('Train : ')
print(ds.documents_train[:5])
print(ds.labels_train[:5])
Manteia.Dataset.clear_folder(dir)

Delete a directory and its contents.
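
For illustration, recursive deletion of this kind can be written with the standard library. This is a sketch of the general technique, not Manteia's implementation, and `remove_tree` is a hypothetical name.

```python
import os
import shutil
import tempfile


def remove_tree(directory):
    """Delete directory and everything inside it; do nothing if it is absent."""
    if os.path.isdir(directory):
        shutil.rmtree(directory)


# Demonstrate on a throwaway directory containing one file.
tmp = tempfile.mkdtemp()
open(os.path.join(tmp, "data.txt"), "w").close()
remove_tree(tmp)
print(os.path.exists(tmp))
```
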

Manteia.Dataset.download_and_extract(url, data_dir)

Download the dataset archive from url and extract it into data_dir.
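
A download-then-unpack step of this kind can be sketched with the standard library. This is not Manteia's implementation. `fetch_and_unpack` is a hypothetical name, and the sketch assumes the URL ends in an archive filename whose extension `shutil.unpack_archive` recognises (zip, tar, tar.gz, ...). The demonstration uses a local `file://` URL so that nothing is fetched from the network.

```python
import os
import pathlib
import shutil
import tempfile
import urllib.request
import zipfile
from urllib.parse import urlparse


def fetch_and_unpack(url, data_dir):
    """Download the archive at url into data_dir and unpack it there."""
    os.makedirs(data_dir, exist_ok=True)
    filename = os.path.basename(urlparse(url).path)
    archive_path = os.path.join(data_dir, filename)
    urllib.request.urlretrieve(url, archive_path)
    # The archive format is inferred from the file extension.
    shutil.unpack_archive(archive_path, data_dir)
    return archive_path


# Build a tiny local archive and fetch it through a file:// URL.
source_dir = tempfile.mkdtemp()
zip_path = os.path.join(source_dir, "toy.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("hello.txt", "hi")

target_dir = tempfile.mkdtemp()
fetch_and_unpack(pathlib.Path(zip_path).as_uri(), target_dir)
print(sorted(os.listdir(target_dir)))
```
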
