This repository has been archived by the owner on Jul 26, 2020. It is now read-only.

DataAnalysisProject

Anapy: a simple package for data analysis, visualization and machine learning

This package can be installed by running pip install . in the project's root directory.

Subpackages

datamanip

Modules

CentralValues: Returns a dictionary of summary statistics covering central tendency and spread: mean, median, range, variance, standard deviation and quantile.
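
The sketch below shows one way such a dictionary could be computed with NumPy. The function name central_values, the key names and the default quartiles are illustrative assumptions rather than the module's actual interface; the variance and standard deviation use the sample (n - 1) convention, which the original module may not.

```python
import numpy as np

def central_values(x, quantiles=(0.25, 0.5, 0.75)):
    """Hypothetical sketch: summary statistics for a 1-D numeric sample."""
    a = np.asarray(x, dtype=float)
    return {
        "Mean": float(np.mean(a)),
        "Median": float(np.median(a)),
        "Range": float(np.max(a) - np.min(a)),              # max minus min
        "Variance": float(np.var(a, ddof=1)),               # sample variance, divides by n - 1
        "Standard Deviation": float(np.std(a, ddof=1)),
        "Quantile": {q: float(np.quantile(a, q)) for q in quantiles},
    }

stats = central_values([2, 4, 4, 4, 5, 5, 7, 9])
print(stats["Mean"], stats["Median"], stats["Range"])  # 5.0 4.5 7.0
```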

DataOps: Functions for working with pandas DataFrames: (1) dataDep_csv(infile) returns a deduplicated DataFrame built from infile; (2) dataFrameSplit(dataframe, number of records) splits a DataFrame according to the number of records needed.
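
A minimal pandas sketch of both operations follows. It assumes infile is a CSV file (suggested by the _csv suffix, not confirmed) and interprets "splits based on no of records" as cutting the frame into consecutive chunks of at most n rows; dedup_csv and split_frame are hypothetical names, not the module's functions.

```python
import io
import pandas as pd

def dedup_csv(infile):
    """Hypothetical sketch: read a CSV (path or buffer) and drop exact duplicate rows."""
    df = pd.read_csv(infile)
    return df.drop_duplicates().reset_index(drop=True)

def split_frame(df, n_records):
    """Hypothetical sketch: split df into consecutive chunks of at most n_records rows."""
    if n_records < 1:
        raise ValueError("n_records must be at least 1")
    return [df.iloc[i:i + n_records] for i in range(0, len(df), n_records)]

# Usage with an in-memory CSV that contains one duplicated row
sample = io.StringIO("a,b\n1,x\n1,x\n2,y\n3,z\n")
deduped = dedup_csv(sample)
chunks = split_frame(deduped, 2)
print([len(c) for c in chunks])  # [2, 1]
```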

datasetSeparator: Helpers for inspecting a pandas DataFrame and manipulating its columns, for example removing them. Current functions: (1) displayCols(dataframe) displays the columns; (2) remCols(dataframe) removes columns; (3) sep_data_target(dataframe) separates the data from the target.
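
The same three operations can be sketched with standard pandas calls. Unlike the originals, which take only the DataFrame, these hypothetical versions receive the columns to remove and the target column name as explicit arguments; that interface is an assumption made for illustration.

```python
import pandas as pd

def display_cols(df):
    """Hypothetical sketch: return the column names of df as a list."""
    return list(df.columns)

def rem_cols(df, cols):
    """Hypothetical sketch: return a copy of df without the given columns."""
    return df.drop(columns=cols)

def sep_data_target(df, target):
    """Hypothetical sketch: split df into feature columns and the target column."""
    return df.drop(columns=[target]), df[target]

frame = pd.DataFrame({"id": [1, 2, 3], "height": [1.6, 1.7, 1.8], "label": [0, 1, 1]})
X, y = sep_data_target(rem_cols(frame, ["id"]), "label")
print(display_cols(X))  # ['height']
```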

externals

Modules

LoadDataset: (1) load_pickle(filestr) loads a pickled object; (2) data_target_separator(numpy array) separates a NumPy dataset into data and target, assuming the last column contains the labels.
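
A possible implementation of both helpers is sketched below; it reuses the names from the list above but is not the module's code. The split follows the stated convention that labels sit in the last column. Note that pickle.load can execute arbitrary code, so only files from trusted sources should be unpickled.

```python
import os
import pickle
import tempfile
import numpy as np

def load_pickle(filestr):
    """Sketch: load and return the object pickled at path filestr."""
    with open(filestr, "rb") as fh:
        return pickle.load(fh)

def data_target_separator(dataset):
    """Sketch: features are all columns but the last; labels are the last column."""
    arr = np.asarray(dataset)
    return arr[:, :-1], arr[:, -1]

# Usage: round-trip a small array through a temporary pickle file
dataset = np.array([[0.1, 0.2, 0], [0.3, 0.4, 1], [0.5, 0.6, 1]])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.pkl")
    with open(path, "wb") as fh:
        pickle.dump(dataset, fh)
    loaded = load_pickle(path)
data, target = data_target_separator(loaded)
print(data.shape, target.tolist())  # (3, 2) [0.0, 1.0, 1.0]
```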

mlops

Modules

learnfromsample: An important module that takes a training set and a test set as input, together with parameters such as the sample size, the sampling method and the classifier; a scaler is optional. It returns the true and predicted labels for the test set, the true and predicted labels for the training set, and the fitting and prediction times of the model under examination.
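
The workflow described above can be sketched as follows. Everything here is an assumption for illustration: the function learn_from_sample, its argument order, the sampler signature (n, k, rng) returning row indices, and the choice to report training labels for the sampled subset only. The classifier and scaler only need scikit-learn-style fit/predict and fit_transform/transform methods; MajorityClassifier is a toy stand-in so the example runs without extra dependencies.

```python
import time
import numpy as np

def learn_from_sample(X_train, y_train, X_test, y_test, sample_size,
                      sampler, clf, scaler=None, seed=0):
    """Hypothetical sketch: fit clf on a sample of the training set and time it."""
    rng = np.random.default_rng(seed)
    idx = sampler(len(X_train), sample_size, rng)    # row indices of the training sample
    Xs, ys, Xt = X_train[idx], y_train[idx], X_test
    if scaler is not None:                           # optional: fit the scaler on the sample only
        Xs = scaler.fit_transform(Xs)
        Xt = scaler.transform(Xt)
    start = time.perf_counter()
    clf.fit(Xs, ys)
    fit_time = time.perf_counter() - start
    start = time.perf_counter()
    test_pred = clf.predict(Xt)
    predict_time = time.perf_counter() - start
    train_pred = clf.predict(Xs)
    return y_test, test_pred, ys, train_pred, fit_time, predict_time

def random_indices(n, k, rng):
    """Simple random sampling of k row indices without replacement."""
    return rng.choice(n, size=k, replace=False)

class MajorityClassifier:
    """Toy classifier: always predicts the most frequent training label."""
    def fit(self, X, y):
        values, counts = np.unique(y, return_counts=True)
        self.label_ = values[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.label_)

X_train = np.arange(20, dtype=float).reshape(10, 2)
y_train = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
X_test = np.array([[0.0, 1.0], [5.0, 5.0]])
y_test = np.array([1, 0])
y_true, y_pred, ys, train_pred, fit_t, pred_t = learn_from_sample(
    X_train, y_train, X_test, y_test, 8, random_indices, MajorityClassifier())
print(y_pred.tolist())  # [1, 1]
```

Fitting the scaler on the training sample alone, then applying it to the test set, keeps test-set statistics out of training.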

Visualization: Applies dimensionality reduction so that a dataset can be visualized in 2D and 3D space.
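
The README does not say which reduction technique the module uses. As one common choice, principal component analysis (PCA) can be sketched with NumPy's SVD; pca_project is a hypothetical name. The resulting 2 or 3 columns can then be passed to a scatter plot.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Sketch: project rows of X onto the top principal components via SVD."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # center each feature
    # Rows of Vt are principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
Z2 = pca_project(X, 2)   # coordinates for a 2D plot
Z3 = pca_project(X, 3)   # coordinates for a 3D plot
print(Z2.shape, Z3.shape)  # (50, 2) (50, 3)
```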

sampling

Modules

This subpackage contains probability-based sampling modules that work with NumPy datasets:

ClusterSampling

RandomSampling

StratifiedSampling

SystematicSampling
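
Textbook versions of the four schemes can be sketched as functions that return row indices, so a NumPy dataset X is sampled with X[idx]. These are illustrative implementations, not the modules' code; in particular, the per-row cluster ids and the per-class sampling fraction are assumed inputs.

```python
import numpy as np

def random_sample(n, k, rng):
    """Simple random sampling: k distinct row indices, each subset equally likely."""
    return np.sort(rng.choice(n, size=k, replace=False))

def systematic_sample(n, k, rng):
    """Systematic sampling: a random start, then every step-th row."""
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    step = n // k
    start = rng.integers(step)                   # start in [0, step)
    return np.arange(start, start + step * k, step)

def stratified_sample(labels, fraction, rng):
    """Stratified sampling: draw the same fraction from every class (stratum)."""
    labels = np.asarray(labels)
    picks = []
    for cls in np.unique(labels):
        members = np.flatnonzero(labels == cls)
        k = max(1, int(round(fraction * len(members))))
        picks.append(rng.choice(members, size=k, replace=False))
    return np.sort(np.concatenate(picks))

def cluster_sample(cluster_ids, n_clusters, rng):
    """Cluster sampling: choose whole clusters at random and keep all their rows."""
    cluster_ids = np.asarray(cluster_ids)
    chosen = rng.choice(np.unique(cluster_ids), size=n_clusters, replace=False)
    return np.flatnonzero(np.isin(cluster_ids, chosen))

rng = np.random.default_rng(42)
labels = np.array([0] * 80 + [1] * 20)          # imbalanced toy labels
cluster_ids = np.repeat(np.arange(10), 10)      # 10 clusters of 10 rows each
srs = random_sample(100, 10, rng)
sys_idx = systematic_sample(100, 10, rng)
strat = stratified_sample(labels, 0.1, rng)     # 8 rows of class 0, 2 of class 1
clus = cluster_sample(cluster_ids, 3, rng)      # all rows of 3 clusters
print(len(strat), len(clus))  # 10 30
```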

misc

This subpackage contains project-specific modules that only work in this project's scenario.