This repository has been archived by the owner on Jul 26, 2020. It is now read-only.

DataAnalysisProject

Anapy -- A simple package for data analysis, visualization and machine learning

To install the package, run pip install . from the repository root.

Sub Packages

  1. datamanip

Modules

  • CentralValues: Returns a dictionary of summary statistics: mean, median, range, variance, standard deviation and quantiles.
  • DataOps: Functions for working with pandas DataFrames: (1) dataDep_csv(infile) returns a deduplicated DataFrame; (2) dataFrameSplit(dataframe, no of records) splits a DataFrame according to the number of records requested.
  • datasetSeparator: Inspects a pandas DataFrame and manipulates its columns. Current functions: (1) displayCols(dataframe) displays the columns; (2) remCols(dataframe) removes columns; (3) sep_data_target(dataframe) separates data and target.
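As a rough illustration of the statistics listed for CentralValues, the sketch below builds a similar dictionary with Python's standard statistics module. It is not the package's code: the function name central_values, the dictionary keys and the choice of sample variance are assumptions made for this example.

```python
import statistics


def central_values(values, n_quantiles=4):
    """Hypothetical helper: summary statistics for a sequence of numbers."""
    data = sorted(values)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": data[-1] - data[0],
        # Sample variance and standard deviation (divide by n - 1).
        "variance": statistics.variance(data),
        "std_dev": statistics.stdev(data),
        # Cut points that divide the data into n_quantiles groups.
        "quantiles": statistics.quantiles(data, n=n_quantiles),
    }


stats = central_values([2, 4, 4, 4, 5, 5, 7, 9])
print(stats)
```

Whether the real module uses sample or population variance is not stated in this README, so check the source before comparing numbers.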
  1. externals

Modules

  • LoadDataset: (1) load_pickle(filestr) loads a pickled object; (2) data_target_separator(numpy array) separates a NumPy dataset into data and target, assuming the last column contains the labels.
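The two LoadDataset helpers might look roughly like the sketch below. The function names are taken from the list above, but the bodies, the parameter names and the input validation are assumptions for illustration, not the package's implementation.

```python
import os
import pickle
import tempfile

import numpy as np


def load_pickle(path):
    """Load a pickled object from disk. Only unpickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def data_target_separator(dataset):
    """Split a 2D array into features and labels (labels in the last column)."""
    dataset = np.asarray(dataset)
    if dataset.ndim != 2 or dataset.shape[1] < 2:
        raise ValueError("expected a 2D array with at least two columns")
    return dataset[:, :-1], dataset[:, -1]


# Round trip: pickle a small array, load it back, then separate it.
dataset = np.array([[5.1, 3.5, 0.0],
                    [4.9, 3.0, 0.0],
                    [6.2, 2.9, 1.0]])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.pkl")
    with open(path, "wb") as handle:
        pickle.dump(dataset, handle)
    restored = load_pickle(path)

data, target = data_target_separator(restored)
```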
  1. mlops

Modules

  • learnfromsample: Takes a training set and a test set as inputs, together with parameters such as sample size, sampling method and classifier; a scaler is optional. Returns the true and predicted labels for the test set, the true and predicted labels for the training set, and the fitting and prediction times of the model under examination.
  • Visualization: Applies dimensionality reduction so that a dataset can be visualized in 2D and 3D.
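The workflow described for learnfromsample can be sketched as below, assuming NumPy arrays with labels in the last column. Everything here is hypothetical: the function signature, the random_sampler helper and the MajorityClassifier stand-in exist only to keep the example self-contained. Any object with fit and predict methods (and, for the scaler, fit_transform and transform) could take their place.

```python
import time

import numpy as np


class MajorityClassifier:
    """Stand-in classifier: always predicts the most common training label."""

    def fit(self, X, y):
        labels, counts = np.unique(y, return_counts=True)
        self.label_ = labels[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.label_)


def random_sampler(data, size, seed=0):
    """Draw `size` rows without replacement."""
    rng = np.random.default_rng(seed)
    return data[rng.choice(len(data), size=size, replace=False)]


def learn_from_sample(train, test, sample_size, sampler, classifier, scaler=None):
    """Fit on a sample of `train`, predict on `test`, and time both steps."""
    sample = sampler(train, sample_size)
    X_train, y_train = sample[:, :-1], sample[:, -1]
    X_test, y_test = test[:, :-1], test[:, -1]
    if scaler is not None:
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)

    start = time.perf_counter()
    classifier.fit(X_train, y_train)
    fit_time = time.perf_counter() - start

    start = time.perf_counter()
    test_pred = classifier.predict(X_test)
    predict_time = time.perf_counter() - start

    train_pred = classifier.predict(X_train)
    return y_test, test_pred, y_train, train_pred, fit_time, predict_time


train = np.array([[0.1, 1], [0.2, 1], [0.3, 1], [0.4, 1], [0.5, 1], [0.9, 0]])
test = np.array([[0.15, 1], [0.95, 0]])
results = learn_from_sample(train, test, 5, random_sampler, MajorityClassifier())
y_test, test_pred, y_train, train_pred, fit_time, predict_time = results
```

In this sketch the training-set labels refer to the sampled rows only, which is one possible reading of the description above.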
  1. sampling

Modules

This sub-package contains several probability-based sampling modules that work with NumPy datasets:

  • ClusterSampling
  • RandomSampling
  • StratifiedSampling
  • SystematicSampling
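For readers unfamiliar with the four schemes named above, the following sketch shows one common way to implement each on a NumPy array. The function names, signatures and the label-in-last-column convention are assumptions for this example. The package's own modules may differ.

```python
import numpy as np


def random_sample(data, size, rng):
    """Simple random sampling: every row is equally likely, no replacement."""
    return data[rng.choice(len(data), size=size, replace=False)]


def systematic_sample(data, size, rng):
    """Systematic sampling: random start, then every step-th row."""
    if not 0 < size <= len(data):
        raise ValueError("size must be between 1 and len(data)")
    step = len(data) // size
    start = rng.integers(step)
    return data[start::step][:size]


def stratified_sample(data, fraction, rng):
    """Stratified sampling: draw the same fraction from each label group."""
    labels = data[:, -1]
    parts = []
    for label in np.unique(labels):
        rows = np.flatnonzero(labels == label)
        k = max(1, round(fraction * len(rows)))
        parts.append(rng.choice(rows, size=k, replace=False))
    return data[np.concatenate(parts)]


def cluster_sample(data, cluster_ids, n_clusters, rng):
    """Cluster sampling: pick whole clusters at random and keep all their rows."""
    chosen = rng.choice(np.unique(cluster_ids), size=n_clusters, replace=False)
    return data[np.isin(cluster_ids, chosen)]


rng = np.random.default_rng(42)
# Ten rows: one feature column plus a label column (six 0s, four 1s).
data = np.column_stack([np.arange(10), [0] * 6 + [1] * 4])
cluster_ids = np.arange(10) // 2  # five clusters of two rows each

simple = random_sample(data, 4, rng)
systematic = systematic_sample(data, 3, rng)
stratified = stratified_sample(data, 0.5, rng)
clustered = cluster_sample(data, cluster_ids, 2, rng)
```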
  1. misc

This sub-package contains project-specific modules that work only within this project's scenario.