
DashBoard - DashB.ai 📈 📉


A no-code machine learning platform


📑 Overview

  • This is a web app that automates the data preprocessing pipeline. The long-term goal is to automate the whole machine learning pipeline, but the project currently goes only as far as data preprocessing.
  • The project is currently in the development phase.
  • Users can upload comma-separated value (CSV) files or fetch data directly from a MySQL database (make sure MySQL is installed on your system).
  • Users decide which operations to perform and which to skip; only the selected operations are passed to the pipeline, which then shows the result.
  • Users can visualize the data with the data visualization tool that comes with DashB.ai, without writing any code (built with Dash by Plotly).
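
The idea of passing only the selected operations to the pipeline can be sketched as below. This is a minimal, dependency-free illustration and not the project's actual code: the names `OPERATIONS`, `run_pipeline`, `fill_with_zero` and `drop_constant_columns` are hypothetical, and a table is assumed to be a plain dict of column name to list of values.

```python
from typing import Callable, Dict, List, Optional

Column = List[Optional[float]]
Table = Dict[str, Column]


def fill_with_zero(table: Table) -> Table:
    """Replace missing values (None) with 0.0 in every column."""
    return {name: [0.0 if v is None else v for v in col] for name, col in table.items()}


def drop_constant_columns(table: Table) -> Table:
    """Drop columns whose observed values are all identical (zero variance)."""
    return {
        name: col
        for name, col in table.items()
        if len({v for v in col if v is not None}) > 1
    }


# Hypothetical registry: operation name (as a UI form might send it) -> function.
OPERATIONS: Dict[str, Callable[[Table], Table]] = {
    "fill_with_zero": fill_with_zero,
    "drop_constant_columns": drop_constant_columns,
}


def run_pipeline(table: Table, selected: List[str]) -> Table:
    """Apply only the operations the user selected, in the order given."""
    for name in selected:
        if name not in OPERATIONS:
            raise ValueError(f"Unknown operation: {name}")
        table = OPERATIONS[name](table)
    return table


if __name__ == "__main__":
    data = {"age": [21.0, None, 35.0], "flag": [1.0, 1.0, 1.0]}
    print(run_pipeline(data, ["drop_constant_columns", "fill_with_zero"]))
```

Each operation takes a table and returns a new one, so skipping a step is simply a matter of leaving its name out of the `selected` list.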


⚒ Built With




scikit-learn, Plotly Dash, Bootstrap

🟢 Getting Started

To get a local copy up and running, follow these simple steps. Make sure git is installed on your machine.

Installation

  1. Clone the repo
git clone https://github.com/IMsumitkumar/No-code-ML-platform-DashB.ai
  2. Create a virtual env and activate it
conda create -n <env_name> python=3.7
conda activate <env_name>
  3. Install dependencies (run this inside your local repository)
pip install -r requirements.txt

RUN

STEP 1 : Migrate the database tables and create a superuser

python manage.py makemigrations
python manage.py migrate
python manage.py createsuperuser
username : *****
email : *****
password : ******

STEP 2

python manage.py runserver

STEP 3 : OPTIONAL

For email-based password recovery, set your email and password in DashB/settings.py.
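
As a rough sketch of what such a block in DashB/settings.py might look like: the setting names below are standard Django email settings, but the SMTP host, the port, and the environment variable names `DASHB_EMAIL_USER` and `DASHB_EMAIL_PASSWORD` are placeholders chosen for illustration. Reading credentials from the environment instead of hard-coding them is an assumption here, not something the repository is known to do.

```python
import os

# Standard Django email settings; host and port are placeholders to adapt.
EMAIL_BACKEND = "django.core.mail.backends.smtp.EmailBackend"
EMAIL_HOST = os.environ.get("EMAIL_HOST", "smtp.example.com")  # placeholder host
EMAIL_PORT = int(os.environ.get("EMAIL_PORT", "587"))
EMAIL_USE_TLS = True

# Hypothetical variable names: export these in your shell before
# running `python manage.py runserver`.
EMAIL_HOST_USER = os.environ.get("DASHB_EMAIL_USER", "")
EMAIL_HOST_PASSWORD = os.environ.get("DASHB_EMAIL_PASSWORD", "")
```

Keeping the password out of the file avoids committing it to version control by accident.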

Preprocessing Pipeline Tree

├── Handle Datatypes
│   ├── Drop unnecessary features.
│   ├── Replace inf with NaN.
│   ├── Make sure all the column names are of string type and clean them.
│   ├── Remove rows where the target column is NaN.
│   ├── Remove duplicate columns.
│   ├── Handle numerical, categorical and time features.
│   └── Try to determine the ML use case and encode.
├── Handle Missing Values
│   ├────── Numerical Features
│   ├── Replace with mean.
│   ├── Replace with median.
│   ├── Replace with mode.
│   ├── Replace with standard deviation.
│   ├── Replace with zero.
│   ├────── Categorical Features
│   ├── Replace with mean.
│   ├── Replace with "Missing".
│   └── Replace with most frequent value.
├── Removing zero and near zero variance columns
│   ├── Eliminate the features that have zero variance.
│   └── Eliminate the features that have near zero variance.
├── Group Similar Features
│   └── Group more than two features and make new features with them.
├── Normalization and Transformation
│   ├────── Operations to apply only on numerical features
│   ├── ZScore
│   ├── MinMax
│   ├── Quantile
│   ├── MaxAbs
│   ├── Yeo-Johnson
│   ├────── Target transformation (regression)
│   ├── Box-Cox
│   └── Yeo-Johnson
├── Making Time Features
│   ├── Take a time feature and extract more features from it
│   └── (Day, Month, Year, Hour, Minute, Second, Quantile, Quarter, Day of week, Weekday name, Day of year, Week of year)
├── Feature Encoding
│   ├────── Ordinal Encoding
│   ├── LabelEncoding
│   ├── Target Guided ordinal encoding
│   ├────── One hot encoding
│   ├── KDD orange
│   ├── Mean Encoding
│   └── Counter/frequency encoding
├── Removing Outliers
│   ├── Isolation Forest
│   ├── KNN
│   ├── PCA
│   └── Elliptic envelope
├── Feature Selection
│   ├── Chi squared (Not working perfectly)
│   ├── RFE (Not working on all the data)
│   ├── Lasso (works perfectly)
│   ├── Random Forest
│   ├── lgbm (works perfectly)
│   └── Remove zero variance features
├── Imbalanced Dataset (Not done yet)
│   ├── Ensemble techniques automatically handle imbalanced datasets
│   ├── Undersampling (Not a good idea)
│   ├── Oversampling
│   ├── SMOTE
│   └── Isolation Forest
└── Next Step
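
To make a few of the steps in the tree concrete, here is a small sketch of median imputation, zero-variance detection, Z-score normalization and time feature extraction. It uses only the Python standard library so it runs anywhere; it is not the project's implementation, and all function names are hypothetical. scikit-learn, listed under Built With, offers full-featured equivalents such as `SimpleImputer`, `VarianceThreshold` and `StandardScaler`.

```python
import statistics
from datetime import datetime
from typing import Dict, List, Optional


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def has_zero_variance(values: List[float]) -> bool:
    """True when every value in the column is identical."""
    return statistics.pvariance(values) == 0


def zscore(values: List[float]) -> List[float]:
    """Standardize to zero mean and unit population standard deviation.

    Assumes zero-variance columns were already removed; otherwise the
    division below would raise ZeroDivisionError.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def time_features(ts: datetime) -> Dict[str, object]:
    """Extract some of the calendar features named in the tree."""
    return {
        "day": ts.day,
        "month": ts.month,
        "year": ts.year,
        "hour": ts.hour,
        "minute": ts.minute,
        "second": ts.second,
        "quarter": (ts.month - 1) // 3 + 1,
        "day_of_week": ts.weekday(),  # Monday == 0
        "weekday_name": ts.strftime("%A"),  # locale-dependent
        "day_of_year": ts.timetuple().tm_yday,
        "week_of_year": ts.isocalendar()[1],  # ISO 8601 week number
    }


if __name__ == "__main__":
    column = impute_median([1.0, None, 3.0])
    if not has_zero_variance(column):
        print(zscore(column))
    print(time_features(datetime(2021, 3, 15, 14, 30)))
```

The order matters: imputing first guarantees the variance check and the Z-score see no missing values, and dropping constant columns before scaling avoids dividing by a zero standard deviation.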

Directory Tree

├── accounts
│   └─────────── # handles login, signup and password recovery.
├── DashB
│   └─────────── # main folder; contains wsgi, routing, settings and urls.
├── data
│   └─────────── # main folder for running the pipeline.
├── Viz
│   └─────────── # project app for the data visualization tool.
├── static
│   └─────────── # contains static files.
├── media
│   └─────────── # storage folder for uploaded media.
├── templates
│   └─────────── # contains landing page templates.
├── manage.py
├── requirements.txt
├── LICENSE
├── README.md
└── db.sqlite3

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project: https://github.com/IMsumitkumar/No-code-ML-platform-DashB.ai/tree/main/DashB
  2. Create your Feature Branch: git checkout -b feature/AmazingFeature
  3. Commit your Changes: git commit -m 'Add some AmazingFeature'
  4. Push to the Branch: git push origin feature/AmazingFeature
  5. Open a Pull Request



