FindingData

DashBoard - DashB.ai 📈 📉

A no-code machine learning platform

📑 Overview

This is a web app that automates the data preprocessing pipeline. The goal is to automate the whole machine learning pipeline, but the project currently covers only the data preprocessing stage.
The project is currently in the development phase.
Users can upload comma-separated value (CSV) files or fetch data directly from a MySQL database (make sure MySQL is installed on your system).
Users decide which operations to perform and which to skip; the selected operations are passed to the pipeline, which then shows the result.
Users can visualize the data without writing any code, using the dataviz tool that comes with DashB.ai (built with Dash by Plotly).
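The two input routes described above can be sketched with pandas. This is a minimal illustration, not the project's actual loading code: `load_csv` and `load_query` are hypothetical helper names, and an in-memory SQLite database stands in for MySQL so the sketch runs without a server. For a real MySQL database you would most likely pass an SQLAlchemy engine as the connection instead.

```python
import io
import sqlite3

import pandas as pd


def load_csv(file_obj) -> pd.DataFrame:
    """Read an uploaded comma-separated file (path or file-like object)."""
    return pd.read_csv(file_obj)


def load_query(connection, query: str) -> pd.DataFrame:
    """Run a SELECT statement and return the rows as a DataFrame."""
    return pd.read_sql_query(query, connection)


# In-memory stand-ins (assumed sample data) so the sketch is self-contained.
csv_frame = load_csv(io.StringIO("age,city\n31,Pune\n45,Delhi\n"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (age INTEGER, city TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [(31, "Pune"), (45, "Delhi")])
conn.commit()
db_frame = load_query(conn, "SELECT age, city FROM people")
conn.close()
```

Both routes end in the same kind of object, a DataFrame, so the rest of a preprocessing pipeline does not need to know where the data came from.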

main page

⚒ Built With

scikit-learn, Plotly, Dash, Bootstrap

🟢 Getting Started

To get a local copy up and running, follow these simple steps. Make sure git is installed on your machine.

Installation

Clone the repo

git clone https://github.com/IMsumitkumar/No-code-ML-platform-DashB.ai

Create a virtual environment and activate it

conda create -n <env_name> python=3.7
conda activate <env_name>

Install dependencies

Inside your local repository:

pip install -r requirements.txt

RUN

STEP 1 : Migrate the database tables and create a superuser

python manage.py makemigrations
python manage.py migrate
python manage.py createsuperuser
    username : *****
    email    : *****
    password : ******

STEP 2

python manage.py runserver

STEP 3 : OPTIONAL

For email-based password recovery, set your email and password in DashB/settings.py
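As a rough guide, the email section of a Django settings file usually looks like the sketch below. The setting names are standard Django settings, but the host, port and environment variable names are placeholders chosen for illustration; the values DashB/settings.py actually expects may differ, so check the file before editing it.

```python
import os

# Standard Django SMTP backend.
EMAIL_BACKEND = "django.core.mail.backends.smtp.EmailBackend"

# Placeholder host and port (assumptions): use your own provider's values.
EMAIL_HOST = os.environ.get("EMAIL_HOST", "smtp.example.com")
EMAIL_PORT = int(os.environ.get("EMAIL_PORT", "587"))
EMAIL_USE_TLS = True

# Reading credentials from environment variables keeps them out of git.
EMAIL_HOST_USER = os.environ.get("EMAIL_HOST_USER", "")
EMAIL_HOST_PASSWORD = os.environ.get("EMAIL_HOST_PASSWORD", "")
```

Keeping the password in an environment variable rather than in the file itself avoids committing it by accident.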

Preprocessing Pipeline Tree

├── Handle Datatypes
│   ├── Drop unnecessary features.
│   ├── replace inf with NaN.
│   ├── Make sure all the column names are of string type and clean them.
│   ├── Remove the column if target column has NaN.
│   ├── Remove Duplicate columns
│   ├── handle numerical, categorical and time features.
│   └── Try to determine ML use case and encode.
├── Handle Missing Values
│   ├────── Numerical Features
│   ├── Replace with mean.
│   ├── Replace with median.
│   ├── Replace with mode.
│   ├── Replace with standard deviation.
│   ├── Replace with zero.
│   ├────── Categorical Features
│   ├── Replace with mean.
│   ├── Replace with "Missing".
│   └── Replace with most frequent value.
├── Removing zero and near zero variance columns
│   ├── Eliminate the features that have zero variance.
│   └── Eliminate the features that have near zero variance.
├── Group Similar Features
│   └── Group more than two features and make new features from them.
├── Normalization and Transformation
│   ├────── Operations to apply only on numerical features
│   ├── ZScore
│   ├── MinMax
│   ├── Quantile
│   ├── MaxAbs
│   ├── Yeo-Johnson
│   ├────── Target transformation (regression)
│   ├── Box-Cox
│   └── Yeo-Johnson
├── Making Time Features
│   ├── Take a time feature and extract more features from it
│   └── (Day, Month, Year, Hour, Minute, Second, Quantile, Quarter, Day of week, Weekday name, Day of year, Week of year)
├── Feature Encoding
│   ├────── Ordinal Encoding
│   ├── LabelEncoding
│   ├── Target Guided ordinal encoding
│   ├────── One hot encoding
│   ├── KDD orange
│   ├── Mean Encoding
│   └── Counter/frequency encoding
├── Removing Outliers
│   ├── Isolation Forest
│   ├── KNN
│   ├── PCA
│   └── Elliptical envelope
├── Feature Selection
│   ├── Chi squared (Not working perfectly)
│   ├── RFE (Not working on all the data)
│   ├── Lasso (works perfectly)
│   ├── Random Forest
│   ├── lgbm (works perfectly)
│   └── Remove zero variance features
├── Imbalanced Dataset (Not done yet)
│   ├── Ensemble techniques automatically handle imbalanced datasets
│   ├── Undersampling (Not a good idea)
│   ├── Oversampling 
│   ├── SMOTE
│   └── Isolation Forest
└── Next Step
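A few of the numerical steps in the tree above (replace inf with NaN, replace missing values with the median, eliminate zero-variance features, ZScore) can be chained with scikit-learn. The sketch below only illustrates the idea and is not the project's actual implementation: `build_numeric_pipeline`, `preprocess_numeric` and the sample array are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_numeric_pipeline() -> Pipeline:
    """Median imputation -> zero-variance removal -> z-score scaling."""
    return Pipeline(
        steps=[
            ("impute", SimpleImputer(strategy="median")),
            ("drop_constant", VarianceThreshold(threshold=0.0)),
            ("zscore", StandardScaler()),
        ]
    )


def preprocess_numeric(X) -> np.ndarray:
    """Replace +/-inf with NaN, then run the numeric pipeline."""
    X = np.array(X, dtype=float)  # np.array copies, so the input is untouched
    X[np.isinf(X)] = np.nan
    return build_numeric_pipeline().fit_transform(X)


# Assumed sample data: column 1 is constant, columns 0 and 2 have gaps.
sample = [
    [1.0, 5.0, 10.0],
    [2.0, 5.0, np.inf],
    [np.nan, 5.0, 30.0],
    [4.0, 5.0, 20.0],
]
result = preprocess_numeric(sample)
print(result.shape)
```

The constant middle column is dropped by `VarianceThreshold`, and the two remaining columns are imputed and then scaled to zero mean and unit variance. Swapping `strategy="median"` for `"mean"` or `"most_frequent"` covers other options listed under Handle Missing Values.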

Directory Tree

├── accounts 
│   └─────────── # handles login, signup and password recovery. 
├── DashB
│   └─────────── # main folder contains wsgi, routing, settings and urls.
├── data
│   └─────────── # main folder for performing pipeline.
├── Viz
│   └─────────── # project app for data visualization tool.
├── static
│   └─────────── # contains static files.
├── media
│   └─────────── # storage folder of uploaded media.
├── templates
│   └─────────── # contains landing page templates
├── manage.py
├── requirements.txt
├── LICENSE
├── README.md
└── db.sqlite3

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Fork the Project: https://github.com/IMsumitkumar/No-code-ML-platform-DashB.ai/tree/main/DashB
Create your Feature Branch: git checkout -b feature/AmazingFeature
Commit your Changes: git commit -m 'Add some AmazingFeature'
Push to the Branch: git push origin feature/AmazingFeature
Open a Pull Request