ML Workflow

This module explains the basic Machine Learning workflow, focusing on data preparation and model training to build effective predictive models.

Data Preparation
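Data preparation is one of the most important steps in Machine Learning, because a model can only be as good as the data it learns from.
A commonly quoted rule of thumb is that almost 70% of project time is spent on data cleaning.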
Step 1: Data Collection
Data sources can be:
- Excel file
- Database
- API
- CSV file
Example (Student Dataset):
Study Hours	Attendance	Result
2	60%	Fail
5	85%	Pass
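A minimal sketch of the collection step, assuming the data arrives as CSV text. The column names (`study_hours`, `attendance`, `result`) and the `load_students` helper are illustrative choices, not part of the original dataset; only Python's standard `csv` module is used.

```python
import csv
import io

# Hypothetical CSV content standing in for a real file such as a student export.
CSV_TEXT = """study_hours,attendance,result
2,60%,Fail
5,85%,Pass
"""

def load_students(text):
    """Parse CSV text into a list of dicts, one dict per student row."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)

students = load_students(CSV_TEXT)
for row in students:
    print(row)
```

Every value is read as a string at this point (for example `"60%"`), which is one reason the cleaning and encoding steps are needed afterwards.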
Step 2: Data Cleaning
- Handle missing values
- Remove duplicates
- Fix incorrect data
Example:
- Blank attendance → fill with average
- Duplicate rows → remove
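The two cleaning rules above can be sketched in plain Python. The sample rows and the helper names `drop_duplicates` and `fill_missing_attendance` are made up for illustration; duplicates are removed first so that a repeated row does not skew the average.

```python
def drop_duplicates(rows):
    """Return rows with exact duplicates removed, keeping first occurrences."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def fill_missing_attendance(rows):
    """Replace blank (None) attendance with the mean of the known values."""
    known = [r["attendance"] for r in rows if r["attendance"] is not None]
    mean = sum(known) / len(known)
    return [
        {**r, "attendance": mean if r["attendance"] is None else r["attendance"]}
        for r in rows
    ]

# Illustrative rows only: one duplicate and one blank attendance value.
rows = [
    {"study_hours": 2, "attendance": 60.0, "result": "Fail"},
    {"study_hours": 5, "attendance": 85.0, "result": "Pass"},
    {"study_hours": 5, "attendance": 85.0, "result": "Pass"},
    {"study_hours": 4, "attendance": None, "result": "Pass"},
]

cleaned = fill_missing_attendance(drop_duplicates(rows))
print(cleaned)
```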
Step 3: Feature Selection
Not all columns are useful for the model.
Example:
- Student ID → Not useful
- Study Hours → Useful
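As a small illustration, feature selection can be as simple as keeping only the columns judged useful. The `select_features` helper and the student ID values are hypothetical.

```python
def select_features(rows, useful_columns):
    """Keep only the columns judged useful for the model."""
    return [{col: row[col] for col in useful_columns} for row in rows]

# Illustrative rows; student_id identifies a row but carries no predictive signal.
rows = [
    {"student_id": 101, "study_hours": 2, "result": "Fail"},
    {"student_id": 102, "study_hours": 5, "result": "Pass"},
]

selected = select_features(rows, ["study_hours", "result"])
print(selected)
```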
Step 4: Feature Engineering
Feature engineering means creating new, meaningful features from existing data.
Example:
- Convert Attendance % into categories
- Calculate Total Score
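Both examples can be sketched as follows. The category thresholds (75 and 90), the `scores` field, and the function names are assumptions chosen for illustration; suitable cut-offs depend on the actual data.

```python
def attendance_category(percent):
    """Map an attendance percentage to a coarse category (assumed thresholds)."""
    if percent < 75:
        return "Low"
    if percent < 90:
        return "Medium"
    return "High"

def add_features(row):
    """Return a copy of the row with two derived features added."""
    enriched = dict(row)
    enriched["attendance_level"] = attendance_category(row["attendance"])
    enriched["total_score"] = sum(row["scores"])
    return enriched

# Hypothetical student with two subject scores.
student = {"study_hours": 5, "attendance": 85, "scores": [40, 35]}
enriched = add_features(student)
print(enriched)
```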
Step 5: Data Encoding
Most Machine Learning algorithms work on numbers and cannot use text values directly.
Text categories must therefore be converted into numbers.
Example:
Result column values: Pass, Fail
Convert to:
- Pass = 1
- Fail = 0
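A minimal sketch of this mapping, using a plain dictionary. The names `LABELS` and `encode_results` are illustrative.

```python
# Mapping taken from the example above: Pass = 1, Fail = 0.
LABELS = {"Pass": 1, "Fail": 0}

def encode_results(results):
    """Convert text labels to integers; an unknown label raises KeyError."""
    return [LABELS[label] for label in results]

encoded = encode_results(["Fail", "Pass", "Pass"])
print(encoded)
```

Raising an error on an unexpected label is a deliberate choice here: it surfaces typos such as "pass" during preparation rather than letting them pass silently into training.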
Step 6: Data Splitting
Divide the dataset into two parts:
- Training Data (70–80%)
- Testing Data (20–30%)
Example:
If the dataset has 1000 rows and an 80/20 split is used:
- 800 → Training
- 200 → Testing
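The 80/20 split can be sketched with the standard library alone. The `split_rows` function, its parameters, and the fixed seed are assumptions for illustration; libraries such as scikit-learn provide a ready-made `train_test_split` for the same purpose.

```python
import random

def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Stand-in dataset: 1000 row identifiers instead of real student records.
dataset = list(range(1000))
train, test = split_rows(dataset)
print(len(train), len(test))
```

Shuffling before splitting matters when the rows are stored in a meaningful order (for example, sorted by result), since otherwise the test set might contain only one class.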
Model Training
Next, the model is trained on the prepared data.
Step 1: Select Algorithm
Choose an algorithm based on the problem type:
- Regression → Linear Regression
- Classification → Logistic Regression
- Clustering → K-Means
Step 2: Train Model
The model learns patterns from the training data.
Example:
The machine learns:
“More study hours → Higher chance of passing”
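To make "learning a pattern" concrete, here is a minimal sketch of logistic regression trained by gradient descent on a single feature, study hours. The training values, learning rate, and epoch count are assumptions picked for illustration, not tuned settings.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(hours, labels, learning_rate=0.1, epochs=5000):
    """Fit weight w and bias b so that sigmoid(w * hours + b) approximates P(pass)."""
    w, b = 0.0, 0.0
    n = len(hours)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(hours, labels):
            error = sigmoid(w * x + b) - y  # gradient of log loss w.r.t. the score
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b

def predict_pass_probability(w, b, study_hours):
    """Estimated probability of passing for a given number of study hours."""
    return sigmoid(w * study_hours + b)

# Made-up training data: 1 = Pass, 0 = Fail.
hours = [1, 2, 3, 4, 5, 6]
labels = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(hours, labels)
print(predict_pass_probability(w, b, 1), predict_pass_probability(w, b, 6))
```

A positive learned weight `w` is the numeric form of "more study hours, higher chance of passing".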
Step 3: Model Testing
Use the testing data, which the model has not seen during training, to check whether its predictions are correct.
Step 4: Evaluate Model
Regression Metrics:
- MAE (Mean Absolute Error)
- MSE (Mean Squared Error)
- R² Score
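The three regression metrics can be computed directly from their definitions. This is a sketch with made-up true and predicted values; the function names are illustrative.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between truth and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences; penalizes large errors more."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot  # assumes y_true is not constant

# Illustrative values only.
y_true = [3, 5, 7, 9]
y_pred = [2, 5, 8, 9]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(mae, mse, r2)
```

Lower MAE and MSE mean smaller errors, while an R² closer to 1 means the predictions explain more of the variation in the true values.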
Classification Metrics:
- Accuracy
- Precision
- Recall
- Confusion Matrix
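For a binary Pass/Fail problem, all four classification metrics follow from the confusion matrix counts. The sketch below uses made-up labels (1 = Pass, 0 = Fail) and illustrative function names.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),  # assumes at least one positive prediction
        "recall": tp / (tp + fn),     # assumes at least one actual positive
    }

# Illustrative test labels and model predictions.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(y_true, y_pred)
print(counts, metrics)
```

Accuracy alone can mislead when one class is rare, which is why precision and recall are usually reported alongside it.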
