ML Workflow
- This module explains the basic Machine Learning workflow, focusing on data preparation and model training to build effective predictive models.
Data Preparation
Data preparation is often the most important step in Machine Learning, because a model can only be as good as the data it learns from.
A commonly quoted estimate is that almost 70% of project time is spent on data cleaning.
Step 1: Data Collection
Data sources can be:
Excel file
Database
API
CSV file
Example (Student Dataset):
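A minimal sketch of loading such a dataset from CSV text with Python's standard csv module. The column names and values are hypothetical, invented only for illustration; a real project would read from its own file, database, or API.

```python
import csv
import io

# Hypothetical CSV content standing in for a real file or export.
# One attendance value is blank and one row is duplicated on purpose.
CSV_TEXT = """student_id,study_hours,attendance,result
S01,2.0,60,Fail
S02,5.5,85,Pass
S03,4.0,,Pass
S03,4.0,,Pass
S04,1.0,45,Fail
"""

def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load_rows(CSV_TEXT)
print(len(rows), "rows; columns:", list(rows[0].keys()))
```

Every value arrives as a string, and a blank cell arrives as an empty string, so numeric columns still need converting before use.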
Step 2: Data Cleaning
Handle missing values
Remove duplicates
Fix incorrect data
Example:
Blank attendance → fill with average
Duplicate rows → remove
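The two cleaning rules above can be sketched in plain Python as follows. The rows, column names, and the use of None for a blank value are assumptions made for this illustration.

```python
from statistics import mean

# Hypothetical rows: None marks a blank attendance value.
rows = [
    {"student_id": "S01", "attendance": 60.0},
    {"student_id": "S02", "attendance": 85.0},
    {"student_id": "S03", "attendance": None},
    {"student_id": "S03", "attendance": None},  # exact duplicate of the row above
    {"student_id": "S04", "attendance": 45.0},
]

def remove_duplicates(rows):
    """Keep the first occurrence of each identical row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items(), key=lambda item: item[0]))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def fill_missing_with_mean(rows, column):
    """Replace None in `column` with the mean of the known values."""
    known = [row[column] for row in rows if row[column] is not None]
    average = mean(known)
    return [
        {**row, column: average if row[column] is None else row[column]}
        for row in rows
    ]

# Remove duplicates first so repeated rows do not distort the average.
cleaned = fill_missing_with_mean(remove_duplicates(rows), "attendance")
print(cleaned)
```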
Step 3: Feature Selection
Not all columns are useful for the model.
Example:
Student ID → Not useful
Study Hours → Useful
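As a small sketch, feature selection can be as simple as keeping a chosen list of columns. The row layout and the helper name select_features are hypothetical.

```python
# Hypothetical rows; the column names are assumptions for illustration.
rows = [
    {"student_id": "S01", "study_hours": 2.0},
    {"student_id": "S02", "study_hours": 5.5},
]

def select_features(rows, keep):
    """Return rows containing only the columns listed in `keep`."""
    return [{column: row[column] for column in keep} for row in rows]

# Student ID only identifies a row, so it is dropped; Study Hours is kept.
features = select_features(rows, keep=["study_hours"])
print(features)
```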
Step 4: Feature Engineering
Create new, meaningful features from the existing data.
Example:
Convert Attendance % into categories
Calculate Total Score
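Both examples above can be sketched as small functions. The category cut-offs (75 and 50) and the subject names are assumptions chosen for illustration, not values from these notes.

```python
def attendance_category(percent):
    """Bucket an attendance percentage (cut-offs 75 and 50 are assumed)."""
    if percent >= 75:
        return "High"
    if percent >= 50:
        return "Medium"
    return "Low"

def total_score(scores):
    """Add up the individual subject scores."""
    return sum(scores)

# Hypothetical student record.
row = {"attendance": 82.0, "math": 70, "science": 65, "english": 80}
row["attendance_level"] = attendance_category(row["attendance"])
row["total_score"] = total_score([row["math"], row["science"], row["english"]])
print(row["attendance_level"], row["total_score"])  # High 215
```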
Step 5: Data Encoding
Most Machine Learning models work on numbers and cannot use raw text directly.
Convert text into numbers.
Example:
Convert the Pass/Fail result to:
Pass = 1
Fail = 0
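A minimal sketch of this encoding with a lookup table. The helper name encode_labels is hypothetical.

```python
LABEL_TO_NUMBER = {"Pass": 1, "Fail": 0}

def encode_labels(labels, mapping=LABEL_TO_NUMBER):
    """Map text labels to integers; an unknown label raises KeyError."""
    return [mapping[label] for label in labels]

encoded = encode_labels(["Pass", "Fail", "Pass"])
print(encoded)  # [1, 0, 1]
```

Raising an error on an unknown label is a deliberate choice here: a typo such as "pass" is caught instead of being silently encoded.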
Step 6: Data Splitting
Divide the dataset into two parts:
Training Data (70–80%)
Testing Data (20–30%)
Example:
If the dataset has 1000 rows and the split is 80/20:
800 → Training
200 → Testing
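The 800/200 split above can be sketched as a shuffle followed by a slice. The helper split_dataset is a hand-written illustration, not a library function, and the fixed seed is an assumption that makes the split repeatable.

```python
import random

def split_dataset(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then split off the testing portion."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

dataset = list(range(1000))  # stand-in for 1000 data rows
train, test = split_dataset(dataset)
print(len(train), len(test))  # 800 200
```

Shuffling before splitting matters when the rows are stored in some order (for example, sorted by score), since an unshuffled split would give the two parts different distributions.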
Model Training
Now we train the model using the prepared training data.
Step 1: Select Algorithm
Choose algorithm based on problem type:
Regression → Linear Regression
Classification → Logistic Regression
Clustering → K-Means
Step 2: Train Model
The model learns patterns from training data.
Example:
The machine learns:
“More study hours → Higher chance of passing”
Step 3: Model Testing
Use the testing data, which the model did not see during training, to check whether it predicts correctly.
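Training and testing can be sketched together with a tiny Logistic Regression written from scratch. This is an illustrative sketch, not a production implementation: the data, learning rate, and epoch count are all assumptions, and a real project would normally use a library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(hours, passed, learning_rate=0.1, epochs=5000):
    """Fit p(pass) = sigmoid(w * hours + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(hours)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(hours, passed)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, hours)) / n
        b -= learning_rate * sum(errors) / n
    return w, b

def predict(w, b, x):
    """Predict 1 (Pass) when the estimated probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

# Hypothetical data: study hours and encoded result (Pass = 1, Fail = 0).
train_hours = [1, 2, 3, 4, 5, 6, 7, 8]
train_passed = [0, 0, 0, 0, 1, 1, 1, 1]
test_hours = [1.5, 3.0, 6.0, 7.5]
test_passed = [0, 0, 1, 1]

# Step 2: train on the training data only.
w, b = train_logistic(train_hours, train_passed)

# Step 3: check predictions on the unseen testing data.
predictions = [predict(w, b, x) for x in test_hours]
correct = sum(p == y for p, y in zip(predictions, test_passed))
print(f"w={w:.2f}, b={b:.2f}, correct={correct}/{len(test_hours)}")
```

A positive learned weight w is the numeric form of "more study hours, higher chance of passing".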
Step 4: Evaluate Model
Regression Metrics:
MAE (Mean Absolute Error)
MSE (Mean Squared Error)
R² Score
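The three regression metrics can be computed directly from their definitions, as in this sketch. The actual and predicted scores are hypothetical.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean Squared Error: average of (actual - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical actual and predicted scores.
y_true = [50, 60, 70, 80]
y_pred = [52, 58, 73, 79]

print(mae(y_true, y_pred))                 # 2.0
print(mse(y_true, y_pred))                 # 4.5
print(round(r2_score(y_true, y_pred), 3))  # 0.964
```

Lower MAE and MSE mean smaller errors, while an R² closer to 1 means the model explains more of the variation in the actual values.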
Classification Metrics:
Accuracy
Precision
Recall
Confusion Matrix
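All four classification metrics follow from the counts in the confusion matrix, as this sketch shows. The labels are hypothetical and use the encoding Pass = 1, Fail = 0.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def safe_divide(numerator, denominator):
    """Return 0.0 instead of failing when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

# Hypothetical actual and predicted labels.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = safe_divide(tp + tn, tp + fp + fn + tn)  # share of all predictions that are correct
precision = safe_divide(tp, tp + fp)                # of predicted passes, how many really passed
recall = safe_divide(tp, tp + fn)                   # of real passes, how many were found

print("Confusion matrix (rows = actual, columns = predicted):")
print([[tn, fp], [fn, tp]])  # [[3, 1], [1, 3]]
print(accuracy, precision, recall)  # 0.75 0.75 0.75
```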