
Mar 31, 2026 • 7 min read • By Mohammed Vasim

AI, Machine Learning, LLM, PyTorch, TensorFlow, Generative AI, LangChain, AI Agents

Final Project: Classification with Python

Estimated Time Needed: 180 min

Instructions

In this notebook, you will practice all the classification algorithms that we have learned in this course.

Below, we will use the classification algorithms to build models from our training data and then evaluate them on our testing data using the evaluation metrics learned in the course.

We will use some of the algorithms taught in the course, specifically:

Linear Regression
KNN
Decision Trees
Logistic Regression
SVM

We will evaluate our models using:

Accuracy Score
Jaccard Index
F1-Score
LogLoss
Mean Absolute Error
Mean Squared Error
R2-Score

Finally, you will use your models to generate the report at the end.

About The Dataset

The original source of the data is the Australian Government's Bureau of Meteorology, and the latest data can be obtained from http://www.bom.gov.au/climate/dwo/.

The dataset used here has extra columns such as 'RainToday' and our target, 'RainTomorrow'. It was gathered from the Rattle package at https://bitbucket.org/kayontoga/rattle/src/master/data/weatherAUS.RData

This dataset contains observations of weather metrics for each day from 2008 to 2017. The weatherAUS.csv dataset includes the following fields:

Field	Description	Unit	Type
Date	Date of the Observation in YYYY-MM-DD	Date	object
Location	Location of the Observation	Location	object
MinTemp	Minimum temperature	Celsius	float
MaxTemp	Maximum temperature	Celsius	float
Rainfall	Amount of rainfall	Millimeters	float
Evaporation	Amount of evaporation	Millimeters	float
Sunshine	Amount of bright sunshine	hours	float
WindGustDir	Direction of the strongest gust	Compass Points	object
WindGustSpeed	Speed of the strongest gust	Kilometers/Hour	object
WindDir9am	Wind direction averaged over the 10 minutes prior to 9am	Compass Points	object
WindDir3pm	Wind direction averaged over the 10 minutes prior to 3pm	Compass Points	object
WindSpeed9am	Wind speed averaged over the 10 minutes prior to 9am	Kilometers/Hour	float
WindSpeed3pm	Wind speed averaged over the 10 minutes prior to 3pm	Kilometers/Hour	float
Humidity9am	Humidity at 9am	Percent	float
Humidity3pm	Humidity at 3pm	Percent	float
Pressure9am	Atmospheric pressure reduced to mean sea level at 9am	Hectopascal	float
Pressure3pm	Atmospheric pressure reduced to mean sea level at 3pm	Hectopascal	float
Cloud9am	Fraction of the sky obscured by cloud at 9am	Eighths	float
Cloud3pm	Fraction of the sky obscured by cloud at 3pm	Eighths	float
Temp9am	Temperature at 9am	Celsius	float
Temp3pm	Temperature at 3pm	Celsius	float
RainToday	If there was rain today	Yes/No	object
RainTomorrow	If there is rain tomorrow	Yes/No	object

Column definitions were gathered from http://www.bom.gov.au/climate/dwo/IDCJDW0000.shtml

Import the required libraries

python

# All Libraries required for this lab are listed below. The libraries pre-installed on Skills Network Labs are commented.
# !mamba install -qy pandas==1.3.4 numpy==1.21.4 seaborn==0.9.0 matplotlib==3.5.0 scikit-learn==0.20.1
# Note: If your environment doesn't support "!mamba install", use "!pip install"

python

# Suppress warnings:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn

python

import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn import svm
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, jaccard_score, log_loss
import sklearn.metrics as metrics

Importing the Dataset

python

from pyodide.http import pyfetch

async def download(url, filename):
    response = await pyfetch(url)
    if response.status == 200:
        with open(filename, "wb") as f:
            f.write(await response.bytes())

python

path='https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-ML0101EN-SkillUp/labs/ML-FinalAssignment/Weather_Data.csv'

python

await download(path, "Weather_Data.csv")
filename ="Weather_Data.csv"

python

df = pd.read_csv("Weather_Data.csv")

Note: This version of the lab is designed for JupyterLite, which requires downloading the dataset into the interface first. If you are working with a downloaded copy of this notebook on your local machine (for example, Jupyter via Anaconda), you can skip the "Importing the Dataset" steps above and pass the URL directly to the pandas.read_csv() function. To do so, uncomment and run the statements in the cell below.

python

#filepath = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-ML0101EN-SkillUp/labs/ML-FinalAssignment/Weather_Data.csv"
#df = pd.read_csv(filepath)

python

df.head()

Data Preprocessing

One Hot Encoding

First, we need to perform one hot encoding to convert categorical variables to binary variables.

python

df_sydney_processed = pd.get_dummies(data=df, columns=['RainToday', 'WindGustDir', 'WindDir9am', 'WindDir3pm'])
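
To see what `get_dummies` does to a categorical column, here is a minimal sketch on a made-up three-row frame. The frame `toy` and its values are assumptions for illustration only; they are not taken from the weather data.

```python
import pandas as pd

# Toy frame standing in for the weather data (values are made up).
toy = pd.DataFrame({
    "RainToday": ["No", "Yes", "No"],
    "WindGustDir": ["W", "NNE", "W"],
    "MaxTemp": [22.4, 25.6, 19.5],
})

# Each listed column is replaced by one indicator column per category;
# columns that are not listed (MaxTemp) are kept unchanged.
toy_encoded = pd.get_dummies(data=toy, columns=["RainToday", "WindGustDir"])

print(toy_encoded.columns.tolist())
```

The new column names follow the pattern `<original column>_<category>`, which is why the processed weather frame ends up with many more columns than the raw one.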

Next, we replace the values of the 'RainTomorrow' column, changing it from a categorical column to a binary column. We do not use the get_dummies method here because it would produce two columns for 'RainTomorrow', which we do not want since 'RainTomorrow' is our target.

python

df_sydney_processed.replace(['No', 'Yes'], [0,1], inplace=True)
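
The difference between the two encodings can be sketched on a made-up target column. The frame `target` below is hypothetical, and `Series.map` is shown only as a column-specific alternative to the frame-wide `replace` used in the notebook.

```python
import pandas as pd

# Hypothetical target column with the same Yes/No labels as 'RainTomorrow'.
target = pd.DataFrame({"RainTomorrow": ["Yes", "No", "No", "Yes"]})

# One-hot encoding splits the target into two complementary columns.
two_columns = pd.get_dummies(target, columns=["RainTomorrow"])

# Mapping the labels keeps a single 0/1 column, which is the shape
# a classifier expects for its target.
one_column = target["RainTomorrow"].map({"No": 0, "Yes": 1})

print(two_columns.columns.tolist())
print(one_column.tolist())
```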

Training Data and Test Data

Now we set our features (the x values) and our target variable Y.

python

df_sydney_processed.drop('Date',axis=1,inplace=True)

python

df_sydney_processed = df_sydney_processed.astype(float)

python

features = df_sydney_processed.drop(columns='RainTomorrow')
Y = df_sydney_processed['RainTomorrow']

Linear Regression

Q1) Use the `train_test_split` function to split the `features` and `Y` dataframes with a `test_size` of `0.2` and the `random_state` set to `10`.

python

#Enter Your Code and Execute

python

x_train, x_test, y_train, y_test = train_test_split(features, Y, test_size=0.2, random_state=10)
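
As a quick illustration of what `test_size` and `random_state` control, the sketch below splits a made-up 10-row array. The names `X_toy` and `y_toy` are assumptions for this example and do not appear in the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 rows, 2 features, alternating 0/1 labels.
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0, 1] * 5)

# test_size=0.2 holds out 20% of the rows (2 of 10) for testing.
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=10)

# Repeating the call with the same random_state reproduces the same split.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X_toy, y_toy, test_size=0.2, random_state=10)

print(X_tr.shape, X_te.shape)
print(np.array_equal(X_te, X_te2))
```

Fixing `random_state` is what makes results comparable between runs; changing it gives a different but equally valid split.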

Q2) Create and train a Linear Regression model called LinearReg using the training data (`x_train`, `y_train`).

python

#Enter Your Code and Execute

python

LinearReg = LinearRegression().fit(x_train, y_train)

Q3) Now use the `predict` method on the testing data (`x_test`) and save it to the array `predictions`.

python

#Enter Your Code and Execute

python

predictions = LinearReg.predict(x_test)

Q4) Using the `predictions` and the `y_test` dataframe calculate the value for each metric using the appropriate function.

python

#Enter Your Code and Execute

python

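LinearRegression_MAE = metrics.mean_absolute_error(y_test, predictions)
LinearRegression_MSE = metrics.mean_squared_error(y_test, predictions)
LinearRegression_R2 = metrics.r2_score(y_test, predictions)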
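
To make the three regression metrics concrete, here is a small sketch that computes them by hand with NumPy and compares the results with scikit-learn. The arrays `y_true` and `y_pred` are made-up values, not output from the weather model.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up 0/1 targets and continuous predictions, as a linear model would produce.
y_true = np.array([0.0, 1.0, 1.0, 0.0])
y_pred = np.array([0.2, 0.7, 0.6, 0.1])

errors = y_true - y_pred

# Mean Absolute Error: average size of the errors.
mae_manual = np.mean(np.abs(errors))

# Mean Squared Error: average of the squared errors.
mse_manual = np.mean(errors ** 2)

# R2: 1 minus (residual sum of squares / total sum of squares).
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mae_manual, mean_absolute_error(y_true, y_pred))
print(mse_manual, mean_squared_error(y_true, y_pred))
print(r2_manual, r2_score(y_true, y_pred))
```

Note that a linear regression outputs continuous values rather than class labels, which is why error-based metrics are used for it here instead of accuracy.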

Q5) Show the MAE, MSE, and R2 in a tabular format using a data frame for the linear model.

python

#Enter Your Code and Execute

python

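Report = pd.DataFrame({"Metric": ["MAE", "MSE", "R2"],
                       "Linear Regression": [LinearRegression_MAE, LinearRegression_MSE, LinearRegression_R2]})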

KNN

Q6) Create and train a KNN model called KNN using the training data (`x_train`, `y_train`) with the `n_neighbors` parameter set to `4`.

python

#Enter Your Code and Execute

python

KNN = KNeighborsClassifier(n_neighbors=4).fit(x_train, y_train)

Q7) Now use the `predict` method on the testing data (`x_test`) and save it to the array `predictions`.

python

#Enter Your Code and Execute

python

predictions = KNN.predict(x_test)

Q8) Using the `predictions` and the `y_test` dataframe calculate the value for each metric using the appropriate function.

python

#Enter Your Code and Execute

python

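KNN_Accuracy_Score = accuracy_score(y_test, predictions)
KNN_JaccardIndex = jaccard_score(y_test, predictions)
KNN_F1_Score = f1_score(y_test, predictions)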
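
The three classification metrics can all be derived from the confusion matrix. The sketch below does this by hand on made-up labels and checks the results against scikit-learn; `y_true` and `y_pred` are illustrative values, not predictions from the KNN model.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, jaccard_score

# Made-up true labels and predictions for a binary problem.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Accuracy: share of all predictions that are correct.
accuracy_manual = (tp + tn) / (tp + tn + fp + fn)

# Jaccard index for the positive class: overlap of predicted and actual positives.
jaccard_manual = tp / (tp + fp + fn)

# F1-score: harmonic mean of precision and recall.
f1_manual = 2 * tp / (2 * tp + fp + fn)

print(accuracy_manual, accuracy_score(y_true, y_pred))
print(jaccard_manual, jaccard_score(y_true, y_pred))
print(f1_manual, f1_score(y_true, y_pred))
```

Unlike accuracy, the Jaccard index and F1-score ignore true negatives, so they are more informative when the positive class (rain) is the minority.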

Decision Tree

Q9) Create and train a Decision Tree model called Tree using the training data (`x_train`, `y_train`).

python

#Enter Your Code and Execute

python

Tree = DecisionTreeClassifier().fit(x_train, y_train)

Q10) Now use the `predict` method on the testing data (`x_test`) and save it to the array `predictions`.

python

#Enter Your Code and Execute

python

predictions = Tree.predict(x_test)

Q11) Using the `predictions` and the `y_test` dataframe calculate the value for each metric using the appropriate function.

python

#Enter Your Code and Execute

python

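Tree_Accuracy_Score = accuracy_score(y_test, predictions)
Tree_JaccardIndex = jaccard_score(y_test, predictions)
Tree_F1_Score = f1_score(y_test, predictions)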

Logistic Regression

Q12) Use the `train_test_split` function to split the `features` and `Y` dataframes with a `test_size` of `0.2` and the `random_state` set to `1`.

python

#Enter Your Code and Execute

python

x_train, x_test, y_train, y_test = train_test_split(features, Y, test_size=0.2, random_state=1)

Q13) Create and train a LogisticRegression model called LR using the training data (`x_train`, `y_train`) with the `solver` parameter set to `liblinear`.

python

#Enter Your Code and Execute

python

LR = LogisticRegression(solver='liblinear').fit(x_train, y_train)

Q14) Now, use the `predict` and `predict_proba` methods on the testing data (`x_test`) and save the results as two arrays, `predictions` and `predict_proba`.

python

#Enter Your Code and Execute

python

predictions = LR.predict(x_test)

python

predict_proba = LR.predict_proba(x_test)

Q15) Using the `predictions`, `predict_proba` and the `y_test` dataframe calculate the value for each metric using the appropriate function.

python

#Enter Your Code and Execute

python

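LR_Accuracy_Score = accuracy_score(y_test, predictions)
LR_JaccardIndex = jaccard_score(y_test, predictions)
LR_F1_Score = f1_score(y_test, predictions)
LR_Log_Loss = log_loss(y_test, predict_proba)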
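
Log loss is the one metric here that uses predicted probabilities rather than predicted labels. The sketch below computes it by hand on made-up probabilities and compares the result with `log_loss`; the array `proba` is an assumed stand-in for the two-column output of `predict_proba`.

```python
import numpy as np
from sklearn.metrics import log_loss

# Made-up true labels.
y_true = np.array([1, 0, 1])

# Stand-in for predict_proba output: column 0 is P(class 0), column 1 is P(class 1).
proba = np.array([
    [0.1, 0.9],
    [0.8, 0.2],
    [0.4, 0.6],
])

# Probability the model assigned to the class that actually occurred.
p_correct = proba[np.arange(len(y_true)), y_true]

# Log loss: negative mean log-probability of the correct class.
log_loss_manual = -np.mean(np.log(p_correct))

print(log_loss_manual, log_loss(y_true, proba))
```

Confident wrong predictions are penalized heavily, so a model can have good accuracy and still a poor log loss if its probabilities are badly calibrated.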

SVM

Q16) Create and train an SVM model called SVM using the training data (`x_train`, `y_train`).

python

#Enter Your Code and Execute

python

SVM = svm.SVC().fit(x_train, y_train)

Q17) Now use the `predict` method on the testing data (`x_test`) and save it to the array `predictions`.

python

#Enter Your Code and Execute

python

predictions = SVM.predict(x_test)

Q18) Using the `predictions` and the `y_test` dataframe calculate the value for each metric using the appropriate function.

python

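SVM_Accuracy_Score = accuracy_score(y_test, predictions)
SVM_JaccardIndex = jaccard_score(y_test, predictions)
SVM_F1_Score = f1_score(y_test, predictions)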

Report

Q19) Show the Accuracy, Jaccard Index, F1-Score, and LogLoss in a tabular format using a data frame for all of the above models.

*LogLoss applies only to the Logistic Regression model.

python

Report =
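
One possible way to assemble such a table is sketched below. The helper `summarize`, the model names, and all labels and probabilities are hypothetical and exist only to show the layout; in the notebook, the rows would come from the metric variables computed for each model.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, jaccard_score, log_loss

def summarize(name, y_true, y_pred, y_proba=None):
    """Return one report row; LogLoss is NaN when no probabilities are given."""
    return {
        "Model": name,
        "Accuracy": accuracy_score(y_true, y_pred),
        "Jaccard Index": jaccard_score(y_true, y_pred),
        "F1-Score": f1_score(y_true, y_pred),
        "LogLoss": log_loss(y_true, y_proba) if y_proba is not None else np.nan,
    }

# Made-up labels and predictions for two hypothetical models.
y_true = [1, 0, 1, 0]
rows = [
    summarize("Model A (toy)", y_true, [1, 0, 0, 0]),
    summarize(
        "Model B (toy)",
        y_true,
        [1, 0, 1, 0],
        y_proba=[[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]],
    ),
]

# One row per model, one column per metric.
report = pd.DataFrame(rows).set_index("Model")
print(report)
```

Leaving LogLoss as NaN for models without probability output keeps the table rectangular while making clear that the metric was not computed for them.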

How to submit

Once you complete your notebook, you will have to share it. You can download the notebook by navigating to "File" and clicking the "Download" button.

This will save the (.ipynb) file on your computer. Once saved, you can upload this file in the "My Submission" tab of the "Peer-graded Assignment" section.

About the Authors:

Joseph Santarcangelo has a PhD in Electrical Engineering. His research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.

Other Contributors

Svitlana Kramar

© IBM Corporation 2020. All rights reserved.
