League of Legends Match Predictor

Apr 1, 2026 · 10 min read · By Mohammed Vasim
Tags: AI, Machine Learning, LLM, PyTorch, TensorFlow, Generative AI, LangChain, AI Agents

Final Project: League of Legends Match Predictor

Introduction 

League of Legends, a popular multiplayer online battle arena (MOBA) game, generates extensive data from matches, providing an excellent opportunity to apply machine learning techniques to real-world scenarios. Perform the following steps to build a logistic regression model aimed at predicting the outcomes of League of Legends matches. 

Use the league_of_legends_data_large.csv file to perform the tasks. 

Step 1: Data Loading and Preprocessing 

Task 1: Load the League of Legends dataset and preprocess it for training. 

Loading and preprocessing the dataset involves reading the data, splitting it into training and testing sets, and standardizing the features. You will utilize pandas for data manipulation, train_test_split from sklearn for data splitting, and StandardScaler for feature scaling. 

Note: Please ensure all the required libraries are installed and imported.

1. Load the dataset: Use pd.read_csv() to load the dataset into a pandas DataFrame.
2. Split data into features and target: Separate win (target) from the remaining columns (features).
X = data.drop('win', axis=1)
y = data['win']
3. Split the data into training and testing sets: Use train_test_split() from sklearn.model_selection to divide the data. Set test_size=0.2 to allocate 20% for testing and 80% for training, and use random_state=42 to make the split reproducible.
4. Standardize the features: Use StandardScaler() from sklearn.preprocessing to scale the features. Fit the scaler on the training set only, then apply the same transformation to the test set.
5. Convert to PyTorch tensors: Use torch.tensor() to convert the data to PyTorch tensors (dtype torch.float32).
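
The five steps above can be sketched end to end as follows. This is a minimal illustration, not the assignment's reference solution: the DataFrame is synthetic and stands in for the result of pd.read_csv(), and the feature column names (kills, deaths, assists, gold_earned) are assumptions made up for the example.

```python
import numpy as np
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for pd.read_csv("league_of_legends_data_large.csv"):
# four made-up feature columns plus a binary 'win' column.
rng = np.random.default_rng(42)
data = pd.DataFrame(
    rng.normal(size=(200, 4)),
    columns=["kills", "deaths", "assists", "gold_earned"],
)
noise = rng.normal(scale=0.5, size=200)
data["win"] = (data["kills"] - data["deaths"] + noise > 0).astype(int)

# Step 2: separate features and target
X = data.drop("win", axis=1)
y = data["win"]

# Step 3: 80/20 split with a fixed seed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 4: fit the scaler on the training split only, reuse it for the test split
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Step 5: float32 tensors; targets get shape (N, 1) to match the model output
X_train_t = torch.tensor(X_train_scaled, dtype=torch.float32)
X_test_t = torch.tensor(X_test_scaled, dtype=torch.float32)
y_train_t = torch.tensor(y_train.values, dtype=torch.float32).unsqueeze(1)
y_test_t = torch.tensor(y_test.values, dtype=torch.float32).unsqueeze(1)

print(X_train_t.shape, X_test_t.shape, y_train_t.shape, y_test_t.shape)
```

Reshaping the targets to (N, 1) matters because nn.BCELoss expects the predictions and targets to have the same shape, and a linear layer with one output unit returns (N, 1).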

Exercise 1: 

Write code to load the dataset, split it into training and testing sets, standardize the features, and convert the data into PyTorch tensors for use in training a PyTorch model. 

Setup

Installing required libraries:

The following libraries are not pre-installed in the Skills Network Labs environment. If they are missing from your environment, uncomment and run the following cell to install them:

python
# !pip install pandas
# !pip install scikit-learn
# !pip install torch
# !pip install matplotlib
python
## Task: 1

## Write your code here
import torch 
from torch import nn, optim 
from torch.utils.data import DataLoader, Dataset
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, confusion_matrix, classification_report, roc_curve, auc

Step 2: Logistic Regression Model 

Task 2: Implement a logistic regression model using PyTorch. 

Defining the logistic regression model involves specifying the input dimensions, the forward pass using the sigmoid activation function, and initializing the model, loss function, and optimizer. 

1. Define the Logistic Regression Model:
Create a class LogisticRegressionModel that inherits from torch.nn.Module.

  • In the __init__() method, define a linear layer (nn.Linear) to implement the logistic regression model.
  • The forward() method should apply the sigmoid activation function to the output of the linear layer.

2. Initialize the Model, Loss Function, and Optimizer:

  • Set input_dim: Use X_train.shape[1] to get the number of features from the training data (X_train).
  • Initialize the model: Create an instance of the LogisticRegressionModel class, passing input_dim as a parameter (e.g., model = LogisticRegressionModel(input_dim)).
  • Loss Function: Use BCELoss() from torch.nn (Binary Cross-Entropy Loss).
  • Optimizer: Initialize the optimizer using optim.SGD() with a learning rate of 0.01.

Exercise 2: 

Define the logistic regression model using PyTorch, specifying the input dimensions and the forward pass. Initialize the model, loss function, and optimizer. 

python
## Task: 2

## Write your code here
class LogisticRegressionModel(nn.Module):
    """   
    Logistic regression model 
    """
    def __init__(self, input_dim):
        super().__init__()
        self.linear = nn.Linear(input_dim, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        return self.sigmoid(self.linear(x))
python
dataset = pd.read_csv('/home/vasim/Khatir/IBM-AI-Engineering/14. Assignments/data/league_of_legends_data_large.csv')
dataset.head()
python
# Select the target by name, as described in Step 1, instead of by position
X = dataset.drop('win', axis=1)
y = dataset['win']
X.shape, y.shape
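python
# Split before scaling so the scaler never sees the test rows (test_size=0.2 per Step 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
python
# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
python
# Convert to float32 tensors; targets are reshaped to (N, 1) to match the model output
X_train, X_test = (torch.tensor(a, dtype=torch.float32) for a in (X_train, X_test))
y_train, y_test = (torch.tensor(s.values, dtype=torch.float32).unsqueeze(1) for s in (y_train, y_test))
X_train.shape, y_train.shape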
python
model = LogisticRegressionModel(X_train.shape[1])
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
python
model.state_dict()

Step 3: Model Training 

Task 3: Train the logistic regression model on the dataset. 

The training loop will run for a specified number of epochs. In each epoch, the model makes predictions, calculates the loss, performs backpropagation, and updates the model parameters.

  1. Set Number of Epochs:

    • Set the number of training epochs to 1000.
  2. Training Loop:
    For each epoch:

    • Set the model to training mode using model.train().
    • Zero the gradients using optimizer.zero_grad().
    • Pass the training data (X_train) through the model to get the predictions (outputs).
    • Calculate the loss using the defined loss function (criterion).
    • Perform backpropagation with loss.backward().
    • Update the model's weights using optimizer.step().
  3. Print Loss Every 100 Epochs:

    • After every 100 epochs, print the current epoch number and the loss value.
  4. Model Evaluation:

    • Set the model to evaluation mode using model.eval().
    • Use torch.no_grad() to ensure no gradients are calculated during evaluation.
    • Get predictions on both the training set (X_train) and the test set (X_test).
  5. Calculate Accuracy:

    • For both the training and test datasets, compute the accuracy by comparing the predicted values with the true values (y_train, y_test).
    • Use a threshold of 0.5 for classification: predicted probabilities of 0.5 or higher count as class 1, the rest as class 0.
  6. Print Accuracy:

    • Print the training and test accuracies after the evaluation is complete.

Exercise 3: 

Write the code to train the logistic regression model on the dataset. Implement the training loop, making predictions, calculating the loss, performing backpropagation, and updating model parameters. Evaluate the model's accuracy on training and testing sets. 

python
# Write your code here
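
The training and evaluation steps listed above can be sketched as follows. This is one possible implementation under stated assumptions, not the official solution: the tensors are synthetic stand-ins for the standardized match data, and true_w is a made-up weight vector used only to generate labels.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Synthetic, already-standardized stand-in for the match data (assumption).
X_train = torch.randn(400, 5)
X_test = torch.randn(100, 5)
true_w = torch.tensor([[1.5], [-2.0], [0.5], [0.0], [1.0]])  # hypothetical
y_train = (X_train @ true_w > 0).float()
y_test = (X_test @ true_w > 0).float()


class LogisticRegressionModel(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.linear = nn.Linear(input_dim, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        return self.sigmoid(self.linear(x))


model = LogisticRegressionModel(X_train.shape[1])
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

epochs = 1000
losses = []
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()                 # clear gradients from the last step
    outputs = model(X_train)              # full-batch forward pass
    loss = criterion(outputs, y_train)
    loss.backward()                       # backpropagation
    optimizer.step()                      # parameter update
    losses.append(loss.item())
    if (epoch + 1) % 100 == 0:
        print(f"Epoch [{epoch + 1}/{epochs}], Loss: {loss.item():.4f}")

# Evaluation: no gradients, 0.5 threshold on the predicted probabilities
model.eval()
with torch.no_grad():
    train_pred = (model(X_train) >= 0.5).float()
    test_pred = (model(X_test) >= 0.5).float()
    train_acc = (train_pred == y_train).float().mean().item()
    test_acc = (test_pred == y_test).float().mean().item()

print(f"Train accuracy: {train_acc:.4f}, Test accuracy: {test_acc:.4f}")
```

The loop passes the whole training set through the model on every epoch, which matches the steps as written. For larger datasets, a DataLoader with mini-batches would be the usual alternative.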

Step 4: Model Optimization and Evaluation 

Task 4: Implement optimization techniques and evaluate the model's performance. 

Optimization techniques such as L2 regularization (Ridge Regression) help in preventing overfitting. The model is retrained with these optimizations, and its performance is evaluated on both training and testing sets. 

Weight decay: In PyTorch optimizers, weight_decay is the parameter that applies L2 regularization to the model's parameters. It penalizes large weight values, which discourages overfitting and pushes the model toward simpler solutions. To use it, set weight_decay when you create the optimizer. For example, optim.SGD(model.parameters(), lr=0.01, weight_decay=0.01) applies L2 regularization with a strength of 0.01.
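
To make the effect of weight_decay concrete, the sketch below takes a single optim.SGD step and compares it with a hand-written update. With plain SGD (no momentum), the optimizer adds weight_decay * w to the gradient before stepping, which corresponds to adding the penalty (weight_decay / 2) * ||w||² to the loss. The parameter values and the toy loss are arbitrary choices for illustration.

```python
import torch
from torch import optim

lr, wd = 0.01, 0.01

# A single parameter vector and a toy loss: sum of squares of (w - 1).
w = torch.tensor([2.0, -3.0], requires_grad=True)
optimizer = optim.SGD([w], lr=lr, weight_decay=wd)

loss = ((w - 1.0) ** 2).sum()
loss.backward()

grad = w.grad.clone()            # gradient of the unregularized loss
w_before = w.detach().clone()    # parameter values before the step
optimizer.step()

# Manual update: gradient of the loss plus wd * w from the L2 penalty
w_manual = w_before - lr * (grad + wd * w_before)

print("optimizer step:", w.detach())
print("manual step:   ", w_manual)
```

Because optim.SGD(model.parameters(), ...) receives every parameter of the model, the same decay is applied to the bias as well as the weights.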

  1. Set Up the Optimizer with L2 Regularization:

    • Modify the optimizer to include weight_decay for L2 regularization.
    • Example:
      python
      optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=0.01)
  2. Train the Model with L2 Regularization:

    • Follow the same steps as before but use the updated optimizer with regularization during training.
    • Use epochs=1000
  3. Evaluate the Optimized Model:

    • After training, evaluate the model on both the training and test datasets.
    • Compute the accuracy for both sets by comparing the model's predictions to the true labels (y_train and y_test).
  4. Calculate and Print the Accuracy:

    • Use a threshold of 0.5 to determine whether the model's predictions are class 0 or class 1.
    • Print the training accuracy and test accuracy after evaluation.

Exercise 4: 

Implement optimization techniques like L2 regularization and retrain the model. Evaluate the performance of the optimized model on both training and testing sets. 

python
## Write your code here
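
One way to approach this exercise is to train the same model twice, once without and once with weight_decay, and compare the results. The sketch below does that on synthetic data. The make_data helper, the noise level, and the use of nn.Sequential in place of the LogisticRegressionModel class are all assumptions made to keep the example short.

```python
import torch
from torch import nn, optim


def make_data(n, seed):
    """Synthetic stand-in for the standardized match data (assumption)."""
    g = torch.Generator().manual_seed(seed)
    X = torch.randn(n, 5, generator=g)
    w = torch.tensor([[1.5], [-2.0], [0.5], [0.0], [1.0]])  # hypothetical
    y = (X @ w + 0.5 * torch.randn(n, 1, generator=g) > 0).float()
    return X, y


X_train, y_train = make_data(400, seed=0)
X_test, y_test = make_data(100, seed=1)


def accuracy(model, X, y):
    model.eval()
    with torch.no_grad():
        return ((model(X) >= 0.5).float() == y).float().mean().item()


def train(weight_decay, epochs=1000, lr=0.01):
    torch.manual_seed(0)  # same initial weights for both runs
    # Linear layer + sigmoid, equivalent to the logistic regression model
    model = nn.Sequential(nn.Linear(X_train.shape[1], 1), nn.Sigmoid())
    criterion = nn.BCELoss()
    optimizer = optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay)
    for _ in range(epochs):
        model.train()
        optimizer.zero_grad()
        loss = criterion(model(X_train), y_train)
        loss.backward()
        optimizer.step()
    return model


results = {}
for wd in (0.0, 0.01):
    model = train(wd)
    results[wd] = {
        "train_acc": accuracy(model, X_train, y_train),
        "test_acc": accuracy(model, X_test, y_test),
        "weight_norm": model[0].weight.norm().item(),
    }
    print(f"weight_decay={wd}: {results[wd]}")
```

Printing the weight norm alongside the accuracies shows what the penalty actually does to the parameters. With a strength as small as 0.01, the difference in accuracy may well be negligible.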

Step 5: Visualization and Interpretation 

Visualization tools like confusion matrices and ROC curves provide insights into the model's performance. The confusion matrix helps in understanding the classification accuracy, while the ROC curve illustrates the trade-off between sensitivity and specificity.

Confusion Matrix: A confusion matrix is a fundamental tool for evaluating a classification model. It tabulates the number of correct and incorrect predictions by actual and predicted class, where:

  • True Positive (TP): Correctly predicted positive class (class 1).
  • True Negative (TN): Correctly predicted negative class (class 0).
  • False Positive (FP): Incorrectly predicted as positive (class 1), but the actual class is negative (class 0). This is also called a Type I error.
  • False Negative (FN): Incorrectly predicted as negative (class 0), but the actual class is positive (class 1). This is also called a Type II error.

ROC Curve (Receiver Operating Characteristic Curve): The ROC Curve is a graphical representation used to evaluate the performance of a binary classification model across all classification thresholds. It plots two metrics:

  • True Positive Rate (TPR), also called recall or sensitivity: the proportion of actual positive instances (class 1) that the model correctly classified as positive.
  • False Positive Rate (FPR): the proportion of actual negative instances (class 0) that the model incorrectly classified as positive.

AUC: AUC stands for Area Under the Curve and is a performance metric used to evaluate the quality of a binary classification model. Specifically, it refers to the area under the ROC curve (Receiver Operating Characteristic curve), which plots the True Positive Rate (TPR) versus the False Positive Rate (FPR) for different threshold values.

Classification Report: A Classification Report is a summary of various classification metrics, which are useful for evaluating the performance of a classifier on the given dataset.

Exercise 5: 

Write code to visualize the model's performance using confusion matrices and ROC curves. Generate classification reports to evaluate precision, recall, and F1-score. Retrain the model with L2 regularization and evaluate the performance.

python
## Write your code here
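
The metrics described above can be computed with scikit-learn as sketched below. The labels and probabilities are a small hand-made example standing in for y_test and the model's predicted probabilities on X_test; the class names "Loss" and "Win" are assumptions about what 0 and 1 mean.

```python
import numpy as np
from sklearn.metrics import auc, classification_report, confusion_matrix, roc_curve

# Hypothetical true labels and predicted probabilities (stand-ins for
# y_test and model(X_test) converted to NumPy arrays).
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.6, 0.9, 0.7, 0.2])
y_pred = (y_prob >= 0.5).astype(int)   # 0.5 classification threshold

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")

# Precision, recall, and F1-score per class
print(classification_report(y_true, y_pred, target_names=["Loss", "Win"]))

# The ROC curve uses the probabilities, not the thresholded predictions
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
roc_auc = auc(fpr, tpr)
print(f"AUC: {roc_auc:.3f}")
```

To draw the ROC curve, plot fpr against tpr with matplotlib (for example plt.plot(fpr, tpr)) and add the diagonal from (0, 0) to (1, 1) as the reference line for a random classifier.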


Step 6: Model Saving and Loading 

Task 6: Save and load the trained model. 

This task demonstrates the techniques to persist a trained model using torch.save and reload it using torch.load. Evaluating the loaded model ensures that it retains its performance, making it practical for deployment in real-world applications. 

  1. Saving the Model:
    • Save the model's learned weights and biases using torch.save() (e.g., torch.save(model.state_dict(), 'your_model_name.pth')).
    • Saving only the state dictionary (model parameters) is preferred because it is more flexible and efficient than saving the entire model object.
  2. Loading the Model:
    • Create a new model instance with the same input dimension (e.g., model = LogisticRegressionModel(input_dim)) and load the saved parameters (e.g., model.load_state_dict(torch.load('your_model_name.pth'))).
  3. Evaluating the Loaded Model:
    • After loading, set the model to evaluation mode by calling model.eval().
    • Evaluate the loaded model on the test dataset to confirm that it performs the same as it did right after training.
    • Use torch.no_grad() to ensure that no gradients are computed.

Exercise 6: 

Write code to save the trained model and reload it. Ensure the loaded model performs consistently by evaluating it on the test dataset. 

python
## Write your code here
# Save the model


# Load the model



# Ensure the loaded model is in evaluation mode



# Evaluate the loaded model
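
A minimal sketch of the save, load, and re-evaluate cycle follows. It assumes a freshly initialized model in place of the trained one, random test features, and a hypothetical file name written to a temporary directory, so it can run on its own.

```python
import os
import tempfile

import torch
from torch import nn


class LogisticRegressionModel(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.linear = nn.Linear(input_dim, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        return self.sigmoid(self.linear(x))


torch.manual_seed(0)
input_dim = 5                                  # assumption: number of features
model = LogisticRegressionModel(input_dim)     # stands in for the trained model
X_test = torch.randn(20, input_dim)            # synthetic test features

# Save only the parameters (state dict); the file name is hypothetical
path = os.path.join(tempfile.mkdtemp(), "lol_model.pth")
torch.save(model.state_dict(), path)

# Rebuild the architecture with the same input_dim, then load the weights
loaded_model = LogisticRegressionModel(input_dim)
loaded_model.load_state_dict(torch.load(path))

# Evaluation mode, no gradient tracking
loaded_model.eval()
model.eval()
with torch.no_grad():
    original_out = model(X_test)
    loaded_out = loaded_model(X_test)

print("Outputs match:", torch.allclose(original_out, loaded_out))
```

Comparing the two models' outputs directly is a stricter check than comparing accuracies, since identical parameters should give identical predictions on the same inputs.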

Step 7: Hyperparameter Tuning 

Task 7: Perform hyperparameter tuning to find the best learning rate. 

By testing several learning rates, you will identify the one that gives the best test accuracy. The learning rate controls the size of each parameter update, so it has a direct effect on how quickly and how well the model converges.

  1. Define Learning Rates:

    • Test the learning rates [0.01, 0.05, 0.1].
  2. Reinitialize the Model for Each Learning Rate:

    • For each learning rate, create a fresh model and optimizer (e.g., torch.optim.SGD(model.parameters(), lr=lr)).
    • Reinitializing matters because a model carried over from a previous run has already been trained. Starting every run from fresh weights keeps the comparison between learning rates fair.
  3. Train the Model for Each Learning Rate:

    • Train the model for a fixed number of epochs (e.g., 50 or 100 epochs) for each learning rate, and compute the accuracy on the test set.
    • Track the test accuracy for each learning rate and identify which one yields the best performance.
  4. Evaluate and Compare:

    • After training with each learning rate, compare the test accuracy for each configuration.
    • Report the learning rate that gives the highest test accuracy.

Exercise 7: 

Perform hyperparameter tuning to find the best learning rate. Retrain the model for each learning rate and evaluate its performance to identify the optimal rate. 

python
## Write your code here
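
The search over learning rates can be sketched as a simple loop. As before, the data is synthetic and nn.Sequential stands in for the LogisticRegressionModel class; both are assumptions for the sake of a self-contained example. The seed is reset before each run so that every learning rate starts from the same initial weights.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Synthetic stand-in for the standardized match data (assumption).
X_train, X_test = torch.randn(400, 5), torch.randn(100, 5)
true_w = torch.tensor([[1.5], [-2.0], [0.5], [0.0], [1.0]])  # hypothetical
y_train = (X_train @ true_w > 0).float()
y_test = (X_test @ true_w > 0).float()

learning_rates = [0.01, 0.05, 0.1]
epochs = 100
test_accuracies = {}

for lr in learning_rates:
    torch.manual_seed(1)  # identical starting weights for every learning rate
    model = nn.Sequential(nn.Linear(X_train.shape[1], 1), nn.Sigmoid())
    criterion = nn.BCELoss()
    optimizer = optim.SGD(model.parameters(), lr=lr)

    for _ in range(epochs):
        model.train()
        optimizer.zero_grad()
        loss = criterion(model(X_train), y_train)
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        preds = (model(X_test) >= 0.5).float()
        test_accuracies[lr] = (preds == y_test).float().mean().item()
    print(f"lr={lr}: test accuracy={test_accuracies[lr]:.4f}")

# Pick the learning rate with the highest test accuracy
best_lr = max(test_accuracies, key=test_accuracies.get)
print(f"Best learning rate: {best_lr}")
```

If two learning rates tie, max returns the first one in the list, so the order of learning_rates acts as the tie-breaker.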

Step 8: Feature Importance 

Task 8: Evaluate feature importance to understand the impact of each feature on the prediction. 

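Because logistic regression is linear in its inputs, the learned weights can be read directly as a measure of each feature's influence on the prediction, provided the features were standardized to a common scale.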

1. Extracting Model Weights:

  • The weights of the logistic regression model represent the importance of each feature in making predictions. These weights are stored in the model's linear layer (model.linear.weight).
  • You can extract the weights using model.linear.weight.data.numpy() and flatten the resulting array to get a 1D array of feature importances.

2. Creating a DataFrame:

  • Create a pandas DataFrame with two columns: one for the feature names and the other for their corresponding importance values (i.e., the learned weights).
  • Ensure the features are aligned with their names in your dataset. Take the names from the DataFrame before scaling (e.g., X_train.columns), because StandardScaler returns a NumPy array without column names.

3. Sorting and Plotting Feature Importance:

  • Sort the features based on the absolute value of their importance (weights) to identify the most impactful features.
  • Use a bar plot (via matplotlib) to visualize the sorted feature importances, with the feature names on the y-axis and importance values on the x-axis.

4. Interpreting the Results:

  • Larger absolute weights indicate more influential features. A positive weight means that higher values of the feature push the prediction toward the positive class, while a negative weight pushes it toward the negative class.

Exercise 8: 

Evaluate feature importance by extracting the weights of the linear layer and creating a DataFrame to display the importance of each feature. Visualize the feature importance using a bar plot. 

python
## Write your code here

import pandas as pd
import matplotlib.pyplot as plt

# Extract the weights of the linear layer
## Write your code here

# Create a DataFrame for feature importance
## Write your code here
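
The extraction, sorting, and plotting steps can be sketched as follows. The feature names and the weight values are made up for illustration; in the real exercise the names come from the dataset's columns and the weights from the trained model.

```python
import matplotlib.pyplot as plt
import pandas as pd
import torch
from torch import nn


class LogisticRegressionModel(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.linear = nn.Linear(input_dim, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))


# Hypothetical feature names and hand-set weights standing in for a trained model
feature_names = ["kills", "deaths", "assists", "gold_earned"]
model = LogisticRegressionModel(len(feature_names))
with torch.no_grad():
    model.linear.weight.copy_(torch.tensor([[0.8, -1.2, 0.3, 0.1]]))

# Extract the weights of the linear layer as a 1D NumPy array
weights = model.linear.weight.detach().numpy().flatten()

# Build the DataFrame and sort by absolute weight, largest first
feature_importance = pd.DataFrame({"Feature": feature_names, "Importance": weights})
feature_importance = feature_importance.sort_values(
    by="Importance", key=lambda s: s.abs(), ascending=False
)
print(feature_importance)

# Horizontal bar plot: feature names on the y-axis, weights on the x-axis
fig, ax = plt.subplots(figsize=(8, 4))
ax.barh(feature_importance["Feature"], feature_importance["Importance"])
ax.invert_yaxis()  # put the largest absolute weight at the top
ax.set_xlabel("Weight")
ax.set_title("Feature importance (logistic regression weights)")
fig.tight_layout()
```

Sorting by absolute value while plotting the signed weight keeps both pieces of information visible: the bar length shows how strongly a feature matters, and its direction shows which class it favors.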


Conclusion: 

Congratulations on completing the project! In this final project, you built a logistic regression model to predict the outcomes of League of Legends matches based on various in-game statistics. This comprehensive project involved several key steps, including data loading and preprocessing, model implementation, training, optimization, evaluation, visualization, model saving and loading, hyperparameter tuning, and feature importance analysis. This project provided hands-on experience with the complete workflow of developing a machine learning model for binary classification tasks using PyTorch.

© Copyright IBM Corporation. All rights reserved.
