ML Made Simple in 30 hours: Hour 16 LightGBM (Light Gradient Boosting Machine)

#### Concept

LightGBM (Light Gradient Boosting Machine) is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be efficient and scalable: histogram-based split finding and leaf-wise tree growth typically make training faster and lighter on memory than implementations that scan every sorted feature value. This lets it handle large-scale data, often with accuracy comparable to or better than other gradient boosting implementations.

#### Key Features of LightGBM

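1. Leaf-Wise Tree Growth: Unlike the level-wise growth used by many other tree-boosting implementations, LightGBM grows trees leaf-wise, always splitting the leaf that gives the largest loss reduction. This can overfit on small datasets, so parameters such as num_leaves and max_depth are used to limit tree complexity.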

2. Histogram-Based Decision Tree: Buckets continuous feature values into discrete bins and searches for splits over bin boundaries only, which speeds up training and reduces memory usage.

3. Categorical Feature Support: Handles categorical features natively, without one-hot encoding. They only need to be integer-encoded or stored as a pandas category column.

4. Optimal Split for Missing Values: Handles missing values automatically. At each split, it learns which side missing values should go to, based on the loss reduction.
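To make the histogram idea more concrete, here is a minimal sketch of split finding over bins for a single feature. It is not LightGBM's actual implementation: the function name `best_histogram_split`, the equal-width binning, and the squared-error style gain are all simplifying assumptions made for illustration.

```python
import numpy as np

def best_histogram_split(feature, gradients, n_bins=16):
    """Illustrative only: find a split threshold from binned gradient sums."""
    # Assumption: equal-width bins. LightGBM's own binning is more elaborate.
    edges = np.linspace(feature.min(), feature.max(), n_bins + 1)
    bin_idx = np.clip(np.digitize(feature, edges[1:-1]), 0, n_bins - 1)

    # One pass over the data builds per-bin gradient sums and counts.
    grad_sum = np.bincount(bin_idx, weights=gradients, minlength=n_bins)
    count = np.bincount(bin_idx, minlength=n_bins)
    total_grad, total_count = grad_sum.sum(), count.sum()

    best_gain, best_threshold = 0.0, None
    left_grad, left_count = 0.0, 0
    # Only the n_bins - 1 bin boundaries are candidate splits,
    # not every distinct feature value.
    for b in range(n_bins - 1):
        left_grad += grad_sum[b]
        left_count += count[b]
        right_grad = total_grad - left_grad
        right_count = total_count - left_count
        if left_count == 0 or right_count == 0:
            continue
        # Simplified gain: how much better two leaf values fit than one.
        gain = (left_grad ** 2 / left_count
                + right_grad ** 2 / right_count
                - total_grad ** 2 / total_count)
        if gain > best_gain:
            best_gain, best_threshold = gain, edges[b + 1]
    return best_threshold, best_gain

# Toy data: the gradients change sign around x = 4
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=1000)
g = np.where(x < 4.0, -1.0, 1.0) + rng.normal(0, 0.1, size=1000)

threshold, gain = best_histogram_split(x, g)
print(f"Chosen threshold: {threshold:.2f}")
```

With 16 bins the scan evaluates 15 candidate thresholds instead of up to 999, and the chosen threshold lands on the bin boundary closest to the true change point.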

#### Key Steps

1. Define the Objective Function: The loss function to be minimized.

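2. Compute Gradients: Calculate the gradients of the loss function with respect to the current predictions (LightGBM also uses the second-order derivatives, the Hessians).

3. Fit the Trees: Train a decision tree to predict the negative gradients, i.e. the direction that reduces the loss.

4. Update the Model: Add the new tree's output, scaled by the learning rate, to the current model, then repeat steps 2 to 4. The final prediction combines the outputs of all trees.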
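The steps above can be sketched as a small boosting loop in plain numpy. This is a teaching sketch under strong assumptions, not how LightGBM is implemented: it uses a single feature, one-split trees (stumps), first-order gradients only, and the helper names `fit_stump`, `boost`, and `predict_proba` are made up for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_stump(x, targets):
    """Fit a one-split regression tree to the targets on a single feature."""
    best = None
    for threshold in np.unique(x)[:-1]:  # keep both sides non-empty
        left = x <= threshold
        left_value, right_value = targets[left].mean(), targets[~left].mean()
        fitted = np.where(left, left_value, right_value)
        error = ((targets - fitted) ** 2).sum()
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda data: np.where(data <= threshold, left_value, right_value)

def boost(x, y, n_rounds=50, learning_rate=0.1):
    raw_score = np.zeros(len(y))  # model output before the sigmoid
    trees = []
    for _ in range(n_rounds):
        # Step 1 and 2: for binary log loss, the gradient w.r.t. the raw score is p - y
        gradient = sigmoid(raw_score) - y
        # Step 3: fit a tree to the negative gradient
        tree = fit_stump(x, -gradient)
        trees.append(tree)
        # Step 4: add the tree's output, scaled by the learning rate
        raw_score += learning_rate * tree(x)
    return trees

def predict_proba(trees, x, learning_rate=0.1):
    return sigmoid(sum(learning_rate * tree(x) for tree in trees))

# Toy data: class 1 whenever x > 0.5
x = np.linspace(0, 1, 200)
y = (x > 0.5).astype(float)

trees = boost(x, y)
proba = predict_proba(trees, x)
train_accuracy = ((proba > 0.5) == (y == 1)).mean()
print(f"Training accuracy: {train_accuracy}")
```

Each round nudges the raw score a little further in the direction that lowers the loss, which is why a small learning rate usually needs more boosting rounds.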

#### Implementation

Let's implement LightGBM using the same Breast Cancer dataset for consistency.

##### Example

# Import necessary libraries

import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import lightgbm as lgb

# Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Create and train the LightGBM model
train_data = lgb.Dataset(X_train, label=y_train)
params = {
    'objective': 'binary',
    'boosting_type': 'gbdt',
    'metric': 'binary_logloss',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9
}

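# Train the model
model = lgb.train(params, train_data, num_boost_round=100)

# Make predictions: with the binary objective, predict() returns the
# probability of class 1, so threshold at 0.5 to get class labels
y_pred = model.predict(X_test)
y_pred_binary = [1 if x > 0.5 else 0 for x in y_pred]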

# Evaluating the model
accuracy = accuracy_score(y_test, y_pred_binary)
conf_matrix = confusion_matrix(y_test, y_pred_binary)
class_report = classification_report(y_test, y_pred_binary)


print(f"Accuracy: {accuracy}")
print(f"Confusion Matrix:\n{conf_matrix}")
print(f"Classification Report:\n{class_report}")

##### Results

Accuracy: 0.9736842105263158
Confusion Matrix:
[[41  2]
 [ 1 70]]
Classification Report:
              precision    recall  f1-score   support

           0       0.98      0.95      0.96        43
           1       0.97      0.99      0.98        71

    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114

#### Explanation of the Code

1. Libraries: We import numpy, pandas, the scikit-learn utilities, and lightgbm (numpy and pandas are not used directly in this example).

2. Data Preparation: We load the Breast Cancer dataset with features and the target variable (malignant or benign).

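3. Train-Test Split: We split the data into training (80%) and testing (20%) sets, with a fixed random_state so the split is reproducible.

4. Model Training: We wrap the training data in a LightGBM Dataset, set the parameters for the model, and train it for 100 boosting rounds with lgb.train.

5. Predictions: We use the trained LightGBM model to predict the probability of class 1 for each test sample, then convert the probabilities to labels with a 0.5 threshold.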

6. Evaluation:

- Accuracy: Measures the proportion of correctly classified instances.

- Confusion Matrix: Shows the counts of true positive, true negative, false positive, and false negative predictions.

- Classification Report: Provides precision, recall, F1-score, and support for each class.
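To see how these metrics relate to each other, the numbers in the classification report can be recomputed by hand from the confusion matrix in the results. This short sketch treats class 1 (benign in this dataset) as the positive class; the variable names are only for illustration.

```python
# Confusion matrix from the results: rows = actual class, columns = predicted class
conf_matrix = [[41, 2],
               [1, 70]]

tn, fp = conf_matrix[0]  # actual class 0: predicted 0, predicted 1
fn, tp = conf_matrix[1]  # actual class 1: predicted 0, predicted 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision_1 = tp / (tp + fp)  # of all predicted 1s, how many were right
recall_1 = tp / (tp + fn)     # of all actual 1s, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(f"Accuracy:  {accuracy:.4f}")
print(f"Precision: {precision_1:.2f}")
print(f"Recall:    {recall_1:.2f}")
print(f"F1-score:  {f1_1:.2f}")
```

The rounded values (0.9737, 0.97, 0.99, 0.98) match the accuracy and the class 1 row of the classification report.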

#### Applications

LightGBM is widely used in various fields such as:

- Finance: Fraud detection, credit scoring.

- Healthcare: Disease prediction, patient risk stratification.

- Marketing: Customer segmentation, churn prediction.

- Sports: Player performance prediction, match outcome prediction.

Best Data Science & Machine Learning Resources: https://topmate.io/coding/914624

ENJOY LEARNING 👍👍

Friday, 3 January 2025
