ML Made Simple in 30 hours: Hour 8 Naive Bayes Algorithm

### Concept

Naive Bayes is a family of probabilistic classifiers based on Bayes' theorem with the "naive" assumption that the features are conditionally independent of one another given the class. Despite this strong assumption, Naive Bayes classifiers perform surprisingly well in many real-world applications, particularly text classification.
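In symbols, the idea can be written as follows. This uses standard textbook notation that is not part of the original post: $y$ is the class and $x_1, \dots, x_n$ are the features. The first line is Bayes' theorem, the second applies the independence assumption, and the third is the resulting prediction rule.

```latex
P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}

P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)

\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The denominator is the same for every class, so it can be dropped when comparing classes.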

#### Types of Naive Bayes Classifiers

1. Gaussian Naive Bayes: Assumes that continuous features follow a normal distribution within each class.

2. Multinomial Naive Bayes: Typically used for discrete count data (e.g., text classification with word counts).

3. Bernoulli Naive Bayes: Used for binary/boolean features.
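As a rough illustration of how the three variants line up with different feature types, here is a minimal sketch using scikit-learn. The tiny arrays are made up for this example and are not from the post; only the class names `GaussianNB`, `MultinomialNB`, and `BernoulliNB` come from the library.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

# Hypothetical toy labels shared by all three examples
y = np.array([0, 0, 1, 1])

# Continuous measurements: a natural fit for GaussianNB
X_cont = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 3.9]])

# Non-negative counts (e.g., word counts): a natural fit for MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])

# Binary presence/absence flags: a natural fit for BernoulliNB
X_bin = (X_counts > 0).astype(int)

gnb = GaussianNB().fit(X_cont, y)
mnb = MultinomialNB().fit(X_counts, y)
bnb = BernoulliNB().fit(X_bin, y)

# Predict one new sample with each model
pred_g = gnb.predict([[1.1, 2.0]])
pred_m = mnb.predict([[2, 0, 1]])
pred_b = bnb.predict([[0, 1, 1]])
print(pred_g, pred_m, pred_b)
```

All three share the same `fit`/`predict` interface, so switching variants is mostly a matter of matching the class to the kind of features you have.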

#### Implementation

Let's consider an example using Python with pandas and scikit-learn.

##### Example

Suppose we have a dataset that records features of different emails, such as word frequencies, to classify them as spam or not spam.

```python
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             classification_report)

# Example data
data = {
    'Feature1': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    'Feature2': [5, 4, 3, 2, 1, 5, 4, 3, 2, 1],
    'Feature3': [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    'Spam': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
}

df = pd.DataFrame(data)

# Independent variables (features) and dependent variable (target)
X = df[['Feature1', 'Feature2', 'Feature3']]
y = df['Spam']

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

# Creating and training the Multinomial Naive Bayes model
model = MultinomialNB()
model.fit(X_train, y_train)

# Making predictions
y_pred = model.predict(X_test)

# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy}")
print(f"Confusion Matrix:\n{conf_matrix}")
print(f"Classification Report:\n{class_report}")
```

##### Result

```
Accuracy: 1.0
Confusion Matrix:
[[1 0]
 [0 1]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         1
           1       1.00      1.00      1.00         1

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2
```

#### Explanation of the Code

1. Libraries: We import necessary libraries like numpy, pandas, and sklearn.

2. Data Preparation: We create a DataFrame containing features (Feature1, Feature2, Feature3) and the target variable (Spam).

3. Feature and Target: We separate the features and the target variable.

4. Train-Test Split: We split the data into training (80%) and testing (20%) sets; fixing random_state makes the split reproducible.

5. Model Training: We create a MultinomialNB model and train it using the training data.

6. Predictions: We use the trained model to predict whether the emails in the test set are spam.

7. Evaluation: We evaluate the model using accuracy, confusion matrix, and classification report.
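To make the training and prediction steps less of a black box, here is a minimal from-scratch sketch of Multinomial Naive Bayes with Laplace (add-one) smoothing, checked against scikit-learn on the same toy data. The helper names `fit_multinomial_nb` and `predict_multinomial_nb` are made up for this illustration, and the sketch fits on all ten rows rather than a train split to keep it short.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Same toy data as in the example above (Feature1, Feature2, Feature3)
X = np.array([
    [1, 5, 1], [2, 4, 1], [3, 3, 1], [4, 2, 1], [5, 1, 1],
    [1, 5, 0], [2, 4, 0], [3, 3, 0], [4, 2, 0], [5, 1, 0],
])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])


def fit_multinomial_nb(X, y, alpha=1.0):
    """Estimate log priors and smoothed log feature probabilities per class."""
    classes = np.unique(y)
    # Log prior: fraction of training rows belonging to each class
    log_prior = np.log(np.array([(y == c).mean() for c in classes]))
    # Total count of each feature within each class
    counts = np.array([X[y == c].sum(axis=0) for c in classes])
    # Laplace smoothing avoids zero probabilities for unseen features
    smoothed = counts + alpha
    log_likelihood = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))
    return classes, log_prior, log_likelihood


def predict_multinomial_nb(X, classes, log_prior, log_likelihood):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    scores = X @ log_likelihood.T + log_prior
    return classes[np.argmax(scores, axis=1)]


classes, log_prior, log_likelihood = fit_multinomial_nb(X, y)
manual_pred = predict_multinomial_nb(X, classes, log_prior, log_likelihood)

# Compare with scikit-learn using the same smoothing parameter
sklearn_model = MultinomialNB(alpha=1.0).fit(X, y)
sklearn_pred = sklearn_model.predict(X)
print(np.array_equal(manual_pred, sklearn_pred))
```

Working in log space turns the product of probabilities into a sum, which avoids numerical underflow when there are many features.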

#### Evaluation Metrics

- Accuracy: The proportion of correctly classified instances among the total instances.

- Confusion Matrix: Shows the counts of true positives, true negatives, false positives, and false negatives.

- Classification Report: Provides precision, recall, F1-score, and support for each class.
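The metrics above can also be computed by hand from the four confusion matrix cells, which is a useful sanity check. The labels below are hypothetical and chosen only to produce a non-trivial matrix; they are not results from the spam example.

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, precision_score,
                             recall_score, f1_score)

# Hypothetical true and predicted labels (1 = spam, 0 = not spam)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # correct / all
precision = tp / (tp + fp)                   # of predicted spam, how much was spam
recall = tp / (tp + fn)                      # of actual spam, how much was caught
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

The hand-computed values should agree with `precision_score`, `recall_score`, and `f1_score` from scikit-learn for the positive class.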

#### Applications

Naive Bayes classifiers are widely used for:

- Text Classification: Spam detection, sentiment analysis, and document categorization.

- Medical Diagnosis: Predicting diseases based on symptoms.

- Recommendation Systems: Recommending products or services based on user behavior.
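For the text classification case, a minimal sketch might pair `CountVectorizer` with `MultinomialNB` in a scikit-learn pipeline. The six short emails and their labels are invented for this example, so treat it as an illustration of the workflow rather than a realistic spam filter.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training emails (1 = spam, 0 = not spam)
texts = [
    "win a free prize now",
    "free money claim your prize",
    "limited offer win cash now",
    "meeting agenda for monday",
    "please review the project report",
    "lunch with the team on friday",
]
labels = [1, 1, 1, 0, 0, 0]

# CountVectorizer turns raw text into word counts; MultinomialNB models them
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(texts, labels)

# Classify two unseen (also hypothetical) emails
new_emails = ["claim your free cash prize", "project meeting on monday"]
predictions = spam_filter.predict(new_emails)
print(predictions)
```

Wrapping both steps in a pipeline means the same vocabulary learned during training is applied automatically at prediction time.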

Credits: t.me/datasciencefun

ENJOY LEARNING 👍👍

Friday, 3 January 2025
