Friday, 3 January 2025

Hour 12 Association Rule Learning

#### Concept

Association rule learning is a rule-based machine learning method used to discover interesting relations between variables in large databases. It is widely used in market basket analysis to identify sets of products that frequently co-occur in transactions. The goal is to find strong rules in the data using measures of interestingness such as support, confidence, and lift.

#### Key Terms

- Support: The proportion of transactions in the dataset that contain a particular itemset.

- Confidence: The likelihood that a transaction containing itemset A also contains itemset B, i.e. support(A ∪ B) / support(A).

- Lift: The ratio of the observed support of A ∪ B to the support expected if A and B were independent, i.e. confidence(A → B) / support(B).
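The three measures can be computed directly from a list of transactions. Here is a minimal sketch; the four-transaction dataset is made up purely for illustration:

```python
# Toy transactions, each represented as a set of items
transactions = [
    {'Milk', 'Bread'},
    {'Bread', 'Butter'},
    {'Milk', 'Bread', 'Butter'},
    {'Milk', 'Eggs'},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / n

def confidence(a, b):
    """Of the transactions containing a, the fraction that also contain b."""
    return support(a | b) / support(a)

def lift(a, b):
    """Observed co-occurrence relative to independence; 1.0 means independent."""
    return confidence(a, b) / support(b)

print(support({'Milk', 'Bread'}))       # 2 of 4 transactions -> 0.5
print(confidence({'Milk'}, {'Bread'}))  # 0.5 / 0.75
print(lift({'Milk'}, {'Bread'}))
```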

#### Algorithm

The most common algorithm for association rule learning is the Apriori algorithm. It operates in two steps:

1. Frequent Itemset Generation: Identify all itemsets whose support is greater than or equal to a specified minimum support threshold.

2. Rule Generation: From the frequent itemsets, generate high-confidence rules where confidence is greater than or equal to a specified minimum confidence threshold.
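The first step (frequent itemset generation) can be sketched in plain Python. This is a simplified illustration of the Apriori level-wise search, not a production implementation; the function name `apriori_frequent` is our own:

```python
from itertools import combinations

def apriori_frequent(transactions, min_support):
    """Return {frozenset: support} for all itemsets with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items
    items = {frozenset([i]) for t in transactions for i in t}
    current = {s for s in items if support(s) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset must itself be frequent,
        # then check the candidate's support against the threshold
        current = {c for c in candidates
                   if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                   and support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

# The four transactions used in the example below
transactions = [frozenset(t) for t in (
    {'Milk', 'Bread', 'Butter'},
    {'Bread', 'Butter'},
    {'Milk', 'Bread', 'Eggs'},
    {'Milk', 'Bread', 'Butter', 'Eggs'},
)]
for itemset, s in sorted(apriori_frequent(transactions, 0.5).items(),
                         key=lambda kv: -kv[1]):
    print(set(itemset), s)
```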

#### Implementation

Let's work through an example in Python using pandas and the mlxtend library.

##### Example

Suppose we have a dataset of transactions, and we want to identify frequent itemsets and generate association rules.

```python
# Import the necessary libraries
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Example data: transactions in long format (one row per item per transaction)
data = {'TransactionID': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
        'Item': ['Milk', 'Bread', 'Butter', 'Bread', 'Butter', 'Milk',
                 'Bread', 'Eggs', 'Milk', 'Bread', 'Butter', 'Eggs']}
df = pd.DataFrame(data)

# Pivot into a one-hot (transaction x item) matrix of booleans,
# the input format the apriori function expects
basket = (df.groupby(['TransactionID', 'Item'])['Item']
            .count()
            .unstack()
            .fillna(0)
            .astype(bool))

# Applying the Apriori algorithm
frequent_itemsets = apriori(basket, min_support=0.5, use_colnames=True)

# Generating association rules
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.7)

print("Frequent Itemsets:")
print(frequent_itemsets)
print("\nAssociation Rules:")
print(rules)
```

The output should look similar to this:

```
Frequent Itemsets:
    support               itemsets
0      1.00                (Bread)
1      0.75               (Butter)
2      0.50                 (Eggs)
3      0.75                 (Milk)
4      0.75        (Butter, Bread)
5      0.50          (Eggs, Bread)
6      0.75          (Milk, Bread)
7      0.50         (Butter, Milk)
8      0.50           (Milk, Eggs)
9      0.50  (Butter, Milk, Bread)
10     0.50    (Milk, Eggs, Bread)

Association Rules:
      antecedents    consequents  antecedent support  consequent support  support  confidence      lift  representativity  leverage  conviction  zhangs_metric   jaccard  certainty  kulczynski
0        (Butter)        (Bread)                0.75                1.00     0.75        1.00  1.000000               1.0     0.000         inf            0.0  0.750000        0.0    0.875000
1         (Bread)       (Butter)                1.00                0.75     0.75        0.75  1.000000               1.0     0.000         1.0            0.0  0.750000        0.0    0.875000
2          (Eggs)        (Bread)                0.50                1.00     0.50        1.00  1.000000               1.0     0.000         inf            0.0  0.500000        0.0    0.750000
3          (Milk)        (Bread)                0.75                1.00     0.75        1.00  1.000000               1.0     0.000         inf            0.0  0.750000        0.0    0.875000
4         (Bread)         (Milk)                1.00                0.75     0.75        0.75  1.000000               1.0     0.000         1.0            0.0  0.750000        0.0    0.875000
5          (Eggs)         (Milk)                0.50                0.75     0.50        1.00  1.333333               1.0     0.125         inf            0.5  0.666667        1.0    0.833333
6  (Butter, Milk)        (Bread)                0.50                1.00     0.50        1.00  1.000000               1.0     0.000         inf            0.0  0.500000        0.0    0.750000
7    (Eggs, Milk)        (Bread)                0.50                1.00     0.50        1.00  1.000000               1.0     0.000         inf            0.0  0.500000        0.0    0.750000
8   (Eggs, Bread)         (Milk)                0.50                0.75     0.50        1.00  1.333333               1.0     0.125         inf            0.5  0.666667        1.0    0.833333
9          (Eggs)  (Milk, Bread)                0.50                0.75     0.50        1.00  1.333333               1.0     0.125         inf            0.5  0.666667        1.0    0.833333
```

#### Explanation of the Code

1. Libraries: We import necessary libraries like pandas and mlxtend.

2. Data Preparation: We create a transaction dataset and transform it into a format suitable for the Apriori algorithm, where each row represents a transaction and each column represents an item.

3. Apriori Algorithm: We apply the Apriori algorithm to find frequent itemsets with a minimum support of 0.5.

4. Association Rules: We generate association rules from the frequent itemsets with a minimum confidence of 0.7.
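The rules object returned by mlxtend is an ordinary pandas DataFrame, so it can be filtered and sorted like any other. Here is a sketch on a small hand-built frame with the same key columns, keeping only rules whose lift exceeds 1:

```python
import pandas as pd

# A hand-built stand-in with the same key columns mlxtend's
# association_rules returns (values taken from the output above)
rules = pd.DataFrame({
    'antecedents': [frozenset({'Eggs'}), frozenset({'Bread'})],
    'consequents': [frozenset({'Milk'}), frozenset({'Butter'})],
    'confidence':  [1.00, 0.75],
    'lift':        [1.333333, 1.000000],
})

# Keep only positively associated rules and rank them by confidence
strong = (rules[rules['lift'] > 1]
          .sort_values('confidence', ascending=False))
print(strong)
```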

#### Evaluation Metrics

- Support: Measures the frequency of an itemset in the dataset.

- Confidence: Measures the reliability of the inference made by the rule.

- Lift: Measures the strength of the rule over random co-occurrence. Values greater than 1 indicate that the items co-occur more often than expected by chance (a positive association); values below 1 indicate a negative association.

#### Applications

Association rule learning is widely used in:

- Market Basket Analysis: Identifying products frequently bought together to optimize store layouts and cross-selling strategies.

- Recommendation Systems: Recommending products or services based on customer purchase history.

- Healthcare: Discovering associations between medical conditions and treatments.

Best Data Science & Machine Learning Resources: https://topmate.io/coding/914624

Credits: t.me/datasciencefun

ENJOY LEARNING 👍👍
