Friday, 3 January 2025

Hour 12 Association Rule Learning

#### Concept

Association rule learning is a rule-based machine learning method used to discover interesting relations between variables in large databases. It is widely used in market basket analysis to identify sets of products that frequently co-occur in transactions. The goal is to find strong rules in the data using measures of interestingness such as support, confidence, and lift.

#### Key Terms

- Support: The proportion of transactions in the dataset that contain a particular itemset.

- Confidence: The likelihood that a transaction containing itemset A also contains itemset B, i.e. support(A ∪ B) / support(A).

- Lift: The ratio of the observed support of A ∪ B to the support expected if A and B were independent, i.e. confidence(A → B) / support(B).
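The three measures can be computed directly from a list of transactions. Here is a minimal sketch; the four-transaction dataset is made up purely for illustration:

```python
# Toy transactions, each represented as a set of items
transactions = [
    {'Milk', 'Bread'},
    {'Bread', 'Butter'},
    {'Milk', 'Bread', 'Butter'},
    {'Milk', 'Eggs'},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / n

def confidence(a, b):
    """Of the transactions containing a, the fraction that also contain b."""
    return support(a | b) / support(a)

def lift(a, b):
    """Observed co-occurrence relative to independence; 1.0 means independent."""
    return confidence(a, b) / support(b)

print(support({'Milk', 'Bread'}))       # 2 of 4 transactions -> 0.5
print(confidence({'Milk'}, {'Bread'}))  # 0.5 / 0.75
print(lift({'Milk'}, {'Bread'}))
```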

#### Algorithm

The most common algorithm for association rule learning is the Apriori algorithm. It operates in two steps:

1. Frequent Itemset Generation: Identify all itemsets whose support is greater than or equal to a specified minimum support threshold.

2. Rule Generation: From the frequent itemsets, generate high-confidence rules where confidence is greater than or equal to a specified minimum confidence threshold.
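The first step (frequent itemset generation) can be sketched in plain Python. This is a simplified illustration of the Apriori level-wise search, not a production implementation; the function name `apriori_frequent` is our own:

```python
from itertools import combinations

def apriori_frequent(transactions, min_support):
    """Return {frozenset: support} for all itemsets with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items
    items = {frozenset([i]) for t in transactions for i in t}
    current = {s for s in items if support(s) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset must itself be frequent,
        # then check the candidate's support against the threshold
        current = {c for c in candidates
                   if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                   and support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

# The four transactions used in the example below
transactions = [frozenset(t) for t in (
    {'Milk', 'Bread', 'Butter'},
    {'Bread', 'Butter'},
    {'Milk', 'Bread', 'Eggs'},
    {'Milk', 'Bread', 'Butter', 'Eggs'},
)]
for itemset, s in sorted(apriori_frequent(transactions, 0.5).items(),
                         key=lambda kv: -kv[1]):
    print(set(itemset), s)
```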

#### Implementation

Let's work through an example in Python using pandas and the mlxtend library.

##### Example

Suppose we have a dataset of transactions, and we want to identify frequent itemsets and generate association rules.

```python
# Import the necessary libraries
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Example data: transactions in long format (one row per item per transaction)
data = {'TransactionID': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
        'Item': ['Milk', 'Bread', 'Butter', 'Bread', 'Butter', 'Milk',
                 'Bread', 'Eggs', 'Milk', 'Bread', 'Butter', 'Eggs']}
df = pd.DataFrame(data)

# Pivot into a one-hot (transaction x item) matrix of booleans,
# the input format the apriori function expects
basket = (df.groupby(['TransactionID', 'Item'])['Item']
            .count()
            .unstack()
            .fillna(0)
            .astype(bool))

# Applying the Apriori algorithm
frequent_itemsets = apriori(basket, min_support=0.5, use_colnames=True)

# Generating association rules
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.7)

print("Frequent Itemsets:")
print(frequent_itemsets)
print("\nAssociation Rules:")
print(rules)
```

The output should look similar to this:

```
Frequent Itemsets:
    support               itemsets
0      1.00                (Bread)
1      0.75               (Butter)
2      0.50                 (Eggs)
3      0.75                 (Milk)
4      0.75        (Butter, Bread)
5      0.50          (Eggs, Bread)
6      0.75          (Milk, Bread)
7      0.50         (Butter, Milk)
8      0.50           (Milk, Eggs)
9      0.50  (Butter, Milk, Bread)
10     0.50    (Milk, Eggs, Bread)

Association Rules:
      antecedents    consequents  antecedent support  consequent support  support  confidence      lift  representativity  leverage  conviction  zhangs_metric   jaccard  certainty  kulczynski
0        (Butter)        (Bread)                0.75                1.00     0.75        1.00  1.000000               1.0     0.000         inf            0.0  0.750000        0.0    0.875000
1         (Bread)       (Butter)                1.00                0.75     0.75        0.75  1.000000               1.0     0.000         1.0            0.0  0.750000        0.0    0.875000
2          (Eggs)        (Bread)                0.50                1.00     0.50        1.00  1.000000               1.0     0.000         inf            0.0  0.500000        0.0    0.750000
3          (Milk)        (Bread)                0.75                1.00     0.75        1.00  1.000000               1.0     0.000         inf            0.0  0.750000        0.0    0.875000
4         (Bread)         (Milk)                1.00                0.75     0.75        0.75  1.000000               1.0     0.000         1.0            0.0  0.750000        0.0    0.875000
5          (Eggs)         (Milk)                0.50                0.75     0.50        1.00  1.333333               1.0     0.125         inf            0.5  0.666667        1.0    0.833333
6  (Butter, Milk)        (Bread)                0.50                1.00     0.50        1.00  1.000000               1.0     0.000         inf            0.0  0.500000        0.0    0.750000
7    (Eggs, Milk)        (Bread)                0.50                1.00     0.50        1.00  1.000000               1.0     0.000         inf            0.0  0.500000        0.0    0.750000
8   (Eggs, Bread)         (Milk)                0.50                0.75     0.50        1.00  1.333333               1.0     0.125         inf            0.5  0.666667        1.0    0.833333
9          (Eggs)  (Milk, Bread)                0.50                0.75     0.50        1.00  1.333333               1.0     0.125         inf            0.5  0.666667        1.0    0.833333
```

#### Explanation of the Code

1. Libraries: We import necessary libraries like pandas and mlxtend.

2. Data Preparation: We create a transaction dataset and transform it into a format suitable for the Apriori algorithm, where each row represents a transaction and each column represents an item.

3. Apriori Algorithm: We apply the Apriori algorithm to find frequent itemsets with a minimum support of 0.5.

4. Association Rules: We generate association rules from the frequent itemsets with a minimum confidence of 0.7.
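The rules object returned by mlxtend is an ordinary pandas DataFrame, so it can be filtered and sorted like any other. Here is a sketch on a small hand-built frame with the same key columns, keeping only rules whose lift exceeds 1:

```python
import pandas as pd

# A hand-built stand-in with the same key columns mlxtend's
# association_rules returns (values taken from the output above)
rules = pd.DataFrame({
    'antecedents': [frozenset({'Eggs'}), frozenset({'Bread'})],
    'consequents': [frozenset({'Milk'}), frozenset({'Butter'})],
    'confidence':  [1.00, 0.75],
    'lift':        [1.333333, 1.000000],
})

# Keep only positively associated rules and rank them by confidence
strong = (rules[rules['lift'] > 1]
          .sort_values('confidence', ascending=False))
print(strong)
```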

#### Evaluation Metrics

- Support: Measures the frequency of an itemset in the dataset.

- Confidence: Measures the reliability of the inference made by the rule.

- Lift: Measures the strength of the rule over random co-occurrence. Values greater than 1 indicate that the items co-occur more often than expected by chance (a positive association); values below 1 indicate a negative association.

#### Applications

Association rule learning is widely used in:

- Market Basket Analysis: Identifying products frequently bought together to optimize store layouts and cross-selling strategies.

- Recommendation Systems: Recommending products or services based on customer purchase history.

- Healthcare: Discovering associations between medical conditions and treatments.

Best Data Science & Machine Learning Resources: https://topmate.io/coding/914624

Credits: t.me/datasciencefun

ENJOY LEARNING 👍👍
