A Beginner Guide to Association Rules Visualization - Quality Measures

October 21, 2018 Jie Wang

This post shows how to visualize association rules with the R packages arules and arulesViz. To follow the script, you should already have completed the following parts.

The Basket Data

In Part 2 Read Transaction Data, we read the following five shopping baskets from a plain text file into transactions, an object of the transactions class.

 f,a,c,d,g,l,m,p
 a,b,c,f,l,m,o
 b,f,h,j,o
 b,c,k,s,p
 a,f,c,e,l,p,m,n

In Part 3 Generate Itemsets, we ran arules::apriori with the parameter target set to frequent itemsets. By assigning a value to the parameter support and setting minlen and maxlen equal to each other, the apriori function returns all itemsets of a specific length whose support is at or above the minimum.
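As a quick recap of that step, here is a minimal sketch. It assumes the five baskets are rebuilt in memory with as(..., "transactions") rather than read from the Part 2 file, and the support value 0.4 and the length 2 are illustrative choices, not values taken from Part 3.

```r
library(arules)

# Assumption: rebuild the five baskets in memory instead of reading the Part 2 file.
baskets <- list(
  c("f", "a", "c", "d", "g", "l", "m", "p"),
  c("a", "b", "c", "f", "l", "m", "o"),
  c("b", "f", "h", "j", "o"),
  c("b", "c", "k", "s", "p"),
  c("a", "f", "c", "e", "l", "p", "m", "n")
)
transactions <- as(baskets, "transactions")

# Illustrative values: minimum support 0.4, itemsets of exactly 2 items.
itemsets <- apriori(
  transactions,
  parameter = list(support = 0.4, minlen = 2, maxlen = 2,
                   target = "frequent itemsets"),
  control = list(verbose = FALSE)
)

# Show the five itemsets with the highest support.
inspect(head(sort(itemsets, by = "support"), 5))
```

Setting minlen and maxlen to the same number is what pins the itemset length; changing both to 3 would return only three-item sets.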

In Part 4 Generate Rules, we ran arules::apriori with the parameter target set to rules. By assigning values to the parameters support and confidence and setting minlen to prune the rules of only 1 item, the apriori function returns all rules with at least 2 items whose confidence meets or exceeds the threshold.

In this part, we visualize how the three quality measures support, confidence, and lift are related. Do they tend to change in the same direction or in opposite directions?

Visualization 2: Support, Confidence, Lift

Generate Rules

The following script assigns to rules all the rules whose support is at least \(0.1\) and whose confidence is at least \(0.6\).

# Mine rules with support >= 0.1 and confidence >= 0.6
# (transactions is the object created in Part 2)
rules <- apriori(
  transactions,
  parameter = list(support=0.1, confidence=0.6, target="rules")
)

For the script details, refer to Part 4 Generate Rules.

A Summary of the Rules

Call summary on rules.

summary(rules)

## set of 1985 rules
## 
## rule length distribution (lhs + rhs):sizes
##   1   2   3   4   5   6   7   8 
##   7  78 299 554 570 346 115  16 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   4.000   5.000   4.602   5.000   8.000 
## 
## summary of quality measures:
##     support         confidence          lift            count      
##  Min.   :0.2000   Min.   :0.6000   Min.   :0.8333   Min.   :1.000  
##  1st Qu.:0.2000   1st Qu.:1.0000   1st Qu.:1.2500   1st Qu.:1.000  
##  Median :0.2000   Median :1.0000   Median :1.6667   Median :1.000  
##  Mean   :0.2288   Mean   :0.9909   Mean   :2.0421   Mean   :1.144  
##  3rd Qu.:0.2000   3rd Qu.:1.0000   3rd Qu.:1.6667   3rd Qu.:1.000  
##  Max.   :0.8000   Max.   :1.0000   Max.   :5.0000   Max.   :4.000  
## 
## mining info:
##          data ntransactions support confidence
##  transactions             5     0.1        0.6

The summary shows that:

there are 1985 rules, with lengths between 1 and 8 items (a rule of length 1 has an empty left-hand side)
range of support: 0.2 - 0.8
range of confidence: 0.6 - 1.0
range of lift: 0.83 - 5
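To see where these numbers come from, the three measures can be computed by hand for a single rule. The sketch below uses base R only and picks the rule {b} => {f} as an example; the helper support_of is a hypothetical name introduced here, not part of arules.

```r
# The five baskets from the post, as a plain list of character vectors.
baskets <- list(
  c("f", "a", "c", "d", "g", "l", "m", "p"),
  c("a", "b", "c", "f", "l", "m", "o"),
  c("b", "f", "h", "j", "o"),
  c("b", "c", "k", "s", "p"),
  c("a", "f", "c", "e", "l", "p", "m", "n")
)

# Hypothetical helper: fraction of baskets that contain every item in `items`.
support_of <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- "b"
rhs <- "f"

supp <- support_of(c(lhs, rhs), baskets)  # baskets with both b and f: 2 of 5
conf <- supp / support_of(lhs, baskets)   # of the 3 baskets with b, 2 also have f
lift <- conf / support_of(rhs, baskets)   # confidence relative to f's own support (4 of 5)

round(c(support = supp, confidence = conf, lift = lift), 4)
```

For this rule the support is 2/5, the confidence is 2/3, and the lift is (2/3)/(4/5) = 5/6, which agrees with the minimum lift of 0.8333 reported in the summary. A lift below 1 means f is slightly less likely in baskets that contain b than in baskets overall.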

Scatterplot

library(arulesViz)
plot(rules)

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

The plot doesn't reveal much because the dataset is very small, with only five transactions. However, it suggests that:

At the same support, rules with high lift also have high confidence.
At the same support, rules with low lift show no obvious linear relationship between lift and confidence.
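These observations can also be checked numerically rather than by eye. The sketch below pulls the measures out of the rules with quality() and compares confidence across lift groups within each support level. It assumes the baskets are rebuilt in memory, and the lift cutoff of 1.5 and the column name high_lift are arbitrary choices made for illustration.

```r
library(arules)

# Assumption: rebuild the five baskets in memory instead of reading the Part 2 file.
baskets <- list(
  c("f", "a", "c", "d", "g", "l", "m", "p"),
  c("a", "b", "c", "f", "l", "m", "o"),
  c("b", "f", "h", "j", "o"),
  c("b", "c", "k", "s", "p"),
  c("a", "f", "c", "e", "l", "p", "m", "n")
)
transactions <- as(baskets, "transactions")

rules <- apriori(
  transactions,
  parameter = list(support = 0.1, confidence = 0.6, target = "rules"),
  control = list(verbose = FALSE)
)

# quality() returns a data frame with one row per rule and one column per measure.
q <- quality(rules)

# Range of confidence among the rules at each support level.
tapply(q$confidence, q$support, range)

# Arbitrary cutoff: flag rules with lift above 1.5, then compare
# mean confidence of the two groups within each support level.
q$high_lift <- q$lift > 1.5
aggregate(confidence ~ support + high_lift, data = q, FUN = mean)
```

With only five transactions the support column takes very few distinct values, so the table is small enough to read directly.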

We can create the same plot for a larger dataset, Groceries, which ships with the arules package.

The Groceries Data

data("Groceries")
# Mine rules with support >= 0.001 and confidence >= 0.6
rules <- apriori(
  Groceries,
  parameter = list(support=0.001, confidence=0.6, target="rules")
)
plot(rules)

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
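The default scatterplot puts support and confidence on the axes and uses lift for shading. To look at the same rules from another angle, the axes and the shading can be swapped through the measure and shading arguments of plot. The following is a sketch of one such variant, not the only sensible choice.

```r
library(arules)
library(arulesViz)

data("Groceries")
rules <- apriori(
  Groceries,
  parameter = list(support = 0.001, confidence = 0.6, target = "rules"),
  control = list(verbose = FALSE)
)

# Support on the x-axis, lift on the y-axis, confidence as the colour shading.
plot(rules, measure = c("support", "lift"), shading = "confidence")
```

Putting lift on an axis makes it easier to see how widely lift varies among rules that share a similar support.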
