This post shows you how to visualize association rules by using the R packages arules and arulesViz. To follow the script more easily, you may want to complete the following parts first.
- Part 1 Transactions Class in arules
- Part 2 Read Transaction Data
- Part 3 Generate Itemsets
- Part 4 Generate Rules
The Basket Data
In Part 2 Read Transaction Data, we read the following five shopping baskets from a plain text file into transactions, an object of the Transactions class.
f,a,c,d,g,l,m,p
a,b,c,f,l,m,o
b,f,h,j,o
b,c,k,s,p
a,f,c,e,l,p,m,n
In Part 3 Generate Itemsets, we ran arules::apriori with the parameter target set to frequent itemsets. By assigning a value to the parameter support and setting minlen and maxlen equal to each other, the apriori function returns all itemsets of a specific length that have at least the minimum support.
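As a reminder of what such a call can look like, here is a minimal sketch. It rebuilds the five baskets in memory instead of reading the text file, and the support of 0.6 and the length of 2 are illustrative assumptions, not values taken from Part 3.

```r
library(arules)

# The five baskets from above, rebuilt in memory (Part 2 reads them from a file)
lines <- c("f,a,c,d,g,l,m,p", "a,b,c,f,l,m,o", "b,f,h,j,o",
           "b,c,k,s,p", "a,f,c,e,l,p,m,n")
transactions <- as(strsplit(lines, ","), "transactions")

# minlen == maxlen restricts the result to itemsets of exactly that length
itemsets <- apriori(
  transactions,
  parameter = list(support = 0.6, minlen = 2, maxlen = 2,
                   target = "frequent itemsets")
)
inspect(itemsets)
```

Changing minlen and maxlen together to another value lists the frequent itemsets of that length instead.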
In Part 4 Generate Rules, we ran arules::apriori with the parameter target set to rules. By assigning values to the parameters support and confidence, and setting minlen to 2 to prune the rules of 1 item, the apriori function returns all the rules with at least 2 items whose confidence meets or exceeds the threshold.
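A rule-mining call of this kind can be sketched as follows. The baskets are again rebuilt in memory, and the thresholds (support 0.4, confidence 0.8) are illustrative assumptions rather than the values used in Part 4.

```r
library(arules)

# The five baskets from above, rebuilt in memory
lines <- c("f,a,c,d,g,l,m,p", "a,b,c,f,l,m,o", "b,f,h,j,o",
           "b,c,k,s,p", "a,f,c,e,l,p,m,n")
transactions <- as(strsplit(lines, ","), "transactions")

# minlen = 2 drops rules with an empty left-hand side (rules of 1 item)
rules_min2 <- apriori(
  transactions,
  parameter = list(support = 0.4, confidence = 0.8, minlen = 2,
                   target = "rules")
)

# Show the first few rules together with their quality measures
inspect(head(rules_min2))
```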
In this part, we visualize how the three quality measures support, confidence, and lift are related. Do they tend to change in the same direction or in opposite directions?
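For reference, the usual definitions of the three measures are sketched below for a rule \(X \Rightarrow Y\) over a set of transactions \(T\). The symbols \(X\), \(Y\), \(Z\), and \(T\) are notation introduced here for illustration and do not come from the earlier parts.

```latex
\[
\mathrm{supp}(Z) = \frac{|\{t \in T : Z \subseteq t\}|}{|T|}
\]
\[
\mathrm{supp}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y), \qquad
\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}, \qquad
\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{supp}(Y)}
\]
```

A lift above 1 means that \(X\) and \(Y\) occur together more often than they would if they were independent; a lift below 1 means they occur together less often.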
Visualization 2: Support, Confidence, Lift
Generate Rules
The following script assigns to rules all the rules whose support is at least \(0.1\) and whose confidence is at least \(0.6\).
library(arules)

# rules with a support of at least 0.1 and a confidence of at least 0.6
rules <- apriori(
  transactions,
  parameter = list(support = 0.1, confidence = 0.6, target = "rules")
)
For the script details, refer to Part 4 Generate Rules.
A Summary of the Rules
Call summary() on rules.
summary(rules)
## set of 1985 rules
##
## rule length distribution (lhs + rhs):sizes
## 1 2 3 4 5 6 7 8
## 7 78 299 554 570 346 115 16
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 4.000 5.000 4.602 5.000 8.000
##
## summary of quality measures:
## support confidence lift count
## Min. :0.2000 Min. :0.6000 Min. :0.8333 Min. :1.000
## 1st Qu.:0.2000 1st Qu.:1.0000 1st Qu.:1.2500 1st Qu.:1.000
## Median :0.2000 Median :1.0000 Median :1.6667 Median :1.000
## Mean :0.2288 Mean :0.9909 Mean :2.0421 Mean :1.144
## 3rd Qu.:0.2000 3rd Qu.:1.0000 3rd Qu.:1.6667 3rd Qu.:1.000
## Max. :0.8000 Max. :1.0000 Max. :5.0000 Max. :4.000
##
## mining info:
## data ntransactions support confidence
## transactions 5 0.1 0.6
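The summary shows
- 1985 rules with a length between 1 and 8 items; the 7 rules of length 1 have an empty left-hand side
- range of support: 0.2 - 0.8 (the observed minimum is above the 0.1 threshold because, with five transactions, the smallest nonzero support is 1/5 = 0.2)
- range of confidence: 0.6 - 1.0
- range of lift: 0.83 - 5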
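These figures can also be pulled out of the rules object directly instead of being read off the printed summary. The sketch below assumes the same five baskets, rebuilt in memory so that it runs on its own.

```r
library(arules)

# The five baskets from above, rebuilt in memory
lines <- c("f,a,c,d,g,l,m,p", "a,b,c,f,l,m,o", "b,f,h,j,o",
           "b,c,k,s,p", "a,f,c,e,l,p,m,n")
transactions <- as(strsplit(lines, ","), "transactions")

rules <- apriori(
  transactions,
  parameter = list(support = 0.1, confidence = 0.6, target = "rules")
)

# Rules of length 1 have an empty left-hand side
inspect(rules[size(rules) == 1])

# Observed minimum and maximum of each quality measure
sapply(quality(rules)[, c("support", "confidence", "lift")], range)
```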
Scatterplot
library(arulesViz)
plot(rules)
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
The plot doesn't reveal much because the dataset is very small, with only five transactions. However, it shows that
- With the same support, when lift is high, confidence is high.
- With the same support, when lift is low, confidence shows no obvious linear relationship with lift.
We can create the same plot for a larger dataset, Groceries, from the arules package.
The Groceries Data
data("Groceries")
# rules with a support of at least 0.001 and a confidence of at least 0.6
rules <- apriori(
  Groceries,
  parameter = list(support = 0.001, confidence = 0.6, target = "rules")
)
plot(rules)
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
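To look at the relationship between the three measures from a few more angles, a sketch along the following lines may help. The correlation table, the alternative axis assignment, and the choice of the top five rules are illustrative additions here, not part of the original script.

```r
library(arules)
library(arulesViz)

data("Groceries")
rules <- apriori(
  Groceries,
  parameter = list(support = 0.001, confidence = 0.6, target = "rules")
)

# Pairwise correlation between the three quality measures
q <- quality(rules)[, c("support", "confidence", "lift")]
print(round(cor(q), 2))

# Same scatterplot, with lift on the y-axis and confidence as the shading
plot(rules, measure = c("support", "lift"), shading = "confidence")

# The five rules with the highest lift
inspect(head(sort(rules, by = "lift"), 5))
```

Swapping which measure goes on an axis and which one is used for shading makes it easier to see whether high-lift rules cluster at low support.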