A Guide to Association Rules in R - Part 4: Rule Generation in arules

October 20, 2018 Jie Wang

4 minute read

This is Part 4 of the series. It shows how to generate confident association rules with the R packages arules and arulesViz. To test the scripts, you must have already completed the earlier parts of the series.

The Basket Data

In Part 2, we read the following five shopping baskets into an object of the transactions class.

 f,a,c,d,g,l,m,p
 a,b,c,f,l,m,o
 b,f,h,j,o
 b,c,k,s,p
 a,f,c,e,l,p,m,n
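If you want to run this part without the files from Part 2, here is a minimal sketch that builds an equivalent object in memory. It coerces a list of character vectors with as(). It assumes the arules package is installed. The name baskets is chosen for illustration, and transactions matches the variable used in the scripts below.

```r
# Assumes the arules package is installed (install.packages("arules"))
library(arules)

# The five baskets above, one character vector per basket
baskets <- list(
  c("f", "a", "c", "d", "g", "l", "m", "p"),
  c("a", "b", "c", "f", "l", "m", "o"),
  c("b", "f", "h", "j", "o"),
  c("b", "c", "k", "s", "p"),
  c("a", "f", "c", "e", "l", "p", "m", "n")
)

# Coerce the list into an object of the arules "transactions" class
transactions <- as(baskets, "transactions")
summary(transactions)
```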

In Part 3, Generate Itemsets, we ran arules::apriori with the parameter target set to "frequent itemsets". We set a minimum support and gave minlen and maxlen the same value. With those settings, apriori returns all itemsets of that specific length whose support is at least the minimum.
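As a reminder of what Part 3 computed, the sketch below reproduces in base R the kind of result apriori returns for target = "frequent itemsets" with minlen = maxlen = k. It enumerates every k-item combination with combn() and keeps those whose support reaches the minimum. This is a brute-force illustration, not the Apriori algorithm. The names baskets, support and frequent_k are hypothetical helpers, not arules functions, and the row order may differ from apriori's output. For k = 2 and a minimum support of 0.5 it returns 11 pairs, each with support 0.6.

```r
# The five baskets, one character vector per basket
baskets <- list(
  c("f", "a", "c", "d", "g", "l", "m", "p"),
  c("a", "b", "c", "f", "l", "m", "o"),
  c("b", "f", "h", "j", "o"),
  c("b", "c", "k", "s", "p"),
  c("a", "f", "c", "e", "l", "p", "m", "n")
)

# support(X): fraction of baskets that contain every item of X
support <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# All itemsets of exactly k items whose support is at least min_support
frequent_k <- function(k, min_support) {
  items <- sort(unique(unlist(baskets)))
  candidates <- combn(items, k, simplify = FALSE)  # every k-item combination
  supp <- vapply(candidates, support, numeric(1))
  keep <- supp >= min_support
  data.frame(
    itemset = vapply(candidates[keep], paste, character(1), collapse = ","),
    support = supp[keep],
    stringsAsFactors = FALSE
  )
}

frequent_k(2, 0.5)
```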

In this part, we generate association rules that meet a given threshold of a selected measure. A measure evaluates how certain or strong a rule is. Common measures include confidence, lift and leverage.
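For a rule X => Y, the standard definitions are:

- support = support(X ∪ Y)
- confidence = support(X ∪ Y) / support(X)
- lift = confidence / support(Y)
- leverage = support(X ∪ Y) - support(X) * support(Y)

The base-R sketch below computes these measures by hand on the five baskets. The helper names support and rule_measures are hypothetical and not part of arules. For {a} => {m} the lift is 1.6667, the same value arules reports for that rule in the top-10 listing further down. The second rule, {f} => {c}, shows a lift below 1 and a negative leverage.

```r
# The five baskets, one character vector per basket
baskets <- list(
  c("f", "a", "c", "d", "g", "l", "m", "p"),
  c("a", "b", "c", "f", "l", "m", "o"),
  c("b", "f", "h", "j", "o"),
  c("b", "c", "k", "s", "p"),
  c("a", "f", "c", "e", "l", "p", "m", "n")
)

# support(X): fraction of baskets that contain every item of X
support <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Quality measures of the rule lhs => rhs
rule_measures <- function(lhs, rhs) {
  s_xy <- support(c(lhs, rhs))
  s_x  <- support(lhs)
  s_y  <- support(rhs)
  c(support    = s_xy,
    confidence = s_xy / s_x,          # share of lhs baskets that also hold rhs
    lift       = s_xy / (s_x * s_y),  # confidence relative to support(rhs)
    leverage   = s_xy - s_x * s_y)    # observed minus expected-if-independent
}

round(rule_measures("a", "m"), 4)  # support 0.6, confidence 1, lift 1.6667, leverage 0.24
round(rule_measures("f", "c"), 4)  # support 0.6, confidence 0.75, lift 0.9375, leverage -0.04
```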

To generate the association rules, run the same function, arules::apriori, with a different set of parameters.

Generate the association rules

Parameters

To find strong association rules, pass the following values in the parameter argument of apriori:

support: minimum support
confidence: minimum confidence
target: rules

The following script assigns to rules all rules whose support is at least 0.5 and whose confidence is at least 0.6.

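#load arules; transactions is the object created in Part 2
library(arules)

#rules having at least a support of 0.5 and a confidence of 0.6
rules <- apriori(
  transactions,
  parameter = list(support = 0.5, confidence = 0.6, target = "rules")
)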

A Summary of the Rules

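To display a summary of the confident rules, call summary on rules. The summary shows four things:

- the number of rules
- the distribution of rule lengths
- summary statistics of the quality measures (support, confidence, lift and count)
- the mining parameters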

summary(rules)

## set of 84 rules
## 
## rule length distribution (lhs + rhs):sizes
##  1  2  3  4  5 
##  7 22 30 20  5 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   2.929   4.000   5.000 
## 
## summary of quality measures:
##     support         confidence          lift            count      
##  Min.   :0.6000   Min.   :0.6000   Min.   :0.9375   Min.   :3.000  
##  1st Qu.:0.6000   1st Qu.:1.0000   1st Qu.:1.2500   1st Qu.:3.000  
##  Median :0.6000   Median :1.0000   Median :1.2500   Median :3.000  
##  Mean   :0.6048   Mean   :0.9446   Mean   :1.4152   Mean   :3.024  
##  3rd Qu.:0.6000   3rd Qu.:1.0000   3rd Qu.:1.6667   3rd Qu.:3.000  
##  Max.   :0.8000   Max.   :1.0000   Max.   :1.6667   Max.   :4.000  
## 
## mining info:
##          data ntransactions support confidence
##  transactions             5     0.5        0.6

The summary shows that 84 rules are returned, 7 of which have length 1. The confidence of the rules ranges from 0.6 to 1, and the lift ranges from 0.9375 to 1.67.
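This brute-force sketch in base R (no arules needed) shows where the 84 rules and the length distribution 7/22/30/20/5 come from. It follows two properties documented for arules::apriori. First, every rule has exactly one item on the right-hand side. Second, with the default minlen = 1, rules with an empty left-hand side such as {} => {f} are included. It does not implement the Apriori algorithm itself; it simply enumerates every itemset built from individually frequent items. The names baskets, support and mine_rules are hypothetical helpers chosen for this sketch.

```r
# The five baskets, one character vector per basket
baskets <- list(
  c("f", "a", "c", "d", "g", "l", "m", "p"),
  c("a", "b", "c", "f", "l", "m", "o"),
  c("b", "f", "h", "j", "o"),
  c("b", "c", "k", "s", "p"),
  c("a", "f", "c", "e", "l", "p", "m", "n")
)

# support(X): fraction of baskets containing every item of X (support of {} is 1)
support <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Brute-force rule mining with a single item on the right-hand side
mine_rules <- function(min_support, min_confidence) {
  items <- sort(unique(unlist(baskets)))
  # A frequent itemset can only contain items that are frequent on their own
  items <- items[vapply(items, support, numeric(1)) >= min_support]
  out <- list()
  for (k in seq_along(items)) {
    for (z in combn(items, k, simplify = FALSE)) {
      s_z <- support(z)
      if (s_z < min_support) next
      for (y in z) {              # each item of z takes a turn as the rhs
        x <- setdiff(z, y)        # the remaining items form the lhs (may be empty)
        conf <- s_z / support(x)
        if (conf >= min_confidence) {
          out[[length(out) + 1]] <- data.frame(
            lhs = paste0("{", paste(x, collapse = ","), "}"),
            rhs = paste0("{", y, "}"),
            support = s_z, confidence = conf, length = k,
            stringsAsFactors = FALSE)
        }
      }
    }
  }
  do.call(rbind, out)
}

rules_df <- mine_rules(min_support = 0.5, min_confidence = 0.6)
nrow(rules_df)           # 84
table(rules_df$length)   # lengths 1 to 5: 7, 22, 30, 20, 5
```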

The Top-N Strong Rules

To print the top 10 rules in descending order of lift, sort the rules and take the first ten with head:

#print the top-10 rules in descending order of lift score
inspect(head(sort(rules, by="lift", decreasing=TRUE),10))

##      lhs      rhs support confidence lift     count
## [1]  {a}   => {m} 0.6     1          1.666667 3    
## [2]  {m}   => {a} 0.6     1          1.666667 3    
## [3]  {a}   => {l} 0.6     1          1.666667 3    
## [4]  {l}   => {a} 0.6     1          1.666667 3    
## [5]  {m}   => {l} 0.6     1          1.666667 3    
## [6]  {l}   => {m} 0.6     1          1.666667 3    
## [7]  {a,m} => {l} 0.6     1          1.666667 3    
## [8]  {a,l} => {m} 0.6     1          1.666667 3    
## [9]  {l,m} => {a} 0.6     1          1.666667 3    
## [10] {a,f} => {m} 0.6     1          1.666667 3
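All ten rules above share the same lift, 1.666667, so the order within this top 10 is essentially arbitrary. Take the two-item rules among them. For a rule {x} => {y}, lift = support({x, y}) / (support(x) * support(y)). Items a, l and m each appear in 3 of the 5 baskets, and always together, so lift = 0.6 / (0.6 * 0.6) = 1/0.6 ≈ 1.666667.

The base-R sketch below is not part of arules; inc, co and pairs are illustrative names. It computes every two-item rule at once from a basket-by-item 0/1 incidence matrix, whose cross-product gives all pairwise co-occurrence counts.

```r
# The five baskets, one character vector per basket
baskets <- list(
  c("f", "a", "c", "d", "g", "l", "m", "p"),
  c("a", "b", "c", "f", "l", "m", "o"),
  c("b", "f", "h", "j", "o"),
  c("b", "c", "k", "s", "p"),
  c("a", "f", "c", "e", "l", "p", "m", "n")
)
items <- sort(unique(unlist(baskets)))

# Basket-by-item 0/1 incidence matrix
inc <- t(vapply(baskets, function(b) as.numeric(items %in% b), numeric(length(items))))
colnames(inc) <- items

# co[i, j] = support({i, j}); the diagonal holds the single-item supports
co <- crossprod(inc) / nrow(inc)
s <- diag(co)

# Every ordered pair x != y whose joint support reaches 0.5
idx <- which(co >= 0.5 & row(co) != col(co), arr.ind = TRUE)
pairs <- data.frame(lhs = items[idx[, 1]], rhs = items[idx[, 2]],
                    support = co[idx], stringsAsFactors = FALSE)
pairs$confidence <- pairs$support / s[idx[, 1]]
pairs$lift <- pairs$confidence / s[idx[, 2]]

# Keep confident rules; rank by lift (rounded so floating-point noise does
# not split ties), then alphabetically by lhs and rhs
pairs <- pairs[pairs$confidence >= 0.6, ]
pairs <- pairs[order(-round(pairs$lift, 6), pairs$lhs, pairs$rhs), ]
head(pairs, 10)
```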

Generate Association Rules of Length 2 or More

To exclude rules that are only one item long (rules with an empty left-hand side, such as {} => {f}), set the parameter minlen to 2.

#rules of length 2 or more having at least a support of 0.5 and a confidence of 0.6
rules <- apriori(
  transactions, 
  parameter = list(support=0.5, confidence=0.6, minlen=2, target="rules")
)

The minlen argument cuts the total number of rules down from 84 to 77. Print the summary of rules:

summary(rules)

## set of 77 rules
## 
## rule length distribution (lhs + rhs):sizes
##  2  3  4  5 
## 22 30 20  5 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   2.000   3.000   3.104   4.000   5.000 
## 
## summary of quality measures:
##     support      confidence          lift            count  
##  Min.   :0.6   Min.   :0.7500   Min.   :0.9375   Min.   :3  
##  1st Qu.:0.6   1st Qu.:1.0000   1st Qu.:1.2500   1st Qu.:3  
##  Median :0.6   Median :1.0000   Median :1.6667   Median :3  
##  Mean   :0.6   Mean   :0.9708   Mean   :1.4529   Mean   :3  
##  3rd Qu.:0.6   3rd Qu.:1.0000   3rd Qu.:1.6667   3rd Qu.:3  
##  Max.   :0.6   Max.   :1.0000   Max.   :1.6667   Max.   :3  
## 
## mining info:
##          data ntransactions support confidence
##  transactions             5     0.5        0.6
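The 7 rules that minlen = 2 removes are exactly the rules with an empty left-hand side. The empty itemset is contained in every basket, so a rule {} => {x} has confidence equal to support(x) and lift 1. Such a rule passes both thresholds whenever support(x) is at least 0.6. The base-R sketch below lists those items. The names baskets and item_support are chosen for illustration, and the 84 is taken from the first summary.

```r
# The five baskets, one character vector per basket
baskets <- list(
  c("f", "a", "c", "d", "g", "l", "m", "p"),
  c("a", "b", "c", "f", "l", "m", "o"),
  c("b", "f", "h", "j", "o"),
  c("b", "c", "k", "s", "p"),
  c("a", "f", "c", "e", "l", "p", "m", "n")
)
items <- sort(unique(unlist(baskets)))

# Support of each single item (named by item)
item_support <- vapply(items, function(i) {
  mean(vapply(baskets, function(b) i %in% b, logical(1)))
}, numeric(1))

# {} => {x} has support = confidence = support(x), so it needs
# support(x) >= 0.5 (min support) and support(x) >= 0.6 (min confidence)
empty_lhs <- names(item_support)[item_support >= max(0.5, 0.6)]
empty_lhs                # "a" "b" "c" "f" "l" "m" "p"
84 - length(empty_lhs)   # 77 rules remain with minlen = 2
```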

Exercise

Write a script that returns all rules of length 2 with a minimum support of 0.5 and a minimum confidence of 0.6, and displays the top 10 of them in descending order of lift.
