This is a simple guide to using the function read.transactions
to coerce shopping basket data into the format required by the packages arules and arulesViz.
The letters a through s are the names of the available shopping items. Assume that the sample basket data is stored in a plain text file named baskets.
Convert the Sample Data into the Transactions Class
The arules package provides the function read.transactions
which reads basket data into an object of class transactions.
The following script reads the baskets file into such an object.
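transactions <- arules::read.transactions(
  file = "baskets",        # plain text file holding one basket per line
  format = "basket",       # each line is one transaction
  sep = ",",               # items on a line are separated by commas
  cols = NULL,             # no transaction id column in the file
  rm.duplicates = TRUE,    # drop items repeated within a transaction
  skip = 0                 # do not skip any leading lines
)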
The parameter format
is a character string indicating the format of the data set.
- For ‘basket’ format, each line in the transaction data file represents a transaction where the items (item labels) are separated by the characters specified by sep.
- For ‘single’ format, each line corresponds to a single item, containing at least ids for the transaction and the item.
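To make the difference between the two formats concrete, here is a minimal sketch that reads the same two transactions once in basket format and once in single format. The toy files and their contents are hypothetical and are not the baskets file used in this guide; for the single format, cols = c(1, 2) is assumed to mean that the first column holds the transaction id and the second the item.

```r
library(arules)

# Hypothetical toy data, written to temporary files for illustration only.
basket_file <- tempfile()
writeLines(c("a,b,c", "b,d"), basket_file)        # one transaction per line

single_file <- tempfile()
writeLines(c("1,a", "1,b", "1,c", "2,b", "2,d"),  # one (id, item) pair per line
           single_file)

tr_basket <- read.transactions(basket_file, format = "basket", sep = ",")

# For "single" format, cols names the transaction id column and the item column.
tr_single <- read.transactions(single_file, format = "single",
                               sep = ",", cols = c(1, 2))

# Both objects should describe the same two transactions.
as(tr_basket, "list")
as(tr_single, "list")
```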
The parameter sep
is a character string specifying how fields are separated in the data file. We use ‘,’ in the sample basket data.
The parameter skip
is the number of lines to skip at the start of the file before reading data.
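As a small illustration of sep and skip together, the sketch below reads a hypothetical file that uses semicolons as separators and starts with one line that is not a basket. The file and its contents are made up for this example.

```r
library(arules)

# Hypothetical toy file: one leading note line, then two baskets.
toy_file <- tempfile()
writeLines(c("# toy export", "a;b;c", "b;d"), toy_file)

# skip = 1 drops the first line; sep = ";" splits the items on each line.
tr <- read.transactions(toy_file, format = "basket", sep = ";", skip = 1)

length(tr)      # number of transactions that were read
itemLabels(tr)  # item names found in the file
```

Without skip = 1, the note line would be read as a transaction of its own.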
Validate the Data
Inspect the data structure
To validate whether the baskets have been read into the class correctly, take a glimpse of the transactions
data.
str(transactions)
## Formal class 'transactions' [package "arules"] with 3 slots
## ..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
## .. .. ..@ i : int [1:33] 0 2 3 5 6 10 11 14 0 1 ...
## .. .. ..@ p : int [1:6] 0 8 15 20 25 33
## .. .. ..@ Dim : int [1:2] 16 5
## .. .. ..@ Dimnames:List of 2
## .. .. .. ..$ : NULL
## .. .. .. ..$ : NULL
## .. .. ..@ factors : list()
## ..@ itemInfo :'data.frame': 16 obs. of 1 variable:
## .. ..$ labels: chr [1:16] "a" "b" "c" "d" ...
## ..@ itemsetInfo:'data.frame': 0 obs. of 0 variables
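The data slot is a sparse pattern matrix with items as rows and transactions as columns. To show what the i, p and Dim slots hold, here is a minimal sketch that builds a tiny matrix of the same kind with the Matrix package. The 3 x 2 matrix is a made-up example and is unrelated to the baskets file.

```r
library(Matrix)

# Hypothetical pattern matrix with 3 items (rows) and 2 transactions (columns):
# transaction 1 holds items 1 and 3, transaction 2 holds item 2.
# Leaving out the x argument yields a pattern (TRUE/FALSE) sparse matrix.
m <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), dims = c(3, 2))

class(m)  # class of the sparse pattern matrix
m@i       # zero-based row (item) indexes, stored column by column
m@p       # column pointers: column k occupies positions (p[k] + 1):p[k + 1] of i
m@Dim     # number of items, number of transactions
```

Read this way, the output above describes 16 items and 5 transactions holding 33 item entries in total.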
The Item Names
To validate the item names, run the command:
transactions@itemInfo$labels
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "j" "k" "l" "m" "n" "o" "p" "s"
The result shows the 16 item names, ranging from a to s.
Find Number of Transactions
# Examine the number of transactions (the second dimension of the sparse matrix)
transactions@data@Dim[2]
## [1] 5
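The same information is also available through accessor functions in arules, which avoids reaching into the slots directly. The sketch below uses a small hypothetical transactions object built from a list, not the baskets data.

```r
library(arules)

# Hypothetical toy object with two transactions.
toy <- as(list(c("a", "b"), c("b", "c", "d")), "transactions")

itemLabels(toy)  # item names, same content as toy@itemInfo$labels
length(toy)      # number of transactions
dim(toy)         # transactions x items (the reverse of toy@data@Dim)
size(toy)        # number of items in each transaction
```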
List the Items of a Specific Transaction
The component transactions@data@p
stores the position indexes from which we can read the items of a given transaction.
transactions@data@p
## [1] 0 8 15 20 25 33
To find the first and last positions of the items for the i-th transaction, take elements i and i+1 of this vector. For the first transaction, these are elements 1 and 2:
transactions@data@p[1:2]
## [1] 0 8
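The two numbers delimit the range of positions in the component transactions@data@i. The pointers are zero-based, so in R's one-based indexing the first transaction occupies positions 1 through 8.
transactions@data@i[1:8]
## [1] 0 2 3 5 6 10 11 14
The values are zero-based item indexes, so add 1 to map them to the item labels:
itemIndex <- transactions@data@i[1:8] + 1
transactions@itemInfo$labels[itemIndex]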
## [1] "a" "c" "d" "f" "g" "l" "m" "p"
Compare the items above to the first line in the original basket data:
f,a,c,d,g,l,m,p
They contain the same items, only in a different order because the item labels are sorted. The data conversion was successful.
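The lookup steps can be wrapped into a small function that works for any transaction, not only the first. This is a minimal sketch: items_of is a hypothetical helper name, and the toy object is made up for illustration. The coercion as(x, "list") from arules is used as a cross-check.

```r
library(arules)

# Hypothetical helper: items of the k-th transaction, read from the sparse slots.
items_of <- function(tr, k) {
  p <- tr@data@p
  if (p[k + 1] == p[k]) return(character(0))  # transaction with no items
  pos <- (p[k] + 1):p[k + 1]                  # zero-based pointers to one-based range
  idx <- tr@data@i[pos] + 1                   # zero-based item indexes to one-based
  as.character(tr@itemInfo$labels[idx])
}

# Hypothetical toy object with two transactions.
toy <- as(list(c("f", "a", "c"), c("b", "a")), "transactions")

items_of(toy, 1)      # items recovered from the slots
as(toy, "list")[[1]]  # the same transaction via coercion to a list
```

For everyday use the coercion is the simpler route; the helper mainly shows how the p and i slots fit together.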