Before reading this post, read the previous part.
A regular expression is a pattern that describes a set of strings.
1. Escaping Characters
\
: Escapes a metacharacter in a regular expression. The metacharacters are:
$ * + . ? [ ] ^ { } | ( ) \
Because \ must itself be escaped inside an R string, R requires a double backslash to escape these metacharacters, e.g. \\? to match a literal question mark.
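As a minimal sketch of the double-backslash rule, the base R calls below escape a question mark and a backslash; the input vectors and variable names are made up for illustration.

```r
# Escape "?" so it matches a literal question mark.
# The R string "\\?" reaches the regex engine as \?
x <- c("why?", "because")
has_qmark <- grepl("\\?", x)
print(has_qmark)  # TRUE FALSE

# fixed = TRUE treats the pattern as a literal string, so "." needs no escaping.
has_dot <- grepl(".", c("a.b", "ab"), fixed = TRUE)
print(has_dot)  # TRUE FALSE

# A literal backslash takes four backslashes in the R pattern string.
has_backslash <- grepl("\\\\", c("a\\b", "ab"))
print(has_backslash)  # TRUE FALSE
```

When the whole pattern is literal text, `fixed = TRUE` is usually easier to read than escaping every metacharacter.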
2. Special Metacharacters
\\t
: Tab
\\n
: New line
\\v
: Vertical tab
\\f
: Form feed
\\r
: Carriage return
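A short sketch of these escapes in use, assuming base R's default regex engine: tabs are replaced with commas and the text is split on newlines. The sample string is invented for the example.

```r
# A small tab-separated, two-line string (sample data for illustration).
x <- "col1\tcol2\nrow1\tvalue"

# Replace every tab with a comma.
csv_like <- gsub("\\t", ",", x)

# Split the result into lines at each newline.
lines <- strsplit(csv_like, "\\n")[[1]]
print(lines)  # "col1,col2" "row1,value"
```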
3. Quantifiers
Quantifiers specify how many times the preceding pattern should occur.
*
: matches at least 0 times.
+
: matches at least 1 time.
?
: matches at most 1 time.
{n}
: matches exactly n times.
{n,}
: matches at least n times.
{,m}
: matches at most m times.
{n,m}
: matches between n and m times.
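The quantifiers can be compared side by side on one input vector. This is an illustrative sketch using base R's `grepl`; the test strings are made up, and `^` and `$` are used only to force the pattern to cover the whole string.

```r
# Strings with 0 to 4 leading "a" characters followed by "b".
x <- c("b", "ab", "aab", "aaab", "aaaab")

star    <- grepl("^a*b$", x)      # zero or more "a"
plus    <- grepl("^a+b$", x)      # one or more "a"
maybe   <- grepl("^a?b$", x)      # zero or one "a"
exact2  <- grepl("^a{2}b$", x)    # exactly two "a"
atleast <- grepl("^a{2,}b$", x)   # two or more "a"
range23 <- grepl("^a{2,3}b$", x)  # two or three "a"

print(star)     # TRUE TRUE TRUE TRUE TRUE
print(plus)     # FALSE TRUE TRUE TRUE TRUE
print(maybe)    # TRUE TRUE FALSE FALSE FALSE
print(exact2)   # FALSE FALSE TRUE FALSE FALSE
print(atleast)  # FALSE FALSE TRUE TRUE TRUE
print(range23)  # FALSE FALSE TRUE TRUE FALSE
```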
4. Position Anchors
^
: Start of the string.
$
: End of the string.
\\b
: Empty string at either edge of a word.
\\B
: Empty string, not at the edge of a word.
\\<
: Beginning of a word.
\\>
: End of a word.
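To see how the anchors differ, the sketch below tests the word "cat" in several positions with base R's `grepl`. The input strings are invented for the example.

```r
x <- c("cat", "concat", "cat food", "bobcat tail")

starts_with <- grepl("^cat", x)        # "cat" at the start of the string
ends_with   <- grepl("cat$", x)        # "cat" at the end of the string
whole_word  <- grepl("\\bcat\\b", x)   # "cat" as a complete word
inside_word <- grepl("\\Bcat", x)      # "cat" not starting at a word edge
word_edges  <- grepl("\\<cat\\>", x)   # same idea as \\b...\\b, with explicit start/end

print(starts_with)  # TRUE FALSE TRUE FALSE
print(ends_with)    # TRUE TRUE FALSE FALSE
print(whole_word)   # TRUE FALSE TRUE FALSE
print(inside_word)  # FALSE TRUE FALSE TRUE
print(word_edges)   # TRUE FALSE TRUE FALSE
```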
5. Characters and Operators
.
: Any single character except \n
[...]
: a permitted character list. Use - inside the brackets to specify a range of characters.
[^...]
: an excluded character list. Matches any character except those inside the square brackets.
|
: an OR operator; matches the pattern on either side of the |.
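The operators above can be tried on a handful of short words. This is a minimal sketch with base R's `grepl` and made-up inputs; the `^` and `$` anchors are there only so each pattern must match the whole word.

```r
x <- c("cat", "cut", "cot", "ct", "dog")

any_char  <- grepl("^c.t$", x)        # any single character between "c" and "t"
permitted <- grepl("^c[au]t$", x)     # only "a" or "u" in the middle
excluded  <- grepl("^c[^au]t$", x)    # anything except "a" or "u" in the middle
either    <- grepl("^(cat|dog)$", x)  # "cat" or "dog"
in_range  <- grepl("^[a-c]", x)       # first letter in the range a to c

print(any_char)   # TRUE TRUE TRUE FALSE FALSE
print(permitted)  # TRUE TRUE FALSE FALSE FALSE
print(excluded)   # FALSE FALSE TRUE FALSE FALSE
print(either)     # TRUE FALSE FALSE FALSE TRUE
print(in_range)   # TRUE TRUE TRUE TRUE FALSE
```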
6. Character Classes
[[:digit:]] or \\d or [0-9]
: digits: 0 1 2 3 4 5 6 7 8 9
\\D or [^0-9]
: non-digits
[[:lower:]] or [a-z]
: lower-case letters
[[:upper:]] or [A-Z]
: upper-case letters
[[:alpha:]] or [[:lower:][:upper:]] or [A-Za-z]
: alphabetic characters
[[:alnum:]] or [[:alpha:][:digit:]] or [A-Za-z0-9]
: alphanumeric characters
\\w or [[:alnum:]_] or [A-Za-z0-9_]
: word characters: alphanumeric characters (0-9, a-z, A-Z) and the underscore (_)
\\W or [^A-Za-z0-9_]
: non-word characters
[[:xdigit:]] or [0-9A-Fa-f]
: hexadecimal digits (base 16): 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f
[[:blank:]]
: space and tab
[[:space:]] or \\s
: space characters: tab, newline, vertical tab, form feed, carriage return, space
\\S
: non-space characters
[[:punct:]]
: punctuation characters: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
[[:graph:]] or [[:alnum:][:punct:]]
: graphical (human-readable) characters
[[:print:]] or [[:alnum:][:punct:] ]
: printable characters: the graphical characters plus the space
[[:cntrl:]]
: control characters, such as \n or \r
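A few of these classes can be tried on a single string. The sketch below uses base R's `regmatches`, `gregexpr`, `gsub`, and `grepl`; the sample text and variable names are made up for illustration.

```r
x <- "Order #42: 3 apples, 0x1F bags!"

# Extract every run of digits.
digits <- regmatches(x, gregexpr("[[:digit:]]+", x))[[1]]
print(digits)  # "42" "3" "0" "1"

# Extract every run of word characters (letters, digits, underscore).
words <- regmatches(x, gregexpr("\\w+", x))[[1]]
print(words)  # "Order" "42" "3" "apples" "0x1F" "bags"

# Remove all punctuation characters.
no_punct <- gsub("[[:punct:]]", "", x)
print(no_punct)  # "Order 42 3 apples 0x1F bags"

# Check whether a string consists only of hexadecimal digits.
is_hex <- grepl("^[[:xdigit:]]+$", c("1F", "G7"))
print(is_hex)  # TRUE FALSE
```

The POSIX classes such as `[[:alpha:]]` are generally a safer choice than hand-written ranges, because how a range like `[a-z]` is interpreted can depend on the locale.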
Continue to Part 3.