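R Language Study Notes (21): Metacharacters in String Processing
Metacharacters carry a special meaning of their own in a regular expression instead of matching themselves literally.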
states <- row.names(USArrests)
# Extract substrings
substr(x = states, start = 1, stop = 4)
abbreviate(states, minlength = 5)
# Count the characters in each string
state_chars <- nchar(states)
# String matching
grep(pattern = "w", x = states, value = TRUE)
# Match both upper and lower case
[ ]
Any one of the characters listed inside the brackets will be matched.
grep(pattern = "[wW]", x = states, value = TRUE)
grep(pattern = "w", ignore.case = TRUE, x = states, value = TRUE)
tolower(states) # Convert to lower case
toupper(states) # Convert to upper case
library("stringr")
# Count the occurrences of "a" in each string
str_count(states, "a")
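The two approaches above, a character class and ignore.case, can be checked against each other on a small vector. This is a minimal sketch; sample_states is a hypothetical hand-picked vector used only for illustration, not the USArrests row names from the notes.

```r
# Hypothetical sample; the original notes use row.names(USArrests)
sample_states <- c("Hawaii", "Iowa", "Washington", "West Virginia", "Texas")

# Character class: match either "w" or "W"
by_class <- grep(pattern = "[wW]", x = sample_states, value = TRUE)

# Case-insensitive match of a single "w"
by_flag <- grep(pattern = "w", x = sample_states, ignore.case = TRUE, value = TRUE)

print(by_class)
# Both approaches select the same elements
print(identical(by_class, by_flag))
```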
\
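Escapes a metacharacter so that it is matched literally. Inside an R string literal the backslash itself has to be doubled, so a literal dot is written "\\.".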
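# An unescaped "." matches every character, so only empty strings remain
strsplit("strsplit.also.uses", split = ".")
# Escaped, the dot is literal and the string is split at each "."
strsplit("strsplit.also.uses", split = "\\.")
# "\\d" matches a single digit
str_extract_all("me credit card: 334", pattern = "\\d")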
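Besides a backslash escape, base R offers two other common ways to treat the dot literally: putting it in a character class, or passing fixed = TRUE to strsplit. The sketch below compares the three on the same input; the variable names are illustrative only.

```r
x <- "strsplit.also.uses"

# Three ways to split on a literal dot
by_escape <- strsplit(x, split = "\\.")[[1]]             # backslash escape
by_class  <- strsplit(x, split = "[.]")[[1]]             # dot inside a character class
by_fixed  <- strsplit(x, split = ".", fixed = TRUE)[[1]] # no regex interpretation at all

print(by_escape)

# The R string "\\." holds two characters: a backslash and a dot
print(nchar("\\."))
```

fixed = TRUE is usually the simplest choice when the separator is a plain string and no regular expression features are needed.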
^
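Matches the start of a string. Placed as the first character inside a character class, it negates the class instead: [^5] matches any character except "5".
test_vector <- c("123", "456", "321")
# Every "3", wherever it occurs
str_extract_all(test_vector, "3")
# Only a "3" at the start of the string
str_extract_all(test_vector, "^3")
# Every character that is not "3"
str_extract_all(test_vector, "[^3]")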
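The two roles of ^ can also be seen with base R functions. A minimal sketch, reusing the same three strings; grepl and gsub stand in here for the stringr calls above.

```r
test_vector <- c("123", "456", "321")

# As an anchor: TRUE only where the string starts with "3"
starts_with_3 <- grepl("^3", test_vector)
print(starts_with_3)

# As a negation inside a class: delete every character that is not "3"
only_threes <- gsub("[^3]", "", test_vector)
print(only_threes)
```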
$
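Matches the end of a string. Inside a character class it loses its special meaning: [akm$] matches "a", "k", "m" or "$".
# Only a "3" at the end of the string
str_extract_all(test_vector, "3$")
# Every "3" or literal "$"
str_extract_all(test_vector, "[3$]")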
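To make the literal reading of $ inside a class visible, the input needs a string that actually contains a dollar sign. The sketch below assumes a hypothetical vector, labels, for that purpose.

```r
test_vector <- c("123", "456", "321")

# As an anchor: TRUE only where the string ends with "3"
ends_with_3 <- grepl("3$", test_vector)
print(ends_with_3)

# Hypothetical input containing a literal dollar sign
labels <- c("cost$", "123", "456")

# Inside a class, "$" is an ordinary character: match "3" or "$"
has_3_or_dollar <- grepl("[3$]", labels)
print(has_3_or_dollar)
```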
.
Matches any character except a newline.
str_extract_all(string = c("regular.exp\n", "\n"), pattern = ".")
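The newline exception can be made visible by replacing every match of "." with a marker. This is a sketch in base R; perl = TRUE is an assumption made here so that the PCRE engine is used, since base R's default engine may treat the newline differently from stringr.

```r
x <- "ab\ncd"

# Replace every character matched by "." with "X" (PCRE engine)
masked <- gsub(".", "X", x, perl = TRUE)

# The newline is left untouched because "." does not match it
print(masked == "XX\nXX")
```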
|
Alternation ("or"): matches the expression on either side of the bar.
str_extract_all(string = "we23", pattern = "b|w|3")
?
The preceding character (or group) is optional and is matched at most once.
str_extract_all(string = c("abc", "bc", "ac"), pattern = "ab?c")
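"At most once" means a doubled "b" is not accepted. A small sketch with grepl; the extra element "abbc" is added here for illustration and is not in the original example.

```r
# "abbc" is an extra, illustrative element
candidates <- c("abc", "bc", "ac", "abbc")

# "b" may appear zero times or once between "a" and "c"
matches <- grepl("ab?c", candidates)
print(matches)
```

"bc" fails because the "a" is still required, and "abbc" fails because two "b"s exceed the single optional occurrence.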
( )
Defines a group: the string inside the parentheses is matched as a single unit.
str_extract_all(string = c("abc", "ac", "cde"), pattern = "(ab)c")
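Grouping matters most when a quantifier follows, because the quantifier then applies to the whole group and not only to the last character. The sketch below contrasts "(ab)?c" with "ab?c", anchored with ^ and $ so that the entire string must match; the input vector is illustrative.

```r
candidates <- c("abc", "c", "ac")

# "?" applies to the whole group "ab": either "abc" or "c"
group_optional <- grepl("^(ab)?c$", candidates)
print(group_optional)

# "?" applies only to "b": either "abc" or "ac"
char_optional <- grepl("^ab?c$", candidates)
print(char_optional)
```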
*
The preceding character (or group) is matched zero or more times.
str_extract_all(string = c("abab", "abc", "ac"), pattern = "(ab)*")
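Because zero repetitions are allowed, a starred pattern can succeed without consuming any characters. A minimal base R sketch of that consequence; the element "xyz" is an illustrative addition.

```r
candidates <- c("abab", "abc", "ac", "xyz")

# Every string matches, since "(ab)*" is satisfied by zero repetitions
always_matches <- grepl("(ab)*", candidates)
print(always_matches)

# Strip any leading run of "ab"; strings without one are unchanged
stripped <- sub("^(ab)*", "", candidates)
print(stripped)
```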
+
The preceding character (or group) is matched one or more times.
str_extract_all(string = c("abbab", "abc", "ac"), pattern = "ab+")
{n,m}
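Repeats the preceding character (or group) between n and m times. {n} means exactly n times, and {n,} means at least n times.
# Exactly two repetitions
str_extract_all(string = c("abababab", "ababc", "abc"), pattern = "(ab){2}")
# Two or more repetitions
str_extract_all(string = c("abababab", "ababc", "abc"), pattern = "(ab){2,}")
# Between two and three repetitions
str_extract_all(string = c("abababab", "ababc", "abc"), pattern = "(ab){2,3}")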
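The three forms of the counted quantifier are easiest to tell apart when the pattern is anchored at both ends, so that the whole string has to consist of the repeated group. A sketch with grepl on an illustrative vector:

```r
candidates <- c("ab", "abab", "ababab", "abababab")

# Exactly two repetitions of "ab"
exactly_two <- grepl("^(ab){2}$", candidates)
print(exactly_two)

# Two or more repetitions
at_least_two <- grepl("^(ab){2,}$", candidates)
print(at_least_two)

# Between two and three repetitions
two_to_three <- grepl("^(ab){2,3}$", candidates)
print(two_to_three)
```

Without the anchors, a longer string such as "abababab" would still match "(ab){2,3}", because a substring of it satisfies the pattern.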