R: Basic String-Processing Functions
1. nchar(x): returns the number of characters in the string x, or in each element when x is a character vector.
> nchar("I love you!")
[1] 11
> nchar(c("I", "love", "you", "!"))
[1] 1 4 3 1
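nchar() is easy to confuse with length(). The following is a minimal sketch with made-up example values that contrasts the two and shows what happens with numeric and missing input.

```r
# nchar() counts characters; length() counts vector elements.
s <- "I love you!"
words <- c("I", "love", "you", "!")

nchar(s)        # 11: characters in the single string
length(s)       # 1: s is a vector with one element
nchar(words)    # 1 4 3 1: one count per element
length(words)   # 4

# Non-character input is coerced to character first.
nchar(12345)    # 5

# A missing string gives NA rather than a count.
nchar(NA_character_)  # NA
```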
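2. grep(pattern, x): returns the indices of the elements of the character vector x that contain a match for pattern. It reports which elements match, not where inside a string the match occurs.
> grep("y", "I love you!")
[1] 1
> a <- c("I", "love", "you", "!")
> grep("y", a)
[1] 3
> grep("k", a)
integer(0)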
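Two variations are often useful alongside grep(): returning the matching elements themselves, and getting one TRUE/FALSE per element. The sketch below also shows that pattern is treated as a regular expression unless fixed = TRUE is given. The vector files is hypothetical example data.

```r
a <- c("I", "love", "you", "!")

grep("o", a)                 # 2 3: indices of matching elements
grep("o", a, value = TRUE)   # "love" "you": the matching elements themselves
grepl("o", a)                # FALSE TRUE TRUE FALSE: one logical per element

# pattern is a regular expression by default, so "." matches any character.
files <- c("a.txt", "btxt", "c.csv")   # hypothetical example data
grep(".txt", files)                    # 1 2: "btxt" matches as well
grep(".txt", files, fixed = TRUE)      # 1: literal ".txt" only
```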
3. paste(..., sep = " "): concatenates its arguments into strings, separated by sep (a single space by default).
> paste("I", "love", "you", "!")
[1] "I love you !"
> a <- c("I", "love", "you", "!")
> a
[1] "I"    "love" "you"  "!"
> paste(a, 1:4)
[1] "I 1"    "love 2" "you 3"  "! 4"
> paste(a, 1:4, sep="-")
[1] "I-1"    "love-2" "you-3"  "!-4"
> paste("Today is","Sat Jan 11 2020")
[1] "Today is Sat Jan 11 2020"
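paste() also takes a collapse argument, which the examples above do not use. As a small sketch: sep joins the arguments element by element, while collapse joins the resulting vector into a single string.

```r
a <- c("I", "love", "you", "!")

# sep joins the arguments element by element; the result is still a vector.
paste(a, 1:4, sep = "-")                  # "I-1" "love-2" "you-3" "!-4"

# collapse then joins the elements of that vector into one string.
paste(a, collapse = " ")                  # "I love you !"
paste(a, 1:4, sep = "-", collapse = "+")  # "I-1+love-2+you-3+!-4"

# Shorter arguments are recycled to the length of the longest one.
paste("item", 1:3)                        # "item 1" "item 2" "item 3"
```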
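4. paste0(...): concatenates its arguments with no separator, equivalent to paste(..., sep = ""). paste0 has no sep argument; a sep value passed to it is treated as one more string to concatenate, as the sep="--" call below shows. When the arguments differ in length, the shorter one is recycled, as in the last example.
> paste0("I", "love", "you", "!")
[1] "Iloveyou!"
> a <- c("I", "love", "you", "!")
> a
[1] "I"    "love" "you"  "!"
> paste0(a, 1:4)
[1] "I1"    "love2" "you3"  "!4"
> paste0(a, 1:4, sep="--")
[1] "I1--"    "love2--" "you3--"  "!4--"
> b <- c("甲","乙","丙","丁","戊","己","庚","辛","壬","癸")
> d <- c("子","丑","寅","卯","辰","巳","午","未","申","酉","戌","亥")
> paste0(b, d)
 [1] "甲子" "乙丑" "丙寅" "丁卯" "戊辰" "己巳" "庚午"
 [8] "辛未" "壬申" "癸酉" "甲戌" "乙亥"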
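The behaviour of sep inside paste0() is a frequent source of confusion, so here is a short sketch comparing it with paste(). It relies only on base R.

```r
a <- c("I", "love", "you", "!")

# paste0(...) gives the same result as paste(..., sep = "").
identical(paste0(a, 1:4), paste(a, 1:4, sep = ""))   # TRUE

# In paste0, sep = "--" is not a separator; it is appended as another string.
paste0(a, 1:4, sep = "--")       # "I1--" "love2--" "you3--" "!4--"
paste(a, 1:4, sep = "--")        # "I--1" "love--2" "you--3" "!--4"

# collapse is supported and joins the results into one string.
paste0(a, collapse = "")         # "Iloveyou!"
```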
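5. sprintf(fmt, ...): builds a string by substituting the remaining arguments into the format string fmt. The %d specifier expects an integer value; a double is accepted only when it is a whole number, as in the examples here.
> a <- 11
> sprintf("The square of %d is %d", a, a^2)
[1] "The square of 11 is 121"
> sprintf("The square root of %d is %d", a^2, (a^2)^0.5)
[1] "The square root of 121 is 11"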
This is similar to string formatting in Python.
Example:
a = 11
print('The square of %d is %d' % (a, a**2))
print('The square root of {} is {}'.format(a**2, a))
The square of 11 is 121
The square root of 121 is 11
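Back in R, a few more format specifiers are worth knowing. The following is a sketch with made-up values (the name "Tom" is purely illustrative) covering %s, %f, padding, vectorization, and the error raised when %d receives a non-whole number.

```r
# %s inserts a string, %d an integer, %f a floating-point number.
sprintf("%s is %d years old", "Tom", 30L)   # "Tom is 30 years old"

# Precision of 2 decimal places; zero-padding to width 5.
sprintf("%.2f", pi)                         # "3.14"
sprintf("%05d", 42L)                        # "00042"

# sprintf is vectorized over its arguments.
sprintf("%s-%02d", c("a", "b"), 1:2)        # "a-01" "b-02"

# %d accepts a double only if it is a whole number; otherwise it is an error.
res <- tryCatch(sprintf("%d", 1.5), error = function(e) "error")
res                                         # "error"

# A literal percent sign is written as %%.
sprintf("%d%%", 50L)                        # "50%"
```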
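6. substr(x, start, stop): extracts the substring of x from position start to position stop, both inclusive and counted from 1. It can also appear on the left of an assignment to replace that part of the string.
This corresponds to MID() in Excel and to slicing in Python, although MID() takes a length rather than an end position, and Python slices are 0-based with an exclusive end.
Example:
> a <- paste0(letters[1:7], collapse="")
> a
[1] "abcdefg"
> substr(a, 1, 3)
[1] "abc"
> substr(a, 1, 3) <- "aaa"
> a
[1] "aaadefg"
> b <- c("1a","2bb", "3ccc", "4dddd" )
> substr(b, 1, 2)
[1] "1a" "2b" "3c" "4d"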
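Some related idioms, sketched under the assumption that only base R is available: taking characters from the end of a string, the vectorized substring() variant, and the rule that replacement through substr() never changes the length of the string.

```r
x <- "abcdefg"

# Last two characters, computed from the string length.
n <- nchar(x)
substr(x, n - 1, n)            # "fg"

# substring() defaults to the end of the string when stop is omitted.
substring(x, 5)                # "efg"

# substring() is vectorized over start and stop.
substring(x, 1:3, 3:5)         # "abc" "bcd" "cde"

# Replacement never changes the length of the string.
y <- x
substr(y, 1, 3) <- "Z"         # only one character is replaced
y                              # "Zbcdefg"
z <- x
substr(z, 1, 3) <- "ZZZZZ"     # extra replacement characters are dropped
z                              # "ZZZdefg"
```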
7. strsplit(x, split): splits each element of x at the matches of split and returns the pieces as a list, with one character vector per element of x.
This corresponds to s.split(split) in Python.
Example:
> a <- paste(letters[1:7], collapse="_")
> a
[1] "a_b_c_d_e_f_g"
> strsplit(a, "_")
[[1]]
[1] "a" "b" "c" "d" "e" "f" "g"

> b <- paste0(letters[1:7], 1:7, collapse="_")
> b
[1] "a1_b2_c3_d4_e5_f6_g7"
> strsplit(b, "_")
[[1]]
[1] "a1" "b2" "c3" "d4" "e5" "f6" "g7"

> d <- paste0(c(2020, 01, 10), collapse="/")
> d
[1] "2020/1/10"
> strsplit(d, "/")
[[1]]
[1] "2020" "1"    "10"

> # convert the list to a character vector
> unlist(strsplit(d, "/"))
[1] "2020" "1"    "10"
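Two points the examples above do not cover: split is interpreted as a regular expression by default, and a vector input yields one list element per string. The sketch below illustrates both; dates is hypothetical example data.

```r
# split is a regular expression by default; use fixed = TRUE for a literal ".".
strsplit("a.b.c", ".", fixed = TRUE)[[1]]    # "a" "b" "c"
strsplit("a.b.c", "[.]")[[1]]                # "a" "b" "c"

# An empty split breaks a string into single characters.
strsplit("abc", "")[[1]]                     # "a" "b" "c"

# With a vector input, the result has one list element per input string.
dates <- c("2020/1/10", "2021/12/5")         # hypothetical example data
parts <- strsplit(dates, "/")
length(parts)                                # 2
sapply(parts, `[`, 1)                        # "2020" "2021"
as.integer(parts[[2]])                       # 2021 12 5
```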
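8. regexpr(pattern, x): searches the string x for pattern and returns the starting position of the first match, with the length of the match attached as the match.length attribute.
> a <- "I love you!"
> regexpr("y", a)
[1] 8
attr(,"match.length")
[1] 1
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
The first "y" in a starts at position 8, and the match is 1 character long.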
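The position and length reported by regexpr() can be fed to substr() to pull out the matched text, or passed to regmatches(), which does the same in one step. A minimal sketch, using the same string as above:

```r
a <- "I love you!"

# The value is the start position; -1 means no match.
as.integer(regexpr("y", a))     # 8
as.integer(regexpr("z", a))     # -1

# Start position plus match.length locates the matched text for substr().
m <- regexpr("[[:lower:]]+", a)           # first run of lowercase letters
start <- as.integer(m)                    # 3
len <- attr(m, "match.length")            # 4
substr(a, start, start + len - 1)         # "love"

# regmatches() does the same extraction in one step.
regmatches(a, m)                          # "love"
```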
9. gregexpr(pattern, x): finds all matches of pattern in x and returns their starting positions and lengths, as a list with one element per string in x.
> a <- "I love you!"
> b <- "You love me!"
> paste(a, b)
[1] "I love you! You love me!"
> gregexpr("v", paste(a, b))
[[1]]
[1]  5 19
attr(,"match.length")
[1] 1 1
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE

"v" occurs twice in paste(a, b), at positions 5 and 19.
Recommended reading:
http://blog.sina.com.cn/s/blog_69ffa1f90101sie9.html
https://www.cnblogs.com/awishfullyway/p/6601539.html
https://blog.csdn.net/yj1556492839/article/details/82725315