Python Regular Expressions: Lookahead, Lookbehind, and Negative Lookahead Explained

sunskyday

2017-11-08

Introduction

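Often the things we want to match either appear together or do not appear at all. Paired brackets such as 《》 or < > are an example: using just one half is never valid. Regular expressions can enforce this kind of consistency, so that either both angle brackets are present or neither is. The syntax that makes this possible is the lookahead assertion (?=pattern). It looks ahead from the current position and checks whether pattern matches there; if it does not, the overall match fails at that position. The assertion consumes no input: it only peeks at what follows, and matching then continues from the same position. (Looking backward uses a separate syntax, the lookbehind assertion (?<=pattern).)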
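The "consumes no input" property can be seen in a minimal sketch. The strings 'foobar' and 'foobaz' are arbitrary sample inputs chosen for illustration, not taken from the article.

```python
import re

# (?=bar) checks that "bar" follows "foo", but "bar" is not part of the match.
m = re.search(r'foo(?=bar)', 'foobar')
print(m.group())  # foo
print(m.end())    # 3, the match stops before "bar"

# Because nothing was consumed, the pattern can go on to match "bar" itself.
print(re.search(r'foo(?=bar)bar', 'foobar').group())  # foobar

# If the text ahead does not match, the whole match fails.
print(re.search(r'foo(?=bar)', 'foobaz'))  # None
```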

Here is an example:

# python 3.6
# 蔡军生
# http://blog.csdn.net/caimouse/article/details/51749579
#
import re

address = re.compile(
    r'''
    # A name is made up of letters, and may include "."
    # for title abbreviations and middle initials.
    ((?P<name>
       ([\w.,]+\s+)*[\w.,]+
     )
     \s+
    ) # name is no longer optional

    # LOOKAHEAD
    # Email addresses are wrapped in angle brackets, but only
    # if both are present or neither is.
    (?= (<.*>$)       # remainder wrapped in angle brackets
      |
      ([^<].*[^>]$)   # remainder *not* wrapped in angle brackets
    )

    <? # optional opening angle bracket

    # The address itself: [email protected]
    (?P<email>
      [\w\d.+-]+      # username
      @
      ([\w\d.]+\.)+   # domain name prefix
      (com|org|edu)   # limit the allowed top-level domains
    )

    >? # optional closing angle bracket
    ''',
    re.VERBOSE)

candidates = [
    u'First Last <[email protected]>',
    u'No Brackets [email protected]',
    u'Open Bracket <[email protected]',
    u'Close Bracket [email protected]>',
]

for candidate in candidates:
    print('Candidate:', candidate)
    match = address.search(candidate)
    if match:
        print(' Name :', match.groupdict()['name'])
        print(' Email:', match.groupdict()['email'])
    else:
        print(' No match')

The output is as follows:

Candidate: First Last <[email protected]>
 Name : First Last
 Email: [email protected]
Candidate: No Brackets [email protected]
 Name : No Brackets
 Email: [email protected]
Candidate: Open Bracket <[email protected]
 No match
Candidate: Close Bracket [email protected]>
 No match
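The addresses in the listing above were obscured when the page was captured, so the following is a compact, self-contained sketch of the same lookahead idea. The example.com addresses are placeholders assumed for illustration, and the names ADDRESS and parse_address are hypothetical, not part of the original code.

```python
import re

ADDRESS = re.compile(
    r'''
    (?P<name>([\w.,]+\s+)*[\w.,]+)   # display name
    \s+
    # Lookahead: the remainder is either fully wrapped in angle
    # brackets, or has neither an opening nor a closing bracket.
    (?= (<.*>$) | ([^<].*[^>]$) )
    <?                               # optional opening angle bracket
    (?P<email>[\w.+-]+@([\w.]+\.)+(com|org|edu))
    >?                               # optional closing angle bracket
    ''',
    re.VERBOSE)


def parse_address(text):
    """Return (name, email) if text matches, otherwise None."""
    match = ADDRESS.search(text)
    if match is None:
        return None
    return match.group('name'), match.group('email')


# Placeholder inputs (assumed); only the first two have balanced brackets.
for candidate in [
    'First Last <first.last@example.com>',
    'No Brackets first.last@example.com',
    'Open Bracket <first.last@example.com',
    'Close Bracket first.last@example.com>',
]:
    print(candidate, '->', parse_address(candidate))
```

Without the lookahead, the optional <? and >? would each match independently, so an address with only one bracket would still be accepted.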

Using the negative lookahead pattern in Python regular expressions

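The previous section covered the lookahead pattern (?=pattern). The equals sign in it means that the text ahead must match. Lookahead also has a negated form. Suppose you need to recognize the email address [email protected]. Mail from such an address usually does not expect a reply, so we want to detect it and discard it. This is what the negative lookahead is for. Its syntax is (?!pattern), where the exclamation mark means "not". Given a string such as [email protected], the assertion checks whether the text at the current position matches noreply@; if it does, the match attempt is abandoned and the address is not matched.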

Here is an example:

# python 3.6
# 蔡军生
# http://blog.csdn.net/caimouse/article/details/51749579
#
import re

address = re.compile(
    r'''
    ^

    # An address: [email protected]

    # Ignore noreply addresses
    (?!noreply@.*$)

    [\w\d.+-]+      # username
    @
    ([\w\d.]+\.)+   # domain name prefix
    (com|org|edu)   # limit the allowed top-level domains

    $
    ''',
    re.VERBOSE)

candidates = [
    u'[email protected]',
    u'[email protected]',
]

for candidate in candidates:
    print('Candidate:', candidate)
    match = address.search(candidate)
    if match:
        print(' Match:', candidate[match.start():match.end()])
    else:
        print(' No match')

The output is as follows:

Candidate: [email protected]
 Match: [email protected]
Candidate: [email protected]
 No match
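Since the addresses in the listing above were obscured when the page was captured, here is a small self-contained sketch of the negative lookahead. The example.com addresses are placeholders assumed for illustration, and ADDRESS and is_accepted are hypothetical names.

```python
import re

ADDRESS = re.compile(
    r'''
    ^
    (?!noreply@.*$)   # fail if the address starts with "noreply@"
    [\w.+-]+          # username
    @
    ([\w.]+\.)+       # domain name prefix
    (com|org|edu)     # limit the allowed top-level domains
    $
    ''',
    re.VERBOSE)


def is_accepted(address):
    """Return True if address is well formed and is not a noreply address."""
    return ADDRESS.search(address) is not None


print(is_accepted('first.last@example.com'))   # True
print(is_accepted('noreply@example.com'))      # False
# Only the exact user name "noreply" is rejected by this pattern.
print(is_accepted('noreply.bot@example.com'))  # True
```

The lookahead is placed right after ^, so it is tested at the start of the string before any part of the address is consumed.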

