没有所谓的捷径
一切都是时间最平凡的累积

python正则不再使用re.compile()先编译表达式

本文最后更新于2019年8月27日,已超过83天没有更新,如果文章内容失效,请反馈给我们,谢谢!

以前从网上学习很多教程都在使用正则match,search,findall前先编译表达式

import re
pattern = re.compile('正则表达式')
text = '字符串'
result = pattern.findall(text)

今天从一个大神文章中学习到,原来完全没必要,于是去看了下re模块源码,果然如此:

def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)

def fullmatch(pattern, string, flags=0):
    """Try to apply the pattern to all of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).fullmatch(string)

def search(pattern, string, flags=0):
    """Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).search(string)

def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)

def subn(pattern, repl, string, count=0, flags=0):
    """Return a 2-tuple containing (new_string, number).
    new_string is the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in the source
    string by the replacement repl.  number is the number of
    substitutions that were made. repl can be either a string or a
    callable; if a string, backslash escapes in it are processed.
    If it is a callable, it's passed the match object and must
    return a replacement string to be used."""
    return _compile(pattern, flags).subn(repl, string, count)

def split(pattern, string, maxsplit=0, flags=0):
    """Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings.  If
    capturing parentheses are used in pattern, then the text of all
    groups in the pattern are also returned as part of the resulting
    list.  If maxsplit is nonzero, at most maxsplit splits occur,
    and the remainder of the string is returned as the final element
    of the list."""
    return _compile(pattern, flags).split(string, maxsplit)

def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string.

    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result."""
    return _compile(pattern, flags).findall(string)

def finditer(pattern, string, flags=0):
    """Return an iterator over all non-overlapping matches in the
    string.  For each match, the iterator returns a match object.

    Empty matches are included in the result."""
    return _compile(pattern, flags).finditer(string)

def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a pattern object."
    return _compile(pattern, flags)

我们常用的match,search,sub,findall等等方法都是自带了__compile,

所以不用画蛇添足的先去编译一次表达式了,那如何是在迭代中使用正则呢?是否就每次都编译一次,是不是影响速度?

比如:

lists = [列表]
for text in lists:
    re.findall('正则表达式', text)

看起像是每次迭代都编译了表达式

lists = [列表]
pattern = re.compile('正则表达式')
for text in lists:
    pattern.findall(text)

而这样只编译了一次,看起似乎有些道理,其实不然,来看看compile源码:

def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a pattern object."
    return _compile(pattern, flags)
_cache = {}

_pattern_type = type(sre_compile.compile("", 0))

_MAXCACHE = 512
def _compile(pattern, flags):
    # internal: compile pattern
    try:
        p, loc = _cache[type(pattern), pattern, flags]
        if loc is None or loc == _locale.setlocale(_locale.LC_CTYPE):
            return p
    except KeyError:
        pass
    if isinstance(pattern, _pattern_type):
        if flags:
            raise ValueError(
                "cannot process flags argument with a compiled pattern")
        return pattern
    if not sre_compile.isstring(pattern):
        raise TypeError("first argument must be string or compiled pattern")
    p = sre_compile.compile(pattern, flags)
    if not (flags & DEBUG):
        if len(_cache) >= _MAXCACHE:
            _cache.clear()
        if p.flags & LOCALE:
            if not _locale:
                return p
            loc = _locale.setlocale(_locale.LC_CTYPE)
        else:
            loc = None
        _cache[type(pattern), pattern, flags] = p, loc
    return p

_compile方法会储存512条由type(pattern), pattern, flags)组成的Key,如果是同一表达式,再次调用_compile方法时就读取的缓存,不会多次编译.

 

赞(0) 打赏
声明:本站发布的内容(图片、视频和文字)以原创、转载和分享网络内容为主,若涉及侵权请及时告知,将会在第一时间删除,联系邮箱:lwarm@qq.com。文章观点不代表本站立场。本站原创内容未经允许不得转载,或转载时需注明出处:红岩子 » python正则不再使用re.compile()先编译表达式
分享到: 更多 (0)

评论 抢沙发

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址

今天所做的努力都是在为明天积蓄力量

联系我们赞助我们