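Python's itertools Module
I. Infinite iterators
1. itertools.count(start=0, step=1)
Creates an iterator that returns evenly spaced values, starting at start and advancing by step each time. It is roughly equivalent to: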
    def count(start=0, step=1):
        # count(10) --> 10 11 12 13 14 ...
        # count(2.5, 0.5) -> 2.5 3.0 3.5 ...
        n = start
        while True:
            yield n
            n += step
An example:
    from itertools import count
    import time

    # Runs forever; interrupt it with Ctrl+C.
    for i in count(10):
        time.sleep(2)
        print(i)  # 10, 11, 12, ...
Here count(10) is an object of type itertools.count. It is often used as an argument to map or zip.
For example:
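    from itertools import count

    # With map: the result is lazy, so nothing is computed until it is iterated
    map(lambda x: x * 2, count(5))

    # With zip: iteration stops when the shorter input, 'xy', is exhausted
    a = zip(count(10), 'xy')
    print(list(a))  # [(10, 'x'), (11, 'y')]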
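2. itertools.cycle(iterable)
Creates an iterator that returns elements from the iterable and saves a copy of each one. When the iterable is exhausted, it returns elements from the saved copy, repeating indefinitely. It is roughly equivalent to: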
    def cycle(iterable):
        # cycle('ABCD') --> A B C D A B C D A B C D ...
        saved = []
        for element in iterable:
            yield element
            saved.append(element)
        while saved:
            for element in saved:
                yield element
An example:
    from itertools import cycle

    print(cycle('ABCDE'))  # <itertools.cycle object at 0x0000000000649448>
    # Runs forever; interrupt it with Ctrl+C.
    for item in cycle('ABCDE'):
        print(item)  # A, B, C, D, E, A, B, C, D, E, ...
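3. itertools.repeat(object[, times])
Creates an iterator that returns object over and over again. It runs indefinitely unless the times argument is specified. It is roughly equivalent to: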
    def repeat(object, times=None):
        # repeat(10, 3) --> 10 10 10
        if times is None:
            while True:
                yield object
        else:
            for i in range(times):
                yield object
It can be used to supply a constant argument to map and zip:
    In [1]: from itertools import repeat

    In [2]: list(map(pow, range(10), repeat(2)))
    Out[2]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

    In [3]: list(zip(range(5), repeat(10)))
    Out[3]: [(0, 10), (1, 10), (2, 10), (3, 10), (4, 10)]
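II. Iterators terminating on the shortest input sequence
1. itertools.accumulate(iterable[, func])
Creates an iterator that returns accumulated sums, or the accumulated results of another binary function specified through func. If func is supplied, it should be a function of two arguments, and the elements of the input iterable may be of any type that func accepts. If the input iterable is empty, the output iterable is also empty. It is roughly equivalent to: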
    import operator

    def accumulate(iterable, func=operator.add):
        'Return running totals'
        # accumulate([1,2,3,4,5]) --> 1 3 6 10 15
        # accumulate([1,2,3,4,5], operator.mul) --> 1 2 6 24 120
        it = iter(iterable)
        try:
            total = next(it)
        except StopIteration:
            return
        yield total
        for element in it:
            total = func(total, element)
            yield total
An example:
    from itertools import accumulate

    print(accumulate([1, 2, 3]))  # <itertools.accumulate object at 0x00000000006E9448>
    print(list(accumulate([1, 2, 3])))  # [1, 3, 6]
    print(list(accumulate([1, 2, 3], lambda x, y: x * y)))  # [1, 2, 6]
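2. itertools.chain(*iterables)
Creates an iterator that returns elements from the first iterable until it is exhausted, then proceeds to the next iterable, until all of the iterables are exhausted. It is used for treating consecutive sequences as a single sequence. It is roughly equivalent to: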
    def chain(*iterables):
        # chain('ABC', 'DEF') --> A B C D E F
        for it in iterables:
            for element in it:
                yield element
An example:
    from itertools import chain

    print(list(chain([1, 2, 3], [5, 6, 7])))  # [1, 2, 3, 5, 6, 7]
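3. classmethod chain.from_iterable(iterable)
An alternate constructor for chain. It takes the chained inputs from a single iterable argument, which is evaluated lazily. It is roughly equivalent to: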
    def from_iterable(iterables):
        # chain.from_iterable(['ABC', 'DEF']) --> A B C D E F
        for it in iterables:
            for element in it:
                yield element
An example:
    from itertools import chain

    print(list(chain.from_iterable([[1, 2, 3], [5, 6, 7]])))  # [1, 2, 3, 5, 6, 7]
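4. itertools.compress(data, selectors)
Creates an iterator that filters elements from data, returning only those whose corresponding element in selectors evaluates to True. It stops when either data or selectors is exhausted. It is roughly equivalent to: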
    def compress(data, selectors):
        # compress('ABCDEF', [1,0,1,0,1,1]) --> A C E F
        return (d for d, s in zip(data, selectors) if s)
An example:
    from itertools import compress

    data = [1, 2, 3, 4]
    selectors = [1, 0, 1, 0]
    filter_data = compress(data, selectors)
    print(filter_data)  # <itertools.compress object at 0x00000000009E5B00>
    print(list(filter_data))  # [1, 3]
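5. itertools.dropwhile(predicate, iterable)
Creates an iterator that drops elements from the iterable as long as predicate is true, and afterwards returns every remaining element. Note that the iterator produces no output until predicate first becomes false, so it may have a lengthy start-up time. It is roughly equivalent to: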
    def dropwhile(predicate, iterable):
        # dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1
        iterable = iter(iterable)
        for x in iterable:
            if not predicate(x):
                yield x
                break
        for x in iterable:
            yield x
An example:
    from itertools import dropwhile

    data = [1, 2, 3, 4, 5]
    result = dropwhile(lambda x: x < 3, data)
    print(result)  # <itertools.dropwhile object at 0x0000000000D5BD48>
    print(list(result))  # [3, 4, 5]
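One point that is easy to miss is that dropwhile only discards the leading run of matching elements. The short sketch below, using made-up sample data, contrasts it with filterfalse, which tests every element independently:

```python
from itertools import dropwhile, filterfalse

data = [1, 4, 6, 4, 1]  # illustrative sample data

# dropwhile stops testing once the predicate first fails,
# so the trailing 4 and 1 are kept even though they are < 5.
after_drop = list(dropwhile(lambda x: x < 5, data))

# filterfalse applies the predicate to every element.
filtered = list(filterfalse(lambda x: x < 5, data))

print(after_drop)  # [6, 4, 1]
print(filtered)    # [6]
```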
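6. itertools.filterfalse(predicate, iterable)
Creates an iterator that filters the iterable, returning only those elements for which predicate is false. If predicate is None, it returns the elements that are themselves false. It is roughly equivalent to: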
    def filterfalse(predicate, iterable):
        # filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8
        if predicate is None:
            predicate = bool
        for x in iterable:
            if not predicate(x):
                yield x
An example:
    from itertools import filterfalse

    data = [1, 2, 3, 4, 5]
    result = filterfalse(lambda x: x % 2, data)
    print(result)  # <itertools.filterfalse object at 0x0000000000675E10>
    print(list(result))  # [2, 4]
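7. itertools.groupby(iterable, key=None)
Creates an iterator that returns consecutive keys and groups from the iterable. key is a function that computes a key value for each element; if it is None, the element itself is used as the key. It is roughly equivalent to: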
    class groupby:
        # [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
        # [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
        def __init__(self, iterable, key=None):
            if key is None:
                key = lambda x: x
            self.keyfunc = key
            self.it = iter(iterable)
            self.tgtkey = self.currkey = self.currvalue = object()
        def __iter__(self):
            return self
        def __next__(self):
            while self.currkey == self.tgtkey:
                self.currvalue = next(self.it)    # Exit on StopIteration
                self.currkey = self.keyfunc(self.currvalue)
            self.tgtkey = self.currkey
            return (self.currkey, self._grouper(self.tgtkey))
        def _grouper(self, tgtkey):
            while self.currkey == tgtkey:
                yield self.currvalue
                try:
                    self.currvalue = next(self.it)
                except StopIteration:
                    return
                self.currkey = self.keyfunc(self.currvalue)
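Because groupby starts a new group every time the key value changes, the input usually needs to be sorted by the same key first. The sketch below illustrates this; the word list and the first_letter helper are made up for the example:

```python
from itertools import groupby

# Illustrative data; first_letter is a hypothetical helper for this example.
words = ["apple", "avocado", "banana", "blueberry", "cherry", "apricot"]

def first_letter(word):
    return word[0]

# Unsorted input: the trailing "apricot" opens a second "a" group.
unsorted_groups = [(k, list(g)) for k, g in groupby(words, key=first_letter)]

# Sorting by the same key first yields exactly one group per key.
sorted_groups = [
    (k, list(g))
    for k, g in groupby(sorted(words, key=first_letter), key=first_letter)
]

print(unsorted_groups)
print(sorted_groups)
```

Each group is itself an iterator that shares the underlying iterable with groupby, so it is converted to a list here before moving on to the next group.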
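8. itertools.islice(iterable, start, stop[, step])
Creates an iterator that returns selected elements from the iterable. It can also be called as islice(iterable, stop). If start is non-zero, elements of the iterable are skipped until start is reached. If stop is None, iteration continues until the iterable is exhausted. Unlike regular slicing, islice does not support negative values for start, stop, or step. It is roughly equivalent to: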
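    import sys

    def islice(iterable, *args):
        # islice('ABCDEFG', 2) --> A B
        # islice('ABCDEFG', 2, 4) --> C D
        # islice('ABCDEFG', 2, None) --> C D E F G
        # islice('ABCDEFG', 0, None, 2) --> A C E G
        s = slice(*args)
        it = iter(range(s.start or 0, s.stop or sys.maxsize, s.step or 1))
        try:
            nexti = next(it)
        except StopIteration:
            return
        for i, element in enumerate(iterable):
            if i == nexti:
                yield element
                try:
                    nexti = next(it)
                except StopIteration:
                    # All requested indices have been produced. Return explicitly:
                    # since Python 3.7 (PEP 479) a StopIteration escaping a
                    # generator body is turned into a RuntimeError.
                    return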
In particular, if start is None, iteration starts at 0; if step is None, the step defaults to 1.
An example:
    from itertools import islice

    data = [1, 2, 3, 4, 5, 6]
    result = islice(data, 1, 5)
    print(result)  # <itertools.islice object at 0x000000000A426EF8>
    print(list(result))  # [2, 3, 4, 5]
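islice is also a convenient way to take a finite piece of one of the infinite iterators from section I. A minimal sketch, with values chosen purely for illustration:

```python
from itertools import count, islice

# Take the first five values of an infinite arithmetic progression.
first_five = list(islice(count(10, 5), 5))

# start=2, stop=11, step=3 selects the elements at positions 2, 5 and 8.
every_third = list(islice(count(), 2, 11, 3))

print(first_five)   # [10, 15, 20, 25, 30]
print(every_third)  # [2, 5, 8]
```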
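9. itertools.starmap(function, iterable)
Creates an iterator that computes the function using arguments obtained from the iterable, where each item is already a tuple of arguments. The difference between map() and starmap() parallels the difference between function(a, b) and function(*c). It is roughly equivalent to: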
    def starmap(function, iterable):
        # starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000
        for args in iterable:
            yield function(*args)
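As a small illustration of that difference, the sketch below computes the same powers twice: once with starmap over pre-grouped argument tuples, and once with map over two parallel sequences. The sample values are taken from the comment in the code above.

```python
from itertools import starmap

# Arguments already grouped into tuples: use starmap.
pairs = [(2, 5), (3, 2), (10, 3)]
powers = list(starmap(pow, pairs))

# Arguments held in separate parallel sequences: use map.
same_powers = list(map(pow, [2, 3, 10], [5, 2, 3]))

print(powers)       # [32, 9, 1000]
print(same_powers)  # [32, 9, 1000]
```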
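10. itertools.takewhile(predicate, iterable)
Creates an iterator that returns elements from the iterable as long as predicate is true, and stops at the first element for which it is false. It is roughly equivalent to: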
    def takewhile(predicate, iterable):
        # takewhile(lambda x: x<5, [1,4,6,4,1]) --> 1 4
        for x in iterable:
            if predicate(x):
                yield x
            else:
                break
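A short usage sketch, with illustrative values: the first call mirrors the comment in the code above, and the second shows takewhile bounding an infinite count iterator.

```python
from itertools import count, takewhile

# Stops at the first element that fails the test (6); the later 4 and 1 are never reached.
leading = list(takewhile(lambda x: x < 5, [1, 4, 6, 4, 1]))

# Bounding an infinite iterator: 10, 13, 16, 19, and then 22 fails the test.
bounded = list(takewhile(lambda x: x < 20, count(10, 3)))

print(leading)  # [1, 4]
print(bounded)  # [10, 13, 16, 19]
```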
11. itertools.tee(iterable, n=2)
Returns n independent iterators from a single iterable. It is roughly equivalent to:
    import collections

    def tee(iterable, n=2):
        it = iter(iterable)
        deques = [collections.deque() for i in range(n)]
        def gen(mydeque):
            while True:
                if not mydeque:             # when the local deque is empty
                    try:
                        newval = next(it)   # fetch a new value and
                    except StopIteration:
                        return
                    for d in deques:        # load it to all the deques
                        d.append(newval)
                yield mydeque.popleft()
        return tuple(gen(d) for d in deques)
An example:
    from itertools import tee

    result = tee([1, 2, 3], 2)
    print(result)  # (<itertools._tee object at 0x0000000000669448>, <itertools._tee object at 0x00000000006CBD08>)
    for item in result:
        print(list(item))  # prints [1, 2, 3] twice, once per iterator
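12. itertools.zip_longest(*iterables, fillvalue=None)
Creates an iterator that aggregates elements from each of the iterables. If the iterables are of uneven length, missing values are filled in with fillvalue. Iteration continues until the longest iterable is exhausted. It is roughly equivalent to: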
    from itertools import chain, repeat

    class ZipExhausted(Exception):
        pass

    def zip_longest(*args, **kwds):
        # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
        fillvalue = kwds.get('fillvalue')
        counter = len(args) - 1
        def sentinel():
            nonlocal counter
            if not counter:
                raise ZipExhausted
            counter -= 1
            yield fillvalue
        fillers = repeat(fillvalue)
        iterators = [chain(it, sentinel(), fillers) for it in args]
        try:
            while iterators:
                yield tuple(map(next, iterators))
        except ZipExhausted:
            pass
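A brief usage sketch with illustrative inputs, showing both an explicit fillvalue and the default of None:

```python
from itertools import zip_longest

# The shorter input 'xy' is padded with the given fillvalue.
padded = list(zip_longest('ABCD', 'xy', fillvalue='-'))

# Without fillvalue, missing entries are filled with None.
default_fill = list(zip_longest([1, 2, 3], 'ab'))

print(padded)        # [('A', 'x'), ('B', 'y'), ('C', '-'), ('D', '-')]
print(default_fill)  # [(1, 'a'), (2, 'b'), (3, None)]
```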
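III. Combinatoric iterators
1. itertools.product(*iterables, repeat=1)
Computes the Cartesian product of the input iterables. It is roughly equivalent to nested for-loops in a generator expression; for example, product(A, B) returns the same as:
    ((x, y) for x in A for y in B)
It is roughly equivalent to: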
    def product(*args, repeat=1):
        # product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
        # product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
        pools = [tuple(pool) for pool in args] * repeat
        result = [[]]
        for pool in pools:
            result = [x+[y] for x in result for y in pool]
        for prod in result:
            yield tuple(prod)
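A short usage sketch with illustrative inputs. It also checks the nested-loop equivalence described above and shows the repeat argument, which takes the product of an iterable with itself:

```python
from itertools import product

pairs = list(product('AB', 'xy'))

# The same result written as nested for-loops in a comprehension.
nested = [(x, y) for x in 'AB' for y in 'xy']

# repeat=2 is shorthand for product(range(2), range(2)).
bits = list(product(range(2), repeat=2))

print(pairs)            # [('A', 'x'), ('A', 'y'), ('B', 'x'), ('B', 'y')]
print(pairs == nested)  # True
print(bits)             # [(0, 0), (0, 1), (1, 0), (1, 1)]
```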
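2. itertools.permutations(iterable, r=None)
Returns successive r-length permutations of the elements in the iterable. If r is omitted or None, it defaults to the length of the iterable, so all full-length permutations are generated. It is roughly equivalent to: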
    def permutations(iterable, r=None):
        # permutations('ABCD', 2) --> AB AC AD BA BC BD CA CB CD DA DB DC
        # permutations(range(3)) --> 012 021 102 120 201 210
        pool = tuple(iterable)
        n = len(pool)
        r = n if r is None else r
        if r > n:
            return
        indices = list(range(n))
        cycles = list(range(n, n-r, -1))
        yield tuple(pool[i] for i in indices[:r])
        while n:
            for i in reversed(range(r)):
                cycles[i] -= 1
                if cycles[i] == 0:
                    indices[i:] = indices[i+1:] + indices[i:i+1]
                    cycles[i] = n - i
                else:
                    j = cycles[i]
                    indices[i], indices[-j] = indices[-j], indices[i]
                    yield tuple(pool[i] for i in indices[:r])
                    break
            else:
                return
permutations can also be expressed in terms of product, filtered to exclude entries in which a position is repeated:
    from itertools import product

    def permutations(iterable, r=None):
        pool = tuple(iterable)
        n = len(pool)
        r = n if r is None else r
        for indices in product(range(n), repeat=r):
            # keep only index tuples in which no position is used twice
            if len(set(indices)) == r:
                yield tuple(pool[i] for i in indices)
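A short usage sketch with illustrative inputs. Note that elements are treated as unique based on their position, not their value, so repeated values in the input produce repeated tuples in the output:

```python
from itertools import permutations

two_of_three = list(permutations('ABC', 2))

# With r omitted, full-length permutations are generated.
full = [''.join(p) for p in permutations('ABC')]

# Position-based uniqueness: the two 'A's are treated as distinct elements.
duplicates = list(permutations('AA'))

print(two_of_three)
print(full)        # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
print(duplicates)  # [('A', 'A'), ('A', 'A')]
```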
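3. itertools.combinations(iterable, r)
Returns r-length subsequences of elements from the input iterable. Combinations are emitted in lexicographic order according to the order of the input, so if the input iterable is sorted, the combination tuples are produced in sorted order. It is roughly equivalent to: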
    def combinations(iterable, r):
        # combinations('ABCD', 2) --> AB AC AD BC BD CD
        # combinations(range(4), 3) --> 012 013 023 123
        pool = tuple(iterable)
        n = len(pool)
        if r > n:
            return
        indices = list(range(r))
        yield tuple(pool[i] for i in indices)
        while True:
            for i in reversed(range(r)):
                if indices[i] != i + n - r:
                    break
            else:
                return
            indices[i] += 1
            for j in range(i+1, r):
                indices[j] = indices[j-1] + 1
            yield tuple(pool[i] for i in indices)
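A short usage sketch, reusing the inputs from the comments in the code above. Unlike permutations, each selection appears only once regardless of order:

```python
from itertools import combinations

pairs = list(combinations('ABCD', 2))
triples = list(combinations(range(4), 3))

print(pairs)    # [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
print(triples)  # [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
```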
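4. itertools.combinations_with_replacement(iterable, r)
Returns r-length subsequences of elements from the input iterable, allowing individual elements to be repeated more than once. Combinations are emitted in lexicographic order, so if the input iterable is sorted, the combination tuples are produced in sorted order. It is roughly equivalent to: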
    def combinations_with_replacement(iterable, r):
        # combinations_with_replacement('ABC', 2) --> AA AB AC BB BC CC
        pool = tuple(iterable)
        n = len(pool)
        if not n and r:
            return
        indices = [0] * r
        yield tuple(pool[i] for i in indices)
        while True:
            for i in reversed(range(r)):
                if indices[i] != n - 1:
                    break
            else:
                return
            indices[i:] = [indices[i] + 1] * (r - i)
            yield tuple(pool[i] for i in indices)
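A short usage sketch with illustrative inputs, set side by side with plain combinations to show the extra tuples in which an element is paired with itself:

```python
from itertools import combinations, combinations_with_replacement

with_repeats = list(combinations_with_replacement('ABC', 2))
without_repeats = list(combinations('ABC', 2))

# The tuples that only appear when repetition is allowed.
extra = [c for c in with_repeats if c not in without_repeats]

print(with_repeats)  # [('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('B', 'C'), ('C', 'C')]
print(extra)         # [('A', 'A'), ('B', 'B'), ('C', 'C')]
```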