The Python collections Module
I. Overview
The source code of collections opens with:
'''This module implements specialized container datatypes providing
alternatives to Python's general purpose built-in containers, dict,
list, set, and tuple.

* namedtuple   factory function for creating tuple subclasses with named fields
* deque        list-like container with fast appends and pops on either end
* ChainMap     dict-like class for creating a single view of multiple mappings
* Counter      dict subclass for counting hashable objects
* OrderedDict  dict subclass that remembers the order entries were added
* defaultdict  dict subclass that calls a factory function to supply missing values
* UserDict     wrapper around dictionary objects for easier dict subclassing
* UserList     wrapper around list objects for easier list subclassing
* UserString   wrapper around string objects for easier string subclassing

'''

__all__ = ['deque', 'defaultdict', 'namedtuple', 'UserDict', 'UserList',
           'UserString', 'Counter', 'OrderedDict', 'ChainMap']
In other words, the collections module provides the following:
- deque
- defaultdict
- namedtuple
- UserDict
- UserList
- UserString
- Counter
- OrderedDict
- ChainMap
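II. namedtuple
(1) tuple
namedtuple creates subclasses of tuple, so every feature of tuple is also available on a namedtuple. What are those features?
1. Immutable type
A tuple is an immutable data type:
>>> user_tuple = ("zhangsan", 30)  # create a tuple object
Once created it cannot be changed. Attempting an item assignment such as:
>>> user_tuple[1] = 32
raises an error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
However, this immutability is not absolute. A tuple only fixes which objects it refers to. In the example above every element is itself immutable, but when an element is a mutable object such as a list, that object can still be modified in place:
>>> user_tuple = ("zhangsan", 30, ["reading", "movies"])
>>> user_tuple[2].append("animals")
>>> user_tuple
('zhangsan', 30, ['reading', 'movies', 'animals'])
>>>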
2. Iterable object
Like a list, a tuple is iterable, so it can be looped over; as a sequence it also supports slicing:
# iterate with a for loop
user_tuple = ("zhangsan", 30)
for el in user_tuple:
    print(el)
3. Unpacking
number_tuple = (1, 2, 3, 4)
first, *others = number_tuple
print(first, others)  # 1 [2, 3, 4]
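4. Use as a dictionary key
Dictionary keys must be hashable. A tuple is hashable as long as all of its elements are hashable, so it can serve as a key:
condition = ("a", "b")
filter_dict = {}
filter_dict[condition] = "result"
print(filter_dict)  # {('a', 'b'): 'result'}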
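The hashability requirement has one consequence that is easy to miss: a tuple that contains a mutable element such as a list cannot be used as a key. A minimal sketch, with illustrative variable names that do not appear in the original:

```python
# A tuple whose elements are all hashable can be used as a dict key.
hashable_key = ("a", "b")
lookup = {hashable_key: "result"}

# A tuple that contains a list cannot be hashed, so it is rejected as a key.
unhashable_key = ("a", ["b"])
try:
    lookup[unhashable_key] = "other"
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(lookup)         # {('a', 'b'): 'result'}
print(error_message)  # unhashable type: 'list'
```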
(2) namedtuple
1. Creating a class
A class is usually defined like this:
class User:
    def __init__(self, username, password):
        self.username = username
        self.password = password

user = User("zhangsan", 123456)
print(user.username, user.password)  # zhangsan 123456
With namedtuple the same thing can be written more concisely:
from collections import namedtuple

User = namedtuple("User", ["username", "password"])
user = User("zhangsan", 123456)
print(user.username, user.password)  # zhangsan 123456
Arguments are passed exactly as they are to an ordinary class, including through *args and **kwargs.
from collections import namedtuple

# passing values from a tuple
User = namedtuple("User", ["username", "password"])
args = ("zhangsan", 123456)
user = User(*args)
print(user.username, user.password)  # zhangsan 123456

# passing values from a dict
User = namedtuple("User", ["username", "password"])
kwargs = {"username": "zhangsan", "password": 123456}
user = User(**kwargs)
print(user.username, user.password)  # zhangsan 123456
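Because a namedtuple instance is still a tuple, the tuple features described earlier carry over. The following sketch reuses the User example to show index access, unpacking, and immutability; assignment_error is an illustrative name used only here:

```python
from collections import namedtuple

User = namedtuple("User", ["username", "password"])
user = User("zhangsan", 123456)

# Index access and unpacking work exactly as they do for a plain tuple.
first_field = user[0]
username, password = user

# Fields are read-only: assigning to one raises AttributeError.
try:
    user.username = "lisi"
    assignment_error = None
except AttributeError as exc:
    assignment_error = exc

print(first_field, username, password)  # zhangsan zhangsan 123456
print(isinstance(user, tuple))          # True
print(type(assignment_error).__name__)  # AttributeError
```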
2. _make
In the examples above we passed values with *args or **kwargs. The _make class method offers a simpler way to build an instance directly from an iterable:
from collections import namedtuple

# define the class
User = namedtuple("User", ["username", "password"])

# define the parameters
parameters_list = ["zhangsan", 123456]
parameters_tuple = ("zhangsan", 123456)
parameters_dict = {"username": "zhangsan", "password": 123456}

# create the objects
user = User._make(parameters_list)
user1 = User._make(parameters_tuple)
# iterating a dict yields its keys, so pass the values explicitly
user2 = User._make(parameters_dict.values())

# output
print(user.username, user.password)    # zhangsan 123456
print(user1.username, user1.password)  # zhangsan 123456
print(user2.username, user2.password)  # zhangsan 123456
As shown, _make only needs an iterable that supplies the field values in order. Its source is shown below; {typename} and {num_fields:d} are placeholders in the template that namedtuple fills in when it generates the class:
@classmethod
def _make(cls, iterable, new=tuple.__new__, len=len):
    'Make a new {typename} object from a sequence or iterable'
    result = new(cls, iterable)
    if len(result) != {num_fields:d}:
        raise TypeError('Expected {num_fields:d} arguments, got %d' % len(result))
    return result
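A typical use of _make is to convert rows of raw data into named records. The sketch below assumes a hypothetical list of rows (for instance, as they might come from a CSV reader) and also shows the length check from the source above in action:

```python
from collections import namedtuple

User = namedtuple("User", ["username", "password"])

# Hypothetical rows; in practice they might come from a CSV file or a database cursor.
rows = [("zhangsan", 123456), ("lisi", 654321)]
users = [User._make(row) for row in rows]
print(users[1].username)  # lisi

# An iterable with the wrong number of items is rejected with a TypeError.
try:
    User._make(["only_one_value"])
    make_error = None
except TypeError as exc:
    make_error = exc
print(type(make_error).__name__)  # TypeError
```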
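3. _asdict
This method returns a mapping from field names to their values, in the order in which the fields were defined; no sorting takes place. Up to Python 3.7 the result is an OrderedDict, and since Python 3.8 it is a regular dict.
from collections import namedtuple

User = namedtuple("User", ["username", "password"])
kwargs = {"username": "zhangsan"}
user = User(**kwargs, password=123456)
print(user)  # User(username='zhangsan', password=123456)
user_dict = user._asdict()
print(user_dict)  # OrderedDict([('username', 'zhangsan'), ('password', 123456)]) on Python 3.7 and earlier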
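To make the ordering of _asdict concrete, here is a small sketch. It checks only the key order, so it behaves the same whether the result is an OrderedDict or a regular dict:

```python
from collections import namedtuple

User = namedtuple("User", ["username", "password"])
# The order of keyword arguments does not matter when creating the instance.
user = User(password=123456, username="zhangsan")

user_dict = user._asdict()
# Keys follow the field definition order, not alphabetical order.
field_order = list(user_dict)
print(field_order)  # ['username', 'password']

# The mapping is a separate object: changing it leaves the namedtuple untouched.
user_dict["password"] = 0
print(user.password)  # 123456
```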
III. defaultdict
defaultdict is a subclass of the built-in dict, so it has every feature of dict. Its source adds the following:
class defaultdict(dict):
    def __init__(self, default_factory=None, **kwargs):  # known case of _collections.defaultdict.__init__
        """
        defaultdict(default_factory[, ...]) --> dict with default factory

        The default factory is called without arguments to produce
        a new value when a key is not present, in __getitem__ only.
        A defaultdict compares equal to a dict with the same items.
        All remaining arguments are treated the same as if they were
        passed to the dict constructor, including keyword arguments.

        # (copied from class doc)
        """
        pass
From this we learn that the first parameter, default_factory, is a callable. When a key is missing from the dict, it is called with no arguments to produce the default value for that key. Suppose we have the following list:
s = ['yellow', 'blue', 'yellow', 'blue', 'red']
To count how often each element of s occurs, we would often write something like this:
s = ['yellow', 'blue', 'yellow', 'blue', 'red']
count_dict = {}
for i in s:
    if i not in count_dict:
        count_dict[i] = 1
    else:
        count_dict[i] += 1
print(count_dict)  # {'yellow': 2, 'blue': 2, 'red': 1}
defaultdict makes this simpler:
from collections import defaultdict

s = ['yellow', 'blue', 'yellow', 'blue', 'red']
d = defaultdict(int)  # a missing key is created on first access with the value int(), i.e. 0
for i in s:
    d[i] += 1
print(d)  # defaultdict(<class 'int'>, {'yellow': 2, 'blue': 2, 'red': 1})
It can also be used to build more complex data structures:
from collections import defaultdict

def gen_default():
    return {
        "username": "",
        "age": 0
    }

d = defaultdict(gen_default)
d["g1"]  # "g1" does not exist, so the default structure is created: {"g1": {"username": "", "age": 0}}
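Two further points may be worth a sketch: list is a common factory for grouping values, and, as the docstring above says ("in __getitem__ only"), the factory runs only for d[key] lookups. The pairs data below is made up for illustration:

```python
from collections import defaultdict

# Hypothetical (colour, value) pairs to group by colour.
pairs = [("yellow", 1), ("blue", 2), ("yellow", 3), ("red", 4)]

groups = defaultdict(list)  # a missing key starts out as an empty list
for colour, value in pairs:
    groups[colour].append(value)
print(dict(groups))  # {'yellow': [1, 3], 'blue': [2], 'red': [4]}

# get() and "in" do not call the factory, so they never create keys.
missing_via_get = groups.get("green")
present_before = "green" in groups

groups["green"]  # item access calls the factory and stores an empty list
present_after = "green" in groups
print(missing_via_get, present_before, present_after)  # None False True
```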
IV. deque
(1) Initializing a deque
First, the source:
class deque(object):
    def __init__(self, iterable=(), maxlen=None):  # known case of _collections.deque.__init__
        """
        deque([iterable[, maxlen]]) --> deque object

        A list-like sequence optimized for data accesses near its endpoints.
        # (copied from class doc)
        """
        pass
A deque (double-ended queue) is usually initialized from an iterable. Both parameters are optional: without an iterable the deque starts out empty.
from collections import deque

d = deque(["a", "b", "c"])
print(d)  # deque(['a', 'b', 'c'])
A tuple or a dict can be passed as well (for a dict, the deque receives its keys).
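The second constructor parameter, maxlen, is not demonstrated above. A minimal sketch of a bounded deque, together with the dict case just mentioned (the names recent and keys are illustrative):

```python
from collections import deque

# With maxlen set, appending to a full deque discards items from the opposite end.
recent = deque(maxlen=3)
for n in range(5):
    recent.append(n)
print(recent)  # deque([2, 3, 4], maxlen=3)

# Iterating a dict yields its keys, so the deque holds the keys only.
keys = deque({"a": 1, "b": 2})
print(keys)  # deque(['a', 'b'])
```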
(2) deque methods
deque provides many methods, as the stubs in the source show:
class deque(object):
    """
    deque([iterable[, maxlen]]) --> deque object

    A list-like sequence optimized for data accesses near its endpoints.
    """
    def append(self, *args, **kwargs):  # real signature unknown
        """ Add an element to the right side of the deque. """
        pass

    def appendleft(self, *args, **kwargs):  # real signature unknown
        """ Add an element to the left side of the deque. """
        pass

    def clear(self, *args, **kwargs):  # real signature unknown
        """ Remove all elements from the deque. """
        pass

    def copy(self, *args, **kwargs):  # real signature unknown
        """ Return a shallow copy of a deque. """
        pass

    def count(self, value):  # real signature unknown; restored from __doc__
        """ D.count(value) -> integer -- return number of occurrences of value """
        return 0

    def extend(self, *args, **kwargs):  # real signature unknown
        """ Extend the right side of the deque with elements from the iterable """
        pass

    def extendleft(self, *args, **kwargs):  # real signature unknown
        """ Extend the left side of the deque with elements from the iterable """
        pass

    def index(self, value, start=None, stop=None):  # real signature unknown; restored from __doc__
        """
        D.index(value, [start, [stop]]) -> integer -- return first index of value.
        Raises ValueError if the value is not present.
        """
        return 0

    def insert(self, index, p_object):  # real signature unknown; restored from __doc__
        """ D.insert(index, object) -- insert object before index """
        pass

    def pop(self, *args, **kwargs):  # real signature unknown
        """ Remove and return the rightmost element. """
        pass

    def popleft(self, *args, **kwargs):  # real signature unknown
        """ Remove and return the leftmost element. """
        pass

    def remove(self, value):  # real signature unknown; restored from __doc__
        """ D.remove(value) -- remove first occurrence of value. """
        pass

    def reverse(self):  # real signature unknown; restored from __doc__
        """ D.reverse() -- reverse *IN PLACE* """
        pass

    def rotate(self, *args, **kwargs):  # real signature unknown
        """ Rotate the deque n steps to the right (default n=1).  If n is negative, rotates left. """
        pass
1. pop
from collections import deque

d = deque(["a", "b", "c"])
print(d.pop())  # c
print(d)  # deque(['a', 'b'])
2. popleft
from collections import deque

d = deque(["a", "b", "c"])
print(d.popleft())  # a
print(d)  # deque(['b', 'c'])
3. append
from collections import deque

d = deque(["a", "b", "c"])
d.append("d")
print(d)  # deque(['a', 'b', 'c', 'd'])
4. appendleft
from collections import deque

d = deque(["a", "b", "c"])
d.appendleft("d")
print(d)  # deque(['d', 'a', 'b', 'c'])
5. extend
from collections import deque

d1 = deque(["a", "b", "c"])
d2 = deque(["d", "e", "f"])
d1.extend(d2)
print(d1)  # deque(['a', 'b', 'c', 'd', 'e', 'f'])
Note: extend returns None. Calling d1.extend(d2) extends d1 in place.
6. insert
from collections import deque

d = deque(["a", "b", "c"])
d.insert(1, "d")
print(d)  # deque(['a', 'd', 'b', 'c'])
7. reverse
from collections import deque

d = deque(["a", "b", "c"])
d.reverse()
print(d)  # deque(['c', 'b', 'a'])
8. copy
from collections import deque

d1 = deque(["a", "b", "c"])
d2 = d1.copy()
# different ids show that these are two distinct objects (the values vary per run)
print(id(d1))  # 173766656
print(id(d2))  # 173766864
# after copying, changing d1 has no effect on d2
d1.insert(2, "d")
print(d1)  # deque(['a', 'b', 'd', 'c'])
print(d2)  # deque(['a', 'b', 'c'])
# what if the deque contains a mutable element?
d3 = deque(["a", "b", ["c", "d"]])
d4 = d3.copy()
print(id(d3))  # 173570256
print(id(d4))  # 173570360
# copy() is shallow: d3 and d4 are distinct deques, but both refer to the same
# inner list, so mutating that list in place is visible through both of them
d3[2].append("e")
print(d3)  # deque(['a', 'b', ['c', 'd', 'e']])
print(d4)  # deque(['a', 'b', ['c', 'd', 'e']])
There are more methods; the source code is a good reference for the rest.
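Some of the methods listed in the source but not demonstrated above can be sketched briefly. The snapshot variables (after_right and so on) are illustrative names, added only to record each intermediate state:

```python
from collections import deque

d = deque(["a", "b", "c", "d"])

d.rotate(1)                 # rotate one step to the right
after_right = list(d)       # ['d', 'a', 'b', 'c']
d.rotate(-2)                # a negative n rotates to the left
after_left = list(d)        # ['b', 'c', 'd', 'a']

d.extendleft(["x", "y"])    # items are pushed on the left one at a time
after_extendleft = list(d)  # ['y', 'x', 'b', 'c', 'd', 'a']

x_count = d.count("x")      # 1
c_index = d.index("c")      # 3
d.remove("x")               # removes the first occurrence of "x"
after_remove = list(d)      # ['y', 'b', 'c', 'd', 'a']

d.clear()                   # the deque is now empty
print(after_right, after_left, after_extendleft, after_remove, len(d))
```

Note that extendleft reverses the order of the items it receives, because each one is pushed onto the left end in turn.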
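V. Counter
Counter is a subclass of the built-in dict, so it has every feature of dict; it is mainly used for counting. It is a collection in which elements are stored as dictionary keys and their counts as the dictionary values. Counts may be any integer, including zero and negative numbers.
(1) Counting occurrences
Counter accepts an iterable such as a string or a list:
1. Counting characters in a string
from collections import Counter

counter1 = Counter("ABCDAD")
print(counter1)  # Counter({'A': 2, 'D': 2, 'B': 1, 'C': 1})
# Counter is a dict subclass, so the dict methods are available.
# Note that Counter.update adds counts rather than replacing values.
# (Outputs are shown for Python 3.7+, where equal counts keep insertion order.)
counter2 = Counter("DEFABC")
counter1.update(counter2)
print(counter1)  # Counter({'A': 3, 'D': 3, 'B': 2, 'C': 2, 'E': 1, 'F': 1})
2. Counting items in a list
from collections import Counter

counter1 = Counter(["A", "B", "C", "D", "A", "D"])
print(counter1)  # Counter({'A': 2, 'D': 2, 'B': 1, 'C': 1})
# As above, update adds the counts from counter2 to counter1.
counter2 = Counter(["D", "E", "F", "A", "B", "C"])
counter1.update(counter2)
print(counter1)  # Counter({'A': 3, 'D': 3, 'B': 2, 'C': 2, 'E': 1, 'F': 1})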
(2) The top-N problem
Counter has a most_common method that returns the n elements with the highest counts.
from collections import Counter

top3 = Counter('abcdeabcdabcaba').most_common(3)
print(top3)  # [('a', 5), ('b', 4), ('c', 3)]
The source:
class Counter(dict):
    def most_common(self, n=None):
        '''List the n most common elements and their counts from the most
        common to the least.  If n is None, then list all element counts.

        >>> Counter('abcdeabcdabcaba').most_common(3)
        [('a', 5), ('b', 4), ('c', 3)]

        '''
        # Emulate Bag.sortedByCount from Smalltalk
        if n is None:
            return sorted(self.items(), key=_itemgetter(1), reverse=True)
        return _heapq.nlargest(n, self.items(), key=_itemgetter(1))
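The two branches of the source above can be tried directly. This sketch calls most_common without an argument and then reproduces the n-largest branch by hand with heapq.nlargest, the same function the listing uses:

```python
import heapq
from collections import Counter
from operator import itemgetter

counts = Counter("abcdeabcdabcaba")

# With n omitted (None), every element is listed, most common first.
all_counts = counts.most_common()
print(all_counts)  # [('a', 5), ('b', 4), ('c', 3), ('d', 2), ('e', 1)]

# With n given, the result matches heapq.nlargest keyed on the count.
top2_by_hand = heapq.nlargest(2, counts.items(), key=itemgetter(1))
print(top2_by_hand == counts.most_common(2))  # True
```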
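(3) Other methods
1. elements
from collections import Counter

# returns an iterator that yields each element as many times as its count
c = Counter("ABCDAD")
print(sorted(c.elements()))  # ['A', 'A', 'B', 'C', 'D', 'D']
2. subtract
Subtracts the counts of elements taken from an iterable or from another mapping (or Counter). The operation modifies the Counter in place, and the resulting counts may be zero or negative.
from collections import Counter

c = Counter(a=4, b=2, c=0, d=-2)
d = Counter(a=1, b=2, c=3, d=4)
c.subtract(d)
print(c)  # Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})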
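subtract also accepts a plain iterable, as the description says. The sketch below shows that case and, for contrast, the binary - operator of Counter, which is not covered above and keeps only positive counts:

```python
from collections import Counter

c = Counter("aabbc")   # a: 2, b: 2, c: 1
c.subtract("abcc")     # subtract one per character; c drops to -1
print(c)               # Counter({'a': 1, 'b': 1, 'c': -1})

# The - operator builds a new Counter and drops zero and negative counts.
difference = Counter("aabbc") - Counter("abcc")
print(difference)      # Counter({'a': 1, 'b': 1})
```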
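VI. OrderedDict
OrderedDict is a subclass of dict with all of dict's features; in addition it is ordered, that is, it remembers the order in which entries were inserted.
from collections import OrderedDict

d = OrderedDict()  # create an empty ordered dict
d["a"] = 1
d["b"] = 2
d["c"] = 3
print(d)  # OrderedDict([('a', 1), ('b', 2), ('c', 3)])
As shown, the entries appear in the order in which they were inserted. (Since Python 3.7 the built-in dict also preserves insertion order; OrderedDict additionally offers order-aware methods such as move_to_end.)
OrderedDict has many methods, for example:
1. popitem
Removes and returns the most recently added item.
from collections import OrderedDict

d = OrderedDict()  # create an empty ordered dict
d["a"] = 1
d["b"] = 2
d["c"] = 3
print(d)  # OrderedDict([('a', 1), ('b', 2), ('c', 3)])
print(d.popitem())  # ('c', 3)
print(d)  # OrderedDict([('a', 1), ('b', 2)])
2. move_to_end
Moves an existing key to the end of the OrderedDict.
from collections import OrderedDict

d = OrderedDict()  # create an empty ordered dict
d["a"] = 1
d["b"] = 2
d["c"] = 3
print(d)  # OrderedDict([('a', 1), ('b', 2), ('c', 3)])
d.move_to_end("b")
print(d)  # OrderedDict([('a', 1), ('c', 3), ('b', 2)])
3. pop
Removes the item with the given key and returns its value.
from collections import OrderedDict

d = OrderedDict()  # create an empty ordered dict
d["a"] = 1
d["b"] = 2
d["c"] = 3
print(d)  # OrderedDict([('a', 1), ('b', 2), ('c', 3)])
print(d.pop("b"))  # 2
print(d)  # OrderedDict([('a', 1), ('c', 3)])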
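Both popitem and move_to_end take an optional last argument (default True) that switches them to the front of the dict. A short sketch using the same data as above; first_item is an illustrative name:

```python
from collections import OrderedDict

d = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

# last=False moves the key to the beginning instead of the end.
d.move_to_end("c", last=False)
order_after_move = list(d)          # ['c', 'a', 'b']

# last=False pops the oldest (first) item instead of the newest.
first_item = d.popitem(last=False)  # ('c', 3)
print(order_after_move, first_item, list(d))
```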
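VII. ChainMap
ChainMap groups several dicts or other mappings together to create a single, updatable view. Consider the following situation:
d1 = {"a": 1, "b": 2}
d2 = {"c": 1, "d": 2}
# print both dicts with separate loops
for k, v in d1.items():
    print(k, v)
for k, v in d2.items():
    print(k, v)
Here each dict is printed with its own for loop. With ChainMap a single loop is enough:
from collections import ChainMap

d1 = {"a": 1, "b": 2}
d2 = {"c": 1, "d": 2}
d3 = ChainMap(d1, d2)
print(d3)  # ChainMap({'a': 1, 'b': 2}, {'c': 1, 'd': 2})
for k, v in d3.items():
    print(k, v)
Note that ChainMap does not merge the two dicts by copying them into a new object; it only keeps references to the original dicts. Besides combining mappings, it offers further methods and attributes, for example: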
1. new_child
Returns a new ChainMap containing a new mapping followed by all of the mappings in the current instance.
from collections import ChainMap

d1 = {"a": 1, "b": 2}
d2 = {"c": 1, "d": 2}
d3 = ChainMap(d1, d2)
d4 = d3.new_child({"e": 5})  # a new ChainMap with the new mapping in front
print(d4)  # ChainMap({'e': 5}, {'a': 1, 'b': 2}, {'c': 1, 'd': 2})
2. parents
A property that returns a new ChainMap containing all of the maps in the current instance except the first one.
from collections import ChainMap

d1 = {"a": 1, "b": 2}
d2 = {"c": 1, "d": 2}
d3 = ChainMap(d1, d2)
print(d3.parents)  # ChainMap({'c': 1, 'd': 2})
3. maps
An attribute that holds the list of all the underlying mappings.
from collections import ChainMap

d1 = {"a": 1, "b": 2}
d2 = {"c": 1, "d": 2}
d3 = ChainMap(d1, d2)
print(d3.maps)  # [{'a': 1, 'b': 2}, {'c': 1, 'd': 2}]
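The earlier remark that ChainMap refers to the original dicts rather than copying them can be illustrated with a small sketch. The defaults and overrides dicts are hypothetical names chosen for this example:

```python
from collections import ChainMap

defaults = {"colour": "red", "user": "guest"}
overrides = {"user": "admin"}
settings = ChainMap(overrides, defaults)

# Lookups search the mappings in order, so the first mapping wins.
user = settings["user"]              # 'admin'
colour_before = settings["colour"]   # 'red'

# The ChainMap is a view: a change to an underlying dict shows up immediately.
defaults["colour"] = "blue"
colour_after = settings["colour"]    # 'blue'

# Writes through the ChainMap only affect the first mapping.
settings["size"] = 10
print(user, colour_before, colour_after)
print(overrides)  # {'user': 'admin', 'size': 10}
print(defaults)   # {'colour': 'blue', 'user': 'guest'}
```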