pycookbook: Data Structures

Unpacking a sequence into multiple variables

In [32]: b = ['qwe','asd']

In [33]: q,a = b

In [34]: a
Out[34]: 'asd'

In [35]: q
Out[35]: 'qwe'

If the number of variables does not match the number of elements in the sequence, a ValueError is raised.

In [36]: q,a,z = b
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-36-5bedc90bd5b2> in <module>()
----> 1 q,a,z = b

ValueError: not enough values to unpack (expected 3, got 2)

Unpacking with a starred expression (*)

In [38]: a = [44,55,22,6,7,99,12,20]

In [39]: *z,x,c = a

In [40]: z
Out[40]: [44, 55, 22, 6, 7, 99]

In [41]: x
Out[41]: 12

In [42]: c
Out[42]: 20

In [43]: q,w,*e = a

In [44]: q
Out[44]: 44

In [45]: w
Out[45]: 55

In [46]: e
Out[46]: [22, 6, 7, 99, 12, 20]
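Starred unpacking is handy when a record has a fixed head and a variable-length tail. The following is a minimal sketch; the record layout (name, email, then any number of phone numbers) and its data are made up for illustration.

```python
# Hypothetical record: name, email, then any number of phone numbers.
record = ('Dave', 'dave@example.com', '773-555-1212', '847-555-1212')

name, email, *phones = record

# The starred variable is always a list, even when it captures nothing.
first, *rest = [1]

print(name, email, phones)
print(first, rest)
```

Because the starred name is always bound to a list, code that follows does not need a special case for the empty tail.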

Finding the largest or smallest n elements

Use the nlargest() and nsmallest() functions from the heapq module.

In [48]: a
Out[48]: [44, 55, 22, 6, 7, 99, 12, 20]

In [49]: import heapq

In [50]: heapq.nlargest(3,a)
Out[50]: [99, 55, 44]

In [51]: heapq.nsmallest(3,a)
Out[51]: [6, 7, 12]

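A more complex case: both functions accept a key parameter, so they also work on more complex data structures.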

In [52]: portfolio = [
    ...: {'name': 'IBM', 'shares': 100, 'price': 91.1},
    ...: {'name': 'AAPL', 'shares': 50, 'price': 543.22},
    ...: {'name': 'FB', 'shares': 200, 'price': 21.09},
    ...: {'name': 'HPQ', 'shares': 35, 'price': 31.75},
    ...: {'name': 'YHOO', 'shares': 45, 'price': 16.35},
    ...: {'name': 'ACME', 'shares': 75, 'price': 115.65}
    ...: ]
    ...: cheap = heapq.nsmallest(3, portfolio, key=lambda s: s['price'])
    ...: expensive = heapq.nlargest(3, portfolio, key=lambda s: s['price'])
    ...:

In [53]: cheap
Out[53]:
[{'name': 'YHOO', 'price': 16.35, 'shares': 45},
 {'name': 'FB', 'price': 21.09, 'shares': 200},
 {'name': 'HPQ', 'price': 31.75, 'shares': 35}]

In [54]: expensive
Out[54]:
[{'name': 'AAPL', 'price': 543.22, 'shares': 50},
 {'name': 'ACME', 'price': 115.65, 'shares': 75},
 {'name': 'IBM', 'price': 91.1, 'shares': 100}]
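nlargest() and nsmallest() fit best when n is small relative to the size of the collection. For a single extreme value, min() and max() are simpler, and when n approaches the size of the collection, sorting once and slicing is usually the more natural choice. The sketch below reuses the numbers from the earlier example and also shows the underlying heap operations heapify() and heappop().

```python
import heapq

nums = [44, 55, 22, 6, 7, 99, 12, 20]

# Single extreme value: min()/max() are enough.
smallest = min(nums)
largest = max(nums)

# n close to len(nums): sort once and slice.
top_six = sorted(nums, reverse=True)[:6]

# Manual heap use: heapify() reorders the list in place so that
# heap[0] is always the smallest item; heappop() removes and returns it.
heap = list(nums)
heapq.heapify(heap)
first_three = [heapq.heappop(heap) for _ in range(3)]

print(smallest, largest)
print(top_six)
print(first_three)
```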

Mapping a key to multiple values in a dictionary

In [55]: from collections import defaultdict

In [56]: d = defaultdict(list)

In [57]: d['a'].append(1)

In [58]: d['a'].append(2)

In [59]: d['a'].append(3)

In [60]: d
Out[60]: defaultdict(list, {'a': [1, 2, 3]})

In [61]: d.get('a')
Out[61]: [1, 2, 3]
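A typical use is building a multi-valued dictionary from a stream of (key, value) pairs. The sketch below uses made-up pairs and contrasts list and set as the container: a list keeps duplicates and insertion order, while a set drops duplicates.

```python
from collections import defaultdict

# Hypothetical input: (key, value) pairs with repeated keys.
pairs = [('a', 1), ('b', 4), ('a', 2), ('a', 2)]

as_lists = defaultdict(list)   # keeps duplicates and insertion order
as_sets = defaultdict(set)     # drops duplicates
for key, value in pairs:
    as_lists[key].append(value)
    as_sets[key].add(value)

print(dict(as_lists))
print(dict(as_sets))

# Caveat: merely indexing a missing key creates an entry for it.
_ = as_lists['missing']
print('missing' in as_lists)
```

If automatic creation of entries is not wanted, dict.setdefault() on a plain dict is an alternative, at the cost of slightly noisier code.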

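Keeping a dictionary in order

The OrderedDict class in the collections module preserves the insertion order of elements during iteration. Since Python 3.7 the built-in dict guarantees insertion order as well, so OrderedDict is mainly needed on older versions or for its extra methods such as move_to_end().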

In [85]: from collections import OrderedDict

In [86]: d = {}

In [87]: d['foo'] = 1
    ...: d['bar'] = 2
    ...: d['spam'] = 3
    ...: d['grok'] = 4
    ...:

In [88]: d
Out[88]: {'bar': 2, 'foo': 1, 'grok': 4, 'spam': 3}

In [89]: dd = OrderedDict()

In [90]: dd['foo'] = 1

In [91]: dd['bar'] = 2

In [92]: dd['spam'] = 3

In [93]: dd['grok'] = 4

In [94]: dd
Out[94]: OrderedDict([('foo', 1), ('bar', 2), ('spam', 3), ('grok', 4)])
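Insertion order matters, for example, when the mapping is later serialized and the field order should be controlled. A minimal sketch using the standard json module:

```python
import json
from collections import OrderedDict

dd = OrderedDict()
dd['foo'] = 1
dd['bar'] = 2
dd['spam'] = 3
dd['grok'] = 4

# The serialized fields appear in insertion order.
encoded = json.dumps(dd)
print(encoded)

# move_to_end() is an OrderedDict-specific method that moves a key
# to the end of the ordering.
dd.move_to_end('foo')
print(list(dd))
```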

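Finding what two dictionaries have in common

Perform set operations on the results returned by the keys() and items() methods of the two dictionaries.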

In [3]: a = {
   ...: 'x' : 1,
   ...: 'y' : 2,
   ...: 'z' : 3
   ...: }

In [5]: b = {
   ...: 'w' : 10,
   ...: 'x' : 11,
   ...: 'y' : 2
   ...: }

In [6]: a.keys() & b.keys()
Out[6]: {'x', 'y'}

In [7]: a.keys() - b.keys()
Out[7]: {'z'}

In [8]: b.keys() - a.keys()
Out[8]: {'w'}

In [9]: a.items() & b.items()
Out[9]: {('y', 2)}

To build a new dictionary that excludes some specified keys, use a dict comprehension.

In [10]: a
Out[10]: {'x': 1, 'y': 2, 'z': 3}

In [11]: c = {key:a[key] for key in a.keys()-{'x'}}

In [12]: c
Out[12]: {'y': 2, 'z': 3}
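The same key-view operations can be combined to describe how two dictionaries differ. This is only a sketch: the helper name diff_dicts is hypothetical and not part of any library.

```python
def diff_dicts(old, new):
    """Return (added, removed, changed) key sets between two dicts."""
    added = new.keys() - old.keys()
    removed = old.keys() - new.keys()
    # Keys present in both dicts whose values differ.
    changed = {k for k in old.keys() & new.keys() if old[k] != new[k]}
    return added, removed, changed

a = {'x': 1, 'y': 2, 'z': 3}
b = {'w': 10, 'x': 11, 'y': 2}
added, removed, changed = diff_dicts(a, b)
print(added, removed, changed)
```

Note that values() does not support these set operations, since values are not guaranteed to be unique or hashable; convert them with set() first if that is really what is needed.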

Finding the most frequent elements in a sequence

In [16]: words = [
    ...: 'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
    ...: 'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around', 'the',
    ...: 'eyes', "don't", 'look', 'around', 'the', 'eyes', 'look', 'into',
    ...: 'my', 'eyes', "you're", 'under'
    ...: ]

In [17]: from collections import Counter

In [18]: word_counts = Counter(words)

In [19]: word_counts
Out[19]:
Counter({'around': 2,
         "don't": 1,
         'eyes': 8,
         'into': 3,
         'look': 4,
         'my': 3,
         'not': 1,
         'the': 5,
         'under': 1,
         "you're": 1})

In [20]: top_3 = word_counts.most_common(3)

In [21]: top_3
Out[21]: [('eyes', 8), ('the', 5), ('look', 4)]

As shown below, calling word_counts.most_common() without an argument returns a list of all elements sorted by count in descending order.

In [27]: word_counts.most_common()
Out[27]:
[('eyes', 8),
 ('the', 5),
 ('look', 4),
 ('into', 3),
 ('my', 3),
 ('around', 2),
 ('not', 1),
 ("don't", 1),
 ("you're", 1),
 ('under', 1)]

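From this list we can also take the n least frequent elements with the reverse slice [:-n-1:-1]. Note that the slice [:-3:-1] used below returns the last two entries, not three.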

In [28]: word_counts.most_common()[:-3:-1]
Out[28]: [('under', 1), ("you're", 1)]
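To take the n least frequent elements in general, the stop bound of the reverse slice has to be -n - 1. The sketch below uses a shortened, made-up word list and also shows that a Counter can be updated incrementally with update().

```python
from collections import Counter

words = ['look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
         'the', 'eyes']
word_counts = Counter(words)

n = 2
# Reverse slice: the last n entries of the sorted list, least frequent first.
least_common = word_counts.most_common()[:-n - 1:-1]
print(least_common)

# Counters can be updated incrementally with more items.
word_counts.update(['why', 'are', 'you', 'not', 'looking', 'in', 'my', 'eyes'])
print(word_counts['eyes'], word_counts['my'])
```

The relative order of elements with equal counts depends on the Python version, so only the counts themselves should be relied on.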

Sorting a list of dictionaries by a key

Use the itemgetter function from the operator module to sort a list of dictionaries.

In [29]: rows = [
    ...: {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
    ...: {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
    ...: {'fname': 'John', 'lname': 'Cleese', 'uid': 1001},
    ...: {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}
    ...: ]

In [30]: from operator import itemgetter

In [31]: rows_by_fname = sorted(rows,key=itemgetter('fname'))

In [32]: rows_by_fname
Out[32]:
[{'fname': 'Big', 'lname': 'Jones', 'uid': 1004},
 {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
 {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
 {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}]

In [33]: rows_by_uid = sorted(rows,key=itemgetter('uid'))

In [34]: rows_by_uid
Out[34]:
[{'fname': 'John', 'lname': 'Cleese', 'uid': 1001},
 {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
 {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
 {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}]

In [35]: rows_by_uid = sorted(rows,key=itemgetter('uid'),reverse=True)

In [36]: rows_by_uid
Out[36]:
[{'fname': 'Big', 'lname': 'Jones', 'uid': 1004},
 {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
 {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
 {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}]
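itemgetter() also accepts several keys, in which case it returns a tuple and the rows are ordered by the first key, then by the second. A short sketch on the same rows, which also applies the key function to min() and max():

```python
from operator import itemgetter

rows = [
    {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
    {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
    {'fname': 'John', 'lname': 'Cleese', 'uid': 1001},
    {'fname': 'Big', 'lname': 'Jones', 'uid': 1004},
]

# With several keys, itemgetter returns a tuple, so rows are ordered
# by last name first and by first name within equal last names.
rows_by_lfname = sorted(rows, key=itemgetter('lname', 'fname'))
uids = [r['uid'] for r in rows_by_lfname]
print(uids)

# The same kind of key function works with min() and max().
lowest = min(rows, key=itemgetter('uid'))
highest = max(rows, key=itemgetter('uid'))
print(lowest['fname'], highest['fname'])
```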

Sorting objects that do not support native comparison

In [38]: class User:
    ...:     def __init__(self, user_id):
    ...:         self.user_id = user_id
    ...:     def __repr__(self):
    ...:         return 'User({})'.format(self.user_id)
    ...:

In [39]: users = [User(23), User(3), User(99)]

In [40]: from operator import attrgetter

In [41]: sorted(users, key=attrgetter('user_id'))
Out[41]: [User(3), User(23), User(99)]
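A lambda that reads the attribute is an equivalent key function; which one to use is largely a matter of taste. The sketch below repeats the User class so that it is self-contained, and shows that min() and max() accept the same key argument.

```python
from operator import attrgetter

class User:
    def __init__(self, user_id):
        self.user_id = user_id
    def __repr__(self):
        return 'User({})'.format(self.user_id)

users = [User(23), User(3), User(99)]

# Two equivalent key functions.
by_lambda = sorted(users, key=lambda u: u.user_id)
by_attr = sorted(users, key=attrgetter('user_id'))
print(by_lambda, by_attr)

# min() and max() accept the same key argument.
youngest = min(users, key=attrgetter('user_id'))
oldest = max(users, key=attrgetter('user_id'))
print(youngest, oldest)
```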

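Grouping records by a field

Sort first, then group: groupby() only groups consecutive items that share the same key, so unsorted input would yield fragmented groups.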

In [49]: from itertools import groupby

In [50]: from operator import itemgetter

In [51]: rows = [
    ...: {'address': '5412 N CLARK', 'date': '07/01/2012'},
    ...: {'address': '5148 N CLARK', 'date': '07/04/2012'},
    ...: {'address': '5800 E 58TH', 'date': '07/02/2012'},
    ...: {'address': '2122 N CLARK', 'date': '07/03/2012'},
    ...: {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'},
    ...: {'address': '1060 W ADDISON', 'date': '07/02/2012'},
    ...: {'address': '4801 N BROADWAY', 'date': '07/01/2012'},
    ...: {'address': '1039 W GRANVILLE', 'date': '07/04/2012'},
    ...: ]

In [52]: rows.sort(key=itemgetter('date'))

In [53]: rows
Out[53]:
[{'address': '5412 N CLARK', 'date': '07/01/2012'},
 {'address': '4801 N BROADWAY', 'date': '07/01/2012'},
 {'address': '5800 E 58TH', 'date': '07/02/2012'},
 {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'},
 {'address': '1060 W ADDISON', 'date': '07/02/2012'},
 {'address': '2122 N CLARK', 'date': '07/03/2012'},
 {'address': '5148 N CLARK', 'date': '07/04/2012'},
 {'address': '1039 W GRANVILLE', 'date': '07/04/2012'}]

In [54]: for date, items in groupby(rows, key=itemgetter('date')):
    ...:     print(date)
    ...:     for i in items:
    ...:         print(' ', i)
    ...:
07/01/2012
  {'address': '5412 N CLARK', 'date': '07/01/2012'}
  {'address': '4801 N BROADWAY', 'date': '07/01/2012'}
07/02/2012
  {'address': '5800 E 58TH', 'date': '07/02/2012'}
  {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'}
  {'address': '1060 W ADDISON', 'date': '07/02/2012'}
07/03/2012
  {'address': '2122 N CLARK', 'date': '07/03/2012'}
07/04/2012
  {'address': '5148 N CLARK', 'date': '07/04/2012'}
  {'address': '1039 W GRANVILLE', 'date': '07/04/2012'}
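If the goal is only to collect records by date for later random access, sorting can be skipped. One alternative, sketched here on a shortened copy of the data, is to build a multi-valued dictionary with defaultdict.

```python
from collections import defaultdict

rows = [
    {'address': '5412 N CLARK', 'date': '07/01/2012'},
    {'address': '5148 N CLARK', 'date': '07/04/2012'},
    {'address': '5800 E 58TH', 'date': '07/02/2012'},
    {'address': '4801 N BROADWAY', 'date': '07/01/2012'},
]

# Build a multi-valued dict keyed by date; input order does not matter.
rows_by_date = defaultdict(list)
for row in rows:
    rows_by_date[row['date']].append(row)

for r in rows_by_date['07/01/2012']:
    print(r)
```

This keeps every group in memory at once, whereas groupby() on sorted data yields one group at a time.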

Filtering list elements

The simplest approach is a list comprehension.

In [1]: mylist = [1, 4, -5, 10, -7, 2, 3, -1]

In [2]: new_list = [n for n in mylist if n > 0]

In [3]: new_list
Out[3]: [1, 4, 10, 2, 3]

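However, this approach builds the whole result list at once and can consume a lot of memory when the input is very large. An alternative is a generator expression, which produces the filtered elements lazily as you iterate.

In [10]: new_gen = (n for n in mylist if n > 0)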

In [11]: new_gen
Out[11]: <generator object <genexpr> at 0x0000000005A90678>

In [23]: for i in new_gen:
    ...:     print(i)
    ...:
1
4
10
2
3
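Two properties of generators are worth keeping in mind: they can be passed directly to a reducing function such as sum(), and they are exhausted after a single pass. The sketch below also shows a conditional expression inside a comprehension, which replaces unwanted values instead of dropping them.

```python
mylist = [1, 4, -5, 10, -7, 2, 3, -1]

# Feed a generator expression straight into a reducing function;
# no intermediate list is built.
total = sum(n for n in mylist if n > 0)

# A generator is exhausted after a single pass.
new_gen = (n for n in mylist if n > 0)
first_pass = list(new_gen)
second_pass = list(new_gen)

# A conditional expression replaces unwanted values instead of dropping them.
clipped = [n if n > 0 else 0 for n in mylist]

print(total)
print(first_pass, second_pass)
print(clipped)
```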

Extracting a subset of a dictionary

This is very similar to a list comprehension.

In [24]: prices = {
    ...: 'ACME': 45.23,
    ...: 'AAPL': 612.78,
    ...: 'IBM': 205.55,
    ...: 'HPQ': 37.20,
    ...: 'FB': 10.75
    ...: }

In [25]: p1 = {key: value for key, value in prices.items() if value > 200}

In [26]: p1
Out[26]: {'AAPL': 612.78, 'IBM': 205.55}
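A subset can be selected by key as well as by value. In the sketch below, the set tech_names is a hypothetical list of names to keep; it deliberately contains one name that is absent from prices to show that missing keys are simply skipped.

```python
prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75,
}

# Hypothetical set of names to keep; 'MSFT' is absent from prices on purpose.
tech_names = {'AAPL', 'IBM', 'HPQ', 'MSFT'}

# Select by key: keep only entries whose key is in tech_names.
p2 = {key: value for key, value in prices.items() if key in tech_names}
print(p2)
```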

Merging multiple dictionaries or mappings

In [30]: from collections import ChainMap

In [31]: a = {'x': 1, 'z': 3 }
    ...: b = {'y': 2, 'z': 4 }
    ...:

In [32]: c = ChainMap(a,b)

In [33]: c
Out[33]: ChainMap({'x': 1, 'z': 3}, {'y': 2, 'z': 4})

In [34]: c['x']
Out[34]: 1

In [35]: c['y']
Out[35]: 2

In [36]: c['z']
Out[36]: 3

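A ChainMap takes multiple dictionaries and makes them logically appear as one. However, the dictionaries are not actually merged: the ChainMap class just keeps a list of the underlying dictionaries internally and redefines common dictionary operations to scan that list. When a key appears in more than one dictionary, the value from the first one is returned, which is why c['z'] is 3 above.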
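Because a ChainMap only references the original dictionaries, it behaves like a live view rather than a copy. The sketch below illustrates two consequences, and contrasts them with a real merge done through dict.update().

```python
from collections import ChainMap

a = {'x': 1, 'z': 3}
b = {'y': 2, 'z': 4}
c = ChainMap(a, b)

# Writes through the ChainMap always go to the first mapping.
c['w'] = 40
print(a)

# Changes to the underlying dicts show through the ChainMap.
b['y'] = 20
print(c['y'])

# By contrast, update() builds an independent merged copy.
merged = dict(b)
merged.update(a)   # values from a win, matching ChainMap lookup order
b['y'] = 200
print(merged['y'], c['y'])
```

The merged copy costs extra memory and goes stale when the sources change, while the ChainMap pays for its freshness with a lookup that may scan several dictionaries.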
