查询数据结构

ziruoxiangyi

2012-03-01

关注关注

http://blog.csdn.net/v_JULY_v/article/details/6530142

http://www.51testing.com/html/19/n-238119.html

B树特点：

1 多路查找平衡树，树的深度比二叉树小。

2 每个阶段有N个元素，N+1个子节点。N个元素有序，子节点Pn的元素上下界介于K(n-1)和Kn之间。

3 每个节点包含的元素个数有边界要求，边界要求为阶的函数，阶为最大子节点数量。

4 当增、删结果导致节点元素个数不符合要求时，那么要进行维护，节点分裂、元素转移、指针转移等操作来维护B树结构。

5 B树操作时间复杂度为O(lgN)。主要用于外存储设备的操作，减少IO。

查询数据结构

B树中的每个结点根据实际情况可以包含大量的关键字信息和分支，一般来说是对应一个磁盘块(当然是不能超过磁盘块的大小，根据磁盘驱动(disk drives)的不同，一般块的大小在1k~4k左右)；这样树的深度降低了，这就意味着查找一个元素只要很少结点从外存磁盘中读入内存，很快访问到要查找的数据。

查询数据结构

如果是一个运行时间很长的b树，那么几乎所有的请求，都是随机io。因为磁盘块本身已经不再连续，很难保证可以顺序读取。磁盘本身是一个顺序读写快，随机读写慢的系统，基本读写单位是块，由磁道、柱面、扇区三个坐标定位。参考http://my.oschina.net/xiangxw/blog/11288

B+树：

1 节点包含N个元素，N个子节点。非叶子节点本身不包含关键字具体信息，只包含索引信息。因此，同一个节点，B+树能够比B树存储更多索引信息，减少IO次数。非叶子节点的关键字大小是其孩子节点中最大的（或最小的）。

2 关键字信息要达到叶子节点才能取得到，因此每次查询的效率十分稳定，都是到叶子节点才能结束。叶子节点包含所有关键字信息，并且有序。

3 B+树对区间查询的支持更好，因为关键字全部存储在叶子节点中，并且有序，叶子节点连环船。区间查询的效果会因为节点分裂导致，所在的物理存储块不连续而变差，所以需要磁盘优化。

4 B+树比B 树更适合实际应用中操作系统的文件索引和数据库索引

查询数据结构

查找38号员工例子查询数据结构

B+树还有一个最大的好处，方便扫库，B树必须用中序遍历的方法按序扫库，而B+树直接从叶子结点挨个扫一遍就完了，B+树支持range-query非常方便，而B树不支持。这是数据库选用B+树的最主要原因。

比如要查 5-10之间的，B+树一把到5这个标记，再一把到10，然后串起来就行了，B树就非常麻烦。B树的好处，就是成功查询特别有利，因为树的高度总体要比B+树矮。不成功的情况下，B树也比B+树稍稍占一点点便宜。

In addition, they also enable you to do range scans very efficiently, since the leaf nodes

in the tree are linked and represent an in-order list of all keys, avoiding more costly tree

traversals. That is one of the reasons why they are used for indexes in relational database

systems.

Log-Structured-Merge Tree(LSM Tree 概括的说写进数据的时辰先写到内存中的数据布局中，再排序下，然后再批量merge到磁盘中)

1 当写读比例很大的时候，LSM树相比于B树有更好的性能。因为随着insert操作，为了维护B树结构，节点分裂。读磁盘的随机读写概率会变大，性能会逐渐减弱。而LSM，具有批量特性，存储延迟，C0为内存组件，Cn为磁盘组件（Cn可以有多个，并且是从小到大，逐个合并），当C0 size达到阀值时，才会把insert批量刷新到磁盘。多次单页随机写，变成一次多页随机写。复用了磁盘寻道时间，极大提升效率。(C0和Cn都是有序树，查询效率为logN)

The time to complete a [read of a single page] could be estimated to be about 20ms (assumes 10ms seek, 8.3ms rotational delay, 1.7ms read) . . . The time to perform a se-quential prefetch read [of 64 contiguous pages] could be estimated to be about 125ms (assumes10ms seek, 8.3ms rotational delay, 106.9ms read of 64 records [pages]), or about 2 ms per page." Thus the ratio of 2 ms per page for multi-page block I/O to 20 ms for random I/O im-plies a ratio of rental costs for the disk arm, COSTπ/COSTP, equal to about 1/10. An analysis ofa more recent SCSI-2 disk read of a 4 KByte page gives us a 9 ms seek, 5.5 ms rotational delay,and 1.2 ms read, totalling 16 ms. Reading 64 contiguous 4 KByte pages requires a 9 ms seek,5.5 ms rotational delay, and 80 ms read for 64 pages, or a total of 95 ms, about 1.5 ms/page.Once again COSTπ/COSTP is again equal to about 1/10.

B+ trees work well until there are too many modifications, because they force you to perform costly optimizations to retain that advantage for a limited amount of time. The more and faster you add data at random locations, the faster the pages become fragmented again. Eventually, you may take in data at a higher rate than the optimization process takes to rewrite the existing files. The updates and deletes are done at disk seek（磁盘寻道） rates, rather thandisk transfer（磁盘传输） rates. LSM-trees work at disk transfer rates and scale much better to handle large amounts of data. They also guarantee a very consistent insert rate, as they transform random writes into sequential writes using the logfile plus in-memory store. The reads are in-dependent from the writes, so you also get no contention between these two operations.

http://zuroc.42qu.com/10038695

lsm译文

http://duanple.blog.163.com/blog/static/7097176720120391321283/

ziruoxiangyi

0 关注 0 粉丝 0 动态

关注关注

安科网

查询数据结构

ziruoxiangyi

ziruoxiangyi

ziruoxiangyi