Study notes for Btree and Rtree
摘要：Study notes for Btree and Rtree
Btree
 Btree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.
 Btrees are balanced search trees: height for the worst case, where t >2 is the order of tree, i.e., the maximum number of pointers for each node.
 Note that t is typically set so that one node fits into one disk block or page.
 Btree is a generalization of a binary search tree (i.e., a multiway tree) in that a node can have more than two children.
 Similar to redblack trees, but show better performance on disk I/O operations.
 Binary trees may be useful for rapid searching in main memory, but not appropriate for data stored on disks.
 When accessing data on a disk, an entire block (or page) is input at once, so it makes sense to design the tree so that each node essentially occupies one entire block.
 Btree is optimized for systems that read and write large blocks of data. Btrees (and its variants) are commonly used in databases and file systems.
 Structure: every node x has four fields
 The number of keys currently stored in node x, i.e., n, which is between [t/2]1 and t1.
 The n keys themselves, stored in nondecreasing order:
 A boolean value
 n+1 pointers: to its children, represented by:
 Properties:
 All leaves have the same height, which is the tree's height h.
 Btree guarantees a storage utilization of at least 50%, i.e., at least half of each allocated page actually stores index entries.
 There are upper and lower bounds on the number of keys on a node.
 Lower bound: every node other than root must have at least t1 keys => at least t children
 Upper bound: every node can contain at most 2t1 keys => every internal node has at most 2t children.
 Example is shown as follows:
 Conventions:
 Root of Btree is always in main memory
 Any node pased as parameter must have had a DiskRead operation performed on them.
B+tree
 A Btree is very efficient with respect to search and modification operations that involve a single record.
 But it is not particularly suited for sequential operations nor for range searches, B+tree is to solve this issue.
 B+tree is the most widely used index structure for databases.
 Main idea:
 The leaf nodes contain all the key values (and the associated information)
 The internal nodes (organized as a Btree) store some separators which have the only function of determining the path to follow during searching
 The leaf nodes are linked in a (doubly linked) list, in order to efficiently support range searches or sequential searches.
 Comparison with Btrees:
 The search of a single key value is in general more expensive in a B+tree because we have always to reach a leaf node to fetch the pointer to the data file.
 For operations requiring an ordering of the retrieved records according to the search key values or for range queries, the B+tree is to be preferred.
 The Btree requires less storage since the key values are stored only once.
B*tree
 B*tree is a variation of the B+tree where the storage utilization for nodes must be at least 66% (2/3) instead of 50%.
 The nonroot and nonleaf nodes of B*tree contain pointers to sibling nodes.
Rtree
 The Btree and its variants are useful to index and search data in onedimensional space (where data is stored on disks rather than main memory). The basic idea is to separate a line into several segments and gradually reduce to the minimum segment where the searched data is located, illustrated as follows (figure is obtained from July et al.'s blog):
 However, for highdimensional data, Btree and its variants are not efficient. Other tree index structures such as Rtree, kdtree are more suited in this case.
 Rtree is a generalization of Btree for indexing and searching multidimensional data such as geographical coordinates, rectangles or polygons.
 A commonly realworld usage for an Rtree might be to store spatial objects such as restaurant locations, or the polygons that typical maps are made of scuh as streets, buildings, outlines of lakes, coastlines, etc, and then find answers quickly to queries such as "Find all museums within 2km of my current location". => It is useful for map.
 Main points:
 The key idea is to group nearby objects and represent them with their minimum bounding rectangle in the next higher level of the tree.
 At the leaf level, each rectangle describes a single object; at higher levels, the aggregation of an increasing number of objects.
 Rtree is a balanced search tree (i.e., all leaf nodes are at the same height), organizes the data in pages, and is designed for storage on disk (as used in databases).
 Rtree only guarantees a minimum usage of 3040%. The reason is the more complex balancing required for spatial data as opposed to linear data stored in Btrees.
 The key difficulty of Rtree is to build an efficient tree that on one hand is balanced, on the other hand the rectangles do not cover too much empty space and do not overlap too much.
 A typical Rtree is represented as follows (figure is originally from Wikipedia).
References
 Btree: http://en.wikipedia.org/wiki/Btree
 Rtree: http://en.wikipedia.org/wiki/Rtree
 Lecture notes, CMSC 420, Btrees
 Andreas Kaltenbrunner et al., Btrees
 Other online tutorials
 July et al. 从B 树、B+ 树、B* 树谈到R 树
顶
踩

JavaScript成为网络霸主的“绯闻”
JavaScript正凭借新型工具与功能提升以极度夸张的速度吞噬整个世界。我们是否应该接受这一无法逆转的趋势？ 还记得那些旧日往事吗？很多用户因为担心安全问题而在浏览器中禁用JavaScript。如...

C++是垃圾语言？！
Linux之父对C++进行了炮轰，说它是糟糕程序员的垃圾语言，可谓是一石激起千层浪，引起众多程序员朋友的关注和讨论。本专题给出正反方的讨论观点，供大家评说！另外，给出了编程语言的发展状况...

MySQL数据库入门与精通
MySQL是一个跨平台的开源关系型数据库管理系统，目前MySQL被广泛地应用在Internet上的中小型网站中。由于其体积小、速度快、总体拥有成本低，尤其是开放源码这一特点，许多中小型网站为了降低...
最新资讯
 SQL教程之SQL NOW() 函数20170714 17:48
 SQL教程之SQL ROUND() 函数20170714 17:48
 SQL教程之SQL LEN() 函数20170714 17:48
 SQL教程之SQL MID() 函数20170714 17:48
 SQL教程之SQL LCASE() 函数20170714 17:48
 SQL教程之SQL UCASE() 函数20170714 17:48
 SQL教程之SQL HAVING 子句20170714 17:48
 SQL教程之SQL 总结20170714 17:48
 SQL教程之SQL 主机20170714 17:48
 SQL教程之SQL 快速参考20170714 17:48
热点文章
 24小时
 一周
 本月
相关热门文章
 本文暂无Tags标签
0 »