About
RSS

Bit Focus


日常的数据结构 - 动态最值统计与堆

Posted at 2011-10-03 13:01:06 | Updated at 2024-12-24 03:47:34

    如果设计一个顺序统计系统, 需要动态向集合内添加元素, 又可以随时从集合中取得并丢弃最小值. 由于集合中有集合会被移除, 因此接下来再次取最小值时如果重新扫一次集合, 时间开销会相当大. 在一般情形中, 若为了均衡时间开销, 需要考虑维护一个更复杂的数据结构.
    这个数据结构建立在满二叉树 (full binary tree)完全二叉树 (complete binary tree)的概念上.
    "满" 这个字眼提示在树的每一层都摆满了节点, 而这恰好又是个充要条件, 即如果一棵二叉树每一层都堆满了节点, 那么它就是满二叉树. 满二叉树的定义干脆就按节点个数来: 一棵二叉树如果深度为 K, 而拥有 2K-1 个节点, 那么它就是一棵满二叉树. 如下面是 2 层和 3 层满二叉树, 分别拥有 3 个和 7 个节点

                             +---+
                             | a |
    +---+                    +---+
    | a |               .---'     `---.
    +---+            +---+           +---+
   /     \           | b |           | c |
+---+   +---+        +---+           +---+
| b |   | c |       /     \         /     \
+---+   +---+    +---+   +---+   +---+   +---+
                 | d |   | e |   | f |   | g |
                 +---+   +---+   +---+   +---+

    而如果一棵二叉树满足
那么它就是一棵完全二叉树. 更直观一些, 将一个满二叉树的节点按照广度优先 (即逐层向下) 遍历的方式顺序编号, 编号从 1 开始 (而不是从 0 开始), 如

                            +---+
                            | 1 |
                            +---+
               .-----------'     `-----------.
            +---+                           +---+
            | 2 |                           | 3 |
            +---+                           +---+
       .---'     `---.                 .---'     `---.
    +---+           +---+           +---+           +---+
    | 4 |           | 5 |           | 6 |           | 7 |
    +---+           +---+           +---+           +---+
   /     \         /     \         /     \         /     \
+---+   +---+   +---+   +---+   +---+   +---+   +---+   +---+
| 8 |   | 9 |   | A |   | B |   | C |   | D |   | E |   | F |
+---+   +---+   +---+   +---+   +---+   +---+   +---+   +---+

    在这棵树上删除一些编号最大的节点, 得到的子树就是完全二叉树 (满二叉树本身也就是完全二叉树), 如

                            +---+
                            | 1 |
                            +---+
               .-----------'     `-----------.
            +---+                           +---+
            | 2 |                           | 3 |
            +---+                           +---+
       .---'     `---.                 .---'     `---.
    +---+           +---+           +---+           +---+
    | 4 |           | 5 |           | 6 |           | 7 |
    +---+           +---+           +---+           +---+
   /     \         /     \         /
+---+   +---+   +---+   +---+   +---+
| 8 |   | 9 |   | A |   | B |   | C |
+---+   +---+   +---+   +---+   +---+

    如果在编号较大的节点删除之前, 某个小号节点被删去了, 这样的子树则不是完全二叉树, 如

                            +---+                 -.
                            | 1 |                  |
                            +---+                  |
               .-----------'     `-----------.     |
            +---+                           +---+  |
            | 2 |                           | 3 |  | 非满二叉树
            +---+                           +---+  |
       .---'     `---.                 .---'       |
    +---+           +---+           +---+          |
    | 4 |           | 5 |           | 6 |          |
    +---+           +---+           +---+         -'
   /     \         /     \         /     \
+---+   +---+   +---+   +---+   +---+   +---+
| 8 |   | 9 |   | A |   | B |   | C |   | D |
+---+   +---+   +---+   +---+   +---+   +---+

                            +---+
                            | 1 |
                            +---+
               .-----------'     `-----------.
            +---+                           +---+
            | 2 |                           | 3 |
            +---+                           +---+
       .---'     `---.                 .---'     `---.
    +---+           +---+           +---+           +---+
    | 4 |           | 5 |           | 6 |           | 7 |
    +---+           +---+           +---+           +---+
   /     \         /     \         /               /
+---+   +---+   +---+   +---+   +---+           +---+    -.
| 8 |   | 9 |   | A |   | B |   | C |           | E |     | 缺少的节点不连续
+---+   +---+   +---+   +---+   +---+           +---+    -'

    完全二叉树的好处之一在于它是一棵天然平衡的树, 这样就不会出现由于不平衡导致二叉树退化成线性表的情况.

    如果完全二叉树上, 非叶节点元素的值小于或等于其子节点元素的值, 那么称它为最小堆 (minimum heap), 相反, 如果非叶节点元素的值大于或等于其子节点元素的值, 那么称这棵完全二叉树为最大堆 (maximum heap). 在计算机科学领域, 动态分配的内存所在的内存区域也被称之为堆 (heap), 然而这两个是完全不同的概念. 这里要讨论的堆是数据结构, 跟内存分配没有关系. 下面是一个整数最小堆的示例

                           +---+
                           | 2 |
                           +---+
               .----------'     `----------.
            +---+                         +----+
            | 4 |                         | 12 |
            +---+                         +----+
       .---'     `---.                   /      \
    +---+           +---+            +----+    +----+
    | 5 |           | 5 |            | 13 |    | 11 |
    +---+           +---+            +----+    +----+
   /     \         /     \          /
+---+   +---+   +---+   +---+   +----+
| 8 |   | 7 |   | 7 |   | 6 |   | 14 |
+---+   +---+   +---+   +---+   +----+

    回到文章最开始的需求之一, 向集合内添加元素. 新入堆的元素首先必须保证让堆仍然是一棵完全二叉树, 如插入 9 到上面的树中, 那么新入节点应该是完全二叉树下一个较大编号的节点

                           +---+
                           | 2 |
                           +---+
               .----------'     `----------.
            +---+                         +----+
            | 4 |                         | 12 |
            +---+                         +----+
       .---'     `---.                   /      \
    +---+           +---+            +----+    +----+
    | 5 |           | 5 |            | 13 |    | 11 |
    +---+           +---+            +----+    +----+
   /     \         /     \          /      \
+---+   +---+   +---+   +---+   +----+    =====
| 8 |   | 7 |   | 7 |   | 6 |   | 14 |    [ 9 ]
+---+   +---+   +---+   +---+   +----+    =====

    这是非常容易做到的. 但是这样做会破坏堆的性质. 不过好在这种破坏是局部的, 所以进行修正只需要少量的工作. 首先, 9 比其父节点小, 那么交换它和其父节点的值 13

                           +---+
                           | 2 |
                           +---+
               .----------'     `---------.
            +---+                        +----+
            | 4 |                        | 12 |
            +---+                        +----+
       .---'     `---.                  /      \
    +---+           +---+            =====    +----+
    | 5 |           | 5 |            [ 9 ]    | 11 |
    +---+           +---+            =====    +----+
   /     \         /     \          /     \
+---+   +---+   +---+   +---+   +----+   +----+
| 8 |   | 7 |   | 7 |   | 6 |   | 14 |   | 13 |
+---+   +---+   +---+   +---+   +----+   +----+

    这时, 9 仍然比其父节点小, 再来交换 9 和其父节点 12

                           +---+
                           | 2 |
                           +---+
               .----------'     `----------.
            +---+                         +---+
            | 4 |                         | 9 |
            +---+                         +---+
       .---'     `---.                   /     \
    +---+           +---+            +----+   +----+
    | 5 |           | 5 |            | 12 |   | 11 |
    +---+           +---+            +----+   +----+
   /     \         /     \          /      \
+---+   +---+   +---+   +---+   +----+    +----+
| 8 |   | 7 |   | 7 |   | 6 |   | 14 |    | 13 |
+---+   +---+   +---+   +---+   +----+    +----+

    现在堆的规则重新满足, 无需进一步交换了. 每一次交换, 除了一直受到关注的 9, 其它节点交换是否一定合规则呢? 这其中有两种情况, 其一, 9 还是叶节点, 这时交换无须任何顾虑; 其二, 9 非叶节点, 由于这时它在与其父节点交换, 而父节点肯定不小于它的任何直接或间接子节点, 父节点与 9 交换之后进入一个较深的节点时, 肯定还能满足这一点. 所以每次交换都将是合法的.
    那么, 最小堆的插入 - 修正过程可以简述为

    而从堆中弹出最小值节点, 也就是根节点. 与插入类似, 要让根节点弹出后, 满足二叉树的性质, 便于以后修正堆的性质. 要做到这一点, 须将根节点与最后一个子节点交换, 再移除之, 就不会破坏完全二叉树结构了.

                           ======
                           [ 13 ] <-------------------.
                           ======                     |
               .----------'      `---------.          |
            +---+                         +---+       |
            | 4 |                         | 9 |       |
            +---+                         +---+       |
       .---'     `---.                   /     \      |
    +---+           +---+            +----+   +----+  |
    | 5 |           | 5 |            | 13 |   | 11 |  |
    +---+           +---+            +----+   +----+  |
   /     \         /     \          /                 |
+---+   +---+   +---+   +---+   +----+     - -        |
| 8 |   | 7 |   | 7 |   | 6 |   | 14 |    : 2 : <-----'
+---+   +---+   +---+   +---+   +----+     - -

    现在, 要将目前的根节点下降到一个合理的位置. 方法是, 每次让它与它子节点中较小的那个交换, 直到到达叶节点, 或一个合适的位置. 首先 13 跟左子节点 4 交换

                            +---+
                            | 4 |
                            +---+
                .----------'     `---------.
            ======                        +---+
            [ 13 ]                        | 9 |
            ======                        +---+
       .---'      `--.                   /     \
    +---+           +---+            +----+   +----+
    | 5 |           | 5 |            | 13 |   | 11 |
    +---+           +---+            +----+   +----+
   /     \         /     \          /
+---+   +---+   +---+   +---+   +----+
| 8 |   | 7 |   | 7 |   | 6 |   | 14 |
+---+   +---+   +---+   +---+   +----+

    现在两个子节点值相同, 就换到左边吧

                           +---+
                           | 4 |
                           +---+
               .----------'     `-----------.
            +---+                          +---+
            | 5 |                          | 9 |
            +---+                          +---+
        .--'     `----.                   /     \
    ======           +---+            +----+   +----+
    [ 13 ]           | 5 |            | 13 |   | 11 |
    ======           +---+            +----+   +----+
   /      \         /     \          /
+---+    +---+   +---+   +---+   +----+
| 8 |    | 7 |   | 7 |   | 6 |   | 14 |
+---+    +---+   +---+   +---+   +----+

    最后再跟较小的 7 交换, 于是完全二叉树变回了一个堆

                            +---+
                            | 4 |
                            +---+
               .-----------'     `----------.
            +---+                          +---+
            | 5 |                          | 9 |
            +---+                          +---+
       .---'     `----.                   /     \
    +---+            +---+            +----+   +----+
    | 7 |            | 5 |            | 13 |   | 11 |
    +---+            +---+            +----+   +----+
   /     \          /     \          /      \
+---+   +----+   +---+   +---+   +----+    +----+
| 8 |   | 13 |   | 7 |   | 6 |   | 14 |    | 13 |
+---+   +----+   +---+   +---+   +----+    +----+

    同样, 除了一直被聚焦的节点 13, 其它与之交换的节点在交换过程中能够一直保持堆的约束.
    堆的移除 - 修正过程简单来说就是
    对于堆而言, 无论是弹出最小值还是添加新节点, 堆的操作时间消耗都大致与节点数目的对数成比例. 而每次取最小值重新扫一次集合这种充满暴力美学的方法, 虽然在连续弹出最小值时开销较大, 但在加入新元素是能极快完成. 可见堆并不是一种完美的替代品, 它只是另一种应该被考虑的方案.

Post tags:   Data Structure  Heap  Algorithm  Order Statistic

Leave a comment:




Creative Commons License Your comment will be licensed under
CC-NC-ND 3.0


. Back to Bit Focus
NijiPress - Copyright (C) Neuron Teckid @ Bit Focus
About this site