日常的数据结构 - 堆的实现与第 K 最小堆

在前一篇文章中纸上谈了堆的性质以及如何在插入元素和弹出最值时保持这些性质. 这篇文章将聊聊实现方式.
从实现的角度来说, 使用完全二叉树作为堆的前提的好处是, 完全二叉树非常容易实现, 甚至可以说是最容易实现的二叉树. 由于完全二叉树的节点编号是连续的, 那么它可以被拉平, 放进一个日常的数组中, 如

+---+ | 4 | +---+ / \ +---+ +---+ | 5 | | 9 | +---+ +---+ / \ +---+ +---+ | 8 | | 5 | +---+ +---+

这样一棵完全二叉树可以被转换成

.----. /--. \ +---+---+---+---+---+---+----- | - | 4 | 5 | 9 | 8 | 5 | ... +---+---+---+---+---+---+----- \______/ / \________/

其中的线连接节点与它们的子节点. 如果用节点的编号来标识这个数组, 则会是

.----. /--. \ +---+---+---+---+---+---+----- | 0 | 1 | 2 | 3 | 4 | 5 | ... +---+---+---+---+---+---+----- \______/ / \________/

这里有个很奇妙的性质, 索引为 i 的节点, 它的左子节点的索引是 2i, 而右子节点的索引是 2i+1, 其父节点索引则是 floor(i/2) (根节点除外). 如果用 0 号节点而不是 1 号节点存储根节点呢? 如

.----. /--. \ +---+---+---+---+---+---+----- | - | 0 | 1 | 2 | 3 | 4 | ... +---+---+---+---+---+---+----- \______/ / \________/

也能很容易计算, 索引为 i 的节点, 左子节点索引是 2i+1, 右子节点索引是 2i+2, 父节点索引是 floor((i-1)/2). 似乎也没什么太大区别. 不过, 之前那种计算方式的好处在于, 2i, 2i+1, i/2 这样的算式都能换成极快的位运算: 2i 等效于 i << 1, 2i+1 等效于 (i << 1) | 1, floor(i/2) 等效于 i >> 1, 这还能提供一丁点效率优化 (和一部分代码混乱程度加成, 以及大量的极客自豪感上升).
既然堆的逻辑结构是数组, 那么可以采用 std::vector 作为存储数据结构. 此外, 将比较方式以模板形式抽出, 这样可以构造一个抽象的最值堆, 而不是死板的最大堆或者最小堆. 下面是堆的框架

template

class heap {
    std::vector<_T> array;
    _Less const less;
    typedef typename std::vector<_T>::size_type size_type;
public:
    heap()
        : array(1) /* insert a placeholder, array[0] */
        , less(_Less())
    {}

    void push(_T const& value);
    _T pop();
};

向堆中加入元素实现为插入并修正, 修正的过程, 也就是向父节点方向移动, 是递归实现的

template
class heap {
    std::vector<_T> array;
    _Less const less;
    typedef typename std::vector<_T>::size_type size_type;
public:
    heap()
        : array(1)
        , less(_Less())
    {}

    void push(_T const& value)
    {
        array.push_back(value);
        shift_up(array.size() - 1);
    }

    _T pop();
private:
    void shift_up(size_type index)
    {
        size_type parent_index = index >> 1;
        if (index != 1 && less(array[index], array[parent_index])) {
            std::swap(array[index], array[parent_index]);
            shift_up(parent_index);
        }
    }
};

这里要求传入模板的类型参数 _Less 类型含有函数
bool operator()(_T const& lhs, _T const& rhs) const;
返回值表示前面的参数是否小于后一个参数.
接下来是弹出最值, 因为向更深层移动节点会牵扯到很多索引越界的检查, 因此代码看起来会繁琐一些. 仍然采用递归实现

template
class heap {
    std::vector<_T> array;
    _Less const less;
    typedef typename std::vector<_T>::size_type size_type;
public:
    heap()
        : array(1)
        , less(_Less())
    {}

    void push(_T const& value)
    {
        array.push_back(value);
        shift_up(array.size() - 1);
    }

    _T pop()
    {
        _T result = array[1];
        std::swap(array[1], array.back());
        array.pop_back();
        shift_down(1);
        return result;
    }
private:
    void shift_up(size_type index)
    {
        size_type parent_index = index >> 1;
        if (index != 1 && less(array[index], array[parent_index])) {
            std::swap(array[index], array[parent_index]);
            shift_up(parent_index);
        }
    }

    void shift_down(size_type index)
    {
        size_type lchild_index = index << 1;
        size_type rchild_index = (index << 1) | 1;
        if (array.size() <= index || array.size() <= lchild_index) {
            return;
        }

        size_type min_child_index = 0;
        if (array.size() <= rchild_index) {
            min_child_index = lchild_index;
        } else {
            min_child_index = less(array[lchild_index], array[rchild_index])
                                ? lchild_index : rchild_index;
        }

        if (less(array[min_child_index], array[index])) {
            std::swap(array[min_child_index], array[index]);
            shift_down(min_child_index);
        }
    }
};

STL 中当然自带堆, 不过名字并不叫堆, 而叫优先队列 (std::priorty_queue), 它除了像上述实现的堆支持数据类型和比较方式两个模板参数, 还能设置底层数据结构 (默认为 std::vector), 其声明为

template
        , typename _Compare = std::less >
class std::priority_queue;

使用默认的比较函数对象类型得到的是一个最大堆, 即

std::priority_queue heap;
heap.push(1);
heap.push(2);
heap.push(0);
std::cout << heap.top() << std::endl;

输出将是最大值 2, 而如下则可以得到一个整数最小堆
typedef std::priority_queue, std::greater_equal > minheap;

堆可以以很均衡的代价从一个动态集合中获得最值, 但是这还不够拉风, 怎么改进一下, 能够以差不多均衡的代价从一个动态集合中取得第 K 最小值呢?
一个堆干这个事情可能很困难, 那么就来两个吧. 将一个最大堆和一个最小堆放在一起, 并且将较小的 K 个元素维护在最大堆中, 其它元素放入最小堆, 这时最小堆的堆顶就是第 K 最小值了. 还是用 std::priority_queue 而不是之前文中实现的山寨货来构造这样一个玩意儿

/* binary_function 逆转 adaptor
   将 binary_function 得到的返回值求非并返回 */
template
struct binary_negate {
    _CompareBinaryFn fn;

    binary_negate()
        : fn(_CompareBinaryFn())
    {}

    explicit binary_negate(_CompareBinaryFn const& f)
        : fn(f)
    {}

    bool operator()(_Tp const& lhs, _Tp const& rhs)
    {
        return !fn(lhs, rhs);
    }
};

template
class kth_heap {
    /* 较小的一堆数量为 _K 的元素, 放置在最大堆中 */
    std::priority_queue<_Tp, std::vector<_Tp>, _Compare > lessers;
    /* 较小的一堆元素, 放置在最小堆中 */
    std::priority_queue<_Tp, std::vector<_Tp>, binary_negate<_Tp, _Compare> > greaters;
public:
    void push(_Tp const& value)
    {
        lessers.push(value);
        if (_K < lessers.size()) {
            greaters.push(lessers.top());
            lessers.pop();
        }
    }

    _Tp pop()
    {
        _Tp result(greaters.top());
        greaters.pop();
        return result;
    }
};

/* 特化 _K 为 0, 也就是单纯的最小堆的情形 */
template
class kth_heap<_Tp, 0, _Compare>
{
    std::priority_queue<_Tp, std::vector<_Tp>, binary_negate<_Tp, _Compare> > heap;
public:
    void push(_Tp const& value)
    {
        heap.push(value);
    }

    _Tp pop()
    {
        _Tp result(heap.top());
        heap.pop();
        return result;
    }
};

	Back to Tobary book
	Tobary book - Copyright (C) ZhePlus @ Bit Focus
	About