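Hi everyone, I'm 曲镇. Today I'd like to share some notes on heaps. When learning something new, we usually start by asking what it is, what its properties are, what it can be used for, and where it applies. This article follows the same order: what a heap is, what it is used for, how to implement one, and a few extensions.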

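What is a heap

Introduction

A heap is a data structure that lets you quickly find the maximum or minimum value in a collection of values.

Max heap

In a max heap, the value of every node is greater than or equal to the values of its two children, so the root holds the maximum.

Min heap

In a min heap, the value of every node is less than or equal to the values of its two children, so the root holds the minimum.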

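Properties

Time complexity of the operations (for a binary heap with N elements):

find (read the max/min at the top): O(1)
delete: O(logN)
insert: O(logN) (O(1) for a Fibonacci heap)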

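Kinds of heaps

There are many kinds of heaps, including binary heaps, Fibonacci heaps, and binomial heaps. Here we focus on the simplest one: the binary heap.

Binary heap

Properties:

It is a complete binary tree, and its data is usually stored in an array
The value of any node is always >= the values of its children (this is the max heap; a min heap is the opposite)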
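The second property can be checked mechanically. The following is a minimal sketch, assuming the heap is stored level by level in a 0-indexed slice so that the parent of index i sits at (i-1)/2; `isMaxHeap` is a hypothetical helper name used only for illustration.

```go
package main

import "fmt"

// isMaxHeap reports whether a, read as a complete binary tree stored level by
// level, satisfies the max-heap property: every parent >= each of its children.
func isMaxHeap(a []int) bool {
	for i := 1; i < len(a); i++ {
		parent := (i - 1) / 2
		if a[parent] < a[i] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isMaxHeap([]int{65, 23, 45, 3, 12, 43, 10})) // true
	fmt.Println(isMaxHeap([]int{12, 3, 45, 23, 65, 43, 10})) // false
}
```

Because the tree is complete, the slice has no gaps, which is why a plain array is enough to represent it.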

What heaps are used for

Applications

Heap sort

Heap sort using Go's built-in container/heap package:

package main

import (
	"container/heap"
	"fmt"
)

// heapInt implements heap.Interface on top of an int slice.
type heapInt []int

func (h heapInt) Len() int {
	return len(h)
}

func (h heapInt) Less(i, j int) bool {
	return h[i] < h[j] // min heap; for a max heap use h[i] > h[j]
}

func (h heapInt) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
}

func (h *heapInt) Push(x interface{}) {
	*h = append(*h, x.(int))
}

func (h *heapInt) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[0 : n-1]
	return x
}


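func main() {
	h := heapInt{12, 3, 45, 23, 65, 43, 10}
	heap.Init(&h) // establish the heap invariants
	// Popping repeatedly yields the values in ascending order (min heap).
	for h.Len() > 0 {
		fmt.Println(heap.Pop(&h))
	}
}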

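Priority queue

Idea:

Give each item a priority field that expresses its priority level
Order the heap by priority
Pop repeatedly to take out the items, highest priority first

A concrete example:

Quoted from: golang.org/pkg/contain…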

package main

import (
	"container/heap"
	"fmt"
)

// An Item is something we manage in a priority queue.
type Item struct {
	value    string // The value of the item; arbitrary.
	priority int    // The priority of the item in the queue.
	// The index is needed by update and is maintained by the heap.Interface methods.
	index int // The index of the item in the heap.
}

// A PriorityQueue implements heap.Interface and holds Items.
type PriorityQueue []*Item

func (pq PriorityQueue) Len() int { return len(pq) }

func (pq PriorityQueue) Less(i, j int) bool {
	// We want Pop to give us the highest, not lowest, priority so we use greater than here.
	return pq[i].priority > pq[j].priority
}

func (pq PriorityQueue) Swap(i, j int) {
	pq[i], pq[j] = pq[j], pq[i]
	pq[i].index = i
	pq[j].index = j
}

func (pq *PriorityQueue) Push(x interface{}) {
	n := len(*pq)
	item := x.(*Item)
	item.index = n
	*pq = append(*pq, item)
}

func (pq *PriorityQueue) Pop() interface{} {
	old := *pq
	n := len(old)
	item := old[n-1]
	old[n-1] = nil  // avoid memory leak
	item.index = -1 // for safety
	*pq = old[0 : n-1]
	return item
}

// update modifies the priority and value of an Item in the queue.
func (pq *PriorityQueue) update(item *Item, value string, priority int) {
	item.value = value
	item.priority = priority
	heap.Fix(pq, item.index)
}

// This example creates a PriorityQueue with some items, adds and manipulates an item,
// and then removes the items in priority order.
func main() {
	// Some items and their priorities.
	items := map[string]int{
		"banana": 3, "apple": 2, "pear": 4,
	}

	// Create a priority queue, put the items in it, and
	// establish the priority queue (heap) invariants.
	pq := make(PriorityQueue, len(items))
	i := 0
	for value, priority := range items {
		pq[i] = &Item{
			value:    value,
			priority: priority,
			index:    i,
		}
		i++
	}
	heap.Init(&pq)

	// Insert a new item and then modify its priority.
	item := &Item{
		value:    "orange",
		priority: 1,
	}
	heap.Push(&pq, item)
	pq.update(item, item.value, 5)

	// Take the items out; they arrive in decreasing priority order.
	for pq.Len() > 0 {
		item := heap.Pop(&pq).(*Item)
		fmt.Printf("%.2d:%s ", item.priority, item.value)
	}
}

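Other applications include LRU caches, timers, and so on.

Problems heaps can solve

The topK problem

Finding the K largest values in an array of n elements:

Maintain a min heap of size K (the first K elements are simply pushed in)
Walk through the array in order, comparing each element with the top of the heap

a. If it is larger than the top of the heap, remove the top and insert this element into the heap

b. If it is smaller than or equal to the top of the heap, do nothing and continue with the next element
When the scan ends, the heap holds the K largest values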
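The steps above can be sketched with container/heap as follows. This is an illustrative sketch rather than a tuned implementation; `minHeap` and `topK` are names chosen here, and sorting the result at the end is only done to make the output easy to read.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// minHeap is a min heap of ints built on container/heap.
type minHeap []int

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i] < h[j] }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(int)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topK returns the k largest values of nums in descending order.
func topK(nums []int, k int) []int {
	if k <= 0 {
		return nil
	}
	h := &minHeap{}
	for _, v := range nums {
		if h.Len() < k {
			heap.Push(h, v) // still filling the heap
		} else if v > (*h)[0] {
			heap.Pop(h) // drop the smallest of the current top k
			heap.Push(h, v)
		}
	}
	res := append([]int(nil), *h...)
	sort.Sort(sort.Reverse(sort.IntSlice(res)))
	return res
}

func main() {
	fmt.Println(topK([]int{12, 3, 45, 23, 65, 43, 10}, 3)) // [65 45 43]
}
```

The heap never grows beyond k elements, so the input can just as well arrive as a stream.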

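The median problem

Median: the value in the middle position when a set of data is sorted in ascending order

The usual approach is to sort first and then take the value in the middle

Finding the median with heaps:

Given n values, maintain two heaps, a max heap and a min heap, each holding about n/2 of them (when n is odd, the min heap takes the extra one)
Push the values into the min heap one by one; whenever the min heap exceeds its capacity, pop its top and push that value into the max heap. The min heap then holds the larger half and the max heap the smaller half
When n is odd, the min heap holds one more element than the max heap and its top is the median; when n is even, the tops of the two heaps are the two middle values, and the median is usually taken as their average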
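A minimal sketch of this two-heap procedure, assuming all n values are known up front. The `intHeap` type with its `isMax` flag and the `median` function are hypothetical names introduced here; averaging the two middle values for even n is one common convention.

```go
package main

import (
	"container/heap"
	"fmt"
)

// intHeap is a min heap by default and a max heap when isMax is set.
type intHeap struct {
	data  []int
	isMax bool
}

func (h *intHeap) Len() int { return len(h.data) }
func (h *intHeap) Less(i, j int) bool {
	if h.isMax {
		return h.data[i] > h.data[j]
	}
	return h.data[i] < h.data[j]
}
func (h *intHeap) Swap(i, j int)      { h.data[i], h.data[j] = h.data[j], h.data[i] }
func (h *intHeap) Push(x interface{}) { h.data = append(h.data, x.(int)) }
func (h *intHeap) Pop() interface{} {
	n := len(h.data)
	x := h.data[n-1]
	h.data = h.data[:n-1]
	return x
}

// median returns the median of nums; ok is false for empty input.
func median(nums []int) (m float64, ok bool) {
	n := len(nums)
	if n == 0 {
		return 0, false
	}
	upper := &intHeap{}            // min heap: ends up with the larger half
	lower := &intHeap{isMax: true} // max heap: ends up with the smaller half
	capUpper := (n + 1) / 2        // the min heap takes the extra element
	for _, v := range nums {
		heap.Push(upper, v)
		if upper.Len() > capUpper {
			heap.Push(lower, heap.Pop(upper))
		}
	}
	if n%2 == 1 {
		return float64(upper.data[0]), true
	}
	return float64(upper.data[0]+lower.data[0]) / 2, true
}

func main() {
	m, _ := median([]int{12, 3, 45, 23, 65, 43, 10})
	fmt.Println(m) // 23
}
```

Only the tops of the two heaps are read at the end, so nothing is actually popped to obtain the median.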

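How to implement it

Binary heap

Three key points:

Store the heap's data in an array
The heapifyUp step when inserting into the heap
The heapifyDown step when deleting the top of the heap

How is a heap stored in an array?

If the current node has index i (0-based), then:

left child : 2*i+1
right child: 2*i+2
parent: (i-1)/2 (integer division)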
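The index arithmetic can be tried out directly. A small sketch, assuming a 0-indexed slice; the helper names `left`, `right`, and `parent` are chosen here for illustration.

```go
package main

import "fmt"

// Index helpers for a binary heap stored in a 0-indexed slice.
func left(i int) int   { return 2*i + 1 }
func right(i int) int  { return 2*i + 2 }
func parent(i int) int { return (i - 1) / 2 }

func main() {
	h := []int{65, 23, 45, 3, 12, 43, 10} // a max heap in level order
	i := 1                                // the node holding 23
	fmt.Println(h[parent(i)], h[left(i)], h[right(i)]) // 65 3 12
}
```

Note that both children of a node map back to the same parent index, since Go's integer division truncates.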

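HeapifyUp

Inserting an element; time complexity: O(logN)

A new element is always placed at the end of the heap first, i.e. at the end of the array
Starting from the end, the element is compared with its parent and swapped upward step by step until the heap property holds again

[Figure: the HeapifyUp process]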
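To make the two steps concrete, here is a small sketch on a plain int slice treated as a max heap. `heapifyUp` and `insert` are hypothetical helpers written for this illustration, separate from the container/heap-style template later in the article.

```go
package main

import "fmt"

// heapifyUp restores the max-heap property after a value has been placed at
// index i, swapping it with its parent for as long as it is the larger one.
func heapifyUp(h []int, i int) {
	for i > 0 {
		p := (i - 1) / 2 // parent index
		if h[p] >= h[i] {
			break
		}
		h[p], h[i] = h[i], h[p]
		i = p
	}
}

// insert appends v at the end of the heap and sifts it up.
func insert(h []int, v int) []int {
	h = append(h, v)
	heapifyUp(h, len(h)-1)
	return h
}

func main() {
	h := []int{65, 23, 45, 3, 12, 43, 10}
	h = insert(h, 50)
	fmt.Println(h) // [65 50 45 23 12 43 10 3]
}
```

In this run 50 starts at index 7, swaps with 3 at index 3, then with 23 at index 1, and stops below 65. The number of swaps is bounded by the height of the tree, which gives the O(logN) cost.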

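HeapifyDown

Deleting the top of the heap; time complexity: O(logN)

Swap the last element of the heap with the top, then remove the last element (which is the old top that was just swapped there), so the top is replaced and deleted
Starting from the top, compare the element with its children and swap it downward step by step until the heap property holds again

[Figure: the HeapifyDown process]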
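The deletion steps can be sketched the same way on a plain int slice treated as a max heap. `heapifyDown` and `popMax` are hypothetical helpers for this illustration, and `popMax` assumes a non-empty heap.

```go
package main

import "fmt"

// heapifyDown sifts the value at index i down within h[:n], swapping it with
// its larger child for as long as that child is greater.
func heapifyDown(h []int, i, n int) {
	for {
		l := 2*i + 1 // left child
		if l >= n {
			break
		}
		big := l
		if r := l + 1; r < n && h[r] > h[l] {
			big = r // right child is the larger one
		}
		if h[i] >= h[big] {
			break
		}
		h[i], h[big] = h[big], h[i]
		i = big
	}
}

// popMax removes and returns the top of a non-empty max heap,
// together with the shortened heap.
func popMax(h []int) (int, []int) {
	n := len(h) - 1
	h[0], h[n] = h[n], h[0] // move the old top to the end
	heapifyDown(h, 0, n)    // repair the first n elements
	return h[n], h[:n]
}

func main() {
	h := []int{65, 50, 45, 23, 12, 43, 10, 3}
	top, h := popMax(h)
	fmt.Println(top, h) // 65 [50 23 45 3 12 43 10]
}
```

Always swapping with the larger child is what keeps the parent greater than or equal to both children after each step.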

Template (modeled on the standard library's container/heap)

package heap

import "sort"

type heap interface {
	sort.Interface
	Push(x interface{})
	Pop() interface{}
}

func Push(h heap, x interface{}) {
	h.Push(x)
	up(h, h.Len()-1)
}

func Pop(h heap) interface{} {
	h.Swap(0, h.Len()-1)
	down(h, 0, h.Len()-1)
	return h.Pop()
}

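// up sifts the element at index i toward the root for as long as it
// sorts before its parent (in the Less sense).
func up(h heap, i int) {
	for {
		j := (i - 1) / 2 // parent
		if j == i || !h.Less(i, j) {
			break
		}
		h.Swap(i, j)
		i = j
	}
}

// down sifts the element at index i toward the leaves within h[0:n].
// It reports whether the element moved.
func down(h heap, i, n int) bool {
	j := i
	for {
		l := 2*j + 1 // left child
		if l >= n || l < 0 { // l < 0 after int overflow
			break
		}
		if r := l + 1; r < n && h.Less(r, l) { // right child
			l = r
		}
		if !h.Less(l, j) {
			break
		}
		h.Swap(j, l)
		j = l
	}
	return j > i
}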

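Extensions

The difference between a heap and a binary tree

A heap is a complete binary tree whose data is stored in an array
The value of every node in a heap is greater than or equal to (or less than or equal to) the values of its children

How does the heap data structure relate to the heap in memory?

They are unrelated
The heap and stack of a program or process are storage areas, a "concrete" or "physical" concept
The heap and stack data structures are logical structures and therefore abstract concepts, like binary trees and red-black trees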

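The topK problem

For a topK problem, should you use a heap or quicksort-style partitioning (quickselect)?

Time complexity:
- Heap: O(n log k)
- Quickselect: O(n) on average (O(n^2) in the worst case)
Space:
- Quickselect rearranges the original array in place; if the array must not be modified, it needs an O(n) copy
- The heap keeps only k elements

How to choose?

For massive data sets use the heap: quickselect needs the whole input in memory, which costs too many resources, while the heap only ever holds k elements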
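For comparison, the partition-based alternative can be sketched as follows. This is a plain quickselect with a Lomuto partition and the last element as pivot, chosen here for brevity; `quickselectTopK` and `partition` are hypothetical names, and the function reorders its input in place, which is exactly the space trade-off listed above.

```go
package main

import (
	"fmt"
	"sort"
)

// partition moves every value greater than the pivot nums[hi] to the front of
// nums[lo:hi+1] and returns the pivot's final index.
func partition(nums []int, lo, hi int) int {
	pivot := nums[hi]
	i := lo
	for j := lo; j < hi; j++ {
		if nums[j] > pivot {
			nums[i], nums[j] = nums[j], nums[i]
			i++
		}
	}
	nums[i], nums[hi] = nums[hi], nums[i]
	return i
}

// quickselectTopK rearranges nums in place so that its first k elements are
// the k largest (in no particular order) and returns that prefix.
func quickselectTopK(nums []int, k int) []int {
	if k <= 0 {
		return nil
	}
	if k >= len(nums) {
		return nums
	}
	lo, hi := 0, len(nums)-1
	for lo < hi {
		p := partition(nums, lo, hi)
		switch {
		case p == k-1:
			return nums[:k]
		case p < k-1:
			lo = p + 1
		default:
			hi = p - 1
		}
	}
	return nums[:k]
}

func main() {
	top := quickselectTopK([]int{12, 3, 45, 23, 65, 43, 10}, 3)
	sort.Sort(sort.Reverse(sort.IntSlice(top))) // sorted only for display
	fmt.Println(top)                            // [65 45 43]
}
```

Unlike the heap version, this needs random access to all n elements at once, so it does not fit data that arrives as a stream or does not fit in memory.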

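Finally

That's all. My knowledge is limited, so oversights and shortcomings are hard to avoid; corrections from readers are welcome so that I can fix them promptly.

If this article helped you, likes 👍 and bookmarks are welcome. Thanks for your support!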
