Comparing Common Sorting Algorithms: Learning High-Performance Sorting from DualPivotQuicksort

Comparable

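Javadoc

This interface imposes a total ordering on the objects of each class that implements it. This ordering is referred to as the class's natural ordering, and the class's compareTo method is referred to as its natural comparison method.

Lists (and arrays) of objects that implement this interface can be sorted automatically by Collections.sort(List list) and Arrays.sort().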
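As a minimal sketch of natural ordering, the following defines a hypothetical Version class (not part of the JDK or of the original article) that implements Comparable and is then sorted with both Collections.sort and Arrays.sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical value class, used only for illustration.
final class Version implements Comparable<Version> {
    final int major;
    final int minor;

    Version(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // Natural ordering: by major number first, then by minor number.
    @Override
    public int compareTo(Version other) {
        int byMajor = Integer.compare(major, other.major);
        return byMajor != 0 ? byMajor : Integer.compare(minor, other.minor);
    }

    @Override
    public String toString() {
        return major + "." + minor;
    }
}

class NaturalOrderingDemo {
    // Returns a sorted copy; Collections.sort relies on Version.compareTo.
    static List<Version> sortedCopy(List<Version> input) {
        List<Version> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<Version> versions = Arrays.asList(
                new Version(1, 10), new Version(1, 2), new Version(0, 9));
        System.out.println(sortedCopy(versions)); // [0.9, 1.2, 1.10]

        Version[] array = versions.toArray(new Version[0]);
        Arrays.sort(array); // the same natural ordering applies to arrays
        System.out.println(Arrays.toString(array));
    }
}
```

Note that 1.10 sorts after 1.2 because the comparison is numeric per field, not textual.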

Collections.sort(List list)

Definition

public static <T extends Comparable<? super T>> void sort(List<T> list) {
    list.sort(null);
}

This actually calls sort() of the List interface:

sort(Comparator<? super E> c)

Definition

default void sort(Comparator<? super E> c) {
    Object[] a = this.toArray();
    Arrays.sort(a, (Comparator) c);
    ListIterator<E> i = this.listIterator();
    for (Object e : a) {
        i.next();
        i.set((E) e);
    }
}

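Each List implementation provides its own listIterator():

ArrayList implementation
- ListItr class
  - set()
    - delegates to ArrayList's set() method to store the value
- Itr class
  - next()
    - cursor records the index of the next element to return
    - lastRet records the index of the element returned most recently
LinkedList implementation
- ListItr class
  - next()
    - next is the next node and nextIndex is its index
    - lastReturned is the node returned most recently
  - set()
    - replaces the item stored in lastReturned

The sorting itself is done by Arrays.sort():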

Arrays.sort()

It has several overloads:

public static void sort(int[] a) {
    DualPivotQuicksort.sort(a, 0, a.length - 1, null, 0, 0);
}

public static void sort(int[] a, int fromIndex, int toIndex) {
    rangeCheck(a.length, fromIndex, toIndex);
    DualPivotQuicksort.sort(a, fromIndex, toIndex - 1, null, 0, 0);
}

public static void sort(long[] a) {
    DualPivotQuicksort.sort(a, 0, a.length - 1, null, 0, 0);
}

...

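As shown, these methods delegate the actual sorting to methods in DualPivotQuicksort.

The implementation involves insertion sort, quicksort, merge sort, and counting sort.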
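A short usage sketch of the overloads listed above: the range overload sorts only a[fromIndex, toIndex), with toIndex exclusive, which is why the JDK code passes toIndex - 1 as the inclusive right bound. The class and method names here are illustrative only.

```java
import java.util.Arrays;

class ArraysSortRangeDemo {
    // Sorts only a[fromIndex, toIndex) and returns the same array for convenience.
    static int[] sortRange(int[] a, int fromIndex, int toIndex) {
        Arrays.sort(a, fromIndex, toIndex); // toIndex is exclusive
        return a;
    }

    public static void main(String[] args) {
        int[] data = {5, 4, 3, 2, 1, 0};
        System.out.println(Arrays.toString(sortRange(data, 1, 4))); // [5, 2, 3, 4, 1, 0]
        Arrays.sort(data); // sorts the whole array
        System.out.println(Arrays.toString(data));                  // [0, 1, 2, 3, 4, 5]
    }
}
```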

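Insertion sort

Sorting in ascending order.

Algorithm description

1. Start with the first element, which can be regarded as already sorted
2. Take the next element and scan the sorted part from back to front
3. If the (sorted) element being scanned is greater than the new element, move it one position to the right
4. Repeat step 3 until a sorted element less than or equal to the new element is found
5. Insert the new element right after that position
6. Repeat steps 2~5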

Complexity

Case	Condition	Complexity
Average time	-	O(n^2)
Worst-case time	Input in descending order	O(n^2)
Best-case time	Input in ascending order	O(n)
Space	-	O(1)

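Characteristics

Insertion sort is very efficient for partially sorted arrays and is also well suited to small arrays. An array is called partially sorted if its number of inversions is less than a constant multiple of the array size. An inversion is a pair of elements that are out of order. For example, E X A M P L E has 11 inversions: E-A, X-A, X-M, X-P, X-L, X-E, M-L, M-E, P-L, P-E, and L-E. Typical partially sorted arrays include:

- An array where each element is not far from its final position
- A small array appended to a large sorted array
- An array with only a few elements out of place

The number of exchanges insertion sort performs equals the number of inversions in the array. The number of compares is at least the number of inversions and at most the number of inversions plus the array size minus 1.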
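The relationship between inversions and the work done by insertion sort can be checked with a small sketch. It counts inversions by brute force and instruments a shift-based insertion sort (each one-position shift plays the role of one exchange). The class and method names are hypothetical.

```java
class InversionCountDemo {
    // Brute-force O(n^2) count of pairs (i, j) with i < j and a[i] > a[j].
    static int countInversions(char[] a) {
        int inversions = 0;
        for (int i = 0; i < a.length; i++) {
            for (int j = i + 1; j < a.length; j++) {
                if (a[i] > a[j]) {
                    inversions++;
                }
            }
        }
        return inversions;
    }

    // Sorts a in place and returns {shifts, compares}.
    static int[] insertionSortStats(char[] a) {
        int shifts = 0;
        int compares = 0;
        for (int i = 1; i < a.length; i++) {
            char temp = a[i];
            int j = i - 1;
            while (j >= 0) {
                compares++;
                if (a[j] <= temp) {
                    break;
                }
                a[j + 1] = a[j]; // each shift removes exactly one inversion
                shifts++;
                j--;
            }
            a[j + 1] = temp;
        }
        return new int[] {shifts, compares};
    }

    public static void main(String[] args) {
        int inversions = countInversions("EXAMPLE".toCharArray());
        int[] stats = insertionSortStats("EXAMPLE".toCharArray());
        System.out.println("inversions = " + inversions); // inversions = 11
        System.out.println("shifts = " + stats[0] + ", compares = " + stats[1]);
    }
}
```

For EXAMPLE the shift count matches the 11 inversions, and the compare count falls between 11 and 11 + 7 - 1 = 17.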

Implementation

private int[] insertSort(int[] array) {
    int j, temp;
    int size = array.length;

    for (int i = 1; i < size; i++) {
        temp = array[i];
        for (j = i - 1; j >= 0 && array[j] > temp; j--) {
            array[j + 1] = array[j];
        }
        array[j + 1] = temp;
    }
    return array;
}

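Insertion sort closely resembles selection sort, which is covered further below. First, here is how the JDK uses insertion sort.

Implementation in DualPivotQuicksort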

if (length < INSERTION_SORT_THRESHOLD) {
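    // leftmost: whether the given range is the leftmost part of the array
    if (leftmost) {
        // Traditional (without sentinel) insertion sort, optimized for the server VM, is used for the leftmost part
        for (int i = left, j = i; i < right; j = ++i) {
            // Save the value to insert in ai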
            int ai = a[i + 1];
            while (ai < a[j]) {
                a[j + 1] = a[j];
                if (j-- == left) {
                    break;
                }
            }
            a[j + 1] = ai;
        }
    } else {
        // Skip the longest ascending sequence
        do {
            if (left >= right) {
                return;
            }
        } while (a[++left] >= a[left - 1]);

        // Pair insertion sort is used here; it is described in detail below
        for (int k = left; ++left <= right; k = ++left) {
            int a1 = a[k], a2 = a[left];

            // Ensure a1 >= a2
            if (a1 < a2) {
                a2 = a1; a1 = a[left];
            }
            while (a1 < a[--k]) {
                a[k + 2] = a[k];
            }
            a[++k + 1] = a1;

            while (a2 < a[--k]) {
                a[k + 1] = a[k];
            }
            a[k + 1] = a2;
        }
        int last = a[right];

        while (last < a[--right]) {
            a[right + 1] = a[right];
        }
        a[right + 1] = last;
    }
    return;
}

Pseudocode for pair insertion sort is available at formal.kastel.kit.edu/ulbrich/ver…. Based on it, I wrote the following Java version:

private void myInsertSortNotLeftMost(int[] a) {
    int i = 0;
    while (i < a.length - 1) {
        // x and y hold two adjacent elements of a
        int x = a[i];
        int y = a[i + 1];

        // Ensure x >= y
        if (x < y) {
            int temp = x;
            x = y;
            y = temp;
        }

        // j is the index used to search for the insertion point
        int j = i - 1;
        // Find the insertion point for x
        while (j >= 0 && a[j] > x) {
            // Shift the existing element right by 2
            a[j + 2] = a[j];
            j = j - 1;
        }
        // Store x at its insertion position
        a[j + 2] = x;
        // a[j + 1] is now a free slot

        // Find the insertion point for y
        while (j >= 0 && a[j] > y) {
            // Shift the existing element right by 1
            a[j + 1] = a[j];
            j = j -1;
        }
        // Store y at its insertion position
        a[j + 1] = y;

        i = i + 2;
    }

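    // If the array length is odd, the last element still needs one more insertion: the pair loop advances the index by two each time, so it covers every element only when the element count is even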
    if (i == a.length - 1) {
        int y = a[i];
        int j = i - 1;
        while (j >= 0 && a[j] > y) {
            a[j + 1] = a[j];
            j = j - 1;
        }
        a[j + 1] = y;
    }
    System.out.println(Arrays.toString(a));
}

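Comparing the two loops of the pair insertion sort with the traditional insertion sort shows that their logic is very similar.

The pair insertion sort can therefore be described as follows:

1. Store the larger of a[i] and a[i + 1] in x and the smaller in y
2. Starting from the element before a[i], scan the sorted part from back to front
3. If the (sorted) element being scanned is greater than x, move it 2 positions to the right
4. Repeat step 3 until a sorted element a[j] less than or equal to x is found, then store x at position j + 2
5. If a[j] is greater than y, move it 1 position to the right
6. Repeat step 5 until a sorted element less than or equal to y is found, then store y right after it
7. Advance the index by 2 and repeat from step 1 until the array has been traversed
8. If the array length is odd, the last element still needs one more insertion

Comparing my Java version with the JDK implementation, mine still needs some optimization to arrive at the JDK's form:

[Figure: 结伴插入转化.png (transforming the pair insertion sort into the JDK form)]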

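Selection sort

Sorting in ascending order.

Algorithm description

1. Find the smallest (or largest) element in the unsorted sequence and put it at the start of the sorted sequence
2. Find the smallest (or largest) element among the remaining unsorted elements and append it to the end of the sorted sequence
3. Repeat step 2 until every element is sorted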

Complexity

Case	Condition	Complexity
Average time	-	O(n^2)
Worst-case time	Input in reverse order	O(n^2)
Best-case time	Input already sorted	O(n^2)
Space	-	O(1)

Characteristics

The running time does not depend on the input, and the number of data moves is minimal.

Implementation

private int[] selectionSort(int[] array) {
    int min, temp;
    int size = array.length;

    for (int i = 0; i < size; i++) {
        min = i;
        for (int j = i + 1; j < size; j++) {
            if (array[j] < array[min]) {
                min = j;
            }
        }
        if (min != i) {
            temp = array[i];
            array[i] = array[min];
            array[min] = temp;
        }
    }

    return array;
}

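Performance comparison of insertion sort, pair insertion sort, and selection sort

Shell sort

Complexity

Case	Complexity
Average time	Depends on the gap sequence
Worst-case time	Depends on the gap sequence; best known: O(n log^2 n)
Best-case time	O(n log n) for most gap sequences
Space	O(1)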

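Characteristics

Shell sort is much faster than insertion sort and selection sort, and its advantage grows with the array size. With a good gap sequence, the running time stays below quadratic.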
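How far below quadratic the running time stays depends on the gap sequence. As a sketch, here is a variant that uses the 3x + 1 gap sequence (1, 4, 13, 40, ...) from Algorithms by Sedgewick and Wayne, for which an O(n^(3/2)) worst case is known. The class name is illustrative.

```java
class ShellSortKnuth {
    static void sort(int[] a) {
        int n = a.length;

        // Largest gap of the form 3x + 1 that is below n / 3: 1, 4, 13, 40, 121, ...
        int h = 1;
        while (h < n / 3) {
            h = 3 * h + 1;
        }

        while (h >= 1) {
            // h-sort the array: insertion sort on elements that are h apart
            for (int i = h; i < n; i++) {
                int temp = a[i];
                int j = i - h;
                while (j >= 0 && a[j] > temp) {
                    a[j + h] = a[j];
                    j -= h;
                }
                a[j + h] = temp;
            }
            h /= 3; // integer division steps back to the previous gap
        }
    }
}
```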

Implementation

public static void shellSort(int[] arr) {
    int length = arr.length;
    int temp;
    for (int step = length / 2; step >= 1; step /= 2) {
        for (int i = step; i < length; i++) {
            temp = arr[i];
            int j = i - step;
            while (j >= 0 && arr[j] > temp) {
                arr[j + step] = arr[j];
                j -= step;
            }
            arr[j + step] = temp;
        }
    }
}

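Performance comparison of insertion sort, selection sort, and Shell sort

Quicksort

Sorting in ascending order.

Algorithm description

1. Pick a pivot: choose an element from the sequence, called the "pivot"
2. Partition: reorder the sequence so that all elements smaller than the pivot come before it and all elements larger than the pivot come after it (elements equal to the pivot can go to either side). After this partitioning, the pivot is in its final position
3. Recursively sort the subsequences: recursively sort the subsequence of elements smaller than the pivot and the subsequence of elements larger than the pivot

Complexity

Case	Condition	Complexity
Average time	-	O(n log n)
Worst-case time	Every partition is maximally unbalanced, e.g. sorted or reverse-sorted input when the first or last element is the pivot	O(n^2)
Best-case time	Every partition splits the sequence evenly	O(n log n)
Space	Recursion stack, average case	O(log n)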

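Characteristics

Quicksort is usually the best choice in practical sorting applications because its average performance is very good: its expected running time is O(n lg n), and the constant factor hidden in the O(n lg n) is quite small.

Implementation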

private int[] quickSort(int[] arr, int left, int right) {
    if (left < right) {
        int p = partition(arr, left, right);
        quickSort(arr, left, p - 1);
        quickSort(arr, p + 1, right);
    }
    return arr;
}

private void swap(int[] arr, int i, int j) {
    int temp = arr[i];
    arr[i] = arr[j];
    arr[j] = temp;
}

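Lomuto partition scheme

This scheme is the one presented in Introduction to Algorithms. It is actually less efficient than the Hoare partition scheme.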

private int partition(int[] arr, int left, int right) {
    int pivot = arr[right];
    int i = left;
    for (int j = left; j < right; j++) {
        if (arr[j] <= pivot) {
            swap(arr, j, i);
            i++;
        }
    }
    swap(arr, i, right);
    return i;
}

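Hoare partition scheme

// This is the "fill the hole" variant of partitioning from both ends: the pivot is
// lifted out of arr[left], elements are moved into the vacated slot from alternating
// ends, and the pivot is dropped into the slot where the two indices meet.
private static int partition(int[] arr, int left, int right) {
    int pivot = arr[left];
    while (left < right) {
        while (left < right && arr[right] >= pivot) {
            right--;
        }
        arr[left] = arr[right];
        while (left < right && arr[left] <= pivot) {
            left++;
        }
        arr[right] = arr[left];
    }
    arr[left] = pivot;
    return left;
}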

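Optimizations

In general, the following improvements can raise quicksort's performance by 20% ~ 30%.

Cutoff to insertion sort

- For small arrays, quicksort is slower than insertion sort
- Because it is recursive, quicksort's sort() method calls itself even for small arrays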
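A minimal sketch of the cutoff idea, assuming a Lomuto partition and a hypothetical threshold CUTOFF = 10 (the best value is machine-dependent and should be measured): once a subarray is small enough, it is handed to insertion sort instead of being partitioned further.

```java
class QuickSortWithCutoff {
    // Hypothetical threshold, chosen for illustration only.
    static final int CUTOFF = 10;

    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(int[] a, int left, int right) {
        // Small (or empty) ranges are finished with insertion sort.
        if (right - left < CUTOFF) {
            insertionSort(a, left, right);
            return;
        }
        int p = partition(a, left, right);
        sort(a, left, p - 1);
        sort(a, p + 1, right);
    }

    // Insertion sort restricted to a[left..right], both bounds inclusive.
    private static void insertionSort(int[] a, int left, int right) {
        for (int i = left + 1; i <= right; i++) {
            int temp = a[i];
            int j = i - 1;
            while (j >= left && a[j] > temp) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = temp;
        }
    }

    // Lomuto partition around a[right]; returns the pivot's final index.
    private static int partition(int[] a, int left, int right) {
        int pivot = a[right];
        int i = left;
        for (int j = left; j < right; j++) {
            if (a[j] <= pivot) {
                swap(a, i, j);
                i++;
            }
        }
        swap(a, i, right);
        return i;
    }

    private static void swap(int[] a, int i, int j) {
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }
}
```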

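Median-of-three partitioning

Algorithms notes that partitioning on the median of a small sample of elements from the subarray gives a better partition, at the cost of computing the median, and that a sample of size 3, partitioning on the middle element, has been found to work best.

Median of three:

// Sort array[low], array[middle], and array[high], then swap array[low] with array[middle]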
int middle = low + (high - low) / 2;

if (ArrayUtil.less(array[middle], array[low])) {
    ArrayUtil.exchange(array, middle, low);
}
if (ArrayUtil.less(array[high], array[low])) {
    ArrayUtil.exchange(array, high, low);
}
if (ArrayUtil.less(array[high], array[middle])) {
    ArrayUtil.exchange(array, high, middle);
}

//Swap median with low
ArrayUtil.exchange(array, middle, low);
Comparable pivot = array[low];

Median of five (five random samples):

if (high - low + 1 < 5) {
    return;
}

int randomIndex1 = StdRandom.uniform(low, high + 1);
int randomIndex2 = StdRandom.uniform(low, high + 1);
int randomIndex3 = StdRandom.uniform(low, high + 1);
int randomIndex4 = StdRandom.uniform(low, high + 1);
int randomIndex5 = StdRandom.uniform(low, high + 1);

medianOf5Array[0] = array[randomIndex1];
medianOf5Array[1] = array[randomIndex2];
medianOf5Array[2] = array[randomIndex3];
medianOf5Array[3] = array[randomIndex4];
medianOf5Array[4] = array[randomIndex5];

Map<Comparable, Integer> originalIndexes = new HashMap<>();
originalIndexes.put(medianOf5Array[0], randomIndex1);
originalIndexes.put(medianOf5Array[1], randomIndex2);
originalIndexes.put(medianOf5Array[2], randomIndex3);
originalIndexes.put(medianOf5Array[3], randomIndex4);
originalIndexes.put(medianOf5Array[4], randomIndex5);

Comparable median;

//1st compare
if (!ArrayUtil.less(medianOf5Array[0], medianOf5Array[1])) {
    ArrayUtil.exchange(medianOf5Array, 0, 1);
}
//2nd compare
if (!ArrayUtil.less(medianOf5Array[3], medianOf5Array[4])) {
    ArrayUtil.exchange(medianOf5Array, 3, 4);
}
//3rd compare
if (!ArrayUtil.less(medianOf5Array[0], medianOf5Array[3])) {
    ArrayUtil.exchange(medianOf5Array, 0, 3);
}

//4th compare
if (ArrayUtil.less(medianOf5Array[1], medianOf5Array[2])) {
    //5th compare
    if (ArrayUtil.less(medianOf5Array[1], medianOf5Array[3])) {
        //6th compare
        if (ArrayUtil.less(medianOf5Array[2], medianOf5Array[3])) {
            median = medianOf5Array[2];
        } else {
            median = medianOf5Array[3];
        }
    } else {
        //6th compare
        if (ArrayUtil.less(medianOf5Array[1], medianOf5Array[4])) {
            median = medianOf5Array[1];
        } else {
            median = medianOf5Array[4];
        }
    }
} else {
    //5th compare
    if (ArrayUtil.less(medianOf5Array[3], medianOf5Array[2])) {
        //6th compare
        if (ArrayUtil.less(medianOf5Array[2], medianOf5Array[4])) {
            median = medianOf5Array[2];
        } else {
            median = medianOf5Array[4];
        }
    } else {
        //6th compare
        if (ArrayUtil.less(medianOf5Array[1], medianOf5Array[3])) {
            median = medianOf5Array[1];
        } else {
            median = medianOf5Array[3];
        }
    }
}

int originalMedianIndex = originalIndexes.get(median);

//Swap median with low
ArrayUtil.exchange(array, originalMedianIndex, low);

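Below are my measurements.

[Figure: timings on a randomly ordered array]

[Figure: timings on a fully sorted sequence]

As the measurements show, choosing the pivot by median-of-three is faster than using the first element as the pivot. Median-of-five, however, was even slower than using the first element, possibly because my code is not efficient enough; I did not investigate further.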

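Entropy-optimal sorting

Quicksort with 3-way partitioning

Characteristics:

The worst case for 3-way partitioning is when all keys are distinct. When duplicate keys are present, it can perform much better than merge sort.

This makes it a strong candidate for a library sort function.

Algorithm description:

Let v be the pivot value. Pointer lo points to the leftmost element of the array and hi to the rightmost. lt is maintained so that all elements in a[lo .. lt - 1] are less than v, gt so that all elements in a[gt + 1 .. hi] are greater than v, and i so that all elements in a[lt .. i - 1] are equal to v; the elements in a[i .. gt] have not been examined yet.

- If a[i] is less than v, swap a[lt] with a[i] and increment both lt and i
- If a[i] is greater than v, swap a[gt] with a[i] and decrement gt
- If a[i] is equal to v, increment i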
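The lt / i / gt scheme described above can be sketched directly. This is a minimal version that takes a[lo] as the pivot value v and does no shuffling or sampling; the class name is illustrative.

```java
class Quick3Way {
    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(int[] a, int lo, int hi) {
        if (hi <= lo) {
            return;
        }
        int v = a[lo];   // pivot value
        int lt = lo;     // a[lo .. lt - 1] < v
        int i = lo + 1;  // a[lt .. i - 1] == v, a[i .. gt] not yet examined
        int gt = hi;     // a[gt + 1 .. hi] > v

        while (i <= gt) {
            if (a[i] < v) {
                swap(a, lt++, i++);
            } else if (a[i] > v) {
                swap(a, i, gt--); // the element swapped in is unexamined, so i stays
            } else {
                i++;
            }
        }

        // Now a[lo .. lt - 1] < v == a[lt .. gt] < a[gt + 1 .. hi];
        // the block of equal keys is never touched again.
        sort(a, lo, lt - 1);
        sort(a, gt + 1, hi);
    }

    private static void swap(int[] a, int i, int j) {
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }
}
```

Because all keys equal to the pivot are excluded from both recursive calls, arrays with many duplicates need far fewer recursion levels.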

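Implementation:

// Note: partition() below scans from both ends and splits the range into two parts
// (elements equal to the pivot may land on either side); it does not maintain the
// lt / i / gt invariant described above.
private static final Random RANDOM = new Random();

private void quickSort(int[] nums, int left, int right) {
    // Base case: ranges with fewer than two elements are already sorted
    if (left >= right) {
        return;
    }
    int pIndex = partition(nums, left, right);
    quickSort(nums, left, pIndex - 1);
    quickSort(nums, pIndex + 1, right);
}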

private int partition(int[] nums, int left, int right) {
    int randomIndex = left + RANDOM.nextInt(right - left + 1);
    swap(nums, randomIndex, left);

    int pivot = nums[left];
    int lt = left + 1;
    int gt = right;

    while (true) {
        while (lt <= right && nums[lt] < pivot) {
            lt++;
        }

        while (gt > left && nums[gt] > pivot) {
            gt--;
        }

        if (lt >= gt) {
            break;
        }

        swap(nums, lt, gt);
        lt++;
        gt--;
    }
    swap(nums, left, gt);
    return gt;
}

private void swap(int[] nums, int index1, int index2) {
    int temp = nums[index1];
    nums[index1] = nums[index2];
    nums[index2] = temp;
}

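Implementation in DualPivotQuicksort

The array is T[] a, where T is a primitive type (such as int, float, byte, char, double, long, short). There are two pivots P1 and P2 and three pointers L, K, and G; left points to the leftmost element of the array and right to the rightmost.

Algorithm description

1. For arrays shorter than 17 elements, use insertion sort
2. Choose two array elements as pivots, for example the first element a[left] as P1 and the last element a[right] as P2
3. P1 must be less than P2; otherwise swap the two. The array is now divided into the following parts:
   - part I, with indices from left + 1 to L - 1, holds elements < P1
   - part II, with indices from L to K - 1, holds elements with P1 <= element <= P2
   - part III, with indices from G + 1 to right - 1, holds elements > P2
   - part IV, with indices from K to G, holds the remaining elements that have not been examined yet
4. The next element a[K] from part IV is compared with the two pivots P1 and P2 and placed into the corresponding part I, II, or III
5. The pointers L, K, and G are moved in the corresponding directions
6. Repeat steps 4-5 while K <= G
7. Swap P1 with the last element of part I, and swap P2 with the first element of part III
8. Recursively repeat steps 1-7 for part I, part II, and part III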
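The steps above can be sketched in a deliberately simplified form. This sketch takes a[left] and a[right] as the pivots exactly as in step 2, omits the insertion-sort cutoff of step 1, and has none of the pivot sampling or equal-pivot handling of a production implementation, so it is an illustration of the partitioning invariant rather than a tuned sort. All names are illustrative.

```java
class SimpleDualPivotQuicksort {
    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(int[] a, int left, int right) {
        if (left >= right) {
            return;
        }
        // Step 3: make sure P1 <= P2
        if (a[left] > a[right]) {
            swap(a, left, right);
        }
        int p1 = a[left];
        int p2 = a[right];

        int l = left + 1;  // part I:   a[left + 1 .. l - 1] < p1
        int g = right - 1; // part III: a[g + 1 .. right - 1] > p2
        int k = l;         // part II:  a[l .. k - 1] in [p1, p2]; part IV: a[k .. g]

        // Steps 4-6: classify each unexamined element
        while (k <= g) {
            if (a[k] < p1) {
                swap(a, k, l);
                l++;
                k++;
            } else if (a[k] > p2) {
                swap(a, k, g);
                g--; // the element swapped in is unexamined, so k stays
            } else {
                k++;
            }
        }

        // Step 7: move the pivots to their final positions
        l--;
        g++;
        swap(a, left, l);  // P1 swaps with the last element of part I
        swap(a, right, g); // P2 swaps with the first element of part III

        // Step 8: recurse into the three parts
        sort(a, left, l - 1);
        sort(a, l + 1, g - 1);
        sort(a, g + 1, right);
    }

    private static void swap(int[] a, int i, int j) {
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }
}
```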

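Characteristics

It is faster than traditional (single-pivot) quicksort. Experiments indicate that more pivots make sorting faster, although the speedup gained from each additional pivot diminishes: iopscience.iop.org/article/10.…, epubs.siam.org/doi/pdf/10.…

Implementation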

/**
* @author Vladimir Yaroslavskiy
* @version 2009.09.17 m765.817
*/

private static final int DIST_SIZE = 13;
private static final int TINY_SIZE = 17;

private static void dualPivotQuicksort(int[] a, int left, int right) {
    int len = right - left;
    int x;
    // Cutoff to insertion sort
    if (len < TINY_SIZE) {
        for (int i = left + 1; i <= right; i++) {
            for (int j = i; j > left && a[j] < a[j - 1]; j--) {
                x = a[j - 1];
                a[j - 1] = a[j];
                a[j] = x;
            }
        }
        return;
    }
    // Median-of-five style sampling
    int sixth = len / 6;
    int m1 = left + sixth;
    int m2 = m1 + sixth;
    int m3 = m2 + sixth;
    int m4 = m3 + sixth;
    int m5 = m4 + sixth;
    // 5-element sorting network
    if (a[m1] > a[m2]) { x = a[m1]; a[m1] = a[m2]; a[m2] = x; }
    if (a[m4] > a[m5]) { x = a[m4]; a[m4] = a[m5]; a[m5] = x; }
    if (a[m1] > a[m3]) { x = a[m1]; a[m1] = a[m3]; a[m3] = x; }
    if (a[m2] > a[m3]) { x = a[m2]; a[m2] = a[m3]; a[m3] = x; }
    if (a[m1] > a[m4]) { x = a[m1]; a[m1] = a[m4]; a[m4] = x; }
    if (a[m3] > a[m4]) { x = a[m3]; a[m3] = a[m4]; a[m4] = x; }
    if (a[m2] > a[m5]) { x = a[m2]; a[m2] = a[m5]; a[m5] = x; }
    if (a[m2] > a[m3]) { x = a[m2]; a[m2] = a[m3]; a[m3] = x; }
    if (a[m4] > a[m5]) { x = a[m4]; a[m4] = a[m5]; a[m5] = x; }
    // pivots: [ < pivot1 | pivot1 <= && <= pivot2 | > pivot2 ]
    int pivot1 = a[m2];
    int pivot2 = a[m4];
    boolean diffPivots = pivot1 != pivot2;
    a[m2] = a[left];
    a[m4] = a[right];
    // center part pointers
    int less = left + 1;
    int great = right - 1;
    // 3-way partitioning
    if (diffPivots) { // First move elements < pivot1 and > pivot2 to the two ends; elements equal to pivot1 or pivot2 stay in the middle, so less and great do not move for them
        for (int k = less; k <= great; k++) {
            x = a[k];
            if (x < pivot1) {
                a[k] = a[less];
                a[less++] = x;
            }
            else if (x > pivot2) {
                while (a[great] > pivot2 && k < great) {
                    great--;
                }
                a[k] = a[great];
                a[great--] = x;
                x = a[k];
                if (x < pivot1) {
                    a[k] = a[less];
                    a[less++] = x;
                }
            }
        }
    }
    else { // Both pivots have the same value
        for (int k = less; k <= great; k++) {
            x = a[k];
            if (x == pivot1) {
                continue;
            }
            if (x < pivot1) {
                a[k] = a[less];
                a[less++] = x;
            }
            else { // x > pivot1,pivot2
                while (a[great] > pivot2 && k < great) {
                    great--;
                }
                a[k] = a[great];
                a[great--] = x;
                x = a[k];
                if (x < pivot1) {
                    a[k] = a[less];
                    a[less++] = x;
                }
            }
        }
    }
    // swap
    a[left] = a[less - 1];
    a[less - 1] = pivot1;
    a[right] = a[great + 1];
    a[great + 1] = pivot2;
    // Recursively sort the left and right parts
    dualPivotQuicksort(a, left, less - 2);
    dualPivotQuicksort(a, great + 2, right);
    // Move elements equal to pivot1 or pivot2 to the ends of the center part
    if (great - less > len - DIST_SIZE && diffPivots) {
        for (int k = less; k <= great; k++) {
            x = a[k];
            if (x == pivot1) {
                a[k] = a[less];
                a[less++] = x;
            }
            else if (x == pivot2) {
                a[k] = a[great];
                a[great--] = x;
                x = a[k];
                if (x == pivot1) {
                    a[k] = a[less];
                    a[less++] = x;
                }
            }
        }
    }
    // center part
    if (diffPivots) {
        dualPivotQuicksort(a, less, great);
    }
}

JDK implementation

Choosing between single and dual pivot

// Inexpensive approximation of length / 7, based on an infinite series (explained below)
int seventh = (length >> 3) + (length >> 6) + 1;

// Median-of-five style sampling: five evenly spaced elements around the middle
int e3 = (left + right) >>> 1;
int e2 = e3 - seventh;
int e1 = e2 - seventh;
int e4 = e3 + seventh;
int e5 = e4 + seventh;

// Sort these five elements in ascending order with insertion sort
if (a[e2] < a[e1]) { int t = a[e2]; a[e2] = a[e1]; a[e1] = t; }
if (a[e3] < a[e2]) { int t = a[e3]; a[e3] = a[e2]; a[e2] = t;
                    if (t < a[e1]) { a[e2] = a[e1]; a[e1] = t; }
                   }
if (a[e4] < a[e3]) { int t = a[e4]; a[e4] = a[e3]; a[e3] = t;
                    if (t < a[e2]) { a[e3] = a[e2]; a[e2] = t;
                                    if (t < a[e1]) { a[e2] = a[e1]; a[e1] = t; }
                                   }
                   }
if (a[e5] < a[e4]) { int t = a[e5]; a[e5] = a[e4]; a[e4] = t;
                    if (t < a[e3]) { a[e4] = a[e3]; a[e3] = t;
                                    if (t < a[e2]) { a[e3] = a[e2]; a[e2] = t;
                                                    if (t < a[e1]) { a[e2] = a[e1]; a[e1] = t; }
                                                   }
                                   }
                   }

int less  = left;
int great = right;

// The dual-pivot branch belongs to the "entropy-optimal sorting" improvement of quicksort
if (a[e1] != a[e2] && a[e2] != a[e3] && a[e3] != a[e4] && a[e4] != a[e5]) {
    // Dual-pivot code
} else {
    // Single-pivot code
}

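Explanation of seventh = (length >> 3) + (length >> 6) + 1:

The shifts compute length / 2^3 + length / 2^6, so the claim is that

$$\frac{\text{length}}{7} \approx \text{length}\left(\frac{1}{2^3} + \frac{1}{2^6}\right), \qquad \text{i.e.} \qquad \frac{1}{2^3} + \frac{1}{2^6} \approx \frac{1}{7} \tag{1}$$

The comment in the code says that this involves an infinite series. The two terms in (1) are the first two terms, for k = 3, of the geometric series

$$\frac{1}{2^k} + \frac{1}{2^{2k}} + \frac{1}{2^{3k}} + \cdots = \frac{1}{2^k - 1} \tag{2}$$

To prove (2), first rewrite its right-hand side:

$$\frac{1}{2^k} + \frac{1}{2^{2k}} + \frac{1}{2^{3k}} + \cdots = \frac{1/2^k}{(2^k - 1)/2^k} = \frac{1/2^k}{1 - 1/2^k} \tag{3}$$

Let x = 1 / 2^k, so that 0 < x < 1 and (3) becomes x + x^2 + x^3 + ... = x / (1 - x). Dividing both sides by x gives 1 + x + x^2 + x^3 + ... = 1 / (1 - x). Call the left-hand side f(x):

$$f(x) = 1 + x + x^2 + x^3 + \cdots = 1 + x\,(1 + x + x^2 + \cdots) = 1 + x f(x) \;\Rightarrow\; f(x)(1 - x) = 1 \;\Rightarrow\; f(x) = \frac{1}{1 - x}$$

which proves (2). In summary,

$$\frac{1}{2^k} + \frac{1}{2^{2k}} + \frac{1}{2^{3k}} + \cdots = \frac{1}{2^k - 1}$$

For k = 3 the full series equals exactly 1 / 7. The code keeps only the first two terms, 1/8 + 1/64 = 9/64 ≈ 0.1406, against 1/7 ≈ 0.1429.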
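As a quick numerical check, the following sketch compares the shift-based expression with plain integer division for a few arbitrarily chosen lengths. The class name and the sample lengths are illustrative.

```java
class SeventhApproximation {
    // The same expression as in the JDK snippet above.
    static int seventh(int length) {
        return (length >> 3) + (length >> 6) + 1;
    }

    public static void main(String[] args) {
        int[] lengths = {47, 250, 1000, 100000};
        for (int length : lengths) {
            System.out.printf("length=%d  length/7=%d  approximation=%d%n",
                    length, length / 7, seventh(length));
        }
    }
}
```

For length = 250 (the array size used in the debugging session) both expressions give 35; for larger lengths the two-term approximation falls slightly below length / 7, which is harmless because the value only serves as a spacing between sample positions.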

Why divide by 7?

length / 7 serves as the spacing between e1, e2, e3, e4, and e5; this spacing is an empirical value.

Debugging code

Random random = new Random();
int[] a = new int[250];
for (int i = 0; i < 250; i++) {
    a[i] = random.nextInt(100);
}
Arrays.sort(a);

Dual pivot

// Use the second and fourth of the five sorted elements as pivots
int pivot1 = a[e2];
int pivot2 = a[e4];

a[e2] = a[left];
a[e4] = a[right];

while (a[++less] < pivot1);
while (a[--great] > pivot2);

// Compare each element between less and great with pivot1 and pivot2
// Elements smaller than pivot1 go to the left of less, then less++
// Elements larger than pivot2 go to the right of great, then great--
outer:
// Loop 1
for (int k = less - 1; ++k <= great; ) {
    int ak = a[k];
    if (ak < pivot1) { // Move elements smaller than pivot1 to the left part
        a[k] = a[less];
        a[less] = ak;
        ++less;
    } else if (ak > pivot2) { // Move elements larger than pivot2 to the right part
        // Loop A
        while (a[great] > pivot2) {
            if (great-- == k) {
                break outer;
            }
        }
        // After loop A, a[great] <= pivot2. Since pivot1 < pivot2, a[great] may also be smaller than pivot1, which this if block checks
        if (a[great] < pivot1) {
            a[k] = a[less];
            a[less] = a[great];
            ++less;
        } else {
            a[k] = a[great];
        }

        a[great] = ak;
        --great;
    }
}

// Swap the pivots into their final positions
a[left]  = a[less  - 1]; a[less  - 1] = pivot1;
a[right] = a[great + 1]; a[great + 1] = pivot2;

sort(a, left, less - 2, leftmost);
sort(a, great + 2, right, false);

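// If the center part is too large (more than 4/7 of the array):
// move elements equal to pivot1 to the left of less, then less++
// move elements equal to pivot2 to the right of great, then great--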
if (less < e1 && e5 < great) {
    
    while (a[less] == pivot1) {
        ++less;
    }
    
    while (a[great] == pivot2) {
        --great;
    }

    outer:
    // Loop 2: handles elements equal to the pivots
    for (int k = less - 1; ++k <= great; ) {
        int ak = a[k];
        if (ak == pivot1) {
            a[k] = a[less];
            a[less] = ak;
            ++less;
        } else if (ak == pivot2) {
            while (a[great] == pivot2) {
                if (great-- == k) {
                    break outer;
                }
            }
            if (a[great] == pivot1) {
                a[k] = a[less];
                a[less] = pivot1;
                ++less;
            } else {
                a[k] = a[great];
            }
            a[great] = ak;
            --great;
        }
    }
}

sort(a, less, great, false);

Debugging

An array of 250 random numbers

[46, 14, 67, 14, 96, 1, 46, 97, 7, 11, 25, 88, 86, 69, 56, 34, 22, 53, 59, 43, 56, 33, 20, 45, 40, 92, 35, 38, 17, 2, 54, 29, 97, 29, 60, 19, 39, 65, 17, 10, 41, 24, 59, 82, 85, 26, 34, 92, 80, 45, 67, 57, 85, 17, 15, 23, 81, 65, 20, 95, 31, 14, 61, 39, 55, 0, 83, 2, 16, 26, 21, 43, 54, 50, 0, 35, 16, 87, 52, 96, 31, 87, 94, 10, 12, 66, 82, 0, 7, 46, 34, 9, 35, 79, 54, 33, 60, 47, 72, 84, +150 more]

Before the insertion sort

a[e1] = 59
a[e2] = 81
a[e3] = 46
a[e4] = 15
a[e5] = 85

After the insertion sort

a[e1] = 15
a[e2] = 46
a[e3] = 59
a[e4] = 81
a[e5] = 85

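[Figure: debugger state before loop 1 starts]

[Figure: debugger state before loop 2 starts]

In my debugging session, great - less was smaller than 4/7 of the array length, so loop 2 was skipped and the center part was sorted recursively instead.

Single pivot

Set a breakpoint on the first line of the single-pivot code.

Because my test code generates an array of random numbers, the single-pivot case occurs only with some probability; rerunning the debugger a few times will eventually enter the single-pivot block.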

int pivot = a[e3];

// Loop 3
for (int k = less; k <= great; ++k) {
    if (a[k] == pivot) {
        continue;
    }
    int ak = a[k];
    if (ak < pivot) {
        a[k] = a[less];
        a[less] = ak;
        ++less;
    } else {
        while (a[great] > pivot) {
            --great;
        }
        if (a[great] < pivot) {
            a[k] = a[less];
            a[less] = a[great];
            ++less;
        } else {
            a[k] = pivot;
        }
        a[great] = ak;
        --great;
    }
}

sort(a, left, less - 1, leftmost);
sort(a, great + 1, right, false);

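As shown, this single-pivot quicksort differs from the traditional one: it partitions the range into three parts (less than, equal to, and greater than the pivot).

Debugging

An array of 250 random numbers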

[92, 12, 25, 31, 92, 79, 62, 49, 28, 79, 35, 7, 34, 23, 59, 14, 98, 72, 65, 85, 24, 60, 79, 25, 23, 46, 34, 25, 27, 86, 23, 81, 87, 40, 70, 55, 94, 89, 88, 17, 67, 91, 84, 18, 84, 31, 31, 3, 79, 92, 70, 14, 88, 66, 15, 95, 52, 92, 11, 92, 28, 15, 1, 63, 67, 45, 92, 86, 33, 63, 64, 34, 60, 87, 42, 65, 92, 54, 17, 63, 59, 58, 17, 0, 48, 36, 95, 77, 97, 15, 87, 20, 53, 2, 31, 29, 36, 42, 77, 19, +150 more]

After the insertion sort

a[e1] = 15
a[e2] = 15
a[e3] = 16
a[e4] = 23
a[e5] = 97

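[Figure: debugger state before loop 3 starts]

[Figure: debugger state after loop 3 finishes]

Performance comparison of traditional quicksort and DualPivotQuicksort

Merge sort

Sorting in ascending order.

Algorithm description

Recursive (top-down)

1. Allocate space whose size is the sum of the two sorted sequences; it holds the merged sequence
2. Set two pointers, initially at the start of each of the two sorted sequences
3. Compare the elements the two pointers refer to, put the smaller one into the merge space, and advance that pointer
4. Repeat step 3 until one pointer reaches the end of its sequence
5. Copy all remaining elements of the other sequence to the end of the merged sequence

Iterative (bottom-up)

1. Merge each pair of adjacent elements, forming ceil(n/2) sorted sequences of two elements each (the last may have one)
2. If more than one sequence remains, merge adjacent sequences again, forming ceil(n/4) sequences of four elements each (the last may have fewer)
3. Repeat step 2 until all elements are sorted, that is, until only one sequence remains

Complexity

Case	Complexity
Average time	O(n log n)
Worst-case time	O(n log n)
Best-case time	O(n log n)
Space	O(n)

Characteristics

The iterative method is well suited to data organized as linked lists.

Implementation

Recursive (top-down)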

public void mergeSort(int[] arr, int[] result, int start, int end) {
    if (start >= end)
        return;
    int len = end - start, mid = (len >> 1) + start;
    int start1 = start;
    int start2 = mid + 1;
    mergeSort(arr, result, start1, mid);
    mergeSort(arr, result, start2, end);
    // Merge the two sorted halves; <= takes from the left half on ties, which keeps the sort stable
    int k = start;
    while (start1 <= mid && start2 <= end)
        result[k++] = arr[start1] <= arr[start2] ? arr[start1++] : arr[start2++];
    while (start1 <= mid)
        result[k++] = arr[start1++];
    while (start2 <= end)
        result[k++] = arr[start2++];
    for (k = start; k <= end; k++)
        arr[k] = result[k];
}

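Optimizations

Use insertion sort for small subarrays
- Recursion makes method calls too frequent for small subproblems, so handling them better improves the whole algorithm. Insertion sort is very simple and therefore likely to be faster than merge sort on small arrays. Using insertion sort for small subarrays typically shortens merge sort's running time by 10% ~ 15%
Test whether the array is already in order
- Add a check: if a[mid] is less than or equal to a[mid+1], the two halves are already in order and the merge can be skipped. This makes the running time linear for any subarray that is already sorted
Avoid copying elements to the auxiliary array
- The time (though not the space) spent copying to the auxiliary array can be saved by switching the roles of the input array and the auxiliary array at each level of the recursion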
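The first two optimizations can be sketched as follows. The threshold CUTOFF = 7 is a hypothetical value for illustration, and the third optimization (avoiding the copy to the auxiliary array) is not shown here.

```java
class OptimizedMergeSort {
    // Hypothetical threshold below which insertion sort takes over.
    static final int CUTOFF = 7;

    static void sort(int[] a) {
        int[] aux = new int[a.length];
        sort(a, aux, 0, a.length - 1);
    }

    private static void sort(int[] a, int[] aux, int lo, int hi) {
        // Optimization 1: small subarrays are handled by insertion sort.
        if (hi - lo < CUTOFF) {
            insertionSort(a, lo, hi);
            return;
        }
        int mid = lo + (hi - lo) / 2;
        sort(a, aux, lo, mid);
        sort(a, aux, mid + 1, hi);
        // Optimization 2: skip the merge when the halves are already in order.
        if (a[mid] <= a[mid + 1]) {
            return;
        }
        merge(a, aux, lo, mid, hi);
    }

    // Merges the sorted ranges a[lo..mid] and a[mid+1..hi] via aux.
    private static void merge(int[] a, int[] aux, int lo, int mid, int hi) {
        System.arraycopy(a, lo, aux, lo, hi - lo + 1);
        int i = lo;
        int j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if (i > mid) {
                a[k] = aux[j++];
            } else if (j > hi) {
                a[k] = aux[i++];
            } else if (aux[j] < aux[i]) {
                a[k] = aux[j++];
            } else {
                a[k] = aux[i++]; // ties go to the left half, keeping the sort stable
            }
        }
    }

    // Insertion sort restricted to a[lo..hi], both bounds inclusive.
    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int temp = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > temp) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = temp;
        }
    }
}
```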

Iterative (bottom-up)

public void merge_sort(int[] arr) {
    int[] orderedArr = new int[arr.length];
    for (int i = 2; i < arr.length * 2; i *= 2) {
        for (int j = 0; j < (arr.length + i - 1) / i; j++) {
            int left = i * j;
            int mid = left + i / 2 >= arr.length ? (arr.length - 1) : (left + i / 2);
            int right = i * (j + 1) - 1 >= arr.length ? (arr.length - 1) : (i * (j + 1) - 1);
            int start = left, l = left, m = mid;
            while (l < mid && m <= right) {
                if (arr[l] < arr[m]) {
                    orderedArr[start++] = arr[l++];
                } else {
                    orderedArr[start++] = arr[m++];
                }
            }
            while (l < mid)
                orderedArr[start++] = arr[l++];
            while (m <= right)
                orderedArr[start++] = arr[m++];
            System.arraycopy(orderedArr, left, arr, left, right - left + 1);
        }
    }
}

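Performance comparison of the recursive and iterative methods

JDK implementation

It lives in TimSort.

TimSort

Definitions

N

The length of the array to be sorted.

run

A sorted subarray of the array to be sorted. Its order is either non-descending (a0 <= a1 <= a2 <= ...) or strictly descending (a0 > a1 > a2 > ...). Descending runs must be strict because a descending run is reversed in place, and reversing a run that contains equal elements would break stability.

According to the Collins dictionary:

Translated: if you say that something long, such as a road, runs in a particular direction, you are describing its course or position. Put simply, a run can be understood as a path with a direction; if that is still too long, call it a "directed path" (向径).

minrun

The minimum length of a run.

Algorithm description

Computing minrun
- Principles for choosing minrun
  - minrun must not be too long, because runs are extended with insertion sort, which is efficient only for short arrays
  - minrun must not be too short, because short runs mean more runs to merge in the next step
  - Choose minrun from range(32, 65), that is 32 to 64 (the Java code below uses a threshold of 32 and therefore yields 16 to 32), so that N/minrun is exactly a power of 2 or, failing that, close to but strictly less than a power of 2. When N/minrun is a power of 2, every run is a leaf of the merge tree and that tree is a perfect binary tree; otherwise merges become unbalanced and move more data. As an example, here is the code that computes minrun: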
```
assert n >= 0;
int r = 0;
while (n >= 32) {
    // Line 1: r becomes 1 if the lowest bit of n is 1 (n is odd), i.e. a 1 bit is shifted off
    r |= (n & 1); 
    n >>= 1;
}
return n + r;
```
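    Suppose the array length is 2112. Without line 1 the method returns 16, giving 132 runs; with it the method returns 17, giving about 124 runs (124 full runs plus a short one)
    
    A run count slightly above a power of 2 (132 > 128) needs more, and less balanced, merges than a run count equal to or slightly below a power of 2
Merge conditions (let the lengths of the three topmost runs on the stack, from bottom to top, be A, B, and C)
- A and C cannot be merged first: if the arrays behind A, B, and C all contain the same value p, merging A with C could place C's p before B's p, or A's p after B's p, which breaks stability. Only (A+B) or (B+C) can be merged
- If A > B + C and B > C, no merge is performed
  - B > C means that run lengths on the pending stack decrease from the bottom of the stack to the top
  - A > B + C means that run lengths grow at least as fast as the Fibonacci sequence from the top of the stack to the bottom
- If A <= B + C, merge B with the shorter of A and C
- If B <= C, merge B and C
Memory-friendly optimizations
- Before merging A and B, use a binary-search-style lookup to find the position p1 where B[0] belongs in A; the elements of A before p1 are already in their final positions. Then, in the same way, find the position p2 where A[-1] belongs in B; the elements of B after p2 can be ignored
- The temporary memory used equals min(A, B)
  - If A is smaller than B, copy A's subarray into a temporary array temp, then merge temp with B's subarray, writing the result from left to right starting where A used to be
  - If B is smaller than A, do the mirror image of the previous step
Merge algorithm
- When A <= B
  - "One pair at a time" mode
    - Compare the first elements of A and B: if B[0] < A[0], move B[0] to the merge area; otherwise move A[0] to the merge area
    - If either A or B supplies the next element MIN_GALLOP times in a row (the threshold for galloping), switch to the "galloping" mode below
  - "Galloping" mode
    - Search for the position q1 where A[0] belongs in B, move the elements of B before q1 (all smaller than A[0]) to the merge area, then move A[0] to the merge area
    - Search for the position q2 where B[0] belongs in A, move the elements of A before q2 (none greater than B[0]) to the merge area, then move B[0] to the merge area
    - When the number of elements moved in one step drops below MIN_GALLOP, return to "one pair at a time" mode
- When A > B
  - Omitted; it is very similar to the previous case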
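The 2112 example from the minrun discussion above can be reproduced with a small sketch. It assumes, for simplicity, that every run ends up exactly minrun long (as tends to happen on random data), and the roundUp flag is a hypothetical switch that turns "line 1" on or off.

```java
class MinRunDemo {
    static final int MIN_MERGE = 32;

    // Same loop as the minrun code above; roundUp toggles the "r |= (n & 1)" line.
    static int minRunLength(int n, boolean roundUp) {
        int r = 0;
        while (n >= MIN_MERGE) {
            if (roundUp) {
                r |= (n & 1);
            }
            n >>= 1;
        }
        return n + r;
    }

    // Number of runs if every run is exactly minRun long (the last may be shorter).
    static int runCount(int n, int minRun) {
        return (n + minRun - 1) / minRun;
    }

    public static void main(String[] args) {
        int n = 2112;
        int without = minRunLength(n, false);
        int with = minRunLength(n, true);
        System.out.println("without line 1: minrun=" + without + ", runs=" + runCount(n, without));
        System.out.println("with line 1:    minrun=" + with + ", runs=" + runCount(n, with));
    }
}
```

Without the rounding the run count lands just above 128, so the last few runs force an extra, badly unbalanced merge level; with it the count stays just below 128.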

Complexity

Case	Complexity
Average time	O(n log n)
Worst-case time	O(n log n)
Best-case time	O(n)
Space	O(n)

Implementation

TimSort in the JDK

private T[] tmp;
private int tmpBase;
private int tmpLen;

// These three fields describe a stack that holds the runs waiting to be merged
// runBase[i] is the start index of the i-th run in the input array and runLen[i] is its length
// runBase[i] + runLen[i] == runBase[i + 1]
private int stackSize = 0;
private final int[] runBase;
private final int[] runLen;

sort()

static <T> void sort(T[] a, int lo, int hi, Comparator<? super T> c,
                     T[] work, int workBase, int workLen) {
    assert c != null && a != null && lo >= 0 && lo <= hi && hi <= a.length;

    int nRemaining  = hi - lo;
    if (nRemaining < 2)
        return;  // Arrays of length 0 or 1 are always sorted

    // If the array is shorter than 32 elements (MIN_MERGE), use binary insertion sort without merging
    if (nRemaining < MIN_MERGE) {
        int initRunLen = countRunAndMakeAscending(a, lo, hi, c);
        binarySort(a, lo, hi, lo + initRunLen, c);
        return;
    }

    // Traverse the array once from left to right, finding runs, extending runs shorter than minrun to minrun, and merging runs to maintain the stack invariant
    TimSort<T> ts = new TimSort<>(a, c, work, workBase, workLen);
    int minRun = minRunLength(nRemaining);
    do {
        // Find the next run and return its length
        int runLen = countRunAndMakeAscending(a, lo, hi, c);

        // If the run is too short, extend it to min(minRun, nRemaining)
        if (runLen < minRun) {
            int force = nRemaining <= minRun ? nRemaining : minRun;
            binarySort(a, lo, lo + force, lo + runLen, c);
            runLen = force;
        }

        // Push the sorted run onto the pending-run stack
        ts.pushRun(lo, runLen);
        ts.mergeCollapse();

        // Advance to find next run
        lo += runLen;
        nRemaining -= runLen;
    } while (nRemaining != 0);

    // Merge all remaining runs to complete sort
    assert lo == hi;
    ts.mergeForceCollapse();
    assert ts.stackSize == 1;
}

countRunAndMakeAscending()

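// Returns the length of the run that starts at the given position in the given array, reversing the run if it is descending (so the run is always ascending when the method returns)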
private static <T> int countRunAndMakeAscending(T[] a, int lo, int hi,
                                                Comparator<? super T> c) {
    assert lo < hi;
    int runHi = lo + 1;
    if (runHi == hi)
        return 1;

    // Find the end of the run; if it is strictly descending, reverse it
    if (c.compare(a[runHi++], a[lo]) < 0) { // Strictly descending
        while (runHi < hi && c.compare(a[runHi], a[runHi - 1]) < 0)
            runHi++;
        reverseRange(a, lo, runHi);
    } else {                              // Ascending
        while (runHi < hi && c.compare(a[runHi], a[runHi - 1]) >= 0)
            runHi++;
    }

    return runHi - lo;
}

TimSort()

private TimSort(T[] a, Comparator<? super T> c, T[] work, int workBase, int workLen) {
    this.a = a;
    this.c = c;

    // Allocate temp storage (which may be increased later if necessary)
    int len = a.length;
    int tlen = (len < 2 * INITIAL_TMP_STORAGE_LENGTH) ?
        len >>> 1 : INITIAL_TMP_STORAGE_LENGTH;
    if (work == null || workLen < tlen || workBase + tlen > work.length) {
        @SuppressWarnings({"unchecked", "UnnecessaryLocalVariable"})
        T[] newArray = (T[])java.lang.reflect.Array.newInstance
            (a.getClass().getComponentType(), tlen);
        tmp = newArray;
        tmpBase = 0;
        tmpLen = tlen;
    }
    else {
        tmp = work;
        tmpBase = workBase;
        tmpLen = workLen;
    }

    
    int stackLen = (len <    120  ?  5 :
                    len <   1542  ? 10 :
                    len < 119151  ? 24 : 49);
    runBase = new int[stackLen];
    runLen = new int[stackLen];
}

minRunLength()

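// Returns the minimum acceptable run length for an array of the given length: in [16, 32] when n >= MIN_MERGE, and n itself when n < MIN_MERGE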
private static int minRunLength(int n) {
    assert n >= 0;
    int r = 0;      // Becomes 1 if any 1 bits are shifted off
    while (n >= MIN_MERGE) {
        r |= (n & 1); // Is the lowest bit of n 1?
        n >>= 1; // Halve n
    }
    return n + r;
}

private static void reverseRange(Object[] a, int lo, int hi) {
    hi--;
    while (lo < hi) {
        Object t = a[lo];
        a[lo++] = a[hi];
        a[hi--] = t;
    }
}

// Pushes the given run onto the pending-run stack
// runBase is the index of the first element in the run
// runLen is the number of elements in the run
private void pushRun(int runBase, int runLen) {
    this.runBase[stackSize] = runBase;
    this.runLen[stackSize] = runLen;
    stackSize++;
}

mergeCollapse()

// Examines the stack of runs waiting to be merged and merges adjacent runs until the stack invariants are reestablished
private void mergeCollapse() {
    while (stackSize > 1) {
        int n = stackSize - 2;
        // If A <= B + C, merge B with the shorter of A and C
        if (n > 0 && runLen[n-1] <= runLen[n] + runLen[n+1]) {
            if (runLen[n - 1] < runLen[n + 1])
                n--;
            mergeAt(n);
        } else if (runLen[n] <= runLen[n + 1]) { // If B <= C, merge B and C
            mergeAt(n);
        } else { // A > B + C and B > C
            break;
        }
    }
}
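
The effect of these rules is easier to see when only the run lengths are tracked. The sketch below is a simplified model, not the JDK code: RunStackDemo and simulate() are names invented for this example, mergeAt() only adds lengths instead of merging data, and the stack capacity of 49 is simply borrowed from the constructor shown earlier.

```java
// Hypothetical length-only model of pushRun() + mergeCollapse(); no array data is merged.
final class RunStackDemo {
    private final int[] runLen = new int[49];
    private int stackSize = 0;

    void pushRun(int len) {
        runLen[stackSize++] = len;
    }

    // Length-only stand-in for mergeAt(): run i absorbs run i + 1.
    private void mergeAt(int i) {
        runLen[i] += runLen[i + 1];
        if (i == stackSize - 3) {
            runLen[i + 1] = runLen[i + 2];
        }
        stackSize--;
    }

    // Same control flow as the mergeCollapse() shown above.
    void mergeCollapse() {
        while (stackSize > 1) {
            int n = stackSize - 2;
            if (n > 0 && runLen[n - 1] <= runLen[n] + runLen[n + 1]) {
                if (runLen[n - 1] < runLen[n + 1]) {
                    n--;
                }
                mergeAt(n);
            } else if (runLen[n] <= runLen[n + 1]) {
                mergeAt(n);
            } else {
                break;
            }
        }
    }

    // Pushes each length in turn, collapsing after every push, and returns the final stack.
    static int[] simulate(int... lens) {
        RunStackDemo s = new RunStackDemo();
        for (int len : lens) {
            s.pushRun(len);
            s.mergeCollapse();
        }
        return java.util.Arrays.copyOf(s.runLen, s.stackSize);
    }
}
```

With this model, simulate(100, 40, 10) leaves the stack untouched at [100, 40, 10] because both invariants already hold, while simulate(100, 40, 10, 35) first merges 10 and 35, then 40 and 45, and ends with [100, 85].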

mergeAt()

// Merges the two runs at stack indices i and i + 1. i must be the second-to-last
// or third-to-last run on the stack.
// When merging B and C, i == stackSize - 2.
// When merging A and B, i == stackSize - 3.
private void mergeAt(int i) {
    assert stackSize >= 2;
    assert i >= 0;
    assert i == stackSize - 2 || i == stackSize - 3;

    int base1 = runBase[i];
    int len1 = runLen[i];
    int base2 = runBase[i + 1];
    int len2 = runLen[i + 1];
    assert len1 > 0 && len2 > 0;
    assert base1 + len1 == base2;

    // Record the combined length of the two runs in runLen[i]
    runLen[i] = len1 + len2;
    // When merging A and B, B is absorbed into A, so C's run info slides down into B's slot
    if (i == stackSize - 3) {
        runBase[i + 1] = runBase[i + 2];
        runLen[i + 1] = runLen[i + 2];
    }
    stackSize--;

    // Find the position k in run1 where the first element of run2 belongs.
    // The elements of run1 before k can be ignored because they are already in place.
    int k = gallopRight(a[base2], a, base1, len1, 0, c);
    assert k >= 0;
    base1 += k;
    len1 -= k;
    if (len1 == 0)
        return;

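    // Find the position in run2 where the last element of run1 belongs. Elements of run2
    // from that position on are already in place, so the position becomes run2's new length.
    // gallopLeft() is implemented much like gallopRight(), so it is not covered separately.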
    len2 = gallopLeft(a[base1 + len1 - 1], a, base2, len2, len2 - 1, c);
    assert len2 >= 0;
    if (len2 == 0)
        return;

    // Merge the remaining runs, copying the shorter of the two (min(len1, len2) elements) into tmp
    if (len1 <= len2)
        mergeLo(base1, len1, base2, len2);
    else
        mergeHi(base1, len1, base2, len2);
}

gallopRight()

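// Assume run1 and run2 are being merged, with run2 to the right of run1.
// key is a value from run2; the method returns the offset from base at which key
// should be inserted into run1 (after any elements equal to key).
// a is the input array, i.e. the array being sorted.
// base is the start index of run1.
// len is the length of run1.
// hint is the offset (relative to base) at which the search starts.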
private static <T> int gallopRight(T key, T[] a, int base, int len,
                                   int hint, Comparator<? super T> c) {
    assert len > 0 && hint >= 0 && hint < len;

    // ofs is the gallop offset from the hint: e.g. starting at a[0] with an offset of 2 probes a[2]
    int ofs = 1;
    int lastOfs = 0;
    if (c.compare(key, a[base + hint]) < 0) {
        // Gallop left until a[b+hint - ofs] <= key < a[b+hint - lastOfs]
        int maxOfs = hint + 1;
        while (ofs < maxOfs && c.compare(key, a[base + hint - ofs]) < 0) {
            lastOfs = ofs;
            ofs = (ofs << 1) + 1;
            if (ofs <= 0)   // int overflow
                ofs = maxOfs;
        }
        if (ofs > maxOfs)
            ofs = maxOfs;

        // Make offsets relative to b
        int tmp = lastOfs;
        lastOfs = hint - ofs;
        ofs = hint - tmp;
    } else {
        // mergeAt() relies on this branch to locate the index in run1
        // at which the first element of run2 belongs.
        // Here a[base + hint] <= key.
        // Gallop right until a[base + hint + lastOfs] <= key < a[base + hint + ofs]
        int maxOfs = len - hint;
        while (ofs < maxOfs && c.compare(key, a[base + hint + ofs]) >= 0) {
            lastOfs = ofs;
            // The offset grows as 1, 3, 7, 15, ..., 2^k - 1
            ofs = (ofs << 1) + 1;
            if (ofs <= 0)   // int overflow
                ofs = maxOfs;
        }
        if (ofs > maxOfs)
            ofs = maxOfs;

        // After the adjustment below, the insertion index in run1 lies in (lastOfs, ofs]
        lastOfs += hint;
        ofs += hint;
    }
    assert -1 <= lastOfs && lastOfs < ofs && ofs <= len;

    // Binary search for the insertion index of key in run1, which lies in (lastOfs, ofs]
    lastOfs++;
    while (lastOfs < ofs) {
        int m = lastOfs + ((ofs - lastOfs) >>> 1);

        if (c.compare(key, a[base + m]) < 0)
            ofs = m;          // key < a[base + m]
        else
            lastOfs = m + 1;  // a[base + m] <= key
    }
    assert lastOfs == ofs;
    return ofs;
}
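
As a rough illustration of the galloping idea, the following sketch specializes the search to int arrays and to hint == 0, so only the rightward branch remains. GallopDemo is a made-up class name and the simplified signature is an assumption for this example; it is not the JDK method.

```java
// Simplified sketch: int keys, hint fixed at 0, so only the "gallop right" branch is needed.
final class GallopDemo {
    // Returns k in [0, len] such that a[base + k - 1] <= key < a[base + k]
    // for the sorted range a[base .. base + len - 1].
    static int gallopRight(int key, int[] a, int base, int len) {
        if (key < a[base]) {
            return 0; // key belongs before the whole range
        }
        int lastOfs = 0;
        int ofs = 1;
        // Exponential phase: probe offsets 1, 3, 7, 15, ... until key < a[base + ofs]
        while (ofs < len && key >= a[base + ofs]) {
            lastOfs = ofs;
            ofs = (ofs << 1) + 1;
            if (ofs <= 0) { // int overflow
                ofs = len;
            }
        }
        if (ofs > len) {
            ofs = len;
        }
        // Binary phase: the answer lies in (lastOfs, ofs]
        lastOfs++;
        while (lastOfs < ofs) {
            int m = lastOfs + ((ofs - lastOfs) >>> 1);
            if (key < a[base + m]) {
                ofs = m;
            } else {
                lastOfs = m + 1;
            }
        }
        return ofs;
    }
}
```

For a = {1, 3, 5, 7, 9, 11, 13, 15}, the call gallopRight(8, a, 0, 8) probes offsets 1, 3 and 7, then finishes with a binary search and returns 4. A key equal to an existing element is placed after it (gallopRight(5, a, 0, 8) returns 3), which is what keeps the merge stable.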

mergeLo()

// Merges two adjacent runs in a stable fashion; used when len1 <= len2
private void mergeLo(int base1, int len1, int base2, int len2) {
    assert len1 > 0 && len2 > 0 && base1 + len1 == base2;

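    // Copy run1 into the tmp array so that its slots in a can be overwritten during the merge
    T[] a = this.a; // local copy of the field, for performance
    T[] tmp = ensureCapacity(len1);
    int cursor1 = tmpBase; // Indexes into tmp array
    int cursor2 = base2;   // Indexes into a
    int dest = base1;      // Indexes into a
    // Copy len1 elements of a, starting at index base1, into tmp starting at index cursor1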
    System.arraycopy(a, base1, tmp, cursor1, len1);

    // Move first element of second run and deal with degenerate cases
    a[dest++] = a[cursor2++];
    if (--len2 == 0) {
        System.arraycopy(tmp, cursor1, a, dest, len1);
        return;
    }
    if (len1 == 1) {
        System.arraycopy(a, cursor2, a, dest, len2);
        a[dest + len2] = tmp[cursor1]; // Last elt of run 1 to end of merge
        return;
    }

    Comparator<? super T> c = this.c;  // Use local variable for performance
    int minGallop = this.minGallop;    //  "    "       "     "      "
    outer:
    while (true) {
        int count1 = 0; // Number of times in a row that first run won
        int count2 = 0; // Number of times in a row that second run won

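        // "One pair at a time" mode
        do {
            assert len1 > 1 && len2 > 0;
            // Compare the next element of run2 with the next element of tmp (run1).
            // If run2's element is smaller, copy it to a[dest]; otherwise copy tmp's element.
            // Leave this loop once either side has won minGallop times in a row (initially 7).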
            if (c.compare(a[cursor2], tmp[cursor1]) < 0) {
                a[dest++] = a[cursor2++];
                count2++;
                count1 = 0;
                if (--len2 == 0)
                    break outer;
            } else {
                a[dest++] = tmp[cursor1++];
                count1++;
                count2 = 0;
                if (--len1 == 1)
                    break outer;
            }
        } while ((count1 | count2) < minGallop);

        // Galloping mode
        do {
            assert len1 > 1 && len2 > 0;
            // Find how many elements of tmp, counting from cursor1, come before the insertion point of a[cursor2]
            count1 = gallopRight(a[cursor2], tmp, cursor1, len1, 0, c);
            if (count1 != 0) {
                // Copy the count1 elements of tmp that are not greater than a[cursor2] into a, starting at dest
                System.arraycopy(tmp, cursor1, a, dest, count1);
                dest += count1;
                cursor1 += count1;
                len1 -= count1;
                if (len1 <= 1) // len1 == 1 || len1 == 0
                    break outer;
            }
            // Copy a[cursor2] to a[dest]
            a[dest++] = a[cursor2++];
            if (--len2 == 0)
                break outer;

            // Symmetric to the gallopRight() handling above
            count2 = gallopLeft(tmp[cursor1], a, cursor2, len2, 0, c);
            if (count2 != 0) {
                System.arraycopy(a, cursor2, a, dest, count2);
                dest += count2;
                cursor2 += count2;
                len2 -= count2;
                if (len2 == 0)
                    break outer;
            }
            a[dest++] = tmp[cursor1++];
            if (--len1 == 1)
                break outer;
            minGallop--;
        } while (count1 >= MIN_GALLOP | count2 >= MIN_GALLOP);
        if (minGallop < 0)
            minGallop = 0;
        minGallop += 2;  // Penalize for leaving gallop mode
    }  // End of "outer" loop
    this.minGallop = minGallop < 1 ? 1 : minGallop;  // Write back to field

    if (len1 == 1) {
        assert len2 > 0;
        // Copy the remaining elements of run2 to a[dest]
        System.arraycopy(a, cursor2, a, dest, len2);
        a[dest + len2] = tmp[cursor1]; //  Last elt of run 1 to end of merge
    } else if (len1 == 0) {
        throw new IllegalArgumentException(
            "Comparison method violates its general contract!");
    } else {
        assert len2 == 0;
        assert len1 > 1;
        System.arraycopy(tmp, cursor1, a, dest, len1);
    }
}
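
Stripped of galloping and of the degenerate-case shortcuts, the core of mergeLo() is an ordinary merge that buffers only the left run. The sketch below shows that core on an int array; MergeLoDemo is a hypothetical name, and int elements are an assumption made to keep the example short (with ints the stability of the merge is not observable).

```java
// Simplified sketch of the mergeLo() idea: buffer the left run, merge back into a.
final class MergeLoDemo {
    // Merges the sorted ranges a[base1 .. base1+len1-1] and a[base2 .. base2+len2-1],
    // where base2 == base1 + len1.
    static void mergeLo(int[] a, int base1, int len1, int base2, int len2) {
        int[] tmp = java.util.Arrays.copyOfRange(a, base1, base1 + len1);
        int cursor1 = 0;        // indexes into tmp (copy of run1)
        int cursor2 = base2;    // indexes into a (run2)
        int dest = base1;       // next slot to fill in a
        int end2 = base2 + len2;
        while (cursor1 < len1 && cursor2 < end2) {
            // Take from run2 only when it is strictly smaller, so ties favor run1
            if (a[cursor2] < tmp[cursor1]) {
                a[dest++] = a[cursor2++];
            } else {
                a[dest++] = tmp[cursor1++];
            }
        }
        // Leftover run1 elements go to the end; leftover run2 elements are already in place
        System.arraycopy(tmp, cursor1, a, dest, len1 - cursor1);
    }
}
```

The write position never overtakes cursor2, because dest always trails it by exactly the number of run1 elements still waiting in tmp. That is why buffering the shorter run is enough.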

ensureCapacity()

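// Ensures that the external array tmp has at least the specified number of elements,
// growing it if necessary. The size grows exponentially to guarantee amortized linear time.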
private T[] ensureCapacity(int minCapacity) {
    if (tmpLen < minCapacity) {
        // Compute the smallest power of two strictly greater than minCapacity
        // (the bit trick is explained after this method, under finding the next power of two)
        int newSize = minCapacity;
        newSize |= newSize >> 1;
        newSize |= newSize >> 2;
        newSize |= newSize >> 4;
        newSize |= newSize >> 8;
        newSize |= newSize >> 16;
        newSize++;

        if (newSize < 0) // unlikely: the increment overflowed
            newSize = minCapacity;
        else
            newSize = Math.min(newSize, a.length >>> 1);

        @SuppressWarnings({"unchecked", "UnnecessaryLocalVariable"})
        T[] newArray = (T[])java.lang.reflect.Array.newInstance
            (a.getClass().getComponentType(), newSize);
        tmp = newArray;
        tmpLen = newSize;
        tmpBase = 0;
    }
    return tmp;
}

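How do we find the next power of two after a number k?

Suppose k has exactly one bit set, for example k = 64 (binary 1000000). Applying k |= k >> s for s = 1, 2, 4, 8, 16 in turn gives:

s	k before	k >> s	k after k |= k >> s
1	1000000	0100000	1100000
2	1100000	0011000	1111000
4	1111000	0000111	1111111
8	1111111	0000000	1111111
16	1111111	0000000	1111111

The walkthrough shows how the algorithm works: each shift-and-OR step with s taken from {1, 2, 4, 8, 16} can at most double the number of consecutive 1 bits starting at the highest set bit, so after the last step up to 32 bits can be filled. A Java int has 32 bits, of which 31 hold the magnitude of a non-negative value, so five steps are always enough. The result has every bit below the highest set bit of k turned on, and adding 1 then yields the smallest power of two strictly greater than k (128 for k = 64). A single 1 bit is the slowest case to fill, so any k whose binary form has more than one 1 bit reaches the same result.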
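
The same bit trick can be tried in isolation. The sketch below wraps it in a helper; NextPow2Demo and nextPowerOfTwo() are names invented for this example. It also shows the overflow case that the newSize < 0 check in ensureCapacity() guards against.

```java
// Illustration only: NextPow2Demo is a hypothetical helper around the shift-and-OR trick.
final class NextPow2Demo {
    // Smallest power of two strictly greater than k, for 0 <= k < 2^30.
    // For k >= 2^30 the final increment overflows and the result is negative.
    static int nextPowerOfTwo(int k) {
        int n = k;
        n |= n >> 1;
        n |= n >> 2;
        n |= n >> 4;
        n |= n >> 8;
        n |= n >> 16;
        return n + 1;
    }

    public static void main(String[] args) {
        System.out.println(nextPowerOfTwo(64));      // 128
        System.out.println(nextPowerOfTwo(100));     // 128
        System.out.println(nextPowerOfTwo(1 << 30)); // negative: overflow
    }
}
```

For positive k below 2^30 the result agrees with Integer.highestOneBit(k) << 1, which is a convenient way to cross-check the trick.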

Implementation in DualPivotQuicksort

/*
 * run[i] is the start index of the i-th run
 */
int[] run = new int[MAX_RUN_COUNT + 1];
// count is the number of runs found so far
int count = 0; run[0] = left;

// Check whether the array is partially sorted and fill in the run array
for (int k = left; k < right; run[count] = k) {
    if (a[k] < a[k + 1]) { // ascending
        while (++k <= right && a[k - 1] <= a[k]);
    } else if (a[k] > a[k + 1]) { // descending
        while (++k <= right && a[k - 1] >= a[k]);
        for (int lo = run[count] - 1, hi = k; ++lo < --hi; ) {
            int t = a[lo]; a[lo] = a[hi]; a[hi] = t;
        }
    } else { // equal
        for (int m = MAX_RUN_LENGTH; ++k <= right && a[k - 1] == a[k]; ) {
            if (--m == 0) {
                sort(a, left, right, true);
                return;
            }
        }
    }

    /*
     * The array is not highly structured: use Quicksort instead of merge sort
     */
    if (++count == MAX_RUN_COUNT) {
        sort(a, left, right, true);
        return;
    }
}

// Check special cases
// Implementation note: variable "right" is increased by 1.
if (run[count] == right++) { // The last run contains one element
    run[++count] = right;
} else if (count == 1) { // a single run means the array is already sorted
    return;
}

// Determine alternation base for merge
byte odd = 0;
for (int n = 1; (n <<= 1) < count; odd ^= 1);

// Use or create temporary array b for merging
int[] b;                 // temp array; alternates with a
int ao, bo;              // array offsets from 'left'
int blen = right - left; // space needed for b
if (work == null || workLen < blen || workBase + blen > work.length) {
    work = new int[blen];
    workBase = 0;
}
if (odd == 0) {
    System.arraycopy(a, left, work, workBase, blen);
    b = a;
    bo = 0;
    a = work;
    ao = workBase - left;
} else {
    b = work;
    ao = 0;
    bo = workBase - left;
}

// Merging
for (int last; count > 1; count = last) {
    for (int k = (last = 0) + 2; k <= count; k += 2) {
        int hi = run[k], mi = run[k - 1];
        for (int i = run[k - 2], p = i, q = mi; i < hi; ++i) {
            if (q >= hi || p < mi && a[p + ao] <= a[q + ao]) {
                b[i + bo] = a[p++ + ao];
            } else {
                b[i + bo] = a[q++ + ao];
            }
        }
        run[++last] = hi;
    }
    if ((count & 1) != 0) {
        for (int i = right, lo = run[count - 1]; --i >= lo;
             b[i + bo] = a[i + ao]
            );
        run[++last] = right;
    }
    int[] t = a; a = b; b = t;
    int o = ao; ao = bo; bo = o;
}
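
The run-detection loop at the top of this fragment is the part that decides whether merge sort is worthwhile. A reduced sketch of it follows. It is an approximation under stated assumptions: RunScanDemo and findRuns() are invented names, the MAX_RUN_COUNT and MAX_RUN_LENGTH fallbacks to Quicksort are left out, and equal neighbors simply extend an ascending run.

```java
// Simplified sketch of natural-run detection; the Quicksort fallbacks are omitted.
final class RunScanDemo {
    // Returns boundaries b such that run i covers a[b[i]] .. a[b[i + 1] - 1].
    // Descending runs are reversed in place, so every run ends up ascending.
    static int[] findRuns(int[] a) {
        int n = a.length;
        int[] run = new int[n + 1];
        int count = 0;
        int k = 0;
        while (k < n) {
            int start = k++;
            if (k < n && a[k - 1] > a[k]) {
                // Descending run: extend it, then reverse it
                while (k < n && a[k - 1] >= a[k]) {
                    k++;
                }
                for (int lo = start, hi = k - 1; lo < hi; lo++, hi--) {
                    int t = a[lo]; a[lo] = a[hi]; a[hi] = t;
                }
            } else {
                // Ascending (or equal) run: extend it
                while (k < n && a[k - 1] <= a[k]) {
                    k++;
                }
            }
            run[count++] = start;
        }
        run[count] = n; // end marker
        return java.util.Arrays.copyOf(run, count + 1);
    }
}
```

For {1, 2, 3, 9, 7, 5, 5, 6} this yields the boundaries [0, 4, 7, 8] and rewrites the array to {1, 2, 3, 9, 5, 5, 7, 6}. The fewer boundaries the scan produces, the fewer merge passes the loop at the end of the fragment needs.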

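Performance comparison of TimSort and the full DualPivotQuicksort

[Figure: results for an array of random numbers]

[Figure: results for an already sorted array]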

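Counting sort

The steps below sort in ascending order.

Algorithm description

Find the largest and smallest elements of the array to be sorted
Count how many times each value i occurs in the array and store the count in the i-th entry of array C
Accumulate the counts (starting from the first element of C, add each entry to the one before it)
Fill the target array backwards: put each element i at position C[i] of the new array, and subtract 1 from C[i] each time an element is placed

Complexity

Average time complexity	O(n + k)
Worst-case time complexity	O(n + k)
Best-case time complexity	O(n + k)
Space complexity	O(n + k)

Here n is the number of elements and k is the size of the value range (max - min + 1).

Characteristics

It is not a comparison sort, so the Ω(n log n) lower bound for comparison sorts does not apply. It pays off when k is not much larger than n.

Implementation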

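public static int[] countSort(int[] a) {
    int[] b = new int[a.length];
    if (a.length == 0) {
        return b; // nothing to sort; also avoids reading a[0] below
    }
    int max = a[0], min = a[0];
    for (int i : a) {
        if (i > max) {
            max = i;
        }
        if (i < min) {
            min = i;
        }
    }
    // Size of the value range; assumes max - min + 1 fits in an int
    int k = max - min + 1;
    int[] c = new int[k];
    // Count the occurrences of each value v in c[v - min]
    for (int i = 0; i < a.length; ++i) {
        c[a[i] - min] += 1;
    }
    // Prefix sums: c[i] becomes the number of elements <= min + i,
    // i.e. one past the last index in b for the value min + i
    for (int i = 1; i < c.length; ++i) {
        c[i] = c[i] + c[i - 1];
    }
    // Fill b from back to front so that equal elements keep their relative order
    for (int i = a.length - 1; i >= 0; --i) {
        b[--c[a[i] - min]] = a[i];
    }
    return b;
}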

Implementation in DualPivotQuicksort

int NUM_SHORT_VALUES = 1 << 16;

int[] count = new int[NUM_SHORT_VALUES];

for (int i = left - 1; ++i <= right;
     count[a[i] - Short.MIN_VALUE]++);
for (int i = NUM_SHORT_VALUES, k = right + 1; k > left; ) {
    while (count[--i] == 0);
    short value = (short) (i + Short.MIN_VALUE);
    int s = count[i];

    do {
        a[--k] = value;
    } while (--s > 0);
}
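
To experiment with this fragment outside the JDK, it can be wrapped in a method of its own. The sketch below does that; ShortCountingSortDemo and the sort(a, left, right) signature are assumptions for this illustration, with right treated as an inclusive bound as in the fragment.

```java
// Standalone wrapper around the fragment above; class and method names are hypothetical.
final class ShortCountingSortDemo {
    private static final int NUM_SHORT_VALUES = 1 << 16;

    // Sorts a[left .. right] (both bounds inclusive) in ascending order, in place.
    static void sort(short[] a, int left, int right) {
        int[] count = new int[NUM_SHORT_VALUES];

        // Histogram: value v is counted at index v - Short.MIN_VALUE (0 .. 65535)
        for (int i = left; i <= right; i++) {
            count[a[i] - Short.MIN_VALUE]++;
        }
        // Walk the histogram from the largest value down and refill a from the right
        for (int i = NUM_SHORT_VALUES, k = right + 1; k > left; ) {
            while (count[--i] == 0) {
                // skip values that do not occur
            }
            short value = (short) (i + Short.MIN_VALUE);
            int s = count[i];
            do {
                a[--k] = value;
            } while (--s > 0);
        }
    }
}
```

Unlike the classic version, no prefix sums and no second array are needed: a short carries no payload besides its value, so the sorted range can be regenerated directly from the histogram. The price is a fixed table of 65,536 counters, so this pays off only for large ranges.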

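Performance comparison of classic counting sort and DualPivotQuicksort

Because the counting sort in DualPivotQuicksort targets short arrays, the classic counting sort was also changed to operate on short for this test.

Summary of algorithm time complexities

Source: www.bigocheatsheet.com/

References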

[1] Robert Sedgewick, Kevin Wayne. Algorithms (4th Edition). Addison-Wesley Professional, 2011.

[2] Vladimir Yaroslavskiy. Replacement of Quicksort in java.util.Arrays with new Dual-Pivot Quicksort. mail.openjdk.java.net/pipermail/c…. html, 2009. Archived version of the discussion in the OpenJDK mailing list.

[3] Oracle. Java Source Code.

Unless otherwise stated, all articles from this account are licensed under CC BY-SA 4.0. Please credit the source when reposting!