Two Sorting Methods Provided by Pandas

Pandas provides two ways to sort a DataFrame: by label and by value. This article shows how to use each of them in Pandas.

1. Example DataFrame for Pandas Sorting

  1. The code below creates an example DataFrame object to sort:

    import numpy as np
    
    import pandas as pd
    
    def pandas_dataframe_sorting_example():
        
        # Create a 2-dimensional array with 5 rows and 3 columns of random floats
        # drawn from the standard normal distribution (values change on every run).
        data_array = np.random.randn(5, 3)
        
        # The row labels, deliberately out of order.
        index_array = [0, 2, 1, 6, 3]
        
        # The column labels, also out of order.
        columns_array = ['column-3', 'column-1', 'column-2']
        
        # Create the unsorted DataFrame object from the array and labels above.
        dataframe_unsorted = pd.DataFrame(data=data_array, index=index_array, columns=columns_array)
        print('dataframe_unsorted')
        print(dataframe_unsorted)
        
    
    if __name__ == '__main__':
        
        pandas_dataframe_sorting_example()
    
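  2. Below is the output of the code above (your numbers will differ, because np.random.randn generates new random values on every run). Neither the row labels nor the values are sorted. The examples that follow sort them by label and by value, respectively.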

    dataframe_unsorted
       column-3  column-1  column-2
    0  0.395101 -0.051456  0.327673
    2 -1.417987  0.636136 -0.068395
    1  0.088765  0.672521 -0.195716
    6  0.600821  0.814108 -0.086112
    3  1.243266  0.558752 -0.703006
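
The claim that nothing is sorted can also be checked programmatically. The sketch below is a reproducible variant: instead of calling np.random.randn, it hard-codes the values printed above (rounded to six decimals, as displayed) and uses the is_monotonic_increasing property of the index, the columns, and one column of values.

```python
import pandas as pd

# Values copied from the printed dataframe_unsorted output above, so the
# result is reproducible (np.random.randn gives new numbers on every run).
data = [
    [0.395101, -0.051456, 0.327673],
    [-1.417987, 0.636136, -0.068395],
    [0.088765, 0.672521, -0.195716],
    [0.600821, 0.814108, -0.086112],
    [1.243266, 0.558752, -0.703006],
]
df = pd.DataFrame(data, index=[0, 2, 1, 6, 3],
                  columns=['column-3', 'column-1', 'column-2'])

# Neither the row labels, the column labels, nor the values are in order.
print(df.index.is_monotonic_increasing)        # False
print(df.columns.is_monotonic_increasing)      # False
print(df['column-1'].is_monotonic_increasing)  # False
```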
    
    

2. Pandas DataFrame Sort by Label Example

  1. The **sort_index(axis, ascending)** method of a Pandas DataFrame sorts the DataFrame object by its labels.

2.1 Sort a DataFrame Object by Row Label

  1. When you call the method without any arguments, it sorts the DataFrame object by row label in ascending order.

    dataframe_unsorted.sort_index()
    
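  2. This is because the default value of the axis parameter is 0 (the rows) and the default value of the ascending parameter is True, so the call above is equivalent to: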

    dataframe_unsorted.sort_index(axis = 0, ascending = True)
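
A minimal sketch of this equivalence, using a small hypothetical DataFrame whose row labels are out of order: the default call and the explicit axis=0, ascending=True call return the same result, and ascending=False reverses the label order.

```python
import pandas as pd

# Row labels are deliberately out of order (hypothetical values).
df = pd.DataFrame({'column-1': [10, 20, 30, 40, 50]}, index=[0, 2, 1, 6, 3])

default_sort = df.sort_index()
explicit_sort = df.sort_index(axis=0, ascending=True)

# Both calls sort the rows by label in ascending order.
print(default_sort.equals(explicit_sort))          # True
print(default_sort.index.tolist())                 # [0, 1, 2, 3, 6]

# ascending=False reverses the order of the row labels.
print(df.sort_index(ascending=False).index.tolist())  # [6, 3, 2, 1, 0]
```

Each value moves together with its label, so the sorted 'column-1' reads 10, 30, 20, 50, 40.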
    

2.2 Sort a DataFrame Object by Column Label

  1. To sort a DataFrame object by column label, pass axis = 1 to the sort_index() method:

    dataframe_sort_by_column_label = dataframe_unsorted.sort_index(axis = 1, ascending=False)
    
  2. Passing ascending = False, as in the call above, sorts the column labels in descending order.
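
As a short sketch on a hypothetical one-row DataFrame, here is how axis=1 reorders the columns in both directions; the values travel with their column labels.

```python
import pandas as pd

# Column labels are deliberately out of order (hypothetical values).
df = pd.DataFrame([[3, 1, 2]], columns=['column-3', 'column-1', 'column-2'])

# axis=1 sorts the column labels instead of the row labels.
ascending_cols = df.sort_index(axis=1)
descending_cols = df.sort_index(axis=1, ascending=False)

print(ascending_cols.columns.tolist())   # ['column-1', 'column-2', 'column-3']
print(descending_cols.columns.tolist())  # ['column-3', 'column-2', 'column-1']
```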

3. Pandas DataFrame Sort by Value Example

  1. The **sort_values(by, kind)** method of a DataFrame object sorts the DataFrame object by its values.
  2. The by parameter specifies the column, or list of columns, to sort by.
  3. The kind parameter selects the sorting algorithm. It accepts 'quicksort', 'mergesort', and 'heapsort' (recent pandas versions also accept 'stable').
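  4. The kind parameter only takes effect when sorting by a single column. Its default value is 'quicksort'. Unlike quicksort and heapsort, mergesort is a stable algorithm: rows with equal values keep their original relative order.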
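To illustrate what stability means, here is a minimal sketch on a small hypothetical DataFrame (the column names key and order are made up for this example) in which several rows share the same key value. With kind='mergesort', tied rows are guaranteed to stay in their original order; with 'quicksort' or 'heapsort' that order is not guaranteed.

```python
import pandas as pd

# Hypothetical data: 'key' contains ties, 'order' records the original row order.
df = pd.DataFrame({'key': [2, 1, 2, 1, 2], 'order': [0, 1, 2, 3, 4]})

# kind='mergesort' is stable: rows with equal 'key' keep their original order.
stable = df.sort_values(by='key', kind='mergesort')

print(stable['key'].tolist())    # [1, 1, 2, 2, 2]
print(stable['order'].tolist())  # [1, 3, 0, 2, 4]
```

Stability matters when a DataFrame was already ordered by some other criterion and you want that earlier order preserved among ties.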

4. Pandas DataFrame Sorting Example Source Code

  1. Below is the complete source code for this example.

    import pandas as pd
    
    import numpy as np
    
    def pandas_dataframe_sorting_example():
        
        # Create a 2-dimensional array with 5 rows and 3 columns of random floats
        # drawn from the standard normal distribution (values change on every run).
        data_array = np.random.randn(5, 3)
        
        # The row labels, deliberately out of order.
        index_array = [0, 2, 1, 6, 3]
        
        # The column labels, also out of order.
        columns_array = ['column-3', 'column-1', 'column-2']
        
        # Create the unsorted DataFrame object from the array and labels above.
        dataframe_unsorted = pd.DataFrame(data=data_array, index=index_array, columns=columns_array)
        print('dataframe_unsorted')
        print(dataframe_unsorted)
        
        # Sort the DataFrame by row label in ascending order.
        dataframe_sort_by_row_index_label_ascending = dataframe_unsorted.sort_index(ascending=True)
        print('\ndataframe_sort_by_row_index_label_ascending = dataframe_unsorted.sort_index(ascending=True)')
        print(dataframe_sort_by_row_index_label_ascending)
        
        # Sort the DataFrame by row label in descending order.
        dataframe_sort_by_row_index_label_descending = dataframe_unsorted.sort_index(ascending=False)
        print('\ndataframe_sort_by_row_index_label_descending = dataframe_unsorted.sort_index(ascending=False)')
        print(dataframe_sort_by_row_index_label_descending)
        
        # Sort the DataFrame by column label in descending order.
        dataframe_sort_by_column_label = dataframe_unsorted.sort_index(axis=1, ascending=False)
        print('\ndataframe_sort_by_column_label = dataframe_unsorted.sort_index(axis = 1, ascending=False)')
        print(dataframe_sort_by_column_label)
        
        # Sort the DataFrame by the values in 'column-1' (ascending by default).
        dataframe_sort_by_column_value = dataframe_unsorted.sort_values(by='column-1')
        print("\ndataframe_sort_by_column_value = dataframe_unsorted.sort_values(by='column-1')")
        print(dataframe_sort_by_column_value)
        
        # Sort by 'column-1' in descending order; rows with the same 'column-1' value
        # are then ordered by 'column-2'.
        dataframe_sort_by_multiple_columns_value = dataframe_unsorted.sort_values(by=['column-1', 'column-2'], ascending=False)
        print("\ndataframe_sort_by_multiple_columns_value = dataframe_unsorted.sort_values(by=['column-1','column-2'], ascending=False)")
        print(dataframe_sort_by_multiple_columns_value)
        
        # Sort by 'column-1' using the heapsort algorithm (kind only applies to single-column sorts).
        dataframe_sorting_algorithm = dataframe_unsorted.sort_values(by='column-1', kind='heapsort')
        print("\ndataframe_sorting_algorithm = dataframe_unsorted.sort_values(by='column-1' ,kind='heapsort')")
        print(dataframe_sorting_algorithm)
    
    
    if __name__ == '__main__':
        
        pandas_dataframe_sorting_example()
    
  2. Below is the output produced by the example source code above:

    dataframe_unsorted
       column-3  column-1  column-2
    0  0.395101 -0.051456  0.327673
    2 -1.417987  0.636136 -0.068395
    1  0.088765  0.672521 -0.195716
    6  0.600821  0.814108 -0.086112
    3  1.243266  0.558752 -0.703006
    
    dataframe_sort_by_row_index_label_ascending = dataframe_unsorted.sort_index(ascending=True)
       column-3  column-1  column-2
    0  0.395101 -0.051456  0.327673
    1  0.088765  0.672521 -0.195716
    2 -1.417987  0.636136 -0.068395
    3  1.243266  0.558752 -0.703006
    6  0.600821  0.814108 -0.086112
    
    dataframe_sort_by_row_index_label_descending = dataframe_unsorted.sort_index(ascending=False)
       column-3  column-1  column-2
    6  0.600821  0.814108 -0.086112
    3  1.243266  0.558752 -0.703006
    2 -1.417987  0.636136 -0.068395
    1  0.088765  0.672521 -0.195716
    0  0.395101 -0.051456  0.327673
    
    dataframe_sort_by_column_label = dataframe_unsorted.sort_index(axis = 1, ascending=False)
       column-3  column-2  column-1
    0  0.395101  0.327673 -0.051456
    2 -1.417987 -0.068395  0.636136
    1  0.088765 -0.195716  0.672521
    6  0.600821 -0.086112  0.814108
    3  1.243266 -0.703006  0.558752
    
    dataframe_sort_by_column_value = dataframe_unsorted.sort_values(by='column-1')
       column-3  column-1  column-2
    0  0.395101 -0.051456  0.327673
    3  1.243266  0.558752 -0.703006
    2 -1.417987  0.636136 -0.068395
    1  0.088765  0.672521 -0.195716
    6  0.600821  0.814108 -0.086112
    
    dataframe_sort_by_multiple_columns_value = dataframe_unsorted.sort_values(by=['column-1','column-2'], ascending=False)
       column-3  column-1  column-2
    6  0.600821  0.814108 -0.086112
    1  0.088765  0.672521 -0.195716
    2 -1.417987  0.636136 -0.068395
    3  1.243266  0.558752 -0.703006
    0  0.395101 -0.051456  0.327673
    
    dataframe_sorting_algorithm = dataframe_unsorted.sort_values(by='column-1' ,kind='heapsort')
       column-3  column-1  column-2
    0  0.395101 -0.051456  0.327673
    3  1.243266  0.558752 -0.703006
    2 -1.417987  0.636136 -0.068395
    1  0.088765  0.672521 -0.195716
    6  0.600821  0.814108 -0.086112
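
The random data above happens to contain no duplicate values in column-1, so the output cannot show the tie-breaking that the multi-column sort is meant to provide. The sketch below uses a small hypothetical DataFrame with a deliberate tie (rows 'a' and 'c') to show how the second column in by resolves it. It also shows that ascending accepts a list with one flag per column.

```python
import pandas as pd

# Hypothetical data with a tie in 'column-1' (rows 'a' and 'c').
df = pd.DataFrame({'column-1': [0.5, 0.9, 0.5, 0.1],
                   'column-2': [0.2, 0.3, 0.7, 0.4]},
                  index=['a', 'b', 'c', 'd'])

# Both columns descending: the tie on 'column-1' is broken by 'column-2'.
result = df.sort_values(by=['column-1', 'column-2'], ascending=False)
print(result.index.tolist())  # ['b', 'c', 'a', 'd']

# 'column-1' descending, 'column-2' ascending for the tie-break.
mixed = df.sort_values(by=['column-1', 'column-2'], ascending=[False, True])
print(mixed.index.tolist())   # ['b', 'a', 'c', 'd']
```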