
python

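Note: a list can contain elements of different types. A numpy array can be an element of a list.

int(x) is a Python built-in function. It returns an integer object constructed from a number or string x.

len(s) is a Python built-in function. It returns the length (number of items) of an object. The argument can be a sequence (such as a string, bytes, tuple, list, or range) or a collection (such as a dictionary, set, or frozen set).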
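
A small sketch of the two built-ins described above. The values are made up purely for illustration.

```python
# int() accepts numbers and numeric strings
a = int(3.9)        # truncates toward zero
b = int("42")       # parses a decimal string
c = int("ff", 16)   # optional second argument gives the base

# len() works on sequences and collections
n_str = len("hello")
n_list = len([1, "two", 3.0])   # a list may mix element types
n_dict = len({"x": 1, "y": 2})  # counts the keys

print(a, b, c, n_str, n_list, n_dict)  # 3 42 255 5 3 2
```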

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

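open is a Python built-in function. It opens a file and returns the corresponding file object; if the file cannot be opened, OSError is raised. The file argument is an absolute path or a path relative to the current working directory.

readline(size=-1) is an I/O method. It reads and returns one line from the stream; if size is given, at most size bytes are read (characters, for a text stream).

readlines(hint=-1) is an I/O method. It reads the stream and returns a list of lines. hint limits how much is read: no further lines are read once the total size of the lines read so far exceeds hint.

Note: for line in file: ... already iterates over a file object line by line, so file.readlines() is usually unnecessary.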

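import os

path = os.path.join(file, file_name)  # file: directory, file_name: name of the file inside it
with open(path) as f:                 # the with block closes the file automatically
    title = f.readline()              # first line of the file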
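
The following is a minimal self-contained sketch of the reading pattern described above: take the first line with readline(), then iterate over the rest of the file object. The temporary directory and the file name demo.txt are assumptions made only so the example can run on its own.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name
with open(path, "w", encoding="utf-8") as f:
    f.write("title\nrow 1\nrow 2\n")

with open(path, encoding="utf-8") as f:
    title = f.readline()                      # one line, trailing newline included
    rows = [line.rstrip("\n") for line in f]  # iteration continues after the title

print(repr(title), rows)  # 'title\n' ['row 1', 'row 2']
```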

str.split(sep=None, maxsplit=-1)

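split is a built-in method of str. It splits the string at the given separator and returns a list of substrings. sep is the separator; if it is omitted (or None), the string is split on runs of whitespace such as spaces, newlines, and tabs, and leading or trailing whitespace is ignored. maxsplit is the maximum number of splits: if it is given, the result has at most maxsplit+1 substrings; the default of -1 means no limit.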

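text = "Line1-abcdef \nLine2-abc \nLine4-abcd"  # avoid naming a variable str, which shadows the built-in
print(text.split())        # split on any whitespace, including \n
# ['Line1-abcdef', 'Line2-abc', 'Line4-abcd']
print(text.split(' ', 1))  # split on a single space, at most once (two pieces)
# ['Line1-abcdef', '\nLine2-abc \nLine4-abcd']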
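
A further sketch, with made-up strings, of how the default whitespace splitting differs from an explicit sep, and of what maxsplit does:

```python
text = "a  b\tc\n"

by_whitespace = text.split()     # runs of whitespace act as one separator
by_space = text.split(" ")       # every single space is a separator
limited = "k=v=w".split("=", 1)  # at most one split, so two pieces

print(by_whitespace)  # ['a', 'b', 'c']
print(by_space)       # ['a', '', 'b\tc\n']
print(limited)        # ['k', 'v=w']
```

With an explicit separator, two adjacent separators produce an empty string in the result, which is often a surprise when parsing space-aligned text.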

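list.append(x) adds an item to the end of the list. array.append(x) appends a new item with value x to the end of the array.

Note: list is a built-in Python data structure, whereas array is a module in the Python standard library (rarely used; numpy is usually used instead).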
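
To make the difference concrete, here is a short sketch using the standard-library array module. The typecode "i" (signed int) and the sample values are chosen only for illustration.

```python
from array import array

items = [1, "two"]
items.append(3.0)             # a list accepts elements of any type

numbers = array("i", [1, 2])  # "i" = signed integer typecode
numbers.append(3)

error = None
try:
    numbers.append("three")   # an array only accepts its declared type
except TypeError as exc:
    error = type(exc).__name__

print(items, numbers.tolist(), error)  # [1, 'two', 3.0] [1, 2, 3] TypeError
```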

Path joining:

# Join two or more path components and return the combined path
path_new = os.path.join(file_dir, single_csv)
Path20 = os.path.join(Path1,Path2,Path3)

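os.path.join inserts exactly one path separator ('/' on POSIX systems) between non-empty components; if any component is an absolute path, all components before it are discarded; if the last component is empty, the resulting path ends with a separator.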
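
The three rules above can be checked with a short sketch. It uses posixpath, the POSIX implementation behind os.path, so that the results contain '/' regardless of the operating system; the path components are hypothetical.

```python
import posixpath  # POSIX flavour of os.path, so results do not depend on the OS

joined = posixpath.join("data", "raw", "a.csv")     # separator inserted between parts
absolute = posixpath.join("data", "/tmp", "a.csv")  # components before "/tmp" are dropped
trailing = posixpath.join("data", "raw", "")        # empty last part gives a trailing separator

print(joined)    # data/raw/a.csv
print(absolute)  # /tmp/a.csv
print(trailing)  # data/raw/
```

On Windows, os.path.join follows the same rules but uses a backslash as the separator.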

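numpy library

numpy.append(arr, values, axis=None): numpy has its own append. It returns a new array rather than modifying arr in place. The axis parameter selects the dimension along which values is appended; axis=0 appends along rows. If axis is None, both arr and values are flattened first.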
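
A brief sketch of how axis changes the result of numpy.append, using a small made-up 2x2 array:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

flat = np.append(arr, [5, 6])              # axis=None: everything is flattened
rows = np.append(arr, [[5, 6]], axis=0)    # new row; values must have 2 columns
cols = np.append(arr, [[5], [6]], axis=1)  # new column; values must have 2 rows

print(flat.tolist())  # [1, 2, 3, 4, 5, 6]
print(rows.tolist())  # [[1, 2], [3, 4], [5, 6]]
print(cols.tolist())  # [[1, 2, 5], [3, 4, 6]]
```

When axis is given, values must have the same number of dimensions as arr, which is why the row is written as [[5, 6]] rather than [5, 6].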

np.unique() returns the sorted unique elements of an array.

numpy.unique(ar, return_index=False, return_inverse=False,
             return_counts=False, axis=None)

Parameter axis: The axis to operate on. If None, ar will be flattened. If an integer, the subarrays indexed by the given axis will be flattened and treated as the elements of a 1-D array with the dimension of the given axis.
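
As an illustration of the axis parameter and of return_counts, here is a minimal sketch with made-up arrays:

```python
import numpy as np

a = np.array([3, 1, 2, 3, 1])
values, counts = np.unique(a, return_counts=True)  # sorted values and their frequencies

m = np.array([[1, 0], [1, 0], [2, 3]])
flat_unique = np.unique(m)          # axis=None: m is flattened first
unique_rows = np.unique(m, axis=0)  # axis=0: each row is treated as one element

print(values.tolist(), counts.tolist())  # [1, 2, 3] [2, 1, 2]
print(flat_unique.tolist())              # [0, 1, 2, 3]
print(unique_rows.tolist())              # [[1, 0], [2, 3]]
```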

np.argwhere(a) finds the indices of array elements that are non-zero, grouped by element. Note what non-zero means here: a boolean array such as x>1 counts as non-zero wherever it is True.

>>> import numpy as np
>>> x = np.arange(6).reshape(2, 3)
>>> x
array([[0, 1, 2],
       [3, 4, 5]])
>>> np.argwhere(x>1)
array([[0, 2],
       [1, 0],
       [1, 1],
       [1, 2]])
>>> np.argwhere(x)
array([[0, 1],
       [0, 2],
       [1, 0],
       [1, 1],
       [1, 2]])

pandas library

pd.read_csv() can read CSV files, and it can also read delimited txt files.

# Read a CSV file
import os
import pandas as pd
single_data_frame = pd.read_csv(os.path.join(file_dir, single_csv), encoding='gbk')

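The encoding parameter specifies the character encoding of the file. GBK is a Chinese character encoding covering simplified and traditional Chinese characters; UTF-8 is more widely used.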

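# Column names: date, open, high, low, close, volume, turnover
txt = pd.read_csv(path, encoding='gbk', skiprows=[0, 1], skipfooter=1,
                  names=['日期', '开盘', '最高', '最低', '收盘', '成交量', '成交额'],
                  engine='python')  # skipfooter is not supported by the default C engine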

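sep: the character used to separate fields; the default is ','.

skiprows: line numbers to skip (0-indexed) or the number of lines to skip at the start of the file. skiprows=[0, 1] skips lines 0 and 1; skiprows=2 skips the first two lines.

skipfooter: the number of lines to skip at the end of the file. skipfooter=1 skips the last line.

header: the row number(s) containing the column names and marking the start of the data (see the official documentation for details). If names is not given, the behavior is the same as header=0: row 0 supplies the column names and the remaining rows are read as data.

names: the column names to use. If names is passed explicitly, the behavior is the same as header=None (the None object, not the string 'None'), so the file is treated as having no header row.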
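
The parameters above can be tried without a file on disk. The sketch below feeds read_csv an in-memory string via io.StringIO; the sample text and the English column names are invented for illustration and are not the data used elsewhere in these notes.

```python
import io
import pandas as pd

raw = (
    "Exported report\n"        # line 0: skipped by skiprows
    "date,open,close\n"        # line 1: skipped, replaced by names=
    "2021-01-04,10.0,10.5\n"
    "2021-01-05,10.5,10.2\n"
    "end of file\n"            # last line: skipped by skipfooter
)

df = pd.read_csv(
    io.StringIO(raw),
    skiprows=[0, 1],
    skipfooter=1,
    names=["date", "open", "close"],
    engine="python",           # skipfooter needs the Python parsing engine
)

print(df.shape)             # (2, 3)
print(df["open"].tolist())  # [10.0, 10.5]
```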

......

pd.concat(): concatenation

all_data_frame = pd.concat([all_data_frame, single_data_frame], ignore_index=True)

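A pandas concatenation operation, similar to numpy.concatenate. ignore_index defaults to False, which keeps the original index labels of each input; with ignore_index=True the labels are discarded and the result is re-indexed 0, 1, ..., n-1.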
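
A minimal sketch of the effect of ignore_index, with two tiny made-up DataFrames:

```python
import pandas as pd

first = pd.DataFrame({"x": [1, 2]})
second = pd.DataFrame({"x": [3, 4]})

kept = pd.concat([first, second])                           # original labels are kept
renumbered = pd.concat([first, second], ignore_index=True)  # labels are rebuilt

print(kept.index.tolist())        # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Duplicate labels such as 0, 1, 0, 1 make label-based selection ambiguous, which is why ignore_index=True is the usual choice when stacking files row by row.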

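os library: the os module provides a rich set of methods for working with files and directories (source: "Python OS File/Directory Methods", 菜鸟教程, runoob.com).

os.listdir() returns a list of the names of the files and folders contained in the given folder.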

all_csv_list = os.listdir(file_dir)
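
A self-contained sketch of os.listdir on a temporary folder. The file names are hypothetical; note that listdir returns both files and subfolders in arbitrary order, so the example sorts and filters the result.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as file_dir:
    # Populate the folder with a few empty files and one subfolder
    for name in ("b.csv", "a.csv", "notes.txt"):
        open(os.path.join(file_dir, name), "w").close()
    os.mkdir(os.path.join(file_dir, "sub"))

    entries = sorted(os.listdir(file_dir))  # listdir order is arbitrary
    csv_files = [n for n in entries if n.endswith(".csv")]

print(entries)    # ['a.csv', 'b.csv', 'notes.txt', 'sub']
print(csv_files)  # ['a.csv', 'b.csv']
```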

os.open(path, flags, mode=0o777, *, dir_fd=None)

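A function from the os (operating system) module. It opens the file at path, sets various flags according to flags, and sets its permission mode according to mode. It returns a file descriptor and is intended for low-level I/O; for ordinary use, the built-in open() is sufficient.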
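
For comparison with the built-in open(), here is a small sketch of the low-level calls. The temporary path and the permission mode 0o644 are assumptions for the example.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "low_level.txt")  # hypothetical file name

# os.open returns an integer file descriptor, not a file object
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
os.write(fd, b"hello\n")  # low-level writes take bytes
os.close(fd)              # descriptors must be closed explicitly

with open(path) as f:     # the built-in open() is enough for ordinary use
    content = f.read()

print(type(fd).__name__, repr(content))  # int 'hello\n'
```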

Read all CSV files in a folder

import os
import numpy as np
import pandas as pd

file_dir = r".\processed_126_data"  # file directory: top-level directory of the data files
all_csv_list = os.listdir(file_dir)  # get folder list
for single_csv in all_csv_list:
    single_data_frame = pd.read_csv(os.path.join(file_dir, single_csv), encoding='gbk')
    if single_csv == all_csv_list[0]:
        all_data_frame = single_data_frame
    else:
        all_data_frame = pd.concat([all_data_frame, single_data_frame], ignore_index=True)

data = np.array(all_data_frame)
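
An alternative sketch of the same task: collect the DataFrames in a list and call pd.concat once, which avoids the special case for the first file. This is a variation and not the approach used above; the temporary folder, file names, and contents are invented so the example runs on its own.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as file_dir:
    # Two tiny CSV files standing in for the real data folder
    for name, rows in (("a.csv", "x,y\n1,2\n"), ("b.csv", "x,y\n3,4\n")):
        with open(os.path.join(file_dir, name), "w", encoding="gbk") as f:
            f.write(rows)

    csv_names = sorted(n for n in os.listdir(file_dir) if n.endswith(".csv"))
    frames = [pd.read_csv(os.path.join(file_dir, n), encoding="gbk") for n in csv_names]
    all_data_frame = pd.concat(frames, ignore_index=True)  # one concat for all files

data = all_data_frame.to_numpy()  # same role as np.array(all_data_frame)
print(data.tolist())  # [[1, 2], [3, 4]]
```

Filtering on the .csv suffix also guards against other entries that os.listdir may return, such as subfolders.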