We have a set of data stored in the following format:
Accuracy 26.15%, error rate 0.00%, not classified 73.85%
Accuracy 29.68%, error rate 0.00%, not classified 70.32%
Accuracy 33.98%, error rate 0.00%, not classified 66.02%
Accuracy 35.34%, error rate 0.00%, not classified 64.66%
Accuracy 35.75%, error rate 0.00%, not classified 64.25%
Accuracy 37.51%, error rate 0.00%, not classified 62.49%
Accuracy 38.63%, error rate 0.00%, not classified 61.37%
Accuracy 40.81%, error rate 0.00%, not classified 59.19%
Accuracy 41.22%, error rate 0.00%, not classified 58.78%
Accuracy 41.99%, error rate 0.00%, not classified 58.01%
Accuracy 42.34%, error rate 0.00%, not classified 57.66%
Accuracy 42.40%, error rate 0.00%, not classified 57.60%
Accuracy 43.05%, error rate 0.00%, not classified 56.95%
Accuracy 44.29%, error rate 0.00%, not classified 55.71%
Accuracy 44.35%, error rate 0.00%, not classified 55.65%
Accuracy 44.76%, error rate 0.00%, not classified 55.24%
Accuracy 45.29%, error rate 0.00%, not classified 54.71%
Accuracy 45.35%, error rate 0.00%, not classified 54.65%
Accuracy 95.35%, error rate 4.24%, not classified 0.41%
Accuracy 95.76%, error rate 4.24%, not classified 0.00%
Stats on test data Accuracy 94.74%, error rate 5.26%, not classified 0.00%
We want to load this data into a pandas DataFrame with the columns named 'Accuracy', 'Error rate' and 'Not classified', and strip the non-numeric characters from the data fields.
2. Solution
Method 1: use pandas.DataFrame.replace()
import pandas as pd

df = pd.read_csv("test.csv", names=['Accuracy', 'Error rate', 'Not classified'])
# Remove letters and '%' inside every cell (regex substitution within each string)
df.replace(r'[a-zA-Z%]', '', regex=True, inplace=True)
# If your ultimate goal is to convert those values to numbers, do:
df = df.apply(pd.to_numeric)
# or do it column by column:
# df['Accuracy'] = pd.to_numeric(df['Accuracy'])  # and so on
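To try Method 1 without a test.csv file, the sketch below feeds three records from the sample above through io.StringIO. The in-memory SAMPLE string is an assumption standing in for the file, and the extra .str.strip() call is an added precaution that drops the spaces left behind once the words are removed, before every column is converted to float.

```python
import io

import pandas as pd

# Assumption: a few records from the sample, held in memory instead of test.csv.
SAMPLE = """Accuracy 26.15%, error rate 0.00%, not classified 73.85%
Accuracy 95.35%, error rate 4.24%, not classified 0.41%
Stats on test data Accuracy 94.74%, error rate 5.26%, not classified 0.00%
"""

df = pd.read_csv(io.StringIO(SAMPLE), names=["Accuracy", "Error rate", "Not classified"])

# Remove letters and '%' from every cell; only digits, dots and spaces remain.
df = df.replace(r"[a-zA-Z%]", "", regex=True)

# Strip the leftover spaces, then convert each column to a numeric dtype.
df = df.apply(lambda col: pd.to_numeric(col.str.strip()))

print(df)
print(df.dtypes)
```

Because the "Stats on test data" prefix is made of letters, it is removed along with the other words, so that record parses like the rest.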
Method 2: use Series.str.replace() on each column
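import pandas as pd

df = pd.read_csv("test.csv", names=['Accuracy', 'Error rate', 'Not classified'])
# regex=True is required on pandas >= 2.0, where str.replace defaults to a literal match;
# '%' is in the character class so only the number is left
df['Accuracy'] = df['Accuracy'].str.replace(r"[a-zA-Z%]", '', regex=True)
df['Error rate'] = df['Error rate'].str.replace(r"[a-zA-Z%]", '', regex=True)
df['Not classified'] = df['Not classified'].str.replace(r"[a-zA-Z%]", '', regex=True)
print(df)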
Output:
    Accuracy  Error rate  Not classified
0      26.15        0.00           73.85
1      29.68        0.00           70.32
2      33.98        0.00           66.02
3      35.34        0.00           64.66
4      35.75        0.00           64.25
5      37.51        0.00           62.49
6      38.63        0.00           61.37
7      40.81        0.00           59.19
8      41.22        0.00           58.78
9      41.99        0.00           58.01
10     42.34        0.00           57.66
11     42.40        0.00           57.60
12     43.05        0.00           56.95
13     44.29        0.00           55.71
14     44.35        0.00           55.65
15     44.76        0.00           55.24
16     45.29        0.00           54.71
17     45.35        0.00           54.65
18     95.35        4.24            0.41
19     95.76        4.24            0.00
20     94.74        5.26            0.00
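Both methods above split each record on commas first and then delete the letters. A different technique, not taken from the original answers, is to pull the three numbers out of the raw text with one regular expression and build the DataFrame from the matches. The sketch below assumes the exact wording "Accuracy", "error rate" and "not classified" with ", " separators; RAW is a hypothetical excerpt of the sample. Because re.finditer scans the whole string, it works whether the records sit on separate lines or run together on one line, and the "Stats on test data" label is simply skipped.

```python
import re

import pandas as pd

# Hypothetical excerpt of the raw text; records may or may not be newline-separated.
RAW = (
    "Accuracy 26.15%, error rate 0.00%, not classified 73.85% "
    "Accuracy 95.76%, error rate 4.24%, not classified 0.00% "
    "Stats on test data Accuracy 94.74%, error rate 5.26%, not classified 0.00%"
)

# Assumed record layout: one capture group per percentage value.
PATTERN = re.compile(
    r"Accuracy ([\d.]+)%, error rate ([\d.]+)%, not classified ([\d.]+)%"
)

# Each match yields a (accuracy, error rate, not classified) tuple of strings.
rows = [m.groups() for m in PATTERN.finditer(RAW)]

df = pd.DataFrame(rows, columns=["Accuracy", "Error rate", "Not classified"]).astype(float)
print(df)
```

Records that do not match the pattern are silently dropped, so it may be worth comparing len(df) against the expected record count.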