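Elden Ring Data Analysis
By CloudH2O Lv

Research goals and main content:
The project covers the following aspects:
Data collection and cleaning: collect character attribute and equipment data from Elden Ring, then clean and preprocess it to ensure data quality and usability.
Attribute rating analysis: analyze the frequency and share of each rating for every attribute (Strength, Dexterity, Intelligence, and so on) to understand how ratings are distributed and to spot rating preferences and trends.
Attribute correlation analysis: explore correlations between attributes with statistical methods and visualization tools, for example between Strength and Dexterity or between Intelligence and Faith.
Effect of attributes on character type: build models (decision tree, random forest, and so on) to measure how strongly each attribute influences character type, revealing the importance and weight of each attribute in classification.
Equipment attribute analysis: analyze how equipment attributes are distributed and how they affect combat ability, using statistics and visualization to understand which attributes matter and how to choose equipment.
Model evaluation and performance metrics: evaluate the trained models, compute accuracy and other performance metrics, and assess predictive and generalization ability in order to give players reliable decision support.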
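The attribute correlation step listed above has no code in this post. A minimal sketch of one way to do it, assuming the ratings are already mapped to 0-6 scores: the helper `rating_correlation` is a hypothetical name, Spearman rank correlation is my choice for ordinal data rather than the author's stated method, and the sample holds only the `Str` and `Arc` values of the ten rows visible in the table printed later in this post, not the full dataset.

```python
import pandas as pd

# Stand-in data: Str and Arc scores of the ten rows visible in the table
# printed later in this post (not the full 307-row dataset).
sample = pd.DataFrame({
    "Str": [2, 4, 1, 2, 2, 1, 4, 2, 3, 2],
    "Arc": [0, 0, 0, 6, 0, 0, 0, 0, 0, 0],
})


def rating_correlation(frame, cols):
    """Return the Spearman rank correlation matrix for ordinal rating columns."""
    return frame[cols].corr(method="spearman")


corr = rating_correlation(sample, ["Str", "Arc"])
print(corr)
```

On the real data the same call would take all five rating columns (`Str`, `Dex`, `Int`, `Fai`, `Arc`).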
Data source:
https://eldenring.wiki.fextralife.com/Weapons+Comparison+Tables
Reading the file

```python
import pandas as pd

# Load the weapon table exported from the wiki
data = pd.read_csv('ElderRingData.csv')
data
```
Inspecting the column types
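The code for this step is not shown in the post. The output has the shape of a `DataFrame.info()` report, and its trailing `None` is what `print(data.info())` would add, so the call was probably along those lines. A small sketch on a stand-in frame (three rows copied from the table later in the post; `sample` and `report` are illustrative names) that captures the report in a string:

```python
import io

import pandas as pd

# Stand-in frame: three rows copied from the table printed later in this post
sample = pd.DataFrame({
    "Name": ["Albinauric Bow", "Wing of Astel", "Zweihander"],
    "Type": ["Bow", "Curved Sword", "Colossal Sword"],
    "Cri": [100, 100, 100],
    "Sta": [60, 84, 126],
})

# info() writes to stdout and returns None; passing buf captures the text instead
buf = io.StringIO()
sample.info(buf=buf)
report = buf.getvalue()
print(report)
```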
Output

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 307 entries, 0 to 306
Data columns (total 23 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   Name     307 non-null    object
 1   Type     307 non-null    object
 2   Phy      307 non-null    object
 3   Mag      37 non-null     object
 4   Fir      21 non-null     object
 5   Lit      4 non-null      object
 6   Hol      33 non-null     object
 7   Cri      307 non-null    int64
 8   Sta      307 non-null    int64
 9   Str      307 non-null    int64
 10  Dex      307 non-null    int64
 11  Int      307 non-null    int64
 12  Fai      307 non-null    int64
 13  Arc      307 non-null    int64
 14  PhyD     282 non-null    object
 15  MagD     282 non-null    object
 16  FirD     282 non-null    object
 17  LitD     282 non-null    object
 18  HolD     282 non-null    object
 19  Bst      282 non-null    object
...
 22  Upgrade  307 non-null    object
dtypes: int64(7), object(16)
memory usage: 55.3+ KB
None
```
Computing the mean, median, and standard deviation

```python
import pandas as pd
import numpy as np

data = pd.read_csv('ElderRingData.csv')

# Map letter scaling grades to ordinal scores ('-' means no scaling)
rating_dict = {'S': 6, 'A': 5, 'B': 4, 'C': 3, 'D': 2, 'E': 1, '-': 0}
for col in ['Str', 'Dex', 'Int', 'Fai', 'Arc']:
    data[col] = data[col].map(rating_dict)

# Treat the '-' placeholder as a missing value
cols_to_replace = ['Mag', 'Fir', 'Lit', 'Hol', 'PhyD', 'MagD', 'FirD', 'LitD', 'HolD', 'Bst', 'Rst']
data[cols_to_replace] = data[cols_to_replace].replace('-', np.nan)

# numeric_only=True is needed on pandas 2.x, where text columns would otherwise raise TypeError
mean = data.mean(numeric_only=True)
median = data.median(numeric_only=True)
std = data.std(numeric_only=True)

print("Mean:")
print(mean)
print("Median:")
print(median)
print("Standard deviation:")
print(std)
```
Output:

```
Mean:
Cri    101.169381
Sta    105.335505
Str      2.491857
Dex      2.237785
Int      0.615635
Fai      0.657980
Arc      0.179153
dtype: float64
Median:
Mag     166.0
Fir     176.0
Lit     149.0
Hol     191.0
Cri     100.0
Sta     100.0
Str       2.0
Dex       2.0
Int       0.0
Fai       0.0
Arc       0.0
PhyD     47.0
MagD     33.0
FirD     31.0
LitD     31.0
...
Int    1.530352
Fai    1.458677
Arc    0.865358
dtype: float64
```
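The mean above covers only the seven integer columns because the remaining numeric-looking columns are stored as text. A minimal sketch of coercing such columns before summarizing, assuming `'-'` is the only non-numeric placeholder: the frame `raw` holds the `Mag` and `Bst` values of six rows from the table shown later (with `'-'` standing in for the missing entries), and `pd.to_numeric(..., errors="coerce")` is my suggestion, not a step taken in the post.

```python
import pandas as pd

# Stand-in data: Mag and Bst of six rows from the table later in the post,
# written as raw text with '-' where the value is missing.
raw = pd.DataFrame({
    "Mag": ["-", "93", "-", "-", "-", "191"],
    "Bst": ["15", "42.9", "-", "14", "25.2", "25.3"],
})

# errors="coerce" turns anything unparseable (here '-') into NaN
numeric = raw.apply(pd.to_numeric, errors="coerce")

# NaN values are skipped by default in mean/median/std
summary = numeric.agg(["mean", "median", "std"])
print(summary)
```

After this conversion all three statistics cover the same set of columns.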
Counting weapons and averaging physical damage per weapon type

```python
import pandas as pd
import numpy as np

data = pd.read_csv('ElderRingData.csv')

# Map letter scaling grades to ordinal scores ('-' means no scaling)
rating_dict = {'S': 6, 'A': 5, 'B': 4, 'C': 3, 'D': 2, 'E': 1, '-': 0}
for col in ['Str', 'Dex', 'Int', 'Fai', 'Arc']:
    data[col] = data[col].map(rating_dict)

# Treat the '-' placeholder as a missing value
cols_to_replace = ['Mag', 'Fir', 'Lit', 'Hol', 'PhyD', 'MagD', 'FirD', 'LitD', 'HolD', 'Bst', 'Rst']
data[cols_to_replace] = data[cols_to_replace].replace('-', np.nan)

# Convert the text columns used below to floats
for col in ['PhyD', 'MagD', 'FirD']:
    data[col] = data[col].astype(float)

# Number of weapons and mean PhyD for each weapon type
weapon_count = data.groupby(['Type'])['Name'].count()
weapon_phy_mean = data.groupby(['Type'])['PhyD'].mean()

print("Number of weapons per type:")
print(weapon_count)
print("Average physical damage per weapon type:")
print(weapon_phy_mean)
```
Output:
```
Mag     166.0
Fir     176.0
Lit     149.0
Hol     191.0
Cri     100.0
Sta     100.0
Str       2.0
Dex       2.0
Int       0.0
Fai       0.0
Arc       0.0
PhyD     47.0
MagD     33.0
FirD     31.0
LitD     31.0
HolD     31.0
Bst      36.3
Rst      15.0
dtype: float64
Cri      4.421129
Sta     41.609782
Str      1.106619
Dex      1.251981
Int      1.530352
Fai      1.458677
Arc      0.865358
PhyD    16.214128
MagD    10.262827
FirD     9.266148
dtype: float64
Number of weapons per type:
Type
Axe                      12
Ballista                  2
Bow                       7
Claw                      4
Colossal Sword           11
Colossal Weapon          15
Crossbow                  7
Curved Greatsword         9
Curved Sword             14
Dagger                   16
Fist                      9
Flail                     5
Glintstone Staff         17
Great Spear               6
Greataxe                 12
Greatbow                  4
Greatsword               20
Halberd                  16
Hammer                   15
Heavy Thrusting Sword     4
Katana                    8
Light Bow                 5
Reaper                    4
...
Twinblade    42.000000
Warhammer    66.142857
Whip         25.166667
Name: PhyD, dtype: float64
```
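The two `groupby` calls above can also be written as one pass with named aggregation, which returns the count and the mean side by side. A sketch on a stand-in frame built from the ten rows visible in the table later in this post; the output column names `count` and `mean_phyd` are illustrative, not from the original code.

```python
import pandas as pd

# Stand-in data: the ten rows visible in the table later in the post
sample = pd.DataFrame({
    "Name": ["Academy Glintstone Staff", "Alabaster Lord's Sword", "Albinauric Bow",
             "Albinauric Staff", "Antspur Rapier", "Wing of Astel",
             "Winged Greathorn", "Winged Scythe", "Zamor Curved Sword", "Zweihander"],
    "Type": ["Glintstone Staff", "Greatsword", "Bow", "Glintstone Staff",
             "Thrusting Sword", "Curved Sword", "Greataxe", "Reaper",
             "Curved Greatsword", "Colossal Sword"],
    "PhyD": [25.0, 56.0, float("nan"), 23.0, 47.0, 28.0, 65.0, 30.0, 61.0, 67.0],
})

# One groupby, two aggregates: rows per type and mean PhyD per type
per_type = sample.groupby("Type").agg(
    count=("Name", "count"),
    mean_phyd=("PhyD", "mean"),
)
print(per_type)
```

A type whose `PhyD` values are all missing (here `Bow`) keeps its count but gets `NaN` as its mean.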
Viewing the table

```python
import pandas as pd
import numpy as np

data = pd.read_csv('ElderRingData.csv')

# Map letter scaling grades to ordinal scores ('-' means no scaling)
rating_dict = {'S': 6, 'A': 5, 'B': 4, 'C': 3, 'D': 2, 'E': 1, '-': 0}
for col in ['Str', 'Dex', 'Int', 'Fai', 'Arc']:
    data[col] = data[col].map(rating_dict)

# Treat the '-' placeholder as a missing value
cols_to_replace = ['Mag', 'Fir', 'Lit', 'Hol', 'PhyD', 'MagD', 'FirD', 'LitD', 'HolD', 'Bst', 'Rst']
data[cols_to_replace] = data[cols_to_replace].replace('-', np.nan)

# Convert three of the text columns to floats
for col in ['PhyD', 'MagD', 'FirD']:
    data[col] = data[col].astype(float)

data
```
The output is as follows:
| | Name | Type | Phy | Mag | Fir | Lit | Hol | Cri | Sta | Str | … | Arc | PhyD | MagD | FirD | LitD | HolD | Bst | Rst | Wgt | Upgrade |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Academy Glintstone Staff | Glintstone Staff | 43 | NaN | NaN | NaN | NaN | 100 | 40 | 2 | … | 0 | 25.0 | 15.0 | 15.0 | 15 | 15 | 15 | 10 | 3 | Smithing Stones |
| 1 | Alabaster Lord’s Sword | Greatsword | 313 | 93 | NaN | NaN | NaN | 100 | 126 | 4 | … | 0 | 56.0 | 33.0 | 27.0 | 27 | 27 | 42.9 | 19 | 8 | Somber Smithing Stones |
| 2 | Albinauric Bow | Bow | 200 | NaN | NaN | NaN | NaN | 100 | 60 | 1 | … | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 4.5 | Smithing Stones |
| 3 | Albinauric Staff | Glintstone Staff | 29 | NaN | NaN | NaN | NaN | 100 | 38 | 2 | … | 6 | 23.0 | 14.0 | 14.0 | 14 | 14 | 14 | 9 | 2.5 | Smithing Stones |
| 4 | Antspur Rapier | Thrusting Sword | 240 | NaN | NaN | NaN | NaN | 100 | 62 | 2 | … | 0 | 47.0 | 31.0 | 31.0 | 31 | 31 | 25.2 | 10 | 3 | Smithing Stones |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| 302 | Wing of Astel | Curved Sword | 159 | 191 | NaN | NaN | NaN | 100 | 84 | 1 | … | 0 | 28.0 | 52.0 | 23.0 | 23 | 23 | 25.3 | 9 | 2.5 | Somber Smithing Stones |
| 303 | Winged Greathorn | Greataxe | 318 | NaN | NaN | NaN | NaN | 100 | 150 | 4 | … | 0 | 65.0 | 35.0 | 35.0 | 35 | 35 | 46.2 | 20 | 11 | Somber Smithing Stones |
| 304 | Winged Scythe | Reaper | 213 | NaN | NaN | NaN | 254 | 100 | 110 | 2 | … | 0 | 30.0 | 25.0 | 25.0 | 25 | 55 | 33 | 15 | 7.5 | Somber Smithing Stones |
| 305 | Zamor Curved Sword | Curved Greatsword | 306 | NaN | NaN | NaN | NaN | 100 | 128 | 3 | … | 0 | 61.0 | 33.0 | 33.0 | 33 | 33 | 42.9 | 19 | 9 | Somber Smithing Stones |
| 306 | Zweihander | Colossal Sword | 345 | NaN | NaN | NaN | NaN | 100 | 126 | 2 | … | 0 | 67.0 | 40.0 | 40.0 | 40 | 40 | 54 | 22 | 15.5 | Smithing Stones |

307 rows × 23 columns
Rating means and standard deviations

```python
import pandas as pd
import numpy as np

data = pd.read_csv('ElderRingData.csv')

# Map letter scaling grades to ordinal scores ('-' means no scaling)
rating_dict = {'S': 6, 'A': 5, 'B': 4, 'C': 3, 'D': 2, 'E': 1, '-': 0}
attrs = ['Str', 'Dex', 'Int', 'Fai', 'Arc']
for col in attrs:
    data[col] = data[col].map(rating_dict)

# Treat the '-' placeholder as a missing value
cols_to_replace = ['Mag', 'Fir', 'Lit', 'Hol', 'PhyD', 'MagD', 'FirD', 'LitD', 'HolD', 'Bst', 'Rst']
data[cols_to_replace] = data[cols_to_replace].replace('-', np.nan)

for col in ['PhyD', 'MagD', 'FirD']:
    data[col] = data[col].astype(float)

# Frequency and share of each rating, per attribute
for attr in attrs:
    counts = data[attr].value_counts()
    freqs = counts / counts.sum()
    print(f"Rating counts for {attr}:\n{counts}\n")
    print(f"Rating shares for {attr}:\n{freqs}\n")

# Mean and standard deviation of the rating scores
attr_means = data[attrs].mean()
attr_stds = data[attrs].std()

print("Mean rating per attribute:")
print(attr_means)
print("\nStandard deviation of ratings per attribute:")
print(attr_stds)
```
Output

```
Rating counts for Str:
2    109
3     94
4     52
1     31
0     16
5      4
6      1
Name: Str, dtype: int64

Rating shares for Str:
2    0.355049
3    0.306189
4    0.169381
1    0.100977
0    0.052117
5    0.013029
6    0.003257
Name: Str, dtype: float64

Rating counts for Dex:
2    106
3     79
4     52
0     44
...
Int    1.530352
Fai    1.458677
Arc    0.865358
dtype: float64
```
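As a consistency check, the `Str` mean and standard deviation reported earlier (2.491857 and 1.106619) can be recomputed from the `Str` counts printed above alone. A short sketch; the variable names are illustrative, and `ddof=1` matches the sample standard deviation that pandas `std()` uses by default.

```python
import numpy as np
import pandas as pd

# Str rating counts exactly as printed above (score -> number of weapons)
counts = pd.Series({2: 109, 3: 94, 4: 52, 1: 31, 0: 16, 5: 4, 6: 1})

total = counts.sum()      # number of weapons
shares = counts / total   # share of each rating

# Expand the frequency table back into one score per weapon
scores = np.repeat(counts.index.to_numpy(), counts.to_numpy())

str_mean = scores.mean()
str_std = scores.std(ddof=1)  # sample standard deviation, as in pandas
print(total, round(str_mean, 6), round(str_std, 6))
```

The counts add up to the 307 rows of the dataset, and both statistics agree with the values in the summary output.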
Random forest model

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score

data = pd.read_csv("ElderRingData.csv")

# Drop the columns that are not used as features, then any rows with missing values
data = data.drop(["Name", "Mag", "Fir", "Lit", "Hol", "PhyD", "MagD", "FirD", "LitD", "HolD", "Upgrade"], axis=1)
data = data.dropna()

# Integer-encode every remaining text column, including the target "Type"
le = LabelEncoder()
for col in data.columns:
    if data[col].dtype == "object":
        data[col] = le.fit_transform(data[col])

# Features and target: predict the weapon type from the other columns
X = data.drop(["Type"], axis=1)
y = data["Type"]

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Accuracy on the held-out rows
y_pred = rf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```
Prediction accuracy:

```
Accuracy: 0.7903225806451613
```
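The project goals mention the importance and weight of each attribute, which the model above does not print. A minimal sketch of reading them from a fitted forest through scikit-learn's `feature_importances_` attribute. Assumptions: the training frame holds only the ten rows visible in the table earlier in this post, and the feature subset (`Sta`, `Str`, `Arc`, `Wgt`) is chosen for illustration, so the numbers it prints say nothing about the full dataset.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Stand-in data: the ten rows visible in the table earlier in the post
sample = pd.DataFrame({
    "Sta": [40, 126, 60, 38, 62, 84, 150, 110, 128, 126],
    "Str": [2, 4, 1, 2, 2, 1, 4, 2, 3, 2],
    "Arc": [0, 0, 0, 6, 0, 0, 0, 0, 0, 0],
    "Wgt": [3, 8, 4.5, 2.5, 3, 2.5, 11, 7.5, 9, 15.5],
    "Type": ["Glintstone Staff", "Greatsword", "Bow", "Glintstone Staff",
             "Thrusting Sword", "Curved Sword", "Greataxe", "Reaper",
             "Curved Greatsword", "Colossal Sword"],
})

X = sample.drop(columns=["Type"])
y = sample["Type"]

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, y)

# Impurity-based importances, one per feature, normalized to sum to 1
importances = pd.Series(rf.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

On the real data, `X` would be the feature frame built in the random forest code above.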