Elden Ring Data Analysis

By CloudH2O Lv

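Research goals and main content:

The project covers the following areas:

Data collection and cleaning: collect character attribute and equipment data from Elden Ring, then clean and preprocess it so that the data is of usable quality.

Attribute rating analysis: analyze the frequency and share of each rating for every attribute (Strength, Dexterity, Intelligence, and so on) to see how the ratings are distributed and to spot preferences and trends.

Attribute correlation analysis: explore how attributes relate to one another using statistical methods and visualization tools, for example the relationship between Strength and Dexterity or between Intelligence and Faith.

Effect of attributes on character type: build models (such as decision trees and random forests) to measure how strongly each attribute influences character type, revealing the importance and weight of each attribute in the classification.

Equipment attribute analysis: analyze the distribution of equipment attributes and their effect on combat ability, using statistics and visualization to understand which attributes matter and how to choose equipment.

Model evaluation and performance metrics: evaluate the trained models by computing accuracy and other metrics, assessing their predictive power and ability to generalize, so that players get accurate decision support.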
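The attribute correlation goal can be sketched in a few lines. This is a minimal illustration, assuming the scaling grades have already been mapped to integers from 0 to 6; the small table is made-up data, not rows from the real CSV, and Spearman rank correlation is an assumption chosen here because the grades are ordinal.

```python
import pandas as pd

# Made-up ratings (0 = no scaling, 6 = S), for illustration only.
ratings = pd.DataFrame({
    "Str": [4, 3, 2, 0, 5, 1],
    "Dex": [1, 2, 3, 4, 0, 3],
    "Int": [0, 0, 0, 6, 0, 0],
    "Fai": [0, 0, 1, 0, 0, 0],
})

# Spearman compares ranks, which suits ordinal grades better than Pearson.
corr = ratings.corr(method="spearman")
print(corr.round(2))
```

A negative value for the Str/Dex pair in this toy table would mean that weapons with a high Strength grade tend to have a low Dexterity grade, and the same reading applies to the real data.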

Data source:

https://eldenring.wiki.fextralife.com/Weapons+Comparison+Tables

Reading the file

import pandas as pd

# Read the CSV file
data = pd.read_csv('ElderRingData.csv')
data
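As an alternative to replacing '-' after loading, the dash can be declared as a missing value at read time for selected columns only. The sketch below uses an invented three-row CSV in place of the real file, and the choice of which columns treat '-' as missing is an assumption for illustration.

```python
import io
import pandas as pd

# Tiny stand-in for the real file; the rows are invented for illustration.
sample_csv = io.StringIO(
    "Name,Type,Mag,Str,PhyD\n"
    "Sword A,Greatsword,93,B,56\n"
    "Bow B,Bow,-,E,-\n"
    "Staff C,Glintstone Staff,-,-,23\n"
)

# Treat '-' as missing only in the numeric columns.
# In 'Str' the dash is a real grade (no scaling) and must be kept.
dash_is_missing = {"Mag": ["-"], "PhyD": ["-"]}
data = pd.read_csv(sample_csv, na_values=dash_is_missing)

print(data.dtypes)
print(data)
```

Because the dashes become NaN during parsing, the affected columns are read as floats directly instead of as object columns.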

Inspecting the column types

print(data.info())

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 307 entries, 0 to 306
Data columns (total 23 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   Name     307 non-null    object
 1   Type     307 non-null    object
 2   Phy      307 non-null    object
 3   Mag      37 non-null     object
 4   Fir      21 non-null     object
 5   Lit      4 non-null      object
 6   Hol      33 non-null     object
 7   Cri      307 non-null    int64
 8   Sta      307 non-null    int64
 9   Str      307 non-null    int64
 10  Dex      307 non-null    int64
 11  Int      307 non-null    int64
 12  Fai      307 non-null    int64
 13  Arc      307 non-null    int64
 14  PhyD     282 non-null    object
 15  MagD     282 non-null    object
 16  FirD     282 non-null    object
 17  LitD     282 non-null    object
 18  HolD     282 non-null    object
 19  Bst      282 non-null    object
...
 22  Upgrade  307 non-null    object
dtypes: int64(7), object(16)
memory usage: 55.3+ KB
None
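Many of the columns reported as object hold numbers stored as text. A minimal sketch of one way to convert them, using invented values rather than the real file: `pd.to_numeric` with `errors="coerce"` turns anything it cannot parse, such as '-', into NaN.

```python
import pandas as pd

# Invented rows that mimic the object-typed columns reported by info().
data = pd.DataFrame({
    "Phy": ["43", "313", "200"],
    "Mag": ["-", "93", "-"],
    "PhyD": ["25", "56", "-"],
})

# errors="coerce" turns unparseable entries (here the '-') into NaN.
numeric = data.apply(pd.to_numeric, errors="coerce")

print(numeric.dtypes)
print(numeric.isna().sum())
```

After this step the columns are numeric, so statistics such as the mean can be computed on them without a separate `astype(float)` call.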

Computing the mean, median, and standard deviation

import pandas as pd
import numpy as np

# Read the CSV file
data = pd.read_csv('ElderRingData.csv')

# Convert the rating columns to numbers, numbered in order from S (6) down to E (1); '-' becomes 0
rating_dict = {'S': 6, 'A': 5, 'B': 4, 'C': 3, 'D': 2, 'E': 1, '-': 0}
data['Str'] = data['Str'].apply(lambda x: rating_dict[x])
data['Dex'] = data['Dex'].apply(lambda x: rating_dict[x])
data['Int'] = data['Int'].apply(lambda x: rating_dict[x])
data['Fai'] = data['Fai'].apply(lambda x: rating_dict[x])
data['Arc'] = data['Arc'].apply(lambda x: rating_dict[x])

# In Mag, Fir, Lit, Hol, PhyD, MagD, FirD, LitD, HolD, Bst and Rst, replace '-' with NaN
data['Mag'] = data['Mag'].replace('-', np.nan)
data['Fir'] = data['Fir'].replace('-', np.nan)
data['Lit'] = data['Lit'].replace('-', np.nan)
data['Hol'] = data['Hol'].replace('-', np.nan)
data['PhyD'] = data['PhyD'].replace('-', np.nan)
data['MagD'] = data['MagD'].replace('-', np.nan)
data['FirD'] = data['FirD'].replace('-', np.nan)
data['LitD'] = data['LitD'].replace('-', np.nan)
data['HolD'] = data['HolD'].replace('-', np.nan)
data['Bst'] = data['Bst'].replace('-', np.nan)
data['Rst'] = data['Rst'].replace('-', np.nan)

# Compute the mean, median and standard deviation of each attribute.
# numeric_only=True is needed on pandas >= 2.0, where text columns such as Name
# raise a TypeError; older versions silently skipped columns they could not reduce.
mean = data.mean(numeric_only=True)
median = data.median(numeric_only=True)
std = data.std(numeric_only=True)

# Print the results
print("Mean:")
print(mean)
print("Median:")
print(median)
print("Standard deviation:")
print(std)

Output:

Mean:
Cri 101.169381
Sta 105.335505
Str 2.491857
Dex 2.237785
Int 0.615635
Fai 0.657980
Arc 0.179153
dtype: float64
Median:
Mag 166.0
Fir 176.0
Lit 149.0
Hol 191.0
Cri 100.0
Sta 100.0
Str 2.0
Dex 2.0
Int 0.0
Fai 0.0
Arc 0.0
PhyD 47.0
MagD 33.0
FirD 31.0
LitD 31.0
...
Int 1.530352
Fai 1.458677
Arc 0.865358
dtype: float64
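The three statistics can also be collected into a single table. A minimal sketch on invented values: `select_dtypes("number")` keeps only the numeric columns, and `agg` with a list of function names returns one row per statistic.

```python
import numpy as np
import pandas as pd

# Invented values for illustration only.
data = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Cri": [100, 100, 110, 130],
    "Sta": [40, 126, 60, 62],
    "PhyD": [25.0, 56.0, np.nan, 47.0],
})

# Keep numeric columns only, so the text column cannot raise an error.
summary = data.select_dtypes("number").agg(["mean", "median", "std"])
print(summary)
```

Missing values are skipped by default, so the PhyD statistics here are computed from the three rows that have a value.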

Counting weapons per type and averaging physical damage (PhyD)

import pandas as pd
import numpy as np

# Read the CSV file
data = pd.read_csv('ElderRingData.csv')

# Convert the rating columns to numbers, numbered in order from S (6) down to E (1); '-' becomes 0
rating_dict = {'S': 6, 'A': 5, 'B': 4, 'C': 3, 'D': 2, 'E': 1, '-': 0}
data['Str'] = data['Str'].apply(lambda x: rating_dict[x])
data['Dex'] = data['Dex'].apply(lambda x: rating_dict[x])
data['Int'] = data['Int'].apply(lambda x: rating_dict[x])
data['Fai'] = data['Fai'].apply(lambda x: rating_dict[x])
data['Arc'] = data['Arc'].apply(lambda x: rating_dict[x])

# In Mag, Fir, Lit, Hol, PhyD, MagD, FirD, LitD, HolD, Bst and Rst, replace '-' with NaN
cols_to_replace = ['Mag', 'Fir', 'Lit', 'Hol', 'PhyD', 'MagD', 'FirD', 'LitD', 'HolD', 'Bst', 'Rst']
data[cols_to_replace] = data[cols_to_replace].replace('-', np.nan)

# Convert the object-typed numeric columns to float
data['PhyD'] = data['PhyD'].astype(float)
data['MagD'] = data['MagD'].astype(float)
data['FirD'] = data['FirD'].astype(float)


# Count the weapons of each type and compute the average physical damage (PhyD) per type
weapon_count = data.groupby(['Type'])['Name'].count()
weapon_phy_mean = data.groupby(['Type'])['PhyD'].mean()

# Print the results
print("Number of weapons per type:")
print(weapon_count)
print("Average physical damage per type:")
print(weapon_phy_mean)

Output:

Median:
Mag 166.0
Fir 176.0
Lit 149.0
Hol 191.0
Cri 100.0
Sta 100.0
Str 2.0
Dex 2.0
Int 0.0
Fai 0.0
Arc 0.0
PhyD 47.0
MagD 33.0
FirD 31.0
LitD 31.0
HolD 31.0
Bst 36.3
Rst 15.0
dtype: float64
Standard deviation:
Cri 4.421129
Sta 41.609782
Str 1.106619
Dex 1.251981
Int 1.530352
Fai 1.458677
Arc 0.865358
PhyD 16.214128
MagD 10.262827
FirD 9.266148
dtype: float64
Number of weapons per type:
Type
Axe 12
Ballista 2
Bow 7
Claw 4
Colossal Sword 11
Colossal Weapon 15
Crossbow 7
Curved Greatsword 9
Curved Sword 14
Dagger 16
Fist 9
Flail 5
Glintstone Staff 17
Great Spear 6
Greataxe 12
Greatbow 4
Greatsword 20
Halberd 16
Hammer 15
Heavy Thrusting Sword 4
Katana 8
Light Bow 5
Reaper 4
...
Twinblade 42.000000
Warhammer 66.142857
Whip 25.166667
Name: PhyD, dtype: float64
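The count and the mean can be computed in one pass with named aggregation, which also makes it easy to sort the types by the mean. A minimal sketch on invented rows; the output column names `count` and `mean_phyd` are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd

# Invented rows for illustration; bows have no PhyD value, as in the real data.
data = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Type": ["Axe", "Axe", "Bow", "Bow", "Whip"],
    "PhyD": [40.0, 50.0, np.nan, np.nan, 25.0],
})

# One groupby, two named output columns, sorted by the mean (NaN goes last).
per_type = (
    data.groupby("Type")
    .agg(count=("Name", "count"), mean_phyd=("PhyD", "mean"))
    .sort_values("mean_phyd", ascending=False)
)
print(per_type)
```

Types whose PhyD values are all missing end up with a NaN mean and are listed at the bottom.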

Viewing the cleaned table

import pandas as pd
import numpy as np

# Read the CSV file
data = pd.read_csv('ElderRingData.csv')

# Convert the rating columns to numbers, numbered in order from S (6) down to E (1); '-' becomes 0
rating_dict = {'S': 6, 'A': 5, 'B': 4, 'C': 3, 'D': 2, 'E': 1, '-': 0}
data['Str'] = data['Str'].apply(lambda x: rating_dict[x])
data['Dex'] = data['Dex'].apply(lambda x: rating_dict[x])
data['Int'] = data['Int'].apply(lambda x: rating_dict[x])
data['Fai'] = data['Fai'].apply(lambda x: rating_dict[x])
data['Arc'] = data['Arc'].apply(lambda x: rating_dict[x])

# In Mag, Fir, Lit, Hol, PhyD, MagD, FirD, LitD, HolD, Bst and Rst, replace '-' with NaN
cols_to_replace = ['Mag', 'Fir', 'Lit', 'Hol', 'PhyD', 'MagD', 'FirD', 'LitD', 'HolD', 'Bst', 'Rst']
data[cols_to_replace] = data[cols_to_replace].replace('-', np.nan)

# Convert the object-typed numeric columns to float
data['PhyD'] = data['PhyD'].astype(float)
data['MagD'] = data['MagD'].astype(float)
data['FirD'] = data['FirD'].astype(float)

data

The output is as follows:

     Name                      Type               Phy  Mag  Fir  Lit  Hol  Cri  Sta  Str  ...  Arc  PhyD  MagD  FirD  LitD  HolD   Bst  Rst   Wgt  Upgrade
  0  Academy Glintstone Staff  Glintstone Staff    43  NaN  NaN  NaN  NaN  100   40    2  ...    0  25.0  15.0  15.0    15    15    15   10     3  Smithing Stones
  1  Alabaster Lord’s Sword    Greatsword         313   93  NaN  NaN  NaN  100  126    4  ...    0  56.0  33.0  27.0    27    27  42.9   19     8  Somber Smithing Stones
  2  Albinauric Bow            Bow                200  NaN  NaN  NaN  NaN  100   60    1  ...    0   NaN   NaN   NaN   NaN   NaN   NaN  NaN   4.5  Smithing Stones
  3  Albinauric Staff          Glintstone Staff    29  NaN  NaN  NaN  NaN  100   38    2  ...    6  23.0  14.0  14.0    14    14    14    9   2.5  Smithing Stones
  4  Antspur Rapier            Thrusting Sword    240  NaN  NaN  NaN  NaN  100   62    2  ...    0  47.0  31.0  31.0    31    31  25.2   10     3  Smithing Stones
...
302  Wing of Astel             Curved Sword       159  191  NaN  NaN  NaN  100   84    1  ...    0  28.0  52.0  23.0    23    23  25.3    9   2.5  Somber Smithing Stones
303  Winged Greathorn          Greataxe           318  NaN  NaN  NaN  NaN  100  150    4  ...    0  65.0  35.0  35.0    35    35  46.2   20    11  Somber Smithing Stones
304  Winged Scythe             Reaper             213  NaN  NaN  NaN  254  100  110    2  ...    0  30.0  25.0  25.0    25    55    33   15   7.5  Somber Smithing Stones
305  Zamor Curved Sword        Curved Greatsword  306  NaN  NaN  NaN  NaN  100  128    3  ...    0  61.0  33.0  33.0    33    33  42.9   19     9  Somber Smithing Stones
306  Zweihander                Colossal Sword     345  NaN  NaN  NaN  NaN  100  126    2  ...    0  67.0  40.0  40.0    40    40    54   22  15.5  Smithing Stones

307 rows × 23 columns
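The same loading and cleaning steps are repeated at the top of every cell. One option is to wrap them in a single helper. The sketch below is an illustration only: `load_and_clean` and the constant names are hypothetical, the two-row CSV is invented, and converting all eleven dash columns to numbers (rather than only PhyD, MagD and FirD) is an assumption that goes slightly beyond the cells above.

```python
import io
import pandas as pd

RATING_ORDER = {"S": 6, "A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "-": 0}
RATING_COLS = ["Str", "Dex", "Int", "Fai", "Arc"]
DASH_COLS = ["Mag", "Fir", "Lit", "Hol", "PhyD", "MagD", "FirD",
             "LitD", "HolD", "Bst", "Rst"]

def load_and_clean(source):
    """Hypothetical helper: read the CSV and apply the cleaning steps once."""
    data = pd.read_csv(source)
    # Letter grades become integers; '-' (no scaling) becomes 0.
    for col in RATING_COLS:
        data[col] = data[col].map(RATING_ORDER)
    # '-' and any other unparseable text in these columns becomes NaN.
    data[DASH_COLS] = data[DASH_COLS].apply(pd.to_numeric, errors="coerce")
    return data

# Invented two-row file standing in for ElderRingData.csv.
sample = io.StringIO(
    "Name,Type,Str,Dex,Int,Fai,Arc,Mag,Fir,Lit,Hol,PhyD,MagD,FirD,LitD,HolD,Bst,Rst\n"
    "Sword A,Greatsword,B,D,-,-,-,93,-,-,-,56,33,27,27,27,42.9,19\n"
    "Bow B,Bow,E,D,-,-,-,-,-,-,-,-,-,-,-,-,-,-\n"
)
cleaned = load_and_clean(sample)
print(cleaned.dtypes)
```

Each later cell could then start from `load_and_clean('ElderRingData.csv')` instead of repeating the conversion code.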

Rating means and standard deviations

import pandas as pd
import numpy as np

# Read the CSV file
data = pd.read_csv('ElderRingData.csv')

# Convert the rating columns to numbers, numbered in order from S (6) down to E (1); '-' becomes 0
rating_dict = {'S': 6, 'A': 5, 'B': 4, 'C': 3, 'D': 2, 'E': 1, '-': 0}
data['Str'] = data['Str'].apply(lambda x: rating_dict[x])
data['Dex'] = data['Dex'].apply(lambda x: rating_dict[x])
data['Int'] = data['Int'].apply(lambda x: rating_dict[x])
data['Fai'] = data['Fai'].apply(lambda x: rating_dict[x])
data['Arc'] = data['Arc'].apply(lambda x: rating_dict[x])

# In Mag, Fir, Lit, Hol, PhyD, MagD, FirD, LitD, HolD, Bst and Rst, replace '-' with NaN
cols_to_replace = ['Mag', 'Fir', 'Lit', 'Hol', 'PhyD', 'MagD', 'FirD', 'LitD', 'HolD', 'Bst', 'Rst']
data[cols_to_replace] = data[cols_to_replace].replace('-', np.nan)

# Convert the object-typed numeric columns to float
data['PhyD'] = data['PhyD'].astype(float)
data['MagD'] = data['MagD'].astype(float)
data['FirD'] = data['FirD'].astype(float)

# Count how often each rating occurs for every attribute, and its share of the total
attrs = ['Str', 'Dex', 'Int', 'Fai', 'Arc']
for attr in attrs:
    counts = data[attr].value_counts()
    freqs = counts / counts.sum()
    print(f"Rating counts for {attr}:\n{counts}\n")
    print(f"Rating shares for {attr}:\n{freqs}\n")

# Compute the mean and standard deviation of the ratings for every attribute
attr_cols = ['Str', 'Dex', 'Int', 'Fai', 'Arc']
attr_means = data[attr_cols].mean()
attr_stds = data[attr_cols].std()
print("Mean rating per attribute:")
print(attr_means)
print("\nStandard deviation of ratings per attribute:")
print(attr_stds)

Output:

Rating counts for Str:
2 109
3 94
4 52
1 31
0 16
5 4
6 1
Name: Str, dtype: int64

Rating shares for Str:
2 0.355049
3 0.306189
4 0.169381
1 0.100977
0 0.052117
5 0.013029
6 0.003257
Name: Str, dtype: float64

Rating counts for Dex:
2 106
3 79
4 52
0 44
...
Int 1.530352
Fai 1.458677
Arc 0.865358
dtype: float64
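The counts and shares can also be placed side by side in one table. A minimal sketch on invented encoded grades: `value_counts(normalize=True)` returns the shares directly, without dividing by the sum by hand.

```python
import pandas as pd

# Invented encoded grades (0 = '-', 6 = 'S') for illustration.
str_rating = pd.Series([2, 3, 2, 4, 0, 2, 3, 1], name="Str")

# Both Series are indexed by grade, so they align into one table.
table = pd.DataFrame({
    "count": str_rating.value_counts(),
    "share": str_rating.value_counts(normalize=True),
}).sort_index(ascending=False)
print(table)
```

Sorting by the index lists the grades from highest to lowest, which can be easier to read than the default ordering by frequency.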

Random forest model

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score

# Read the dataset
data = pd.read_csv("ElderRingData.csv")

# Drop the columns that are not needed
data = data.drop(["Name", "Mag", "Fir", "Lit", "Hol", "PhyD", "MagD", "FirD", "LitD", "HolD", "Upgrade"], axis=1)

# Handle missing values by dropping incomplete rows
data = data.dropna()

# Encode text columns as integers
le = LabelEncoder()
for col in data.columns:
    if data[col].dtype == "object":
        data[col] = le.fit_transform(data[col])

# Split the data into training and test sets
X = data.drop(["Type"], axis=1)
y = data["Type"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Build and train the random forest model
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Predict on the test set and compute the accuracy
y_pred = rf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Prediction accuracy:

Accuracy: 0.7903225806451613
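A single train/test split gives one accuracy figure, and the goals above also mention attribute importance. The sketch below shows one way to obtain both a cross-validated accuracy and per-feature importances from a random forest. It runs on synthetic data generated for illustration (the feature values and the two class labels are made up), so the numbers it prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 120
# Synthetic stand-in data: 'Wgt' separates the two classes, 'Cri' is pure noise.
X = pd.DataFrame({
    "Wgt": np.concatenate([rng.normal(3, 1, n // 2), rng.normal(12, 1, n // 2)]),
    "Cri": rng.normal(100, 5, n),
})
y = np.array(["Dagger"] * (n // 2) + ["Colossal Sword"] * (n // 2))

rf = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validation: five accuracy scores instead of one.
scores = cross_val_score(rf, X, y, cv=5)
print("Mean CV accuracy:", scores.mean())

# Fit on all rows, then read the impurity-based importances (they sum to 1).
rf.fit(X, y)
importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
```

Applied to the weapon data, the same two calls would show how stable the accuracy is across folds and which columns the forest relies on most when predicting Type.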