Note
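Explanation of the journal paper's algorithm:
Step 1: Segment the ECG and take the last 1 minute of each segment
Each stimulus segment is cut into 1-minute epochs, and one set of HRV features (F) is computed per epoch.
Step 2: For each subject s, first collect the features from the "calm" state
The set he writes down there means:
For the same person s in the calm state there are n 1-minute epochs, and each epoch yields one feature value (e.g. RMSSD). So you end up with n RMSSD values.
Step 3: Use the n calm values to compute this person's "baseline mean" and "baseline variability"
His two formulas are:
- baseline mean (within-subject mean)
- baseline standard deviation (within-subject variability)
In plain terms:
"What this person's average RMSSD is when calm, and how much their RMSSD naturally drifts on its own when calm."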
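Written out, Steps 2 and 3 could look like the block below. The symbols are assumed here for illustration only, since the paper's own notation is not reproduced in these notes; dividing by n - 1 rather than n in the standard deviation is likewise an assumption.

```latex
\begin{aligned}
\mathcal{F}_{s}^{\mathrm{calm}} &= \left\{ F_{s,\mathrm{calm}}^{(1)}, \dots, F_{s,\mathrm{calm}}^{(n)} \right\} \\
\mu_{s} &= \frac{1}{n} \sum_{i=1}^{n} F_{s,\mathrm{calm}}^{(i)} \\
\sigma_{s} &= \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} \left( F_{s,\mathrm{calm}}^{(i)} - \mu_{s} \right)^{2} }
\end{aligned}
```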
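Step 4: Convert the feature of any stimulus/emotion epoch into a z-score
In plain terms:
"How many of this person's own calm standard deviations their RMSSD under a given stimulus (happy/sad) lies above or below their own calm mean."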
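A minimal sketch of Steps 3 and 4 for a single subject, assuming the z-score is (epoch value - calm mean) / calm SD and that the SD uses n - 1. The function names and the RMSSD numbers are hypothetical, not taken from the paper or the dataset.

```python
from statistics import mean, stdev


def calm_baseline(calm_values):
    """Return (mean, SD) of one subject's calm-epoch feature values."""
    if len(calm_values) < 2:
        raise ValueError("need at least two calm epochs to estimate an SD")
    # statistics.stdev divides by n - 1 (assumption about the paper's choice).
    return mean(calm_values), stdev(calm_values)


def to_z(value, baseline_mean, baseline_sd):
    """Express one epoch's feature value in units of the subject's own calm SD."""
    if baseline_sd == 0:
        raise ValueError("calm SD is zero; the z-score is undefined")
    return (value - baseline_mean) / baseline_sd


# Hypothetical RMSSD values (ms) for one subject; not real data.
calm_rmssd = [40.0, 42.0, 44.0]
mu, sd = calm_baseline(calm_rmssd)
z_sad = to_z(38.0, mu, sd)  # an epoch recorded during a "sad" stimulus
```

A negative z here means the stimulus epoch sits below the subject's own calm average, regardless of how that subject compares with anyone else.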
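Results:
Lasso Ranking
HRV_RecAnx_RMSSD is correlated with the outcome:
- The insomnia group shows a smaller RMSSD rebound during the anxiety recovery period (weaker recovery capacity).
- LASSO assigns a negative coefficient: the smaller the value, the more the prediction leans toward the insomnia group (positive class).
- RMSSD: root mean square of successive differences between adjacent heartbeat (NN) intervals; it reflects parasympathetic activity.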
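For reference, RMSSD as defined above can be computed as follows. This is a small sketch on hypothetical NN intervals in milliseconds; it says nothing about how the project's HRV pipeline cleans or segments the signal.

```python
import math


def rmssd(nn_intervals_ms):
    """Root mean square of successive differences between adjacent NN intervals."""
    if len(nn_intervals_ms) < 2:
        raise ValueError("need at least two NN intervals")
    diffs = [b - a for a, b in zip(nn_intervals_ms, nn_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical NN intervals in milliseconds; not real data.
example_nn = [800.0, 810.0, 790.0, 800.0]
example_rmssd = rmssd(example_nn)
```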
/Users/yuchi/PycharmProjects/PsyMl_ISI/.venv/bin/python /Users/yuchi/PycharmProjects/PsyMl_ISI/ML/tools/lasso_ranking/lasso_ranking.py
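[Mode] allow-list mode
[Data] source=isi_raw_data_hrv_recalc_zscore target=3TP samples=33 features=9
Excluded groups: ['ACS', 'CPT', 'EEG', 'IGT', 'ISI', 'PSQI', 'WM']
Overall missing-value ratio: 18.18%
[CV results] (score = neg_log_loss; higher is better, i.e. lower log_loss)
lambda_min:C = 0.240593 | lambda = 4.15639
lambda_1SE:C = 0.0316228 | lambda = 31.6228
[Selected variables (1SE)] sorted by |coefficient| (top 30)
(No variable was selected; consider relaxing the regularization or checking the features)
[Comparison] non-zero variables at lambda_min: 1; non-zero variables at lambda_1SE: 0
[Selected variables (lambda_min)] sorted by |coefficient| (top 30)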
coef abs_coef
HRV_RecAnx_RMSSD -0.009187 0.009187
[Top 10 (path peak)] not tied to a single C
HRV_RecAnx_RMSSD
HRV_ReturnAnx_RMSSD
HRV_RecPos_RMSSD
HRV_ReturnPos_RMSSD
HRV_ReturnPos_MeanNN
HRV_DeltaAttn_SD1
HRV_DeltaAttn_MeanNN
HRV_DeltaAttn_RMSSD
HRV_DeltaAnx_RMSSD
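The C and lambda values printed in the log are consistent with lambda = 1 / C (1 / 0.240593 ≈ 4.15639). The sketch below shows one common way to pick lambda_min and lambda_1SE from cross-validation scores: take the best mean score, then the strongest regularization whose mean score is still within one standard error of it. How lasso_ranking.py actually implements this is not shown in these notes, so the function name, grid, and scores are all hypothetical.

```python
def pick_c_min_and_1se(c_grid, mean_scores, se_scores):
    """Return (C_min, C_1se) for scores where higher is better (e.g. neg_log_loss)."""
    best = max(range(len(c_grid)), key=lambda i: mean_scores[i])
    threshold = mean_scores[best] - se_scores[best]
    # Among candidates within one SE of the best score, take the smallest C,
    # i.e. the largest lambda = 1 / C, which gives the sparsest model.
    eligible = [c for c, m in zip(c_grid, mean_scores) if m >= threshold]
    return c_grid[best], min(eligible)


# Hypothetical grid and CV scores; not taken from the run above.
c_grid = [0.01, 0.1, 1.0, 10.0]
mean_scores = [-0.70, -0.66, -0.64, -0.69]
se_scores = [0.03, 0.03, 0.03, 0.03]

c_min, c_1se = pick_c_min_and_1se(c_grid, mean_scores, se_scores)
lambda_min, lambda_1se = 1.0 / c_min, 1.0 / c_1se
```

Because the 1SE choice always regularizes at least as hard as lambda_min, it is not surprising for it to zero out every coefficient when the signal is weak and the sample is small, as in the log above.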
Statistics:
- HRV_RecAnx_RMSSD r = -0.282, mean(y=1) = -1.9006, mean(y=0) = 1.2529
- HRV_ReturnAnx_RMSSD r = -0.249, mean(y=1) = -0.3475, mean(y=0) = 1.4273
- HRV_ReturnPos_RMSSD r = -0.205, mean(y=1) = -1.0670, mean(y=0) = 1.1761
- HRV_DeltaAttn_MeanNN r = 0.121
- HRV_ReturnPos_MeanNN r = 0.105
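The r values and group means above could be reproduced along these lines, assuming r is the Pearson correlation between the feature and the 0/1 label (i.e. a point-biserial correlation) and that rows with a missing feature value are dropped first. Both assumptions, the function names, and the example values are hypothetical.

```python
import math


def drop_missing(xs, ys):
    """Keep only the pairs whose feature value is present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def pearson_r(xs, ys):
    """Pearson correlation; with a 0/1 label this is the point-biserial r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def group_mean(xs, ys, label):
    """Mean feature value among rows whose label equals `label`."""
    values = [x for x, y in zip(xs, ys) if y == label]
    return sum(values) / len(values)


# Hypothetical z-scored feature with one missing value; not real data.
feature = [-2.0, -1.0, None, 1.0, 2.0]
target = [1, 1, 0, 0, 0]
xs, ys = drop_missing(feature, target)
r = pearson_r(xs, ys)
mean_pos, mean_neg = group_mean(xs, ys, 1), group_mean(xs, ys, 0)
```

A negative r together with mean(y=1) below mean(y=0) is the same pattern reported for the RMSSD features above.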
Normal Ranking (univariate ranking)
/Users/yuchi/PycharmProjects/PsyMl_ISI/.venv/bin/python /Users/yuchi/PycharmProjects/PsyMl_ISI/ML/tools/normal_ranking.py
========================================================================
[3TP] Source: isi_raw_data_hrv_recalc_zscore (already filtered in SQL with WHERE 3TP IS NOT NULL)
[Overview] rows=33 columns=1026 total missing=11390
[Target '3TP'] samples: 33
- class 0: 19 (57.6%)
- class 1: 14 (42.4%)
------------------------------------------------------------------------
[3TP] mode=allow | allowed columns=9
- features actually available: 9
- ranked by: info_gain | Top 9
info_gain gain_ratio SU gini_decrease anova_f chi2
HRV_DeltaAttn_RMSSD 0.481 0.199 0.283 0.154 0.004 13.500
HRV_DeltaAttn_SD1 0.481 0.199 0.283 0.000 0.004 13.500
HRV_RecPos_RMSSD 0.476 0.183 0.265 0.153 0.054 13.500
HRV_ReturnPos_RMSSD 0.438 0.181 0.257 0.167 1.107 12.263
HRV_RecAnx_RMSSD 0.335 0.131 0.189 0.303 2.236 9.518
HRV_ReturnAnx_RMSSD 0.330 0.210 0.257 0.000 1.624 9.000
HRV_ReturnPos_MeanNN 0.268 0.095 0.140 0.000 0.296 7.762
HRV_DeltaAnx_RMSSD 0.158 0.086 0.112 0.222 0.276 5.439
HRV_DeltaAttn_MeanNN 0.136 0.059 0.082 0.000 0.353 4.310
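To help read the info_gain column, here is a sketch of entropy-based information gain for an already discretized feature. How normal_ranking.py bins continuous HRV features is not recorded in these notes, so the binning in the example is hypothetical, and the use of log base 2 is an assumption. With 19 versus 14 cases the label entropy is about 0.983 bits, which would be the ceiling for info_gain under that assumption.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(bins, labels):
    """Reduction in label entropy after splitting the rows by bin."""
    n = len(labels)
    groups = {}
    for b, y in zip(bins, labels):
        groups.setdefault(b, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical bins: the first split separates the classes, the second does not.
labels = [1, 1, 0, 0]
gain_perfect = info_gain(["low", "low", "high", "high"], labels)
gain_useless = info_gain(["a", "b", "a", "b"], labels)

# Label entropy for the class balance reported above (19 vs 14).
label_entropy = entropy([0] * 19 + [1] * 14)
```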
ML results
BAI_T1 and BDI_T1 only
[Data] source=isi_raw_data_hrv_recalc_zscore target=3TP rows=27 features=2
[Features] columns used (first 15): ['BDI_T1', 'BAI_T1']
[CV] Stratified 5-fold, seed=43 | class_weight=balanced
[Aggregation] K-fold = weighted | LOSO = weighted
[Leakage check] Class balance
count percent%
3TP
0 15 55.6
1 12 44.4
[Leakage check] No column found with |r| ≥ 0.95 against the target.
=== Basic ML Benchmark (Stratified 5-fold CV) ===
model AUC F1_pos(=1) Prec_pos Rec_pos F1_neg(=0) Prec_neg Rec_neg MCC Accuracy Pred1_mean Pred0_mean
LogisticRegression 0.911 0.762 0.889 0.667 0.848 0.778 0.933 0.632 0.815 1.800 3.600
RandomForest 0.900 0.696 0.727 0.667 0.774 0.750 0.800 0.472 0.741 2.200 3.200
XGBoost 0.897 0.727 0.800 0.667 0.812 0.765 0.867 0.549 0.778 2.000 3.400
NaiveBayes 0.889 0.762 0.889 0.667 0.848 0.778 0.933 0.632 0.815 1.800 3.600
KNN 0.881 0.783 0.818 0.750 0.839 0.812 0.867 0.624 0.815 2.200 3.200
SVM 0.867 0.762 0.889 0.667 0.848 0.778 0.933 0.632 0.815 1.800 3.600
MLP 0.767 0.632 0.462 1.000 0.125 1.000 0.067 0.175 0.481 5.200 0.200
DecisionTree 0.725 0.667 0.778 0.583 0.788 0.722 0.867 0.474 0.741 1.800 3.600
--- Aggregated Confusion Matrix Sums (across all folds' test parts) ---
model TN_sum FP_sum FN_sum TP_sum
LogisticRegression 14 1 4 8
RandomForest 12 3 4 8
XGBoost 13 2 4 8
NaiveBayes 14 1 4 8
KNN 13 2 3 9
SVM 14 1 4 8
MLP 1 14 0 12
DecisionTree 13 2 5 7
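The positive-class metrics in the benchmark table can be cross-checked against the confusion-matrix sums. The sketch below recomputes them from pooled counts; for the LogisticRegression row (TN=14, FP=1, FN=4, TP=8) the pooled values agree with the table to three decimals. The tool's "weighted" fold aggregation is not documented in these notes, so agreement on other runs is an assumption rather than a guarantee. The function name is hypothetical.

```python
import math


def metrics_from_confusion(tn, fp, fn, tp):
    """Positive-class precision/recall/F1 plus accuracy and MCC from pooled counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "mcc": mcc}


# Counts copied from the LogisticRegression row of the table above.
lr = metrics_from_confusion(tn=14, fp=1, fn=4, tp=8)
# Counts copied from the MLP row: it predicts class 1 almost everywhere.
mlp = metrics_from_confusion(tn=1, fp=14, fn=0, tp=12)
```

The MLP row shows why accuracy and MCC matter here: recall is 1.000 only because nearly every case is labeled positive.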
Adding HRV_RecAnx_RMSSD
[Data] source=isi_raw_data_hrv_recalc_zscore target=3TP rows=27 features=3
[Features] columns used (first 15): ['BDI_T1', 'BAI_T1', 'HRV_RecAnx_RMSSD']
[CV] Stratified 5-fold, seed=43 | class_weight=balanced
[Aggregation] K-fold = weighted | LOSO = weighted
[Leakage check] Class balance
count percent%
3TP
0 15 55.6
1 12 44.4
[Leakage check] No column found with |r| ≥ 0.95 against the target.
=== Basic ML Benchmark (Stratified 5-fold CV) ===
model AUC F1_pos(=1) Prec_pos Rec_pos F1_neg(=0) Prec_neg Rec_neg MCC Accuracy Pred1_mean Pred0_mean
MLP 0.983 0.857 0.750 1.000 0.846 1.000 0.733 0.742 0.852 3.200 2.200
LogisticRegression 0.961 0.870 0.909 0.833 0.903 0.875 0.933 0.775 0.889 2.200 3.200
KNN 0.925 0.762 0.889 0.667 0.848 0.778 0.933 0.632 0.815 1.800 3.600
SVM 0.917 0.870 0.909 0.833 0.903 0.875 0.933 0.775 0.889 2.200 3.200
RandomForest 0.917 0.818 0.900 0.750 0.875 0.824 0.933 0.703 0.852 2.000 3.400
XGBoost 0.894 0.750 0.750 0.750 0.800 0.800 0.800 0.550 0.778 2.400 3.000
NaiveBayes 0.861 0.727 0.800 0.667 0.812 0.765 0.867 0.549 0.778 2.000 3.400
DecisionTree 0.650 0.571 0.667 0.500 0.727 0.667 0.800 0.316 0.667 1.800 3.600
--- Aggregated Confusion Matrix Sums (across all folds' test parts) ---
model TN_sum FP_sum FN_sum TP_sum
MLP 11 4 0 12
LogisticRegression 14 1 2 10
KNN 14 1 4 8
SVM 14 1 2 10
RandomForest 14 1 3 9
XGBoost 12 3 3 9
NaiveBayes 13 2 4 8
DecisionTree 12 3 6 6
Adding HRV_ReturnAnx_RMSSD and HRV_RecAnx_RMSSD
[Data] source=isi_raw_data_hrv_recalc_zscore target=3TP rows=27 features=4
[Features] columns used (first 15): ['BDI_T1', 'BAI_T1', 'HRV_RecAnx_RMSSD', 'HRV_ReturnAnx_RMSSD']
[CV] Stratified 5-fold, seed=43 | class_weight=balanced
[Aggregation] K-fold = weighted | LOSO = weighted
[Leakage check] Class balance
count percent%
3TP
0 15 55.6
1 12 44.4
[Leakage check] No column found with |r| ≥ 0.95 against the target.
=== Basic ML Benchmark (Stratified 5-fold CV) ===
model AUC F1_pos(=1) Prec_pos Rec_pos F1_neg(=0) Prec_neg Rec_neg MCC Accuracy Pred1_mean Pred0_mean
LogisticRegression 0.956 0.818 0.900 0.750 0.875 0.824 0.933 0.703 0.852 2.000 3.400
SVM 0.906 0.870 0.909 0.833 0.903 0.875 0.933 0.775 0.889 2.200 3.200
RandomForest 0.900 0.762 0.889 0.667 0.848 0.778 0.933 0.632 0.815 1.800 3.600
XGBoost 0.894 0.750 0.750 0.750 0.800 0.800 0.800 0.550 0.778 2.400 3.000
KNN 0.875 0.700 0.875 0.583 0.824 0.737 0.933 0.562 0.778 1.600 3.800
NaiveBayes 0.817 0.727 0.800 0.667 0.812 0.765 0.867 0.549 0.778 2.000 3.400
MLP 0.706 0.471 0.800 0.333 0.757 0.636 0.933 0.341 0.667 1.000 4.400
DecisionTree 0.567 0.421 0.571 0.333 0.686 0.600 0.800 0.151 0.593 1.400 4.000
--- Aggregated Confusion Matrix Sums (across all folds' test parts) ---
model TN_sum FP_sum FN_sum TP_sum
LogisticRegression 14 1 3 9
SVM 14 1 2 10
RandomForest 14 1 4 8
XGBoost 12 3 3 9
KNN 14 1 5 7
NaiveBayes 13 2 4 8
MLP 14 1 8 4
DecisionTree 12 3 8 4
Repeated runs
BAI_T1 and BDI_T1 only
[Data] source=isi_raw_data_hrv_recalc_zscore target=3TP rows=27 features=2
[Features] columns used (first 15): ['BDI_T1', 'BAI_T1']
[CV] Repeated Stratified 5-fold x 100, seed=43 | class_weight=balanced
[Aggregation] K-fold = weighted | LOSO = weighted
[Leakage check] Class balance
count percent%
3TP
0 15 55.6
1 12 44.4
[Leakage check] No column found with |r| ≥ 0.95 against the target.
=== Basic ML Benchmark (Repeated Stratified 5-fold x 100 CV) ===
model AUC F1_pos(=1) Prec_pos Rec_pos F1_neg(=0) Prec_neg Rec_neg MCC Accuracy Pred1_mean Pred0_mean
NaiveBayes NaN 0.775 0.875 0.695 0.851 0.790 0.921 0.640 0.820 1.906 3.494
LogisticRegression NaN 0.762 0.865 0.681 0.843 0.782 0.915 0.620 0.811 1.890 3.510
SVM NaN 0.737 0.813 0.674 0.820 0.771 0.876 0.567 0.786 1.990 3.410
KNN NaN 0.725 0.792 0.668 0.809 0.764 0.859 0.542 0.774 2.026 3.374
XGBoost NaN 0.718 0.726 0.710 0.779 0.772 0.785 0.497 0.752 2.348 3.052
RandomForest NaN 0.701 0.732 0.672 0.778 0.754 0.803 0.481 0.745 2.204 3.196
DecisionTree NaN 0.698 0.713 0.683 0.767 0.755 0.780 0.466 0.737 2.300 3.100
MLP NaN 0.641 0.474 0.989 0.216 0.934 0.122 0.213 0.507 5.008 0.392
--- Aggregated Confusion Matrix Sums (across all folds' test parts) ---
model TN_sum FP_sum FN_sum TP_sum
NaiveBayes 1381 119 366 834
LogisticRegression 1372 128 383 817
SVM 1314 186 391 809
KNN 1289 211 398 802
XGBoost 1178 322 348 852
RandomForest 1205 295 393 807
DecisionTree 1170 330 380 820
MLP 183 1317 13 1187
Adding HRV_RecAnx_RMSSD
[Data] source=isi_raw_data_hrv_recalc_zscore target=3TP rows=27 features=3
[Features] columns used (first 15): ['BDI_T1', 'BAI_T1', 'HRV_RecAnx_RMSSD']
[CV] Repeated Stratified 5-fold x 100, seed=43 | class_weight=balanced
[Aggregation] K-fold = weighted | LOSO = weighted
[Leakage check] Class balance
count percent%
3TP
0 15 55.6
1 12 44.4
[Leakage check] No column found with |r| ≥ 0.95 against the target.
=== Basic ML Benchmark (Repeated Stratified 5-fold x 100 CV) ===
model AUC F1_pos(=1) Prec_pos Rec_pos F1_neg(=0) Prec_neg Rec_neg MCC Accuracy Pred1_mean Pred0_mean
LogisticRegression NaN 0.871 0.909 0.836 0.904 0.877 0.933 0.778 0.890 2.206 3.194
SVM NaN 0.869 0.897 0.843 0.901 0.880 0.923 0.772 0.887 2.256 3.144
MLP NaN 0.865 0.762 1.000 0.858 1.000 0.751 0.757 0.861 3.148 2.252
NaiveBayes NaN 0.755 0.810 0.707 0.825 0.787 0.867 0.585 0.796 2.094 3.306
RandomForest NaN 0.753 0.846 0.678 0.835 0.778 0.901 0.601 0.802 1.924 3.476
KNN NaN 0.746 0.885 0.644 0.842 0.766 0.933 0.613 0.805 1.746 3.654
XGBoost NaN 0.727 0.749 0.706 0.792 0.775 0.811 0.520 0.764 2.262 3.138
DecisionTree NaN 0.689 0.722 0.658 0.770 0.745 0.797 0.461 0.736 2.188 3.212
--- Aggregated Confusion Matrix Sums (across all folds' test parts) ---
model TN_sum FP_sum FN_sum TP_sum
LogisticRegression 1400 100 197 1003
SVM 1384 116 188 1012
MLP 1126 374 0 1200
NaiveBayes 1301 199 352 848
RandomForest 1352 148 386 814
KNN 1400 100 427 773
XGBoost 1216 284 353 847
DecisionTree 1196 304 410 790
=== LOSO (Leave-One-Subject-Out) — WEIGHTED metrics ===
model AUC F1_pos(=1) Prec_pos Rec_pos F1_neg(=0) Prec_neg Rec_neg MCC Accuracy Pred1_mean Pred0_mean
LogisticRegression 0.967 0.870 0.909 0.833 0.903 0.875 0.933 0.775 0.889 11.000 16.000
MLP 0.967 0.828 0.706 1.000 0.800 1.000 0.667 0.686 0.815 17.000 10.000
KNN 0.942 0.762 0.889 0.667 0.848 0.778 0.933 0.632 0.815 9.000 18.000
SVM 0.922 0.870 0.909 0.833 0.903 0.875 0.933 0.775 0.889 11.000 16.000
RandomForest 0.900 0.727 0.800 0.667 0.812 0.765 0.867 0.549 0.778 10.000 17.000
XGBoost 0.878 0.696 0.727 0.667 0.774 0.750 0.800 0.472 0.741 11.000 16.000
NaiveBayes 0.867 0.727 0.800 0.667 0.812 0.765 0.867 0.549 0.778 10.000 17.000
DecisionTree 0.733 0.696 0.727 0.667 0.774 0.750 0.800 0.472 0.741 11.000 16.000
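For orientation, leave-one-subject-out evaluation can be sketched as below: every subject is held out once, the model is fit on the rest, and predictions on the held-out rows are pooled. A nearest-centroid classifier stands in for the real models purely to keep the example self-contained; it is not one of the models in the benchmark, and the data are hypothetical. If the table has one row per subject (an assumption), LOSO reduces to ordinary leave-one-out.

```python
def fit_centroids(X, y):
    """Stand-in classifier: the per-class mean of each feature."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def loso_accuracy(X, y, subjects):
    """Hold out each subject in turn and pool the test predictions."""
    correct = total = 0
    for held_out in sorted(set(subjects)):
        train = [i for i, s in enumerate(subjects) if s != held_out]
        test = [i for i, s in enumerate(subjects) if s == held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in test:
            correct += int(predict_centroid(model, X[i]) == y[i])
            total += 1
    return correct / total


# Hypothetical one-feature data, one row per subject; not real data.
X = [[0.0], [0.2], [0.1], [1.0], [1.2], [1.1]]
y = [0, 0, 0, 1, 1, 1]
subjects = ["s1", "s2", "s3", "s4", "s5", "s6"]
acc = loso_accuracy(X, y, subjects)
```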
=== CV vs LOSO gap comparison (a large gap may indicate overfitting) ===
model AUC_LOSO Accuracy_CV Accuracy_LOSO Acc_gap
LogisticRegression 0.967 0.890 0.889 0.001
SVM 0.922 0.887 0.889 -0.001
MLP 0.967 0.861 0.815 0.047
NaiveBayes 0.867 0.796 0.778 0.018
RandomForest 0.900 0.802 0.778 0.024
KNN 0.942 0.805 0.815 -0.010
XGBoost 0.878 0.764 0.741 0.023
DecisionTree 0.733 0.736 0.741 -0.005
ROC curves
MLP (red curve) performs extremely well
SHAP
Logistic Regression
feature,mean_abs_shap
BAI_T1,1.031575879138239
BDI_T1,0.9923127783210869
HRV_RecAnx_RMSSD,0.4060877756469311
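As a reading aid for mean_abs_shap on a linear model: if features are treated as independent, the SHAP value of feature j for one sample is coef_j * (x_j - mean(x_j)) in log-odds units, and mean_abs_shap is the average of its absolute value over samples. Whether the numbers above were produced this way is not stated in these notes, so the sketch below uses hypothetical coefficients, feature names, and data.

```python
def mean_abs_linear_shap(X, coefs, names):
    """Mean |SHAP| per feature for a linear model, assuming independent features."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    result = {}
    for j, name in enumerate(names):
        # SHAP value for row i, feature j: coef_j * (x_ij - mean_j)
        result[name] = sum(abs(coefs[j] * (row[j] - means[j])) for row in X) / n
    return result


# Hypothetical data and coefficients; not the fitted model from these notes.
X = [[0.0, 1.0], [2.0, 1.0], [4.0, 4.0]]
coefs = [0.5, -1.0]
importance = mean_abs_linear_shap(X, coefs, ["feat_a", "feat_b"])
```

Because the value scales with both the coefficient and the feature's spread, features on different scales (questionnaire totals versus z-scored HRV) are only comparable if they were standardized before fitting.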

XGBoost
feature,mean_abs_shap
BDI_T1,1.579461
BAI_T1,1.3617431
HRV_RecAnx_RMSSD,0.82197875
