Note

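Explanation of the algorithm in the journal paper:

Step 1: Segment the ECG and take the last 1 minute of each segment

One or more 1-minute epochs are cut from each stimulus segment, and one set of HRV features (F) is computed per epoch.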
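
A minimal sketch of Step 1, assuming beat (R-peak) timestamps in seconds are already available for a segment. The function names `last_minute_rr` and `rmssd` are hypothetical and are not taken from the paper or the project code; RMSSD is used only as one example of an HRV feature.

```python
import math

def last_minute_rr(beat_times_s, segment_end_s, window_s=60.0):
    """Return RR intervals (ms) for beats inside the last `window_s` seconds."""
    start = segment_end_s - window_s
    beats = [t for t in beat_times_s if start <= t <= segment_end_s]
    # Differences between consecutive beat times are RR intervals (s -> ms).
    return [(b - a) * 1000.0 for a, b in zip(beats, beats[1:])]

def rmssd(rr_ms):
    """Root mean square of successive differences between RR intervals."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    if not diffs:
        raise ValueError("need at least two RR intervals")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Toy beat times (hypothetical, not study data); a 3-second window keeps it short.
toy_beats = [0.0, 0.8, 1.8, 2.6, 3.6]
toy_rr = last_minute_rr(toy_beats, segment_end_s=3.6, window_s=3.0)
toy_rmssd = rmssd(toy_rr)
```

In real use `window_s` would stay at 60 seconds, and one feature value per epoch would be stored for the later steps.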


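Step 2: For each subject s, first collect the features in the calm state

The set the author writes:

It means:

For the same person s in the calm state there are n 1-minute epochs, and each epoch yields one feature value (for example RMSSD). So you end up with n RMSSD values.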


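Step 3: Use the n calm values to compute this person's baseline mean and baseline variability

The author's two formulas are:

  • Baseline mean (within-subject mean)
  • Baseline standard deviation (within-subject variability)

In plain terms:

"What this person's average RMSSD is when calm, and how much their RMSSD naturally drifts when calm"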
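
The paper's own formulas are not reproduced in these notes. One way to write the calm set and the two baseline statistics is given here under assumed notation: the symbols F_{s,i}, \mu_s and \sigma_s are chosen for illustration, and whether the paper divides by n or n-1 in the standard deviation is not stated here.

```latex
\mathcal{F}_s^{\mathrm{calm}} = \{F_{s,1}, F_{s,2}, \dots, F_{s,n}\}, \qquad
\mu_s = \frac{1}{n}\sum_{i=1}^{n} F_{s,i}, \qquad
\sigma_s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(F_{s,i}-\mu_s\right)^{2}}
```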


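Step 4: Convert the feature of any stimulus/emotion epoch into a z-score

In plain terms:

"How many of this person's own standard deviations their RMSSD under a given stimulus (happy/sad) lies above or below their own calm mean"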
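
Steps 2 to 4 can be sketched as follows. This is an illustration under assumptions, not the paper's code: the helper names are hypothetical, the numbers are made up, and the sample standard deviation (n-1) is assumed.

```python
import statistics

def baseline_stats(calm_values):
    """Within-subject baseline: mean and SD of the calm-state epochs."""
    mu = statistics.fmean(calm_values)
    sigma = statistics.stdev(calm_values)  # sample SD (n-1); an assumption
    return mu, sigma

def to_z(value, mu, sigma):
    """Express one epoch's feature in units of the subject's own calm SD."""
    if sigma == 0:
        raise ValueError("calm-state SD is zero; z-score is undefined")
    return (value - mu) / sigma

# Hypothetical RMSSD values (ms) for one subject, not study data.
calm_rmssd = [40.0, 44.0, 36.0, 40.0]
mu, sigma = baseline_stats(calm_rmssd)
z_sad = to_z(32.0, mu, sigma)  # negative: below this person's calm mean
```

Because each subject is scaled by their own baseline, a z-score of -2 means the same thing ("two of my own calm SDs below my calm mean") for a person with high resting RMSSD and for a person with low resting RMSSD.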


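Results:

Lasso Ranking

  • HRV_RecAnx_RMSSD is associated with the target:
    • The insomnia group shows a smaller RMSSD rebound during the anxiety recovery period (weaker recovery capacity)
    • LASSO assigns a negative coefficient 👉 the smaller the value, the more the sample leans toward the insomnia group (positive class).
    • RMSSD: the root mean square of successive differences between adjacent inter-beat intervals 👉 reflects parasympathetic activity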
/Users/yuchi/PycharmProjects/PsyMl_ISI/.venv/bin/python /Users/yuchi/PycharmProjects/PsyMl_ISI/ML/tools/lasso_ranking/lasso_ranking.py 
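[Mode] Allow-list mode

[Data] source=isi_raw_data_hrv_recalc_zscore  target=3TP  samples=33  features=9
  Excluded groups: ['ACS', 'CPT', 'EEG', 'IGT', 'ISI', 'PSQI', 'WM']
  Overall missing-value ratio: 18.18%

[CV results] (score = neg_log_loss; higher is better → lower log_loss)
  lambda_min: C = 0.240593  |  lambda = 4.15639
  lambda_1SE: C = 0.0316228  |  lambda = 31.6228

[Selected variables (1SE)] sorted by |coef| (top 30)
  (no variables selected; consider relaxing the regularization or checking the features)

[Comparison] non-zero variables at lambda_min: 1; at lambda_1SE: 0

[Selected variables (lambda_min)] sorted by |coef| (top 30)
                      coef  abs_coef
HRV_RecAnx_RMSSD -0.009187  0.009187

[Top 10 (path peak)] not tied to a single C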
    HRV_RecAnx_RMSSD
 HRV_ReturnAnx_RMSSD
    HRV_RecPos_RMSSD
 HRV_ReturnPos_RMSSD
HRV_ReturnPos_MeanNN
   HRV_DeltaAttn_SD1
HRV_DeltaAttn_MeanNN
 HRV_DeltaAttn_RMSSD
  HRV_DeltaAnx_RMSSD
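
The lambda_min / lambda_1SE choice in the log above can be sketched as follows. This is a generic version of the one-standard-error rule, not the code in lasso_ranking.py: the function names and the candidate grid are hypothetical. The C-to-lambda conversion assumes lambda = 1/C, which matches the two pairs printed in the log.

```python
def lambda_from_c(c):
    """Assumed conversion: C is the inverse regularization strength."""
    return 1.0 / c

def pick_c_min_and_1se(cs, mean_scores, se_scores):
    """Return (C at best mean CV score, C chosen by the 1-SE rule).

    Scores follow the 'higher is better' convention (e.g. neg_log_loss).
    """
    best = max(range(len(cs)), key=lambda i: mean_scores[i])
    threshold = mean_scores[best] - se_scores[best]
    # Among candidates within one SE of the best score, take the
    # strongest regularization, i.e. the smallest C.
    eligible = [i for i in range(len(cs)) if mean_scores[i] >= threshold]
    one_se = min(eligible, key=lambda i: cs[i])
    return cs[best], cs[one_se]

# Hypothetical CV summary, not the study's values.
cs = [0.01, 0.03, 0.1, 0.3, 1.0]
mean_scores = [-0.72, -0.675, -0.66, -0.65, -0.67]
se_scores = [0.03, 0.03, 0.03, 0.03, 0.03]
c_min, c_1se = pick_c_min_and_1se(cs, mean_scores, se_scores)
```

The 1-SE rule deliberately prefers a sparser model, which is why it can end up selecting no variables at all when the sample is small and the CV scores are noisy, as in the log above.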

Statistical computation:

  - HRV_RecAnx_RMSSD r = -0.282, mean(y=1) = -1.9006, mean(y=0) = 1.2529

  - HRV_ReturnAnx_RMSSD r = -0.249, mean(y=1) = -0.3475, mean(y=0) = 1.4273

  - HRV_ReturnPos_RMSSD r = -0.205, mean(y=1) = -1.0670, mean(y=0) = 1.1761

  - HRV_DeltaAttn_MeanNN r = 0.121

  - HRV_ReturnPos_MeanNN r = 0.105
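
A sketch of how an r value and the two group means could be computed for one feature. It assumes r is the Pearson (point-biserial) correlation between the feature and the binary 3TP label and that missing values are dropped pairwise; the notes do not show the actual method, and the data here are made up.

```python
import math

def pearson_r(x, y):
    """Pearson correlation; with a 0/1 label this is the point-biserial r."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None]  # drop missing x
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

def group_mean(x, y, label):
    """Mean of the feature among samples whose target equals `label`."""
    vals = [a for a, b in zip(x, y) if b == label and a is not None]
    return sum(vals) / len(vals)

# Hypothetical z-scored feature and labels, not study data.
feature = [1.0, 2.0, None, 3.0, 4.0]
target = [1, 1, 0, 0, 0]
r = pearson_r(feature, target)
mean_pos = group_mean(feature, target, 1)
mean_neg = group_mean(feature, target, 0)
```

A negative r together with mean(y=1) < mean(y=0) says the same thing twice: the positive class tends to have lower values of the feature.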

Normal Ranking (univariate ranking)

/Users/yuchi/PycharmProjects/PsyMl_ISI/.venv/bin/python /Users/yuchi/PycharmProjects/PsyMl_ISI/ML/tools/normal_ranking.py 

========================================================================
[3TP] Source: isi_raw_data_hrv_recalc_zscore (already filtered in the SQL WHERE clause: 3TP IS NOT NULL)
[Overview] rows=33 cols=1026  total missing=11390

[Target '3TP'] samples: 33
  - class 0: 19 (57.6%)
  - class 1: 14 (42.4%)

------------------------------------------------------------------------
[3TP] mode=allow | allowed columns=9
  - usable features: 9
  - ranked by: info_gain | Top 9
                      info_gain  gain_ratio     SU  gini_decrease  anova_f    chi2
HRV_DeltaAttn_RMSSD       0.481       0.199  0.283          0.154    0.004  13.500
HRV_DeltaAttn_SD1         0.481       0.199  0.283          0.000    0.004  13.500
HRV_RecPos_RMSSD          0.476       0.183  0.265          0.153    0.054  13.500
HRV_ReturnPos_RMSSD       0.438       0.181  0.257          0.167    1.107  12.263
HRV_RecAnx_RMSSD          0.335       0.131  0.189          0.303    2.236   9.518
HRV_ReturnAnx_RMSSD       0.330       0.210  0.257          0.000    1.624   9.000
HRV_ReturnPos_MeanNN      0.268       0.095  0.140          0.000    0.296   7.762
HRV_DeltaAnx_RMSSD        0.158       0.086  0.112          0.222    0.276   5.439
HRV_DeltaAttn_MeanNN      0.136       0.059  0.082          0.000    0.353   4.310
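
The first three columns of the table can be sketched with the textbook definitions below. How normal_ranking.py bins the continuous HRV features is not shown in these notes, so the sketch takes already-discretized bin labels as input; function names and data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(bins, labels):
    """H(y) minus the bin-weighted entropy of y within each bin."""
    n = len(labels)
    groups = {}
    for b, y in zip(bins, labels):
        groups.setdefault(b, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def gain_ratio(bins, labels):
    """Information gain normalized by the entropy of the binning itself."""
    return info_gain(bins, labels) / entropy(bins)

def symmetrical_uncertainty(bins, labels):
    """SU = 2 * IG / (H(bins) + H(y)), bounded in [0, 1]."""
    return 2.0 * info_gain(bins, labels) / (entropy(bins) + entropy(labels))

# Hypothetical binned feature that separates the classes perfectly.
bins = ["lo", "lo", "hi", "hi"]
labels = [1, 1, 0, 0]
ig = info_gain(bins, labels)
su = symmetrical_uncertainty(bins, labels)
```

With 33 samples, fine-grained bins can inflate info_gain simply because each bin holds very few samples; gain_ratio and SU penalize that, which is one reason the three columns rank features differently.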



ML results

BAI_T1 and BDI_T1 only

[Data] source=isi_raw_data_hrv_recalc_zscore  target=3TP  rows=27  features=2
[Features] columns used (first 15): ['BDI_T1', 'BAI_T1']
[CV] Stratified 5-fold, seed=43  |  class_weight=balanced
[Aggregation] K-fold = weighted | LOSO = weighted
 
[Leakage check] Class balance
     count  percent%
3TP                 
0       15      55.6
1       12      44.4
 
[Leakage check] No columns found with |r| ≥ 0.95 against the target.
 
=== Basic ML Benchmark (Stratified 5-fold CV) ===
             model   AUC  F1_pos(=1)  Prec_pos  Rec_pos  F1_neg(=0)  Prec_neg  Rec_neg   MCC  Accuracy  Pred1_mean  Pred0_mean
LogisticRegression 0.911       0.762     0.889    0.667       0.848     0.778    0.933 0.632     0.815       1.800       3.600
      RandomForest 0.900       0.696     0.727    0.667       0.774     0.750    0.800 0.472     0.741       2.200       3.200
           XGBoost 0.897       0.727     0.800    0.667       0.812     0.765    0.867 0.549     0.778       2.000       3.400
        NaiveBayes 0.889       0.762     0.889    0.667       0.848     0.778    0.933 0.632     0.815       1.800       3.600
               KNN 0.881       0.783     0.818    0.750       0.839     0.812    0.867 0.624     0.815       2.200       3.200
               SVM 0.867       0.762     0.889    0.667       0.848     0.778    0.933 0.632     0.815       1.800       3.600
               MLP 0.767       0.632     0.462    1.000       0.125     1.000    0.067 0.175     0.481       5.200       0.200
      DecisionTree 0.725       0.667     0.778    0.583       0.788     0.722    0.867 0.474     0.741       1.800       3.600
 
--- Aggregated Confusion Matrix Sums (across all folds' test parts) ---
             model  TN_sum  FP_sum  FN_sum  TP_sum
LogisticRegression      14       1       4       8
      RandomForest      12       3       4       8
           XGBoost      13       2       4       8
        NaiveBayes      14       1       4       8
               KNN      13       2       3       9
               SVM      14       1       4       8
               MLP       1      14       0      12
      DecisionTree      13       2       5       7
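
The benchmark columns can be recomputed from the aggregated confusion matrix. The sketch below uses the standard formulas and the LogisticRegression row (TN=14, FP=1, FN=4, TP=8); the function name is hypothetical, and reading Pred1_mean as the average number of predicted positives per fold is an inference from the numbers, not something the log states.

```python
import math

def metrics_from_confusion(tn, fp, fn, tp, n_folds=5):
    """Standard binary metrics from summed confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "mcc": mcc,
        # Inferred meaning: predicted positives / negatives averaged per fold.
        "pred1_mean": (tp + fp) / n_folds,
        "pred0_mean": (tn + fn) / n_folds,
    }

m = metrics_from_confusion(tn=14, fp=1, fn=4, tp=8)
```

Rounded to three decimals this gives precision 0.889, recall 0.667, F1 0.762, accuracy 0.815 and MCC 0.632, matching the LogisticRegression row. The MLP row (TN=1, FP=14) shows why AUC alone is not enough here: the model labels almost everyone as positive.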
 

Adding HRV_RecAnx_RMSSD

[Data] source=isi_raw_data_hrv_recalc_zscore  target=3TP  rows=27  features=3
[Features] columns used (first 15): ['BDI_T1', 'BAI_T1', 'HRV_RecAnx_RMSSD']
[CV] Stratified 5-fold, seed=43  |  class_weight=balanced
[Aggregation] K-fold = weighted | LOSO = weighted

[Leakage check] Class balance
     count  percent%
3TP                 
0       15      55.6
1       12      44.4

[Leakage check] No columns found with |r| ≥ 0.95 against the target.

=== Basic ML Benchmark (Stratified 5-fold CV) ===
             model   AUC  F1_pos(=1)  Prec_pos  Rec_pos  F1_neg(=0)  Prec_neg  Rec_neg   MCC  Accuracy  Pred1_mean  Pred0_mean
               MLP 0.983       0.857     0.750    1.000       0.846     1.000    0.733 0.742     0.852       3.200       2.200
LogisticRegression 0.961       0.870     0.909    0.833       0.903     0.875    0.933 0.775     0.889       2.200       3.200
               KNN 0.925       0.762     0.889    0.667       0.848     0.778    0.933 0.632     0.815       1.800       3.600
               SVM 0.917       0.870     0.909    0.833       0.903     0.875    0.933 0.775     0.889       2.200       3.200
      RandomForest 0.917       0.818     0.900    0.750       0.875     0.824    0.933 0.703     0.852       2.000       3.400
           XGBoost 0.894       0.750     0.750    0.750       0.800     0.800    0.800 0.550     0.778       2.400       3.000
        NaiveBayes 0.861       0.727     0.800    0.667       0.812     0.765    0.867 0.549     0.778       2.000       3.400
      DecisionTree 0.650       0.571     0.667    0.500       0.727     0.667    0.800 0.316     0.667       1.800       3.600

--- Aggregated Confusion Matrix Sums (across all folds' test parts) ---
             model  TN_sum  FP_sum  FN_sum  TP_sum
               MLP      11       4       0      12
LogisticRegression      14       1       2      10
               KNN      14       1       4       8
               SVM      14       1       2      10
      RandomForest      14       1       3       9
           XGBoost      12       3       3       9
        NaiveBayes      13       2       4       8
      DecisionTree      12       3       6       6

Adding HRV_ReturnAnx_RMSSD and HRV_RecAnx_RMSSD

[Data] source=isi_raw_data_hrv_recalc_zscore  target=3TP  rows=27  features=4
[Features] columns used (first 15): ['BDI_T1', 'BAI_T1', 'HRV_RecAnx_RMSSD', 'HRV_ReturnAnx_RMSSD']
[CV] Stratified 5-fold, seed=43  |  class_weight=balanced
[Aggregation] K-fold = weighted | LOSO = weighted

[Leakage check] Class balance
     count  percent%
3TP                 
0       15      55.6
1       12      44.4

[Leakage check] No columns found with |r| ≥ 0.95 against the target.

=== Basic ML Benchmark (Stratified 5-fold CV) ===
             model   AUC  F1_pos(=1)  Prec_pos  Rec_pos  F1_neg(=0)  Prec_neg  Rec_neg   MCC  Accuracy  Pred1_mean  Pred0_mean
LogisticRegression 0.956       0.818     0.900    0.750       0.875     0.824    0.933 0.703     0.852       2.000       3.400
               SVM 0.906       0.870     0.909    0.833       0.903     0.875    0.933 0.775     0.889       2.200       3.200
      RandomForest 0.900       0.762     0.889    0.667       0.848     0.778    0.933 0.632     0.815       1.800       3.600
           XGBoost 0.894       0.750     0.750    0.750       0.800     0.800    0.800 0.550     0.778       2.400       3.000
               KNN 0.875       0.700     0.875    0.583       0.824     0.737    0.933 0.562     0.778       1.600       3.800
        NaiveBayes 0.817       0.727     0.800    0.667       0.812     0.765    0.867 0.549     0.778       2.000       3.400
               MLP 0.706       0.471     0.800    0.333       0.757     0.636    0.933 0.341     0.667       1.000       4.400
      DecisionTree 0.567       0.421     0.571    0.333       0.686     0.600    0.800 0.151     0.593       1.400       4.000

--- Aggregated Confusion Matrix Sums (across all folds' test parts) ---
             model  TN_sum  FP_sum  FN_sum  TP_sum
LogisticRegression      14       1       3       9
               SVM      14       1       2      10
      RandomForest      14       1       4       8
           XGBoost      12       3       3       9
               KNN      14       1       5       7
        NaiveBayes      13       2       4       8
               MLP      14       1       8       4
      DecisionTree      12       3       8       4


Repeated runs

BAI_T1 and BDI_T1 only

[Data] source=isi_raw_data_hrv_recalc_zscore  target=3TP  rows=27  features=2
[Features] columns used (first 15): ['BDI_T1', 'BAI_T1']
[CV] Repeated Stratified 5-fold x 100, seed=43  |  class_weight=balanced
[Aggregation] K-fold = weighted | LOSO = weighted

[Leakage check] Class balance
     count  percent%
3TP                 
0       15      55.6
1       12      44.4

[Leakage check] No columns found with |r| ≥ 0.95 against the target.

=== Basic ML Benchmark (Repeated Stratified 5-fold x 100 CV) ===
             model  AUC  F1_pos(=1)  Prec_pos  Rec_pos  F1_neg(=0)  Prec_neg  Rec_neg   MCC  Accuracy  Pred1_mean  Pred0_mean
        NaiveBayes  NaN       0.775     0.875    0.695       0.851     0.790    0.921 0.640     0.820       1.906       3.494
LogisticRegression  NaN       0.762     0.865    0.681       0.843     0.782    0.915 0.620     0.811       1.890       3.510
               SVM  NaN       0.737     0.813    0.674       0.820     0.771    0.876 0.567     0.786       1.990       3.410
               KNN  NaN       0.725     0.792    0.668       0.809     0.764    0.859 0.542     0.774       2.026       3.374
           XGBoost  NaN       0.718     0.726    0.710       0.779     0.772    0.785 0.497     0.752       2.348       3.052
      RandomForest  NaN       0.701     0.732    0.672       0.778     0.754    0.803 0.481     0.745       2.204       3.196
      DecisionTree  NaN       0.698     0.713    0.683       0.767     0.755    0.780 0.466     0.737       2.300       3.100
               MLP  NaN       0.641     0.474    0.989       0.216     0.934    0.122 0.213     0.507       5.008       0.392

--- Aggregated Confusion Matrix Sums (across all folds' test parts) ---
             model  TN_sum  FP_sum  FN_sum  TP_sum
        NaiveBayes    1381     119     366     834
LogisticRegression    1372     128     383     817
               SVM    1314     186     391     809
               KNN    1289     211     398     802
           XGBoost    1178     322     348     852
      RandomForest    1205     295     393     807
      DecisionTree    1170     330     380     820
               MLP     183    1317      13    1187

Adding HRV_RecAnx_RMSSD

[Data] source=isi_raw_data_hrv_recalc_zscore  target=3TP  rows=27  features=3
[Features] columns used (first 15): ['BDI_T1', 'BAI_T1', 'HRV_RecAnx_RMSSD']
[CV] Repeated Stratified 5-fold x 100, seed=43  |  class_weight=balanced
[Aggregation] K-fold = weighted | LOSO = weighted

[Leakage check] Class balance
     count  percent%
3TP                 
0       15      55.6
1       12      44.4

[Leakage check] No columns found with |r| ≥ 0.95 against the target.

=== Basic ML Benchmark (Repeated Stratified 5-fold x 100 CV) ===
             model  AUC  F1_pos(=1)  Prec_pos  Rec_pos  F1_neg(=0)  Prec_neg  Rec_neg   MCC  Accuracy  Pred1_mean  Pred0_mean
LogisticRegression  NaN       0.871     0.909    0.836       0.904     0.877    0.933 0.778     0.890       2.206       3.194
               SVM  NaN       0.869     0.897    0.843       0.901     0.880    0.923 0.772     0.887       2.256       3.144
               MLP  NaN       0.865     0.762    1.000       0.858     1.000    0.751 0.757     0.861       3.148       2.252
        NaiveBayes  NaN       0.755     0.810    0.707       0.825     0.787    0.867 0.585     0.796       2.094       3.306
      RandomForest  NaN       0.753     0.846    0.678       0.835     0.778    0.901 0.601     0.802       1.924       3.476
               KNN  NaN       0.746     0.885    0.644       0.842     0.766    0.933 0.613     0.805       1.746       3.654
           XGBoost  NaN       0.727     0.749    0.706       0.792     0.775    0.811 0.520     0.764       2.262       3.138
      DecisionTree  NaN       0.689     0.722    0.658       0.770     0.745    0.797 0.461     0.736       2.188       3.212

--- Aggregated Confusion Matrix Sums (across all folds' test parts) ---
             model  TN_sum  FP_sum  FN_sum  TP_sum
LogisticRegression    1400     100     197    1003
               SVM    1384     116     188    1012
               MLP    1126     374       0    1200
        NaiveBayes    1301     199     352     848
      RandomForest    1352     148     386     814
               KNN    1400     100     427     773
           XGBoost    1216     284     353     847
      DecisionTree    1196     304     410     790

=== LOSO (Leave-One-Subject-Out) — WEIGHTED metrics ===
             model   AUC  F1_pos(=1)  Prec_pos  Rec_pos  F1_neg(=0)  Prec_neg  Rec_neg   MCC  Accuracy  Pred1_mean  Pred0_mean
LogisticRegression 0.967       0.870     0.909    0.833       0.903     0.875    0.933 0.775     0.889      11.000      16.000
               MLP 0.967       0.828     0.706    1.000       0.800     1.000    0.667 0.686     0.815      17.000      10.000
               KNN 0.942       0.762     0.889    0.667       0.848     0.778    0.933 0.632     0.815       9.000      18.000
               SVM 0.922       0.870     0.909    0.833       0.903     0.875    0.933 0.775     0.889      11.000      16.000
      RandomForest 0.900       0.727     0.800    0.667       0.812     0.765    0.867 0.549     0.778      10.000      17.000
           XGBoost 0.878       0.696     0.727    0.667       0.774     0.750    0.800 0.472     0.741      11.000      16.000
        NaiveBayes 0.867       0.727     0.800    0.667       0.812     0.765    0.867 0.549     0.778      10.000      17.000
      DecisionTree 0.733       0.696     0.727    0.667       0.774     0.750    0.800 0.472     0.741      11.000      16.000

=== CV vs LOSO gap (a large gap suggests possible overfitting) ===
             model  AUC_LOSO  Accuracy_CV  Accuracy_LOSO  Acc_gap
LogisticRegression     0.967        0.890          0.889    0.001
               SVM     0.922        0.887          0.889   -0.001
               MLP     0.967        0.861          0.815    0.047
        NaiveBayes     0.867        0.796          0.778    0.018
      RandomForest     0.900        0.802          0.778    0.024
               KNN     0.942        0.805          0.815   -0.010
           XGBoost     0.878        0.764          0.741    0.023
      DecisionTree     0.733        0.736          0.741   -0.005
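
The Acc_gap column appears to be Accuracy_CV minus Accuracy_LOSO. The sketch below reproduces the LogisticRegression row under that assumption. The repeated-CV counts come from the aggregated confusion matrix above; the LOSO counts (TN=14, FP=1, FN=2, TP=10) are not printed in the log and are inferred here from its precision, recall and Pred1_mean values.

```python
def accuracy(tn, fp, fn, tp):
    """Share of correct predictions among all test predictions."""
    return (tn + tp) / (tn + fp + fn + tp)

# Repeated stratified 5-fold x 100: 27 rows x 100 repeats = 2700 predictions.
acc_cv = accuracy(tn=1400, fp=100, fn=197, tp=1003)

# LOSO: one prediction per subject, 27 in total (counts inferred, see above).
acc_loso = accuracy(tn=14, fp=1, fn=2, tp=10)

acc_gap = acc_cv - acc_loso  # assumed definition of the Acc_gap column
```

Small differences in the last digit for other rows (for example SVM) are what one would expect if the script subtracts unrounded accuracies before rounding.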

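ROC curves

  • MLP (red curve) performs very well; its LOSO AUC of 0.967 ties LogisticRegression for the highest in the LOSO table above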

SHAP

Logistic Regression

feature,mean_abs_shap  
BAI_T1,1.031575879138239  
BDI_T1,0.9923127783210869  
HRV_RecAnx_RMSSD,0.4060877756469311
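
For a linear model, mean |SHAP| has a simple closed form when features are treated as independent: the SHAP value of feature j for one sample is coef_j * (x_j - mean(x_j)). The sketch below illustrates that formula only. The notes do not say which explainer produced the table above, and the coefficients and data here are made up.

```python
def mean_abs_linear_shap(X, coefs):
    """Mean |SHAP| per feature for a linear model (log-odds scale).

    Assumes independent features, so phi_j(x) = coef_j * (x_j - mean_j).
    """
    n = len(X)
    n_features = len(coefs)
    means = [sum(row[j] for row in X) / n for j in range(n_features)]
    return [
        abs(coefs[j]) * sum(abs(row[j] - means[j]) for row in X) / n
        for j in range(n_features)
    ]

# Hypothetical standardized inputs and coefficients, not study values.
X_toy = [[0.0, 10.0], [2.0, 10.0], [4.0, 10.0]]
coefs_toy = [0.5, -3.0]
importance = mean_abs_linear_shap(X_toy, coefs_toy)
```

The second feature has a large coefficient but never varies, so its mean |SHAP| is zero: the ranking reflects coefficient size and feature spread together, not the coefficient alone.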

XGBoost



feature,mean_abs_shap  
BDI_T1,1.579461  
BAI_T1,1.3617431  
HRV_RecAnx_RMSSD,0.82197875