Add CD-RISC and DBAS to the calculation

/Users/yuchi/PycharmProjects/PsyMl_ISI/.venv/bin/python /Users/yuchi/PycharmProjects/PsyMl_ISI/ML/ml_benchmark_modular.py 

[Data] source=isi_raw_data  target=3TP  rows=51  features=4
[Features] columns used (first 15): ['BDI_T1', 'BAI_T1', 'CD_RISC_T1', 'DBAS']
[CV] Stratified 5-fold, seed=42  |  class_weight=balanced
[Aggregation] K-fold = weighted | LOSO = weighted

[Leakage check] Class balance
     count  percent%
3TP                 
0       35      68.6
1       16      31.4

[Leakage check] No columns found with |r| ≥ 0.95 against the target.

=== Basic ML Benchmark (Stratified 5-fold CV) ===
             model   AUC  F1_pos(=1)  Prec_pos  Rec_pos  F1_neg(=0)  Prec_neg  Rec_neg   MCC  Accuracy  Pred1_mean  Pred0_mean
        NaiveBayes 0.921       0.759     0.846    0.688       0.904     0.868    0.943 0.671     0.863       2.600       7.600
LogisticRegression 0.909       0.800     0.857    0.750       0.917     0.892    0.943 0.720     0.882       2.800       7.400
               KNN 0.904       0.759     0.846    0.688       0.904     0.868    0.943 0.671     0.863       2.600       7.600
               SVM 0.891       0.848     0.824    0.875       0.928     0.941    0.914 0.777     0.902       3.400       6.800
      RandomForest 0.860       0.710     0.733    0.688       0.873     0.861    0.886 0.584     0.824       3.000       7.200
               MLP 0.852       0.634     0.520    0.812       0.754     0.885    0.657 0.436     0.706       5.000       5.200
           XGBoost 0.791       0.710     0.733    0.688       0.873     0.861    0.886 0.584     0.824       3.000       7.200
      DecisionTree 0.664       0.533     0.571    0.500       0.806     0.784    0.829 0.342     0.725       2.800       7.400

--- Aggregated Confusion Matrix Sums (across all folds' test parts) ---
             model  TN_sum  FP_sum  FN_sum  TP_sum                                                                                                                                FP_FN_IDS
        NaiveBayes      33       2       5      11                                                                         [S112194, S112019 | S112002, S112008, S112036, S112169, S112183]
LogisticRegression      33       2       4      12                                                                                  [S112194, S112019 | S112002, S112008, S112169, S112183]
               KNN      33       2       5      11                                                                         [S112194, S112019 | S112002, S112008, S112036, S112169, S112183]
               SVM      32       3       2      14                                                                                           [S112194, S112019, S112104 | S112008, S112169]
      RandomForest      31       4       5      11                                                       [S112194, S112019, S112012, S112105 | S112008, S112115, S112036, S112169, S112183]
               MLP      23      12       3      13 [S112194, S112016, S112019, S112159, S112171, S112105, S112164, S112234, S112043, S112070, S112104, S112186 | S112008, S112086, S112115]
           XGBoost      31       4       5      11                                                       [S112194, S112019, S112012, S112105 | S112008, S112115, S112036, S112169, S112183]
      DecisionTree      29       6       8       8          [S112194, S112019, S112012, S112105, S112186, S112176 | S112002, S112008, S112115, S112039, S112036, S112169, S112183, S112029]

[SKIP] LOSO is disabled (RUN_LOSO=False)
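
The metric columns in the benchmark table can be cross-checked against the aggregated confusion-matrix sums. The sketch below assumes the standard definitions of precision, recall, F1, accuracy, and MCC computed on pooled counts; the function name `metrics_from_counts` is hypothetical and is not taken from ml_benchmark_modular.py. The counts are the SVM row of the 4-feature run above.

```python
import math

def metrics_from_counts(tn, fp, fn, tp):
    """Recompute positive-class metrics from pooled confusion-matrix counts.

    Hypothetical helper for checking the log; not part of the benchmark script.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "Prec_pos": precision,
        "Rec_pos": recall,
        "F1_pos": f1,
        "Accuracy": accuracy,
        "MCC": mcc,
    }

# SVM row: TN_sum=32, FP_sum=3, FN_sum=2, TP_sum=14
svm = metrics_from_counts(tn=32, fp=3, fn=2, tp=14)
print({name: round(value, 3) for name, value in svm.items()})
```

For this row the pooled counts give Prec_pos 0.824, Rec_pos 0.875, F1_pos 0.848, Accuracy 0.902, and MCC 0.777, which agree with the table. The same function can be run over the other rows to spot transcription errors.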

/Users/yuchi/PycharmProjects/PsyMl_ISI/.venv/bin/python /Users/yuchi/PycharmProjects/PsyMl_ISI/ML/ml_benchmark_modular.py 

[Data] source=isi_raw_data  target=3TP  rows=51  features=3
[Features] columns used (first 15): ['BDI_T1', 'CD_RISC_T1', 'DBAS']
[CV] Stratified 5-fold, seed=42  |  class_weight=balanced
[Aggregation] K-fold = weighted | LOSO = weighted

[Leakage check] Class balance
     count  percent%
3TP                 
0       35      68.6
1       16      31.4

[Leakage check] No columns found with |r| ≥ 0.95 against the target.

=== Basic ML Benchmark (Stratified 5-fold CV) ===
             model   AUC  F1_pos(=1)  Prec_pos  Rec_pos  F1_neg(=0)  Prec_neg  Rec_neg   MCC  Accuracy  Pred1_mean  Pred0_mean
               KNN 0.925       0.786     0.917    0.688       0.919     0.872    0.971 0.721     0.882       2.400       7.800
        NaiveBayes 0.914       0.800     0.857    0.750       0.917     0.892    0.943 0.720     0.882       2.800       7.400
LogisticRegression 0.912       0.765     0.722    0.812       0.882     0.909    0.857 0.650     0.843       3.600       6.600
               SVM 0.895       0.750     0.750    0.750       0.886     0.886    0.886 0.636     0.843       3.200       7.000
               MLP 0.868       0.316     1.000    0.188       0.843     0.729    1.000 0.370     0.745       0.600       9.600
      RandomForest 0.864       0.710     0.733    0.688       0.873     0.861    0.886 0.584     0.824       3.000       7.200
           XGBoost 0.796       0.688     0.688    0.688       0.857     0.857    0.857 0.545     0.804       3.200       7.000
      DecisionTree 0.693       0.571     0.667    0.500       0.838     0.795    0.886 0.422     0.765       2.400       7.800

--- Aggregated Confusion Matrix Sums (across all folds' test parts) ---
             model  TN_sum  FP_sum  FN_sum  TP_sum                                                                                                                 FP_FN_IDS
               KNN      34       1       5      11                                                                   [S112019 | S112002, S112008, S112169, S112183, S112029]
        NaiveBayes      33       2       4      12                                                                   [S112194, S112019 | S112002, S112008, S112169, S112183]
LogisticRegression      30       5       3      13                                                 [S112194, S112019, S112012, S112105, S112070 | S112002, S112008, S112183]
               SVM      31       4       4      12                                                 [S112194, S112019, S112105, S112104 | S112002, S112008, S112169, S112183]
               MLP      35       0      13       3 [- | S112002, S112003, S112074, S112192, S112008, S112039, S112055, S112087, S112036, S112169, S112183, S112029, S112075]
      RandomForest      31       4       5      11                                        [S112194, S112019, S112012, S112105 | S112003, S112008, S112115, S112169, S112029]
           XGBoost      30       5       5      11                               [S112194, S112019, S112012, S112105, S112119 | S112008, S112115, S112036, S112169, S112029]
      DecisionTree      31       4       8       8             [S112194, S112019, S112012, S112105 | S112002, S112008, S112115, S112039, S112036, S112169, S112183, S112029]

[SKIP] LOSO is disabled (RUN_LOSO=False)
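
The FP_FN_IDS column makes it possible to see which subjects are misclassified regardless of model. The sketch below assumes the cell format is `[FP IDs | FN IDs]`, which is inferred from the column name and from the ID counts matching FP_sum and FN_sum. The helper `parse_fp_fn_ids` is hypothetical. The four cells are copied from the 3-feature run above (the four models with the highest AUC).

```python
from collections import Counter

def parse_fp_fn_ids(cell):
    """Split a '[fp1, fp2 | fn1, fn2]' cell into (FP IDs, FN IDs)."""
    fp_part, fn_part = cell.strip().strip("[]").split("|")

    def to_ids(part):
        # "-" marks an empty side, as in the MLP row where FP_sum is 0.
        return [s.strip() for s in part.split(",") if s.strip() not in ("", "-")]

    return to_ids(fp_part), to_ids(fn_part)

# Cells copied verbatim from the log (3-feature run).
rows = {
    "KNN": "[S112019 | S112002, S112008, S112169, S112183, S112029]",
    "NaiveBayes": "[S112194, S112019 | S112002, S112008, S112169, S112183]",
    "LogisticRegression": "[S112194, S112019, S112012, S112105, S112070 | S112002, S112008, S112183]",
    "SVM": "[S112194, S112019, S112105, S112104 | S112002, S112008, S112169, S112183]",
}

fp_counts, fn_counts = Counter(), Counter()
for cell in rows.values():
    fps, fns = parse_fp_fn_ids(cell)
    fp_counts.update(fps)
    fn_counts.update(fns)

# Subjects that every one of the listed models gets wrong.
n_models = len(rows)
always_fp = sorted(i for i, c in fp_counts.items() if c == n_models)
always_fn = sorted(i for i, c in fn_counts.items() if c == n_models)
print("FP in every model:", always_fp)
print("FN in every model:", always_fn)
```

For these four rows, S112019 is a false positive in every model, and S112002, S112008, and S112183 are false negatives in every model, so those subjects are worth inspecting individually.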

[Data] source=isi_raw_data  target=3TP  rows=51  features=4
[Features] columns used (first 15): ['BDI_T1', 'CD_RISC_T1', 'DBAS', 'HRV_VLF']
[CV] Stratified 5-fold, seed=42  |  class_weight=balanced
[Aggregation] K-fold = weighted | LOSO = weighted

[Leakage check] Class balance
     count  percent%
3TP                 
0       35      68.6
1       16      31.4

[Leakage check] No columns found with |r| ≥ 0.95 against the target.

=== Basic ML Benchmark (Stratified 5-fold CV) ===
             model   AUC  F1_pos(=1)  Prec_pos  Rec_pos  F1_neg(=0)  Prec_neg  Rec_neg   MCC  Accuracy  Pred1_mean  Pred0_mean
        NaiveBayes 0.920       0.800     0.857    0.750       0.917     0.892    0.943 0.720     0.882       2.800       7.400
               KNN 0.901       0.692     0.900    0.562       0.895     0.829    0.971 0.624     0.843       2.000       8.200
               SVM 0.896       0.788     0.765    0.812       0.899     0.912    0.886 0.687     0.863       3.400       6.800
LogisticRegression 0.891       0.812     0.812    0.812       0.914     0.914    0.914 0.727     0.882       3.200       7.000
      RandomForest 0.880       0.621     0.692    0.562       0.849     0.816    0.886 0.477     0.784       2.600       7.600
           XGBoost 0.812       0.667     0.714    0.625       0.861     0.838    0.886 0.531     0.804       2.800       7.400
      DecisionTree 0.633       0.483     0.538    0.438       0.795     0.763    0.829 0.283     0.706       2.600       7.600
               MLP 0.600       0.452     0.467    0.438       0.761     0.750    0.771 0.213     0.667       3.000       7.200

--- Aggregated Confusion Matrix Sums (across all folds' test parts) ---
             model  TN_sum  FP_sum  FN_sum  TP_sum                                                                                                                                                  FP_FN_IDS
        NaiveBayes      33       2       4      12                                                                                                    [S112194, S112019 | S112002, S112008, S112169, S112183]
               KNN      34       1       7       9                                                                                  [S112019 | S112002, S112003, S112008, S112086, S112169, S112183, S112029]
               SVM      31       4       3      13                                                                                           [S112016, S112019, S112158, S112104 | S112002, S112008, S112169]
LogisticRegression      32       3       3      13                                                                                                    [S112019, S112105, S112070 | S112002, S112008, S112169]
      RandomForest      31       4       7       9                                                       [S112194, S112019, S112012, S112105 | S112002, S112003, S112008, S112115, S112036, S112169, S112183]
           XGBoost      31       4       6      10                                                                [S112194, S112019, S112012, S112105 | S112002, S112003, S112008, S112115, S112036, S112169]
      DecisionTree      29       6       9       7                   [S112194, S112019, S112012, S112105, S112070, S112186 | S112002, S112003, S112008, S112115, S112039, S112036, S112169, S112183, S112029]
               MLP      27       8       9       7 [S112079, S112082, S112173, S112239, S112047, S112171, S112042, S112119 | S112002, S112003, S112192, S112008, S112036, S112169, S112183, S112023, S112029]

[SKIP] LOSO is disabled (RUN_LOSO=False)
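
Each run reports a leakage check that flags any column with |r| ≥ 0.95 against the target. The sketch below shows one way such a check could work, assuming r is the Pearson correlation; the actual implementation in ml_benchmark_modular.py is not shown in these notes. The function names and the toy values are hypothetical and are not taken from isi_raw_data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear signal
    return sxy / math.sqrt(sxx * syy)

def find_leaky_columns(columns, target, threshold=0.95):
    """Return {column name: r} for columns with |r| >= threshold."""
    flagged = {}
    for name, values in columns.items():
        r = pearson_r(values, target)
        if abs(r) >= threshold:
            flagged[name] = r
    return flagged

# Toy data for illustration only (made up, not from the study).
target = [0, 0, 0, 1, 1, 0, 1, 0]
toy_columns = {
    "score_a": [12, 7, 15, 20, 9, 11, 18, 14],
    "copy_of_target": [0.1, 0.0, 0.1, 0.9, 1.0, 0.0, 1.0, 0.1],
}

flagged = find_leaky_columns(toy_columns, target)
print(flagged if flagged else "No columns with |r| >= 0.95")
```

In the toy data only `copy_of_target` is flagged, because it is nearly a restatement of the label. A correlation threshold of this kind only catches near-duplicates of the target; it does not detect leakage that comes from preprocessing on the full dataset before splitting.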

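⭐ AUC improvement analysis (best-model AUC for each feature set)

   Baseline (BDI+BAI): 0.878

   Add HRV: 0.857 (-0.021, -2.4%) ❌ AUC drops

   Add CD-RISC+DBAS (BDI+BAI+CD-RISC+DBAS): 0.921, NaiveBayes (+0.043, +4.9%) ✅ clear improvement

   Remove BAI (BDI+CD-RISC+DBAS): 0.925, KNN (+0.047, +5.4%) ✅ best result

   CD-RISC+DBAS+HRV (BDI+CD-RISC+DBAS+HRV_VLF): 0.920, NaiveBayes (+0.042, +4.8%) ✅ slightly lower, still above baseline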
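
The deltas in the list above are the absolute change in AUC and the change relative to the baseline AUC of 0.878. A small sketch that reproduces the arithmetic, using only the AUC values listed; the names `BASELINE_AUC`, `best_auc`, and `auc_change` are hypothetical.

```python
BASELINE_AUC = 0.878  # BDI+BAI baseline from the notes

# Best-model AUC per feature set, as listed in the notes.
best_auc = {
    "Add HRV": 0.857,
    "Add CD-RISC+DBAS": 0.921,
    "Remove BAI": 0.925,
    "CD-RISC+DBAS+HRV": 0.920,
}

def auc_change(auc, baseline=BASELINE_AUC):
    """Return (absolute change, percent change relative to baseline)."""
    delta = auc - baseline
    return round(delta, 3), round(100 * delta / baseline, 1)

for label, auc in best_auc.items():
    delta, pct = auc_change(auc)
    print(f"{label}: {auc:.3f} ({delta:+.3f}, {pct:+.1f}%)")
```

The percentages are relative to the baseline AUC, not percentage points. With 51 subjects and a single seed, differences of a few thousandths between feature sets (0.921 vs 0.925 vs 0.920) are well within what a different fold split could produce.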
