BDI_T1, BAI_T1
/Users/yuchi/PycharmProjects/PsyMl_ISI/.venv/bin/python /Users/yuchi/PycharmProjects/PsyMl_ISI/ML/ml_benchmark_modular.py
[Filter] ID exclusions: 8 rows removed.
============================================================
[BASE] source=isi_raw_data_recalc target=3TP rows=54 features=2
[BASE] columns used: ['BDI_T1', 'BAI_T1']
[CV] Stratified 10-fold, seed=42 | class_weight=balanced
[Aggregation] K-fold = weighted | LOSO = weighted
[Preprocessing] HRV log transform=False | outlier handling=iqr
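The `[CV]` line reports stratified 10-fold CV with seed 42 and `class_weight=balanced`. The sketch below illustrates both settings and assumes the script follows scikit-learn's `"balanced"` rule (`n_samples / (n_classes * count)`). The round-robin fold assignment is a simplified illustration of stratification, not scikit-learn's exact `StratifiedKFold` algorithm, so its folds will not match the script's.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """scikit-learn's 'balanced' heuristic: n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def stratified_folds(labels, n_splits=10, seed=42):
    """Simplified stratification (illustrative only): shuffle each class,
    then deal its indices round-robin so every fold keeps the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    offset = 0
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(offset + j) % n_splits].append(i)
        offset += len(idx)  # continue dealing where the previous class stopped
    return folds


# Class counts taken from the log: 35 in class 0, 19 in class 1.
y = [0] * 35 + [1] * 19
weights = balanced_class_weights(y)
print({c: round(w, 3) for c, w in weights.items()})  # {0: 0.771, 1: 1.421}

folds = stratified_folds(y)
for f in folds:
    print(len(f), sum(y[i] for i in f))  # fold size, class-1 cases in fold
```

With 35 and 19 cases, each fold holds only 5 or 6 participants, and 1 or 2 of them are class 1.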
[Leakage check] Class balance
     count  percent%
3TP
0       35      64.8
1       19      35.2
[Leakage check] No column has |r| ≥ 0.95 with the target.
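The leakage check prints the class balance and screens for any column with |r| ≥ 0.95 against the target. The sketch below shows such a screen and assumes plain Pearson correlation against the 0/1 label, because the log does not name the correlation it uses. The toy columns are invented, and `leakage_screen` is a hypothetical helper, not the script's own function.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def class_balance(target):
    """count and percent% per class, as in the 'Class balance' table."""
    n = len(target)
    return {c: (target.count(c), round(100 * target.count(c) / n, 1))
            for c in sorted(set(target))}


def leakage_screen(features, target, threshold=0.95):
    """Columns whose |r| with the target is >= threshold (possible leakage)."""
    flagged = {}
    for name, values in features.items():
        r = pearson_r(values, target)
        if abs(r) >= threshold:
            flagged[name] = r
    return flagged


# Class counts from the log: 35 in class 0, 19 in class 1.
y = [0] * 35 + [1] * 19
print(class_balance(y))  # {0: (35, 64.8), 1: (19, 35.2)}

# Hypothetical toy columns: one correlated feature, one that copies the label.
target = [0, 0, 0, 1, 1, 0, 1, 0]
features = {
    "toy_score": [3, 5, 4, 9, 8, 2, 7, 6],       # r ≈ 0.85, below threshold
    "toy_label_copy": [0, 0, 0, 1, 1, 0, 1, 0],  # r = 1.0, flagged
}
print(leakage_screen(features, target))
```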
=== [BASE] ML Benchmark (Stratified 10-fold CV) ===
model                 AUC  AUC_overall  F1_pos(=1)  Prec_pos  Rec_pos  F1_neg(=0)  Prec_neg  Rec_neg    MCC  Accuracy  Pred1_mean  Pred0_mean
NaiveBayes          0.913        0.913       0.743     0.812    0.684       0.877     0.842    0.914  0.626     0.833       1.600       3.800
LogisticRegression  0.905        0.905       0.757     0.778    0.737       0.873     0.861    0.886  0.631     0.833       1.800       3.600
XGBoost             0.868        0.868       0.718     0.700    0.737       0.841     0.853    0.829  0.559     0.796       2.000       3.400
SVM                 0.863        0.863       0.762     0.696    0.842       0.848     0.903    0.800  0.620     0.815       2.300       3.100
MLP                 0.860        0.860       0.571     0.625    0.526       0.795     0.763    0.829  0.371     0.722       1.600       3.800
KNN                 0.838        0.838       0.632     0.632    0.632       0.800     0.800    0.800  0.432     0.741       1.900       3.500
RandomForest        0.835        0.835       0.571     0.625    0.526       0.795     0.763    0.829  0.371     0.722       1.600       3.800
DecisionTree        0.651        0.651       0.529     0.600    0.474       0.784     0.744    0.829  0.322     0.704       1.500       3.900
--- [BASE] Aggregated Confusion Matrix ---
model               TN_sum  FP_sum  FN_sum  TP_sum  FP_FN_IDS
NaiveBayes              32       3       6      13  [S112271, S112019, S112240 | S112002, S112169, S112222, S112036, S112183, S112257]
LogisticRegression      31       4       5      14  [S112271, S112201, S112019, S112240 | S112169, S112222, S112036, S112183, S112257]
XGBoost                 29       6       5      14  [S112012, S112271, S112159, S112042, S112019, S112240 | S112029, S112222, S112036, S112183, S112039]
SVM                     28       7       3      16  [S112271, S112159, S112119, S112184, S112201, S112019, S112240 | S112169, S112036, S112183]
MLP                     29       6       9      10  [S112012, S112271, S112159, S112184, S112019, S112240 | S112029, S112169, S112003, S112209, S112036, S112183, S112023, S112086, S112039]
KNN                     28       7       7      12  [S112012, S112271, S112184, S112042, S112201, S112019, S112240 | S112002, S112169, S112222, S112036, S112183, S112257, S112039]
RandomForest            29       6       9      10  [S112012, S112271, S112159, S112042, S112019, S112240 | S112002, S112029, S112169, S112222, S112036, S112183, S112023, S112086, S112039]
DecisionTree            29       6      10       9  [S112012, S112271, S112159, S112042, S112019, S112240 | S112029, S112055, S112169, S112003, S112222, S112036, S112183, S112023, S112086, S112039]
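Every threshold-based column in the benchmark table can be recomputed from the aggregated confusion matrix. The NaiveBayes row (TN=32, FP=3, FN=6, TP=13) reproduces F1_pos 0.743, Prec_pos 0.812, Rec_pos 0.684, MCC 0.626 and Accuracy 0.833. Pred1_mean and Pred0_mean also match (TP+FP)/10 and (TN+FN)/10, which suggests they are the average numbers of predicted positives and negatives per fold. That reading is inferred from the numbers, not taken from the script. AUC depends on predicted probabilities, so it cannot be recovered from these counts. `metrics_from_counts` is a hypothetical helper.

```python
import math


def metrics_from_counts(tn, fp, fn, tp, n_folds=10):
    """Threshold metrics from a pooled (summed-over-folds) confusion matrix."""
    prec_pos = tp / (tp + fp)
    rec_pos = tp / (tp + fn)
    prec_neg = tn / (tn + fn)
    rec_neg = tn / (tn + fp)
    f1_pos = 2 * tp / (2 * tp + fp + fn)
    f1_neg = 2 * tn / (2 * tn + fn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "F1_pos": f1_pos, "Prec_pos": prec_pos, "Rec_pos": rec_pos,
        "F1_neg": f1_neg, "Prec_neg": prec_neg, "Rec_neg": rec_neg,
        "MCC": mcc,
        "Accuracy": (tp + tn) / (tp + tn + fp + fn),
        # Inferred meaning: average count of predicted 1s / 0s per fold.
        "Pred1_mean": (tp + fp) / n_folds,
        "Pred0_mean": (tn + fn) / n_folds,
    }


# NaiveBayes row of the [BASE] aggregated confusion matrix.
for name, value in metrics_from_counts(tn=32, fp=3, fn=6, tp=13).items():
    print(f"{name:<10} {value:.3f}")
```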
[Filter] ID exclusions: 8 rows removed.
BDI_T1, BAI_T1, EEG_PWR_REL_BETA1_BRAIN_AVG
============================================================
[BASE + ADDED] source=isi_raw_data_recalc target=3TP rows=54 features=3
[BASE + ADDED] columns used: ['BDI_T1', 'BAI_T1', 'EEG_PWR_REL_BETA1_BRAIN_AVG']
[CV] Stratified 10-fold, seed=42 | class_weight=balanced
[Aggregation] K-fold = weighted | LOSO = weighted
[Preprocessing] HRV log transform=False | outlier handling=iqr
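The `iqr` outlier setting does not say whether outliers are clipped or dropped, or which fence multiplier is used. The sketch below assumes Tukey fences (Q1 − 1.5·IQR, Q3 + 1.5·IQR) with clipping. It also fits the fences on the training fold only, so the held-out fold cannot leak into them. The helper names and values are hypothetical.

```python
import statistics


def fit_iqr_fences(train_values, k=1.5):
    """Tukey fences from training-fold values only (k=1.5 is an assumed default)."""
    q1, _, q3 = statistics.quantiles(train_values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip(values, low, high):
    """Winsorize: pull values outside [low, high] back to the nearest fence."""
    return [min(max(v, low), high) for v in values]


# Hypothetical values of one feature in a training fold and its test fold.
train = [0.10, 0.12, 0.11, 0.13, 0.12, 0.14, 0.11, 0.45]
test = [0.12, 0.60, 0.05]

low, high = fit_iqr_fences(train)       # fit on the training fold only
train_clipped = clip(train, low, high)  # 0.45 is pulled down to the upper fence
test_clipped = clip(test, low, high)    # same fences reused on held-out data
print(round(low, 5), round(high, 5))
print(test_clipped)
```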
[Leakage check] Class balance
     count  percent%
3TP
0       35    64.800
1       19    35.200
[Leakage check] No column has |r| ≥ 0.95 with the target.
=== [BASE + ADDED] ML Benchmark (Stratified 10-fold CV) ===
model                 AUC  AUC_overall  F1_pos(=1)  Prec_pos  Rec_pos  F1_neg(=0)  Prec_neg  Rec_neg    MCC  Accuracy  Pred1_mean  Pred0_mean
NaiveBayes          0.931        0.931       0.743     0.812    0.684       0.877     0.842    0.914  0.626     0.833       1.600       3.800
SVM                 0.923        0.923       0.829     0.773    0.895       0.896     0.938    0.857  0.731     0.870       2.200       3.200
MLP                 0.910        0.910       0.789     0.789    0.789       0.886     0.886    0.886  0.675     0.852       1.900       3.500
LogisticRegression  0.890        0.890       0.703     0.722    0.684       0.845     0.833    0.857  0.548     0.796       1.800       3.600
KNN                 0.883        0.883       0.789     0.789    0.789       0.886     0.886    0.886  0.675     0.852       1.900       3.500
RandomForest        0.862        0.862       0.757     0.778    0.737       0.873     0.861    0.886  0.631     0.833       1.800       3.600
XGBoost             0.838        0.838       0.700     0.667    0.737       0.824     0.848    0.800  0.526     0.778       2.100       3.300
DecisionTree        0.742        0.742       0.667     0.650    0.684       0.812     0.824    0.800  0.479     0.759       2.000       3.400
--- [BASE + ADDED] Aggregated Confusion Matrix ---
model               TN_sum  FP_sum  FN_sum  TP_sum  FP_FN_IDS
NaiveBayes              32       3       6      13  [S112271, S112019, S112240 | S112002, S112169, S112222, S112036, S112183, S112257]
SVM                     30       5       2      17  [S112271, S112159, S112119, S112042, S112019 | S112002, S112183]
MLP                     31       4       4      15  [S112271, S112159, S112019, S112240 | S112002, S112036, S112183, S112214]
LogisticRegression      30       5       6      13  [S112271, S112119, S112201, S112019, S112240 | S112002, S112169, S112222, S112036, S112183, S112257]
KNN                     31       4       4      15  [S112271, S112159, S112019, S112240 | S112002, S112029, S112036, S112183]
RandomForest            31       4       5      14  [S112271, S112159, S112019, S112240 | S112002, S112029, S112222, S112036, S112183]
XGBoost                 28       7       5      14  [S112012, S112271, S112159, S112184, S112201, S112019, S112240 | S112002, S112029, S112222, S112036, S112183]
DecisionTree            28       7       6      13  [S112271, S112159, S112119, S112070, S112042, S112019, S112240 | S112002, S112029, S112209, S112222, S112036, S112183]
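The two benchmark tables above are sorted differently, so pairing the AUC values by model name makes it easier to see what adding EEG_PWR_REL_BETA1_BRAIN_AVG (beta1) changes. The values below are copied from those tables. The comparison is descriptive only: it covers a single CV seed and is not a significance test.

```python
# AUC per model, copied from the [BASE] and [BASE + ADDED] tables.
auc_base = {
    "NaiveBayes": 0.913, "LogisticRegression": 0.905, "XGBoost": 0.868,
    "SVM": 0.863, "MLP": 0.860, "KNN": 0.838, "RandomForest": 0.835,
    "DecisionTree": 0.651,
}
auc_added = {
    "NaiveBayes": 0.931, "SVM": 0.923, "MLP": 0.910,
    "LogisticRegression": 0.890, "KNN": 0.883, "RandomForest": 0.862,
    "XGBoost": 0.838, "DecisionTree": 0.742,
}

# Pair by model name and list from largest gain to largest loss.
delta = {m: round(auc_added[m] - auc_base[m], 3) for m in auc_base}
for model, d in sorted(delta.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:<18} {d:+.3f}")

improved = [m for m, d in delta.items() if d > 0]
print(f"{len(improved)} of {len(delta)} models gain AUC when beta1 is added")
```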
BDI_T1, BAI_T1, EEG_PWR_REL_BETA1_BRAIN_AVG, EEG_PWR_REL_BETA1_BRAIN_AVG²
============================================================
[BASE + ADDED + ADDED²] source=isi_raw_data_recalc target=3TP rows=54 features=3
[BASE + ADDED + ADDED²] columns used: ['BDI_T1', 'BAI_T1', 'EEG_PWR_REL_BETA1_BRAIN_AVG']
[BASE + ADDED + ADDED²] polynomial features (generated inside the Pipeline): ['EEG_PWR_REL_BETA1_BRAIN_AVG²']
[CV] Stratified 10-fold, seed=42 | class_weight=balanced
[Aggregation] K-fold = weighted | LOSO = weighted
[Preprocessing] HRV log transform=False | outlier handling=iqr
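This run's header counts 3 input columns, while the feature list printed before it shows four names. The polynomial-feature line explains the gap: `EEG_PWR_REL_BETA1_BRAIN_AVG²` is derived inside the Pipeline instead of being passed in. The sketch below illustrates that idea. It assumes the column is standardized with training-fold statistics before squaring, which the log does not confirm, since it does not say whether the raw or the scaled value is squared. `fit_transform_fold` and the values are hypothetical.

```python
import statistics


def fit_transform_fold(train_rows, test_rows, col):
    """Standardize `col` with training-fold statistics, then add its square.

    Fitting mean/SD on the training fold only mirrors what a scikit-learn
    Pipeline does inside each CV split. The population SD matches
    StandardScaler; squaring the *standardized* value is an assumption.
    """
    vals = [r[col] for r in train_rows]
    mu, sd = statistics.mean(vals), statistics.pstdev(vals)

    def transform(rows):
        out = []
        for r in rows:
            z = (r[col] - mu) / sd
            out.append({**r, col: z, col + "²": z * z})  # new dicts, inputs untouched
        return out

    return transform(train_rows), transform(test_rows)


COL = "EEG_PWR_REL_BETA1_BRAIN_AVG"
# Hypothetical values for illustration only.
train = [{COL: v} for v in (0.10, 0.12, 0.14, 0.16)]
test = [{COL: 0.20}]

train_t, test_t = fit_transform_fold(train, test, COL)
print(len(train_t[0]), round(test_t[0][COL + "²"], 3))
```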
[Leakage check] Class balance
     count  percent%
3TP
0       35    64.800
1       19    35.200
[Leakage check] No column has |r| ≥ 0.95 with the target.
=== [BASE + ADDED + ADDED²] ML Benchmark (Stratified 10-fold CV) ===
model                 AUC  AUC_overall  F1_pos(=1)  Prec_pos  Rec_pos  F1_neg(=0)  Prec_neg  Rec_neg    MCC  Accuracy  Pred1_mean  Pred0_mean
NaiveBayes          0.944        0.944       0.800     0.875    0.737       0.904     0.868    0.943  0.711     0.870       1.600       3.800
LogisticRegression  0.932        0.932       0.757     0.778    0.737       0.873     0.861    0.886  0.631     0.833       1.800       3.600
MLP                 0.926        0.926       0.737     0.737    0.737       0.857     0.857    0.857  0.594     0.815       1.900       3.500
SVM                 0.917        0.917       0.810     0.739    0.895       0.879     0.935    0.829  0.699     0.852       2.300       3.100
KNN                 0.903        0.903       0.811     0.833    0.789       0.901     0.889    0.914  0.713     0.870       1.800       3.600
RandomForest        0.891        0.891       0.757     0.778    0.737       0.873     0.861    0.886  0.631     0.833       1.800       3.600
XGBoost             0.854        0.854       0.737     0.737    0.737       0.857     0.857    0.857  0.594     0.815       1.900       3.500
DecisionTree        0.771        0.771       0.703     0.722    0.684       0.845     0.833    0.857  0.548     0.796       1.800       3.600
--- [BASE + ADDED + ADDED²] Aggregated Confusion Matrix ---
model               TN_sum  FP_sum  FN_sum  TP_sum  FP_FN_IDS
NaiveBayes              33       2       5      14  [S112271, S112019 | S112002, S112169, S112036, S112183, S112257]
LogisticRegression      31       4       5      14  [S112271, S112159, S112019, S112240 | S112002, S112169, S112222, S112036, S112183]
MLP                     30       5       5      14  [S112271, S112159, S112215, S112073, S112019 | S112002, S112036, S112183, S112086, S112258]
SVM                     29       6       2      17  [S112271, S112159, S112119, S112042, S112073, S112019 | S112002, S112183]
KNN                     32       3       4      15  [S112271, S112159, S112019 | S112002, S112029, S112036, S112183]
RandomForest            31       4       5      14  [S112271, S112159, S112019, S112240 | S112002, S112029, S112222, S112036, S112183]
XGBoost                 30       5       5      14  [S112012, S112271, S112159, S112019, S112240 | S112002, S112029, S112222, S112036, S112183]
DecisionTree            30       5       6      13  [S112271, S112070, S112042, S112019, S112240 | S112002, S112029, S112055, S112222, S112036, S112183]
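In the `FP_FN_IDS` column, the IDs before `|` appear to be false positives and the IDs after it false negatives. In every row of the table above, the counts on each side match FP_sum and FN_sum. A small parser can then list the participants that every model misclassifies. The cells below are copied from the [BASE + ADDED + ADDED²] table, and `parse_ids` is a hypothetical helper.

```python
from collections import Counter

# FP_FN_IDS cells copied from the [BASE + ADDED + ADDED²] confusion-matrix table.
rows = {
    "NaiveBayes": "[S112271, S112019 | S112002, S112169, S112036, S112183, S112257]",
    "LogisticRegression": "[S112271, S112159, S112019, S112240 | S112002, S112169, S112222, S112036, S112183]",
    "MLP": "[S112271, S112159, S112215, S112073, S112019 | S112002, S112036, S112183, S112086, S112258]",
    "SVM": "[S112271, S112159, S112119, S112042, S112073, S112019 | S112002, S112183]",
    "KNN": "[S112271, S112159, S112019 | S112002, S112029, S112036, S112183]",
    "RandomForest": "[S112271, S112159, S112019, S112240 | S112002, S112029, S112222, S112036, S112183]",
    "XGBoost": "[S112012, S112271, S112159, S112019, S112240 | S112002, S112029, S112222, S112036, S112183]",
    "DecisionTree": "[S112271, S112070, S112042, S112019, S112240 | S112002, S112029, S112055, S112222, S112036, S112183]",
}


def parse_ids(cell):
    """Split '[fp1, fp2 | fn1, fn2]' into (false positives, false negatives)."""
    fp_part, fn_part = cell.strip("[]").split("|")

    def split_ids(part):
        return [x.strip() for x in part.split(",") if x.strip()]

    return split_ids(fp_part), split_ids(fn_part)


fp_counts, fn_counts = Counter(), Counter()
for cell in rows.values():
    fps, fns = parse_ids(cell)
    fp_counts.update(fps)
    fn_counts.update(fns)

n_models = len(rows)
always_fp = sorted(i for i, c in fp_counts.items() if c == n_models)
always_fn = sorted(i for i, c in fn_counts.items() if c == n_models)
print("False positive for every model:", always_fp)
print("False negative for every model:", always_fn)
```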

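- Model A (blue): BAI_T1 + BDI_T1 + beta1 (beta1 = EEG_PWR_REL_BETA1_BRAIN_AVG)
- Model B (red): BAI_T1 + BDI_T1 + beta1 + beta1² 👉 the best-performing model in this run
- Model C (orange): BAI_T1 + BDI_T1 + beta1 + BAI_T1 × beta1 + BDI_T1 × beta1 👉 tests whether BAI_T1 and BDI_T1 interact (linearly) with beta1
- Model D (purple): BAI_T1 + BDI_T1 + beta1 + BAI_T1 × beta1 + BDI_T1 × beta1 + beta1² 👉 all terms combined (the most complete model, but also the most prone to overfitting)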
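The four specifications differ only in which derived terms they add to BAI_T1, BDI_T1 and beta1. The sketch below builds each model's feature set from one participant's record. It assumes beta1 is EEG_PWR_REL_BETA1_BRAIN_AVG and that the interactions are plain products, because the notes do not say whether any centering or scaling happens before multiplying. `build_features` and `MODEL_TERMS` are hypothetical names.

```python
BETA1 = "EEG_PWR_REL_BETA1_BRAIN_AVG"

# Derived terms added on top of the base columns for each model.
MODEL_TERMS = {
    "A": [],
    "B": ["beta1²"],
    "C": ["BAI_T1×beta1", "BDI_T1×beta1"],
    "D": ["BAI_T1×beta1", "BDI_T1×beta1", "beta1²"],
}


def build_features(row, model):
    """Return the feature dict for Model A, B, C or D from one record."""
    bai, bdi, beta1 = row["BAI_T1"], row["BDI_T1"], row[BETA1]
    derived = {
        "beta1²": beta1 ** 2,
        "BAI_T1×beta1": bai * beta1,
        "BDI_T1×beta1": bdi * beta1,
    }
    features = {"BAI_T1": bai, "BDI_T1": bdi, "beta1": beta1}
    for term in MODEL_TERMS[model]:
        features[term] = derived[term]
    return features


# Hypothetical record for illustration only.
record = {"BAI_T1": 12, "BDI_T1": 20, BETA1: 0.5}
for m in "ABCD":
    print(m, build_features(record, m))
```

Model D carries six predictors, twice as many as Model A, while the sample has only 19 cases in the smaller class. This is why the notes flag it as the most prone to overfitting.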

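- Left panel:
  - Model A insomnia probability (x-axis) vs. Model B insomnia probability (y-axis)
  - Above the diagonal: Model B pushes the sample toward insomnia relative to Model A
  - Below the diagonal: Model B pushes the sample toward non-insomnia relative to Model A
- Right panel:
  - How each group's probability shifts after adding beta1²
  - Above 0: the score rises once the squared term is added (more insomnia-like)
  - Below 0: the score falls once the squared term is added (more healthy-like)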
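Both panels can be summarized numerically from each participant's out-of-fold probabilities under the two models. The sketch below assumes `p_a` and `p_b` hold the predicted insomnia probabilities from Model A and Model B, and that the right panel plots Δ = p_B − p_A split by true label, with 1 = insomnia. The arrays are invented toy values.

```python
def compare_models(p_a, p_b, y):
    """Summarize a p_A-vs-p_B scatter (left) and the shift Δ = p_B - p_A (right)."""
    above = sum(b > a for a, b in zip(p_a, p_b))  # Model B more insomnia-leaning
    below = sum(b < a for a, b in zip(p_a, p_b))  # Model B more healthy-leaning
    deltas = {0: [], 1: []}
    for a, b, label in zip(p_a, p_b, y):
        deltas[label].append(b - a)
    mean_shift = {label: sum(d) / len(d) for label, d in deltas.items() if d}
    return above, below, mean_shift


# Toy values for illustration only (1 = insomnia, 0 = non-insomnia).
p_a = [0.70, 0.55, 0.30, 0.40, 0.20]
p_b = [0.80, 0.60, 0.25, 0.35, 0.22]
y = [1, 1, 0, 0, 0]

above, below, mean_shift = compare_models(p_a, p_b, y)
print(above, below, {k: round(v, 3) for k, v in mean_shift.items()})
```

If the squared term helps, the insomnia group's mean shift should be positive and the non-insomnia group's mean shift negative.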

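- Samples are grouped into bins by z-score:
  - Black line: the observed insomnia rate in each bin
  - Blue line: Model A's mean predicted probability in each bin
  - Red line: Model B's mean predicted probability in each bin 👉 in bins far from the mean (several SDs out), its predictions are closer to the observed rate than Model A's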
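The black, blue and red lines together form a binned, calibration-style comparison. The sketch below assumes one-SD bins on a z-scored variable, because the notes name neither the variable nor the bin edges. It uses 1 = insomnia, and `binned_rates` and the toy data are hypothetical. Comparing |observed − mean prediction| per bin is one way to check whether Model B is closer to the observed rate in the outer bins.

```python
import math
import statistics


def binned_rates(x, y, p_a, p_b):
    """Per 1-SD bin of x: (bin, n, observed rate, mean p_A, mean p_B)."""
    mu, sd = statistics.mean(x), statistics.pstdev(x)
    bins = {}
    for xi, yi, a, b in zip(x, y, p_a, p_b):
        k = math.floor((xi - mu) / sd)  # bin k covers z in [k, k + 1)
        bins.setdefault(k, []).append((yi, a, b))
    table = []
    for k in sorted(bins):
        ys, pas, pbs = zip(*bins[k])
        n = len(ys)
        table.append((k, n, sum(ys) / n, sum(pas) / n, sum(pbs) / n))
    return table


# Hypothetical toy data (1 = insomnia).
x = [-3, -1, -1, 1, 1, 3]
y = [0, 0, 1, 1, 0, 1]
p_a = [0.2, 0.3, 0.4, 0.5, 0.5, 0.6]
p_b = [0.1, 0.3, 0.5, 0.6, 0.4, 0.8]

for k, n, obs, mean_a, mean_b in binned_rates(x, y, p_a, p_b):
    # Smaller |observed - predicted| means better calibration in that bin.
    print(f"z in [{k}, {k + 1}): n={n} observed={obs:.2f} "
          f"|err A|={abs(obs - mean_a):.2f} |err B|={abs(obs - mean_b):.2f}")
```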

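- Change in insomnia probability after adding beta1², for samples in each standard-deviation bin:
  - Insomnia: should move above 0.0
  - Non-insomnia: should move below 0.0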
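The expected directions (insomnia above 0.0, non-insomnia below it) can be counted per SD bin. The sketch below uses the same assumptions as before: one-SD bins on an unspecified z-scored variable, Δ = p_B − p_A, and 1 = insomnia. `shift_by_bin` and the toy values are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def shift_by_bin(x, y, p_a, p_b):
    """Per (SD bin, class): mean Δ = p_B - p_A and the share moving the expected way."""
    mu, sd = statistics.mean(x), statistics.pstdev(x)
    groups = defaultdict(list)
    for xi, yi, a, b in zip(x, y, p_a, p_b):
        groups[(math.floor((xi - mu) / sd), yi)].append(b - a)
    summary = {}
    for (k, label), deltas in sorted(groups.items()):
        # Insomnia (1) should move up (Δ > 0); non-insomnia (0) should move down.
        expected_way = sum((d > 0) if label == 1 else (d < 0) for d in deltas)
        summary[(k, label)] = (sum(deltas) / len(deltas), expected_way / len(deltas))
    return summary


# Hypothetical toy data (1 = insomnia).
x = [-3, -1, -1, 1, 1, 3]
y = [0, 0, 1, 1, 0, 1]
p_a = [0.2, 0.3, 0.4, 0.5, 0.5, 0.6]
p_b = [0.1, 0.3, 0.5, 0.6, 0.4, 0.8]

for (k, label), (mean_delta, share) in shift_by_bin(x, y, p_a, p_b).items():
    print(f"bin {k:+d} class {label}: mean Δ={mean_delta:+.2f}, expected direction {share:.0%}")
```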