Original paper framework
Quick comparison (key points)
Fixed conditions: target = 3TP, samples = 51, Repeated Stratified 5-fold x 100 CV, seed = 42
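The repeated stratified splitting named in the fixed conditions can be sketched with the standard library alone. This is an illustration, not the benchmark's own code: the function name `repeated_stratified_kfold` and the round-robin fold assignment are assumptions, and a library implementation would shuffle differently even with the same seed. It does show why every model's confusion-matrix sums in the logs add up to 3500 negatives and 1600 positives: each of the 51 samples is tested once per repeat, over 100 repeats.

```python
import random

def repeated_stratified_kfold(labels, n_splits=5, n_repeats=100, seed=42):
    """Yield (repeat, fold, train_idx, test_idx) for repeated stratified k-fold CV."""
    rng = random.Random(seed)
    # Group sample indices by class so each fold keeps roughly the class ratio.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for members in by_class.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            # Deal the shuffled indices round-robin across the folds.
            for pos, idx in enumerate(shuffled):
                folds[pos % n_splits].append(idx)
        for fold, test_idx in enumerate(folds):
            test_set = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in test_set]
            yield repeat, fold, train_idx, sorted(test_idx)

# Class counts from the target distribution in the logs: 35 negatives, 16 positives.
labels = [0] * 35 + [1] * 16
splits = list(repeated_stratified_kfold(labels))
n_pos_tested = sum(sum(labels[i] for i in test) for _, _, _, test in splits)
n_neg_tested = sum(len(test) for _, _, _, test in splits) - n_pos_tested
print(len(splits), n_neg_tested, n_pos_tested)  # 500 3500 1600
```
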
| Feature set | SMOTE | class_weight | Best AUC | Best F1_pos | Best Rec_pos |
|---|---|---|---|---|---|
| BDI/BAI | OFF | OFF | LogisticRegression 0.904 | NaiveBayes 0.743 | NaiveBayes 0.702 |
| BDI/BAI | ON | Balanced | LogisticRegression 0.904 | NaiveBayes 0.735 | KNN 0.779 |
| BDI/BAI + HRV | OFF | OFF | NaiveBayes 0.885 | NaiveBayes 0.748 | NaiveBayes 0.745 |
| BDI/BAI + HRV | ON | Balanced | NaiveBayes 0.879 | NaiveBayes 0.721 | NaiveBayes 0.748 |
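Note: with SMOTE on, positive-class recall generally rises, but AUC can fall (for example, SVM on BDI/BAI goes from 0.884 to 0.855).
Detailed output (original records preserved)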
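To make the recall/AUC trade-off concrete, here is a minimal sketch of the interpolation step at the heart of SMOTE. It is not the pipeline used for these runs (the logs only show `k_neighbors=5`); the function name `smote_sketch` and the sample scores are hypothetical. Each synthetic point lies on the segment between a minority sample and one of its nearest minority neighbours, so new positives can land in regions that overlap the majority class, which helps recall but can cost precision and ranking quality.

```python
import math
import random

def smote_sketch(minority, n_new, k_neighbors=5, seed=42):
    """Create n_new synthetic minority points by interpolating toward near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k_neighbors])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical (BDI_T1, BAI_T1) scores for the minority class; not study data.
minority = [(22, 18), (25, 30), (31, 27), (19, 24), (28, 21), (35, 33), (24, 26)]
new_points = smote_sketch(minority, n_new=10)
print(len(new_points))
```

In a cross-validated setting the oversampling would normally be fitted on each training fold only, so that synthetic points never leak into the test fold; whether the runs below do this is not stated in the logs.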
BAI_T1, BDI_T1
5 fold + 100CV
- smote = False
- class_weight = False
[Data] target=3TP samples=51 features=2
Columns used: ['BDI_T1', 'BAI_T1']
[SMOTE] OFF (mode=Standard, k_neighbors=5, class_weight_when_smote=True)
[CLASS_WEIGHT] mode=Off
[Target distribution]
- Class 0: 35 (68.6%)
- Class 1: 16 (31.4%)
[Detected] target type: binary; classes=[0 1]
=== Basic ML Benchmark (Standard | Repeated Stratified 5-fold x 100 CV) | seed=42 | SMOTE=OFF | class_weight=OFF ===
model AUC F1_pos(=1) Prec_pos Rec_pos F1_neg(=0) Prec_neg Rec_neg MCC Accuracy Pred1_mean Pred0_mean
LogisticRegression 0.904 0.691 0.841 0.631 0.894 0.858 0.943 0.630 0.845 2.418 7.782
NaiveBayes 0.894 0.743 0.856 0.702 0.901 0.881 0.932 0.680 0.860 2.718 7.482
SVM 0.884 0.660 0.818 0.595 0.883 0.844 0.936 0.590 0.829 2.350 7.850
KNN 0.851 0.626 0.765 0.575 0.866 0.834 0.913 0.538 0.807 2.446 7.754
RandomForest 0.845 0.620 0.700 0.616 0.838 0.839 0.853 0.499 0.779 2.994 7.206
DecisionTree 0.713 0.601 0.669 0.602 0.831 0.833 0.845 0.471 0.768 3.006 7.194
--- Aggregated Confusion Matrix Sums (Standard) ---
model TN_sum FP_sum FN_sum TP_sum
LogisticRegression 3300 200 591 1009
NaiveBayes 3263 237 478 1122
SVM 3275 225 650 950
KNN 3195 305 682 918
RandomForest 2987 513 616 984
DecisionTree 2957 543 640 960
--- Specificity / Sensitivity (Standard) ---
model Specificity Sensitivity
LogisticRegression 0.943 0.631
NaiveBayes 0.932 0.701
SVM 0.936 0.594
KNN 0.913 0.574
RandomForest 0.853 0.615
DecisionTree 0.845 0.600
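The Specificity / Sensitivity table above can be reproduced from the aggregated confusion-matrix sums. The sketch below does that arithmetic; the helper name `pooled_rates` is hypothetical. It also suggests why Sensitivity can differ from Rec_pos in the third decimal (0.701 vs. 0.702 for NaiveBayes): Sensitivity here is pooled over all folds, whereas Rec_pos is presumably averaged per fold.

```python
# Aggregated confusion-matrix sums copied from the log above: (TN, FP, FN, TP).
confusion_sums = {
    "LogisticRegression": (3300, 200, 591, 1009),
    "NaiveBayes": (3263, 237, 478, 1122),
    "SVM": (3275, 225, 650, 950),
    "KNN": (3195, 305, 682, 918),
    "RandomForest": (2987, 513, 616, 984),
    "DecisionTree": (2957, 543, 640, 960),
}

def pooled_rates(tn, fp, fn, tp):
    """Return (specificity, sensitivity) computed from pooled counts."""
    return tn / (tn + fp), tp / (fn + tp)

for model, counts in confusion_sums.items():
    spec, sens = pooled_rates(*counts)
    print(f"{model:<20} {spec:.3f} {sens:.3f}")
```
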
5 fold + 100CV
- smote = False
- class_weight = Balanced
[Data] target=3TP samples=51 features=2
Columns used: ['BDI_T1', 'BAI_T1']
[SMOTE] OFF (mode=Standard, k_neighbors=5, class_weight_when_smote=True)
[CLASS_WEIGHT] mode=Balanced
[Target distribution]
- Class 0: 35 (68.6%)
- Class 1: 16 (31.4%)
[Detected] target type: binary; classes=[0 1]
=== Basic ML Benchmark (Standard | Repeated Stratified 5-fold x 100 CV) | seed=42 | SMOTE=OFF | class_weight=OFF ===
model AUC F1_pos(=1) Prec_pos Rec_pos F1_neg(=0) Prec_neg Rec_neg MCC Accuracy Pred1_mean Pred0_mean
LogisticRegression 0.904 0.691 0.841 0.631 0.894 0.858 0.943 0.630 0.845 2.418 7.782
NaiveBayes 0.894 0.743 0.856 0.702 0.901 0.881 0.932 0.680 0.860 2.718 7.482
SVM 0.884 0.660 0.818 0.595 0.883 0.844 0.936 0.590 0.829 2.350 7.850
KNN 0.851 0.626 0.765 0.575 0.866 0.834 0.913 0.538 0.807 2.446 7.754
RandomForest 0.845 0.620 0.700 0.616 0.838 0.839 0.853 0.499 0.779 2.994 7.206
DecisionTree 0.713 0.601 0.669 0.602 0.831 0.833 0.845 0.471 0.768 3.006 7.194
--- Aggregated Confusion Matrix Sums (Standard) ---
model TN_sum FP_sum FN_sum TP_sum
LogisticRegression 3300 200 591 1009
NaiveBayes 3263 237 478 1122
SVM 3275 225 650 950
KNN 3195 305 682 918
RandomForest 2987 513 616 984
DecisionTree 2957 543 640 960
--- Specificity / Sensitivity (Standard) ---
model Specificity Sensitivity
LogisticRegression 0.943 0.631
NaiveBayes 0.932 0.701
SVM 0.936 0.594
KNN 0.913 0.574
RandomForest 0.853 0.615
DecisionTree 0.845 0.600
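For reference, scikit-learn's `class_weight="balanced"` weights each class by `n_samples / (n_classes * class_count)`. Assuming the pipeline follows that convention (the logs do not say which library is used), the sketch below shows the weights implied by the full-sample class counts; the helper name `balanced_class_weights` is hypothetical, and weights computed on each training fold would differ slightly. Note that the run above reports the same metrics as the unweighted run and its benchmark header still reads `class_weight=OFF`, so it may be worth checking whether the weights were actually applied.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Class counts from the target distribution in the logs: 35 negatives, 16 positives.
labels = [0] * 35 + [1] * 16
weights = balanced_class_weights(labels)
print({cls: round(w, 3) for cls, w in sorted(weights.items())})  # {0: 0.729, 1: 1.594}
```
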
5 fold + 100CV
- smote=True 👉 SMOTE blurs the class boundary (scores get worse: precision and specificity drop)
- class_weight = Balanced
[Data] target=3TP samples=51 features=2
Columns used: ['BDI_T1', 'BAI_T1']
[SMOTE] ON (mode=Standard, k_neighbors=5, class_weight_when_smote=True)
[CLASS_WEIGHT] mode=Balanced
[Target distribution]
- Class 0: 35 (68.6%)
- Class 1: 16 (31.4%)
[Detected] target type: binary; classes=[0 1]
=== Basic ML Benchmark (Standard | Repeated Stratified 5-fold x 100 CV) | seed=42 | SMOTE=ON | class_weight=OFF ===
model AUC F1_pos(=1) Prec_pos Rec_pos F1_neg(=0) Prec_neg Rec_neg MCC Accuracy Pred1_mean Pred0_mean
LogisticRegression 0.904 0.678 0.712 0.700 0.854 0.872 0.851 0.564 0.804 3.278 6.922
NaiveBayes 0.895 0.735 0.807 0.726 0.886 0.887 0.898 0.655 0.845 3.036 7.164
SVM 0.855 0.682 0.667 0.757 0.838 0.894 0.809 0.561 0.791 3.752 6.448
KNN 0.852 0.696 0.670 0.779 0.842 0.902 0.809 0.580 0.798 3.818 6.382
RandomForest 0.845 0.638 0.638 0.701 0.817 0.865 0.794 0.496 0.764 3.674 6.526
DecisionTree 0.703 0.589 0.623 0.619 0.810 0.834 0.807 0.439 0.748 3.320 6.880
--- Aggregated Confusion Matrix Sums (Standard) ---
model TN_sum FP_sum FN_sum TP_sum
LogisticRegression 2980 520 481 1119
NaiveBayes 3144 356 438 1162
SVM 2830 670 394 1206
KNN 2831 669 360 1240
RandomForest 2778 722 485 1115
DecisionTree 2826 674 614 986
--- Specificity / Sensitivity (Standard) ---
model Specificity Sensitivity
LogisticRegression 0.851 0.699
NaiveBayes 0.898 0.726
SVM 0.809 0.754
KNN 0.809 0.775
RandomForest 0.794 0.697
DecisionTree 0.807 0.616
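One way to read the SMOTE effect on the BDI/BAI feature set is to count how many extra true positives each model gains and how many extra false positives it pays for them. The sketch below uses only the TP_sum and FP_sum columns from the SMOTE=OFF and SMOTE=ON logs above; the variable names are illustrative.

```python
# (TP_sum, FP_sum) copied from the BDI/BAI logs above, out of 1600 positives
# and 3500 negatives tested in each run.
smote_off = {
    "LogisticRegression": (1009, 200), "NaiveBayes": (1122, 237),
    "SVM": (950, 225), "KNN": (918, 305),
    "RandomForest": (984, 513), "DecisionTree": (960, 543),
}
smote_on = {
    "LogisticRegression": (1119, 520), "NaiveBayes": (1162, 356),
    "SVM": (1206, 670), "KNN": (1240, 669),
    "RandomForest": (1115, 722), "DecisionTree": (986, 674),
}

# Extra true positives and extra false positives when SMOTE is switched on.
extra = {
    model: (smote_on[model][0] - tp_off, smote_on[model][1] - fp_off)
    for model, (tp_off, fp_off) in smote_off.items()
}
for model, (d_tp, d_fp) in extra.items():
    print(f"{model:<20} TP {d_tp:+d}  FP {d_fp:+d}")
```

Every model gains true positives and every model also adds more false positives than it gains true positives, which is consistent with higher recall alongside lower precision and specificity.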
BDI_T1, BAI_T1, HRV_SDNN_MS, HRV_LF, HRV_LF_HF
5 fold + 100CV
- smote=None 👉 SMOTE blurs the class boundary
- class_weight = Balanced
- AUC drops noticeably (best AUC 0.885 here vs. 0.904 with BDI/BAI only)
[Data] target=3TP samples=51 features=5
Columns used: ['BDI_T1', 'BAI_T1', 'HRV_SDNN_MS', 'HRV_LF', 'HRV_LF_HF']
[SMOTE] OFF (mode=Standard, k_neighbors=5, class_weight_when_smote=False)
[Target distribution]
- Class 0: 35 (68.6%)
- Class 1: 16 (31.4%)
[Detected] target type: binary; classes=[0 1]
=== Basic ML Benchmark (Standard | Repeated Stratified 5-fold x 100 CV) | seed=42 | SMOTE=OFF | class_weight=OFF ===
model AUC F1_pos(=1) Prec_pos Rec_pos F1_neg(=0) Prec_neg Rec_neg MCC Accuracy Pred1_mean Pred0_mean
NaiveBayes 0.885 0.748 0.808 0.745 0.889 0.893 0.898 0.668 0.850 3.096 7.104
LogisticRegression 0.876 0.659 0.769 0.629 0.873 0.853 0.906 0.574 0.819 2.670 7.530
SVM 0.839 0.640 0.761 0.601 0.866 0.842 0.905 0.549 0.809 2.586 7.614
RandomForest 0.822 0.639 0.745 0.616 0.857 0.845 0.884 0.538 0.800 2.780 7.420
KNN 0.790 0.578 0.767 0.503 0.863 0.812 0.932 0.501 0.797 2.082 8.118
DecisionTree 0.681 0.552 0.594 0.570 0.790 0.807 0.792 0.379 0.722 3.274 6.926
--- Aggregated Confusion Matrix Sums (Standard) ---
model TN_sum FP_sum FN_sum TP_sum
NaiveBayes 3143 357 409 1191
LogisticRegression 3170 330 595 1005
SVM 3166 334 641 959
RandomForest 3094 406 616 984
KNN 3261 239 798 802
DecisionTree 2771 729 692 908
--- Specificity / Sensitivity (Standard) ---
model Specificity Sensitivity
NaiveBayes 0.898 0.744
LogisticRegression 0.906 0.628
SVM 0.905 0.599
RandomForest 0.884 0.615
KNN 0.932 0.501
DecisionTree 0.792 0.568
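The AUC change from adding the three HRV features can be checked model by model. The sketch below compares the AUC column of this run with the SMOTE=OFF run on BDI/BAI alone; both sets of values are copied from the logs, and the dictionary names are illustrative.

```python
# AUC values copied from the SMOTE=OFF logs: two features vs. five features.
auc_bdi_bai = {
    "LogisticRegression": 0.904, "NaiveBayes": 0.894, "SVM": 0.884,
    "KNN": 0.851, "RandomForest": 0.845, "DecisionTree": 0.713,
}
auc_with_hrv = {
    "LogisticRegression": 0.876, "NaiveBayes": 0.885, "SVM": 0.839,
    "KNN": 0.790, "RandomForest": 0.822, "DecisionTree": 0.681,
}

# Change in AUC after adding HRV_SDNN_MS, HRV_LF and HRV_LF_HF.
deltas = {m: round(auc_with_hrv[m] - auc_bdi_bai[m], 3) for m in auc_bdi_bai}
for model, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{model:<20} {delta:+.3f}")
```

All six models lose AUC with the extra features; KNN loses the most and NaiveBayes the least.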
5 fold + 100CV
- smote=True 👉 SMOTE blurs the class boundary (scores get worse: precision and specificity drop)
- class_weight = Balanced
[Data] target=3TP samples=51 features=5
Columns used: ['BDI_T1', 'BAI_T1', 'HRV_SDNN_MS', 'HRV_LF', 'HRV_LF_HF']
[SMOTE] ON (mode=Standard, k_neighbors=5, class_weight_when_smote=True)
[CLASS_WEIGHT] mode=Balanced
[Target distribution]
- Class 0: 35 (68.6%)
- Class 1: 16 (31.4%)
[Detected] target type: binary; classes=[0 1]
=== Basic ML Benchmark (Standard | Repeated Stratified 5-fold x 100 CV) | seed=42 | SMOTE=ON | class_weight=OFF ===
model AUC F1_pos(=1) Prec_pos Rec_pos F1_neg(=0) Prec_neg Rec_neg MCC Accuracy Pred1_mean Pred0_mean
NaiveBayes 0.879 0.721 0.746 0.748 0.865 0.890 0.857 0.617 0.823 3.392 6.808
LogisticRegression 0.869 0.691 0.716 0.726 0.856 0.884 0.849 0.585 0.810 3.378 6.822
SVM 0.836 0.637 0.653 0.681 0.823 0.859 0.810 0.499 0.769 3.504 6.696
RandomForest 0.828 0.643 0.674 0.673 0.835 0.860 0.830 0.516 0.780 3.340 6.860
KNN 0.823 0.651 0.661 0.700 0.829 0.867 0.812 0.517 0.776 3.554 6.646
DecisionTree 0.673 0.542 0.565 0.582 0.774 0.809 0.764 0.357 0.706 3.504 6.696
--- Aggregated Confusion Matrix Sums (Standard) ---
model TN_sum FP_sum FN_sum TP_sum
NaiveBayes 3000 500 404 1196
LogisticRegression 2970 530 441 1159
SVM 2836 664 512 1088
RandomForest 2905 595 525 1075
KNN 2841 659 482 1118
DecisionTree 2674 826 674 926
--- Specificity / Sensitivity (Standard) ---
model Specificity Sensitivity
NaiveBayes 0.857 0.748
LogisticRegression 0.849 0.724
SVM 0.810 0.680
RandomForest 0.830 0.672
KNN 0.812 0.699
DecisionTree 0.764 0.579
==============================================================================================================
5 fold + 100CV
- smote=False 👉 SMOTE blurs the class boundary
- class_weight = OFF
[Data] target=3TP samples=51 features=5
Columns used: ['BDI_T1', 'BAI_T1', 'HRV_SDNN_MS', 'HRV_LF', 'HRV_LF_HF']
[SMOTE] OFF (mode=Standard, k_neighbors=5, class_weight_when_smote=True)
[CLASS_WEIGHT] mode=Off
[Target distribution]
- Class 0: 35 (68.6%)
- Class 1: 16 (31.4%)
[Detected] target type: binary; classes=[0 1]
=== Basic ML Benchmark (Standard | Repeated Stratified 5-fold x 100 CV) | seed=42 | SMOTE=OFF | class_weight=OFF ===
model AUC F1_pos(=1) Prec_pos Rec_pos F1_neg(=0) Prec_neg Rec_neg MCC Accuracy Pred1_mean Pred0_mean
NaiveBayes 0.885 0.748 0.808 0.745 0.889 0.893 0.898 0.668 0.850 3.096 7.104
LogisticRegression 0.876 0.659 0.769 0.629 0.873 0.853 0.906 0.574 0.819 2.670 7.530
SVM 0.839 0.640 0.761 0.601 0.866 0.842 0.905 0.549 0.809 2.586 7.614
RandomForest 0.822 0.639 0.745 0.616 0.857 0.845 0.884 0.538 0.800 2.780 7.420
KNN 0.790 0.578 0.767 0.503 0.863 0.812 0.932 0.501 0.797 2.082 8.118
DecisionTree 0.681 0.552 0.594 0.570 0.790 0.807 0.792 0.379 0.722 3.274 6.926
--- Aggregated Confusion Matrix Sums (Standard) ---
model TN_sum FP_sum FN_sum TP_sum
NaiveBayes 3143 357 409 1191
LogisticRegression 3170 330 595 1005
SVM 3166 334 641 959
RandomForest 3094 406 616 984
KNN 3261 239 798 802
DecisionTree 2771 729 692 908
--- Specificity / Sensitivity (Standard) ---
model Specificity Sensitivity
NaiveBayes 0.898 0.744
LogisticRegression 0.906 0.628
SVM 0.905 0.599
RandomForest 0.884 0.615
KNN 0.932 0.501
DecisionTree 0.792 0.568