I am looking for advice on choosing between two binary classification models based on their training, validation and test set results. Model 1 (results in the 1st image) has better test set results than Model 2. However, Model 2 (results in the 2nd image) has results that seem more intuitive to me, with training set performance better than its test set performance. I suspect the Model 1 test set results may have been a bit of a fluke, whereas Model 2 looks more like a well-trained model that would be more reliable in the long term.
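To make the "fluke" concern concrete, this is the kind of check I have in mind: a paired bootstrap over the test set that estimates how much the gap between the two models could move from resampling alone. It is only a minimal sketch. It assumes I have the per-example test labels and both models' hard 0/1 predictions, the function and argument names are hypothetical, and accuracy is a stand-in for whichever metric is shown in the images.

```python
import numpy as np


def bootstrap_accuracy_gap(y_true, pred_1, pred_2, n_resamples=2000, seed=0):
    """Return (observed gap, 2.5th percentile, 97.5th percentile) for
    accuracy(pred_1) - accuracy(pred_2) on a shared test set.

    All names here are hypothetical; accuracy is an assumed metric.
    """
    y_true = np.asarray(y_true)
    # Per-example correctness (1.0 = correct, 0.0 = wrong) for each model.
    correct_1 = (np.asarray(pred_1) == y_true).astype(float)
    correct_2 = (np.asarray(pred_2) == y_true).astype(float)

    rng = np.random.default_rng(seed)
    n = len(y_true)
    # Resample test rows with replacement. The same rows are used for both
    # models, so the comparison stays paired.
    idx = rng.integers(0, n, size=(n_resamples, n))
    gaps = correct_1[idx].mean(axis=1) - correct_2[idx].mean(axis=1)

    low, high = np.percentile(gaps, [2.5, 97.5])
    observed = correct_1.mean() - correct_2.mean()
    return observed, low, high
```

If the interval between the two percentiles comfortably includes 0, the test set gap between Model 1 and Model 2 would be within resampling noise, which is what I suspect is happening.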
Any advice on this is much appreciated.


