The AdamW optimizer was used with a weight decay of 0.05 to reduce overfitting. In addition, a linear warmup was applied to stabilize the early stages of training, increasing the ...
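The following is a minimal, dependency-free sketch of the two ingredients named above: an AdamW update with decoupled weight decay of 0.05, and a linear warmup of the learning rate. It is illustrative only and does not reproduce the authors' training code. Several details are assumptions. The peak rate `base_lr` and the length `warmup_steps` are hypothetical, because the warmup description is cut off in the text. The betas and epsilon are the common defaults (0.9, 0.999, 1e-8). The rate is held constant after warmup, since no later schedule is given. The decay is scaled by the learning rate, as in PyTorch's `torch.optim.AdamW`.

```python
import math


def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linearly ramp the learning rate up to base_lr over warmup_steps (0-based step).

    Assumption: after warmup the rate is simply held at base_lr, because the
    post-warmup schedule is not specified in the text.
    """
    if warmup_steps <= 0:
        return base_lr
    return base_lr * min(1.0, (step + 1) / warmup_steps)


def adamw_step(params, grads, m, v, t, lr,
               beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.05):
    """Apply one in-place AdamW update to flat lists of floats.

    t is the 1-based step count used for bias correction. Betas and eps are
    assumed defaults; weight_decay=0.05 matches the value stated in the text.
    """
    for i in range(len(params)):
        g = grads[i]
        # Exponential moving averages of the gradient and squared gradient.
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        m_hat = m[i] / (1 - beta1 ** t)
        v_hat = v[i] / (1 - beta2 ** t)
        # Decoupled weight decay: shrink the weight directly instead of
        # adding weight_decay * param to the gradient (as L2 regularization would).
        params[i] -= lr * weight_decay * params[i]
        # Standard Adam step using the bias-corrected moments.
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


# Toy usage: minimize f(x) = x^2 with hypothetical warmup settings.
BASE_LR = 0.1       # hypothetical peak learning rate
WARMUP_STEPS = 10   # hypothetical warmup length

x = [1.0]
m_state, v_state = [0.0], [0.0]
for step in range(200):
    lr = linear_warmup_lr(step, BASE_LR, WARMUP_STEPS)
    grad = [2.0 * x[0]]  # derivative of x^2
    adamw_step(x, grad, m_state, v_state, t=step + 1, lr=lr)

print(f"x after 200 steps: {x[0]:.4f}")
```

Decoupling the decay from the gradient is what distinguishes AdamW from Adam with L2 regularization. With plain L2, the penalty gradient is rescaled by the adaptive denominator `sqrt(v_hat)`. In AdamW, every weight is shrunk by the same factor `1 - lr * weight_decay` per step. Because the warmup ramps `lr` from a small value, both the Adam step and the decay are small during the first iterations.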