Phase 1: Content encoder (pretrained ResNet-50 backbone) + style encoder trained with a contrastive loss (NT-Xent) to disentangle content from style.
Phase 2: Joint training with a decoder for ...
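
The NT-Xent loss named in Phase 1 can be sketched in plain Python to make the computation concrete. This is an illustration under stated assumptions, not the actual training code: the function name `nt_xent`, the temperature of 0.5, and the idea that each positive pair consists of two style embeddings of the same style are all assumptions, since the note does not say how positive pairs are formed. Embeddings are assumed to be non-zero vectors.

```python
import math


def _cosine(u, v):
    """Cosine similarity between two non-zero vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(view_a, view_b, temperature=0.5):
    """NT-Xent loss over N positive pairs (view_a[i], view_b[i]).

    Every other embedding in the combined batch of 2N acts as a negative.
    The temperature default is an assumed value, not taken from the note.
    """
    if len(view_a) != len(view_b) or not view_a:
        raise ValueError("view_a and view_b must be non-empty and equally long")

    n = len(view_a)
    z = list(view_a) + list(view_b)  # combined batch of 2N embeddings
    total = 0.0

    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the positive partner of anchor i
        # Temperature-scaled similarities to every embedding except the anchor itself.
        logits = [
            _cosine(z[i], z[k]) / temperature
            for k in range(2 * n)
            if k != i
        ]
        pos_logit = _cosine(z[i], z[pos]) / temperature

        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))

        # Cross-entropy of picking the positive among the 2N - 1 candidates.
        total += log_denom - pos_logit

    return total / (2 * n)


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]     # each positive matches its anchor
    misaligned = [[0.0, 1.0], [1.0, 0.0]]  # positives swapped between anchors
    print("aligned loss:   ", nt_xent(anchors, aligned))
    print("misaligned loss:", nt_xent(anchors, misaligned))
```

The loss is lower when each embedding is closest to its own positive partner than when the partners are swapped, which is the pressure that would push the style encoder to group same-style samples. A real training loop would compute the same quantity on batched tensors in a deep learning framework rather than on Python lists.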