I think your case is one where a square resize is not the best choice.
Look, you are doing semantic segmentation: the shape, position, and form of the objects are very important, and your job is to preserve them in the output.
By forcing your images into a square and breaking the aspect ratio, you are effectively training on deformed images. This matters especially for person segmentation: your images or objects are often vertical, and squeezing them into a square distorts their proportions.
I would recommend preserving the aspect ratio by adding padding, something like this.
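A minimal sketch of pad-then-resize (often called letterboxing), assuming NumPy arrays with the image as `(H, W, C)` or `(H, W)` and the label mask as `(H, W)`. The function names are hypothetical, and the nearest-neighbour resize is only a stand-in; in practice you would use your library's resize (bilinear for the image, nearest for the mask so class labels are never blended).

```python
import numpy as np

def pad_to_square(image, mask, image_fill=0, mask_fill=0):
    """Pad an image and its label mask to a square without resizing,
    so every object keeps its original aspect ratio.

    Returns the padded image, the padded mask, and the (top, left, h, w)
    box of the original content, which is needed to undo the padding.
    """
    h, w = mask.shape[:2]
    side = max(h, w)
    top = (side - h) // 2
    left = (side - w) // 2
    bottom = side - h - top
    right = side - w - left

    # Pad only the two spatial axes; any channel axis is left untouched.
    spatial = [(top, bottom), (left, right)]
    padded_image = np.pad(image, spatial + [(0, 0)] * (image.ndim - 2),
                          mode="constant", constant_values=image_fill)
    # mask_fill=0 assumes label 0 is background (hypothetical convention).
    padded_mask = np.pad(mask, spatial, mode="constant",
                         constant_values=mask_fill)
    return padded_image, padded_mask, (top, left, h, w)

def resize_nearest(arr, size):
    """Nearest-neighbour resize of a square array to size x size.
    Because input and output are both square, the scaling is uniform."""
    h, w = arr.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return arr[rows[:, None], cols[None, :]]

def unpad(pred, box):
    """Crop a side x side prediction back to the original content box."""
    top, left, h, w = box
    return pred[top:top + h, left:left + w]

# Usage: a tall "person" image, 400 px high and 200 px wide.
img = np.ones((400, 200, 3), dtype=np.uint8)
msk = np.ones((400, 200), dtype=np.uint8)
padded_img, padded_msk, box = pad_to_square(img, msk)
net_input = resize_nearest(padded_img, 256)  # square, but not distorted
```

At inference, resize the network's prediction back to the padded `side x side` size and crop it with `unpad` to get a mask aligned with the original image. Another option is to fill padded mask pixels with a dedicated ignore label instead of background (for example, PyTorch's `CrossEntropyLoss(ignore_index=...)`) so the padding does not count toward the loss.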
I've seen this approach improve the results on a couple of occasions.