"Official" train/dev/test?

I have never reported results on CALLHOME because of the (apparent) lack of an official train/validation/test split (or at least validation/test split).

What experimental protocol does BUT use for reporting results?
Validation on part1, test on part2?
Validation on part2, test on part1?
Both?

cc @fnlandini