How much of a domain mismatch is there between the CDX test set and the DnR dataset?

It seems that scores on the DnR validation set do not track scores on the CDX test set at all. Is anyone else facing the same issue?

One big difference is obviously that the test set is stereo. I wonder what other kinds of differences there might be?
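
On the stereo point, one possible workaround is to run a mono-trained model on each channel separately and stack the results back into stereo. Below is a minimal sketch of that idea, not something confirmed to close the gap. `separate_stereo` and `dummy_separator` are hypothetical names made up for illustration, the stem names are placeholders, and the dummy separator only scales its input so that the example runs without a trained model.

```python
import numpy as np

def separate_stereo(mixture, separate_mono):
    """Apply a mono separator to each channel of a multi-channel mixture.

    mixture: array of shape (channels, samples)
    separate_mono: callable mapping a 1-D signal to a dict {stem_name: 1-D estimate}
    Returns a dict {stem_name: array of shape (channels, samples)}.
    """
    # Run the mono model independently on every channel.
    per_channel = [separate_mono(channel) for channel in mixture]
    # Re-stack the per-channel estimates so each stem is multi-channel again.
    return {
        name: np.stack([estimates[name] for estimates in per_channel])
        for name in per_channel[0]
    }

def dummy_separator(signal):
    # Hypothetical stand-in for a trained mono model: it just scales the input.
    return {"speech": 0.5 * signal, "music": 0.3 * signal, "sfx": 0.2 * signal}

# One second of synthetic stereo noise at 44.1 kHz as a stand-in mixture.
mixture = np.random.default_rng(0).standard_normal((2, 44100))
stems = separate_stereo(mixture, dummy_separator)
```

Processing the channels independently ignores any inter-channel cues, so this only addresses the channel-count mismatch and not other possible differences between the two datasets.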