The test conducted in part 1 is a bit artificial. It is possible that the effect is not an issue in the "real world".
There are (at least) two effects that could potentially have an impact on the test. The first is that the reference signal and the signal under test are phase coherent. It may be that the interpolators are hit at precisely the same "spot" every measurement, so any nonlinearity in the interpolators will accumulate instead of averaging out. The 53230A has an FPGA inside, and the internal XO is phase-locked to the reference signal - I do not know if the phase coherence "survives" all the way to the interpolators, but it is not impossible. See this patent for some pointers on the interpolator circuits. It seems odd that only the first sample is biased if this is the case, but it is perhaps conceivable that there is an interaction between the input frequency and the sample rate used internally.
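The accumulation argument can be illustrated with a small simulation. This is a sketch, not a model of the actual 53230A hardware: the sinusoidal nonlinearity term and the fixed fractional position are made-up stand-ins for whatever the real interpolator does. The point is only that a fixed error does not average away over repeated measurements, while a randomly distributed one does.

```python
import math
import random

def interp_error(phi, eps=1e-3):
    # Hypothetical interpolator nonlinearity: a small sinusoidal INL term
    # as a function of where the edge lands within the clock period (0..1).
    return eps * math.sin(2 * math.pi * phi)

N = 10_000

# Phase-coherent case: the edge hits the same fractional "spot" every time,
# so the same nonlinearity error is sampled on every measurement.
phi0 = 0.37  # arbitrary fixed fraction of the clock period
coherent = sum(interp_error(phi0) for _ in range(N)) / N

# Incoherent case: the edge lands at a uniformly random fraction each time,
# so the nonlinearity is sampled all over its range and averages out.
random.seed(1)
incoherent = sum(interp_error(random.random()) for _ in range(N)) / N

print(f"coherent mean error:   {coherent:+.2e}")   # stays at eps*sin(2*pi*phi0)
print(f"incoherent mean error: {incoherent:+.2e}") # shrinks toward zero
```

The coherent average stays pinned at the error value for that one spot, no matter how many samples are taken - which is why phase coherence between reference and input could plausibly turn a small interpolator nonlinearity into a persistent bias.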
The other effect that may come into play is the "Leiby effect" - an inability of frequency counters to accurately measure frequencies very close to an exact multiple of their own reference. Several counters were tested in this paper, but the types are not listed. It is safe to assume the 53230A was one of them, and that it exhibits this limitation to some degree.
Luckily, both of these can be tested at once. Use a precise clock generator, locked to the same reference, and generate a signal whose frequency is not a multiple of 10 MHz. The system clock in the 53230A runs at 100 MHz, so it is perhaps best to avoid multiples of that as well. The 53230A has a minimum gate time of 10 us, and it seems a reasonable assumption that the 53230A processes samples internally at this rate - the period of the measured frequency should then not divide evenly into 10 us, to ensure the interpolators are given a workout.
A test frequency of 11.111 111 MHz has a period that does not divide evenly into 10 us, nor into any of the gate times shorter than 1 second.
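This is easy to verify with exact rational arithmetic: an interval contains a whole number of cycles only if frequency times interval is an integer. A quick check (using Python's `fractions` module to avoid floating-point rounding):

```python
from fractions import Fraction

# 11.111 111 MHz test frequency, expressed exactly in Hz
f_test = Fraction(11_111_111)

# Candidate intervals: the assumed 10 us internal sample interval
# plus the gate times used in the test.
intervals = {
    "10 us (internal)": Fraction(1, 100_000),
    "1 ms":             Fraction(1, 1_000),
    "10 ms":            Fraction(1, 100),
    "100 ms":           Fraction(1, 10),
    "1 s":              Fraction(1),
}

for name, t in intervals.items():
    cycles = f_test * t  # number of signal cycles in the interval
    exact = cycles.denominator == 1
    print(f"{name:18s} {float(cycles):>12.5f} cycles  "
          f"{'exact multiple' if exact else 'non-integer'}")
```

Every interval shorter than 1 second holds a non-integer number of cycles, so the signal edge keeps sliding relative to the 10 us boundaries; only at a 1 second gate time does the count come out exact (11 111 111 cycles).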
The test was run using a Keysight 33510B clocked from the Maser as shown on the right, and configured to output a 4 Vpp sine wave. The 53230A was clocked from the same DA. Due to time constraints, the 1 second gate time was not tested. The number of samples collected per trigger was increased to 25.
The data looks less random than in the previous test, but the bias is obviously present. Clearly the bias also applies to "real world scenarios".
For completeness, I also ran the tests with gate times that are multiples of 3:
Again the data looks less random than what might have been expected, but the bias is clearly present in all plots except CONTinuous mode with gate time 0.3 seconds. CONTinuous mode with gate time 0.003 seconds also stands out. Something interesting is going on there; more tests are called for. It could be the signal source - no way to tell from this data, I think.
The unprocessed data used to create these plots is available here
To the extent it is possible to draw any conclusions from these tests, it looks like whatever is causing the bias is not a result of beat notes/interactions between clock edges. There are some visual clues in the data that clock edge interaction may be present - but this could also come from the signal source. In any case, I *think* this is unrelated to the bias of the first sample. Ideas on further tests to shed light on this are welcome.
More importantly, it is also clear that the bias does apply outside of "artificial lab settings".