Recently we've been working with Brent Yates and Geeter Kyrazis at an exciting company called swXtch.io. They deploy a system for multicast distribution in public clouds. Our initial work has focused on a test environment in Microsoft Azure.
Azure offers its customers an "Accelerated Networking" capability which provides "Single Root I/O Virtualization" (SR-IOV) based on NVIDIA/Mellanox hardware.
Our interest was piqued when we discovered that the network interfaces presented with this capability advertise hardware timestamping capability. We thought: "Nooooo... it can't be so". But as our later work determined: "Oh... it be so"... :-)
As is evident, the interfaces present with SOF_TIMESTAMPING_*_HARDWARE capabilities and are linked to PHC IDs 1 and 2.
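To make the capability check concrete, here is a minimal sketch of how the so_timestamping bitmask reported by the kernel (the value behind the capabilities list that "ethtool -T" prints) can be decoded. The bit values come from the Linux header linux/net_tstamp.h; the function names and the example masks are hypothetical and are not the values reported by the Azure interfaces.

```python
# Bit values as defined in the Linux UAPI header <linux/net_tstamp.h>.
SOF_TIMESTAMPING_FLAGS = {
    "SOF_TIMESTAMPING_TX_HARDWARE": 1 << 0,
    "SOF_TIMESTAMPING_TX_SOFTWARE": 1 << 1,
    "SOF_TIMESTAMPING_RX_HARDWARE": 1 << 2,
    "SOF_TIMESTAMPING_RX_SOFTWARE": 1 << 3,
    "SOF_TIMESTAMPING_SOFTWARE": 1 << 4,
    "SOF_TIMESTAMPING_SYS_HARDWARE": 1 << 5,
    "SOF_TIMESTAMPING_RAW_HARDWARE": 1 << 6,
}


def decode_so_timestamping(mask):
    """Return the names of the flags set in a so_timestamping bitmask."""
    return [name for name, bit in SOF_TIMESTAMPING_FLAGS.items() if mask & bit]


def supports_hardware_timestamping(mask):
    """True if TX, RX and raw hardware timestamping are all advertised."""
    required = (
        SOF_TIMESTAMPING_FLAGS["SOF_TIMESTAMPING_TX_HARDWARE"]
        | SOF_TIMESTAMPING_FLAGS["SOF_TIMESTAMPING_RX_HARDWARE"]
        | SOF_TIMESTAMPING_FLAGS["SOF_TIMESTAMPING_RAW_HARDWARE"]
    )
    return mask & required == required


# Hypothetical masks for illustration only.
HARDWARE_CAPABLE_MASK = 0x45   # TX_HARDWARE | RX_HARDWARE | RAW_HARDWARE
SOFTWARE_ONLY_MASK = 0x1A      # TX_SOFTWARE | RX_SOFTWARE | SOFTWARE
```

An interface that only reports the software flags would fail the supports_hardware_timestamping check, which is what we would normally expect to see inside a cloud VM.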
In addition, PHC ID 0 is part of Microsoft's approach to getting hypervisor time into a VM, using the /dev/ptp_hyperv symlink to /dev/ptp0.
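For readers who have not worked with PHC devices before, the following is a minimal sketch of how a PHC such as /dev/ptp0 can be read from user space using the standard Linux dynamic POSIX clock convention. It is not how Timebeat does it; the function names are hypothetical, and it assumes Python 3.7+ on Linux and read permission on the device.

```python
import os
import time

# Linux convention for dynamic POSIX clocks: the clock id is derived from an
# open file descriptor as ((~fd) << 3) | CLOCKFD, with CLOCKFD == 3.
CLOCKFD = 3


def fd_to_clockid(fd):
    """Convert an open /dev/ptpN file descriptor into a dynamic clock id."""
    return ((~fd) << 3) | CLOCKFD


def read_phc_ns(path="/dev/ptp0"):
    """Read the current PHC time in nanoseconds (needs access to the device)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return time.clock_gettime_ns(fd_to_clockid(fd))
    finally:
        os.close(fd)
```

On a VM that exposes the hypervisor clock, calling read_phc_ns("/dev/ptp_hyperv") should return the same clock as read_phc_ns("/dev/ptp0"), since the former is a symlink to the latter.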
Timebeat is quite an advanced clock sync client which takes "full"(tm) advantage of the PTP subsystem in Linux. We recently added PHC source support to Timebeat to enable our work with Ahmad Byagowi (Meta's chief time master, founder of the OCP-TAP group and all-round nice guy) and the OCP-TAP/Meta Timecard, so we were starting from a good place.
For the time being our approach to making the Azure VMs do hardware timestamping with other nodes will have to remain proprietary, but I can share our results with you and I can tell you that there is no magic other than clever use of standard APIs.
It's also worth noting that in this setup we are using our cool PTP+Squared system, in which nodes run PTP unicast sessions with other nodes.
Using this approach we not only get a keen insight into whether clients are synchronised well to root sources of time, we also get unlimited scalability and, most crucially, we verify that clients all have the same time (something every other clock sync system detrimentally omits):
In the above front-end control (see our front-end in Timebeat's demo environment) the solid lines represent PTP unicast sessions that hosts use to actively steer their clocks, and the dashed lines represent PTP unicast sessions used only to monitor and compare (and as a backup to make active if the current active sessions are blocked or expire).
We can see that all the nodes are getting time from "brent-test-sw004". In addition, if we move the cursor over another node in our front-end control, we can see how PTP+Squared ensures that we know and monitor the offsets between different nodes:
So what results did we achieve? Let me show you:
Above we see both the filtered and unfiltered results. Timebeat uses a proprietary filter and only uses the blue dots in the above chart to steer the clock, but already you can see that we are in the single-microsecond space, so for cloud this is a good result.
But what is surprising, and what conclusively shows that hardware timestamping is really being used, is what we see when I remove the red dots (the filtered-out offset measurements) from the diagram above.
We can see that Timebeat is managing to keep the accuracy of the PHC within +/- 50ns. That is definitely only achievable using hardware timestamping between NICs that have a path delay of 359 microseconds between them.
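To see why numbers like these point to hardware timestamping, it helps to look at the standard PTP delay request-response arithmetic. The sketch below is the textbook calculation, not Timebeat's proprietary filter or servo. The function name, the timestamps and the 10 microsecond software error are made up for illustration; only the 359 microsecond path delay and the 50ns offset are taken from the results above.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Standard PTP delay request-response calculation (all values in ns).

    t1: master sends Sync            (master clock)
    t2: slave receives Sync          (slave clock)
    t3: slave sends Delay_Req        (slave clock)
    t4: master receives Delay_Req    (master clock)
    Assumes the path is symmetric in both directions.
    """
    offset = ((t2 - t1) - (t4 - t3)) // 2
    mean_path_delay = ((t2 - t1) + (t4 - t3)) // 2
    return offset, mean_path_delay


# Hypothetical exchange: 359 us one-way delay, slave clock 50 ns ahead.
PATH_DELAY_NS = 359_000
TRUE_OFFSET_NS = 50

t1 = 0
t2 = t1 + PATH_DELAY_NS + TRUE_OFFSET_NS   # 359_050
t3 = 1_000_000
t4 = t3 - TRUE_OFFSET_NS + PATH_DELAY_NS   # 1_358_950

clean = ptp_offset_and_delay(t1, t2, t3, t4)

# Same exchange, but t2 is stamped 10 us late (an assumed software-stack delay).
SOFTWARE_ERROR_NS = 10_000
noisy = ptp_offset_and_delay(t1, t2 + SOFTWARE_ERROR_NS, t3, t4)
```

With exact timestamps the calculation recovers the 50ns offset and the 359 microsecond delay. An error in any single timestamp shows up as half that error in the computed offset, so the assumed 10 microsecond error moves the offset by 5 microseconds, which is a hundred times larger than the +/- 50ns band.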
I will also add that I know of no other public cloud environment where this is possible.
You can see from our front page table summary that overall the variation between the clocks in the PTP+Squared mesh is extremely low (here comparing EMAs).
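For anyone unfamiliar with the term, an EMA (exponential moving average) smooths a series of offset measurements by giving recent samples more weight than older ones. A minimal sketch follows; the function name, the smoothing factor and the sample values are hypothetical, and we are not describing the parameters Timebeat uses.

```python
def exponential_moving_average(samples, alpha):
    """Return the running EMA of samples; alpha in (0, 1] weights new samples."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the range (0, 1]")
    averaged = []
    current = None
    for sample in samples:
        if current is None:
            current = sample  # seed the average with the first sample
        else:
            current = alpha * sample + (1 - alpha) * current
        averaged.append(current)
    return averaged


# Hypothetical offsets in nanoseconds.
smoothed = exponential_moving_average([10, 20, 30], alpha=0.5)
```

A smaller alpha gives a smoother but slower-reacting average, which is why averaged and "instant" values can look quite different for the same clock.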
For the "instant" variation you can examine our alerts bubbles front-end control below (values not averaged). The individual values in the bubbles below are actually the cumulative error between the various upstream paths available to each individual host.
These are good results for any application that requires tight synchronisation between different VMs, and this is, I believe, currently the best that can be achieved in a public cloud. I am also confident that Timebeat is currently the only client advanced enough to achieve these results.
However, I will provide the caveat that close synchronisation between discrete VMs does not imply that synchronisation to UTC is as good, and here, I will caution, Azure is lagging somewhat behind Google Cloud and AWS: the PHC /dev/ptp_hyperv approach sees quite coarse corrections in our experience.
I will mention that Timebeat, in addition to the PTP+Squared approach, supports synchronising time from (standard) PTP, PPS, NTP, NMEA and PHCs - the latter is particularly relevant if you have an OCP/Meta Timecard, if you are using Azure or if you have a private cloud deployment based on VMware ESXi.
If anyone from NVIDIA or Microsoft happens to be reading this and can shed more light on the current capabilities of Azure, then we would love to hear from you.
Lastly, although I wrote the code for Timebeat and the text of this article, credit for the discoveries made through the work described here goes to my colleague Ian Gough, who worked long hours to get these great results. Kudos, my friend, kudos!