How to fix NTP issues on Nutanix CVMs

First, read the following Nutanix KB article: https://portal.nutanix.com/#/page/kbs/details?targetId=kA032000000bmjeCAA

If you still suspect a time offset, run the following from a CVM. In this example, the clock is about 119 seconds off.

allssh grep offset ~/data/logs/genesis.out
================== 173.23.33.12 =================
2018-06-20 08:54:07 INFO time_manager.py:555 NTP offset: -119.154 seconds
2018-06-20 09:04:43 INFO time_manager.py:555 NTP offset: -119.149 seconds
2018-06-20 09:15:14 INFO time_manager.py:555 NTP offset: -119.154 seconds
2018-06-20 09:25:50 INFO time_manager.py:555 NTP offset: -119.145 seconds
2018-06-20 09:36:21 INFO time_manager.py:555 NTP offset: -119.154 seconds
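To avoid eyeballing the log, a small shell helper can pull out the latest offset and compare it with a limit. This is only a minimal sketch: the function names latest_offset and offset_within are made up for illustration, and the 3-second default limit is an assumption, not a documented Nutanix threshold.

```shell
# Print the most recent "NTP offset" value (in seconds) from log lines on stdin.
latest_offset() {
  awk '/NTP offset:/ {
         for (i = 1; i < NF; i++) if ($i == "offset:") v = $(i + 1)
       }
       END { if (v != "") print v }'
}

# Succeed (exit 0) when the absolute offset is at or below a limit in seconds.
# The default limit of 3 is an illustrative assumption.
offset_within() {
  awk -v o="$1" -v lim="${2:-3}" 'BEGIN {
    o += 0; lim += 0
    if (o < 0) o = -o
    exit !(o <= lim)
  }'
}
```

On a CVM this could be used as: offset_within "$(latest_offset < ~/data/logs/genesis.out)" || echo "offset still too large"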

Because the offset is so large, Cassandra will not let the CVM jump straight back to the correct time, since a sudden change would mess up its timestamps. Fear not: the CVM comes with a script, fix_time_drift, that slowly brings the clock back in line. Schedule it to run every five minutes on all CVMs:

allssh '(/usr/bin/crontab -l && echo "*/5 * * * * bash -lc /home/nutanix/serviceability/bin/fix_time_drift") | /usr/bin/crontab -'
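Note that running the command above twice appends the job twice. If that is a concern, the crontab can be filtered first so the entry appears exactly once. The following is a sketch under that assumption; the helper name with_drift_job is hypothetical and not part of any Nutanix tooling.

```shell
# The cron entry used in this post.
JOB='*/5 * * * * bash -lc /home/nutanix/serviceability/bin/fix_time_drift'

# Read a crontab on stdin and print it with exactly one fix_time_drift entry.
with_drift_job() {
  # Drop any existing fix_time_drift lines; grep exits 1 when nothing is left,
  # which is fine here, so the status is ignored.
  grep -v 'fix_time_drift' || true
  printf '%s\n' "$JOB"
}
```

On a single CVM it would be wired up as: /usr/bin/crontab -l | with_drift_job | /usr/bin/crontab -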

Then you can keep an eye on the cluster time offset using:

for i in `svmips` ; do echo CVM $i: ; ssh $i "/usr/sbin/ntpq -pn" ; echo ; done
CVM 173.23.33.12:
FIPS mode initialized
Nutanix Controller VM
remote refid st t when poll reach delay offset jitter
==============================================================================
*173.23.33.1 13.65.245.138 3 u 23 256 377 0.513 23.415 21.630
127.127.1.0 .LOCL. 10 l 107m 64 0 0.000 0.000 0.000

CVM 173.23.33.14:
FIPS mode initialized
Nutanix Controller VM
remote refid st t when poll reach delay offset jitter
==============================================================================
*173.23.33.12 173.23.33.1 4 u 135 256 377 0.226 6.950 6.836

CVM 173.23.33.16:
FIPS mode initialized
Nutanix Controller VM
remote refid st t when poll reach delay offset jitter
==============================================================================
*173.23.33.12 173.23.33.1 4 u 30 256 377 0.240 -2.570 6.010
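In ntpq output the peer marked with * is the one the CVM is currently synced to, and the offset column is in milliseconds (unlike genesis.out, which reports seconds). To reduce each CVM's output to that one number, something like the following could be piped after ntpq. The helper name selected_peer_offset is hypothetical, and the sketch assumes the standard ten-column ntpq -p layout shown above.

```shell
# Read "ntpq -pn" output on stdin and print "<peer> <offset_ms>" for the
# peer flagged with '*'. Banner lines and other peers are ignored.
selected_peer_offset() {
  awk '/^\*/ { sub(/^\*/, "", $1); print $1, $9 }'
}
```

For example: ssh $i "/usr/sbin/ntpq -pn" | selected_peer_offset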

You can also check again with the genesis.out command from earlier:

allssh grep offset ~/data/logs/genesis.out
================== 173.23.33.12 =================
2018-06-21 09:41:16 INFO time_manager.py:555 NTP offset: -118.121 seconds
2018-06-21 09:40:22 INFO time_manager.py:555 NTP offset: -0.000 seconds
2018-06-21 09:50:52 INFO time_manager.py:555 NTP offset: 0.005 seconds
2018-06-21 10:01:22 INFO time_manager.py:555 NTP offset: 0.006 seconds

Once the offset has caught up, run the NTP health check:

ncc health_checks network_checks check_ntp

Also, once everything is clear, don’t forget to remove the fix_time_drift crontab job!

allssh "(/usr/bin/crontab -l | sed '/fix_time_drift/d' | /usr/bin/crontab -)"