Network Platforms - unexpected VMNode reboots – Incident details

unexpected VMNode reboots

Identified
Major outage
Started about 20 hours ago

Affected

Virtual Hosting

Major outage from 3:12 AM to 12:00 AM, Partial outage from 3:12 AM to 12:00 AM, Under maintenance from 3:12 AM to 12:00 AM, Operational from 3:12 AM to 12:00 AM

CT1 - iSCSI SAN

Under maintenance from 3:12 AM to 12:00 AM

CT1 - VMWare Nodes

Operational from 3:12 AM to 12:00 AM

CT1 - vSAN

Operational from 3:12 AM to 12:00 AM

JB1 - iSCSI SAN

Major outage from 3:12 AM to 12:00 AM

JB1 - VM Nodes

Partial outage from 3:12 AM to 12:00 AM

Updates
  • Identified

    Good day

    We are investigating an issue where two of our VMNodes restarted unexpectedly after the scheduled maintenance work had been completed. This affected node 146 (twice) and node 147. The nodes rebooted within 30 minutes of each other, resulting in HA restarting the VMs on other available nodes.

    Preliminary investigation suggests that memory oversubscription is the cause. Our engineers are manually rebalancing workloads across the cluster to ensure that no single node is oversubscribed.

    Regards

    Network Platforms