My oVirt cluster's storage lives on an NFS server that gets regular updates, so I've nailed down a concrete process to fully shut down the cluster and start it back up again.

Shutdown

  1. Shut down all VMs
  2. Put all hosts into maintenance except the one running the hosted engine (migrate the engine to your favourite host first if you wish)
    • You can leave them turned on, but they will need to be in maintenance mode
    • Through the UI: Compute > Hosts > Select a host (or multiple with CTRL) > Management > Maintenance
    • This ensures that these hosts all unmount any shared storage they have.
  3. Put the cluster in Global HA Maintenance mode
    • Through the UI:
      • Compute > Hosts > Select an active host > Three dots menu in the top right > Enable Global HA Maintenance
    • Through the CLI:
      • On the host that's still running, run hosted-engine --set-maintenance --mode=global as root
    • This ensures that the HA ("cluster") agents on any active hosts (hopefully just one by now) don't try to automatically restart anything.
  4. Put each storage domain in Maintenance Mode
    • Through the UI: Storage > Data Centers
    • Select your Data Center
    • Select each domain in the Domain Name list except the hosted-engine storage domain, then click Maintenance in the top right.
    • This ensures that the remaining host unmounts any storage that's not the hosted-engine storage domain.
  5. Shut down the hosted-engine
    • On the host that's still running: hosted-engine --vm-shutdown
    • Once that command has completed, wait for the VM to shut down: you can watch free memory or systemd-cgtop and wait for the /machine.slice/machine-qemu\x2d24\x2dHostedEngine.scope entry to vanish (the number in the scope name may differ on your host).
  6. Disconnect the last host from the Hosted Engine's storage:
    • hosted-engine --disconnect-storage
    • Verify that no remote filesystems remain in the output of mount
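For step 3, one way to confirm from the CLI that global maintenance has taken effect is to look at the state= lines in the hosted-engine --vm-status output. This is a minimal Python sketch (3.7 or later), assuming the output layout matches the sample in the startup section, where a host's extra metadata includes state=GlobalMaintenance. The helper names are hypothetical, not part of oVirt.

```python
import re
import subprocess

# Matches the per-host "state=..." lines in the extra metadata section.
STATE_RE = re.compile(r"^\s*state=(\S+)\s*$", re.MULTILINE)


def in_global_maintenance(vm_status_output: str) -> bool:
    """Return True if any host's agent reports state=GlobalMaintenance."""
    return "GlobalMaintenance" in STATE_RE.findall(vm_status_output)


def check_global_maintenance() -> bool:
    """Run hosted-engine --vm-status (as root) and check the reported state."""
    result = subprocess.run(
        ["hosted-engine", "--vm-status"],
        capture_output=True, text=True,
    )
    return in_global_maintenance(result.stdout)
```

At this point only one host should still be active, so checking for any GlobalMaintenance line is deliberately simple rather than a per-host audit.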
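For step 5, the wait for the Hosted Engine's scope to vanish can be scripted. This minimal Python sketch (3.7 or later) assumes systemd-cgtop is run in batch mode (-b) for a single iteration (-n 1), and that the scope's line contains both machine.slice and HostedEngine, as in the path above. The helper names and the timeout and polling intervals are illustrative choices, not oVirt values.

```python
import subprocess
import time


def engine_scope_running(cgtop_output: str) -> bool:
    """Return True if a machine.slice entry for the HostedEngine VM is listed."""
    return any(
        "machine.slice" in line and "HostedEngine" in line
        for line in cgtop_output.splitlines()
    )


def wait_for_engine_shutdown(timeout_s: float = 600, poll_s: float = 10) -> bool:
    """Poll systemd-cgtop until the HostedEngine scope vanishes.

    Returns True once the scope has gone, or False if timeout_s elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # -b: batch mode (no interactive input), -n 1: print a single iteration.
        out = subprocess.run(
            ["systemd-cgtop", "-b", "-n", "1"],
            capture_output=True, text=True, check=True,
        ).stdout
        if not engine_scope_running(out):
            return True
        time.sleep(poll_s)
    return False
```

Call wait_for_engine_shutdown() on the remaining host after hosted-engine --vm-shutdown returns. Matching on the HostedEngine substring, rather than the full escaped path, means the varying machine number in the scope name doesn't matter.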
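For step 6, the mount check can also be scripted by reading /proc/mounts, which lists the same mounts as mount, one per line (device, mount point, filesystem type, options, ...). In this sketch, REMOTE_FS_TYPES is an assumed set of common network filesystem types, and the helper names are hypothetical. Adjust the set to match your storage backend.

```python
from pathlib import Path

# Assumed set of network filesystem types; adjust to match your storage backend.
REMOTE_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3", "glusterfs", "fuse.glusterfs", "ceph"}


def remote_mounts(mounts_text: str) -> list:
    """Return (device, mount point, fs type) tuples for remote mounts.

    Expects /proc/mounts-style text: whitespace-separated fields, one mount per line.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type = fields[:3]
        if fs_type in REMOTE_FS_TYPES:
            found.append((device, mount_point, fs_type))
    return found


def check_storage_disconnected(path: str = "/proc/mounts") -> bool:
    """Print any remote mounts still present; return True if there are none."""
    leftovers = remote_mounts(Path(path).read_text())
    for device, mount_point, fs_type in leftovers:
        print(f"still mounted: {device} on {mount_point} ({fs_type})")
    return not leftovers
```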

At this point, the storage should be completely disconnected and you can perform whatever maintenance you need to do. Any or all of the hosts can be shut down if needed.

Startup

This assumes your storage is back up and fully available, and that any hosts you want to activate are also online.

  1. Log back in to the "favourite" host you used to shut everything down.

  2. Reconnect the Hosted Engine storage domain:

    • hosted-engine --connect-storage
    • Check it's now mounted with mount and df -h
  3. Check that the Hosted Engine's HA agent has activated OK

    • hosted-engine --vm-status
    • This should report a long list of each host and its status. It can take a couple of minutes for the agent to pick up freshly reconnected storage. If you don't have any luck after about ten minutes, try restarting the host, and then try again.
  4. Start the Hosted Engine

    • hosted-engine --vm-start
    [root@host ~]# hosted-engine --vm-start
    VM exists and is Down, cleaning up and restarting
    VM in WaitForLaunch
    
    • You can monitor this both by hosted-engine --vm-status and by connecting to the URL of the Hosted Engine. The status of the VM on the particular host will slowly change until it looks like:
    --== Host host.domain (id: 3) status ==--
    
    Host ID                            : 3
    Host timestamp                     : 1680476
    Score                              : 3400
    Engine status                      : {"vm": "up", "health": "good", "detail": "Up"}
    Hostname                           : host.domain
    Local maintenance                  : False
    stopped                            : False
    crc32                              : af02b3f8
    conf_on_shared_storage             : True
    local_conf_timestamp               : 1680476
    Status up-to-date                  : True
    Extra metadata (valid at timestamp):
       metadata_parse_version=1
       metadata_feature_version=1
       timestamp=1680476 (Sun Jan 29 18:29:01 2023)
       host-id=3
       score=3400
       vm_conf_refresh_time=1680476 (Sun Jan 29 18:29:01 2023)
       conf_on_shared_storage=True
       maintenance=False
       state=GlobalMaintenance
       stopped=False
    
  5. Once the Hosted Engine is fully started up, we can unwind the rest of the shutdown. Log in to its web UI.

  6. Disable the cluster's Global HA Maintenance:

    • Through the UI:
      • Compute > Hosts > Select an active host > Three dots menu in the top right > Disable Global HA Maintenance
    • Through the CLI:
      • hosted-engine --set-maintenance --mode=none
  7. Activate your Storage Domains:

    • Through the UI: Storage > Data Centers
    • Select your Data Center
    • Select each Domain Name from the list and then click Activate from the top right.
  8. Activate your hosts:

    • Through the UI: Compute > Hosts > Select a host (or multiple with CTRL) > Management > Activate
  9. Start your VMs.
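For step 3 of the startup, the --vm-status output can be grouped per host, which makes it easier to see which hosts the HA agent is reporting on. This Python sketch assumes the layout matches the sample output under step 4: a "--== Host <name> (id: <n>) status ==--" header followed by "Key : value" lines. parse_vm_status is a hypothetical helper, and it deliberately skips the key=value extra-metadata lines.

```python
import re

# Header that starts each host's section, e.g. "--== Host host.domain (id: 3) status ==--"
HOST_RE = re.compile(r"--== Host (?P<name>.+?) \(id: (?P<id>\d+)\) status ==--")
# "Key   : value" lines; keys may not contain ':' or '=', so metadata lines never match.
FIELD_RE = re.compile(r"^(?P<key>[^:=]+?)\s*:\s(?P<value>.*)$")


def parse_vm_status(text: str) -> dict:
    """Group 'Key : value' fields from hosted-engine --vm-status by host name."""
    hosts = {}
    current = None
    for line in text.splitlines():
        header = HOST_RE.search(line)
        if header:
            current = {"Host ID": header.group("id")}
            hosts[header.group("name")] = current
            continue
        field = FIELD_RE.match(line.strip())
        if current is not None and field:
            current[field.group("key")] = field.group("value").strip()
    return hosts
```

For the sample under step 4, parse_vm_status(output)["host.domain"]["Score"] would return "3400".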
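For step 4, a script can poll hosted-engine --vm-status instead of re-running it by hand. It waits until some host reports the engine with "vm": "up" and "health": "good", as in the sample output under step 4. This minimal Python sketch (3.7 or later) assumes the Engine status line carries a JSON object exactly as shown there. engine_is_up, wait_for_engine and the timeout values are illustrative, not part of oVirt.

```python
import json
import re
import subprocess
import time

# Captures the JSON object on each "Engine status : {...}" line.
ENGINE_STATUS_RE = re.compile(r"^\s*Engine status\s*:\s*(\{.*\})\s*$", re.MULTILINE)


def engine_is_up(vm_status_output: str) -> bool:
    """Return True if any host reports the engine VM as up with good health."""
    for raw in ENGINE_STATUS_RE.findall(vm_status_output):
        try:
            status = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore lines whose payload is not valid JSON
        if status.get("vm") == "up" and status.get("health") == "good":
            return True
    return False


def wait_for_engine(timeout_s: float = 1200, poll_s: float = 30) -> bool:
    """Poll hosted-engine --vm-status until the engine is up, or time out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # No check=True: the command may exit non-zero while the agent catches up.
        result = subprocess.run(
            ["hosted-engine", "--vm-status"],
            capture_output=True, text=True,
        )
        if engine_is_up(result.stdout):
            return True
        time.sleep(poll_s)
    return False
```

Only the host actually running the engine VM should report it as up, which is why the check accepts a match from any host rather than requiring all of them.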