Capture more volumedriver event messages
Problem description
The volumedriver emits certain events that we do not capture. @redlicha, please provide us a list so we can handle them.
Protobuf descriptions of the events:
- https://github.com/openvstorage/volumedriver/blob/dev/src/volumedriver/Events.proto
- https://github.com/openvstorage/volumedriver/blob/dev/src/volumedriver/VolumeDriverEvents.proto
- https://github.com/openvstorage/volumedriver/blob/dev/src/filesystem/FileSystemEvents.proto
The first provides a base message type that is extended by the other two.
The generated code is part of the -base package:
$ find . -name \*_pb2.py
./usr/lib/python2.7/dist-packages/volumedriver/storagerouter/Events_pb2.py
./usr/lib/python2.7/dist-packages/volumedriver/storagerouter/VolumeDriverEvents_pb2.py
./usr/lib/python2.7/dist-packages/volumedriver/storagerouter/FileSystemEvents_pb2.py
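For reference, a minimal sketch of how a consumer could decode these messages using the generated modules listed above. The module paths come from the find output; the base message name (Event) and the use of proto2 extensions for the concrete event types are assumptions, so adjust to whatever Events_pb2 actually defines.

```python
# Minimal decoding sketch (Python 2.7, matching the -base package layout).
# Assumption: Events.proto defines a base message named 'Event' and the
# other two .proto files add proto2 extensions to it.
from volumedriver.storagerouter import Events_pb2
# Importing these registers their extensions on the base message.
from volumedriver.storagerouter import VolumeDriverEvents_pb2  # noqa: F401
from volumedriver.storagerouter import FileSystemEvents_pb2    # noqa: F401


def decode_event(raw_bytes):
    """Parse a serialized event and list the (extension) fields that are set."""
    event = Events_pb2.Event()  # hypothetical message name, check Events_pb2
    event.ParseFromString(raw_bytes)
    # Each (descriptor, value) pair tells us which concrete event we received.
    return [(descriptor.full_name, value)
            for descriptor, value in event.ListFields()]
```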
Some more info on the VolumeDriverErrorCodes (a rough handling sketch follows the list):
- Unknown: (unused) placeholder
- ReadFromDisposableSCO: error reading data from a disposable SCO in the SCO cache (I/O error on device?). Will trigger SCO cache mountpoint offlining.
- ReadFromNonDisposableSCO: error reading data from a non-disposable SCO in the SCO cache (I/O error on device?). Will trigger SCO cache mountpoint offlining, and the SCO will be fetched from the DTL to another mountpoint (if there are any).
- PutSCOToBackend: problem uploading a SCO to the backend (due to backend errors, local I/O errors, checksum mismatch, ...). Depending on the exact cause this could trigger a SCO cache mountpoint getting offlined (and the SCO getting fetched again from the DTL to another mountpoint).
- PutTLogToBackend: problem uploading a TLog to the backend (due to backend errors, local I/O errors, checksum mismatch, ...). Depending on the exact cause this could lead to the volume getting put into 'halted' state.
- PutSnapshotsToBackend: problem uploading snapshots.xml to the backend (due to backend errors, local I/O errors, ...). Depending on the exact cause this could lead to the volume getting put into 'halted' state.
- GetSCOFromBackend: failure to download a SCO from the backend to the SCO cache. Obsolete as we use partial reads.
- GetTLogFromBackend: failure to fetch a TLog from the backend (due to backend errors, local I/O errors, ...). This can happen on MDS slave updates or volume restarts; the former will be logged and ignored, the latter will lead to a failed restart.
- GetSnapshotsFromBackend: analogous to GetTLogFromBackend, for snapshots.xml.
- ReadSourceSCOWhileMoving: unused
- MetaDataStore: unused
- ReadTLog: error reading a local TLog (I/O error, ...). Can happen on MDS slave updates and volume restarts - cf. GetTLogFromBackend.
- ReadSnapshots: analogous to ReadTLog, for snapshots.xml.
- ApplyScrubbingRelocs: error during scrub result application to the volume's metadata. Might leak scrub result data.
- GetScrubbingResultsFromBackend: failure to fetch scrub result info from the backend, unused at the moment.
- WriteToSCO: failure to write to a SCO in the SCO cache (I/O error, ...). Will lead to the mountpoint getting offlined.
- WriteDestinationSCOWhileMoving: unused
- WriteTLog: failure to write to a TLog (I/O error). Will lead to the volume getting halted.
- WriteSnapshots: failure to write snapshots.xml locally (I/O error). Will lead to the volume getting halted.
- ApplyScrubbingToSnapshotMamager: unused (and misspelt)
- SCOCacheMountPointOfflined: a SCO cache mountpoint was offlined, usually as a consequence of another error, or an error discovered by the cleanup thread.
- ClusterCacheMountPointOfflined: a cluster cache mountpoint was offlined (I/O error).
- GetSCOFromFOC: failure to fetch a SCO from the DTL. Might lead to the mountpoint getting offlined and the volume getting halted.
- VolumeHalted: volume entered the 'halted' state due to one of the errors listed here or due to fencing.
- DiskSpace: unused
- MDSFailover: a volume failed over to an MDS slave.
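Once these codes are captured, the handler has to decide how loudly to report each one. Below is a rough triage sketch based purely on the descriptions above; the enum name VolumeDriverErrorCode and its location in VolumeDriverEvents_pb2 are assumptions, not verified against the generated code.

```python
# Sketch: triaging VolumeDriverErrorCodes when handling an event.
# Assumption: the codes above are exposed as an enum named
# VolumeDriverErrorCode on the generated VolumeDriverEvents_pb2 module;
# check the generated code for the actual name and location.
from volumedriver.storagerouter import VolumeDriverEvents_pb2 as vd_events

# Codes the list above marks as unused or obsolete; these can be logged
# at a low severity instead of raising an alert.
IGNORABLE = ('Unknown', 'ReadSourceSCOWhileMoving', 'MetaDataStore',
             'WriteDestinationSCOWhileMoving', 'GetSCOFromBackend',
             'ApplyScrubbingToSnapshotMamager', 'DiskSpace')


def severity_for(error_code):
    """Map a numeric error code to a coarse severity for the event handler."""
    enum = vd_events.VolumeDriverErrorCode  # hypothetical enum name
    name = enum.Name(error_code)
    if name in IGNORABLE:
        return 'info'
    if name in ('VolumeHalted', 'WriteTLog', 'WriteSnapshots'):
        return 'critical'   # the volume ends up halted
    return 'warning'        # mountpoint offlining, backend issues, failover, ...
```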