Context reference check is not collapsing whitespace

Open jstone-psi opened this issue 8 months ago • 3 comments

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

When I created test labels that have a line break in a particularly long observing system component name ("The Origins, Spectral Interpretation, Resource Identification, Security, Regolith Explorer (OSIRIS-REx) Spacecraft"), I noticed that the validate tool raised a warning that the component name did not match the context object.

This warning does not appear when there are no line breaks in the component name. However, it does appear again when I have multiple spaces in the name.

🕵️ Expected behavior

The data type of Observing_System_Component/name is UTF8_Short_String_Collapsed, so these extra spaces and line breaks should be collapsed into a single space before the check is performed. Once the values are collapsed, no warning should occur for these labels.
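
A minimal sketch of the comparison I would expect, assuming collapsing follows the XML Schema `whiteSpace="collapse"` rule (tabs, line feeds, and carriage returns become spaces, runs of spaces become one space, and leading/trailing spaces are dropped). This is not validate's actual code; the class name `WhitespaceCollapse` and the methods `collapse` and `namesMatch` are hypothetical and exist only for illustration.

```java
// Hypothetical helper, not part of validate: illustrates collapsing
// whitespace on both values before comparing them.
final class WhitespaceCollapse {
    private WhitespaceCollapse() {}

    // Collapse whitespace in the style of XML Schema whiteSpace="collapse".
    static String collapse(String value) {
        // Replace each run of tab, LF, CR, or space with a single space.
        String collapsed = value.replaceAll("[\\t\\n\\r ]+", " ");
        // Drop the single leading or trailing space that may remain.
        return collapsed.replaceAll("^ | $", "");
    }

    // Compare a name from the label with the name from the context object
    // after collapsing both sides.
    static boolean namesMatch(String labelName, String contextName) {
        return collapse(labelName).equals(collapse(contextName));
    }

    public static void main(String[] args) {
        String context = "The Origins, Spectral Interpretation, Resource Identification, "
                + "Security, Regolith Explorer (OSIRIS-REx) Spacecraft";
        // Same name with a line break, indentation, and a double space.
        String label = "The Origins, Spectral Interpretation, Resource Identification,\n"
                + "        Security,  Regolith Explorer (OSIRIS-REx) Spacecraft";

        System.out.println(label.equals(context));      // false
        System.out.println(namesMatch(label, context)); // true
    }
}
```

With a comparison like this, the wrapped and multi-space variants of the name would match the context object and no warning would be raised.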

📜 To Reproduce

  1. Find a label that validates correctly.
  2. Add extra spaces in the middle of the Observing_System_Component/name value.
  3. Try to validate the new label, and observe the warning.

🖥 Environment Info

  • Version of this software: validate 3.7.1
  • Operating System: macOS Sonoma 14.7.5 with OpenJDK 23.0.2

📚 Version of Software Used

validate 3.7.1

🩺 Test Data / Additional context

sample_labels.zip

orex-hk.txt

🦄 Related requirements

  • 🦄 https://github.com/NASA-PDS/validate/issues/970
  • 🦄 https://github.com/NASA-PDS/validate/issues/861
  • 🦄 #857

⚙️ Engineering Details

No response

🎉 Integration & Test

No response

jstone-psi, Jun 04 '25 17:06

@jordanpadams @jstone-psi

I am not a lawyer, nor do I play one on television, nor do I want to do either. However, the definition of UTF8_Short_String_Collapsed says that it is collapsed white space, not that it will be, should be, or could be. In other words, what is written in the XML is already collapsed as needed; it is not up to the reader to collapse it. Is this going to become a DDWG thing?

al-niessner, Jun 18 '25 21:06

I think you are not off the mark with your reading (the standards reference describes all of the collapsed data types as if only pre-collapsed data were allowed). If what you are saying is correct, then this means that there is still a different problem. It seems that this should have been a validation failure, since the original value contained invalid characters.

Something strikes me as not quite right about this answer, though. First, the example products have values that are not pre-collapsed. Additionally, the XML specifications say that this should be handled at the data-normalization step, not in storage.

https://www.w3.org/TR/xmlschema-2/#datatype-components
https://www.w3.org/TR/REC-xml/#AVNormalize

At this point, I'm willing to back away from this issue for now, but I'll leave these for reference.

jstone-psi, Jun 23 '25 17:06

@jstone-psi I hate to say it, but can we get a ticket opened with the DDWG to clarify what is meant here? I agree this is confusing, and we clearly are not checking this properly either in schematron or in our validation checks.

@rsjoyner any thoughts here?

jordanpadams, Jun 24 '25 20:06