grabbit icon indicating copy to clipboard operation
grabbit copied to clipboard

Filter trying to write invalid characters

Open jdigger opened this issue 9 years ago • 7 comments

Even though a Node name may not meet the JCR specification, it is still possible (especially for content migrated from old CQ instances) for bad data to come over.

Grabbit should either filter (not write) bad nodes while logging the error, rewrite the name/property to something legal while logging the error, or fail with a clearer error message.

See GH-114

jdigger avatar Jul 18 '16 22:07 jdigger

https://jackrabbit.apache.org/api/1.4/org/apache/jackrabbit/util/Text.html#escapeIllegalJcrChars(java.lang.String) Potentially for reference

jbornemann avatar Jul 20 '16 14:07 jbornemann

Hi @jbornemann I've tried rewriting the name in ProtoNodeDecorator#getOrCreateNode(Session) method, using https://jackrabbit.apache.org/api/1.4/org/apache/jackrabbit/util/Text.html#escapeIllegalJcrChars(java.lang.String), as well as with https://docs.adobe.com/content/docs/en/cq/5-6-1/javadoc/com/day/cq/commons/jcr/JcrUtil.html#createValidName(java.lang.String)

One issue i found with Text#escapeIllegalJcrChars is it will escape colon(:) as well, so a jcr:content node will be written as jcr%3Acontent in repository. Additionally, JcrUtil#createValidName will not be much useful either, as it converts a dot(.) and colon(:) into an underscore(_), as in a default.groovy will become default_groovy.

Should i write my own method which will filter only the needed characters viz "[, ]" as given in #114 and may be some others as well ?

Shashi-Bhushan avatar Jul 25 '16 10:07 Shashi-Bhushan

Funny enough, that method isn't wrong, in fact ':' is technically an invalid name character according to the spec, and Jackrabbit even admits so in https://wiki.apache.org/jackrabbit/EncodingAndEscaping; however in AEM we freely use it in things like jcr:content, and Jackrabbit seems to not restrict it's use. According to the spec, if you want to use ':', or '[' in your JCR implementation, you are supposed to map these restricted characters to special private-use code points defined in the spec.

Anyways, since we don't really understand the root cause of this issue, or really if there is a practical issue at all; since we support > CQ 5.6; I would vote no-action on this issue.

jbornemann avatar Jul 29 '16 14:07 jbornemann

@jdigger, @sagarsane what's your take on this one ?

Shashi-Bhushan avatar Aug 01 '16 09:08 Shashi-Bhushan

@sagarsane , @jdigger : Should this just be closed?

viveksachdeva avatar Aug 18 '16 16:08 viveksachdeva

I think its ok to keep this issue open (in-case we have cases where there is bad data already in AEM 6.x instances and then Grabbit is used to sync) .. but as Jeff said, we may probably not take any action on this right now ..

sagarsane avatar Aug 19 '16 03:08 sagarsane

FYI, #175 is another case of this this happening (like we saw in #114)

sagarsane avatar Feb 26 '17 00:02 sagarsane