
some questions regarding the new <t-hspace> tag

Open kosloot opened this issue 5 years ago • 7 comments

Recently a <t-hspace> tag was introduced, but when I started using it, some questions arose:

  1. It is possible to add some text to a <t-hspace>, like this: <t-hspace>extra text</t-hspace>. This is accepted by foliavalidator and folialint, but the text doesn't show up in the text() output, which is probably OK. In libfolia, however, it DOES show up, which I assume is a bug? But shouldn't we disallow this construct altogether, to avoid strange effects and misunderstandings?
  2. There are NO predefined class values for <t-hspace>. I understand the rationale, but that puts a big burden on every tool that wants to make use of it. They all have to create their own text() extraction functions, and they would be greatly helped by a predefined set that the libraries support, like "tab", "space", "wide-space", or such. I realize that defining such a set might be a challenge, but still: the text() function is very complex and replicating it is cumbersome (as handling of the 'tag' feature already showed us). Another possibility might be a way of providing a translation table for those class values: tab ==> '\t', space ==> ' _', wide-space ==> ' __'

kosloot, Apr 12 '21 09:04

  1. Good point, this is indeed not intentional and should be disallowed.
  2. We could define a set, implement some support for it in the libraries, and recommend its usage. It's then simply up to users whether they decide to use that set or not (i.e. it'll be an opt-in choice).

proycon, Apr 12 '21 10:04

> Good point, this is indeed not intentional and should be disallowed.

Maybe the same holds for a few of the other text markup tags too?

> We could define a set, implement some support for it in the libraries, and recommend its usage. It's then simply up to users whether they decide to use that set or not (i.e. it'll be an opt-in choice).

That would be great, leaving us with the challenge of creating a reasonable set.

kosloot, Apr 12 '21 10:04

We can simply forbid text in a TextMarkupHSpace by adding one line to folia_properties.cxx:

    //------ TextMarkupHSpace -------
    TextMarkupHSpace::PROPS = AbstractTextMarkup::PROPS;
    TextMarkupHSpace::PROPS.ACCEPTED_DATA.erase( XmlText_t );  // 1 extra line
    TextMarkupHSpace::PROPS.ELEMENT_ID = TextMarkupHSpace_t;

But maybe this is not generic enough?

Otherwise, XmlText_t could be removed from AbstractTextMarkup::PROPS and explicitly added for the subclasses it applies to?

kosloot, Apr 12 '21 12:04

Generally we have the TEXTCONTAINER property for this. ACCEPTED_DATA only carries FoLiA elements in my implementations.

proycon, Apr 12 '21 12:04

Ah, right. That is a better solution, and it works:

    folialint tests/bug59.xml
    tests/bug59.xml failed: XML error: found extra text 'test' inside element <t-hspace>, NOT allowed there.

The input contained:

    <div xml:id="example.div.4" class="section" n="4">
      <t>Space,<t-hspace>test</t-hspace>the<t-hspace/>final<t-hspace/><t-hspace/>frontier</t>
    </div>

kosloot, Apr 12 '21 12:04

OK, but there is still room for rather suspicious constructions like:

      <t>Space,<t-hspace><t-str>test</t-str><t-hbr>what</t-hbr></t-hspace>the<t-hspace/>final<t-hspace/><t-hspace/>frontier</t>

This passes folialint and foliavalidator, and both folia2txt and FoLiA-2text ignore everything inside the <t-hspace>, but it is still confusing and should be rejected, IMHO.

kosloot, Apr 12 '21 14:04

Agreed

proycon, Apr 13 '21 09:04