MathCAT Add a matrix cleanup for numeric indices

It is common to have a matrix like this one from Wikipedia:

In this case and likely many others, the indices are just numbers 11, 12, etc., and will be a single mn in MathML (e.g., <mn>11</mn>).

The cleanup code should try to find this and convert the <mn>11</mn> to

<mn>1</mn><mo>&#x2063;</mo><mn>1</mn>

Note: U+2063 is "invisible separator".

This would improve the speech significantly.
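A minimal sketch of the idea in Python, for illustration only (MathCAT itself is written in Rust, and this is not its code). The helper names `split_numeric_index` and `clean_subscripts` are hypothetical. The sketch assumes un-namespaced MathML and only handles two-digit indices inside `msub`. It also wraps the result in an `mrow`, an assumption made because `msub` takes exactly two children.

```python
import xml.etree.ElementTree as ET

INVISIBLE_SEPARATOR = "\u2063"  # U+2063, "invisible separator"


def split_numeric_index(mn):
    """Return an <mrow> replacing a two-digit <mn>, or None if it does not apply."""
    text = (mn.text or "").strip()
    if mn.tag != "mn" or len(text) != 2 or not (text.isascii() and text.isdigit()):
        return None
    first = ET.Element("mn")
    first.text = text[0]
    sep = ET.Element("mo")
    sep.text = INVISIBLE_SEPARATOR
    second = ET.Element("mn")
    second.text = text[1]
    mrow = ET.Element("mrow")  # assumption: keep msub at exactly two children
    mrow.extend([first, sep, second])
    return mrow


def clean_subscripts(root):
    """Rewrite <msub> subscripts like <mn>11</mn> in place (hypothetical helper)."""
    # Collect the msub elements first so the tree is not mutated while iterating.
    for msub in list(root.iter("msub")):
        children = list(msub)
        if len(children) != 2:
            continue
        replacement = split_numeric_index(children[1])
        if replacement is not None:
            msub.remove(children[1])
            msub.append(replacement)
    return root


root = ET.fromstring("<math><msub><mi>A</mi><mn>11</mn></msub></math>")
clean_subscripts(root)
print(ET.tostring(root, encoding="unicode"))
```

This version rewrites every two-digit subscript it finds; deciding when the rewrite is safe (for example, only inside a matrix) is a separate question.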

Jul 20 '23 06:07 NSoiffer

Typed without spaces:

$$ A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} $$

Typed with spaces:

$$ A = \begin{bmatrix} A_{1 1} & A_{1 2} \\ A_{2 1} & A_{2 2} \end{bmatrix} $$

For a matrix whose elements are indicated with numeric indices, I suspect the proposed pronunciation would be "cap A sub 1 1." This would make the reading more natural within the matrix itself, but would MathCAT try to apply such verbalization to $A_{21}$ outside of the matrix (elsewhere in a document)? How would MathCAT determine when to do so?

Also, it seems that the following LaTeX expressions lead to the intended speech (and braille, using a subscript comma ⠪⠀in Nemeth) when MathCAT encounters the Pandoc-generated MathML, but no difference is present when one tries to copy the LaTeX through MathCAT:

$A_{1 1}$ ( $A_{1 1}$ )

<math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>A</mi><mrow><mn>1</mn><mn>1</mn></mrow></msub><annotation encoding="application/x-tex">A_{1 1}</annotation></semantics></math>

$A_{1 \thinspace 1}$ ( $A_{1 \, 1}$ )

<math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>A</mi><mrow><mn>1</mn><mspace width="0.167em"></mspace><mn>1</mn></mrow></msub><annotation encoding="application/x-tex">A_{1 \, 1}</annotation></semantics></math>

$A_{11}$, however, is verbalized as "cap A eleven" and brailled as such:

<math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>A</mi><mn>11</mn></msub><annotation encoding="application/x-tex">A_{11}</annotation></semantics></math>

Perhaps the plan is to have $A_{11}$ read as if it were $A_{1 1}$ when occurring in a matrix, but it seems that Pandoc (as well as GitHub) makes it easy for authors to convey the intent of numeric indices, simply by leaving a space in the source LaTeX.

Feb 04 '25 22:02 NV-Codes

In general, it seems risky to turn an 11 into two ones, as in changing a_{11} to a_{1 1}. However, it seems a lot less risky when we are inside a table-like element, especially if the indices follow a pattern like 11, 12, etc. A check that may be simpler to implement: split the index into digits and verify that no digit exceeds the number of rows/columns. Indices like m1, mn, etc., are a bit trickier. So maybe relax the check even further to just being in a matrix/determinant? If it is a 10 row/column matrix (vector), then "10" might be wrongly spoken/brailled, but that seems like an unlikely case.
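The stricter digit check could look roughly like the sketch below. It is written in Python for illustration and is not MathCAT code. The function name `should_split_index` is hypothetical, and it assumes the first digit is the row and the second the column, each allowed to equal the matrix dimension.

```python
def should_split_index(index_text, n_rows, n_cols):
    """Guess whether a two-digit index inside a matrix means (row, column).

    Assumption: first digit is the row, second digit is the column, and both
    must lie in 1..n_rows and 1..n_cols respectively.
    """
    if len(index_text) != 2 or not (index_text.isascii() and index_text.isdigit()):
        return False  # symbolic indices such as "m1" or "mn" are not handled here
    row, col = int(index_text[0]), int(index_text[1])
    return 1 <= row <= n_rows and 1 <= col <= n_cols


# 2x2 matrix: 11, 12, 21, 22 qualify; 23 does not.
for idx in ["11", "12", "21", "22", "23"]:
    print(idx, should_split_index(idx, 2, 2))
```

With this check, "10" in a 10-row vector is left alone because the digit 0 is out of range. The relaxed rule (any matrix/determinant) would drop the range test and would split it.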

Feb 06 '25 19:02 NSoiffer