Add a matrix cleanup for numeric indices
It is common to have a matrix like this one from Wikipedia:
In this case, and likely many others, the indices are plain numbers (11, 12, etc.) and end up as a single `mn` in MathML (e.g., `<mn>11</mn>`).
The cleanup code should try to find this pattern and convert `<mn>11</mn>` to
`<mn>1</mn><mo>&#x2063;</mo><mn>1</mn>`
Note: U+2063 is "invisible separator".
This would improve the speech significantly.
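A minimal sketch of what such a cleanup could look like, written in Python with the standard `xml.etree.ElementTree` module rather than in MathCAT's own code. `split_matrix_indices` is a hypothetical name. The sketch assumes that only `msub` subscripts inside an `mtable` are touched and that the subscript is exactly two ASCII digits. It also wraps the three new elements in an `mrow`, because `msub` needs a single script child.

```python
import xml.etree.ElementTree as ET

INVISIBLE_SEPARATOR = "\u2063"


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def split_matrix_indices(math: ET.Element) -> int:
    """Hypothetical cleanup: inside any <mtable>, rewrite a two-digit <mn>
    subscript such as <mn>11</mn> as <mn>1</mn><mo>U+2063</mo><mn>1</mn>.

    Returns how many subscripts were rewritten.
    """
    # Collect candidates first so the tree is not modified while iterating it.
    candidates = [
        node
        for table in math.iter()
        if local_name(table.tag) == "mtable"
        for node in table.iter()
        if local_name(node.tag) == "msub"
    ]
    count = 0
    for msub in candidates:
        if len(msub) != 2:
            continue
        script = msub[1]
        text = (script.text or "").strip()
        # Only plain two-digit numbers (e.g. "11", "12", "21") are touched;
        # an already rewritten subscript is an <mrow> and is skipped.
        if local_name(script.tag) != "mn" or len(text) != 2:
            continue
        if not (text.isascii() and text.isdigit()):
            continue
        ns = script.tag[: len(script.tag) - len("mn")]  # "" or "{namespace}"
        row = ET.Element(ns + "mrow")
        first = ET.SubElement(row, ns + "mn")
        first.text = text[0]
        separator = ET.SubElement(row, ns + "mo")
        separator.text = INVISIBLE_SEPARATOR
        second = ET.SubElement(row, ns + "mn")
        second.text = text[1]
        row.tail = script.tail
        msub[1] = row  # replace the original <mn> with the new <mrow>
        count += 1
    return count
```

A subscript outside any table, such as a standalone $A_{21}$, is left alone. This keeps the rewrite limited to the matrix case.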
Typed without spaces:
$$ A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} $$
Typed with spaces:
$$ A = \begin{bmatrix} A_{1 1} & A_{1 2} \\ A_{2 1} & A_{2 2} \end{bmatrix} $$
For a matrix whose elements are indicated with numeric indices, I suspect the proposed pronunciation would be "cap A sub 1 1." This would make the reading more natural within the matrix itself, but would MathCAT try to apply such verbalization to $A_{21}$ outside of the matrix (elsewhere in a document)? How would MathCAT determine when to do so?
Also, it seems that the following LaTeX expressions produce the intended speech (and braille, using a subscript comma ⠪ in Nemeth) when MathCAT encounters the Pandoc-generated MathML, but no difference shows up when one tries to copy the LaTeX through MathCAT:
- `$A_{1 1}$` ($A_{1 1}$): `<math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>A</mi><mrow><mn>1</mn><mn>1</mn></mrow></msub><annotation encoding="application/x-tex">A_{1 1}</annotation></semantics></math>`
- `$A_{1 \thinspace 1}$` ($A_{1 \, 1}$): `<math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>A</mi><mrow><mn>1</mn><mspace width="0.167em"></mspace><mn>1</mn></mrow></msub><annotation encoding="application/x-tex">A_{1 \, 1}</annotation></semantics></math>`
$A_{11}$, however, is verbalized as "cap A eleven" and brailled as such:
`<math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>A</mi><mn>11</mn></msub><annotation encoding="application/x-tex">A_{11}</annotation></semantics></math>`
Perhaps the plan is to have $A_{11}$ read as if it were $A_{1 1}$ when occurring in a matrix, but it seems that Pandoc (as well as GitHub) makes it easy for authors to convey the intent of numeric indices, simply by leaving a space in the source LaTeX.
In general, it seems risky to change an 11 into two ones, as in changing a_{11} to a_{1 1}. However, it seems much less risky to do this inside a table-like element, especially if the indices follow a pattern like 11, 12, etc. Perhaps simpler to detect: the number of rows is at least the first digit of the index and the number of columns is at least the second digit. For symbolic indices such as m1, mn, etc., it's a bit trickier. So maybe relax the check even further to just being in a matrix/determinant? If it is a matrix (or vector) with 10 rows or columns, then "10" might be wrongly spoken/brailled, but that seems like an unlikely case.
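The two detection heuristics discussed here (indices bounded by the matrix size, and indices that follow the 11, 12, ... pattern) might be sketched in Python as shown below. Both function names are hypothetical. The sketch assumes 1-based indices written as exactly two digits, row digit first. It leaves out how the indices would be pulled out of an `mtable`; `None` simply marks a cell without a numeric subscript.

```python
from typing import List, Optional


def digits_fit_matrix(index: str, nrows: int, ncols: int) -> bool:
    """Hypothetical check: could a numeric subscript be a (row, column) pair
    for an nrows x ncols matrix?

    Assumes 1-based indices written as exactly two digits, row first.
    """
    if len(index) != 2 or not (index.isascii() and index.isdigit()):
        return False
    row, col = int(index[0]), int(index[1])
    # A zero digit can never be a 1-based row or column, so "10" is rejected.
    return 1 <= row <= nrows and 1 <= col <= ncols


def indices_match_positions(indices: List[List[Optional[str]]]) -> bool:
    """Hypothetical stricter check: the subscript in row i, column j
    (counting from 1) is exactly the digits "ij", as in 11, 12, 21, 22.

    `indices` holds one entry per cell; None marks a cell without a
    numeric subscript. At least one numeric subscript must be present.
    """
    found_any = False
    for i, row in enumerate(indices, start=1):
        for j, index in enumerate(row, start=1):
            if index is None:
                continue
            if index != f"{i}{j}":
                return False
            found_any = True
    return found_any
```

The bounds check is the looser of the two. It still accepts "11" in a 12-row column vector, where the subscript most likely means eleven. The position check avoids that false positive but only fires when the cells really are labelled by their own positions. The fully relaxed rule (just being inside a matrix/determinant) would skip both checks.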