map2loop-2

[BUG] iso characters

Open markjessell opened this issue 2 years ago • 2 comments

Describe your issue

If CODE field entries contain accented characters, e.g. "Amphibolites_et_métagabbros", then networkx fails to read the stratigraphic graph GML file.

Probably true for GROUP entries as well?

See https://stackoverflow.com/questions/61789659/networkx-impossible-to-read-my-gml-file-input-is-not-ascii-encoded

Minimal reproducing code example

Use accented characters in the field that is mapped to CODE.
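A minimal standalone sketch of the failure, outside map2loop. The GML layout below is an assumption (a single node with an accented label), not map2model's actual output; the point is only that the file contains the accented character as raw, unescaped bytes, which is enough for `nx.read_gml` to reject it.

```python
import os
import tempfile

import networkx as nx

# Hypothetical minimal GML file; map2model's real layout may differ.
gml_text = '''graph [
  node [
    id 0
    label "Amphibolites_et_métagabbros"
  ]
]
'''

path = os.path.join(tempfile.mkdtemp(), "strat_graph.gml")
# Write the accented character as raw UTF-8 bytes, without escaping it.
with open(path, "w", encoding="utf-8") as f:
    f.write(gml_text)

try:
    # Same call as in map2loop/topology.py.
    nx.read_gml(path, label="id")
except nx.NetworkXError as err:
    print(err)  # input is not ASCII-encoded
```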

Error message

  File "/home/mark/map2loop-2_latest/map2loop-2/map2loop/topology.py", line 39, in __init__
    self.graph = nx.read_gml(config.strat_graph_filename, label="id")
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/utils/decorators.py", line 766, in func
    return argmap._lazy_compile(__wrapper)(*args, **kwargs)
  File "<class 'networkx.utils.decorators.argmap'> compilation 5", line 5, in argmap_read_gml_1
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/readwrite/gml.py", line 195, in read_gml
    G = parse_gml_lines(filter_lines(path), label, destringizer)
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/readwrite/gml.py", line 438, in parse_gml_lines
    graph = parse_graph()
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/readwrite/gml.py", line 427, in parse_graph
    curr_token, dct = parse_kv(next(tokens))
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/readwrite/gml.py", line 373, in parse_kv
    curr_token, value = parse_dict(curr_token)
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/readwrite/gml.py", line 421, in parse_dict
    curr_token, dct = parse_kv(curr_token)
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/readwrite/gml.py", line 373, in parse_kv
    curr_token, value = parse_dict(curr_token)
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/readwrite/gml.py", line 421, in parse_dict
    curr_token, dct = parse_kv(curr_token)
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/readwrite/gml.py", line 358, in parse_kv
    curr_token = next(tokens)
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/readwrite/gml.py", line 314, in tokenize
    for line in lines:
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/readwrite/gml.py", line 188, in filter_lines
    raise NetworkXError("input is not ASCII-encoded") from err
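The traceback ends in `filter_lines`, which decodes each line of the file as ASCII and raises on the first line that fails. To find every offending line, something like the helper below could be run over the GML file. `find_non_ascii_lines` is a hypothetical name, not part of map2loop or networkx.

```python
def find_non_ascii_lines(path):
    """Return (line_number, text) for each line of `path` that is not pure ASCII.

    The file is read as bytes and each line is decoded with the 'ascii'
    codec, mirroring the per-line check networkx's read_gml performs.
    """
    offending = []
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("ascii")
            except UnicodeDecodeError:
                # Decode leniently only so the line can be shown to the user.
                text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
                offending.append((lineno, text))
    return offending
```

For example, `find_non_ascii_lines(config.strat_graph_filename)` would list the node labels that need fixing.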

markjessell · Sep 07 '23 06:09

Hi Mark,

Networkx does support accents in labels; the code below works, including writing and reading the GML file:

    import networkx as nx

    G = nx.Graph()
    G.add_node("é")
    print(G.nodes)
    nx.write_gml(G, "tmp.gml")
    G2 = nx.read_gml("tmp.gml")
    print(G2.nodes)
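The round trip works because `nx.write_gml` writes an ASCII-only file, replacing non-ASCII characters in labels with numeric character references, and `nx.read_gml` converts them back. A small sketch that shows what ends up in the file (a temporary path is used instead of `tmp.gml`):

```python
import os
import tempfile

import networkx as nx

G = nx.Graph()
G.add_node("Amphibolites_et_métagabbros")

path = os.path.join(tempfile.mkdtemp(), "tmp.gml")
nx.write_gml(G, path)

# Decoding the raw bytes as ASCII succeeds: write_gml replaced the
# accented character with a numeric character reference.
with open(path, "rb") as f:
    text = f.read().decode("ascii")
print(text)  # contains: label "Amphibolites_et_m&#233;tagabbros"

# read_gml turns the reference back into the original character.
G2 = nx.read_gml(path)
print(list(G2.nodes))  # ['Amphibolites_et_métagabbros']
```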

However, it seems that the GML written out by map2model doesn't check for non-ASCII characters in the node labels before writing them, so the unescaped characters corrupt the node labels (yEd still opens the GML file but displays the label garbled). As map2model is C++ code using ASCII rather than Python's UTF-8, this is what caused the problem.

Exploring further, this is only a problem for the "c" field, because the "u" and "g"/"g2" labels are not output by map2model.

In the example above, the 'é' character needs to be escaped as "&#233;", the HTML-style numeric character reference used in GML files. Either map2model can check for all the standard accented characters and replace them with their character references, or it could use a GML library so the encoding is done automatically.
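Until map2model itself escapes its output, the same escaping could be applied on the Python side as a post-processing step. A minimal sketch, assuming the map2model file is UTF-8 (if it is actually ISO-8859-1, pass `encoding="latin-1"`); `escape_gml_non_ascii` is a hypothetical helper, not an existing map2loop function. Python's built-in `xmlcharrefreplace` error handler produces exactly the `&#233;` form that `read_gml` unescapes.

```python
def escape_gml_non_ascii(src_path, dst_path, encoding="utf-8"):
    """Copy a GML file, replacing every non-ASCII character with &#NNN;.

    `encoding` is the encoding the file was written in; UTF-8 is an
    assumption about map2model's output.
    """
    with open(src_path, "r", encoding=encoding) as f:
        text = f.read()
    # Each character that cannot be encoded as ASCII becomes a decimal
    # character reference, e.g. "é" -> "&#233;".
    ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
    with open(dst_path, "wb") as f:
        f.write(ascii_bytes)
```

In topology.py this would mean escaping `config.strat_graph_filename` into a second file and passing that file to `nx.read_gml(..., label="id")`. Fixing the writer in map2model remains the cleaner long-term option, since every consumer of the GML then gets a valid file.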

RoyThomsonMonash · Sep 13 '23 02:09

Thanks Roy

I came across this as a way to normalise the strings before reading them in. Since GML is just a text file anyway, could we run this over the file before networkx reads it?

https://stackoverflow.com/questions/44431730/how-to-replace-accented-characters
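One common way to do this with only the standard library is Unicode NFKD normalisation followed by dropping combining marks; this is a sketch of that general technique, not necessarily the approach in the linked answer. `strip_accents` and `normalise_gml_file` are hypothetical names. Letters with no ASCII base form (e.g. "ß" or "ø") survive NFKD unchanged, so anything left over is escaped to keep `read_gml`'s ASCII check satisfied.

```python
import unicodedata


def strip_accents(text):
    """Replace accented letters with their base letters ("é" -> "e").

    NFKD splits "é" into "e" plus a combining acute accent; dropping
    the combining marks leaves the plain letter.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalise_gml_file(src_path, dst_path, encoding="utf-8"):
    """Strip accents from a GML file, escaping any non-ASCII leftovers."""
    with open(src_path, "r", encoding=encoding) as f:
        text = strip_accents(f.read())
    with open(dst_path, "wb") as f:
        f.write(text.encode("ascii", errors="xmlcharrefreplace"))


print(strip_accents("Amphibolites_et_métagabbros"))  # Amphibolites_et_metagabbros
```

Unlike escaping, this changes the unit names themselves ("métagabbros" becomes "metagabbros"), which may matter if the graph labels are later matched against the original CODE values in the map data.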

m

markjessell · Sep 13 '23 08:09