WeakRefStrings.jl

setindex! incorrect for non-UTF-8 strings?

Open nalimilan opened this issue 6 years ago • 2 comments

These two lines don't seem correct to me for non-UTF-8 AbstractString types: https://github.com/JuliaData/WeakRefStrings.jl/blob/caf4ed477e493309d12502ab0984eec157120925/src/WeakRefStrings.jl#L369-L370

Indeed, they copy the raw contents of the string even if it uses a different encoding from the existing data.

nalimilan, May 07 '19 20:05

What would you suggest, though? Is there a standard API for getting the encoding of a string, or for converting it to UTF-8? Or, if it's not in the encoding of the rest of the array, should we reject it?

quinnj, May 07 '19 21:05

AFAIK there's no API to get the encoding of a string, but that would be a logical complement to codeunit/codeunits. BTW, there's no guarantee that you can call pointer on an AbstractString and get a pointer to the data: one would need to use codeunits anyway, even if the encoding matched.
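To illustrate the point about codeunit/codeunits, here is a small sketch using only Base functions. The variable names are just for illustration. It shows that the code unit type can be queried for any AbstractString, but that knowing it is UInt8 does not by itself tell you the encoding is UTF-8 (a Latin-1 string type would report UInt8 too).

```julia
s = "naïve"
sub = SubString(s, 1, 3)      # "naï", an AbstractString that is not a String

# codeunit(str) returns the code unit *type*; it says nothing about the encoding.
unit_type = codeunit(sub)     # UInt8 for SubString{String}

# codeunits(str) gives a vector-like view of the code units without
# relying on pointer(str) being defined for this string type.
bytes = collect(codeunits(sub))

println(unit_type)
println(bytes)
```

So a check like codeunit(x) === UInt8 is at best a necessary condition for copying bytes directly, not a sufficient one.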

Until a better API is available, I guess the only solution is to have a fast method for String with StringArray{<:Union{Missing, String}}, and a slower method iterating over characters for other cases.
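A minimal sketch of that fast/slow split, assuming the target buffer holds UTF-8 bytes. Note that append_utf8! is a hypothetical helper written for this example, not a function from WeakRefStrings.jl, and a plain Vector{UInt8} stands in for the StringArray's backing buffer.

```julia
# Hypothetical helper (not part of WeakRefStrings.jl): append the UTF-8
# bytes of `s` to `buf`, choosing the path by dispatch.

# Fast path: String is stored as UTF-8, so its code units can be copied as-is.
function append_utf8!(buf::Vector{UInt8}, s::String)
    append!(buf, codeunits(s))
    return buf
end

# Slow path: for any other AbstractString, assume nothing about the encoding.
# Iterate over characters and re-encode each one as UTF-8 via a String.
function append_utf8!(buf::Vector{UInt8}, s::AbstractString)
    for c in s
        append!(buf, codeunits(string(c)))
    end
    return buf
end

buf = UInt8[]
append_utf8!(buf, "héllo")                  # dispatches to the String method
append_utf8!(buf, SubString("wörld", 1))    # dispatches to the generic method
println(String(copy(buf)))
```

SubString{String} happens to be UTF-8 as well, so it could get its own fast method; it is used here only because it is a non-String AbstractString available in Base.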

nalimilan, May 08 '19 09:05