Internationalization support
I cannot find anything related to internationalization support, either in the documentation or in the code itself. What is the status of internationalization support in dicomParser?
I have not checked, but I suspect some strings won't play well with JSON, which is limited to UTF-8/UTF-16/UTF-32 strings.
Ref:
Here is what I get using the DICOM Dump to JSON live example:

while it should look like:

ref:
readFixedString does not seem to check the value of SpecificCharacterSet (0008,0005), as seen at:
- https://github.com/cornerstonejs/dicomParser/blob/82573d94342e12b4bae4fcd2e93f91bb061474cf/src/byteArrayParser.js#L29-L37
refs:
- http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.12.html#table_C.12-1
- http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.12.html#sect_C.12.1.1.2
@malaterre Currently dicomParser itself doesn't do any character set decoding. However, if you need it now, you can pair dicomParser with the dicom-character-set library that I wrote.
@yagni That looks pretty promising. One thing I still fail to understand. The original string function from dicomParser seems to be doing the following:
- Take each raw byte
- Treat it as an ISO-8859-1 character and turn it into UTF-16
- Return a truncated string (stop at the first byte === 0)
So I am wondering what the expected input to your library is. Can I pass it the output of the string element function directly?
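To make the three steps above concrete, here is a minimal sketch of that behaviour and of why its output cannot be fed to a character set decoder. `latin1UntilNul` is a hypothetical name used for illustration only, not dicomParser's actual function, and the sample bytes are made up.

```javascript
// Hypothetical helper mirroring the steps listed above (not dicomParser's API).
function latin1UntilNul(byteArray, position, length) {
  let result = '';
  for (let i = 0; i < length; i++) {
    const byte = byteArray[position + i];
    if (byte === 0) {
      break; // stop at the first zero byte, C-string style
    }
    // Each byte becomes one UTF-16 code unit with the same numeric value,
    // which is exactly an ISO-8859-1 interpretation of the byte.
    result += String.fromCharCode(byte);
  }
  return result;
}

// "Jérôme" encoded as UTF-8 (8 bytes for 6 characters).
const utf8NameBytes = new Uint8Array([0x4a, 0xc3, 0xa9, 0x72, 0xc3, 0xb4, 0x6d, 0x65]);

// Byte-per-character reading: every UTF-8 lead/continuation byte turns into
// its own Latin-1 character, so the string has 8 characters, not 6.
const mangledName = latin1UntilNul(utf8NameBytes, 0, utf8NameBytes.length);

// Decoding the untouched bytes with the right charset recovers the name.
const decodedName = new TextDecoder('utf-8').decode(utf8NameBytes);
```

The string produced by the first call has already lost the distinction between "byte" and "character", which is why a decoder needs the raw bytes instead.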
I didn't have any experience with DICOM character sets so didn't factor it into the original design. I like the idea of putting it in a separate library like @yagni did so it can be added in for those that need it. I specifically didn't add image decompression to this library for the same reason.
@chafey I am pretty sure that this test is just wrong:
if (byte === 0) {
This feels like a C-string ASCII terminator. I am sure we can have byte === 0 in Unicode (we should rely only on the length).
@malaterre You'll need to pass the raw bytes to dicom-character-set. If I remember correctly, fromCharCode converts each byte to a UTF-16 code unit, so you end up with a string that no longer matches the bytes in the original data. So just slice the byteArray starting at the element's dataOffset and going for its length number of bytes, then pass that into dicom-character-set, along with the Specific Character Set and optional VR (see the readme for more details).
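A rough sketch of the slicing step, assuming a data set object shaped like dicomParser's (a `byteArray` plus `elements` entries carrying `dataOffset` and `length`). `rawElementBytes` is a hypothetical helper name, and the data set below is a hand-built stand-in rather than real parser output.

```javascript
// Hypothetical helper: return a view over an element's value bytes,
// without copying and without any character decoding.
function rawElementBytes(dataSet, tag) {
  const element = dataSet.elements[tag];
  if (element === undefined) {
    return undefined; // element not present in this data set
  }
  // subarray() takes offsets relative to byteArray itself, so this still
  // works when byteArray is a view that starts partway into its ArrayBuffer.
  return dataSet.byteArray.subarray(
    element.dataOffset,
    element.dataOffset + element.length
  );
}

// Stand-in data set: two unrelated leading bytes, then "DOE^J " (6 bytes).
const backingBytes = new Uint8Array([0xff, 0xff, 0x44, 0x4f, 0x45, 0x5e, 0x4a, 0x20]);
const fakeDataSet = {
  byteArray: backingBytes.subarray(2), // view with a non-zero byteOffset
  elements: { x00100010: { dataOffset: 0, length: 6 } },
};

const slicedNameBytes = rawElementBytes(fakeDataSet, 'x00100010');
```

The result is a `Uint8Array` sharing memory with the original buffer, so it can be handed to a decoder as-is.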
@yagni Thanks for the confirmation. @chafey it would be nice to document what string is actually doing. I hope the next version will offer a function rawString, that would be clearer (IMHO).
It probably makes sense to revisit the whole repo in light of non-ASCII character sets. Lots of code is using this library now, and we should not be propagating designs that are not character-set aware.
Hello!
Is there any new information about this feature?
Or maybe someone can help me understand how to get the raw data from a tag, so I can parse it myself?
For example, how can I get the binary data from "x00100010"?

Thanks a lot!
@creemer To get the raw data, create a Uint8Array at the data offset of the element (like we do in the readme) of the appropriate length:
const patientNameElement = dataSet.elements.x00100010;
const patientNameBytes = new Uint8Array(dataSet.byteArray.buffer, patientNameElement.dataOffset, patientNameElement.length);
If you don't want to parse those bytes yourself at this point, you can pass them, along with the value of the Specific Character Set element, to my dicom-character-set library:
import { convertBytes } from 'dicom-character-set';
const str = convertBytes(dataSet.string('x00080005'), patientNameBytes, {vr: 'PN'});
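For anyone who would rather parse the bytes themselves, here is a minimal sketch covering only the simplest cases: a single-valued Specific Character Set that is empty, ISO_IR 6, ISO_IR 100, or ISO_IR 192. `decodeSingleCharset` is a hypothetical name, it is not part of dicomParser or dicom-character-set, and it deliberately rejects everything else (including the multi-valued ISO 2022 code extensions).

```javascript
// Hypothetical decoder for a few single-byte/UTF-8 cases only.
function decodeSingleCharset(specificCharacterSet, bytes) {
  const term = (specificCharacterSet || '').trim();
  let text;
  if (term === 'ISO_IR 192') {
    // ISO_IR 192 is UTF-8.
    text = new TextDecoder('utf-8').decode(bytes);
  } else if (term === '' || term === 'ISO_IR 6' || term === 'ISO_IR 100') {
    // ISO_IR 100 is Latin-1, where each byte equals its Unicode code point.
    // Assumption: the default repertoire is read leniently as Latin-1 too.
    text = Array.from(bytes, (b) => String.fromCharCode(b)).join('');
  } else {
    throw new Error('Unsupported Specific Character Set in this sketch: ' + term);
  }
  // Drop trailing padding (space or NUL) used to reach an even length.
  return text.replace(/[ \0]+$/, '');
}

// Made-up sample values: "Jé" plus one padding space in each encoding.
const fromUtf8 = decodeSingleCharset('ISO_IR 192', new Uint8Array([0x4a, 0xc3, 0xa9, 0x20]));
const fromLatin1 = decodeSingleCharset('ISO_IR 100', new Uint8Array([0x4a, 0xe9, 0x20]));
```

In real code, the first argument would be the value of dataSet.string('x00080005') and the second the raw bytes of the element. Anything beyond these few cases is better left to a dedicated library.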
@yagni Thanks a lot! It is all I need :)