Support for SPSS multiple response categories

Open Berndvanderwielen opened this issue 6 years ago • 9 comments

Support to retrieve (meta) data on MRVs / multiple answer question groupings would be great.

Berndvanderwielen avatar May 24 '19 13:05 Berndvanderwielen

Sorry, no idea what MRVs are. Can you please provide an example SPSS file and explain what it is and what information you are trying to retrieve?

ofajardo avatar May 24 '19 14:05 ofajardo

The official name is "Multiple Response sets". URL: https://www.ibm.com/support/knowledgecenter/en/SSLVMB_23.0.0/spss/base/multiple_response_intro.html

If the SPSS documentation is not enough I can provide an example SPSS file.

Berndvanderwielen avatar May 27 '19 15:05 Berndvanderwielen

Yes, a sample file will be needed. In addition, a plain-text description of its contents would help, because the job is to work out where in the binary file the content you are looking for is stored.

This will require changes to the Readstat C library. I can file an issue over there, or you can do it yourself if you prefer. There is no guarantee that they will do it, and no timeline either.

It would help a lot if there were a description somewhere of how these fields are represented in the binary file. If you could find one, that would be great, because otherwise this will be very difficult to implement. An example of such a specification is here or here, but these don't seem to explain the feature you are requesting (can you see them?). Other libraries, in Python or other languages, that can do the job could also be useful to look at.

ofajardo avatar May 27 '19 17:05 ofajardo

Closed due to lack of example files.

ofajardo avatar Oct 14 '19 20:10 ofajardo

@ofajardo, sorry for the ping, but I assume you're not receiving comments on closed issues.

Could this be reopened?

The specs for this record in the SPSS file are actually part of the spec you linked: https://www.gnu.org/software/pspp/pspp-dev/html_node/Multiple-Response-Sets-Records.html

Essentially, the record specifies the relationship between multiple questions that should be interpreted as a single question with multiple values instead.

Attached is an example file (example.sav.zip) that contains 2 sets, one multiple category and one multiple dichotomy. For details you could check the docs here (https://www.gnu.org/software/pspp/manual/html_node/MRSETS.html#MRSETS), but they are not relevant to the implementation.
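As a rough illustration of the two set types, here is a minimal sketch in plain Python. It does not read a SAV file and is not the pyreadstat or Readstat API; the class `MultipleResponseSet`, the function `selected_answers`, and all variable names and values are hypothetical, chosen only to show how several columns collapse into one multi-valued answer.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class MultipleResponseSet:
    """Hypothetical container for one set (not a pyreadstat type)."""
    name: str
    kind: str                  # "dichotomy" or "category"
    variables: List[str]       # the columns grouped into this set
    counted_value: Any = None  # only used for dichotomy sets


def selected_answers(row: Dict[str, Any], mr_set: MultipleResponseSet) -> List[Any]:
    """Collapse the grouped columns of one respondent into a list of answers."""
    if mr_set.kind == "dichotomy":
        # Each variable is one option; it counts as chosen when it holds
        # the counted value (for example 1 = "yes").
        return [v for v in mr_set.variables if row.get(v) == mr_set.counted_value]
    if mr_set.kind == "category":
        # Each variable holds one chosen category; missing slots are skipped.
        return [row[v] for v in mr_set.variables if row.get(v) is not None]
    raise ValueError(f"unknown set kind: {mr_set.kind!r}")


# Made-up data for a single respondent.
row = {
    "fruit_apple": 1, "fruit_pear": 0, "fruit_kiwi": 1,
    "color_1": "red", "color_2": "blue", "color_3": None,
}
fruits = MultipleResponseSet(
    "$fruits", "dichotomy", ["fruit_apple", "fruit_pear", "fruit_kiwi"], counted_value=1
)
colors = MultipleResponseSet("$colors", "category", ["color_1", "color_2", "color_3"])

print(selected_answers(row, fruits))  # ['fruit_apple', 'fruit_kiwi']
print(selected_answers(row, colors))  # ['red', 'blue']
```

The metadata being requested is essentially the `name`, `kind`, `variables` and counted value of each set; the data columns themselves are ordinary variables.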

Let me know if I can be of further assistance, I'm not familiar with python at all, but have spent many hours hating the binary file format that is SPSS SAV...

SamMousa avatar Nov 09 '20 14:11 SamMousa

hi there,

I haven't looked into it in detail yet, but this will require Readstat (the C library behind pyreadstat) to implement it.

Could you therefore open an issue there? (I am sure they will appreciate your insights into the binary file format). Once it is implemented in Readstat I will be able to bring it into pyreadstat.

ofajardo avatar Nov 09 '20 15:11 ofajardo

I've opened a new PR, #259, to address this, in coordination with our team at Crunch.io and Evan Miller. We'll also open a PR on readstat, so this won't be available immediately. The idea is to rebase the ☝️ PR once the readstat changes are shipped.

slobodan-ilic avatar Apr 24 '24 07:04 slobodan-ilic

This feature will be hugely appreciated.

arsoni20 avatar Apr 25 '24 12:04 arsoni20