OpenCRE Changes in db.py for solving issue #546

Here ,to ensure that ISO numbers are treated as strings when fetched from the database and returned in API responses ,I have made changes to db.py and web_main.py keeping the resource fetching as dynamic .

Mar 19 '25 04:03 Pa04rth

Hey @Pa04rth thansk for the effort, I'm not sure this approach will work as the ISO numbers have already been stored in the database in the "wrong" format.

e.g.

37a59bb5-d416-4ebe-ae54-2f7608de0b05|ISO 27001|Acceptable use of information and other associated assets|||||Standard||5.1

that 5.1 should have been 5.10

I think the problem is with importing, if you look here you will see that the spreadsheet by default is interpreted as numbers . I think what you need to do is pass numericise_ignore to the spreadsheet reading piece and re-import

https://docs.gspread.org/en/latest/api/models/worksheet.html#gspread.worksheet.Worksheet.get_all_records

this will be a bigger piece of work however, the importing stuff is a pretty big PITA although it's pretty well tested

Mar 19 '25 09:03 northdpole

Hey @northdpole , extremely sorry that my this approach didn't work , but according to what you have told , I think we must make the following changes in the spreadsheet_parser file ,in order to change the interpretation of the spreadsheet -

Importing the gspread {library which interpretate the spreadsheet} and oauth2client , in order to be able to read the spreadsheet
Add a function to read the spreadsheet using the numericise_ignore parameter. [As per the website which you linked above] `def read_spreadsheet(file_path: str) -> List[Dict[str, Any]]: scope = ["https://spreadsheets.google.com/feeds", "https://www.googleapis.com/auth/drive"] creds = ServiceAccountCredentials.from_json_keyfile_name("path/to/credentials.json", scope) client = gspread.authorize(creds) sheet = client.open("Spreadsheet_Name").sheet1

numericise_ignore = ["ISO Number"] records = sheet.get_all_records(numericise_ignore=numericise_ignore) return records` 3.The above function I have created ,which will first open the spreadsheet and then specify the columns that should be ignored for numeric conversion.
Then make the required changes in "parse_export_format" function , so that it uses new reading function

If you find any anomaly with my solution please tell me , we may discuss it here or in Slack .But I think this may solve the issue .

Mar 23 '25 06:03 Pa04rth

And the second way to do the same thing without using google authentication approach is using "Pandas Library" to read files instead of gspread . By using the "dtype()" parameter , it will treat certain columns as strings .The following is the function which I have created ,which may give you an insight of how this approach will work - def read_spreadsheet(file_path: str) -> List[Dict[str, Any]]: dtype = { "ISO Number": str, } df = pd.read_csv(file_path, dtype=dtype) records = df.to_dict(orient='records') return records

Mar 23 '25 08:03 Pa04rth

Hey @northdpole , extremely sorry that my this approach didn't work , but according to what you have told , I think we must make the following changes in the spreadsheet_parser file ,in order to change the interpretation of the spreadsheet -

Importing the gspread {library which interpretate the spreadsheet} and oauth2client , in order to be able to read the spreadsheet

Add a function to read the spreadsheet using the numericise_ignore parameter. [As per the website which you linked above] def read_spreadsheet(file_path: str) -> List[Dict[str, Any]]: scope = ["https://spreadsheets.google.com/feeds", "https://www.googleapis.com/auth/drive"] creds = ServiceAccountCredentials.from_json_keyfile_name("path/to/credentials.json", scope) client = gspread.authorize(creds) sheet = client.open("Spreadsheet_Name").sheet1 numericise_ignore = ["ISO Number"] records = sheet.get_all_records(numericise_ignore=numericise_ignore) return records 3.The above function I have created ,which will first open the spreadsheet and then specify the columns that should be ignored for numeric conversion.

Then make the required changes in "parse_export_format" function , so that it uses new reading function

If you find any anomaly with my solution please tell me , we may discuss it here or in Slack .But I think this may solve the issue .

please don't apologise, i should have explained the problem better.

the spreadsheet_parsers file already parses the incoming spreadsheet with gspread, the problem is that afaik gspread is by default interpreting everything as a number if it can and as a string if the number conversion fails, look into the spreadsheet parsers file for where i already read the spreadsheet and try adjusting the parameters of gspread until iso reads correctly (you may need to also adjust the tests when this happens)

Mar 23 '25 15:03 northdpole

Ensuring Correct Data Types: By adding the numericise_ignore parameter, the function ensures that the "ISO Number" column is treated as a string, even if it contains numeric values. This prevents issues with numeric conversion and ensures that the data is read correctly.
Maintaining Functionality: The changes do not affect the overall functionality of the read_spreadsheet function. The function still reads the spreadsheet, processes the worksheets, and returns the data as a dictionary of documents.
Handling Known Issues: The numericise_ignore parameter is used as a workaround for a known issue in gspread (https://github.com/burnash/gspread/issues/1007). This ensures that the function handles the data correctly, even if the column names are in a different line.

I have added changes in spreadsheet_test file but in commented form so that you may take a look , I am in a doubt that weather this test case which I have written is correct or not .

Mar 24 '25 14:03 Pa04rth

Here in these new changes I have also reverted back the old ones ,which were not required

Mar 24 '25 14:03 Pa04rth

Added the test case , you may merge this now @northdpole

Mar 27 '25 09:03 Pa04rth