OpenCRE icon indicating copy to clipboard operation
OpenCRE copied to clipboard

Changes in db.py for solving issue #546

Open Pa04rth opened this issue 10 months ago • 7 comments

Here ,to ensure that ISO numbers are treated as strings when fetched from the database and returned in API responses ,I have made changes to db.py and web_main.py keeping the resource fetching as dynamic .

Pa04rth avatar Mar 19 '25 04:03 Pa04rth

Hey @Pa04rth thansk for the effort, I'm not sure this approach will work as the ISO numbers have already been stored in the database in the "wrong" format.

e.g.

37a59bb5-d416-4ebe-ae54-2f7608de0b05|ISO 27001|Acceptable use of information and other associated assets|||||Standard||5.1  

that 5.1 should have been 5.10

I think the problem is with importing, if you look here you will see that the spreadsheet by default is interpreted as numbers . I think what you need to do is pass numericise_ignore to the spreadsheet reading piece and re-import

https://docs.gspread.org/en/latest/api/models/worksheet.html#gspread.worksheet.Worksheet.get_all_records

this will be a bigger piece of work however, the importing stuff is a pretty big PITA although it's pretty well tested

northdpole avatar Mar 19 '25 09:03 northdpole

Hey @northdpole , extremely sorry that my this approach didn't work , but according to what you have told , I think we must make the following changes in the spreadsheet_parser file ,in order to change the interpretation of the spreadsheet -

  1. Importing the gspread {library which interpretate the spreadsheet} and oauth2client , in order to be able to read the spreadsheet

  2. Add a function to read the spreadsheet using the numericise_ignore parameter. [As per the website which you linked above] `def read_spreadsheet(file_path: str) -> List[Dict[str, Any]]: scope = ["https://spreadsheets.google.com/feeds", "https://www.googleapis.com/auth/drive"] creds = ServiceAccountCredentials.from_json_keyfile_name("path/to/credentials.json", scope) client = gspread.authorize(creds) sheet = client.open("Spreadsheet_Name").sheet1

    numericise_ignore = ["ISO Number"] records = sheet.get_all_records(numericise_ignore=numericise_ignore) return records` 3.The above function I have created ,which will first open the spreadsheet and then specify the columns that should be ignored for numeric conversion.

  3. Then make the required changes in "parse_export_format" function , so that it uses new reading function

If you find any anomaly with my solution please tell me , we may discuss it here or in Slack .But I think this may solve the issue .

Pa04rth avatar Mar 23 '25 06:03 Pa04rth

And the second way to do the same thing without using google authentication approach is using "Pandas Library" to read files instead of gspread . By using the "dtype()" parameter , it will treat certain columns as strings .The following is the function which I have created ,which may give you an insight of how this approach will work - def read_spreadsheet(file_path: str) -> List[Dict[str, Any]]: dtype = { "ISO Number": str, } df = pd.read_csv(file_path, dtype=dtype) records = df.to_dict(orient='records') return records

Pa04rth avatar Mar 23 '25 08:03 Pa04rth

Hey @northdpole , extremely sorry that my this approach didn't work , but according to what you have told , I think we must make the following changes in the spreadsheet_parser file ,in order to change the interpretation of the spreadsheet -

  1. Importing the gspread {library which interpretate the spreadsheet} and oauth2client , in order to be able to read the spreadsheet
  2. Add a function to read the spreadsheet using the numericise_ignore parameter. [As per the website which you linked above] def read_spreadsheet(file_path: str) -> List[Dict[str, Any]]: scope = ["https://spreadsheets.google.com/feeds", "https://www.googleapis.com/auth/drive"] creds = ServiceAccountCredentials.from_json_keyfile_name("path/to/credentials.json", scope) client = gspread.authorize(creds) sheet = client.open("Spreadsheet_Name").sheet1 numericise_ignore = ["ISO Number"] records = sheet.get_all_records(numericise_ignore=numericise_ignore) return records 3.The above function I have created ,which will first open the spreadsheet and then specify the columns that should be ignored for numeric conversion.
  3. Then make the required changes in "parse_export_format" function , so that it uses new reading function

If you find any anomaly with my solution please tell me , we may discuss it here or in Slack .But I think this may solve the issue .

please don't apologise, i should have explained the problem better.

the spreadsheet_parsers file already parses the incoming spreadsheet with gspread, the problem is that afaik gspread is by default interpreting everything as a number if it can and as a string if the number conversion fails, look into the spreadsheet parsers file for where i already read the spreadsheet and try adjusting the parameters of gspread until iso reads correctly (you may need to also adjust the tests when this happens)

northdpole avatar Mar 23 '25 15:03 northdpole

  • Ensuring Correct Data Types: By adding the numericise_ignore parameter, the function ensures that the "ISO Number" column is treated as a string, even if it contains numeric values. This prevents issues with numeric conversion and ensures that the data is read correctly.
  • Maintaining Functionality: The changes do not affect the overall functionality of the read_spreadsheet function. The function still reads the spreadsheet, processes the worksheets, and returns the data as a dictionary of documents.
  • Handling Known Issues: The numericise_ignore parameter is used as a workaround for a known issue in gspread (https://github.com/burnash/gspread/issues/1007). This ensures that the function handles the data correctly, even if the column names are in a different line.

I have added changes in spreadsheet_test file but in commented form so that you may take a look , I am in a doubt that weather this test case which I have written is correct or not .

Pa04rth avatar Mar 24 '25 14:03 Pa04rth

Here in these new changes I have also reverted back the old ones ,which were not required

Pa04rth avatar Mar 24 '25 14:03 Pa04rth

Added the test case , you may merge this now @northdpole

Pa04rth avatar Mar 27 '25 09:03 Pa04rth