Changes in db.py for solving issue #546
Here ,to ensure that ISO numbers are treated as strings when fetched from the database and returned in API responses ,I have made changes to db.py and web_main.py keeping the resource fetching as dynamic .
Hey @Pa04rth thansk for the effort, I'm not sure this approach will work as the ISO numbers have already been stored in the database in the "wrong" format.
e.g.
37a59bb5-d416-4ebe-ae54-2f7608de0b05|ISO 27001|Acceptable use of information and other associated assets|||||Standard||5.1
that 5.1 should have been 5.10
I think the problem is with importing, if you look here you will see that the spreadsheet by default is interpreted as numbers . I think what you need to do is pass numericise_ignore to the spreadsheet reading piece and re-import
https://docs.gspread.org/en/latest/api/models/worksheet.html#gspread.worksheet.Worksheet.get_all_records
this will be a bigger piece of work however, the importing stuff is a pretty big PITA although it's pretty well tested
Hey @northdpole , extremely sorry that my this approach didn't work , but according to what you have told , I think we must make the following changes in the spreadsheet_parser file ,in order to change the interpretation of the spreadsheet -
-
Importing the gspread {library which interpretate the spreadsheet} and oauth2client , in order to be able to read the spreadsheet
-
Add a function to read the spreadsheet using the numericise_ignore parameter. [As per the website which you linked above] `def read_spreadsheet(file_path: str) -> List[Dict[str, Any]]: scope = ["https://spreadsheets.google.com/feeds", "https://www.googleapis.com/auth/drive"] creds = ServiceAccountCredentials.from_json_keyfile_name("path/to/credentials.json", scope) client = gspread.authorize(creds) sheet = client.open("Spreadsheet_Name").sheet1
numericise_ignore = ["ISO Number"] records = sheet.get_all_records(numericise_ignore=numericise_ignore) return records` 3.The above function I have created ,which will first open the spreadsheet and then specify the columns that should be ignored for numeric conversion.
-
Then make the required changes in "parse_export_format" function , so that it uses new reading function
If you find any anomaly with my solution please tell me , we may discuss it here or in Slack .But I think this may solve the issue .
And the second way to do the same thing without using google authentication approach is using "Pandas Library" to read files instead of gspread .
By using the "dtype()" parameter , it will treat certain columns as strings .The following is the function which I have created ,which may give you an insight of how this approach will work -
def read_spreadsheet(file_path: str) -> List[Dict[str, Any]]: dtype = { "ISO Number": str, } df = pd.read_csv(file_path, dtype=dtype) records = df.to_dict(orient='records') return records
Hey @northdpole , extremely sorry that my this approach didn't work , but according to what you have told , I think we must make the following changes in the spreadsheet_parser file ,in order to change the interpretation of the spreadsheet -
- Importing the gspread {library which interpretate the spreadsheet} and oauth2client , in order to be able to read the spreadsheet
- Add a function to read the spreadsheet using the numericise_ignore parameter. [As per the website which you linked above]
def read_spreadsheet(file_path: str) -> List[Dict[str, Any]]: scope = ["https://spreadsheets.google.com/feeds", "https://www.googleapis.com/auth/drive"] creds = ServiceAccountCredentials.from_json_keyfile_name("path/to/credentials.json", scope) client = gspread.authorize(creds) sheet = client.open("Spreadsheet_Name").sheet1 numericise_ignore = ["ISO Number"] records = sheet.get_all_records(numericise_ignore=numericise_ignore) return records3.The above function I have created ,which will first open the spreadsheet and then specify the columns that should be ignored for numeric conversion.- Then make the required changes in "parse_export_format" function , so that it uses new reading function
If you find any anomaly with my solution please tell me , we may discuss it here or in Slack .But I think this may solve the issue .
please don't apologise, i should have explained the problem better.
the spreadsheet_parsers file already parses the incoming spreadsheet with gspread, the problem is that afaik gspread is by default interpreting everything as a number if it can and as a string if the number conversion fails, look into the spreadsheet parsers file for where i already read the spreadsheet and try adjusting the parameters of gspread until iso reads correctly (you may need to also adjust the tests when this happens)
- Ensuring Correct Data Types: By adding the numericise_ignore parameter, the function ensures that the "ISO Number" column is treated as a string, even if it contains numeric values. This prevents issues with numeric conversion and ensures that the data is read correctly.
- Maintaining Functionality: The changes do not affect the overall functionality of the read_spreadsheet function. The function still reads the spreadsheet, processes the worksheets, and returns the data as a dictionary of documents.
- Handling Known Issues: The numericise_ignore parameter is used as a workaround for a known issue in gspread (https://github.com/burnash/gspread/issues/1007). This ensures that the function handles the data correctly, even if the column names are in a different line.
I have added changes in spreadsheet_test file but in commented form so that you may take a look , I am in a doubt that weather this test case which I have written is correct or not .
Here in these new changes I have also reverted back the old ones ,which were not required
Added the test case , you may merge this now @northdpole