activityinfo-R

expose more meta-info for forms/databases (like # of total records in a form/db)

Open Ryo-N7 opened this issue 3 years ago • 8 comments

EDIT: after discussion with Alex + Nic, this will be more of a focus in January, especially as this is something ActivityInfo needs to do on the back end of the API rather than anything that can be implemented on the R side (at least until the data is exposed through the API).

Ryo-N7, Dec 06 '22 09:12

Expose form metadata for account-level users:

  • List of forms - database tree
  • For each form:
    • Number of records (MUST HAVE)
    • Last update time (MUST HAVE)
    • Number of users who have access to this form
    • Sensitivity classification of data
    • Visibility

akbertram, Jan 16 '23 10:01

A workaround I found while implementing the lazy data frame is to request a single row containing only the record ID. The response includes the total number of records. If we sort by update time, we could in theory get the last update time too. I've exposed the totalRows metadata in queryTable, along with sorting, limiting, and offset, as part of the implementation of getRecords().
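A minimal sketch of that workaround, in case it helps. `countRecords` is a hypothetical helper, not part of the package. It assumes `queryTable()` takes a `window` argument of the form `c(offset, limit)` and attaches the total as a `totalRows` attribute on the returned data frame; check the package documentation before relying on either. The query function is a parameter so the helper can be tried against a stub.

```r
# Hypothetical helper: count the records in a form by requesting a single row
# that contains only the record ID.
countRecords <- function(formId, query = activityinfo::queryTable) {
  oneRow <- query(
    formId,
    columns = c(id = "_id"),  # select only the record ID
    window = c(0L, 1L)        # assumed format: c(offset, limit)
  )
  # Assumption: the total row count is attached as the "totalRows" attribute.
  attr(oneRow, "totalRows")
}
```

Usage would be something like `countRecords(formId)`, at the cost of one small request per form.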

nickdickinson, Mar 08 '23 08:03

Is there also a way to download the form schema of every form in a database without having to loop over each form?

Currently, for the 20+ projects AV has, this can take over 10 minutes just to fetch all the metadata, before we even get to the processing and visualization parts of the code.

Ryo-N7, Mar 14 '23 14:03

I think this is more a question for @akbertram, as I do not know whether this is possible server-side.

@Ryo-N7 perhaps it would be an idea to split this across several R instances with the foreach package? I've done this successfully for longer GIS processing workflows. I don't know whether there are rate limits or other considerations (like exponential backoff) that would need to be handled.
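A rough sketch of what that could look like, using foreach with a doParallel backend. `fetchSchemas` and its parameters are hypothetical names. It assumes each worker process can authenticate on its own (workers are separate R sessions and do not share state with the main one) and that the server tolerates a few concurrent requests, neither of which I have verified. The retry loop is a simple exponential backoff, not something the API is known to require.

```r
library(foreach)

# Hypothetical helper: fetch one schema per form ID in parallel workers,
# retrying failed calls with exponential backoff.
fetchSchemas <- function(formIds, fetch = activityinfo::getFormSchema,
                         workers = 4, maxTries = 4, baseDelay = 1) {
  force(fetch)
  cl <- parallel::makeCluster(workers)
  on.exit(parallel::stopCluster(cl), add = TRUE)
  doParallel::registerDoParallel(cl)

  schemas <- foreach(id = formIds) %dopar% {
    for (attempt in seq_len(maxTries)) {
      result <- tryCatch(fetch(id), error = function(e) e)
      if (!inherits(result, "error")) break
      # Wait 1x, 2x, 4x, ... baseDelay seconds before the next attempt.
      if (attempt < maxTries) Sys.sleep(baseDelay * 2^(attempt - 1))
    }
    if (inherits(result, "error")) stop(result)
    result
  }
  setNames(schemas, formIds)
}
```

The result is a list of schemas named by form ID. Keeping `workers` small seems prudent until any rate limits are known.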

nickdickinson, Mar 17 '23 14:03

Note: the lazy data frame reports the number of records and does retrieve the _lastEditTime column. The user can get the last edit time as follows (the column name needs backticks in R because it starts with an underscore): getRecords(formId) %>% select(`_lastEditTime`) %>% arrange(desc(`_lastEditTime`)) %>% head(1) %>% collect() %>% pull()

Should we add this to the metadata display (with a small additional call to the server every time we display a lazy dataframe)? Or wrap in a function? Otherwise, can we close this now? @jamiewhths

Is there a more efficient way to do this? Does the server provide this metadata for the form?
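If we go the "wrap in a function" route, a minimal sketch could look like this. `getLastEditTime` is a hypothetical name, not an existing package function, and the body is just the pipeline above with the column name backtick-quoted (required in R because it starts with an underscore). The records source is a parameter so the wrapper can be tried on a plain data frame.

```r
library(dplyr)

# Hypothetical wrapper: return the most recent _lastEditTime for a form.
getLastEditTime <- function(formId, records = activityinfo::getRecords) {
  records(formId) %>%
    select(`_lastEditTime`) %>%
    arrange(desc(`_lastEditTime`)) %>%
    head(1) %>%
    collect() %>%
    pull()
}
```

This still costs one extra call to the server per form, so it does not answer the efficiency question.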

nickdickinson, Oct 16 '24 10:10

@jamiewhths Can the server provide metadata on the following items mentioned by @akbertram :

  • Number of users who have access to this form
  • Sensitivity classification of data

The first can be derived, but that is somewhat complicated because it effectively means reimplementing the roles system, so it is probably better served by the server. I am not sure what the second is...

nickdickinson, Nov 07 '24 22:11

@nickdickinson we can retrieve the user grants on a resource via https://www.activityinfo.org/support/docs/api/reference/getDatabaseUserGrantsOnResource.html. To determine the number of users who have access, you can count the objects in the returned array, as the server has already computed inherited grants etc.

jamiewhths, Nov 12 '24 09:11

On sensitivity, this was probably an early idea we had to mark datasets with data classification levels. You can disregard it for now, as these are not implemented yet.

jamiewhths, Nov 12 '24 09:11