FeatureExtraction icon indicating copy to clipboard operation
FeatureExtraction copied to clipboard

Issue running getDbCovariateData on snowflake

Open mdlavallee92 opened this issue 1 year ago • 6 comments

Trying to run FeatureExtraction on snowflake which requires a temp emulation schema. Currently the function only allows for oracleTempSchema, which should be updated to tempEmulationSchema so it is not db exclusive. This is what I used below

#run FE
cov <- FeatureExtraction::getDbCovariateData( 
  connection = con,
  oracleTempSchema = tempEmulationSchema,
  cdmDatabaseSchema = cdmDatabaseSchema,
  cohortTable = cohortTable,
  cohortDatabaseSchema = cohortDatabaseSchema,
  cohortId = targetCohortIds,
  covariateSettings = covSettings,
  aggregated = TRUE
)

for which I get this error:

image

Not sure if others have got this issue or if I am setting this up wrong.

mdlavallee92 avatar Feb 28 '24 15:02 mdlavallee92

@mdlavallee92 - I don't have access to Snowflake so this will be hard to troubleshoot.

That said, we can add a proper tempEmulationSchema parameter and deprecate the oracleTempSchema so that this is handled consistently in the package.

Note of how that might look from DatabaseConnector docs: https://ohdsi.github.io/DatabaseConnector/reference/renderTranslateExecuteSql.html

anthonysena avatar Mar 01 '24 14:03 anthonysena

Linking to #159 since we may want to do this as a major, breaking change.

anthonysena avatar Mar 01 '24 14:03 anthonysena

@mdlavallee92 since version 3.7.0 you can pass the tempEmulationSchema in the getDbCovariateData function. Can you try if this works for you on Snowflake?

ginberg avatar Sep 27 '24 12:09 ginberg

There is still an issue in 3.7.0. While tempEmulationSchema is passed in here, I'm not sure it's making its way all the way down into the java query builders.

> covariateData2 <- getDbCovariateData(
+   connectionDetails = connectionDetails,
+   cdmDatabaseSchema = executionSettings$cdmDatabaseSchema,
+   cohortDatabaseSchema = executionSettings$workDatabaseSchema,
+   cohortTable = executionSettings$cohortTable,
+   cohortIds = c(99994, 99995),
+   covariateSettings = covariateSettings,
+   aggregated = TRUE,
+   tempEmulationSchema = executionSettings$workDatabaseSchema
+ )
Connecting using Snowflake driver
Constructing features on server
  |================================================================================================================================================================| 100%
Executing SQL took 1.15 mins
Fetching data from server
Error in `.createErrorReport()`:
! Error executing SQL:
net.snowflake.client.jdbc.SnowflakeSQLException: SQL compilation error:
Object 'OXENIM4HCOV_1' does not exist or not authorized.
An error report has been created at  /home/sadowskk/o428/errorReportSql.txt
Run `rlang::last_trace()` to see where the error occurred.

And here is the error report - it got quite far along, so I think that the table was successfully created in the tempEmulationSchema, but then a later query is looking for it in the public schema.

DBMS:
snowflake

Error:
net.snowflake.client.jdbc.SnowflakeSQLException: SQL compilation error:
Object 'OXENIM4HCOV_1' does not exist or not authorized.

SQL:
SELECT cohort_definition_id, covariate_id, count_value, min_value, max_value, average_value, standard_deviation, median_value, p10_value, p25_value, p75_value, p90_value
FROM (
SELECT cohort_definition_id, covariate_id, count_value, min_value, max_value, average_value, standard_deviation, median_value, p10_value, p25_value, p75_value, p90_value FROM oxenim4hcov_1 UNION ALL
SELECT cohort_definition_id, covariate_id, count_value, min_value, max_value, average_value, standard_deviation, median_value, p10_value, p25_value, p75_value, p90_value FROM oxenim4hcov_2
) all_covariates;

katy-sadowski avatar Oct 11 '24 16:10 katy-sadowski

Hey @ginberg and @katy-sadowski I believe the bug is on a line in the getDbDefaultCovariateData function when trying to extract continuous features. See here.

When running FE v3.7.1 on snowflake it works with the tempEmulationSchema setting if my covSettings are binary. Believe Katy has a continuous setting for covariates leading to the error.

mdlavallee92 avatar Oct 11 '24 17:10 mdlavallee92

hi @mdlavallee92 you are right we should use tempEmulationSchema here, thanks for the catch! @katy-sadowski I have created a branch with the updated code, could you try it out?

ginberg avatar Oct 11 '24 17:10 ginberg