[SUPPORT] Connect to standalone hive metastore rather than JDBC
Hudi version: 0.14.0
Our production environment doesn't have a Hive server, only a standalone Hive metastore backed by PostgreSQL. I cannot find any documentation on how to sync a Hudi table to the Hive metastore from Spark / Spark SQL.
@stream2000 I would appreciate it if you could provide a sample SQL query that syncs a Hudi table to a standalone HMS.
You can try the following Spark SQL:
call hive_sync(table => 'a', metastore_uri => 'uri');
All of the procedure's parameters are listed below:
private val PARAMETERS = Array[ProcedureParameter](
  ProcedureParameter.required(0, "table", DataTypes.StringType),
  ProcedureParameter.optional(1, "metastore_uri", DataTypes.StringType, ""),
  ProcedureParameter.optional(2, "username", DataTypes.StringType, ""),
  ProcedureParameter.optional(3, "password", DataTypes.StringType, ""),
  ProcedureParameter.optional(4, "use_jdbc", DataTypes.StringType, ""),
  ProcedureParameter.optional(5, "mode", DataTypes.StringType, ""),
  ProcedureParameter.optional(6, "partition_fields", DataTypes.StringType, ""),
  ProcedureParameter.optional(7, "partition_extractor_class", DataTypes.StringType, ""),
  ProcedureParameter.optional(8, "strategy", DataTypes.StringType, ""),
  ProcedureParameter.optional(9, "sync_incremental", DataTypes.StringType, "")
)
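For example, a call against a standalone metastore could look like the sketch below. The table name, metastore URI and partition field are placeholders for illustration, not values from this thread:

call hive_sync(
  table => 'hudi_table',
  metastore_uri => 'thrift://metastore-host:9083',
  mode => 'hms',
  use_jdbc => 'false',
  partition_fields => 'dt',
  partition_extractor_class => 'org.apache.hudi.hive.MultiPartKeysValueExtractor'
);

Setting mode to 'hms' tells the sync to talk to the metastore thrift endpoint directly instead of going through HiveServer2 JDBC, which matches the standalone-metastore setup described above.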
Keep in mind that this procedure requires that the table already exists. If you need to create a table via Spark, try using HiveSyncTool directly; the code in HiveSyncProcedure illustrates how to construct a HiveSyncTool. A rough sketch of that approach follows below.
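This is a minimal Scala sketch for spark-shell, not verified against your deployment: it assumes Hudi 0.14's HiveSyncTool(Properties, Configuration) constructor and the standard hoodie.datasource.hive_sync.* / hoodie.datasource.meta.sync.* keys, and every URI, database, table, path and partition value is a placeholder you would replace with your own:

import java.util.Properties
import org.apache.hudi.hive.HiveSyncTool

// Placeholder values for illustration only; adjust to your environment.
val props = new Properties()
props.setProperty("hoodie.datasource.hive_sync.mode", "hms")  // sync via the metastore thrift API, no HiveServer2/JDBC
props.setProperty("hoodie.datasource.hive_sync.metastore.uris", "thrift://metastore-host:9083")
props.setProperty("hoodie.datasource.hive_sync.database", "default")
props.setProperty("hoodie.datasource.hive_sync.table", "hudi_table")
props.setProperty("hoodie.datasource.hive_sync.partition_fields", "dt")
props.setProperty("hoodie.datasource.hive_sync.partition_extractor_class",
  "org.apache.hudi.hive.MultiPartKeysValueExtractor")
// Storage path of the Hudi table (key assumed per HoodieSyncConfig.META_SYNC_BASE_PATH)
props.setProperty("hoodie.datasource.meta.sync.base.path", "s3a://bucket/path/to/hudi_table")

// Reuse Spark's Hadoop configuration so the tool can reach the table's storage
val hadoopConf = spark.sparkContext.hadoopConfiguration
new HiveSyncTool(props, hadoopConf).syncHoodieTable()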
@qidian99 Let us know in case you face any issues while trying this. Feel free to close this issue if it worked. Thanks.
@qidian99 Closing this. Please reopen or create a new one if you face further issues. Thanks.