OpenMetadata
OpenMetadata copied to clipboard
[Hive | Profiling] The Hive Profiling workflow throws some errors due to how the queries are formatted
Affected module Ingestion Framework Profiler
Describe the bug While running a Profiler workflow on Hive I got thrown this error:
[SQL: WITH `test_rnd` AS
(SELECT `default`.`test`.`id` AS `id`, ABS(RAND()) * 100 %% %(param_1)s AS `random`
FROM `default`.`test` ),
`test_sample` AS
(SELECT `test_rnd`.`id` AS `id`, `test_rnd`.`random` AS `random`
FROM `test_rnd`
WHERE `test_rnd`.`random` <= %(random_1)s)
SELECT count(*) AS `uniqueCount`
FROM (SELECT count(`id`) AS `count_1`
FROM `test_sample` GROUP BY `id`
HAVING count(`id`) = %(count_2)s) AS `only_once`
LIMIT %(param_2)s]
[parameters: {'param_1': 100, 'random_1': 5.0, 'count_2': 1, 'param_2': 1}]
It looks like hive doesn't like the default.test.id column name.
The following query gives the same error on a SQL Client:
SELECT `default`.`test`.`id` from `default`.`test`;
But the following one works fine:
SELECT `test`.`id` from `default`.`test`;
To Reproduce
- Spin up Openmetadata from the 1.4 branch
- Spin up a Hive environment
docker run -d -p 9083:9083 --env SERVICE_NAME=metastore --name metastore-standalone apache/hive:3.1.3
docker run -d -p 10000:10000 -p 10002:10002 --env SERVICE_NAME=hiveserver2 --name hive4 apache/hive:3.1.3
- Create a Table with the following DDL
docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:10000/'
CREATE TABLE test (id int);
- Register and Run a Metadata ingestion
- Register and Run a Profiling Ingestion
- Check the Workflow and HiveServer2 Logs
Expected behavior No errors
Version:
- OS: N/A
- Python version: N/A
- OpenMetadata version: 1.4.0 branch
- OpenMetadata Ingestion package version: 1.4.0 branch
Additional context N/A