Configuration YAML for changing table description text not working in my case
As the title suggests, I am not able to get a changed table description text when I perform a translation of a HIVEQL ddl file. The original description text persists in the translated version. I do not have the same problem when I try to use other transformation config.yaml files, such as one where I set external: false to avoid CREATE TABLE from becoming CREATE EXTERNAL TABLE. Details on how to reproduce below.
Environment details
- OS type and version: MacOS Monterey 12.6
- Python version: 3.9
- pip version: 22.0.4 (I use poetry v1.4.2 if that makes a difference)
-
google-cloud-bigquery-migrationversion: 0.11.0
Steps to reproduce
- Create a "source"
.ddlfile with the following contents
CREATE TABLE `example.table`(
`col1` bigint,
`col2` bigint,
`col3` boolean,
`col4` timestamp,
`col5` bigint,
`col6` bigint)
COMMENT 'email: [email protected]'
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://path/to/my/files'
TBLPROPERTIES (
'_artifact_placeholder'='{{ artifact:holder }}',
'bucketing_version'='2',
'transient_lastDdlTime'='1682664243')
- Add a file named
change-description.config.yamlwith the following contents
type: object_rewriter
relation:
- match:
db: my-gcp-project
schema: example
description:
text: "example"
-
Move these files to a bucket on GCP inside a 'folder' named
folder_containing_ddl_and_config_file -
Execute a migration workflow
MY_GCP_PROJECT = 'my-gcp-project'
BASE_GCS_BUCKET = 'my-bucket-name'
gcs_input_full_path = f"gs://{BASE_GCS_BUCKET}/folder_containing_ddl_and_config_file"
gcs_output_full_path = f"gs://{BASE_GCS_BUCKET}/folder_containing_ddl_and_config_file_translated"
parent = f"projects/{MY_GCP_PROJECT}/locations/eu"
client = bigquery_migration_v2.MigrationServiceClient()
source_dialect = bigquery_migration_v2.Dialect()
source_dialect.hiveql_dialect = bigquery_migration_v2.HiveQLDialect()
target_dialect = bigquery_migration_v2.Dialect()
target_dialect.bigquery_dialect = bigquery_migration_v2.BigQueryDialect()
translation_env = bigquery_migration_v2.SourceEnv(
default_database=MY_GCP_PROJECT, schema_search_path='example'
)
translation_config = bigquery_migration_v2.TranslationConfigDetails(
gcs_source_path=gcs_input_full_path,
gcs_target_path=gcs_output_full_path,
source_dialect=source_dialect,
target_dialect=target_dialect,
source_env=translation_env,
)
migration_task = bigquery_migration_v2.MigrationTask(
type_="Translation_HiveQL2BQ", translation_config_details=translation_config
)
workflow = bigquery_migration_v2.MigrationWorkflow(display_name=job_name)
workflow.tasks["translation"] = migration_task
request = bigquery_migration_v2.CreateMigrationWorkflowRequest(
parent=parent, migration_workflow=workflow
)
response = client.create_migration_workflow(request=request)
- Check the resulting file. In my case it looks like below, where the description text is unchanged from what it was in the original file.
-- Translation time: 2023-05-10T09:28:40.261922Z
-- Translation job ID: 28288a12-c0ba-1bd2-8c2e-d640b91c3b5b
-- Source: my-gcp-bucket/folder_containing_ddl_and_config_file/step_1_ddl_file.ddl
-- Translated from: Hive
-- Translated to: BigQuery
CREATE EXTERNAL TABLE `my-gcp-project`.example.table
(
col1 INT64,
col2 INT64,
col3 BOOL,
col4 DATETIME,
col5 INT64,
col6 INT64
)
OPTIONS(
description='email: [email protected]',
format='ORC',
uris=[
'hdfs://path/to/my/files'
]
);
Thanks!
I'm going to transfer this issue to the google-cloud-python repository as we are preparing to move the code in this repository to google-cloud-python in the next 1-2 weeks.
Hi, @steve-ahlswede . Were you able to resolve this issue?
I'm closing this issue since we haven't gotten a response. If this is still happening, go through the steps in our SUPPORT page to ensure you report the issue in the place that will give you the promptest response.