tfx icon indicating copy to clipboard operation
tfx copied to clipboard

Update SchemaGen Executor to natively handle SequenceExample

Open AlexanderLavelle opened this issue 3 years ago • 28 comments

While this PR does not address the need to specify payload_format in ExampleGen (or ImportExampleGen, etc), it does address the disconnect between ExampleGen, SchemaGen, and Transform by specifying tensor representations in SchemaGen. This should be a positive benefit to most users as the tensor representations will become a natural piece of the pipeline.

This addresses 5520, 4714, and, to a lesser extent, 5361.

AlexanderLavelle avatar Jan 31 '23 14:01 AlexanderLavelle

Thanks for your contribution! It looks beneficial for many users as well. We are checking if your change doesn't affect other users' existing pipelines or tests. Please be patient to be completely reviewed.

lego0901 avatar Feb 10 '23 01:02 lego0901

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

github-actions[bot] avatar Mar 12 '23 02:03 github-actions[bot]

I have been out of the country but am still interested in pursuing this commit.

AlexanderLavelle avatar Mar 13 '23 15:03 AlexanderLavelle

Thanks for your contribution! The reason we took more cautious approach about this is it changes the file payload, and any related bugs are not trivial to be caught by unit tests.

To check if it doesn't break any tests and any existing pipelines, I would (1) First approve this PR (2) Maybe, merge this PR into the master (3) Run some internal continuous tests including e2e pipeline runs (4, if problem appears) Rollback before the next releases, otherwise keep this change!

I appreciate again for your consideration.

lego0901 avatar Mar 14 '23 04:03 lego0901

@lego0901 - I have a potential recommendation for this. What if during the encoding everything was turned into a sequence example (where appropriate) anyway? This should not affect the majority of users as they wouldn't have features to encode into the feature_lists of tf.train.SequenceExample

AlexanderLavelle avatar Apr 19 '23 15:04 AlexanderLavelle

Hi @lego0901 Any update on this PR? Please. Thank you!

gbaned avatar Jul 04 '23 14:07 gbaned

@gbaned I have thought about this PR more and it can be improved even beyond here. Part of it is connecting other big query column types upstream, but fundamentally I don't see downstream negatives by putting everything as sequence examples. The changes will also need to be applied in example gen such that the output will be only sequence examples and not the option for one or another. The upstream modification is only regarding something like if isinstance(pyval, FeatureLists) ...

AlexanderLavelle avatar Jul 05 '23 00:07 AlexanderLavelle

Yes, this looks good to me, either. But the problem is, we are updating the testing environment (Ubuntu 16.04 -> 20.04) and it is causing some build failures.. I wanted to make sure that this newly introduced feature will not break any test breakage but cannot confirm it before making every test green. I will accept this PR as soon as possible if the tests are not failing.

Sorry and thank you.

lego0901 avatar Jul 05 '23 00:07 lego0901

Hi @lego0901 / @AlexanderLavelle Any update on this PR? Please. Thank you!

gbaned avatar Oct 27 '23 04:10 gbaned

@gbaned no update on my side -- I have used this and confirm it works when intending to use SequenceExamples in TFX downstream components (Transform, Trainer)

AlexanderLavelle avatar Oct 29 '23 15:10 AlexanderLavelle

Hi @lego0901 Any update on this PR? Please. Thank you!

gbaned avatar Nov 07 '23 05:11 gbaned

Hi, is there any update on this PR?

jsalkey avatar Nov 08 '23 13:11 jsalkey

Hi @lego0901 Any update on this PR? Please. Thank you!

gbaned avatar Nov 15 '23 05:11 gbaned

Hi @lego0901 Any update on this PR? Please. Thank you!

gbaned avatar Nov 27 '23 04:11 gbaned

Hi @lego0901 Any update on this PR? Please. Thank you!

gbaned avatar Dec 15 '23 05:12 gbaned

Hi @lego0901 Any update on this PR? Please. Thank you!

gbaned avatar Dec 27 '23 04:12 gbaned

Hi @lego0901 Any update on this PR? Please. Thank you!

gbaned avatar Jan 10 '24 05:01 gbaned

Hi @lego0901 Any update on this PR? Please. Thank you!

gbaned avatar Feb 23 '24 05:02 gbaned

Hi @lego0901 Any update on this PR? Please. Thank you!

gbaned avatar Apr 02 '24 04:04 gbaned

Hi @lego0901 Any update on this PR? Please. Thank you!

gbaned avatar Apr 19 '24 05:04 gbaned

I would have to explicitly mention that this is blocked because of the TF docker issue, so the internal integration test is broken for a while. I cannot confirm this PR unless it is resolved, because this should be merged after verifying that it does not break any internal tests.

lego0901 avatar Apr 19 '24 05:04 lego0901

I would have to explicitly mention that this is blocked because of the TF docker issue, so the internal integration test is broken for a while. I cannot confirm this PR unless it is resolved, because this should be merged after verifying that it does not break any internal tests.

Hi @lego0901 Thank you so much for the update.

gbaned avatar Apr 19 '24 05:04 gbaned

Hi @AlexanderLavelle, Could you possibly fetch the most recent code and run the tests? Please give an update on this PR.

keerthanakadiri avatar Nov 28 '24 10:11 keerthanakadiri

@keerthanakadiri Hello! I have brought in the new changes without conflicts. I am currently on PTO but can run tests in a week or early the following week

AlexanderLavelle avatar Dec 04 '24 06:12 AlexanderLavelle

@keerthanakadiri I realized I had made a typo regarding the tfx_bsl. I am running pytest locally since it's possible the github action times out.

AlexanderLavelle avatar Jan 07 '25 18:01 AlexanderLavelle

============================ short test summary info =============================
FAILED tfx/components/trainer/executor_test.py::ExecutorTest::testDo - ValueError: creating distributed tf.Variable with aggregation=MEAN and a non-...
FAILED tfx/components/trainer/executor_test.py::ExecutorTest::testDoWithHyperParameters - ValueError: creating distributed tf.Variable with aggregation=MEAN and a non-...
FAILED tfx/components/trainer/executor_test.py::ExecutorTest::testMultipleArtifacts - ValueError: creating distributed tf.Variable with aggregation=MEAN and a non-...
FAILED tfx/examples/chicago_taxi_pipeline/taxi_pipeline_native_keras_e2e_test.py::TaxiPipelineNativeKerasEndToEndTest::testTaxiPipelineNativeKeras - ValueError: creating distributed tf.Variable with aggregation=MEAN and a non-...
FAILED tfx/examples/imdb/imdb_pipeline_native_keras_e2e_test.py::ImdbPipelineNativeKerasEndToEndTest::testImdbPipelineNativeKeras - ValueError: creating distributed tf.Variable with aggregation=MEAN and a non-...
FAILED tfx/examples/mnist/mnist_pipeline_native_keras_e2e_test.py::MNISTPipelineNativeKerasEndToEndTest::testMNISTPipelineNativeKeras - ValueError: creating distributed tf.Variable with aggregation=MEAN and a non-...
FAILED tfx/orchestration/portable/docker_executor_operator_e2e_test.py::DockerComponentLauncherE2eTest::testDockerComponentLauncherInBeam - docker.errors.DockerException: Error while fetching server API version: ('Con...
= 7 failed, 3959 passed, 324 skipped, 45 xfailed, 124 warnings in 3631.42s (1:00:31) =

Are these related to the difference in the payload? I wouldn't think so. I have used this version of SchemaGen without issue previously.

This is with the latest fix. I tried to run pylint given the pylintrc but it didn't work -- what is the head directory linting method you typically use?

AlexanderLavelle avatar Jan 07 '25 20:01 AlexanderLavelle

Full error message: ERROR apache_beam.runners.common:bundle_processor.py:237 creating distributed tf.Variable with aggregation=MEAN and a non-floating dtype is not supported, please use a different aggregation or dtype [while running 'Run[Trainer.mnist]']

AlexanderLavelle avatar Jan 07 '25 20:01 AlexanderLavelle

Hi, TFX contributors,

Sorry for the late responses on the PRs in the TFX repository. The TFX project had been coupled with internal TFX implementations, but we recently decoupled this and updated the CI process to GitHub Actions. So, we can now easily merge external contributions into the main TFX branches.

If you are still interested in merging your PR to the main TFX branch, please update it with the latest version of the TFX repository and run the GitHub Actions tests. If the tests pass with the default version of the TFX libraries, TFX committers will approve and merge your PR. Thanks!

nikelite avatar Mar 26 '25 04:03 nikelite