keras Model.fit pointnet v1

Open jackd opened this issue 5 years ago • 15 comments

  • add a tf.keras.Model.fit script (equivalent to train, for demo purposes)
  • specify the axis of a squeeze in the model that was only running successfully because of a bug in preprocessing

Test won't pass until #310 is accepted.

There's a lot of shared code between this and train which could be hoisted into a shared file. I'll wait for the other PRs to be accepted to avoid the merge conflict nightmare...

jackd avatar May 09 '20 17:05 jackd

Can you clarify your second point? (the bug)

taiya avatar May 09 '20 17:05 taiya

Truncating points is along the batch dimension, not the points dimension. For some reason this changes the shape from (?, 2048, 3) to (32, 2034, 3) (which in the past would have given issues on the final step of training if there isn't a complete batch - though I haven't hit that bug recently).

This results in a fully statically known shape prior to the squeeze, so squeeze knows which axes it can and cannot remove. Without the bug above, the shape is (?, 1, F) - the squeeze doesn't know if it has to apply to the first dimension, so the output shape is unknown (even the rank).
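
A minimal illustration of that shape-inference behaviour (the shapes below are just for demonstration, not the actual model's):

```python
import tensorflow as tf

# Fully static shape: squeeze can tell exactly which size-1 axes to drop.
x_static = tf.keras.Input(shape=(1, 64), batch_size=32)  # shape (32, 1, 64)
print(tf.squeeze(x_static).shape)  # (32, 64)

# Unknown batch dimension: an axis-less squeeze can't rule out the batch axis
# also being 1, so the output shape (and even its rank) becomes unknown.
x_dynamic = tf.keras.Input(shape=(1, 64))  # shape (None, 1, 64)
print(tf.squeeze(x_dynamic).shape)  # <unknown>

# Specifying the axis keeps the static shape information.
print(tf.squeeze(x_dynamic, axis=1).shape)  # (None, 64)
```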

This and a few other small things addressed in https://github.com/tensorflow/graphics/pull/311

jackd avatar May 09 '20 17:05 jackd

Ah yes, of course! I missed this one because testing with multiple point sizes wasn't possible. With the latest edits to the testing pipeline this would now have been caught.

taiya avatar May 09 '20 20:05 taiya

If I'm renaming layers anyway, any chance I can address https://github.com/tensorflow/graphics/issues/314 e.g. PointNetConv2Layer -> PointnetConv2D?

jackd avatar May 10 '20 00:05 jackd

Sure

taiya avatar May 10 '20 02:05 taiya

@ataiya this is a lot bigger than I originally planned it to be, but I've just refactored the shared functionality between fit/train into helpers.py. This now supersedes https://github.com/tensorflow/graphics/pull/311. I'm happy to close 311, but I imagine it might get through the review process faster if you want to eliminate that bug ASAP.

Tests are waiting on https://github.com/tensorflow/graphics/pull/310.

jackd avatar May 10 '20 05:05 jackd

I'll have time to look at things tomorrow; I'm finishing reviews today. Note that helpers should NOT contain any project logic. It's basically project-agnostic functionality that is useful whenever you have a custom training loop.

taiya avatar May 10 '20 16:05 taiya

Hi @jackd !

It seems that you made a lot of progress on this one; what is the current status?

Best.

julienvalentin avatar May 14 '20 13:05 julienvalentin

@julienvalentin once #310 is through I'll ensure tests are passing, then it should be good for review.

@ataiya should I make a separate utils.py file for the dataset/learning-rate functionality shared between fit and train? It's currently in helpers.py, but it's a little project-specific...

jackd avatar May 15 '20 11:05 jackd

#310 is in! :)

julienvalentin avatar May 20 '20 07:05 julienvalentin

@ataiya I'm happy to duplicate learning rate / preprocessing across examples. The only reason I'm tempted to keep at least data preprocessing separate is for reusability, e.g. it's nice to be able to say with confidence "we used exactly the same data preprocessing as pointnet" without having to copy/paste from examples. I agree helpers.py wouldn't be the appropriate place for that, but perhaps a small preprocessing.py file?

jackd avatar May 21 '20 01:05 jackd

So load("modelnet/pointnet", ...) would give you the variant with "exactly the same data preprocessing as pointnet". But perhaps, for now, a separate file is sufficient, something like modelnet_dataset.py or modelnet.py.

For the longer term, shouldn't it be located somewhere in datasets/shapenet? For shapenet, we are using the "configs" to do this, e.g. to generate a point-set version of the shapenet meshes (used by OccNet, DeepSDF, ...). After all, dataset + augmentations can be considered a (new) specific sub-version of a dataset.

What do you think?

taiya avatar May 21 '20 03:05 taiya

I like to have a clear separation between data stored on disk (from tfds) and preprocessing with tf.data, e.g. I don't think it's appropriate for jittering to be a part of the builder/config itself, but perhaps a preprocessing file alongside it. I guess it depends on whether or not you feel preprocessing is a property of the data or part of the model. I tend to feel it's part of the model (though even then it's not clear-cut, e.g. I feel the number of points is a property of the data, while the jittering is a property of the model).
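
A rough sketch of the kind of split I mean, assuming an unbatched (points, label) dataset; the helper names and jitter parameters are just placeholders, not the actual PR code:

```python
import tensorflow as tf

def jitter_points(points, stddev=0.01, clip=0.05):
    # Random perturbation of each point: an augmentation tied to the model.
    noise = tf.random.normal(tf.shape(points), stddev=stddev)
    return points + tf.clip_by_value(noise, -clip, clip)

def prepare(dataset, num_points=2048, training=True):
    # Fix the number of points (a property of the data) ...
    dataset = dataset.map(
        lambda points, label: (points[:num_points], label),
        num_parallel_calls=tf.data.experimental.AUTOTUNE)
    # ... and jitter only when training (a property of the model).
    if training:
        dataset = dataset.map(
            lambda points, label: (jitter_points(points), label),
            num_parallel_calls=tf.data.experimental.AUTOTUNE)
    return dataset
```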

jackd avatar May 21 '20 06:05 jackd

I like to have a clear separation between data stored on disk (from tfds) and preprocessing with tf.data. e.g. I don't think it's appropriate for jittering to be a part of the builder/config itself, but perhaps a preprocessing file alongside it.

Yes, but TFDS has a way (or will soon) to store post-processed data on disk (or to have a cached version already stored, and download that).

I guess it depends on whether or not you feel preprocessing is a property of the data or part of the model. I tend to feel its part of the model (though even then it's not clear cut. e.g. I feel the number of points is a property of the the data, while the jittering is a property of the model).

OK, go ahead and integrate the changes from the other PR so I can do another round of comments. Might be a bit slow due to the NeurIPS deadline.

taiya avatar May 21 '20 13:05 taiya

Sorry for the delay on this. I've removed learning rate / data pipelining logic from helpers and duplicated it across the two examples (fit.py and train.py).

There may be a slight performance boost from adding a Dataset.repeat at the beginning of the pipeline and specifying steps_per_epoch in fit. This results in slight bleeding of examples across epochs, but removes the need to fill a new shuffle buffer from scratch each epoch. Reducing the buffer size would also reduce the scale of this problem.
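
Roughly, that alternative would look something like this (the batch size and buffer size below are just placeholders):

```python
import tensorflow as tf

def build_pipeline(dataset, num_examples, batch_size=32, shuffle_buffer=1024):
    # Repeating first means the shuffle buffer is filled once and reused across
    # epochs, at the cost of examples bleeding slightly across epoch boundaries.
    dataset = dataset.repeat()
    dataset = dataset.shuffle(shuffle_buffer)
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    # With an infinite dataset, fit needs to be told how long an "epoch" is.
    steps_per_epoch = num_examples // batch_size
    return dataset, steps_per_epoch

# model.fit(train_ds, epochs=..., steps_per_epoch=steps_per_epoch)
```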

The main reason this isn't currently implemented is that tfds.testing.mock_data doesn't change info.splits["train"].num_examples, so steps_per_epoch can't be computed consistently across the real run and the test run. This may be resolvable with an upgraded tfds version - it uses tf.data.experimental.set_cardinality, so we could compute num_examples with tf.data.experimental.cardinality, but I'm not sure whether the way it is set is robust to mock_data contexts.
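
The cardinality-based fallback would be something like the following; whether it behaves correctly under mock_data is the open question:

```python
import tensorflow as tf

def num_examples_or_none(dataset):
    # Negative values are the UNKNOWN (-2) / INFINITE (-1) cardinality sentinels.
    cardinality = int(tf.data.experimental.cardinality(dataset).numpy())
    return cardinality if cardinality >= 0 else None
```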

I believe the current build failure is down to #469.

jackd avatar Dec 27 '20 01:12 jackd