spark-deep-learning

Add support for Reading GIF files

Open seanpquig opened this issue 8 years ago • 3 comments

We have been using this library over at GIPHY and love it. We had to adapt it to work with GIFs, and we thought we'd share some of the changes with the community. This PR includes:

  • [x] readGifs function that reads a directory of GIFs and splits them out into individual frames/images that can be fed into InceptionV3 and other models.
  • [x] supporting unit tests mimicking tests in TestReadImages
  • [x] updates to .gitignore
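
For readers who have not looked at the diff, the frame-splitting idea behind readGifs can be sketched in plain Python roughly as follows. This is an illustration only, not the code in this PR: `FrameRow`, `decode_gif_frames`, and `frames_to_rows` are hypothetical names, the field layout is an assumption, and the decoding step assumes Pillow is installed.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class FrameRow(NamedTuple):
    """One decoded GIF frame. Field names are assumed for illustration."""
    filePath: str
    frameNum: int   # position of the frame within its GIF, starting at 0
    height: int
    width: int
    nChannels: int
    data: bytes     # raw interleaved RGB bytes


def decode_gif_frames(path: str) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (height, width, rgb_bytes) for each frame, in file order.

    Pillow is imported lazily so the rest of the sketch runs without it.
    """
    from PIL import Image, ImageSequence

    with Image.open(path) as gif:
        for frame in ImageSequence.Iterator(gif):
            rgb = frame.convert("RGB")  # GIF frames are usually palette-based
            width, height = rgb.size
            yield height, width, rgb.tobytes()


def frames_to_rows(
    file_path: str, frames: Iterable[Tuple[int, int, bytes]]
) -> Iterator[FrameRow]:
    """Turn decoded frames into one row per frame, numbering them in order."""
    n_channels = 3  # the decoder above always converts to RGB
    for frame_num, (height, width, data) in enumerate(frames):
        if len(data) != height * width * n_channels:
            raise ValueError(f"frame {frame_num} of {file_path} has wrong size")
        yield FrameRow(file_path, frame_num, height, width, n_channels, data)
```

With Pillow available, `list(frames_to_rows(path, decode_gif_frames(path)))` would give one row per frame of a single file; reading a whole directory would repeat this for each GIF.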

seanpquig avatar Aug 26 '17 22:08 seanpquig

Codecov Report

Merging #46 into master will increase coverage by 0.11%. The diff coverage is 89.47%.

@@            Coverage Diff             @@
##           master      #46      +/-   ##
==========================================
+ Coverage   85.06%   85.18%   +0.11%     
==========================================
  Files          19       19              
  Lines         991     1026      +35     
  Branches        5        5              
==========================================
+ Hits          843      874      +31     
- Misses        148      152       +4

Impacted Files                     Coverage Δ
python/sparkdl/image/imageIO.py    89.92% <89.47%> (-0.51%) ↓

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update a5a6e07...f86ac03.

codecov-io avatar Aug 27 '17 18:08 codecov-io

Hello @seanpquig, thank you very much for this contribution. We will be happy to add support for GIFs to Deep Learning Pipelines.

I have some design questions about the new schema added for GIFs, which we should be able to resolve without too much change on your side.

To give some context, we are in the process of consolidating different image processing solutions around the image schema described in python/sparkdl/image/imageIO.py, and I believe we can add some extra fields to that schema to handle GIFs without having to create a separate one. Before proposing changes, I would like to understand the use case a bit better: looking at gifSchema, it seems that you do not rely on the frames of a GIF being ordered and simply store them as independent rows in a DataFrame. Is that right? Do you foresee a use case that requires keeping all the frames of a GIF together?

thunterdb avatar Aug 29 '17 16:08 thunterdb

Hey @thunterdb. I tried to take a minimal and flexible approach: the GIF schema is per frame and identical to the image schema, with an additional frameNum field to keep track of ordering. This has allowed us to write our own custom functions to do things like frame sampling, averaging model predictions across frames, and investigating ordering effects.
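
To make the per-frame workflow concrete, here is a minimal plain-Python sketch of frame sampling and of averaging predictions across the frames of each GIF. It is not our production code: the `(filePath, frameNum, probabilities)` tuple layout and the helper names `sample_every_nth` and `average_per_gif` are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-frame prediction: (filePath, frameNum, class probabilities)
Prediction = Tuple[str, int, Sequence[float]]


def sample_every_nth(preds: List[Prediction], step: int) -> List[Prediction]:
    """Keep only the frames whose frameNum is a multiple of `step`."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return [p for p in preds if p[1] % step == 0]


def average_per_gif(preds: List[Prediction]) -> Dict[str, List[float]]:
    """Average the class probabilities over all frames of each GIF."""
    grouped: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for path, _frame_num, probs in preds:
        grouped[path].append(probs)
    averaged = {}
    for path, rows in grouped.items():
        # zip(*rows) walks the frames class by class
        averaged[path] = [sum(col) / len(rows) for col in zip(*rows)]
    return averaged
```

On a Spark DataFrame the same grouping would presumably be expressed as a groupBy on the file path column; the sketch only shows the logic that frameNum makes possible.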

I think combining these into a common schema could be great, as long as it doesn't sacrifice any information or the ability to access individual frames. Perhaps images could be loosely modeled as a special case of a GIF that has a single frame. Your thoughts?
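
One way to picture the single-frame idea is the plain-Python sketch below. It is only a rough illustration of a possible common row shape, not a proposal for the actual Spark schema: `UnifiedImageRow` and `group_frames` are hypothetical names, and the fields other than frameNum are assumed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class UnifiedImageRow:
    """Hypothetical common row for still images and GIF frames."""
    filePath: str
    height: int
    width: int
    nChannels: int
    data: bytes
    frameNum: int = 0  # a still image is frame 0 of a one-frame "GIF"


def group_frames(rows: Iterable[UnifiedImageRow]) -> Dict[str, List[UnifiedImageRow]]:
    """Collect rows per file and restore frame order within each file."""
    grouped: Dict[str, List[UnifiedImageRow]] = {}
    for row in rows:
        grouped.setdefault(row.filePath, []).append(row)
    for frames in grouped.values():
        frames.sort(key=lambda r: r.frameNum)
    return grouped
```

Under this shape, code written for still images can ignore frameNum, while GIF-aware code can still recover every frame in order.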

seanpquig avatar Sep 04 '17 22:09 seanpquig