Add support for Reading GIF files
We have been using this library over at GIPHY and love it. We had to adapt it to work with GIFs, and we thought we'd share some of the changes with the community. It includes:
- [x] `readGifs` function that reads a directory of GIFs and splits them out into individual frames/images that can be fed into `InceptionV3` and other models.
- [x] supporting unit tests mimicking tests in `TestReadImages`
- [x] updates to `.gitignore`
Codecov Report
Merging #46 into master will increase coverage by 0.11%. The diff coverage is 89.47%.
@@ Coverage Diff @@
## master #46 +/- ##
==========================================
+ Coverage 85.06% 85.18% +0.11%
==========================================
Files 19 19
Lines 991 1026 +35
Branches 5 5
==========================================
+ Hits 843 874 +31
- Misses 148 152 +4
| Impacted Files | Coverage Δ | |
|---|---|---|
| python/sparkdl/image/imageIO.py | 89.92% <89.47%> (-0.51%) | :arrow_down: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update a5a6e07...f86ac03.
Hello @seanpquig, thank you very much for this contribution. We will be happy to add support for GIFs to Deep Learning Pipelines.
I have some design questions about the new schema added for GIFs, which we should be able to resolve without too much change on your side.
To give some context, we are in the process of consolidating different image processing solutions around the image schema described in `python/sparkdl/image/imageIO.py`, and I believe that we can add some extra fields to the image schema to handle GIFs without having to create a separate schema. Before proposing changes, I would like to understand the use case a bit better: from looking at `gifSchema`, it looks like you do not use the fact that frames in a GIF are ordered, and you simply store them independently in a DataFrame? Do you foresee a use case for keeping all the frames together?
Hey @thunterdb. I tried to take a minimal and flexible approach: the GIF schema is per frame and identical to the image schema, with an additional `frameNum` field to keep track of ordering. This has allowed us to write our own custom functions and processing to do things like frame sampling, averaging model predictions across frames, and investigating ordering effects.
I think combining into a common schema could be great as long as it doesn't sacrifice any information or the ability to access individual frames. Perhaps images could be loosely modeled as a special case of a GIF that has a single frame. Your thoughts?
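
To make the per-frame idea concrete, here is a rough, library-free sketch of how rows carrying a `frameNum` could be regrouped, sampled, and averaged. It is not the code from this PR: the `FrameRow` class, the `filePath` and `scores` field names, and the helper functions are all hypothetical stand-ins, and plain Python lists are used in place of a Spark DataFrame. A still image is represented as a file with a single frame whose `frameNum` is 0.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class FrameRow:
    """Hypothetical per-frame record (not the PR's actual schema)."""
    filePath: str            # assumed name for the source file of the frame
    frameNum: int            # position of the frame within its file; 0 for a still image
    scores: Sequence[float]  # stand-in for one frame's model predictions


def group_frames(rows: Sequence[FrameRow]) -> Dict[str, List[FrameRow]]:
    """Group rows by source file and restore frame order within each file."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.filePath].append(row)
    return {path: sorted(frames, key=lambda r: r.frameNum)
            for path, frames in grouped.items()}


def sample_frames(frames: List[FrameRow], step: int) -> List[FrameRow]:
    """Keep every `step`-th frame of an already ordered frame list."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return frames[::step]


def average_scores(frames: Sequence[FrameRow]) -> List[float]:
    """Average prediction vectors element-wise across frames."""
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    return [sum(col) / n for col in zip(*(f.scores for f in frames))]


# Rows arrive in arbitrary order, as they might from a distributed read.
rows = [
    FrameRow("a.gif", 2, [0.25, 0.75]),
    FrameRow("a.gif", 0, [0.75, 0.25]),
    FrameRow("a.gif", 1, [0.5, 0.5]),
    FrameRow("b.png", 0, [1.0, 0.0]),  # still image treated as a one-frame GIF
]
by_file = group_frames(rows)
```

Because ordering lives in a plain integer column, keeping frames as independent rows does not lose the sequence: it can be rebuilt with a group-and-sort whenever a use case needs all frames of a file together.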