add `im_from_matlab` and `im_to_matlab`
This provides a more convenient MATLAB interface than the composition of `PermutedDimsArray`, `colorview`, etc. (CRef #178)
Although our naming guidelines recommend against the `im_` prefix, I feel `from_matlab` is too generic a name, especially when the input is a raw numerical array. For this reason I use the longer, more verbose name `im_from_matlab`; users can add their own alias if they want, e.g., `const im2jl = im_from_matlab`.
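For context, a minimal sketch of what the conversion looks like next to the manual composition mentioned above (this assumes the `im_from_matlab` introduced by this PR; the array size is made up):

```julia
using ImageCore  # assumes this PR's im_from_matlab is available

A = rand(4, 4, 3)        # MATLAB-style h×w×3 numeric array
img = im_from_matlab(A)  # lazy 4×4 RGB{Float64} view of the same data

# The manual composition this replaces (same values, different wrapper):
img2 = colorview(RGB, PermutedDimsArray(A, (3, 1, 2)))
```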
closed #170
Codecov Report

Merging #179 (251fb2d) into master (9e64fec) will decrease coverage by 1.21%. The diff coverage is 85.33%.

```diff
@@            Coverage Diff             @@
##           master     #179      +/-   ##
==========================================
- Coverage   62.60%   61.39%   -1.21%
==========================================
  Files          10       11       +1
  Lines         583      658      +75
==========================================
+ Hits          365      404      +39
- Misses        218      254      +36
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/ImageCore.jl | 72.41% <ø> (ø) | |
| src/matlab.jl | 85.33% <85.33%> (ø) | |
| src/convert_reinterpret.jl | 26.66% <0.00%> (-33.34%) | :arrow_down: |
| src/show.jl | 50.00% <0.00%> (-20.84%) | :arrow_down: |
| src/colorchannels.jl | 47.18% <0.00%> (-3.53%) | :arrow_down: |
TODO: support MATLAB's indexed image format

- [x] `im_from_matlab(index, values)`
- [x] `im_to_matlab(::IndirectArray)`
TODO:

- [x] `im_to_matlab(UInt8, n0f8_data)`
- [x] `im_to_matlab(img)` with offset (non-1-based) images
@timholy I notice you've self-assigned a review. This PR is basically ready for review, except that I still need to 1) revisit and improve the docstring, and 2) rebase the commits.
In summary, this PR:

- provides `im_to_matlab` and `im_from_matlab` to support lazy conversion from/to MATLAB's layout
- introduces `StructArray` to wrap the struct-of-arrays layout
- introduces `IndirectArray` to represent indexed images
The loading time, however, increases from 0.58 seconds to 0.98 seconds even on a very capable i9-12900K CPU; of that, loading StructArrays contributes 0.32 seconds. This is not ideal; what are your thoughts?
This PR originates from the need to provide API compatibility with MATLAB toolboxes. `im_to_matlab` and `im_from_matlab` are so basic that I think they are worth living in ImageCore.
As far as I know, forking an API and layout is no longer an issue given that Google won the Java API lawsuit. Thus I mimic MATLAB's image layout convention:
When converting from MATLAB layout, because we don't have colorspace information, we have to guess from the input:
- RGB image is a 3-dimensional numerical array with the color channel in the last dimension
- Gray image is a 2-dimensional numerical array
- other colorspaces need an explicit conversion or type annotation
When converting to MATLAB layout:
- similar to the above, except that it supports arbitrary dimensions
- other colorspaces will be converted to RGB first
- forcing 1-based indexing. JuliaImages supports arbitrary indexing offsets but MATLAB layout typically uses 1-based indexing. Again, this is because I want to make this MATLAB-tailored tool as convenient as possible.
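A hedged sketch of these guessing rules in use (the explicit-colorspace form is my reading of the "explicit annotation" rule above and may not match the final API exactly):

```julia
using ImageCore  # assumes this PR's im_from_matlab is available

gray = im_from_matlab(rand(4, 4))     # 2-d numeric array → 4×4 Gray image
rgb  = im_from_matlab(rand(4, 4, 3))  # 3-d numeric array → 4×4 RGB image (channel last)
# Other colorspaces need an explicit annotation, e.g. im_from_matlab(Lab, values)
```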
As an example, here's a benchmark of various versions of a trivial rgb2gray implementation at different sizes:

- direct (AoS): `Gray.(img)` where `img = rand(RGB{Float64}, sz)`
- direct (SoA): `@. @views 0.2989*I[:,:,1] + 0.5870*I[:,:,2] + 0.1140*I[:,:,3]`
- indirect (SoA): `im_to_matlab(of_eltype(Gray, im_from_matlab(I)))`
- MATLAB (SoA): `rgb2gray`
Conclusions:

- It's interesting to see that the "indirect (SoA)" method is sometimes even faster than the "direct (AoS)" method.
- The performance overhead of `im_to_matlab`/`im_from_matlab` is relatively small.
- Except for the small-image case, we're faster than MATLAB.
Size (256, 256)
| version | time |
|---|---|
| direct (AoS) | 32.042 μs |
| direct (SoA) | 33.386 μs |
| indirect (SoA) | 38.963 μs |
| MATLAB (SoA) | 23.170 μs |
Size (512, 512)
| version | time |
|---|---|
| direct (AoS) | 143.831 μs |
| direct (SoA) | 151.395 μs |
| indirect (SoA) | 155.726 μs |
| MATLAB (SoA) | 546.576 μs |
Size (1024, 1024)
| version | time |
|---|---|
| direct (AoS) | 982.269 μs |
| direct (SoA) | 776.658 μs |
| indirect (SoA) | 921.979 μs |
| MATLAB (SoA) | 1320 μs |
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: 12th Gen Intel(R) Core(TM) i9-12900K
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, goldmont)
scripts

```julia
using ImageCore, MappedArrays, BenchmarkTools

rgb2gray(I) = im_to_matlab(of_eltype(Gray, im_from_matlab(I)))
function rgb2gray_direct(I)
    @. @views 0.2989 * I[:, :, 1] + 0.5870 * I[:, :, 2] + 0.1140 * I[:, :, 3]
end

# size 256×256
# MATLAB baseline: 23.170 μs
I = rand(256, 256, 3)
img = collect(im_from_matlab(I))
@btime Gray.($img);           # 32.042 μs (2 allocations: 512.05 KiB)
@btime collect(rgb2gray($I)); # 38.963 μs (13 allocations: 512.83 KiB)
@btime rgb2gray_direct($I);   # 33.386 μs (2 allocations: 512.05 KiB)

# size 512×512
# MATLAB baseline: 546.576 μs
I = rand(512, 512, 3)
img = collect(im_from_matlab(I))
@btime Gray.($img);           # 143.831 μs (2 allocations: 2.00 MiB)
@btime collect(rgb2gray($I)); # 155.726 μs (13 allocations: 2.00 MiB)
@btime rgb2gray_direct($I);   # 151.395 μs (2 allocations: 2.00 MiB)

# size 1024×1024
# MATLAB baseline: 1320 μs
I = rand(1024, 1024, 3)
img = collect(im_from_matlab(I))
@btime Gray.($img);           # 982.269 μs (2 allocations: 8.00 MiB)
@btime collect(rgb2gray($I)); # 921.979 μs (13 allocations: 8.00 MiB)
@btime rgb2gray_direct($I);   # 776.658 μs (2 allocations: 8.00 MiB)
```
```matlab
% MATLAB benchmark; loop100 runs rgb2gray 100 times, so timeit(f)*1e4
% converts seconds per 100 calls into microseconds per call.
x = rand(256, 256, 3);
f = @() loop100(x);
timeit(f) * 1e4

x = rand(512, 512, 3);
f = @() loop100(x);
timeit(f) * 1e4

x = rand(1024, 1024, 3);
f = @() loop100(x);
timeit(f) * 1e4

function loop100(x)
    for i = 1:100
        rgb2gray(x);
    end
end
```
> The loading time, however, increases from 0.58 seconds to 0.98 seconds even on a very capable i9-12900K CPU; of that, loading StructArrays contributes 0.32 seconds. This is not ideal; what are your thoughts?
I'm willing to pay this price. This seems like functionality that's too important to omit. We may find some load-time optimizations later, but the bottom line is that I would only balk if this made it something like 5s of load time rather than a "mere" doubling.
This PR needs a rework in a new PR, but here are a few more edge cases that need updating:

- [ ] `im_to_matlab(img_n0f8)` should output an array of eltype `UInt8`: currently it outputs `N0f8`, which makes it harder to stay consistent with MATLAB's convention
- [ ] `im_to_matlab(im_to_matlab(img))` currently doesn't work
- [ ] indexed image: work around the 1-based vs 0-based index difference
- [ ] even though there is no grayscale indexed image, it's convenient to support the `im_from_matlab(index, values::AbstractVector)` case.
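For the first item, the `N0f8` → `UInt8` conversion can be a zero-copy reinterpretation, since `N0f8` is just a `UInt8` with a fixed-point interpretation; a small sketch (not the PR's implementation, just the underlying mechanism):

```julia
using ImageCore  # N0f8 is re-exported from FixedPointNumbers

x = N0f8[0.0, 0.5, 1.0]
raw = reinterpret(UInt8, x)  # UInt8 view sharing memory with x
```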
I have read through the StructArrays docs, the PR changes, and the review comments, and have a basic idea of what we are generally trying to do. Should I continue here, or follow up in another PR?
Naming: my preference is to call this something like `HWC{RGB}(rawimg)` (for "height-width-color"). We could also have `WHC`, `CHW`, `CWH`, `HWDC` (D = "depth"), etc.
> Naming: my preference is to call this something like `HWC{RGB}(rawimg)` (for "height-width-color"). We could also have `WHC`, `CHW`, `CWH`, `HWDC` (D = "depth"), etc.
There's also "N", as in:
- WHCN (column-major) - also known as NCHW or "channels first" with row-major nomenclature
- CWHN (column-major) - also known as NHWC or "channels last" with row-major nomenclature

Cf. https://discourse.julialang.org/t/what-is-the-fastest-dimension-ordering-for-convolutions/46182?u=stemann
Cf. https://discourse.julialang.org/t/using-real-nchw-order-when-using-cudnn-jl/100842/2
I guess N could also be known as T (for time)?
In relation to the discussions regarding ImageAnnotations on Slack etc., and to handling the memory format of an input image relative to the memory format expected by some neural network: I was thinking along the lines of using AxisArrays, or just tuples of symbols, to communicate the meaning of the axes.
Could this be used here instead of having separate HWC methods etc.? I guess the reason for separate methods is being easier on the compiler (not using a lot of Val{Symbol}'s)?
Context (copied from https://github.com/IHPSystems/ONNXRuntimeImageAnnotators.jl/issues/1):
I had the following on my mind (just for the CWH case - disregarding CWHN for now):
`annotate(image::AbstractArray{<:Colorant,2}, annotator)` could assume that the standard memory format of `image` is used and therefore dispatch to `annotate(image::AxisArray{<:AbstractArray{<:Colorant,2}, :w, :h}, annotator)`, and then among the model metadata (in `annotator`) there could be the information that the neural network expects e.g. `(:c, :w, :h, :n)` (using strictly column-major notation). In this specific case the input, with format `(:c, :w, :h)` (after `channelview`), would not require conversion to match `(:c, :w, :h, :n)`.

Examples of expected input for ONNX models for vision
> Naming: my preference is to call this something like `HWC{RGB}(rawimg)` (for "height-width-color"). We could also have `WHC`, `CHW`, `CWH`, `HWDC` (D = "depth"), etc.
It's nice that this sort of naming makes the expected memory format explicit - so a user doesn't try to call a magic interpret function that ends up making the wrong call regarding the dimension order in an ambiguous case.
But I'm also slightly worried about ambiguities... I guess "D"/depth was meant as the depth component of e.g. an RGB-D camera?
Types of raw images I can think of - with a focus on raw camera output (using column-major notation, and calling the first spatial dimension W):

2D:
- Monochrome (Gray) image (area-scan or line-scan): WH
- Bayer-mask (area-scan) colour image (prior to conversion to RGB): WH
- Bi-linear or tri-linear colour image for a single line from a colour line scan camera (prior to conversion to RGB): WC (with C=2 or C=3)
- Hyperspectral image from a line-scan 2D sensor outputting Width x Channel/Band/Wavelength for each line: WC
3D:
- Sequence/batch of monochrome (gray) images or Bayer-mask colour images: WHN
- RGB image: CWH
- Multiple lines from a colour line scan camera (prior to conversion to RGB): WCH (with C=2 or C=3)
- Multiple lines from a hyperspectral line scan camera: WCH
4D:
- Sequence/batch of RGB images: CWHN
- Sequence/batch from colour/hyperspectral line scan camera: WCHN
There are two separate issues:
- disambiguating the meaning of axes: ImageAxes/AxisArrays (or whatever its successor turns out to be) is not only viable, but irreplaceable if you need to give meaning to certain axis types (like distinguishing spatial vs temporal axes).
- constructing "views" that support one of JuliaImages' core abstractions: each array index corresponds to the full spatiotemporal information for a single pixel in a single image. That means there isn't a color channel in that view, because color channels break that abstraction.
The second issue is of huge importance: it's what has allowed us to use generic algorithms to provide wide coverage for so many operations for gray/color and multidimensional images. However, it does not necessarily follow that all code has to use this abstraction: it's perfectly fine for some methods to split out into color channels and do specialized things.
In other words, what this PR is about is bridging these two worlds: providing a view consistent with no color channel while preserving memory layout favorable to algorithms that can be more efficiently written where color is a separate (and not the fastest) dimension of the array.
> I guess "D"/depth was meant as the depth component of e.g. an RGB-D camera?
I was thinking of biomedical imaging techniques which are often natively 3D. I would distinguish an extra value D associated with a single X,Y position from an array where there is a D (or Z) axis to the array and intensity information available at every point.
But I think N makes lots of sense too as a generic marker. For example in a convolutional network, the N outputs of a single layer.
Another way to think about this core abstraction is to remember that, mathematically, an image is just a discretized version of a function mapping ℝᵐ to ℝⁿ. The m corresponds to all the spatio-temporal axes, i.e., the domain over which you have data. The n corresponds to the amount of data you get at each one of these locations. So a RGBD camera maps ℝ² to ℝ⁴, because you get 4 pieces of information at each 2d point. Conversely, a movie as you might watch on TV or in a theater maps ℝ³ to ℝ³, because HWT gets mapped to RGB.
In JuliaImages, the default representation is via an m-dimensional array of n-component values. Most other image processing suites represent this instead as an m+n dimensional array, i.e., over the product space ℝᵐ⊗ℝⁿ. But this adds a lot of complexity to algorithms: for example, a mean-shift algorithm like in ImageSegmentation requires that you compute a total distance between two pixels, where totaldistance = spatialdistance + colordistance. spatialdistance gets computed from the difference in indices for the two spatial locations, whereas colordistance gets computed from the difference in values assigned to the two locations. This is trivial to implement if you represent the image as an m-dimensional array of n-component values, and becomes a lot harder if you have a single m+n-dimensional array which may have different meanings to the different axes. There are many other examples, including iterating over neighboring pixels, etc.
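To illustrate the mean-shift distance point in plain Base Julia (tuples stand in for RGB values here; the variable names follow the text but are otherwise made up):

```julia
# m-dimensional array of n-component values: a 4×4 "RGB" image
img = [ntuple(_ -> rand(), 3) for _ in 1:4, _ in 1:4]
p, q = CartesianIndex(1, 1), CartesianIndex(3, 4)

spatialdistance = sqrt(sum(abs2, Tuple(p - q)))    # from the indices alone
colordistance   = sqrt(sum(abs2, img[p] .- img[q]))  # from the values alone
totaldistance   = spatialdistance + colordistance
```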
So the core abstraction in JuliaImages is to make sure that the default representation doesn't mix these things up. But that's independent of memory layout.
> There are two separate issues:
>
> - constructing "views" that support one of JuliaImages' core abstractions: each array index corresponds to the full spatiotemporal information for a single pixel in a single image. That means there isn't a color channel in that view, because color channels break that abstraction.
>
> The second issue is of huge importance: it's what has allowed us to use generic algorithms to provide wide coverage for so many operations for gray/color and multidimensional images. However, it does not necessarily follow that all code has to use this abstraction: it's perfectly fine for some methods to split out into color channels and do specialized things.
>
> In other words, what this PR is about is bridging these two worlds: providing a view consistent with no color channel while preserving memory layout favorable to algorithms that can be more efficiently written where color is a separate (and not the fastest) dimension of the array.
Right! I also realised - post-commenting - that the raw images (with channels), Bayer/bilinear/trilinear colour, and hyperspectral images were off point (and the bilinear stuff was also wrong, I believe: one of the colour lines is inter-leaved red/blue while the other is all green - normally).
> - disambiguating the meaning of axes: ImageAxes/AxisArrays (or whatever its successor turns out to be) is not only viable, but irreplaceable if you need to give meaning to certain axis types (like distinguishing spatial vs temporal axes).
In this regard you are saying...?
> > I guess "D"/depth was meant as the depth component of e.g. an RGB-D camera?
>
> I was thinking of biomedical imaging techniques which are often natively 3D. I would distinguish an extra value D associated with a single X,Y position from an array where there is a D (or Z) axis to the array and intensity information available at every point.
Ah yes.
I guess it would work just fine as well with an RGB-D camera image like from a Kinect - once the typical (colour image, point cloud) output has been aligned/registered to e.g. enrich the colour image with depth data.
> But I think N makes lots of sense too as a generic marker. For example in a convolutional network, the N outputs of a single layer.
I was more thinking of "N" as the number of images in a batch (or in a sequence/video) fed as input to a convolutional network... (as when the PyTorch world talks "NCHW"/"NHWC") 🤔
> I was more thinking of "N" as the number of images in a batch (or in a sequence/video) fed as input to a convolutional network... (as when the PyTorch world talks "NCHW"/"NHWC")
Good point. That's a case that's not quite as nicely solved by JuliaImages' abstractions. Or rather, I guess it is, but IMO for the clearest case N would be an "outer" index and HW an "inner" index. I.e., images[k][i, j] would return the RGB value associated with spatial location i,j in the kth image. Having it be doubly-indexed clarifies the concept that the different images are not necessarily related to one another in the same manner as, say, a time series or XYZ image, for which the concept of "adjacent" in T or Z is genuinely meaningful.
But again, this representation is independent of memory layout. Suppose you have a NCHW array; this representation could be constructed something like (note: untested)
```julia
using StructArrays, ArraysOfArrays
bigarray = rand(T, n, c, h, w)                             # T, n, c, h, w as placeholders
intermediatearray = StructArray{RGB{T}}(bigarray; dims=2)  # consume the color axis
nicearray = nestedview(intermediatearray, 1)
```
and all `nicearray` does is wrap `bigarray` with no additional memory allocation. But you might need special dispatch if you want to take advantage of this memory layout for efficient algorithms. We probably need tools that take possibly-nested array wrappers and return a fully-unwrapped raw array together with axis data indicating the meaning of each.
In `HWC{RGB}` etc., `HWC` looks a bit like a type, e.g. a suggestion could be (maybe this was also your intent?):

```julia
abstract type AbstractStorageOrder{T<:Colorant} end
abstract type HWC{T} <: AbstractStorageOrder{T} end
```

with a generic function (with some* name):

```julia
interpret_raw(T::Type{<:AbstractStorageOrder{<:Colorant}}, raw_img)
```

*: I am still quite fond of `colorview(C, normedview(T, raw_img))`, so it would be nice if these convenience versions for "complicated layouts" were not too different from `channelview`: IIUC, the `StructArray` constructor would still yield a view of the raw data; in that case it should be apparent from the function name that no conversion/copying is done.
IIUC, `channelview` and `normedview` would remain unchanged, for when a `StructArray` representation is not necessary.
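For reference, a small sketch of the composed-view idiom being discussed, using only existing ImageCore functionality (array size made up; the result is a zero-copy view):

```julia
using ImageCore

raw = rand(UInt8, 3, 4, 4)              # channels-first raw data
img = colorview(RGB, normedview(raw))   # 4×4 RGB{N0f8} view, no copy
```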
I'm not even sure we need `interpret_raw`, although I guess you're right that my syntax `HWC{RGB}(rawimg)` seems to imply that there's an `HWC` type and one can construct `HWC` instances (rather than, e.g., creating a `StructArray`). So maybe `interpret_raw` (or `interpret_layout`?) is better.
> > I was more thinking of "N" as the number of images in a batch (or in a sequence/video) fed as input to a convolutional network... (as when the PyTorch world talks "NCHW"/"NHWC")
>
> Good point. That's a case that's not quite as nicely solved by JuliaImages' abstractions. Or rather, I guess it is, but IMO for the clearest case `N` would be an "outer" index and `HW` an "inner" index. I.e., `images[k][i, j]` would return the RGB value associated with spatial location `i,j` in the `k`th image. Having it be doubly-indexed clarifies the concept that the different images are not necessarily related to one another in the same manner that, say, a time series or XYZ image is.
True, it might be an idea to not include "N" in that case - for this PR.
But perhaps consider "T" and "D" instead - common combinations of HWC etc. with T and D.
> But again, this representation is independent of memory layout. Suppose you have a NCHW array; this representation could be constructed something like (note: untested)
>
> ```julia
> using StructArrays, ArraysOfArrays
> bigarray = rand(T, n, c, h, w)
> intermediatearray = StructArray{RGB{T}}(bigarray; dims=2)
> nicearray = nestedview(intermediatearray, 1)
> ```
>
> and all `nicearray` does is wrap `bigarray` with no additional memory allocation.
Right! Nice example! :-)
> But you might need special dispatch if you want to take advantage of this memory layout for efficient algorithms. We probably need tools that take possibly-nested array wrappers and return a fully-unwrapped raw array together with axis data indicating the meaning of each.
Right. This echoes my thoughts on the "input image in one format" (raw array format) and "input for neural network in another format" (raw array format). We'll see if it will be a stumbling block sooner or later... in relation to https://github.com/IHPSystems/ONNXRuntimeImageAnnotators.jl/issues/1
But with something like the example above we can probably get far (enough).
> But perhaps consider "T" and "D" instead - common combinations of HWC etc. with T and D.
Yes, in that case I think `img[i, j, k]` is strongly preferred. The "discrete representation of a function" core concept helps make such decisions: when that 3rd axis is Z or T, you can imagine that it would have been possible to sample more densely and acquire values at intermediate positions/times in Z or T. But with N, it's just a "bag of images" and there is no sense in which you could interpolate between them.
> I'm not even sure we need `interpret_raw`, although I guess you're right that my syntax `HWC{RGB}(rawimg)` seems to imply that there's an `HWC` type and one can construct `HWC` instances (rather than, e.g., creating a `StructArray`). So maybe `interpret_raw` (or `interpret_layout`?) is better.
Something like interpret/view would be nice. Layout is a good word, yes. Maybe `layoutview`? (with the top abstract type named something with "Abstract" and "Layout" - and maybe something with "Image").
> > But perhaps consider "T" and "D" instead - common combinations of HWC etc. with T and D.
>
> Yes, in that case I think `img[i, j, k]` is strongly preferred. The "discrete representation of a function" core concept helps make such decisions: when that 3rd axis is Z or T, you can imagine that it would have been possible to sample more densely and acquire values at intermediate positions/times in Z or T. But with N, it's just a "bag of images" and there is no sense in which you could interpolate between them.
Right.
One associated thought - though I know a lot of algorithms in JuliaImages would not apply: if I just have an `a::Array{UInt16, 3}` from a 10-bit hyperspectral camera, then `normedview(N6f10, a)` will give me an image-like view with dimensions WCH where C is also a discretisation of a continuous space (wavelength/frequency)... (kind of getting "half of JuliaImages"... - the core abstraction(s) :-)
... but one could also have used the notation L (for lambda) instead of C in this case - to not encourage trying to interpret C > 3 (or much higher than 3) as the normal Colorants...
That's a great example! You're right that wavelength-based representations muddy the waters; even RGB wanders into this territory more than, say, LUV or similar where the different channels in no way represent a discretization.
I mean, the WCH-image (or WLH with L for lambda) is an intensity-image, with values between 0 and 1 - and for each c in C, you have a Monochrome-image - like a stack of Gray-images - continuous over L.
In the use case I have encountered L was 256 - but it could be any value >= 1 (just wouldn't be as hyper ... spectral).
Might be related to what you hinted at in https://github.com/JuliaImages/ImageCore.jl/issues/170#issuecomment-897536677, though I can't quite grasp what you meant there... (was the `@makealpha` macro missing from the example?)
I just meant that you don't have to stick with the types already defined in ColorTypes, it was designed to make it fairly easy to add new types.
Some wrapper types would help, but I think what is actually needed here is an interface or protocol that can be used to describe the axes of "image" data.
Note that this is a distinct but related problem to that being addressed by AxisArrays.jl and related packages. Rather than describing intrinsic properties of array data, here we need to describe an extrinsic property of how we want to interpret those axes.
It is related to the AxisArrays.jl issue in that there may be some natural default mappings from intrinsic axes to the image space.
What I'm hearing here are two or three classes of axes.
- Image space (not necessarily the physical spatial dimensions)
- Colorant
- Composition / Rendering (e.g. alpha transparency)
I think most suites bundle alpha and color into a single axis, e.g., RGBA. We do that too, as an RGBA type.
If we are talking about 2D, I agree. In 3D for volumetric or voxel rendering the situation may get more complicated.
> Some wrapper types would help, but I think what is actually needed here is an interface or protocol that can be used to describe the axes of "image" data.
Sounds right - at least as some follow-up from this PR. To me this reads along the lines of what @timholy hinted at with “We probably need tools…” in https://github.com/JuliaImages/ImageCore.jl/pull/179#issuecomment-1676341132 ?
> Note that this is a distinct but related problem to that being addressed by AxisArrays.jl and related packages. Rather than describing intrinsic properties of array data, here we need to describe an extrinsic property of how we want to interpret those axes.
Can’t quite follow the reasoning here. How is intrinsic different from extrinsic in this regard?
(and then I didn’t get the rest either :-) )
> Can’t quite follow the reasoning here. How is intrinsic different from extrinsic in this regard?
Perhaps the rows and columns of a Matrix correspond to distinct temperatures and pressures. These are intrinsic to the array data.
Now we want to display this as an image. So we map temperature to "X" or the horizontal axis and pressure to "Y" to the vertical axes. "X" and "Y" are extrinsic properties that we impose on the array based on how we want to interpret the data as an image.
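A tiny Base-Julia sketch of that idea (the axis names and the `display_order` tuple are made up for illustration):

```julia
data = rand(5, 4)                          # rows: temperature, cols: pressure
intrinsic = (:temperature, :pressure)      # intrinsic to the array data
display_order = (:pressure, :temperature)  # extrinsic: pressure on the vertical axis

# Find the permutation that realizes the extrinsic interpretation
perm = map(ax -> findfirst(==(ax), intrinsic), display_order)
img = PermutedDimsArray(data, perm)        # lazy relabeling, no copy
```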