Can't make image classification prediction on MacOS using a Tensorflow model

Open vadd98 opened this issue 3 years ago • 17 comments

System Information (please complete the following information):

  • OS & Version: MacOS 12.4
  • ML.NET Version: Microsoft.ML 1.7.1
  • .NET Version: .NET 6.0

Describe the bug

It seems impossible to make predictions from images using a TensorFlow model on a Mac. I always get this error:

Unhandled exception. System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at Microsoft.ML.Transforms.Image.ImagePixelExtractingTransformer.Mapper.<>c__DisplayClass5_0`1.<GetGetterCore>b__1(VBuffer`1& dst)
   at Microsoft.ML.Transforms.TensorFlowTransformer.TensorValueGetterVec`1.GetTensor()
   at Microsoft.ML.Transforms.TensorFlowTransformer.Mapper.UpdateCacheIfNeeded(Int64 position, ITensorValueGetter[] srcTensorGetters, String[] activeOutputColNames, OutputCache outputCache)
   at Microsoft.ML.Transforms.TensorFlowTransformer.Mapper.<>c__DisplayClass11_0`1.<MakeGetter>b__4(VBuffer`1& dst)
   at Microsoft.ML.Data.TypedCursorable`1.TypedRowBase.<>c__DisplayClass8_0`1.<CreateDirectVBufferSetter>b__0(TRow row)
   at Microsoft.ML.Data.TypedCursorable`1.TypedRowBase.FillValues(TRow row)
   at Microsoft.ML.Data.TypedCursorable`1.RowImplementation.FillValues(TRow row)
   at Microsoft.ML.PredictionEngineBase`2.FillValues(TDst prediction)
   at Microsoft.ML.PredictionEngine`2.Predict(TSrc example, TDst& prediction)
   at Microsoft.ML.PredictionEngineBase`2.Predict(TSrc example)

The exact same code, with the same model and the same images, works flawlessly on Windows 10.

To Reproduce

Steps to reproduce the behavior:

  1. Train a Tensorflow Model (I trained the model using Python)
  2. Import the trained model in ML.NET
  3. Try to make a prediction starting from an image
  4. See error

Expected behavior

Make predictions without exceptions.

vadd98 • Jun 17 '22 19:06

Are you using an M1 MacBook?

LittleLittleCloud • Jul 06 '22 19:07

No, I'm using a 2019 Intel MacBook Pro.

vadd98 • Jul 06 '22 20:07

Can you provide a minimal reproduction example, along with the following information?

  • The version of TensorFlow you used to train the model

LittleLittleCloud • Jul 06 '22 20:07

This issue has been marked needs-author-action and may be missing some important information.

ghost • Jul 06 '22 22:07

Sorry for the late response; it has been a busy few days. Here is a repro: https://github.com/vadd98/MLNet_Repro. I left a TODO where the crash happens.

The model was trained on a MacBook using tensorflow-metal version 0.5.0.

vadd98 • Jul 18 '22 21:07

@michaelgsharp, can you help take a look?

LittleLittleCloud • Jul 18 '22 22:07

Sorry for my delay. @vadd98, what version of the TensorFlow redist did you use from NuGet? And what version did you use with your Python code?

michaelgsharp • Aug 08 '22 16:08

I'm using the SciSharp.TensorFlow.Redist NuGet package version 2.3.1. In Python I'm using tensorflow-macos version 2.9.2 and tensorflow-metal version 0.5.0.
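
For anyone comparing setups, here is a minimal sketch for listing the installed versions on the Python side. It uses only the standard library. The package names are the ones mentioned in this thread, plus plain `tensorflow` as an assumed fallback, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Distribution names mentioned in this thread; "tensorflow" is an assumed
# fallback for machines that do not use the macOS-specific packages.
PACKAGES = ["tensorflow-macos", "tensorflow-metal", "tensorflow"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Run it in the same virtual environment that was used for training, otherwise it reports the versions of a different installation.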

vadd98 • Aug 09 '22 17:08

Could you try retraining your model with tensorflow-macos version 2.3.1 and let us know what happens? It's possible that there is an incompatibility between 2.9.2 and 2.3.1 on Mac that doesn't exist on Windows.
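
To make the version comparison concrete, here is a rough sketch that checks whether the training package and the redist share the same major.minor release line. That matching release lines is what matters is only the hypothesis being tested here, not a confirmed requirement, and both helper names are made up for illustration.

```python
def release_line(version):
    """Return (major, minor) from a version string such as '2.9.2'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def same_release_line(training_version, redist_version):
    """True when both versions share the same major.minor pair."""
    return release_line(training_version) == release_line(redist_version)


# (training package version, redist package version)
# First pair: the versions reported in this thread.
# Second pair: the retraining target suggested above.
pairs = [("2.9.2", "2.3.1"), ("2.3.1", "2.3.1")]

for training, redist in pairs:
    status = "match" if same_release_line(training, redist) else "differ"
    print(f"training {training} vs redist {redist}: {status}")
```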

dakersnar • Aug 09 '22 19:08

It seems there is no 2.3.1 version of tensorflow-macos; the minimum version is 2.5.0. So I tried updating the redist to the latest version (2.7.0), but the error remains. I'll try retraining the model with tensorflow-macos and the redist both on version 2.7.0 and will let you know as soon as I have an update.

vadd98 • Aug 09 '22 20:08

I just tried using version 2.7.0 of both tensorflow-macos (for training, via Python) and the redist (for prediction), but I still get the error.

vadd98 • Aug 09 '22 21:08

I managed to run training and predictions (with the sample code, at least) on an M1 Mac using tensorflow-macos 2.9.2 and SciSharp.TensorFlow.Redist 2.7.0. I also had to add the Microsoft.ML.TensorFlow.Redist package (latest version 0.14.0) to get it to work. Maybe this will help @vadd98.

Ritorna • Aug 10 '22 09:08

I just tried adding the Microsoft.ML.TensorFlow.Redist package version 0.14.0, but I still get the error. By the way, I'd like to stress that I'm on an Intel Mac, not an M1, so the issue may exist only on the x64 architecture.
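
A quick way to confirm which architecture a machine reports, sketched with Python's standard `platform` module. The labels returned by `describe_architecture` are illustrative (the helper is hypothetical), and a Python build running under Rosetta on an M1 may report `x86_64`, so treat the result as a hint rather than proof.

```python
import platform


def describe_architecture(machine):
    """Classify a platform.machine() string for the Mac cases discussed here."""
    machine = machine.lower()
    if machine in ("x86_64", "amd64"):
        return "x64 (Intel)"
    if machine in ("arm64", "aarch64"):
        return "arm64 (Apple silicon / M1)"
    return f"other ({machine})"


if __name__ == "__main__":
    # platform.system() returns "Darwin" on macOS.
    print(platform.system(), describe_architecture(platform.machine()))
```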

vadd98 • Aug 10 '22 16:08

Sorry for the delay. I got hold of an Intel Mac and managed to reproduce this bug as you outlined. I'm not sure yet what is causing it. For the reference of others in this thread, I'm on a 2018 MacBook Air with an Intel chip, running macOS 12.4.

dakersnar • Sep 02 '22 20:09

@tarekgh was planning some refactoring work for this code already, so we will take a look at this bug again once that goes through.

dakersnar • Sep 06 '22 21:09

I just tried running the same code in a .NET 6 Focal Docker container, and I'm getting the same error.

vadd98 • Oct 15 '22 19:10

I am working on changing the image handling in general, which should fix this problem: https://github.com/dotnet/machinelearning/pull/6363. I will let you know when I have it merged. Thanks for your patience.

tarekgh • Oct 15 '22 19:10

The PR https://github.com/dotnet/machinelearning/pull/6363 should address this issue. Please try it and let us know if you see any issues.

Please note, we have introduced a class called MLImage to use instead of System.Drawing.Bitmap. In your repro code, please replace the usage of Bitmap with MLImage.

tarekgh • Oct 20 '22 20:10

Thanks! Just one question: is there already a pre-release package I can use, or do I have to build the library from source?

vadd98 • Oct 20 '22 21:10

You can get the newly built package from the internal feed, but you need to wait until the new package is built, signed, and published. We are currently experiencing some build issues, which should be fixed soon.

The way to reference the internal feed is as follows. When we have a new build, I'll send you the package version that you can use.

Add a nuget.config file to your project, in the same folder as your .csproj or .sln file:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <clear />
    <add key="MachineLearning" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json" />
  </packageSources>
</configuration>
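
As a sanity check on the file above, here is a small sketch that parses the same XML with Python's standard library and lists the configured feeds. The helper names are hypothetical. Note that `<clear />` drops any package sources inherited from higher-level NuGet configuration files, so only the sources listed after it remain.

```python
import xml.etree.ElementTree as ET

# Same content as the nuget.config shown above, as bytes so the
# encoding declaration in the XML prolog is honoured by the parser.
NUGET_CONFIG = b"""<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <clear />
    <add key="MachineLearning" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json" />
  </packageSources>
</configuration>
"""


def package_sources(config_bytes):
    """Return {key: url} for every <add> entry under <packageSources>."""
    root = ET.fromstring(config_bytes)
    return {
        entry.get("key"): entry.get("value")
        for entry in root.findall("./packageSources/add")
    }


def clears_inherited_sources(config_bytes):
    """True when <packageSources> contains a <clear /> element."""
    root = ET.fromstring(config_bytes)
    return root.find("./packageSources/clear") is not None


if __name__ == "__main__":
    for key, url in package_sources(NUGET_CONFIG).items():
        print(f"{key}: {url}")
    print("clears inherited sources:", clears_inherited_sources(NUGET_CONFIG))
```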

tarekgh • Oct 20 '22 22:10

The package version that contains the changes is 2.0.0-preview.22520.8.

tarekgh • Oct 21 '22 01:10

I'm using the new package and it is indeed working on Mac and on Ubuntu Focal Docker containers. Thanks!

vadd98 • Oct 21 '22 13:10