
Updated Tensorflow.Net to 0.70.2 with Tensorflow 2.7.0.

Open Crichen opened this issue 8 months ago • 8 comments

Fixes #7471

NumSharp is replaced with Tensorflow.NumPy. TensorShape is replaced with Shape. The Shape object stores its dimensions as 64-bit long values, so a check was added for casting to 32-bit int. The Tensor constructor now also uses SafeTensorHandle/DangerousGetHandle, and TF_DataType is not required when casting.
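As a rough illustration of the long-to-int check mentioned above, here is a minimal sketch. The `ShapeDims` class and `ToInt32Dims` method are hypothetical names made up for this example and are not the code in the PR; the sketch only assumes that dimensions arrive as a `long[]`.

```csharp
using System;
using System.Linq;

public static class ShapeDims
{
    // Hypothetical helper (not from the PR): narrows 64-bit dimensions to
    // 32-bit ints. The checked cast throws OverflowException rather than
    // silently truncating a dimension that does not fit in an int.
    public static int[] ToInt32Dims(long[] dims)
    {
        if (dims is null) throw new ArgumentNullException(nameof(dims));
        return dims.Select(d => checked((int)d)).ToArray();
    }
}
```

A call such as `ShapeDims.ToInt32Dims(new long[] { -1, 224, 224, 3 })` passes small values (including a negative placeholder) through unchanged, while a value above `int.MaxValue` raises an exception at the point of conversion.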

Added StringTensorFactory to wrap additional tensorflow.dll methods required to create Tensors from string-based input.

We are excited to review your PR.

So we can do the best job, please check:

  • [x] There's a descriptive title that will make sense to other developers some time from now.
  • [x] There's associated issues. All PR's should have issue(s) associated - unless a trivial self-evident change such as fixing a typo. You can use the format Fixes #nnnn in your description to cause GitHub to automatically close the issue(s) when your PR is merged.
  • [x] Your change description explains what the change does, why you chose your approach, and anything else that reviewers should know.
  • [x] You have included any necessary tests in the same PR.

Crichen avatar May 23 '25 10:05 Crichen

@Crichen please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@dotnet-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@dotnet-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@dotnet-policy-service agree company="Microsoft"

Contributor License Agreement

@dotnet-policy-service agree

Crichen avatar May 23 '25 10:05 Crichen

CI failing is interesting, I wonder if the dotnet9.0 runtime needs to be included in the initial build steps? Following the standard build process: 6.0.36, 8.0.16 and 10.0.0-preview.3.25171.5 are installed to the machinelearning\.dotnet folder. Running .\build.cmd -test -integrationTest resulted in errors with referencing 9.0.

I installed the runtime manually to the machinelearning\.dotnet location and this was then resolved. Very unsure of where in the tooling pipeline this would need to be updated.

Crichen avatar May 23 '25 12:05 Crichen

Continuing CI issues, without access to logs we're limited in how far we can proceed with fixes.

Crichen avatar May 26 '25 09:05 Crichen

/azp run MachineLearning-CI

ericstj avatar Jun 12 '25 20:06 ericstj

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Jun 12 '25 20:06 azure-pipelines[bot]

Sorry for the delay here, I'll have a look and see what failures you're hitting and figure out how we can get this working. Thank you for your contribution - this update is something we've wanted to get in.

ericstj avatar Jun 12 '25 20:06 ericstj

CI failures are all due to packages not mirrored to our build feeds. https://dev.azure.com/dnceng-public/public/_build/results?buildId=1066692&view=logs&j=80b813b5-9a08-5859-11a8-dc0e5b556e52&t=99848337-6ccc-53eb-9c14-1b676ae001b9

/__w/1/s/src/Microsoft.ML.Console/Microsoft.ML.Console.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/src/Microsoft.ML.AutoML/Microsoft.ML.AutoML.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/src/Microsoft.ML.CodeGenerator/Microsoft.ML.CodeGenerator.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/src/Microsoft.ML.AutoML.Interactive/Microsoft.ML.AutoML.Interactive.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/docs/samples/Microsoft.ML.Samples.GPU/Microsoft.ML.Samples.GPU.csproj : error NU1603: Warning As Error: Microsoft.ML.Samples.GPU depends on SciSharp.TensorFlow.Redist-Linux-GPU (>= 2.7.0) but SciSharp.TensorFlow.Redist-Linux-GPU 2.7.0 was not found. SciSharp.TensorFlow.Redist-Linux-GPU 2.11.1 was resolved instead. [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/docs/samples/Microsoft.ML.Samples.GPU/Microsoft.ML.Samples.GPU.csproj : error NU1101: Unable to find package SciSharp.TensorFlow.Redist-Linux-GPU-primary. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/docs/samples/Microsoft.ML.Samples.GPU/Microsoft.ML.Samples.GPU.csproj : error NU1101: Unable to find package SciSharp.TensorFlow.Redist-Linux-GPU-fragment1. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/docs/samples/Microsoft.ML.Samples.GPU/Microsoft.ML.Samples.GPU.csproj : error NU1101: Unable to find package SciSharp.TensorFlow.Redist-Linux-GPU-fragment2. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/docs/samples/Microsoft.ML.Samples.GPU/Microsoft.ML.Samples.GPU.csproj : error NU1101: Unable to find package SciSharp.TensorFlow.Redist-Linux-GPU-fragment3. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/docs/samples/Microsoft.ML.Samples.GPU/Microsoft.ML.Samples.GPU.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.Fairlearn.Tests/Microsoft.ML.Fairlearn.Tests.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/docs/samples/Microsoft.ML.Samples/Microsoft.ML.Samples.csproj : error NU1603: Warning As Error: Microsoft.ML.Samples depends on SciSharp.TensorFlow.Redist (>= 2.7.0) but SciSharp.TensorFlow.Redist 2.7.0 was not found. SciSharp.TensorFlow.Redist 2.16.0 was resolved instead. [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/docs/samples/Microsoft.ML.Samples/Microsoft.ML.Samples.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.PerformanceTests/Microsoft.ML.PerformanceTests.csproj : error NU1603: Warning As Error: Microsoft.ML.PerformanceTests depends on SciSharp.TensorFlow.Redist (>= 2.7.0) but SciSharp.TensorFlow.Redist 2.7.0 was not found. SciSharp.TensorFlow.Redist 2.16.0 was resolved instead. [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/docs/samples/Microsoft.ML.AutoML.Samples/Microsoft.ML.AutoML.Samples.csproj : error NU1603: Warning As Error: Microsoft.ML.AutoML.Samples depends on SciSharp.TensorFlow.Redist (>= 2.7.0) but SciSharp.TensorFlow.Redist 2.7.0 was not found. SciSharp.TensorFlow.Redist 2.16.0 was resolved instead. [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/docs/samples/Microsoft.ML.AutoML.Samples/Microsoft.ML.AutoML.Samples.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.PerformanceTests/Microsoft.ML.PerformanceTests.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.TensorFlow.Tests/Microsoft.ML.TensorFlow.Tests.csproj : error NU1603: Warning As Error: Microsoft.ML.TensorFlow.Tests depends on SciSharp.TensorFlow.Redist (>= 2.7.0) but SciSharp.TensorFlow.Redist 2.7.0 was not found. SciSharp.TensorFlow.Redist 2.16.0 was resolved instead. [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.TensorFlow.Tests/Microsoft.ML.TensorFlow.Tests.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.Core.Tests/Microsoft.ML.Core.Tests.csproj : error NU1603: Warning As Error: Microsoft.ML.Core.Tests depends on SciSharp.TensorFlow.Redist (>= 2.7.0) but SciSharp.TensorFlow.Redist 2.7.0 was not found. SciSharp.TensorFlow.Redist 2.16.0 was resolved instead. [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.Core.Tests/Microsoft.ML.Core.Tests.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.CodeGenerator.Tests/Microsoft.ML.CodeGenerator.Tests.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.Benchmarks.Tests/Microsoft.ML.Benchmarks.Tests.csproj : error NU1603: Warning As Error: Microsoft.ML.PerformanceTests depends on SciSharp.TensorFlow.Redist (>= 2.7.0) but SciSharp.TensorFlow.Redist 2.7.0 was not found. SciSharp.TensorFlow.Redist 2.16.0 was resolved instead. [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.Benchmarks.Tests/Microsoft.ML.Benchmarks.Tests.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.AutoML.Tests/Microsoft.ML.AutoML.Tests.csproj : error NU1603: Warning As Error: Microsoft.ML.AutoML.Tests depends on SciSharp.TensorFlow.Redist (>= 2.7.0) but SciSharp.TensorFlow.Redist 2.7.0 was not found. SciSharp.TensorFlow.Redist 2.16.0 was resolved instead. [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.AutoML.Tests/Microsoft.ML.AutoML.Tests.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/src/Microsoft.ML.Vision/Microsoft.ML.Vision.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/src/Microsoft.ML.TensorFlow/Microsoft.ML.TensorFlow.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/src/Microsoft.ML.DnnAnalyzer/Microsoft.ML.DnnAnalyzer/Microsoft.ML.DnnAnalyzer.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/src/Microsoft.ML.Fairlearn/Microsoft.ML.Fairlearn.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]

Let me help get those mirrored.

ericstj avatar Jun 17 '25 14:06 ericstj

Codecov Report

Attention: Patch coverage is 76.33588% with 31 lines in your changes missing coverage. Please review.

Project coverage is 68.99%. Comparing base (71e1280) to head (e99dcc1). Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/Microsoft.ML.TensorFlow/TensorflowUtils.cs 55.88% 14 Missing and 1 partial :warning:
src/Microsoft.ML.Vision/DnnRetrainTransform.cs 61.53% 8 Missing and 2 partials :warning:
src/Microsoft.ML.TensorFlow/TensorflowTransform.cs 85.00% 3 Missing :warning:
...rc/Microsoft.ML.TensorFlow/TensorTypeExtensions.cs 0.00% 0 Missing and 2 partials :warning:
.../Microsoft.ML.Vision/ImageClassificationTrainer.cs 97.56% 0 Missing and 1 partial :warning:
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7472   +/-   ##
=======================================
  Coverage   68.98%   68.99%           
=======================================
  Files        1482     1482           
  Lines      273880   273901   +21     
  Branches    28254    28256    +2     
=======================================
+ Hits       188941   188977   +36     
+ Misses      77553    77534   -19     
- Partials     7386     7390    +4     
Flag Coverage Δ
Debug 68.99% <76.33%> (+<0.01%) :arrow_up:
production 63.28% <74.79%> (+<0.01%) :arrow_up:
test 89.45% <100.00%> (+<0.01%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...ft.ML.TensorFlow.Tests/TensorFlowEstimatorTests.cs 98.52% <100.00%> (+0.01%) :arrow_up:
...t/Microsoft.ML.TensorFlow.Tests/TensorflowTests.cs 91.64% <100.00%> (ø)
.../Microsoft.ML.Vision/ImageClassificationTrainer.cs 92.17% <97.56%> (+0.03%) :arrow_up:
...rc/Microsoft.ML.TensorFlow/TensorTypeExtensions.cs 65.38% <0.00%> (+30.76%) :arrow_up:
src/Microsoft.ML.TensorFlow/TensorflowTransform.cs 85.57% <85.00%> (ø)
src/Microsoft.ML.Vision/DnnRetrainTransform.cs 61.01% <61.53%> (ø)
src/Microsoft.ML.TensorFlow/TensorflowUtils.cs 72.80% <55.88%> (-0.13%) :arrow_down:

... and 5 files with indirect coverage changes

codecov[bot] avatar Jun 17 '25 17:06 codecov[bot]

Ok, I think this addresses all the feedback. @Crichen Can you have a look at what I did and see if this still would work for you? If so we can bring in another reviewer to get this in.

ericstj avatar Jul 09 '25 23:07 ericstj

@ericstj I assume the failing test https://helixr1107v0xdeko0k025g8.blob.core.windows.net/dotnet-machinelearning-refs-pull-7472-merge-57e26a202b58496b9e/Microsoft.ML.Tests/1/console.e9146afc.log?helixlogtype=result is unrelated to tensor flow, right?

tarekgh avatar Jul 10 '25 20:07 tarekgh

.NET Framework tests are not failing, but crashing on exit. The finalizer seems to ~be double-disposing~ be accessing another object outside its graph which has already been finalized/disposed.

Unhandled exception: System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'The ThreadLocal object has been disposed.'.
   at System.Threading.ThreadLocal`1.GetValueSlow()
   at Tensorflow.BaseSession.DisposeUnmanagedResources(IntPtr handle)
   at Tensorflow.DisposableObject.Dispose(Boolean disposing)
   at Tensorflow.DisposableObject.Finalize()
System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'The ThreadLocal object has been disposed.'.
   at System.Threading.ThreadLocal`1.GetValueSlow()
   at Tensorflow.BaseSession.DisposeUnmanagedResources(IntPtr handle)
   at Tensorflow.DisposableObject.Dispose(Boolean disposing)
   at Tensorflow.DisposableObject.Finalize()

ericstj avatar Jul 10 '25 20:07 ericstj

Sigh, that's this ThreadLocal https://github.com/SciSharp/TensorFlow.NET/blob/0ee50d319e5539f15b13f8909fd246c18819d840/src/TensorFlowNET.Core/tensorflow.cs#L47

Which is accessed in the finalizer -- https://github.com/SciSharp/TensorFlow.NET/blob/0ee50d319e5539f15b13f8909fd246c18819d840/src/TensorFlowNET.Core/Sessions/BaseSession.cs#L301 That'll fail if the TF object and its ThreadLocals are finalized first.
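To make the hazard concrete, here is a minimal sketch of the usual way to keep a finalizer from touching other finalizable objects. `SketchSession` and `PerThreadState` are hypothetical stand-ins invented for this example, not the TensorFlow.NET types or fields; the sketch just follows the standard dispose pattern, where managed state is only touched when `disposing` is true.

```csharp
using System;
using System.Threading;

public sealed class SketchSession : IDisposable
{
    // Hypothetical stand-in for library-wide per-thread state.
    // ThreadLocal<T> is itself finalizable, so it may already be gone
    // by the time this object's finalizer runs.
    private static readonly ThreadLocal<int> PerThreadState =
        new ThreadLocal<int>(() => 0);

    private IntPtr _handle = new IntPtr(1); // pretend native handle

    public bool Released => _handle == IntPtr.Zero;

    public void Dispose()
    {
        Dispose(disposing: true);
        GC.SuppressFinalize(this); // finalizer no longer needs to run
    }

    ~SketchSession() => Dispose(disposing: false);

    private void Dispose(bool disposing)
    {
        if (_handle == IntPtr.Zero) return; // already released

        if (disposing)
        {
            // Only an explicit Dispose call may touch other managed objects;
            // on the finalizer path they may already have been finalized.
            _ = PerThreadState.Value;
        }

        // Releasing the native handle is safe on both paths.
        _handle = IntPtr.Zero;
    }
}
```

With this shape, a finalizer run after the `ThreadLocal` has been disposed never reads it, so the `ObjectDisposedException` shown in the stack trace would not be raised from the finalizer thread.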

This bug was introduced in https://github.com/SciSharp/TensorFlow.NET/commit/ec340eeff57c7f9bef8fc21dd94f17889b7453b5#diff-cb5a758e3cc3589393346616092a8e8cb3ab5f0bf833897526f765dff28486e2L294.

It had actually been previously fixed in https://github.com/SciSharp/TensorFlow.NET/commit/43625abe917a4712e8cdad9c7b49c9875f302a68, but was later regressed.

It's partially fixed with https://github.com/SciSharp/TensorFlow.NET/commit/a7c9a75954d219cb606042fcbfbeb1b176781d7e, but that change introduced a problem because _status was not set in all cases. That was fixed in https://github.com/SciSharp/TensorFlow.NET/commit/58de537be5b643c77f887bd13f146894d32bf8f7 but we can't take that due to the strong name bugs.

Let me see if we can somehow work around this.

ericstj avatar Jul 10 '25 21:07 ericstj

I wonder if we just pick up 0.100.4 and then patch it to set the _status value of the session.

ericstj avatar Jul 10 '25 21:07 ericstj

Would be great to have this merged... It's very annoying being limited to old NVIDIA cards.

rblanca avatar Jul 11 '25 01:07 rblanca

@ericstj many thanks for looking at this for us. I'm on holiday next week, but back in on the 21st and I'll check out the branch and have a look over.

Crichen avatar Jul 11 '25 14:07 Crichen

I think I have worked around all the crashes. What's happening is that a buggy finalizer in TF for Session will sometimes crash, depending on the order in which the GC decides to finalize objects.

So long as we Dispose all Session objects we'll avoid this.

The problem is that ML.NET has rather loose rules for disposal. Here's the best I can summarize.

  1. A TensorFlowModel contains a Session and should be disposed if this is your final object.
  2. A TensorFlowEstimator contains the TensorFlowModel, but itself is not disposable. If this is your final object, then you should maintain the lifetime via the TensorFlowModel. The EstimatorChain doesn't have plumbing for IDisposable, presumably because it's an intermediate object that should be fit to a transformer.
  3. A TensorFlowTransformer contains a copy of the session from the TensorFlow model. This object is Disposable, as is a TransformerChain. If this (or a chain) is your final object then you can use it to manage lifetime.

I found a case of both 1 and 3 in our tests where we weren't disposing, which was causing the objects to hit the finalizer and crash.
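The three rules above can be sketched from the caller's side as follows. Every type here (`SketchNativeSession`, `SketchModel`, `SketchEstimator`, `SketchTransformer`, `DisposalDemo`) is a hypothetical stand-in written for this example and is not the ML.NET API. As a simplification, the transformer here owns a separate session rather than a copy of the model's.

```csharp
using System;

// Hypothetical stand-in for a native session; counts how many are still open.
public sealed class SketchNativeSession : IDisposable
{
    public static int OpenCount { get; private set; }
    private bool _disposed;

    public SketchNativeSession() { OpenCount++; }

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;
        OpenCount--;
    }
}

// Rule 1: the model owns a session and is disposable.
public sealed class SketchModel : IDisposable
{
    private readonly SketchNativeSession _session = new SketchNativeSession();
    public SketchEstimator Score() => new SketchEstimator(this);
    public void Dispose() => _session.Dispose();
}

// Rule 2: the estimator is not disposable; its lifetime follows the model.
public sealed class SketchEstimator
{
    public SketchModel Model { get; }
    public SketchEstimator(SketchModel model) { Model = model; }
    public SketchTransformer Fit() => new SketchTransformer();
}

// Rule 3: the transformer holds a session and is disposable.
public sealed class SketchTransformer : IDisposable
{
    private readonly SketchNativeSession _session = new SketchNativeSession();
    public void Dispose() => _session.Dispose();
}

public static class DisposalDemo
{
    // Returns the number of sessions still open after both scopes end.
    public static int Run()
    {
        using (var model = new SketchModel())
        {
            var estimator = model.Score(); // nothing to dispose here
            using (var transformer = estimator.Fit())
            {
                // ... use the transformer ...
            }
        }
        return SketchNativeSession.OpenCount;
    }
}
```

The point of the sketch is only that every object holding a session sits inside a `using` scope, so no session is left for the finalizer to clean up.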

ericstj avatar Jul 11 '25 20:07 ericstj

I did a scrub of the TF.NET Codebase for other instances of crashing finalizers. I found one in EagerResourceDeleter but we don't use that. Just so happens that was already reported, so I dropped a note about root cause in https://github.com/SciSharp/TensorFlow.NET/issues/984.

ericstj avatar Jul 11 '25 21:07 ericstj

@Crichen - bumping this in case you are back from holiday. If you're happy with the changes we can merge this.

ericstj avatar Jul 22 '25 16:07 ericstj

Hi @ericstj, the code looks good, much neater than our long to int hack :-) I've got the branch checked out and we are running an integration test against our code base to make sure that we are getting similar results to the earlier Tensorflow.Net version. Fingers crossed for no distractions today.

Crichen avatar Jul 23 '25 08:07 Crichen

Hi @ericstj, tested with our code and can see tiny differences in output values, but well within tolerances for an updated Tensorflow and statistical modelling. Looks good to merge in! Let us know if you need anything else.

Crichen avatar Jul 23 '25 14:07 Crichen

Sounds good. I really wish we could move to the latest Tensorflow.NET, but those breaking changes around strong-name signing are a real blocker for us. At least we are able to use the latest published Tensorflow redist. @luisquintanilla @Oceania2018

ericstj avatar Jul 23 '25 16:07 ericstj