
Model instantiation without loading from disk

Open AndreasKaratzas opened this issue 1 year ago • 2 comments

🚀 The feature

So far, the only way to use a model from torchvision in the C++ frontend is to load a JIT checkpoint from disk, like so:

#include <torch/script.h>

#include <iostream>

int main(int argc, const char* argv[]) {
  if (argc != 2) {
    std::cerr << "usage: example-app <path-to-exported-script-module>\n";
    return -1;
  }

  // Deserialize the ScriptModule from a file using torch::jit::load();
  // it returns the module by value and throws c10::Error on failure.
  torch::jit::Module module;
  try {
    module = torch::jit::load(argv[1]);
  } catch (const c10::Error& e) {
    std::cerr << "error loading the model\n";
    return -1;
  }

  std::cout << "ok\n";
}

The feature I would like to propose is to remove the need for a precompiled JIT file and to add a mechanism to the C++ PyTorch frontend that can instantiate any model from torchvision.models as easily as in Python. For example:

#include <torch/script.h>

#include <iostream>

int main(int argc, const char* argv[]) {
  // Proposed (not yet existing) API: instantiate a torchvision model
  // directly, with no serialized file on disk.
  torch::jit::Module module = torch::jit::load(torchvision::models::resnet50);

  std::cout << "ok\n";
}

Motivation, pitch

There shouldn't be any dependency between the Python frontend and the C++ frontend. Specifically, there are projects that use the C++ PyTorch API exclusively, and in those cases the developers have to run a Python script every time, before using their framework, just to create an instance of the desired model from torchvision.models. This is a time-consuming process, particularly when the model changes frequently at runtime.

Specific use case: I am building a framework that is connected with Torch-TensorRT and utilizes the NVIDIA NVDLAs on Jetson boards. However, every time I query my framework for some workload, I first have to use Python to compile a JIT instance that my framework later loads. This creates a huge overhead, and since disk operations are among the most time-consuming, it defeats the whole purpose of using C++ to accelerate the process.

AndreasKaratzas commented on Mar 17 '24

Thank you for the request, @AndreasKaratzas.

However, every time I query my framework for some workload, I first have to use Python to compile a JIT instance that my framework later loads

It wouldn't completely address your situation, but is it feasible for you to create the scripted model (in Python) once and for all and just load that one file every time?
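For reference, a rough sketch of that "script once, load every time" workflow (assuming a recent torchvision with the weights enum API; the weights choice and the file name resnet50_scripted.pt are just examples):

import torch
import torchvision

# Build the model once in Python and script it.
model = torchvision.models.resnet50(weights=torchvision.models.ResNet50_Weights.DEFAULT)
model.eval()

scripted = torch.jit.script(model)

# Save the TorchScript module; the C++ frontend can then load this single
# file with torch::jit::load() on every run, with no Python in the loop.
scripted.save("resnet50_scripted.pt")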


I'm sorry that this is not the answer you're looking for, but I'm afraid the C++ models won't be coming back. For some context, we used to support direct C++ model access in the past, but the decision was made to deprecate that API in favour of TorchScript. Whether this was a user-friendly decision in the long term is up for debate, but the main reason at the time was that maintaining both the Python and C++ backends was just too much maintenance work. Some relevant past discussion: https://github.com/pytorch/vision/pull/4375#issuecomment-916982070

In case it helps and it's an option for you: those C++ models were removed in https://github.com/pytorch/vision/pull/6632/files, so you'll find the original C++ implementations there (some of these models may have been updated in Python since then).

NicolasHug commented on Mar 18 '24

Hello @NicolasHug,

Thank you for your prompt response.

It wouldn't completely address your situation, but is it feasible for you to create the scripted model (in Python) once and for all and just load that one file every time?

This would be a nice solution in general. For my problem, however, there are several parameters that need to be taken into account. My domain primarily revolves around embedded systems, where the workload changes rapidly and frequently. In particular, in the case of edge data centers, there are various clients with different needs who post different queries to the edge systems, so these constrained devices must be able to serve each client quickly. I opted for Torch-TensorRT to boost the performance of my framework. However, there is a huge computational burden and, worst of all, a power burden if I have to read from storage too frequently. Furthermore, given the variety of available torch models (torchvision alone has more than 70 models), storing several hundreds of GBs of serialized models on the embedded devices is neither scalable nor efficient. I strongly believe that my use case is not an extreme scenario, and that Torch-TensorRT primarily targets exactly such cases, where performance and efficiency are the primary challenges.

I also reviewed the referenced pull request. As pointed out, the models there are old and correspond to a rather old release of torchvision.

Overall, I think there is a huge benefit to reinstating and maintaining the C++ models in the context of Torch-TensorRT, given the audience it attracts, which mostly pursues performance, efficiency, and scalability on constrained devices.

AndreasKaratzas commented on Mar 18 '24