Improvements for docs multiprocess.rst & xrt_native_apis.rst

Open timsnyder opened this issue 3 years ago • 0 comments

First, thanks for writing https://github.com/Xilinx/XRT/commits/master/src/runtime_src/doc/toc/multiprocess.rst! I found it very helpful while migrating from AWS-f1 to an Alveo board.

There are a couple of things that I think could be done to improve the page and I have a couple of clarifying questions about the content.

  1. Can you extend the scenario to multiple devices with multiple processes? Are there any concerns about orchestrating the programming of all devices on a host before beginning to run applications on any of them? My guess is that there are not, because the devices in the XRT architecture are suitably isolated from each other such that programming one device does not perturb the others. It would be nice if you could add a one- or two-sentence section at the end that addresses multiple devices, or points to the relevant doc if it is discussed elsewhere.

  2. Can you explain why Process 6 fails when it is trying to use the xclbin used by Processes 0-4? ~~I interpret the current doc as saying the error caused by Process 5 puts the device into an error state that will cause all subsequent xclLoadXclBin() calls to fail, regardless of requested UUID, until something happens. If I'm reading it correctly, it would be nice for the doc to explicitly state why Process 6 fails and also explain what has to happen for the device to allow further UUID_X processes to succeed. Does the device have to be reset?~~ Wow, I looked more carefully and my original interpretation of the small diagram was way off. I see now that Process 6 fails because it calls xclOpenContext() using UUID_Y when UUID_X is loaded and has open contexts (applications running using it).

  3. Can you clarify that user-managed kernels cannot use multiple processes and point to https://xilinx.github.io/XRT/2022.1/html/xrt_native_apis.html#user-managed-kernel ? Looking only at multiprocess.rst, I asked myself "what about user-managed kernels?" and had to dig around to find the answer on the other page.

  4. Some small changes to the diagram could improve its clarity:

    • Update it to use the C++ API, since xrt_native_apis.rst says that's the API you prefer for users.
    • In the error cases, indicate the ERROR condition with an additional node that has a customized shape, like star or doubleoctagon. In the text leading up to the diagram, explain that host code is still expected to call xclClose() in the C API. (In the C++ API, I would assume that the device handle going out of scope and being destructed is sufficient.)

    Regarding https://xilinx.github.io/XRT/2022.1/html/xrt_native_apis.html#xrt-error-api and xrt_native_main.rst, which exception classes are thrown by your C++ API, and by which methods? I suppose since the code is open, there's my answer... It would be helpful to remind users in https://xilinx.github.io/XRT/2022.1/html/xrt_native.main.html that they can look at the XRT implementation code for details on the exception classes. I was confused because I keep forgetting that XRT is open-source. That is easy to do when you start from docs on docs.xilinx.com, click into the XRT docs, and are used to everything from a vendor being closed-source.
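To make my corrected reading in item 2 concrete, and the scope-based cleanup I'm assuming in item 4, here is a toy model in C++. It does not use the XRT API: `ToyDevice`, `ContextGuard`, and `try_open` are hypothetical names invented for illustration. The rule that loading a different xclbin is rejected while contexts are open is my assumption about the intended behavior, not something I have confirmed in the XRT source.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Toy stand-in for one device. NOT the XRT API; every name here is hypothetical.
class ToyDevice {
public:
    // Assumption: a different xclbin cannot replace the loaded one
    // while any process still holds an open context on it.
    void load_xclbin(const std::string& uuid) {
        if (!loaded_.empty() && loaded_ != uuid && open_contexts_ > 0)
            throw std::runtime_error("device busy: contexts open on " + loaded_);
        loaded_ = uuid;
    }

    // Opening a context only succeeds for the UUID that is currently loaded.
    void open_context(const std::string& uuid) {
        if (uuid != loaded_)
            throw std::runtime_error("UUID mismatch: " + uuid + " is not loaded");
        ++open_contexts_;
    }

    void close_context() {
        assert(open_contexts_ > 0);
        --open_contexts_;
    }

    int open_contexts() const { return open_contexts_; }

private:
    std::string loaded_;
    int open_contexts_ = 0;
};

// RAII guard: the context is released when the guard goes out of scope,
// which is the behavior I would expect from a C++ handle.
class ContextGuard {
public:
    ContextGuard(ToyDevice& dev, const std::string& uuid) : dev_(dev) {
        dev_.open_context(uuid);  // if this throws, the destructor never runs
    }
    ~ContextGuard() { dev_.close_context(); }
    ContextGuard(const ContextGuard&) = delete;
    ContextGuard& operator=(const ContextGuard&) = delete;

private:
    ToyDevice& dev_;
};

// Returns true if a context for `uuid` could be opened (and then released).
bool try_open(ToyDevice& dev, const std::string& uuid) {
    try {
        ContextGuard guard(dev, uuid);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

In this model, after `load_xclbin("UUID_X")` and while one `ContextGuard` for UUID_X is alive, `try_open(dev, "UUID_Y")` returns false, which is the Process 6 case as I now read the diagram. Once every guard has gone out of scope, loading UUID_Y is allowed again.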

timsnyder, Jul 02 '22 15:07