[BUG] Analyser checks non-open files despite the extension setting

Open Heck-R opened this issue 1 year ago • 3 comments

Describe the bug The extension analyses all test and resource files regardless of the "robotcode.analysis.diagnosticMode": "openFilesOnly" setting

Steps To Reproduce Steps to reproduce the behavior:

  1. Open an empty folder in VsCode
    • For simplicity, it is assumed that robotframework is available in the default interpreter used by the extension, but the issue is the same regardless of whether the interpreter comes from a Python installation or a virtual environment
  2. Add a .vscode\settings.json
{
  "robotcode.analysis.progressMode": "detailed",
  "robotcode.analysis.diagnosticMode": "openFilesOnly"
}
  3. Create some .resource files
  4. Create some .robot files
  5. Close all open .resource and .robot files (actually some can stay open, just make sure there are un-referenced files that are not open)
  6. Run Clear Cache and Restart Language Servers extension command
  7. See the analysis going through all .resource and .robot files regardless of them being open

To minimize the reproduction effort, just use the generator script from this analyserIssue.zip (it's PowerShell, so it works on Windows by default)

  1. Extract the zip to an empty folder
  2. Open with VsCode
  3. Run the generator script in the integrated terminal: .\PackageGenerator.ps1
    • the working directory should be the opened folder
    • there are no arguments, but the generated files can be tweaked by changing the commented variables at the top to make it easier or harder for the analyser
      • To make it actually slow, just set $resourceLevels = @(100, 100, 100) to make a ridiculous amount of resource references
  4. Make sure .resource and .robot files are closed
  5. Run Clear Cache and Restart Language Servers extension command
  6. See the analysis going through all .resource and .robot files regardless of them being open

From the dummy examples above it seems like this is not a real issue, as it's quite fast, but in our real-world repo (which I can sadly not share) with hundreds of tests referencing hundreds of resources with proper logic and documentation, this can take minutes, and it would be a lot better if it was really only the open files that are analysed

Expected behavior When the "robotcode.analysis.diagnosticMode": "openFilesOnly" setting is defined, only the open files (and the ones referenced from these) are analysed

Screenshots/ Videos Showcasing the problem using the generator script as preparation https://github.com/user-attachments/assets/aba0cb70-3545-4a39-b768-c5ac7b45b3ad

Logs RobotCode

Activate RobotCode Extension.
Try to activate python extension
Python Extension is active
executeRobotCode: c:\Program Files\Python\3.12\python.exe -u -X utf8 c:\Users\heckmannd\.vscode\extensions\d-biehl.robotcode-0.95.2\bundled\tool\robotcode --format json --no-color --no-pager --default-path . profiles list
create Language client: RobotCode Language Server mode=pipe for folder 'analyserIssue'
trying to start Language client: RobotCode Language Server mode=pipe for folder 'analyserIssue'
client for file:///c%3A/Users/heckmannd/LocalMadness/tmp/analyserIssue starting.
executeRobotCode: exit code 0
executeRobotCode: c:\Program Files\Python\3.12\python.exe -u -X utf8 c:\Users\heckmannd\.vscode\extensions\d-biehl.robotcode-0.95.2\bundled\tool\robotcode --format json --no-color --no-pager --default-path . profiles list
client for file:///c%3A/Users/heckmannd/LocalMadness/tmp/analyserIssue running.
client for file:///c%3A/Users/heckmannd/LocalMadness/tmp/analyserIssue started.
executeRobotCode: c:\Program Files\Python\3.12\python.exe -u -X utf8 c:\Users\heckmannd\.vscode\extensions\d-biehl.robotcode-0.95.2\bundled\tool\robotcode --format json --no-color --no-pager --default-path . discover --read-from-stdin all
executeRobotCode: exit code 0
executeRobotCode: c:\Program Files\Python\3.12\python.exe -u -X utf8 c:\Users\heckmannd\.vscode\extensions\d-biehl.robotcode-0.95.2\bundled\tool\robotcode --format json --no-color --no-pager --default-path . profiles list
executeRobotCode: exit code 0
executeRobotCode: exit code 0
executeRobotCode: c:\Program Files\Python\3.12\python.exe -u -X utf8 c:\Users\heckmannd\.vscode\extensions\d-biehl.robotcode-0.95.2\bundled\tool\robotcode --format json --no-color --no-pager --default-path . profiles list
executeRobotCode: exit code 0
executeRobotCode: c:\Program Files\Python\3.12\python.exe -u -X utf8 c:\Users\heckmannd\.vscode\extensions\d-biehl.robotcode-0.95.2\bundled\tool\robotcode --format json --no-color --no-pager --default-path . profiles list
executeRobotCode: exit code 0
executeRobotCode: c:\Program Files\Python\3.12\python.exe -u -X utf8 c:\Users\heckmannd\.vscode\extensions\d-biehl.robotcode-0.95.2\bundled\tool\robotcode --format json --no-color --no-pager --default-path . profiles list
executeRobotCode: exit code 0

RobotCode Language Server

Start analyzing workspace for 300 documents
End analyzing workspace for 300 documents took 3.1720 seconds
Start collect workspace diagnostic for 0 documents
End collect workspace diagnostic for 0 documents took 0.0150 seconds

Additional context This issue is not specific to newer releases, it's been present for a while

Making the folder a git repo and gitignoring the folder containing all the related files will actually cause the expected result (only analysing the open files)

There is also a potentially related issue where modifying a file re-analyses unrelated files, but I could not yet reproduce that with a dummy example I can share, so for now I'll just leave it here as an additional note (maybe it gives some ideas regarding the provided reproducible issue, if not, just ignore it for now, and I'll try to report that separately with a proper example)

Desktop (please complete the following information):

  • VS Code Version 1.95.0
  • RobotCode Version v0.95.2
  • OS: Windows
  • Python Version 3.11.9 (it was the same with 3.9.13) (the language server seems to be using Python 3.12 based on its log, not sure why, but when doing the same test with a virtual environment, robotframework was definitely installed inside 3.11.9, resulting in the same issue)
  • RobotFramework Version 7.1 (it was the same with older versions, but can't recall the exact ones anymore)
  • Additional tools robotidy was installed at the time of the recording, but the issue remains after a pip uninstall

Heck-R avatar Oct 30 '24 17:10 Heck-R

Thank you for the detailed issue!

This behavior is actually intended and not a bug. Robot Framework handles imports and resource files in a way that requires comprehensive analysis to ensure accurate code navigation and diagnostics. Variables and keywords defined in one file can influence imports and be used in other imported resource files or test cases. For example, defining a variable or keyword in a .robot file allows it to be accessed in an imported resource file, and vice versa. This means changes in one file can affect other files, including which imports are loaded further down the chain.

For quick and accurate feedback within the IDE, RobotCode's language server needs to analyze all files in the project to correctly identify these dependencies and reflect the impact of changes across the entire codebase. Without analyzing all files, RobotCode wouldn't be able to provide accurate code completion, reference finding, or error detection for interdependent files. The setting robotcode.analysis.diagnosticMode with the value openFilesOnly modifies the display of errors and warnings, limiting them to currently open files, but it doesn't disable the background analysis of all files. This helps focus on the files you're actively working on while maintaining a holistic understanding of the project.
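To illustrate the distinction between analysing and reporting, here is a minimal sketch. It is not RobotCode's code: the `Diagnostic` class and the `diagnostics_to_publish` function are hypothetical names, and the sketch assumes every file has already been analysed before the mode is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    """Hypothetical stand-in for an analysis finding; not a RobotCode type."""
    path: str
    message: str


def diagnostics_to_publish(all_diagnostics, open_files, mode):
    """Return the diagnostics that would be shown to the user.

    All files are assumed to be analysed already; the mode only decides
    which of the results are reported to the editor.
    """
    if mode == "openFilesOnly":
        return [d for d in all_diagnostics if d.path in open_files]
    # Any other mode: report everything that was found.
    return list(all_diagnostics)


if __name__ == "__main__":
    found = [
        Diagnostic("tests/a.robot", "Keyword not found"),
        Diagnostic("resources/b.resource", "Variable not found"),
    ]
    for d in diagnostics_to_publish(found, {"tests/a.robot"}, "openFilesOnly"):
        print(d.path, d.message)
```

In this sketch the closed resource file is still part of `found`; it is only filtered out of what gets displayed.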

When a file changes, only its dependent files are re-analyzed. This optimization maintains performance by avoiding unnecessary re-analysis of unrelated files. The language server efficiently updates the analysis to reflect changes without reprocessing the entire project.
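The idea of re-analysing only dependent files can be sketched as a walk over a reverse import graph. This is an illustration under simplifying assumptions, not the language server's implementation: `imports` is a hypothetical mapping from each file to the files it imports, and only the "importer depends on imported" direction is modelled.

```python
from collections import defaultdict, deque


def build_dependents(imports):
    """Invert {file: [imported files]} into {file: {files that import it}}."""
    dependents = defaultdict(set)
    for importer, imported in imports.items():
        for target in imported:
            dependents[target].add(importer)
    return dependents


def files_to_reanalyze(changed, imports):
    """Return the changed file plus everything that depends on it transitively."""
    dependents = build_dependents(imports)
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for importer in dependents.get(current, ()):
            if importer not in seen:
                seen.add(importer)
                queue.append(importer)
    return seen


if __name__ == "__main__":
    imports = {
        "suite.robot": ["utils.resource"],
        "utils.resource": ["base.resource"],
        "other.robot": ["misc.resource"],
    }
    print(sorted(files_to_reanalyze("base.resource", imports)))
```

A change to a widely imported resource pulls in most of the project under this model, while a change to a leaf suite touches only that suite.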

Regarding robotcode.analysis.progressMode, it's turned off by default and should primarily be enabled for diagnostic purposes to get an approximate progress status when there are issues analyzing the project. Enabling this setting consumes additional time because data is sent to VS Code and may wait for responses. Therefore, it's advisable to keep this setting off under normal circumstances, or set it to simple, where less data is transmitted.

In the latest version of RobotCode, I have significantly improved the speed of file analysis—sometimes by a factor of 10 to 20—and I'm continuing to work on making RobotCode faster. However, there are some limitations beyond my control that cannot be accelerated. For example, Robot Framework runs only in Python and is interpreted, and it doesn't provide a way to define something as private or not externally visible.

Another important aspect is the project structure and the handling of imports, which affects the number of keywords and variables that need to be analyzed. In many projects, all resource files are collected into a single general resource file, which is then imported into every other Robot and resource file. This creates circular and unnecessary imports. While Robot Framework allows this, it's not considered good practice, and in other programming languages, you learn to avoid such patterns because they increase compile and execution time.

For example, if you import all 5,000 keywords into your suite—which can add up quickly—when a keyword is called in a test or keyword definition, all 5,000 keywords are checked to see if they match, and the best one is selected or evaluated for a better fit. If they are only normal keywords, this process is relatively quick, but if keywords with embedded arguments are involved, it becomes slower. This also impacts code completion and semantic highlighting, among other things.
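A rough sketch of why keywords with embedded arguments cost more to match: a plain name can be compared after normalisation, while each embedded-argument keyword needs its own pattern test against the call. The `KeywordIndex` class and its helpers are hypothetical, and using a dictionary for plain names is a simplification for illustration, not a description of how RobotCode or Robot Framework resolve keywords.

```python
import re


def normalize(name):
    """Lower-case and drop spaces/underscores, similar to Robot Framework name matching."""
    return name.lower().replace(" ", "").replace("_", "")


def embedded_pattern(name):
    """Turn 'Select ${item} from list' into a regex; each ${...} matches any text."""
    parts = re.split(r"\$\{[^}]*\}", name)
    body = "(.+?)".join(re.escape(part) for part in parts)
    return re.compile(body + r"\Z", re.IGNORECASE)


class KeywordIndex:
    """Hypothetical index separating plain keywords from embedded-argument ones."""

    def __init__(self, names):
        self.plain = {}
        self.embedded = []
        for name in names:
            if "${" in name:
                self.embedded.append((name, embedded_pattern(name)))
            else:
                self.plain[normalize(name)] = name

    def find(self, call):
        """Return the keyword names matching a call, plain names first."""
        hit = self.plain.get(normalize(call))
        if hit is not None:
            return [hit]
        # Every embedded-argument keyword has to be tried one by one.
        return [name for name, pattern in self.embedded if pattern.match(call)]


if __name__ == "__main__":
    index = KeywordIndex(["Open Browser", "Select ${item} from list"])
    print(index.find("open_browser"))
    print(index.find("Select cat from list"))
```

The second lookup has to try every embedded-argument pattern, which is the part that grows with the number of such keywords in scope.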

I hope this clarifies why the language server behaves as it does and provides some insights into optimizing your project structure for better performance.

Could you please provide more details about your project?

  • How many .robot and resource files do you have?
  • How many libraries and variables are being used?
  • How is your project structured, especially regarding imports?
  • What do your import statements look like?
  • Are you using a .gitignore or .robotignore file?
  • What other files are present in your project (e.g., .venv, build, etc.)?
  • Besides the background analysis of files, are you experiencing any other issues in your project?

Understanding these details can help identify specific areas where performance can be further optimized.

d-biehl avatar Oct 30 '24 20:10 d-biehl

Thanks for the quick answer!

The problematic project is managed by my colleagues, so I only have a surface-level understanding of it at the moment. That said, I'm fairly certain that it has one or multiple libraries filled with hundreds of utility resource files, and there are hundreds of tests, of which many import significant chunks of these through resources collecting the other resources (exactly what you described).

I was fairly sure that this structure is no friend of speed, which is why I made a similar example to show the analysis being slow enough to be visible to the eye. Regarding resources and libraries, I also see the necessity of analyzing all files considering how many things are globally scoped, but I'm not sure about .robot files. As far as I know, those are test entry points, so one robot file should never impact another one in any analyzable fashion, as they should only appear in the same test execution if they are selected when selecting/filtering them with the robot.exe, which the analysis cannot assume in any useful manner. (Although I could be wrong about that, please correct me if it's not true)

If I'm wrong you can skip this paragraph :D, but in case my assumption is true, that would mean that at least non-open robot files can be skipped. Based on the debug progress I see on the bottom toolbar, this could provide enough speedup, although I could be misinterpreting what I see. However, that could also mean that the discovery of their dependencies could be skipped and the analysis of any resources that are not depended upon, even when considering imports affected by global variables.
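The proposal can be sketched as a reachability walk that starts from the open files and follows imports. This is only an illustration of the suggestion, with a hypothetical `imports` mapping from each file to the files it imports; it ignores the case described earlier in the thread where variables defined in an importing file influence which imports are loaded.

```python
def reachable_from(open_files, imports):
    """Return the open files plus everything they import, directly or indirectly."""
    seen = set()
    stack = list(open_files)
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(imports.get(current, ()))
    return seen


if __name__ == "__main__":
    imports = {
        "open_suite.robot": ["suite_utils.resource"],
        "suite_utils.resource": ["repo_utils.resource"],
        "closed_suite.robot": ["unrelated.resource"],
    }
    print(sorted(reachable_from({"open_suite.robot"}, imports)))
```

Under this model, closed suites and the resources only they import never enter the set.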

The actual maintainers of the project are not developers by trade, but testers, so I'm not sure how much optimization of the project structure can be kept in the long run, as they tend to gravitate toward things being convenient rather than performant, which is why as a first attempt I hoped the realization could be optimized to an extent where they can't humanly produce enough tests to make it slow, but regardless, it is a fair point that the project should be optimized. Your recent optimizations do show by the way, the load time did visibly improve recently so your efforts are greatly appreciated!

As for the specific details you asked about the project, I'll have to familiarize myself with it more to give a proper answer. (Do you know of some effective way to map out the import tree in a human-readable fashion? It's okay if not, I should be able to find the info regardless, just asking in case you have some speedy way on hand) I think I'll be able to do that sometime next week.
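One rough way to map out an import tree is a small script that reads the Settings section of each file and follows Resource imports. The sketch below is not a RobotCode feature; `parse_imports` and `print_tree` are hypothetical helpers. It assumes the space- or tab-separated Robot Framework format and only follows Resource paths that resolve relative to the importing file, so imports resolved through variables or the Python path are listed as leaves.

```python
import re
import sys
from pathlib import Path

# A section header such as "*** Settings ***".
SECTION = re.compile(r"^\*+\s*(.+?)\s*\**\s*$")
# An import line: setting name, a separator (2+ spaces or a tab), then the target.
IMPORT = re.compile(
    r"^(Resource|Library|Variables)(?: {2,}|\t)\s*(\S.*?)(?: {2,}|\t|$)",
    re.IGNORECASE,
)


def parse_imports(text):
    """Return (kind, target) pairs found in the Settings section of one file."""
    found = []
    in_settings = False
    for raw in text.splitlines():
        line = raw.rstrip()
        header = SECTION.match(line)
        if header:
            in_settings = header.group(1).lower().startswith("setting")
            continue
        match = IMPORT.match(line) if in_settings else None
        if match:
            found.append((match.group(1).title(), match.group(2)))
    return found


def print_tree(path, depth=0, ancestors=()):
    """Print the imports of `path` recursively, marking cycles instead of looping."""
    path = path.resolve()
    is_cycle = path in ancestors
    print("  " * depth + path.name + (" (cycle)" if is_cycle else ""))
    if is_cycle or not path.is_file():
        return
    for kind, target in parse_imports(path.read_text(encoding="utf-8")):
        child = path.parent / target
        if kind == "Resource" and child.is_file():
            print_tree(child, depth + 1, ancestors + (path,))
        else:
            # Libraries, variable files, and unresolved resources are leaves.
            print("  " * (depth + 1) + f"[{kind}] {target}")


if __name__ == "__main__":
    for root in sys.argv[1:]:
        print_tree(Path(root))
```

Saved under any name, it can be run with one or more suite files as arguments; each argument becomes the root of its own indented tree.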

Thanks a lot for your support!

Heck-R avatar Oct 31 '24 23:10 Heck-R

Hi @d-biehl, I've collected the info you asked for. Due to the combined size of the project and its dependencies, the values are based on going through files manually at a surface level (for the structure) and some regex searching (for the variable count), so they're likely not perfect, but should be relatively accurate.

How many .robot and resource files do you have?

  • .robot - 55
  • .resource - 575 over the project and depended libraries (this was a search based on file extension, so it's accurate, which means I somehow over-counted the detailed number of resources in the import structure section, as that adds up to more, so imagine the numbers to be within ~10% of reality there)

How many libraries and variables are being used?

  • loaded via robotcode settings (with robotcode.robot.variableFiles): 17 fixed variables plus collects variables from the imports
    • Variable Lib 1 (496 variables)
    • Variable Lib 2 (832 variables)
  • The above Variable libs are discovered through imports as well as seen in the imports section, but that should not increase their number (should result in the same values as far as I know)
  • Variables in robot & resource files (under the *** Variables *** section) ~ 0-30 per file, 1700 overall (these are all different from variable files above)

How is your project structured, especially regarding imports?

This is going to be a long one, so first I'll define the format I'll mostly use to make it comprehensible. Top-level points describe a file type functionally; sub-points refer to other top-level points that are imported in the current top-level point, e.g.:

  • File doing the importing (number of such files in total in the repo or in depended util packages)
    • File being imported (number of imports per file above)

Actual structure

Repo

  • test suite robot (55 in total): No test implementation, imports everything together for the suite, sets suite level setup and teardowns, contains the human readable info of the test, and only calls the test logic which is 1 keyword
    • suite utils resource (0-1 per suite)
    • test resource (1-30 per suite)
  • suite utils resource (45 in total): test suite specific utils and variables
    • wider scale product specific remote library interface (1 import in 38 suite utils, no mix with the one below)
    • narrower scale product specific remote library interface (1 import in 7 suite utils, no mix with the one above)
    • repo utils (varies per test resource but usually 0-1 which is actually necessary):
  • test resource (399 in total): There is 1 main keyword representing the implementation of the test, which calls a varying number of step keywords (also implemented in this file) representing the implementation of test steps representing the original test specification, and a varying number of test specific util keywords
    • suite utils resource (same thing as above): This is present in effectively all test resources, but commented out in most, probably due to the suite import above -> repeated import
    • wider scale product specific remote library interface (1 import in 3 test resources) -> repeated import
  • repo utils (11 in total): 7 are Python scripts (all of them import either numpy or matplotlib), 4 are resources
    • 1 wrapping 4 of the above python scripts containing 28 functions / keywords in total
    • Lib 4 (Resource import, but one that collect all of it)
  • wider scale product specific remote library interface (1 in total): collecting testing keywords for a related product used in testing this product, and including wider product related keywords (22) using some of the imported keywords
    • narrower scale product specific remote library (1 import)
  • narrower scale product specific remote library interface (1 in total)
    • Lib 1 (Resource import, but one that collect all of it)
    • Lib 2 (Resource import, but one that collect all of it)
    • common library in python (1 module as a Library import)

Libraries

(I'll have to use generalized names as these are internal libraries I can't share)

  • Lib 1 - 57 resources collected into a single importable point
    • Lib 4
    • Variable Lib 2
  • Lib 2 - 1 python, 22 resources collected into a single importable point
    • common library in python (1-1 util per resource, but most of it overall)
    • Lib 4
    • Lib 5
    • Variable Lib 2 (imports smaller collections per resource, but imports most of it overall)
  • Lib 3 - 14 python, 10 resource collected into a single importable point
    • Variable Lib 1
    • common library in python
  • Lib 4 - 8 python files and 2 resources collected into a single importable point
    • common library in python
    • a remote library addressed using lib arguments (includes ~ 100-300 keywords depending on the situation)
    • Variable Lib 1 & 2 with a dynamic Python variable file that also discovers the variable files on the provided paths to build references between the variables based on the folder structure (UI automation paths from automation IDs). This is parameterized with a global variable that is appended with the paths to discover in the variable libs. It sounds and looks terrifying, and I'll have a talk with the creator about that, but it does not seem to impact the analysis time (I added an immediate return to its initialization to skip file discovery)
  • Lib 5 - 1 python, 9 resources, manually written RoF gRPC client wrapping keywords collected into a single importable point
    • gRPC client collector library
    • Variable Lib 2
  • Variable Lib 1 - 53 python var files collected with 11 resource files into a single importable point (496 variables)
    • Lib 4 (only 1 specific util py, so there is no circular import, but due to this the libs technically depend on each other both ways)
  • Variable Lib 2 - 36 python var files collected with 6 resource files into a single importable point (832 variables)
    • Lib 4 (only 1 specific util py, so there is no circular import, but due to this the libs technically depend on each other both ways)
  • common library in python (1): Using some 3rd party python packages like pyautogui and xmltodict for keyword libraries (10 py files)
  • gRPC client collector library (1): Imported with parameters (same as the wrapped libs)
    • generated gRPC client libraries (15)
  • generated gRPC client libraries (15 in total, 1 per gRPC service): Imported with parameters (the same parameters identifying the server). These do provide keywords, as this is custom generated RoF on top of general python protobuf

What do your import statements look like?

99% are one of the following (ignore spacing)

  • Resource <library>/path/to.resource (this gathers all resources in a tree-like method, where a resource imports everything in their folder, and a similar resource from the folders next to the resource)
  • Resource relative/path/to.resource
  • Library relative/path/to.py
  • Library <library>/path/to.py
  • Library <library>/path/to.py WITH NAME SIMRSLIB

And there is a very select few that has parameterization

  • Library <library>/path/to.py arg_name_1=${variable_name_1} arg_name_2=${variable_name_2}

Are you using a .gitignore or .robotignore file?

1 .gitignore at the root of the project, which ignores the libraries in the venv folder (it further ignores some pycache, editor stuff, but nothing else should be relevant, many are visibly just a copy from another similar repo)

What other files are present in your project (e.g., .venv, build, etc.)?

As for general folders

  • .robotcode_cache (Sometimes I turn the caching target to be files, but mostly not. When it's present, it's gitignored)
  • .vscode (with config jsons)
  • README.md
  • CHANGELOG.md
  • requirements.txt

As for file types, the following PowerShell script

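# Collect the distinct file extensions in the repo, skipping the venv folder
$extensions = @()
Get-ChildItem -File -Path . -Recurse | ForEach-Object {
  if ($_.FullName -match "\\venv\\") { return }
  # The text after the last dot is treated as the extension
  $split = $_.FullName -split "\."
  if ($split.Count -gt 1 -and $extensions -notcontains $split[-1]) {
    $extensions += $split[-1]
  }
}
# Print the collected extensions
$extensions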

gave the following output

gitignore
md
txt
pkl
json
yaml
resource
robot
dat
src
xml
sub
ini
dll
so
SAF
IMG
bin
py
pyc

As you can see, it excludes libraries, but those are gitignored anyway, and only py and resource files are imported; other files are only used by the logic. All non-obvious extensions are relatively small resource files (excluding the venv, the files are ~15MB together, and most of that is due to some small binary resources used only by the logic at execution time)

Besides the background analysis of files, are you experiencing any other issues in your project?

No. Previously I thought that any change caused the whole repo to be re-analyzed, but that either went away with the recent optimizations, or I was mistaken due to modifying resources that were related to too many things (e.g. the "narrower scale product specific remote library interface" mentioned above)

Thanks in advance for checking it out! :)

Heck-R avatar Nov 06 '24 18:11 Heck-R