jfx icon indicating copy to clipboard operation
jfx copied to clipboard

8266396: Save VSCMD_DEBUG output in Windows build

Open jgneff opened this issue 4 years ago • 17 comments

The Windows build calls a series of batch files to get the Visual Studio paths and environment variables. The batch files are a black box: any messages they print are discarded. If anything goes wrong, the only signs are a vague Gradle exception and a corrupted properties file.

This has been causing problems for years. There are at least 49 messages on the mailing since 2017 that discuss the exception and ways to work around it.

This pull request lets you enable the batch file messages, shedding light on their internal workings and allowing you to catch any errors at their source. Specifically, it adds the variable VSCMD_DEBUG to the environment of genVSproperties.bat and documents its use.

Before

For example, if your PATH environment variable is missing C:/Windows/System32, like mine was, you'll see the following errors:

$ gradle sdk
Dependency verification is an incubating feature.

FAILURE: Build failed with an exception.

* Where:
Script 'C:\cygwin64\home\john\src\jfx\buildSrc\win.gradle' line: 108

* What went wrong:
A problem occurred evaluating script.
> FAIL: WINSDK_DIR not defined

    ︙

BUILD FAILED in 4s

Me: What's a WINSDK_DIR? The Wiki says, "This means that you will have to define these paths manually." I guess this is normal. 😕️

After

With this change, you can set the debug level to "3" in the win.gradle file and get a better picture of the problem:

$ gradle sdk
Dependency verification is an incubating feature.

> Configure project :
'reg' is not recognized as an internal or external command,
operable program or batch file.
'reg' is not recognized as an internal or external command,
operable program or batch file.
'reg' is not recognized as an internal or external command,
operable program or batch file.

    ︙

'powershell.exe' is not recognized as an internal or external command,
operable program or batch file.

FAILURE: Build failed with an exception.

* Where:
Script 'C:\cygwin64\home\john\src\jfx\buildSrc\win.gradle' line: 108

* What went wrong:
A problem occurred evaluating script.
> FAIL: WINSDK_DIR not defined

    ︙

BUILD FAILED in 3s

Me: Oh, it's just a "command not found" error. I'll check my PATH. 🤓


Progress

  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue
  • [ ] Change must be properly reviewed

Issue

  • JDK-8266396: Save VSCMD_DEBUG output in Windows build

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jfx pull/488/head:pull/488
$ git checkout pull/488

Update a local copy of the PR:
$ git checkout pull/488
$ git pull https://git.openjdk.java.net/jfx pull/488/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 488

View PR using the GUI difftool:
$ git pr show -t 488

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jfx/pull/488.diff

jgneff avatar May 06 '21 20:05 jgneff

:wave: Welcome back jgneff! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar May 06 '21 20:05 bridgekeeper[bot]

Webrevs

mlbridge[bot] avatar May 06 '21 20:05 mlbridge[bot]

Notes

The sequence of nested calls to the batch files is shown below, where C:/Program Files (x86)/Microsoft Visual Studio is the root directory of the Visual Studio files:

buildSrc/genVSproperties.bat
↳ "2019/Community/VC/Auxiliary/Build/vcvarsall.bat"
↳ "2019/Community/Common7/Tools/VsDevCmd.bat"
↳ "2019/Community/Common7/Tools/vsdevcmd/core/winsdk.bat"

The file VsDevCmd.bat calls powershell.exe to send telemetry to Microsoft ~when VSCMD_DEBUG is defined~. [EDIT: By default, it always calls powershell.exe to send telemetry. If VSCMD_DEBUG is defined, the call is synchronous; otherwise, the call is asynchronous and its standard output is discarded.] To avoid sending telemetry, define the environment variable VSCMD_SKIP_SENDTELEMETRY. For example, in Cygwin you can define:

export VSCMD_SKIP_SENDTELEMETRY=1

This pull request sets VSCMD_DEBUG to the empty string (""). In batch files, setting a variable to the empty string is equivalent to leaving the variable undefined. Furthermore, setting the value in Gradle to the string "3" is the same as setting the value to the number 3. To verify, I replaced genVSproperties.bat with this batch file:

@echo off
if defined VSCMD_DEBUG (
    echo VSCMD_DEBUG is defined 1>&2
) else (
    echo VSCMD_DEBUG is not defined 1>&2
)
if "%VSCMD_DEBUG%" NEQ "" (
    echo VSCMD_DEBUG = "%VSCMD_DEBUG%" 1>&2
) else (
    echo VSCMD_DEBUG is the empty string "%VSCMD_DEBUG%" 1>&2
)

We could instead set the default value to "3". In that case, I get two error messages during the Windows build due to following registry query failing on my system:

$ reg query "HKLM\SOFTWARE\Microsoft\VisualStudio\VSPerf" /v "CollectionToolsDir2019"
ERROR: The system was unable to find the specified registry key or value.

Wiki

I suggest that we remove the "Missing paths issue" section from the Wiki, added on October 31, 2019. It only led me astray. If you get WINSDK_DIR not defined, it's an error, and you should find its source and fix it. This pull request can help.

jgneff avatar May 06 '21 20:05 jgneff

@jgneff Unknown command v - for a list of valid commands use /help.

openjdk[bot] avatar May 06 '21 20:05 openjdk[bot]

Mailing list message from John Neffenger on openjfx-dev:

On 5/6/21 1:46 PM, John Neffenger wrote:

$ reg query "HKLM\SOFTWARE\Microsoft\VisualStudio\VSPerf" \

The last part of my comment was truncated by Skara when it tripped over the '/v' Windows command-line option. It's repeated below for the mailing list ...

We could instead set the default value to "3". In that case, I get two error messages during the Windows build due to following registry query failing on my system:

$ reg query "HKLM\SOFTWARE\Microsoft\VisualStudio\VSPerf" /v "CollectionToolsDir2019" ERROR: The system was unable to find the specified registry key or value.

### Wiki

I suggest that we remove the "Missing paths issue" section from the Wiki, added on October 31, 2019. It only led me astray. If you get `WINSDK_DIR not defined`, it's an error, and you should find its source and fix it. This pull request can help.

mlbridge[bot] avatar May 06 '21 21:05 mlbridge[bot]

It would be more convenient to ask the developer set VSCMD_DEBUG via an env variable, rather than asking them to edit win.gradle. That way, the instructions would be:

    export VSCMD_DEBUG=3
    export VSCMD_SKIP_SENDTELEMETRY=1
    gradle

kevinrushforth avatar May 07 '21 13:05 kevinrushforth

It would be more convenient to ask the developer set VSCMD_DEBUG via an env variable, rather than asking them to edit win.gradle.

Thanks, Kevin. That is the most direct approach. I didn't document it that way for the following reasons:

  1. Using the export command under Cygwin sets the variable for the entire build. In the current commit, the variable, when defined, is set only for the duration of the genVSproperties.bat batch file and the batch files it calls.
  2. I would like to see this documented in the code, even if just as comments, as opposed to the Wiki. When I got the error, the first thing I did was search through the code base for WINSDK_DIR. I would have found the lines added by this pull request right away.

Don't let the name confuse you. The variable is not a debugging variable; it doesn't enable debug trace logging. What it actually does is prevent the batch files from sending both standard output and standard error to "NUL", which they do by default. Defining it with VSCMD_DEBUG=3 really means, "Please don't silently discard really important error messages!" You don't expect all error messages to be hidden from you when you invoke 3,985 Windows batch file statements, but that's exactly what's happening now:

$ cloc trace.bat
       1 text file.
       1 unique file.                              
       0 files ignored.

github.com/AlDanial/cloc v 1.82  T=0.01 s (73.6 files/s, 438775.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
DOS Batch                        1           1938             40           3985
-------------------------------------------------------------------------------

I'll up the ante and suggest that we set the value to "3" by default in the build file (and disable telemetry). Then the batch files will behave as one would expect. I tested all values of the variable (undefined, valid, and invalid), and the generated file windows_tools.properties was identical in all cases.

Otherwise, where do you suggest we document this? If you disfavor the current options ("" and "3"), what do you think of making it three comment lines in the win.gradle file?

jgneff avatar May 07 '21 16:05 jgneff

I committed a better proposal, thanks to Kevin's suggestion, and after seeing what it takes to create the trace.bat file in my previous comment.

Unfortunately, it's not enough to define VSCMD_DEBUG externally. The cmd /q option (in win.gradle) and the redirection of output to NUL (in genVSproperties.bat) conspire to suppress the requested information, regardless of the variable's value.

So this commit adds the minimum changes for VSCMD_DEBUG to work. Instead of redirecting output to NUL, it redirects output to a new log file called vcvarsall.log. Because the trace-level debugging of VSCMD_DEBUG=3 depends on echo being enabled, it adds echo off and echo on commands selectively in the OpenJFX batch file instead of using the global cmd /q option in the Gradle build file.

The table below shows the number of debugging lines that are logged for each value of VSCMD_DEBUG, both before this pull request (on the master branch) and after the latest commit (on the pull request branch):

VSCMD_DEBUG Level Before After
undefined none 0 10
1 basic 0 93
2 detailed 0 615
3 trace 2 5,832

With this change, VSCMD_DEBUG actually works as intended. When the build is successful, there's just an extra 10-line log file in the build directory. If something goes wrong, though, developers can set the environment variable and run gradle help. The log file will then contain all the information needed to fix the error. If they're having trouble figuring it out, they can post the log file and ask for help on the mailing list.

jgneff avatar May 10 '21 06:05 jgneff

To understand the build error that occurred with my last commit, you need to know three things:

  1. by default, the JavaFX build on Windows always sends telemetry to Microsoft each time it runs the vcvarsall.bat batch file,
  2. by default, the process that sends the telemetry is asynchronous, and
  3. the Windows build runs that batch file twice, once for x86 (via vcvars32.bat) and again for x64.

The actual build error is way back at Line 182 with the message "The process cannot access the file because it is being used by another process." It's a timing problem that occurs only when both VSCMD_SKIP_SENDTELEMETRY and VSCMD_DEBUG are undefined (or set to the empty string). In that case, the Visual Studio batch files start an asynchronous PowerShell script to send telemetry to Microsoft. Because the asynchronous process will continue to use the same standard error file descriptor as the batch file that started it, we hit the error when we try to run the batch file a second time before the PowerShell script has completed. If the PowerShell script completes before we run the batch file again, the error does not occur. ~I didn't hit the error because powershell.exe was not found on my PATH, essentially disabling the telemetry.~ [EDIT: The error still occurs after the asynchronous call to START "" /B powershell.exe even when the PowerShell executable is not found. I may have had the telemetry disabled, but I'm not certain.]

The fix is simple: skip sending the telemetry. We should probably be doing that, anyway.

An alternative, and perhaps safer, fix is not to redirect standard error when calling the vcvarsall.bat batch file. In that case, the trace messages would go to the log file but the error messages would go to the console, making it difficult to associate any errors with the commands that caused them.

jgneff avatar May 10 '21 22:05 jgneff

With today's fix, the table in my prior comment is the same. In the "Before" column, the messages are printed with the normal Gradle output. In the "After" column, they're found in the new vcvarsall.log file. The table is included again below, showing the number of lines logged:

VSCMD_DEBUG Level Before After
undefined none 0 10
1 basic 0 93
2 detailed 0 615
3 trace 2 5,832

No more "black box" Windows batch files.

jgneff avatar May 11 '21 01:05 jgneff

/reviewers 2

kevinrushforth avatar May 11 '21 11:05 kevinrushforth

@kevinrushforth The number of required reviews for this PR is now set to 2 (with at least 1 of role reviewers).

openjdk[bot] avatar May 11 '21 11:05 openjdk[bot]

I wrote a Bash shell script, included below, that can help in reviewing this pull request. It isolates the call to vcvarsall.bat, making it easier to experiment with different choices, such as:

  • leaving VSCMD_SKIP_SENDTELEMETRY undefined,
  • adding the /q option to cmd,
  • removing 2>&1 to use the default standard error, or
  • discarding standard output with > /dev/null.
#!/bin/bash
# Tests the Visual Studio 'vcvarsall.bat' batch file
trap exit INT TERM
set -o errexit

# Path to 'vcvarsall.bat' batch file
vsroot="C:\\Program Files (x86)\\Microsoft Visual Studio"
vcfile="2019\\Community\\VC\\Auxiliary\\Build\\vcvarsall.bat"
vcpath="$vsroot\\$vcfile"

# Windows command interpreter and options
#   /C  Carries out the command specified by string and then terminates
#   /Q  Turns echo off
command="cmd /c"

# Skips calling PowerShell script to send telemetry
export VSCMD_SKIP_SENDTELEMETRY="1"
printf "VSCMD_SKIP_SENDTELEMETRY=\"%s\"\n" "$VSCMD_SKIP_SENDTELEMETRY"

# VSCMD_DEBUG undefined
unset VSCMD_DEBUG
printf "Testing VSCMD_DEBUG=\"%s\"\n" "$VSCMD_DEBUG"
$command "$vcpath" x86 > 0-debug.log 2>&1
$command "$vcpath" x64 >> 0-debug.log 2>&1

# VSCMD_DEBUG = 1 (basic), 2 (detailed), and 3 (trace)
for n in 1 2 3; do
    export VSCMD_DEBUG="$n"
    printf "Testing VSCMD_DEBUG=\"%s\"\n" "$VSCMD_DEBUG"
    $command "$vcpath" x86 > $n-debug.log 2>&1
    $command "$vcpath" x64 >> $n-debug.log 2>&1
done

I run the script under Cygwin on Windows and evaluate its output as follows:

$ ./vcvarstest.sh
VSCMD_SKIP_SENDTELEMETRY="1"
Testing VSCMD_DEBUG=""
Testing VSCMD_DEBUG="1"
Testing VSCMD_DEBUG="2"
Testing VSCMD_DEBUG="3"
$ wc -l *.log
    10 0-debug.log
    92 1-debug.log
   508 2-debug.log
  5690 3-debug.log
  6300 total

jgneff avatar May 13 '21 19:05 jgneff

@jgneff, do you have an idea of what kind of errors would be better detected with these changes? For instance, I've changed the PATH environment variable so that it no longer includes /cygdrive/c/Windows/System32. When running ./gradlew help it fails as expected with the known message:

> FAIL: WINSDK_DIR not defined

However, while inspecting the build/vcvarsall.log file to look for clues, it didn't reveal anything interesting to point me to the possible cause of the gradle failure. From the looks of it, it even seems to have been successfully executed.

tiainen avatar May 25 '21 12:05 tiainen

@jgneff, do you have an idea of what kind of errors would be better detected with these changes?

Thanks for trying it, Joeri. The intent is to make all such errors better detected, but that can happen only when you run the build with VSCMD_DEBUG=3 and when you're able to find the errors in the log file.

Without VSCMD_DEBUG=3, the batch files discard the output and errors of each reg query command, as shown below in the file winsdk.bat:

:GetWin10SdkDir

if "%VSCMD_DEBUG%" GEQ "3" goto :GetWin10SdkDirVerbose

call :GetWin10SdkDirHelper HKLM\SOFTWARE\Wow6432Node > nul 2>&1

In addition, it helps to have an idea what to look for in the log file because not all errors are prefixed with something helpful like ERROR. In the test that you ran, the following steps could lead you to the source of the problem:

$ export VSCMD_DEBUG=3
$ gradle sdk
   ︙
> FAIL: WINSDK_DIR not defined
   ︙
$ grep -A2 'reg query' build/vcvarsall.log
C:\cygwin64\home\john\src\jfx>for /F "tokens=1,2*" %i in
('reg query "HKLM\SOFTWARE\Wow6432Node\Microsoft\Microsoft SDKs\Windows\v10.0" /v
"InstallationFolder"') DO (if "%i" == "InstallationFolder" (SET WindowsSdkDir=%~k ) )
'reg' is not recognized as an internal or external command,
operable program or batch file.
   ︙

Not everyone will be able to figure it out, but with the information captured in the log file, there's a better chance at getting help on the mailing list. What's happening now is that, without the information to fix the problem, people like me are giving up on the build and either creating the properties file by hand or creating alternative build scripts based on PowerShell.

This pull request chooses between two not-so-great options:

  1. Redirect errors to the log file. In this case, the error messages directly follow the commands that caused them. On the other hand, the errors are difficult to find because the log file has so much trace output.

  2. Do not redirect errors and let them print on the console with the rest of the build output. In this case, the error messages are easy to find. On the other hand, they're no longer associated with the commands that caused them. For example, some of the error messages are unhelpful on their own:

ERROR: The system was unable to find the specified registry key or value.

When errors are redirected to the log file, though, you can find their source:

$ grep -B1 ^ERROR build/vcvarsall.log
C:\cygwin64\home\john\src\jfx>for /F "tokens=1,2*" %i in
('reg query "HKLM\SOFTWARE\Microsoft\VisualStudio\VSPerf" /v "CollectionToolsDir2019"')
DO (if "%i" == "CollectionToolsDir2019" (SET "__collection_tools=%k" ) )
ERROR: The system was unable to find the specified registry key or value.
--
C:\cygwin64\home\john\src\jfx>for /F "tokens=1,2*" %i in
('reg query "HKLM\SOFTWARE\Microsoft\VisualStudio\VSPerf" /v "CollectionToolsDir2019"')
DO (if "%i" == "CollectionToolsDir2019" (SET "__collection_tools=%k" ) )
ERROR: The system was unable to find the specified registry key or value.

I'm open to other suggestions on how to surface the error messages.

jgneff avatar May 25 '21 21:05 jgneff

@jgneff Unknown command v - for a list of valid commands use /help.

openjdk[bot] avatar May 25 '21 21:05 openjdk[bot]

Thank you for the review, Marius.

But turns out the whole build folder should be deleted first as otherwise you will get the error again immediately.

Yes, that's a good point! If you don't delete the build directory after the error, you'll get stuck and even gradle clean and gradle cleanAll won't let you recover. That's not obvious the first time it happens.

I think it will be more clear for everyone including new developers. :)

I hope so, especially considering that the Windows platform is probably our largest source of potential new contributors. I fell into this trap after building JavaFX on Windows for years, though, so it might even help old developers, too. 😄

jgneff avatar May 31 '21 03:05 jgneff

I would like to give this PR a bump. I still think this is a nice and safe improvement. @jgneff this branch needs to be updated as it has merge conflicts now.

Maran23 avatar Nov 01 '22 12:11 Maran23

I would like to give this PR a bump. I still think this is a nice and safe improvement.

Thanks, Marius. I appreciate the time you gave for a thorough review.

Nonetheless, the pull request may have already served its purpose. Searching for WINSDK_DIR on Google gives me the following results:

  1. "build on win10 fails · Issue 288 · javafxports/openjdk-jfx"
  2. A Visual Studio 2008 problem
  3. A Visual Studio 2015 problem
  4. "[JDK-8266396] Save VSCMD_DEBUG output in Windows build"

The first result is an issue in the old sandbox repository. It ends with the following comment linking to the bug report associated with this pull request (also the fourth result):

"Since this issue is pretty common and many others could get here in search of a solution, I found it here: JDK-8266396"

Building on Windows is surprisingly easy, but it took me four years to get there. Perhaps it's time for a more fundamental solution: move the build instructions into the repository, just like the OpenJDK project did, so that we all can participate in keeping them up to date. Changes to the instructions would be made in the same pull requests that change the build.

jgneff avatar Nov 02 '22 16:11 jgneff

My previous comment is confusing no matter how many times I edit it. :confused: Let me try again.

I would like to give this PR a bump.

Actually, I was planning to close it. One of my goals was to document how to find the true source of the WINSDK_DIR error, and that seems to have worked (see above). It's also very easy to find this pull request by searching the repository for the error message. That visibility might be even better than the fix. Unless you want to take it over, I'll close it later this month.

jgneff avatar Nov 03 '22 01:11 jgneff

Building on Windows is surprisingly easy, but it took me four years to get there. Perhaps it's time for a more fundamental solution: move the build instructions into the repository, just like the OpenJDK project did, so that we all can participate in keeping them up to date. Changes to the instructions would be made in the same pull requests that change the build.

I agree and really like this idea. We might need to pitch this idea in the mailing list as well.

Actually, I was planning to close it

Okay, I understand. I still think it might be worth documenting it, maybe via your idea above (in some kind of building.md).

Maran23 avatar Nov 03 '22 12:11 Maran23

Perhaps it's time for a more fundamental solution: move the build instructions into the repository, just like the OpenJDK project did, so that we all can participate in keeping them up to date. Changes to the instructions would be made in the same pull requests that change the build.

I agree and really like this idea. We might need to pitch this idea in the mailing list as well.

I also like this idea. @andy-goryachev-oracle also mentioned something similar to me in a slightly different context that got me thinking we should do it here as well.

kevinrushforth avatar Nov 03 '22 12:11 kevinrushforth

Closing this pull request as we discussed in the previous comments.

jgneff avatar Dec 06 '22 17:12 jgneff