Widespread wrong handling of space in File->URI->File conversion?!
After fixing https://github.com/eclipse-jdt/eclipse.jdt.debug/pull/66 i took a look where else URLs might be decoded wrong
org.eclipse.core.runtime.URIUtil seems to be wrong even though it's javadoc explicitly says it can handle space
File file = new File("c:/my test");
System.out.println("file=" + file); // c:\my test OK
URL url = file.toURI().toURL();
System.out.println("url=" + url); // file:/c:/my%20test OK
URL fileUrl = FileLocator.toFileURL(url);
System.out.println("fileUrl=" + fileUrl); // file:/c:/my%20test OK
URI uri = org.eclipse.core.runtime.URIUtil.toURI(url);
System.out.println("uri=" + uri); // file:/c:/my%2520test <- %25 means "%" WTF
File toFile = org.eclipse.core.runtime.URIUtil.toFile(uri);
System.out.println("toFile=" + toFile); // c:\my%20test <- should be c:\my test
// should be:
File good = new File(url.toURI());
System.out.println("good=" + good); // c:\my test OK
other wrong decodings new File(.*url.*getPath()) (regular pattern search) are in
- org.eclipse.ant.internal.core.ant.InternalAntRunner.setJavaClassPath()
- org.eclipse.ant.internal.ui.editor.utils.ProjectHelper.ProjectHandler.onStartElement(String, String, String, Attributes, AntXMLContext)
- org.eclipse.ant.internal.ui.model.AntDefiningTaskNode.setJavaClassPath(URL[])
- org.eclipse.core.internal.runtime.AuthorizationHandler.loadKeyring()
- org.eclipse.core.tests.internal.runtime.PlatformURLSessionTest.createData(String)
- org.eclipse.e4.ui.internal.workbench.swt.E4Application.getVersionFile(URL, boolean)
- org.eclipse.ui.internal.ide.application.IDEApplication.getVersionFile(URL, boolean)
- org.eclipse.ecf.provider.filetransfer.browse.LocalFileSystemBrowser.LocalFileSystemBrowser(IFileID, IRemoteFileSystemListener)
- org.eclipse.ecf.provider.filetransfer.outgoing.LocalFileOutgoingFileTransfer.openStreams()
- org.eclipse.jdt.internal.core.index.IndexLocation.createIndexLocation(URL)
- org.eclipse.urischeme.internal.registration.FileProvider.getFilePath(URL)
Another wrong pattern is new File(.*url.*getFile())
File urlGetFile = new File(url.getFile());
System.out.println("urlGetFile=" + urlGetFile); // c:\my%20test <- should be c:\my test
in places like:
- AbstractCSSEngine.java - org.eclipse.e4.ui.css.core/src/org/eclipse/e4/ui/css/core/impl/engine (2 matches)
- AbstractJavaModelTests.java - org.eclipse.jdt.core.tests.model/src/org/eclipse/jdt/core/tests/model
- AntTestPlugin.java - org.eclipse.ant.tests.core/test plugin/org/eclipse/ant/tests/core/testplugin
- AntUITestPlugin.java - org.eclipse.ant.tests.ui/test plugin/org/eclipse/ant/tests/ui/testplugin
- ChooseWorkspaceData.java - org.eclipse.ui.ide/src/org/eclipse/ui/internal/ide (4 matches)
- ClassLoaderTools.java - org.eclipse.test/src/org/eclipse/test
- ConfigurationActivator.java - org.eclipse.update.configurator/src/org/eclipse/update/internal/configurator
- ConfigurationParser.java - org.eclipse.pde.core/src/org/eclipse/pde/internal/core/update/configurator
- ConfigurationParser.java - org.eclipse.update.configurator/src/org/eclipse/update/internal/configurator
- ConfigurationSessionTestSuite.java - org.eclipse.core.tests.harness/src/org/eclipse/core/tests/session
- ContentDetectHelper.java - org.eclipse.ui.intro.universal/src/org/eclipse/ui/internal/intro/universal/contentdetect
- DebugCorePlugin.java - org.eclipse.debug.examples.core/src/org/eclipse/debug/examples/core/pda
- E4Application.java - org.eclipse.e4.ui.workbench.swt/src/org/eclipse/e4/ui/internal/workbench/swt
- EEVMType.java - org.eclipse.jdt.launching/launching/org/eclipse/jdt/internal/launching (2 matches)
- FeatureParser.java - org.eclipse.pde.core/src/org/eclipse/pde/internal/core/update/configurator
- FeatureParser.java - org.eclipse.update.configurator/src/org/eclipse/update/internal/configurator
- FileSystemAccess.java - org.eclipse.core.filesystem/src/org/eclipse/core/internal/filesystem
- FileTool.java - org.eclipse.core.filebuffers.tests/src/org/eclipse/core/filebuffers/tests
- FileTool.java - org.eclipse.search.tests/src/org/eclipse/search/tests
- FileTool.java - org.eclipse.ui.tests.harness/src/org/eclipse/ui/tests/harness/util
- FormatterMassiveRegressionTests.java - org.eclipse.jdt.core.tests.model/src/org/eclipse/jdt/core/tests/formatter
- HelpBasePlugin.java - org.eclipse.help.base/src/org/eclipse/help/internal/base
- HelpPlugin.java - org.eclipse.help/src/org/eclipse/help/internal
- IDEApplication.java - org.eclipse.ui.ide.application/src/org/eclipse/ui/internal/ide/application
- IncludedSchemaDescriptor.java - org.eclipse.pde.core/src/org/eclipse/pde/internal/core/schema
- InternalPlatform.java - org.eclipse.core.runtime/src/org/eclipse/core/internal/runtime
- LaunchingPlugin.java - org.eclipse.jdt.launching/launching/org/eclipse/jdt/internal/launching (3 matches)
- LaunchTests.java - org.eclipse.debug.tests/src/org/eclipse/debug/tests/launching
- Main_in.java - org.eclipse.jdt.core.tests.model/workspace/Formatter/test492
- Main_out.java - org.eclipse.jdt.core.tests.model/workspace/Formatter/test492
- ModelTestsUtil.java - org.eclipse.jdt.core.tests.model/src/org/eclipse/jdt/core/tests/model
- OpenBundleResourceHandler.java - org.eclipse.help.ui/src/org/eclipse/help/ui/internal/handlers
- PlatformConfiguration.java - org.eclipse.pde.core/src/org/eclipse/pde/internal/core/update/configurator (2 matches)
- PlatformConfiguration.java - org.eclipse.update.configurator/src/org/eclipse/update/internal/configurator (7 matches)
- PluginPathFinder.java - org.eclipse.pde.core/src/org/eclipse/pde/internal/core
- SchemaDescriptor.java - org.eclipse.pde.core/src/org/eclipse/pde/internal/core/schema
- SchemaRegistry.java - org.eclipse.pde.core/src/org/eclipse/pde/internal/core/schema
- SiteEntry.java - org.eclipse.pde.core/src/org/eclipse/pde/internal/core/update/configurator (6 matches)
- SiteEntry.java - org.eclipse.update.configurator/src/org/eclipse/update/internal/configurator (8 matches)
- SmartImportTests.java - org.eclipse.ui.tests/Eclipse UI Tests/org/eclipse/ui/tests/datatransfer (5 matches)
- TargetPlatformHelper.java - org.eclipse.pde.core/src/org/eclipse/pde/internal/core
- TestsPlugin.java - org.eclipse.debug.tests/src/org/eclipse/debug/tests
- TestsPlugin.java - org.eclipse.debug.ui.launchview.tests/src/org/eclipse/debug/ui/launchview/tests
- ThemeEngine.java - org.eclipse.e4.ui.css.swt.theme/src/org/eclipse/e4/ui/css/swt/internal/theme
- UpdateManagerHelperDeprecated.java - org.eclipse.pde.core/src/org/eclipse/pde/internal/core
- URLFieldEditor.java - org.eclipse.ant.ui/Ant Tools Support/org/eclipse/ant/internal/ui/preferences
- Utils.java - org.eclipse.test.performance.ui/src/org/eclipse/test/performance/ui
I do not know when all of this is used, does anybody know if that can be a real issue? Can we just copy & paste the solution and trust it works?
Why do we need all this Util methods at all? Can't this be replaced with standard java?
I do not know when all of this is used, does anybody know if that can be a real issue?
Rule 1: Never use spaces in your eclipse workspace ;-) Rule 2: Always follow rule 1
Rule 1: Never use spaces in your eclipse workspace ;-)
Yea, I even had to disallow non-ascii characters for our developers for Workspace and Git repos.
Have no IDE at hand, but we not only support "file:" schema, also other schemas should be possible. So redirecting everything to "new File(url.toURI())" may not always work.
I suspect that some things may even rely on this wrong behavior, so sweeping changes and hoping/assuming all will work well is probably a bad assumption. At least on Windows, user IDs with spaces are not all that rare, but that does generally work okay despite this long list of questionable code...
As far as I remember this was required for remote file conversions, will try to find the related comment and usage.
Why do we need all this Util methods at all? Can't this be replaced with standard java?
I think it would definitely be better to work towards using standard Java API for that.
Replacing File.toURL() by File.toURI().toURL() is a similar topic that can be done at many locations in the Eclipse code base.
I do not know when all of this is used, does anybody know if that can be a real issue?
Rule 1: Never use spaces in your eclipse workspace ;-) Rule 2: Always follow rule 1
The 90s are calling and want their problems back. 😄
I suspect that some things may even rely on this wrong behavior, so sweeping changes and hoping/assuming all will work well is probably a bad assumption. At least on Windows, user IDs with spaces are not all that rare, but that does generally work okay despite this long list of questionable code...
Although I'm for fixing this, I share that concern. What we can do is to check how the produced URL is used. If it is only used by java API (e.g. a stream is opened from it) it should be save to fix it. If it is for example converted to a String and then later parsed again, we have to ensure that the parsing code can handle a correctly encoded URL. So maybe for such cases we have to ensure that both correct and incorrectly encoded URLs can be handled, then wait some time (one or two years?) and the 'fix' the remaining parts to produce correct URLs.
relates to https://bugs.eclipse.org/bugs/show_bug.cgi?id=145096#c4 FileLocator.resolve(URL) returns invalid file URL (spaces not escaped)
a long known bug where they did not change encoding to not break clients that might rely on wrong encoding
smile
I suspect that some things may even rely on this wrong behavior, so sweeping changes and hoping/assuming all will work well is probably a bad assumption. At least on Windows, user IDs with spaces are not all that rare, but that does generally work okay despite this long list of questionable code...
Although I'm for fixing this, I share that concern. What we can do is to check how the produced URL is used. If it is only used by java API (e.g. a stream is opened from it) it should be save to fix it. If it is for example converted to a String and then later parsed again, we have to ensure that the parsing code can handle a correctly encoded URL. So maybe for such cases we have to ensure that both correct and incorrectly encoded URLs can be handled, then wait some time (one or two years?) and the 'fix' the remaining parts to produce correct URLs.
Do we have any plan to fix this issue? What do you think about the suggested approach?
In the FileLocator we could work-around the compatibility problem by adding new methods like toFile/Path() that returns a File/Path directly. Isn't that what most users want to have any way?
For FileLocator.resolve() we could find an alternative, for example FileLocator.toNativeURL() or just FileLocator.resolve() that explicitly states that it encodes URLs correctly.
The existing wrong behaving methods could then be deprecated and removed after the usual deprecation period (or if necessary after an extended period).
My suggestion how to fix this is the following:
- Add factory/conversion methods to convert URLs to (file-) URIs or Paths and to parse Strings to URIs, which are both robust and can handle encoded and unencoded Strings/URLs.
- https://github.com/eclipse-equinox/equinox/pull/691
- https://github.com/eclipse-equinox/equinox/pull/920
- Internally migrate all usage of
URLto (properly encoded)URIand where afile-URL/URIis expected usePathas much as possible or at least make sure all used URLs are properly encoded.- I think that corresponds to the suggestions of the JDK makers:
- https://inside.java/2023/02/15/quality-heads-up/
- https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/net/URL.html#constructor-deprecation
- I think that corresponds to the suggestions of the JDK makers:
- Use the mentioned new (or suitable existing) methods wherever external 'user-input' is processed, e.g. when a location is read from the command-line or a config file.
- preserve existing APIs that return unencoded URLs as they are and provide alternatives using (properly encoded) URIs or Path if only file-URI/URLs are supported
- they can also be used for the internal migration
- Continue to emit unencoded URL-Strings for example into generated config-files when a secondary Eclipse is launched, at least for some time until the version that can parse proper URIs is mainly used (maybe the usual two years time-frame?).
This should preserve the greatest level of compatibility and the least breakages while allowing us to fix the miss-creation of URLs and to replace deprecated File.toURL() and URL constructors.
But even with the robust methods mentioned above there is a chance for breakage, for example if one has an unencoded URL with a path that contains a meta-character. Assume one has a file where one element is named with%20spaces (yes that's legal in Windows) and passes that as a URL file:/c:/with%20spaces/goo. If we try the standard way first that assumes proper encoding, the resulting path will be C:\with spaces\goo (on Windows) without an exception and thus the alternative way, assuming no encoding, will not be executed, although it would have lead to the actually intended C:\with%20spaces\goo.
But I think that's less likely and most problems come from not encoded spaces.
one has a file where one element is named
with%20spaces(yes that's legal in Windows)
https://github.com/search?q=path%3A%2520
=> 270_000 hits OMG
one has a file where one element is named
with%20spaces(yes that's legal in Windows)https://github.com/search?q=path%3A%2520
=> 270_000 hits OMG
It's indeed not impossible, but I don't know how it should be possible to avoid it besides not changing it, but then we don't get the benefits. And for spaces in paths there are even 156Mio hits: https://github.com/search?q=path%3A%22+%22&type=code
- where a
file-URL/URIis expected usePathas much as possible
A Path can not only be a File, so why restrict it here?