eclipse.platform icon indicating copy to clipboard operation
eclipse.platform copied to clipboard

Widespread wrong handling of space in File->URI->File conversion?!

Open jukzi opened this issue 3 years ago • 12 comments

After fixing https://github.com/eclipse-jdt/eclipse.jdt.debug/pull/66 i took a look where else URLs might be decoded wrong

org.eclipse.core.runtime.URIUtil seems to be wrong even though it's javadoc explicitly says it can handle space

		File file = new File("c:/my test");
		System.out.println("file=" + file); // c:\my test OK
		URL url = file.toURI().toURL();
		System.out.println("url=" + url); // file:/c:/my%20test OK
		URL fileUrl = FileLocator.toFileURL(url);
		System.out.println("fileUrl=" + fileUrl); // file:/c:/my%20test  OK

		URI uri = org.eclipse.core.runtime.URIUtil.toURI(url);
		System.out.println("uri=" + uri); // file:/c:/my%2520test <- %25 means "%" WTF
		File toFile = org.eclipse.core.runtime.URIUtil.toFile(uri);
		System.out.println("toFile=" + toFile); // c:\my%20test <- should be c:\my test

// should be:
		File good = new File(url.toURI()); 
		System.out.println("good=" + good); // c:\my test OK

other wrong decodings new File(.*url.*getPath()) (regular pattern search) are in

  • org.eclipse.ant.internal.core.ant.InternalAntRunner.setJavaClassPath()
  • org.eclipse.ant.internal.ui.editor.utils.ProjectHelper.ProjectHandler.onStartElement(String, String, String, Attributes, AntXMLContext)
  • org.eclipse.ant.internal.ui.model.AntDefiningTaskNode.setJavaClassPath(URL[])
  • org.eclipse.core.internal.runtime.AuthorizationHandler.loadKeyring()
  • org.eclipse.core.tests.internal.runtime.PlatformURLSessionTest.createData(String)
  • org.eclipse.e4.ui.internal.workbench.swt.E4Application.getVersionFile(URL, boolean)
  • org.eclipse.ui.internal.ide.application.IDEApplication.getVersionFile(URL, boolean)
  • org.eclipse.ecf.provider.filetransfer.browse.LocalFileSystemBrowser.LocalFileSystemBrowser(IFileID, IRemoteFileSystemListener)
  • org.eclipse.ecf.provider.filetransfer.outgoing.LocalFileOutgoingFileTransfer.openStreams()
  • org.eclipse.jdt.internal.core.index.IndexLocation.createIndexLocation(URL)
  • org.eclipse.urischeme.internal.registration.FileProvider.getFilePath(URL)

Another wrong pattern is new File(.*url.*getFile())

		File urlGetFile = new File(url.getFile());
		System.out.println("urlGetFile=" + urlGetFile); // c:\my%20test <- should be c:\my test

in places like:

  • AbstractCSSEngine.java - org.eclipse.e4.ui.css.core/src/org/eclipse/e4/ui/css/core/impl/engine (2 matches)
  • AbstractJavaModelTests.java - org.eclipse.jdt.core.tests.model/src/org/eclipse/jdt/core/tests/model
  • AntTestPlugin.java - org.eclipse.ant.tests.core/test plugin/org/eclipse/ant/tests/core/testplugin
  • AntUITestPlugin.java - org.eclipse.ant.tests.ui/test plugin/org/eclipse/ant/tests/ui/testplugin
  • ChooseWorkspaceData.java - org.eclipse.ui.ide/src/org/eclipse/ui/internal/ide (4 matches)
  • ClassLoaderTools.java - org.eclipse.test/src/org/eclipse/test
  • ConfigurationActivator.java - org.eclipse.update.configurator/src/org/eclipse/update/internal/configurator
  • ConfigurationParser.java - org.eclipse.pde.core/src/org/eclipse/pde/internal/core/update/configurator
  • ConfigurationParser.java - org.eclipse.update.configurator/src/org/eclipse/update/internal/configurator
  • ConfigurationSessionTestSuite.java - org.eclipse.core.tests.harness/src/org/eclipse/core/tests/session
  • ContentDetectHelper.java - org.eclipse.ui.intro.universal/src/org/eclipse/ui/internal/intro/universal/contentdetect
  • DebugCorePlugin.java - org.eclipse.debug.examples.core/src/org/eclipse/debug/examples/core/pda
  • E4Application.java - org.eclipse.e4.ui.workbench.swt/src/org/eclipse/e4/ui/internal/workbench/swt
  • EEVMType.java - org.eclipse.jdt.launching/launching/org/eclipse/jdt/internal/launching (2 matches)
  • FeatureParser.java - org.eclipse.pde.core/src/org/eclipse/pde/internal/core/update/configurator
  • FeatureParser.java - org.eclipse.update.configurator/src/org/eclipse/update/internal/configurator
  • FileSystemAccess.java - org.eclipse.core.filesystem/src/org/eclipse/core/internal/filesystem
  • FileTool.java - org.eclipse.core.filebuffers.tests/src/org/eclipse/core/filebuffers/tests
  • FileTool.java - org.eclipse.search.tests/src/org/eclipse/search/tests
  • FileTool.java - org.eclipse.ui.tests.harness/src/org/eclipse/ui/tests/harness/util
  • FormatterMassiveRegressionTests.java - org.eclipse.jdt.core.tests.model/src/org/eclipse/jdt/core/tests/formatter
  • HelpBasePlugin.java - org.eclipse.help.base/src/org/eclipse/help/internal/base
  • HelpPlugin.java - org.eclipse.help/src/org/eclipse/help/internal
  • IDEApplication.java - org.eclipse.ui.ide.application/src/org/eclipse/ui/internal/ide/application
  • IncludedSchemaDescriptor.java - org.eclipse.pde.core/src/org/eclipse/pde/internal/core/schema
  • InternalPlatform.java - org.eclipse.core.runtime/src/org/eclipse/core/internal/runtime
  • LaunchingPlugin.java - org.eclipse.jdt.launching/launching/org/eclipse/jdt/internal/launching (3 matches)
  • LaunchTests.java - org.eclipse.debug.tests/src/org/eclipse/debug/tests/launching
  • Main_in.java - org.eclipse.jdt.core.tests.model/workspace/Formatter/test492
  • Main_out.java - org.eclipse.jdt.core.tests.model/workspace/Formatter/test492
  • ModelTestsUtil.java - org.eclipse.jdt.core.tests.model/src/org/eclipse/jdt/core/tests/model
  • OpenBundleResourceHandler.java - org.eclipse.help.ui/src/org/eclipse/help/ui/internal/handlers
  • PlatformConfiguration.java - org.eclipse.pde.core/src/org/eclipse/pde/internal/core/update/configurator (2 matches)
  • PlatformConfiguration.java - org.eclipse.update.configurator/src/org/eclipse/update/internal/configurator (7 matches)
  • PluginPathFinder.java - org.eclipse.pde.core/src/org/eclipse/pde/internal/core
  • SchemaDescriptor.java - org.eclipse.pde.core/src/org/eclipse/pde/internal/core/schema
  • SchemaRegistry.java - org.eclipse.pde.core/src/org/eclipse/pde/internal/core/schema
  • SiteEntry.java - org.eclipse.pde.core/src/org/eclipse/pde/internal/core/update/configurator (6 matches)
  • SiteEntry.java - org.eclipse.update.configurator/src/org/eclipse/update/internal/configurator (8 matches)
  • SmartImportTests.java - org.eclipse.ui.tests/Eclipse UI Tests/org/eclipse/ui/tests/datatransfer (5 matches)
  • TargetPlatformHelper.java - org.eclipse.pde.core/src/org/eclipse/pde/internal/core
  • TestsPlugin.java - org.eclipse.debug.tests/src/org/eclipse/debug/tests
  • TestsPlugin.java - org.eclipse.debug.ui.launchview.tests/src/org/eclipse/debug/ui/launchview/tests
  • ThemeEngine.java - org.eclipse.e4.ui.css.swt.theme/src/org/eclipse/e4/ui/css/swt/internal/theme
  • UpdateManagerHelperDeprecated.java - org.eclipse.pde.core/src/org/eclipse/pde/internal/core
  • URLFieldEditor.java - org.eclipse.ant.ui/Ant Tools Support/org/eclipse/ant/internal/ui/preferences
  • Utils.java - org.eclipse.test.performance.ui/src/org/eclipse/test/performance/ui

I do not know when all of this is used, does anybody know if that can be a real issue? Can we just copy & paste the solution and trust it works?

jukzi avatar Jun 15 '22 08:06 jukzi

Why do we need all this Util methods at all? Can't this be replaced with standard java?

I do not know when all of this is used, does anybody know if that can be a real issue?

Rule 1: Never use spaces in your eclipse workspace ;-) Rule 2: Always follow rule 1

laeubi avatar Jun 15 '22 08:06 laeubi

Rule 1: Never use spaces in your eclipse workspace ;-)

Yea, I even had to disallow non-ascii characters for our developers for Workspace and Git repos.

jukzi avatar Jun 15 '22 08:06 jukzi

Have no IDE at hand, but we not only support "file:" schema, also other schemas should be possible. So redirecting everything to "new File(url.toURI())" may not always work.

iloveeclipse avatar Jun 15 '22 09:06 iloveeclipse

I suspect that some things may even rely on this wrong behavior, so sweeping changes and hoping/assuming all will work well is probably a bad assumption. At least on Windows, user IDs with spaces are not all that rare, but that does generally work okay despite this long list of questionable code...

merks avatar Jun 15 '22 09:06 merks

As far as I remember this was required for remote file conversions, will try to find the related comment and usage.

SarikaSinha avatar Jun 15 '22 09:06 SarikaSinha

Why do we need all this Util methods at all? Can't this be replaced with standard java?

I think it would definitely be better to work towards using standard Java API for that. Replacing File.toURL() by File.toURI().toURL() is a similar topic that can be done at many locations in the Eclipse code base.

I do not know when all of this is used, does anybody know if that can be a real issue?

Rule 1: Never use spaces in your eclipse workspace ;-) Rule 2: Always follow rule 1

The 90s are calling and want their problems back. 😄

I suspect that some things may even rely on this wrong behavior, so sweeping changes and hoping/assuming all will work well is probably a bad assumption. At least on Windows, user IDs with spaces are not all that rare, but that does generally work okay despite this long list of questionable code...

Although I'm for fixing this, I share that concern. What we can do is to check how the produced URL is used. If it is only used by java API (e.g. a stream is opened from it) it should be save to fix it. If it is for example converted to a String and then later parsed again, we have to ensure that the parsing code can handle a correctly encoded URL. So maybe for such cases we have to ensure that both correct and incorrectly encoded URLs can be handled, then wait some time (one or two years?) and the 'fix' the remaining parts to produce correct URLs.

HannesWell avatar Jul 17 '22 19:07 HannesWell

relates to https://bugs.eclipse.org/bugs/show_bug.cgi?id=145096#c4 FileLocator.resolve(URL) returns invalid file URL (spaces not escaped) a long known bug where they did not change encoding to not break clients that might rely on wrong encoding

jukzi avatar Oct 06 '22 11:10 jukzi

smile

I suspect that some things may even rely on this wrong behavior, so sweeping changes and hoping/assuming all will work well is probably a bad assumption. At least on Windows, user IDs with spaces are not all that rare, but that does generally work okay despite this long list of questionable code...

Although I'm for fixing this, I share that concern. What we can do is to check how the produced URL is used. If it is only used by java API (e.g. a stream is opened from it) it should be save to fix it. If it is for example converted to a String and then later parsed again, we have to ensure that the parsing code can handle a correctly encoded URL. So maybe for such cases we have to ensure that both correct and incorrectly encoded URLs can be handled, then wait some time (one or two years?) and the 'fix' the remaining parts to produce correct URLs.

Do we have any plan to fix this issue? What do you think about the suggested approach?

In the FileLocator we could work-around the compatibility problem by adding new methods like toFile/Path() that returns a File/Path directly. Isn't that what most users want to have any way? For FileLocator.resolve() we could find an alternative, for example FileLocator.toNativeURL() or just FileLocator.resolve() that explicitly states that it encodes URLs correctly. The existing wrong behaving methods could then be deprecated and removed after the usual deprecation period (or if necessary after an extended period).

HannesWell avatar Mar 07 '23 15:03 HannesWell

My suggestion how to fix this is the following:

  • Add factory/conversion methods to convert URLs to (file-) URIs or Paths and to parse Strings to URIs, which are both robust and can handle encoded and unencoded Strings/URLs.
    • https://github.com/eclipse-equinox/equinox/pull/691
    • https://github.com/eclipse-equinox/equinox/pull/920
  • Internally migrate all usage of URL to (properly encoded) URI and where a file-URL/URI is expected use Path as much as possible or at least make sure all used URLs are properly encoded.
    • I think that corresponds to the suggestions of the JDK makers:
      • https://inside.java/2023/02/15/quality-heads-up/
      • https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/net/URL.html#constructor-deprecation
  • Use the mentioned new (or suitable existing) methods wherever external 'user-input' is processed, e.g. when a location is read from the command-line or a config file.
  • preserve existing APIs that return unencoded URLs as they are and provide alternatives using (properly encoded) URIs or Path if only file-URI/URLs are supported
    • they can also be used for the internal migration
  • Continue to emit unencoded URL-Strings for example into generated config-files when a secondary Eclipse is launched, at least for some time until the version that can parse proper URIs is mainly used (maybe the usual two years time-frame?).

This should preserve the greatest level of compatibility and the least breakages while allowing us to fix the miss-creation of URLs and to replace deprecated File.toURL() and URL constructors.

But even with the robust methods mentioned above there is a chance for breakage, for example if one has an unencoded URL with a path that contains a meta-character. Assume one has a file where one element is named with%20spaces (yes that's legal in Windows) and passes that as a URL file:/c:/with%20spaces/goo. If we try the standard way first that assumes proper encoding, the resulting path will be C:\with spaces\goo (on Windows) without an exception and thus the alternative way, assuming no encoding, will not be executed, although it would have lead to the actually intended C:\with%20spaces\goo. But I think that's less likely and most problems come from not encoded spaces.

HannesWell avatar Mar 29 '25 20:03 HannesWell

one has a file where one element is named with%20spaces (yes that's legal in Windows)

https://github.com/search?q=path%3A%2520

=> 270_000 hits OMG

jukzi avatar Mar 29 '25 20:03 jukzi

one has a file where one element is named with%20spaces (yes that's legal in Windows)

https://github.com/search?q=path%3A%2520

=> 270_000 hits OMG

It's indeed not impossible, but I don't know how it should be possible to avoid it besides not changing it, but then we don't get the benefits. And for spaces in paths there are even 156Mio hits: https://github.com/search?q=path%3A%22+%22&type=code

HannesWell avatar Mar 30 '25 21:03 HannesWell

  • where a file-URL/URI is expected use Path as much as possible

A Path can not only be a File, so why restrict it here?

laeubi avatar Mar 31 '25 07:03 laeubi