Multiple enhancements and bugfixes
Apologies for the monolithic commit. This was the result of many months of tinkering and improvement, but (with these changes) I was finally able to migrate hundreds of RTC streams (some of which were large) with minimal developer downtime. In short, it's good stuff ... it's just not delivered as a set of nice clear-cut PRs.
Bugs fixed:
- Keeps commits in the order they need to be applied in, rather than simply in date order. This is the bug that caused me a lot of trouble, until I found the code's assumption that changesets could be applied in modification-date order (which is not always true).
- Lots of defensive error-handling code for when RTC fails. Sometimes RTC fails to get the changes it was told to obtain; sometimes it fails to apply those changes to the local filesystem correctly; sometimes it succeeds but claims it failed; sometimes it fails but claims it succeeded. It can't be trusted, so we no longer take its word for it.
- Ensure git tags only use legal characters. RTC allows baselines etc. to contain any characters; git is fussy (and git-lfs doesn't cope well with commas either).
- Improved handling of .gitignore files to avoid confusing RTC. RTC will typically fail to update the filesystem if there are .gitignore files in a folder it needs to remove, so we now have the option of deleting (and re-creating) the .gitignore files to ensure that RTC doesn't see anything it doesn't like. Also, RTC lets you store an empty folder whereas git does not. Lastly, RTC lets you put files that are .jazzignored under source control whereas JGit does not, so the old code could let such files go missing.
- Improved error handling. The old code would call printStackTrace() in places and then carry on and report success.
- An RTC stream removing a component (somewhere in its history) no longer causes an NPE.
- We now support a --include-root flag to cope with streams whose components contain files/paths with the same name. Previously, we'd fail to get the data, ignore the failure and thus fail to show the file changes, losing data.
- Ensure we don't leak file resources. The old code could leak filesystem handles, which might have been causing issues on larger migrations.
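To make the commit-ordering fix more concrete, here is a minimal Java 8 sketch (not the code from this PR) of ordering changesets by dependency and using the modification date only as a tie-breaker. It assumes a hypothetical `ChangeSet` class that already knows the ids of the changesets that must be applied before it; how those predecessor links are obtained from RTC isn't shown, and ids are assumed to be unique.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

final class ChangeSetOrder {

    /** Hypothetical model: a changeset plus the ids of the changesets that must be applied before it. */
    static final class ChangeSet {
        final String id;
        final Instant modified;
        final Set<String> predecessors;

        ChangeSet(String id, Instant modified, Set<String> predecessors) {
            this.id = id;
            this.modified = modified;
            this.predecessors = predecessors;
        }
    }

    /**
     * Kahn's topological sort: a changeset only becomes eligible once all of its predecessors
     * (that are part of this collection) have been emitted. Among eligible changesets, the
     * earliest modification date goes first, with the id as a final tie-breaker.
     */
    static List<ChangeSet> order(Collection<ChangeSet> changeSets) {
        Map<String, Integer> unmetPredecessors = new HashMap<>();
        Map<String, List<ChangeSet>> successors = new HashMap<>();
        for (ChangeSet cs : changeSets) {
            unmetPredecessors.put(cs.id, 0);
        }
        for (ChangeSet cs : changeSets) {
            for (String pred : cs.predecessors) {
                if (!unmetPredecessors.containsKey(pred)) {
                    continue; // predecessor not in this collection (e.g. already migrated): ignore it
                }
                unmetPredecessors.merge(cs.id, 1, Integer::sum);
                successors.computeIfAbsent(pred, k -> new ArrayList<>()).add(cs);
            }
        }

        Comparator<ChangeSet> byDateThenId = Comparator
                .comparing((ChangeSet cs) -> cs.modified)
                .thenComparing(cs -> cs.id);
        PriorityQueue<ChangeSet> ready = new PriorityQueue<>(byDateThenId);
        for (ChangeSet cs : changeSets) {
            if (unmetPredecessors.get(cs.id) == 0) {
                ready.add(cs);
            }
        }

        List<ChangeSet> ordered = new ArrayList<>();
        while (!ready.isEmpty()) {
            ChangeSet next = ready.poll();
            ordered.add(next);
            // Emitting "next" satisfies one predecessor of each of its successors.
            for (ChangeSet succ : successors.getOrDefault(next.id, Collections.<ChangeSet>emptyList())) {
                if (unmetPredecessors.merge(succ.id, -1, Integer::sum) == 0) {
                    ready.add(succ);
                }
            }
        }
        if (ordered.size() != changeSets.size()) {
            // Something never became eligible, so the dependencies must loop.
            throw new IllegalStateException("changeset dependencies contain a cycle");
        }
        return ordered;
    }
}
```

For example, if changeset B is dated earlier than A but lists A as a predecessor, a pure date sort would apply B first, whereas this ordering applies A before B. A dependency cycle is reported as an error rather than silently broken.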
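For the git tag fix, one possible approach is a hypothetical `GitTagNames.toLegalTagName` helper (an assumption, not this PR's implementation). It uses a whitelist of letters, digits, `.`, `_` and `-`, which is stricter than git's own ref-name rules (as documented for `git check-ref-format`) but also gets rid of the commas that git-lfs dislikes.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: turns an RTC baseline/snapshot name into something usable as a git tag name. */
final class GitTagNames {
    // Whitelist: any run of characters other than letters, digits, '.', '_' and '-' becomes one '_'.
    // This removes everything git refuses in ref names (space, ~ ^ : ? * [ \, control chars, "@{"),
    // plus ',' (for git-lfs) and '/' (so the tag stays a single path component).
    private static final Pattern DISALLOWED = Pattern.compile("[^A-Za-z0-9._-]+");
    private static final Pattern DOT_RUNS = Pattern.compile("\\.{2,}");    // git forbids ".."
    private static final Pattern LEADING = Pattern.compile("^[.-]+");      // no leading '.'; '-' would look like an option
    private static final Pattern TRAILING_DOTS = Pattern.compile("\\.+$"); // git forbids a trailing '.'

    static String toLegalTagName(String name) {
        String s = DISALLOWED.matcher(name).replaceAll("_");
        s = DOT_RUNS.matcher(s).replaceAll(".");
        s = LEADING.matcher(s).replaceAll("");
        s = TRAILING_DOTS.matcher(s).replaceAll("");
        if (s.endsWith(".lock")) { // git forbids ref components ending in ".lock"
            s = s.substring(0, s.length() - ".lock".length()) + "_lock";
        }
        return s.isEmpty() ? "tag" : s; // arbitrary fallback when nothing usable is left
    }
}
```

e.g. a baseline named `Release 1.2, final` becomes `Release_1.2_final`. Two different baseline names can map to the same tag name, so a real implementation would also need to deal with collisions.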
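The .gitignore workaround could look something like this sketch, using a hypothetical `GitignoreStash` class: stash (read and delete) every .gitignore under the working directory before RTC updates the filesystem, then re-create them afterwards, skipping any whose folder RTC has removed. Keeping the stashed contents only in memory is an assumption made to keep the example short.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

/** Hypothetical sketch: move .gitignore files out of RTC's way, then put them back. */
final class GitignoreStash {
    private final Map<Path, byte[]> saved = new LinkedHashMap<>();

    /** Reads and deletes every .gitignore under root, ignoring anything inside the .git directory. */
    void stash(Path root) throws IOException {
        List<Path> found = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) { // try-with-resources closes the directory handles
            paths.filter(p -> p.getFileName() != null
                            && p.getFileName().toString().equals(".gitignore")
                            && Files.isRegularFile(p))
                 .filter(p -> !root.relativize(p).startsWith(".git")) // compares whole path components
                 .forEach(found::add);
        }
        for (Path p : found) {
            saved.put(p, Files.readAllBytes(p));
            Files.delete(p);
        }
    }

    /** Re-creates the stashed files, except where RTC removed the containing folder. Returns how many. */
    int restore() throws IOException {
        int restored = 0;
        for (Map.Entry<Path, byte[]> e : saved.entrySet()) {
            if (Files.isDirectory(e.getKey().getParent())) {
                Files.write(e.getKey(), e.getValue());
                restored++;
            }
        }
        saved.clear();
        return restored;
    }
}
```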
Enhancements:
- Uses newer JGit.
- Uses RTC 6.0.6.
- Now supports a "batch" mode to limit the number of commits/changesets handled before the migration ends, allowing a migration to be done in stages (typically used to defend against a "nothing" outcome from "all or nothing" migrations - with this tactic, a failure doesn't mean the loss of all progress).
- Now supports an "update" mode to allow a migration to continue where it left off. This lets you keep a git repo "in sync" with an RTC stream for ease of migration: make a snapshot pointing to where you got to last, and then use that as the starting point next time.
- Can now set commit-comment syntax to link back to RTC (or anything else); commit comments are now highly configurable.
- Can now set git-notes just as easily as comments (but use with caution if using git-lfs as git notes are rarely used).
- Much more verbose logging in order to facilitate diagnosing discrepancies (useful if you spot that a file is missing a commit in the git history that exists in the RTC history; not all changesets are usefully accessible from RTC, and these diagnostics can help explain why).
- Initial commit date is now configurable (e.g. I typically use 00:00 on 01/01/2000; otherwise my "first" commit had a later date than all the other commits that "followed" it).
- autocrlf is now configurable. Can be useful if using Linux to migrate files that were put there by a Windows user who's never tried loading their RTC data onto a Linux machine.
- git commits now know both the committer and the author, not just the committer (data is taken from RTC's "added to stream" and "author" fields, rather than just the "author" field).
- Supports a pre-emptive changeset gap-filling mode (uses the "list missing changesets" functionality, but use with caution, as RTC is prone to lying and telling the migration that it needs changesets that are totally unrelated, causing a conflict/merge nightmare and the migration to fail).
- Multiple useful shell scripts to aid migration. I wrote these for my own use; they may be useful to others.
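The "batch" and "update" modes could be combined along the lines of this sketch. The hypothetical `BatchPlanner.nextBatch` assumes the changesets are already in the order they must be applied, and that the id of the last changeset migrated (e.g. the one your snapshot points at) is known; it is an illustration, not the PR's code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of "update" + "batch" selection over an already-ordered list of changeset ids. */
final class BatchPlanner {
    /**
     * @param orderedIds   changeset ids in the order they must be applied
     * @param lastMigrated id of the last changeset already in git, or null for a fresh migration
     * @param batchSize    maximum number of changesets to handle in this run (must be positive)
     * @return the ids to migrate in this run; empty when git is already up to date
     */
    static List<String> nextBatch(List<String> orderedIds, String lastMigrated, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int start = 0;
        if (lastMigrated != null) {
            int idx = orderedIds.indexOf(lastMigrated);
            if (idx < 0) {
                // Refuse to guess: resuming from the wrong place would corrupt the history.
                throw new IllegalArgumentException("last migrated changeset not found: " + lastMigrated);
            }
            start = idx + 1; // resume just after the last changeset we committed
        }
        // Written this way round so a huge batchSize cannot overflow.
        int end = start + Math.min(batchSize, orderedIds.size() - start);
        return new ArrayList<>(orderedIds.subList(start, end));
    }
}
```

Running it repeatedly, passing the previous batch's last id as `lastMigrated`, walks through the stream in stages, so a failure part-way through only costs the current batch.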
Notes
- The RTC CLI will now need to run using Java 8 (because JGit now requires Java 8), so it can't use the JVM supplied with RTC 6. The "make a Dockerfile" script will arrange for this, but if you're not using that, you'll need to remove the built-in JVM yourself if your CLI includes Java 7 or earlier.
- If you're migrating RTC data that contains Eclipse IDE .project files, you'll need a client that contains the fix for RTC support Case TS002463010. (That case was raised in early 2019 through IBM-internal channels, officially logged on July 10 2019, confirmed to be a known bug on 1st Oct 2020, raised as jazz.net issue 518806, resulted in a test fix for RTC 6.0.6 and was officially declared "complete" in March 2021 ... but never resulted in an official fix; it might be fixed in RTC 7.1 or later.) Otherwise the RTC client will attempt to do an Eclipse "build project" of any .project files in your data, and the migration will error unless those projects build perfectly. The migration will also git-commit unwanted data resulting from the build unless the .jazzignore files were already in place before the Eclipse project was encountered.
- I was able to use this code to migrate over 500 RTC streams each containing thousands of changesets into git. It wasn't fun but it was possible ... and now we don't need RTC anymore 😁 🎉
Note: Quite happy to work with the project maintainer(s) on getting this massaged into a state where it can be merged in.
@pjdarton Thanks for your work. I will try to look over the changes in the next few days.
@pjdarton Fantastic job with this; you fixed a lot of the issues I saw back when I managed to get through this for only a few streams back in the 6.0.4 days. I'm going to clone down and pull in your PR. We're going to be testing soon with RTC 7.0.2.x, and we've already noticed that the CLI breaks API compatibility in a few places. I'll let you know our status and try to get any fixes back in for 7.x support.
Hi folks, while I'm glad this is seeing some interest at long last, I'm no longer involved with RTC; I finished migrating from RTC to GitHub in March 2022 and no longer have access to RTC at all ... for which I am thankful (as finishing the migration nearly finished me!) ... and I have little appetite to revisit the experience. I'm quite happy to comment here to advise, but I'm not in a position to push this forward anymore. i.e. if you want this "in", I'd suggest you grab this branch/code/PR and make your own PR that builds upon it.
TL;DR: I'm done with RTC and, while I sympathise with those still using it, I'm just going to cheer y'all on from the sidelines rather than getting involved again myself.
@pjdarton Understood, and good job getting out of IBM's shadow; you did a hell of a job getting there! I'm just glad you left so much behind for the rest of us still in the trenches. Your commits/changes will help a ton!
@pjdarton We just finished building your code from the branch "rtcTo:rtc6reorderedchangesets" (where your PR was merged), and we were able to run it successfully against ELM 7.0.2SR1 with a few stream/component tests. Excellent work and great upgrades. Once we have some time, we want to replicate/polish up your build.sh: it's far easier to build the fragment jar that way than with Eclipse (we did manage to build and export the jar from Eclipse, but it was a mess because of how much Eclipse hides from you). We then plan to push up those changes, the pom.xml and .product file fixes for the new dependencies, and a complete overhaul of the README.MD for future users, and we'll try to get together with the original maintainer of this repo to hopefully get this into master.
We have one question about something we've been stuck on, and I wasn't sure whether you'd have a quick answer. We've debugged this left and right and cannot figure it out, so I'd appreciate anything you can provide. Your PR implemented a new class, JSONParser: https://github.com/rtcTo/rtc2gitcli/blob/rtc6reorderedchangesets/rtc2git.cli.extension/src/to/rtc/cli/migrate/util/JSONParser.java
There is an import we cannot get resolved at all: import org.eclipse.e4.core.services.util.JSONObject;
Maven no longer has this dependency; from what I can see in eclipse.org bug issues, the Eclipse e4 folks removed all JSONObject classes from these jars back in 2013: https://www.eclipse.org/lists//e4-dev/msg07224.html
When I build via Eclipse (using the Maven runner) and then export the fragment jar from Eclipse, it never actually resolves this dependency. Eclipse just force-compiles the .class file anyway, strips the leading import path (as seen in the decompiled output) and packages it all up. Decompiling the raw .class file shows that Eclipse now generates it as:

```java
package to.rtc.cli.migrate.util;

import JSONObject;
import java.lang.reflect.Field;
import java.util.List;
import java.util.Map;
```

If I dump the raw .class file, I get:

```
Unresolved compilation problems:
    The import org.eclipse.e4 cannot be resolved
    JSONObject cannot be resolved to a type
```

Somehow, some way, it's resolving this at runtime, and I have no idea how. We've trawled all the ELM SCM CLI runtime jars and dependencies trying to find where it comes from (IBM has a JSONObject class in one of its RTC SCM CLI jars). I noticed you added Gson as a dependency, but none of these candidates provides the missing .deserialize() and .getObjects() methods. I suspect it's being resolved by some runtime magic of the OSGi/E4 environment that the ELM SCM CLI spawns when run from the command line, but I haven't used a debugger to check yet, and I still cannot build it with your build.sh because Maven rejects the compilation due to this missing dependency.
Is there any chance your build still references a very old E4 jar somewhere that contains this class, or resolves it through some other means? I wanted to ask before we went back and started searching for E4 dependencies from 10 years ago, or started looking at code changes.
I suspect that I was using the Eclipse that was (part of) the official RTC client ... which was pretty old when I was using it, let alone now. I suspect that my thinking ran along the lines of "if I limit my coding to only the libraries that were already present within the official RTC client, I won't have to learn how to add new external libraries" and so I went searching for JSON-parsing capabilities within the Eclipse classpath... I'd guess that RTC now has a much newer Eclipse at its heart which has very different libraries inside it, which is why that import isn't available anymore.
I'd suggest you try a different tactic - rather than try to get that code working "as-is" by replacing the import that it wants, I suspect you'll find it easier to "just" re-implement the code to use something else.
i.e. take a look at where this JSONParser object is used (which is ListChangesCommandOutputParser and ListMissingChangeSetsCommandOutputParser) and then see what JSON-parsing code you can find on your classpath that can do the same job. I expect that Eclipse still contains some JSON-parsing libraries, even if it doesn't contain that one anymore.
FYI the actual functionality these two calling classes need is relatively simple. They're given a string (which contains the text output from running an RTC command), parse it as JSON, and then extract particular bits of that data. e.g. ListMissingChangeSetsCommandOutputParser just wants to collect all of changes[].uuid from the JSON. ListChangesCommandOutputParser goes into much more detail than that, but it's still "just" parsing a bit of JSON to extract meaning from it and populate a simpler Java construct.
So, instead of looking for org.eclipse.e4.core.services.util.JSONObject, go looking for a more modern JSON parser and change the code to use that instead. You may even find a JSON parser that can be used directly by the two ...CommandOutputParser classes without requiring an adaptor class to fix its shortcomings (as 90% of JSONParser is workarounds).
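As an illustration of that approach, here is a sketch of the changes[].uuid extraction written against Gson (already mentioned earlier in this thread as a dependency of the branch). It assumes the CLI's JSON output has a top-level `changes` array whose elements carry a `uuid` string, which matches the description above but should be checked against real output; `MissingChangeSetUuids` is a hypothetical name. `JsonParser.parseString` needs Gson 2.8.6 or later; older versions use `new JsonParser().parse(...)` instead.

```java
import com.google.gson.JsonArray;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

import java.util.ArrayList;
import java.util.List;

/** Hypothetical replacement sketch: collects changes[].uuid with Gson instead of the e4 JSONObject. */
final class MissingChangeSetUuids {
    static List<String> parse(String cliJsonOutput) {
        // Throws if the text is not a JSON object, rather than silently returning nothing.
        JsonObject root = JsonParser.parseString(cliJsonOutput).getAsJsonObject();
        List<String> uuids = new ArrayList<>();
        JsonArray changes = root.getAsJsonArray("changes"); // null when there is no "changes" member
        if (changes == null) {
            return uuids;
        }
        for (JsonElement change : changes) {
            if (!change.isJsonObject()) {
                continue; // skip anything unexpected in the array
            }
            JsonElement uuid = change.getAsJsonObject().get("uuid");
            if (uuid != null && uuid.isJsonPrimitive()) {
                uuids.add(uuid.getAsString());
            }
        }
        return uuids;
    }
}
```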
FYI There's even a unit test for both those classes, so if you can find something that "seems to work" then you can run those to be fairly sure it'll work "for real".