arrow icon indicating copy to clipboard operation
arrow copied to clipboard

GH-32206: [C++] GcsFileSystem::Make should return Result

Open trifleneurotic opened this issue 1 year ago • 2 comments

This PR includes breaking changes to public APIs.

What changes are included in this PR?

GcsFileSystem::Make now returns Result, with corresponding header & test changes.

Are these changes tested?

Yes, arrow-gcsfs-test passed.

Are there any user-facing changes?

None known.

  • GitHub Issue: #32206

trifleneurotic avatar Oct 22 '24 16:10 trifleneurotic

GH-32206: Added back a line I had erroneously removed (port_ argument for storage-testbench).

trifleneurotic avatar Oct 22 '24 19:10 trifleneurotic

gcsfs.h is a public header. So this is a breaking change. Could you keep **This PR includes breaking changes to public APIs.** in the pull request template?

This has been done.

trifleneurotic avatar Oct 23 '24 10:10 trifleneurotic

Tests were re-run locally as a result of the local compilation of commit 49fd6d0e0a4fefadc7083e5e4fa392c3528669cf; arrow-gcsfs-test was successful. (EDIT: incorrect commit hash).

trifleneurotic avatar Oct 23 '24 10:10 trifleneurotic

Tests were again re-run locally as a result of the local compilation of commit e5b606e462cb362ef49092eb10797d166e2e60fd; arrow-gcsfs-test was successful.

trifleneurotic avatar Oct 23 '24 23:10 trifleneurotic

Could you fix lint failures by pre-commit run --show-diff-on-failure --color=always --all?

kou avatar Oct 24 '24 00:10 kou

Could you fix lint failures by pre-commit run --show-diff-on-failure --color=always --all?

I think those failures should be fixed now (ref 07c36f5786888c1e2398cda5911bc4f986fe30f4 & bc88c9ffe39d1d8c5e3ebb8b5ecb63b7aedf975e).

trifleneurotic avatar Oct 24 '24 05:10 trifleneurotic

I apologize sincerely for the linting back-and-forth; just committed another change (ref f0ba9c5cec42962c5b9fde69d3734ec6367ce9ef) to match method signature style to nearby method. I appreciate everyone's guidance & advice as I work towards an acceptable PR.

trifleneurotic avatar Oct 24 '24 05:10 trifleneurotic

Why did you fix it manually? Did pre-commit run --show-diff-on-failure --color=always --all not work well?

kou avatar Oct 24 '24 06:10 kou

Why did you fix it manually? Did pre-commit run --show-diff-on-failure --color=always --all not work well?

That's correct it did not, but after further investigation it turns out it was a local issue on my part - my apologies. That issue has been resolved & I was able to correctly capture & push pre-commit's changes (ref 2090bbba59471f4a1eba426557bc90674a88b308). Linting step now completes successfully. (EDIT: linter completion clarification).

trifleneurotic avatar Oct 24 '24 12:10 trifleneurotic

@kou The C++ linter tests that previously failed went through & succeeded, but the R tests are still failing. However, I'm assuming that isn't a consequence of this change?

trifleneurotic avatar Oct 24 '24 23:10 trifleneurotic

It's related:

https://github.com/apache/arrow/actions/runs/11499533014/job/32034937833?pr=44503#step:7:6721

g++ -std=gnu++17 -I"/usr/share/R/include" -DNDEBUG -I/usr/local/include    -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET -DARROW_R_WITH_ACERO -DARROW_R_WITH_SUBSTRAIT -DARROW_R_WITH_JSON -DARROW_R_WITH_S3 -DARROW_R_WITH_GCS -Werror -I'/usr/local/lib/R/site-library/cpp11/include'     -fpic  -g -O2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -ffile-prefix-map=/build/r-base-YnquTi/r-base-4.4.1=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -fdebug-prefix-map=/build/r-base-YnquTi/r-base-4.4.1=/usr/src/r-base-4.4.1-3.2404.0 -Wdate-time -D_FORTIFY_SOURCE=3   -c type_infer.cpp -o type_infer.o
filesystem.cpp: In function ‘std::shared_ptr<arrow::fs::GcsFileSystem> fs___GcsFileSystem__Make(bool, cpp11::list)’:
filesystem.cpp:433:33: error: could not convert ‘arrow::fs::GcsFileSystem::Make(const arrow::fs::GcsOptions&, const arrow::io::IOContext&)(io_context)’ from ‘arrow::Result<std::shared_ptr<arrow::fs::GcsFileSystem> >’ to ‘std::shared_ptr<arrow::fs::GcsFileSystem>’
  433 |   return fs::GcsFileSystem::Make(gcs_opts, io_context);
      |          ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
      |                                 |
      |                                 arrow::Result<std::shared_ptr<arrow::fs::GcsFileSystem> >
make: *** [/usr/lib/R/etc/Makeconf:204: filesystem.o] Error 1

Hmm. It may be better that we keep the old signature with deprecation for easy to migration. Something like:

diff --git a/cpp/src/arrow/filesystem/gcsfs.h b/cpp/src/arrow/filesystem/gcsfs.h
index f1fbc95bf9..3789da18ec 100644
--- a/cpp/src/arrow/filesystem/gcsfs.h
+++ b/cpp/src/arrow/filesystem/gcsfs.h
@@ -231,10 +231,14 @@ class ARROW_EXPORT GcsFileSystem : public FileSystem {
       const std::shared_ptr<const KeyValueMetadata>& metadata) override;
 
   /// Create a GcsFileSystem instance from the given options.
-  // TODO(ARROW-16884): make this return Result for consistency
+  ARROW_DEPRECATED("Deprecated in 19.0.0. Use the Result version instead.")
   static std::shared_ptr<GcsFileSystem> Make(
       const GcsOptions& options, const io::IOContext& = io::default_io_context());
 
+  /// Create a GcsFileSystem instance from the given options.
+  static Result<std::shared_ptr<GcsFileSystem>> Make(
+      const GcsOptions& options, const io::IOContext& = io::default_io_context());
+
  private:
   explicit GcsFileSystem(const GcsOptions& options, const io::IOContext& io_context);

FYI: You can check CI results on your for by enabling GitHub Actions on your fork.

kou avatar Oct 25 '24 00:10 kou

I am currently readying a commit/push to address the issues with R. Please stand by, but of course, feel free to comment in the meantime if desired.

trifleneurotic avatar Oct 25 '24 14:10 trifleneurotic

Hello! And thank you all for your advice & time so far in reviewing my PR.

Just to give a quick update:

For my latest commit (ref 6e9979df8fe301579c6825ae75bdfa049a6f6259), I have done the following:

  • local build with the ninja-debug-filesystems preset
  • ran all R testthat tests (gcs successful)
  • ran all cmake tests (arrow-gcsfs-test successful)
  • ran pre-commit (changes made & committed)

Of course, after that, I did the requisite pulls & rebase before push. This was an older issue so, even though it was a good-first-issue at that time, there may be more tendrils now connecting to other places in the codebase. I've been able to reduce the number of post-push job failures on my fork but haven't been able to eliminate them yet.

Thoughts on moving forward? (EDIT: with commit 0b5bb2adfa932ed629e9251e4e7cd5431133c52b, only one workflow job is now failing, so removing PR from draft mode)

trifleneurotic avatar Oct 25 '24 17:10 trifleneurotic

@kou

Hello! I see that 18.0.0 showed up on the Releases page for Arrow. What would be next steps?

trifleneurotic avatar Oct 29 '24 14:10 trifleneurotic

Could you rebase on main?

kou avatar Oct 29 '24 21:10 kou

Could you rebase on main?

@kou

Complete! Ref 51cbac705f3b398e3b67e6931bfe9c0ebab39ead.

trifleneurotic avatar Oct 30 '24 01:10 trifleneurotic

@kou

Thank you for yoiur continued assistance.

I had two workflow job failures over on my fork (NodeJS & Archery/Crossbow), although here there was just one: Python / AMD64 Conda Python 3.11 Pandas latest. Would you happen to have any context or background as to these workflow job failures?

trifleneurotic avatar Oct 30 '24 02:10 trifleneurotic

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit ba5c86d2c62421b627c609c30ed6b0c98b2882c6.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.