Fix macOS build from source
What does this PR do? Please describe:
The library does not build and run from source on macOS (tested with Sequoia 15.0) in a conda environment, for three reasons: (1) an incompatible compiler availability check, (2) an outdated libpng submodule, and (3) a call to os.sched_getaffinity(), which is unavailable on macOS.
Error message from (1), which appears even after running conda install -c conda-forge compilers:
> cmake -GNinja -B build
-- The C compiler identification is Clang 18.1.7
-- The CXX compiler identification is Clang 18.1.7
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Users/alex/miniconda3/envs/fs2-nightly/bin/arm64-apple-darwin20.0.0-clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Users/alex/miniconda3/envs/fs2-nightly/bin/arm64-apple-darwin20.0.0-clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:16 (message):
It looks like you are in a Conda environment, but the `compilers` package
is not installed. Please run `conda install -c conda-forge
compilers=1.2.0` first.
Resolved this issue by adding a conditional branch to CMakeLists.txt (5fe3c62aedc936e1cc24eb5b84f02072d57784b0): when building in a conda environment on Apple platforms, the check now looks for the clang path directly, while the original compiler check stays in place for non-Apple platforms. Also updated the help message to drop the pin on compilers=1.2.0, since that version is too old to compile without issues.
Error message from (2), which has been a known libpng issue since early 2024:
> cmake --build build
[33/185] Performing download step (git clone) for 'jpeg_turbo'
Cloning into 'jpeg_turbo'...
HEAD is now at ec32420f example.c: Fix 12-bit PPM write w/ big endian CPUs
[34/185] Performing disconnected update step for 'jpeg_turbo'
-- Already at requested tag: 3.0.1
[60/185] Generating scripts/intprefix.out
FAILED: third-party/libpng/scripts/intprefix.out /tmp/fairseq2/native/build/third-party/libpng/scripts/intprefix.out
cd /tmp/fairseq2/native/build/third-party/libpng && /Users/alex/miniconda3/envs/fs2-nightly/lib/python3.11/site-packages/cmake/data/bin/cmake -DINPUT=/tmp/fairseq2/native/third-party/libpng/scripts/intprefix.c -DOUTPUT=/tmp/fairseq2/native/build/third-party/libpng/scripts/intprefix.out -P /tmp/fairseq2/native/build/third-party/libpng/scripts/genout.cmake
In file included from /tmp/fairseq2/native/third-party/libpng/scripts/intprefix.c:22:
/tmp/fairseq2/native/third-party/libpng/scripts/../pngpriv.h:518:16: fatal error: 'fp.h' file not found
518 | # include <fp.h>
| ^~~~~~
1 error generated.
CMake Error at scripts/genout.cmake:78 (message):
Failed to generate
/tmp/fairseq2/native/build/third-party/libpng/scripts/intprefix.out.tf1
Resolved this issue by updating the submodule libpng from 1.6.34 (https://github.com/pnggroup/libpng/commit/b78804f9a2568b270ebd30eca954ef7447ba92f7) to 1.6.48 (https://github.com/pnggroup/libpng/commit/ea127968204cc5d10f3fc9250c306b9e8cbd9b80).
Error message from (3); src/fairseq2/utils/threading.py already mitigates this in a similar fashion. This error occurs at runtime, not during the compilation step:
> fairseq2 lm instruction_finetune $OUTPUT_DIR --config-file $CONFIG_FILE
(...)
fairseq2 - Command failed with an unexpected error. See the logged stack trace for details.
Traceback (most recent call last):
File "/Users/aerben/Documents/fairseq2/src/fairseq2/cli/_main.py", line 35, in main
exit_code = _run()
^^^^^^
File "/Users/aerben/Documents/fairseq2/src/fairseq2/cli/_main.py", line 86, in _run
return cli.run(context)
^^^^^^^^^^^^^^^^
File "/Users/aerben/Documents/fairseq2/src/fairseq2/cli/_cli.py", line 123, in run
return args.command.run(context, args) # type: ignore[no-any-return]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/aerben/Documents/fairseq2/src/fairseq2/cli/_cli.py", line 361, in run
return self._handler.run(context, self._parser, args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/aerben/Documents/fairseq2/src/fairseq2/cli/commands/recipe.py", line 148, in run
self._do_run(context, args)
File "/Users/aerben/Documents/fairseq2/src/fairseq2/cli/commands/recipe.py", line 205, in _do_run
recipe = self._loader(context, config, output_dir)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/aerben/Documents/fairseq2/src/fairseq2/recipes/lm/_instruction_finetune.py", line 227, in load_instruction_finetuner
setup_torch(context, config.common.torch, output_dir)
File "/Users/aerben/Documents/fairseq2/src/fairseq2/recipes/common/_torch.py", line 35, in setup_torch
_set_num_threads(context, torch_section.num_threads)
File "/Users/aerben/Documents/fairseq2/src/fairseq2/recipes/common/_torch.py", line 97, in _set_num_threads
num_threads = get_num_threads(context.env)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/aerben/Documents/fairseq2/src/fairseq2/utils/threading.py", line 78, in get_num_threads
return _get_num_cpus(num_procs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/aerben/Documents/fairseq2/src/fairseq2/utils/threading.py", line 84, in _get_num_cpus
affinity_mask = os.sched_getaffinity(0)
^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'os' has no attribute 'sched_getaffinity'
Resolved in 10c656f by catching the AttributeError and using os.cpu_count() as a fallback.
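For reviewers, the fallback pattern can be sketched roughly as follows. This is a minimal illustration, not the exact code from the commit: the function name `available_cpu_count` is hypothetical, and the final fallback to 1 when os.cpu_count() returns None is an assumption.

```python
import os


def available_cpu_count() -> int:
    """Return the number of CPUs the current process may use.

    Hypothetical helper for illustration; not the function in threading.py.
    """
    try:
        # Available on Linux; respects the process affinity mask.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # macOS does not provide os.sched_getaffinity(), so fall back to
        # the total CPU count. os.cpu_count() may return None, in which
        # case we assume a single CPU.
        return os.cpu_count() or 1
```

On Linux this keeps the affinity-aware count, so behavior there is unchanged. On macOS it reports the total CPU count instead of raising.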
Does your PR introduce any breaking changes? If yes, please list them: None. Ran the tests locally on macOS (M3 Max, Sequoia 15.0) and on Linux (Ubuntu 22.04.5 LTS, NVIDIA A100-SXM4-80GB, driver 535.183.01, CUDA 12.1) without issues.
Check list:
- [ ] Was the content of this PR discussed and approved via a GitHub issue? (no need for typos or documentation improvements)
- [X] Did you read the contributor guideline?
- [X] Did you make sure that your PR does only one thing instead of bundling different changes together?
- [ ] Did you make sure to update the documentation with your changes? (if necessary)
- [ ] Did you write any new necessary tests?
- [X] Did you verify new and existing tests pass locally with your changes?
- [ ] Did you update the CHANGELOG? (no need for typos, documentation, or minor internal changes)