Zia Khan
@HenrikBengtsson Curious whether this issue has anything to do with `scheduler.latency` in `makeClusterFunctionsSlurm()`.
I tried the `TORCH_HOME` environment variable, but now it segfaults. I wonder if it's an ABI incompatibility with `liblantern.so`. It's not a huge deal; I'll explore some alternatives.
That's great! Thank you! If there's a link where I can download it, I can give it a try on our Slurm cluster.
I got the artifact from https://storage.googleapis.com/torch-lantern-builds/refs/heads/non-abi/latest/LinuxNonABI-cpu.zip and set the `TORCH_HOME` environment variable. No segfault this time. Still need to test a bit more. Any chance you can build the cu101...
I still need to create a minimal example, but I've noticed that snakemake runs the Python interpreter from the enclosing environment in which snakemake is called, using the full path, and...
Here is a minimal example: `enclosing_smk.yml`
```
name: enclosing_smk
channels:
  - conda-forge
  - bioconda
  - nodefaults
dependencies:
  - snakemake
```
Here is a named environment `named_smk.yml`:
```
name: named_smk
channels:...
```
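For reference, a minimal Snakefile sketch of how one could check which interpreter a rule actually gets (the rule name and the `python --version` command are my illustrative assumptions, not from the original report):
```
# Snakefile -- run with: snakemake --use-conda -c1
# Hypothetical minimal rule: under the hypothesis above, the python
# resolved here would come from the enclosing enclosing_smk environment
# (by full path) rather than from the named_smk environment declared below.
rule which_python:
    output:
        "which_python.txt"
    conda:
        "named_smk.yml"  # the named environment from the example above
    shell:
        "command -v python > {output}; python --version >> {output}"
```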
I noticed that this `grad_acc > 0` error occurs if `args.micro_train_batch_size != args.actor_num_gpus_per_node * args.train_batch_size`. I think it has to do with the fact that DeepSpeed has some logic around...
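For intuition, here is a sketch of the batch-size bookkeeping I believe DeepSpeed performs (variable names are illustrative; the real check lives in DeepSpeed's config validation, which enforces `train_batch_size == micro_batch_per_gpu * grad_accum_steps * world_size`):
```
# Hypothetical reconstruction of the consistency check, not DeepSpeed source.
train_batch_size = 128    # e.g. args.train_batch_size
micro_batch_per_gpu = 4   # e.g. args.micro_train_batch_size
world_size = 8            # e.g. args.actor_num_gpus_per_node

# When grad_accum_steps is left unset, it is derived by integer division;
# if micro_batch_per_gpu * world_size exceeds train_batch_size, this
# floors to 0 and trips the "grad_acc > 0" assertion.
grad_accum_steps = train_batch_size // (micro_batch_per_gpu * world_size)
assert grad_accum_steps > 0, "gradient accumulation steps must be > 0"
```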
@kpoeppel I think I figured out the issue. If you look at the nvcc command, it includes `-gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_90,code=sm_90`. I think some of these architectures...
@kpoeppel I figured it out. It looks like if you set the environment variable `TORCH_CUDA_ARCH_LIST`, the nvcc `-gencode` flags are handled correctly. This should fix all the issues posted here. ```...
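For anyone hitting the same build failure, a sketch of setting it before a JIT extension build (the architecture values and the source file name are examples; pick values matching your GPUs):
```
import os

# TORCH_CUDA_ARCH_LIST restricts the compute capabilities nvcc targets,
# replacing the default -gencode list. Example values: 8.0 = A100,
# 8.6 = RTX 30xx; "+PTX" also embeds forward-compatible PTX.
# Must be set before the extension is compiled.
os.environ["TORCH_CUDA_ARCH_LIST"] = "8.0;8.6+PTX"

from torch.utils.cpp_extension import load

ext = load(name="my_ext", sources=["my_ext.cu"])  # hypothetical source file
```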
I'd like to drive Playwright using custom tools in my LLM agent. Being able to use `snapshotForAI` directly, instead of going through playwright-mcp, would be awesome.