Hardware table improvement
What would you like to see added?
- [ ] Use multilevel columns for resource columns
- Primary header: "Node"
- Secondary headers: "Availability" | "Limit"
- [ ] Carefully and clearly document the `~` character usage.
- [ ] Ensure the table is not too wide.
- [ ] Separate table with full hardware details for active only
At https://docs.rc.uab.edu/cheaha/hardware/
Cheaha HPC Cluster
| Partition | Time Limit in Hours | Nodes (Limit/Partition) | Cores/Node (Limit/Person) | Mem GB/Node (Limit/Person) | GPU/Node (Limit/Person) |
|---|---|---|---|---|---|
| express | 2.0 | 51 (~) | 48 (264) | 754 (3072) | |
| short | 12.0 | 51 (44) | 48 (264) | 754 (3072) | |
| medium | 50.0 | 51 (44) | 48 (264) | 754 (3072) | |
| long | 150.0 | 51 (5) | 48 (264) | 754 (3072) | |
| largemem | 50.0 | 13 (10) | 24 (290) | 755 (7168) | |
| largemem-long | 150.0 | 5 (10) | 24 (290) | 755 (7168) | |
| pascalnodes | 12.0 | 18 (~) | 28 (56) | 252 (500) | 4 (8) |
| pascalnodes-medium | 48.0 | 7 (~) | 28 (56) | 252 (500) | 4 (8) |
| amperenodes | 12.0 | 20 (TBD) | 32 (64) | 189 (384) | 2 (4) |
| amperenodes-medium | 48.0 | 20 (TBD) | 32 (64) | 189 (384) | 2 (4) |
| amd-hdr100 | 150.0 | 34 (5) | 128 (264) | 504 (3072) | |
| Interactive | | | | | |
| Intel DCB | | | | | |
Detailed Hardware Overview
| CPU/GPU Generation | Compute Type | Die Name | GPU Name | Mem GB/Node | GPU Mem GB | GPUs/Node | Total Nodes | Total GPUs | Total Cores | Total Memory GB | Cores/Node | Cores/Die | Dies/Node | Die Frequency GHz |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | cpu: amd | AMD Opteron 242 | | 16 | | | 64 | | 128 | 1024 | 2 | 1 | 2 | 1.6 |
| 10 | cpu: amd | AMD Epyc 7713 Milan | | 512 | | | 34 | | 4352 | 17408 | 128 | 64 | 2 | 2 |
| 1 | cpu: intel | Intel Xeon Gold 6248R | | 192 | | | 5 | | 240 | 960 | 48 | 12 | 4 | 3 |
| 1 | cpu: intel | Intel Xeon Gold 6248R | | 192 | | | 3 | | 144 | 576 | 48 | 12 | 4 | 3 |
| 8 | cpu: intel | Intel Xeon E5-2680 v4 | | 192 | | | 21 | | 504 | 4032 | 24 | 12 | 2 | 2.5 |
| 2 | cpu: intel | Intel Xeon E5450 | | 48 | | | 24 | | 192 | 1152 | 8 | 4 | 2 | 3 |
| 3 | cpu: intel | Intel Xeon X5650 | | 48 | | | 32 | | 384 | 1536 | 12 | 6 | 2 | 2.66 |
| 3 | cpu: intel | Intel Xeon X5650 | | 96 | | | 16 | | 192 | 1536 | 12 | 6 | 2 | 2.66 |
| 4 | cpu: intel | Intel Xeon X5650 | | 384 | | | 3 | | 48 | 1152 | 16 | 8 | 2 | 2.7 |
| 5 | cpu: intel | Intel Xeon E5-2650 | | 96 | | | 12 | | 192 | 1152 | 16 | 8 | 2 | 2 |
| 6 | cpu: intel | Intel Xeon E5-2680 v3 | | 384 | | | 14 | | 336 | 5376 | 24 | 12 | 2 | 2.5 |
| 6 | cpu: intel | Intel Xeon E5-2680 v3 | | 256 | | | 38 | | 912 | 9728 | 24 | 12 | 2 | 2.5 |
| 6 | cpu: intel | Intel Xeon E5-2680 v3 | | 128 | | | 44 | | 1056 | 5632 | 24 | 12 | 2 | 2.5 |
| 9 | cpu: intel | Intel Xeon Gold 6248R | | 768 | | | 52 | | 2496 | 39936 | 48 | 24 | 2 | 3 |
| 8 | mem: large | Intel Xeon E5-2680 v4 | | 768 | | | 10 | | 240 | 7680 | 24 | 12 | 2 | 2.5 |
| 8 | mem: large | Intel Xeon E5-2680 v4 | | 1536 | | | 4 | | 96 | 6144 | 24 | 12 | 2 | 2.5 |
| 7 | gpu: pascal | Intel Xeon E5-2680 v4 | NVIDIA Tesla P100 | 256 | 16 | 4 | 18 | 72 | 504 | 4608 | 28 | 14 | 2 | 2.4 |
| 11 | gpu: ampere | AMD Epyc 7742 Rome | NVIDIA A100 | 1024 | 40 | 8 | 4 | 32 | 512 | 4096 | 128 | 64 | 2 | 2.25 |
| 11 | gpu: ampere | AMD Epyc 7742 Rome | NVIDIA A100 | 1024 | 40 | 8 | 4 | 32 | 512 | 4096 | 128 | 64 | 2 | 2.25 |
| 11 | gpu: ampere | AMD Epyc 7763 Milan | NVIDIA A100 | 512 | 80 | 2 | 20 | 40 | 2560 | 10240 | 128 | 64 | 2 | 2.45 |
Sorry, I didn't see this ticket before I did this, but I updated the tables to make them more readable, putting the relevant information closer to the left and combining some columns. I also wrote some copy for the JupyterLab page to help people fill out the form.
The following covers launching a JupyterLab server in a High-Performance Computing (HPC) environment. Here's how to fill out each section of the form:
- Environment Setup: This section allows you to specify any additional software modules or specific versions of Anaconda that you'd like to load. If you need a particular Python package or software tool, you can list it here using the format `module load example_module/VERSION` (see the sketch after this list for finding module names).
- Extra JupyterLab arguments: If you have any additional command-line arguments to pass to JupyterLab, you can enter them here.
- Number of hours: Enter the number of hours you expect to need the JupyterLab server. Make sure this aligns with the time limit of the partition you select below.
- Partition: Use the dropdown to select the type of computational resources you need. Choose based on your job's requirements and the time it may take to complete. You can use the information on the Hardware page in the docs to help make your decision.
- Number of GPUs: If your job will be using GPU resources, specify the number of GPUs required here.
- Number of CPUs: Enter the number of CPU cores you need for your computation.
- Memory per CPU (GB): Specify the amount of RAM you need per CPU core in gigabytes.
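If you're not sure what to put in "Environment Setup", you can search the module system from a terminal session on the cluster first. Below is a minimal sketch assuming an Lmod-style module system; the module names shown are only examples and may not match what is installed on Cheaha.

```bash
# List the modules currently available to load
module avail

# Search for a package by name across all module trees
module spider tensorflow

# Load a specific version once you know its exact name
module load tensorflow-gpu/2.4.0
```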
Example Use Cases
Use Case 1: CPU-Only
- Environment Setup: `module load scipy/1.5.0 numpy/1.19.0`
- Extra JupyterLab arguments (OPTIONAL): `--no-browser`
- Number of hours: `4`
- Partition: `medium`
- Number of GPUs: `0`
- Number of CPUs: `4`
- Memory per CPU (GB): `8`
This configuration is suitable for medium-scale data processing tasks that require specific versions of SciPy and NumPy but do not need GPU acceleration. The job is expected to complete within 4 hours, and it will use 4 CPU cores, each with 8 GB of RAM.
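For comparison, the same CPU-only request could also be written as a Slurm batch script instead of the JupyterLab form. This is a rough sketch using standard Slurm directives; `my_analysis.py` is a placeholder for whatever you would actually run.

```bash
#!/bin/bash
#SBATCH --partition=medium        # "Partition" field
#SBATCH --time=04:00:00           # "Number of hours" field (4 hours)
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4         # "Number of CPUs" field
#SBATCH --mem-per-cpu=8G          # "Memory per CPU (GB)" field

# "Environment Setup" field
module load scipy/1.5.0 numpy/1.19.0

# Placeholder for the actual work
python my_analysis.py
```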
Use Case 2: CPU and GPU
- Environment Setup: `module load tensorflow-gpu/2.4.0`
- Extra JupyterLab arguments (OPTIONAL): `--no-browser`
- Number of hours: `6`
- Partition: `amperenodes`
- Number of GPUs: `2`
- Number of CPUs: `8`
- Memory per CPU (GB): `16`
This configuration is aimed at machine learning tasks that require TensorFlow with GPU support. The job is expected to run for up to 6 hours. It will use 2 GPUs from the Ampere architecture and 8 CPU cores, each with 16 GB of RAM.
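The equivalent GPU request can also be expressed as a single interactive `srun` command. This is a sketch using standard Slurm options; confirm the exact GRES syntax Cheaha expects before relying on it.

```bash
# 2 Ampere GPUs, 8 CPU cores, 16 GB of RAM per core, for up to 6 hours,
# followed by an interactive shell on the allocated node
srun --partition=amperenodes \
     --time=06:00:00 \
     --gres=gpu:2 \
     --cpus-per-task=8 \
     --mem-per-cpu=16G \
     --pty /bin/bash
```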
By filling out the form with settings appropriate to your computational needs, you can efficiently utilize the HPC resources available to you.
Computational Partitions
- express: This is a partition designed for quick, small-scale jobs. The time limit is often very short (e.g., 2 hours), but the jobs get scheduled quickly.
- short: This partition is meant for jobs that are expected to complete in a relatively short time (e.g., 12 hours). It typically has moderate resource limits.
- medium: Designed for jobs that need more time to complete (e.g., 50 hours), but not as much as long-running jobs. Resources are often similar to the `short` partition.
- long: This partition is for long-running jobs that may need up to several days to complete (e.g., 150 hours). The resource allocation might be similar to `medium` but with a longer time allowance.
- largemem: This partition is specialized for jobs that require a large amount of memory (RAM). The time and core limits may vary, but the focus is on providing more memory.
- largemem-long: A specialized version of `largemem`, designed for jobs that both require a lot of memory and take a long time to complete.
GPU Partitions
- pascalnodes: This partition is specialized for jobs that require Pascal architecture GPUs. The QoS limits focus on GPU availability.
- pascalnodes-medium: Similar to `pascalnodes`, but designed for jobs that may need more time to complete.
- amperenodes: This partition is specialized for jobs requiring Ampere architecture GPUs.
- amperenodes-medium: An extension of `amperenodes`, designed for jobs that require more time but use Ampere architecture GPUs.
Specialized Hardware
- amd-hdr100: These AMD-based nodes use an HDR100 (100 Gbps) InfiniBand interconnect. High-speed communication between compute nodes is a key enabler of increasingly data-intensive HPC and AI workloads, and it can significantly improve the performance of applications such as computational fluid dynamics (CFD), molecular dynamics, and machine learning.
Interactive and Miscellaneous
- Interactive: A partition designed for interactive sessions rather than batch jobs. Useful for development, debugging, or data analysis in real-time.
- Intel DCB: This could refer to a partition optimized for Intel's Data Center Blocks (DCB), which are fully-validated server systems that can help accelerate time to market with reliable, pre-configured server solutions.
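If you want to double-check any of these descriptions against the live configuration, the partition limits can be queried directly from a terminal on the cluster. Below is a short sketch using standard Slurm commands; the output columns and the partition name are only examples.

```bash
# One line per partition: name, time limit, node count, CPUs/node, memory/node, and GRES (GPUs)
sinfo --format="%P %l %D %c %m %G"

# Full details for a single partition, including its default QoS
scontrol show partition medium
```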
Pascal and Ampere Architecture
Pascal Architecture: Introduced in 2016 and built on a 16nm FinFET process, Pascal GPUs such as the Tesla P100 were aimed at general-purpose computing, gaming, and early machine learning applications. While they offered significant improvements over their predecessor, Maxwell, they generally lack specialized Tensor Cores for AI and do not support real-time ray tracing.
Ampere Architecture: Launched in 2020 on an 8nm process, Ampere GPUs such as the A100 are designed for modern computational needs, offering substantial gains in performance and power efficiency. They feature faster GDDR6 or HBM2 memory, specialized Tensor Cores for AI tasks, and native support for real-time ray tracing, making them more versatile for current and future applications.
Node: A standalone unit within a larger computer cluster, equipped with its own memory and processing capabilities. Nodes perform individual tasks and can communicate with other nodes in the network.
Core: A sub-unit within a CPU or GPU that handles specific tasks. Multiple cores can operate simultaneously to execute various tasks, improving overall performance.
Die: The physical piece of silicon that serves as the base for components like cores, memory caches, and other internal structures. It's essentially the platform that holds the computational elements of a CPU or GPU.
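You can see how these terms map onto real hardware from a shell on a compute node. Here is a small sketch using common Linux and Slurm commands; the node name is a placeholder.

```bash
# Sockets correspond roughly to dies, and "Core(s) per socket" to cores per die
lscpu | grep -E "Model name|Socket|Core"

# Slurm's view of a node: total CPUs, memory, and any GPUs (GRES) it exposes
scontrol show node c0001   # replace c0001 with a real node name
```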
To allocate specific resources, for example an A100 GPU with 80 GB of VRAM, 125 GB of RAM, and 8 vCPUs for a JupyterLab AI project on Cheaha, you'd typically follow these steps:
- Partition Selection: Choose a partition that offers A100 GPUs and meets your memory and CPU requirements. In the JupyterLab form, select something like `amperenodes` or `amperenodes-medium` from the "Partition" dropdown.
- Number of GPUs: Enter `1` in the "Number of GPUs" field, as you want a single A100 GPU.
- Number of CPUs: Enter `8` in the "Number of CPUs" field to request 8 vCPUs.
- Memory per CPU (GB): Enter `15.625` (or the closest allowable value) in the "Memory per CPU (GB)" field. Since you want 125 GB of RAM in total and you have 8 vCPUs, each CPU should ideally have 125 / 8 = 15.625 GB.
- Number of Hours: Enter the estimated time you think your JupyterLab session will need to complete your AI project.
- Environment Setup: If your project needs specific Python libraries or environments, specify them under "Environment Setup". For AI projects, you might load a module like `module load tensorflow-gpu/2.4.0`.
- Extra JupyterLab Arguments: If you have any special JupyterLab settings, specify them here. For example, you can specify a custom port using `--port=8889`.
- Submit: Finally, submit the form to allocate the resources and start your JupyterLab session.
Here's how you might fill out the form:
- Environment Setup: `module load tensorflow-gpu/2.4.0`
- Extra JupyterLab arguments: `--port=8889`
- Number of hours: `6` (or as needed)
- Partition: `amperenodes`
- Number of GPUs: `1`
- Number of CPUs: `8`
- Memory per CPU (GB): `15.625`
Once submitted, Cheaha will allocate the resources based on availability and queue priority, and your JupyterLab session should start with the resources you requested.
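Once the session is running, you can open a terminal inside JupyterLab and confirm that the allocation matches the request. Below is a sketch using standard Slurm environment variables and `nvidia-smi`; the exact variables that are set can vary with how the job was launched.

```bash
# CPU cores and memory granted to this job (set by Slurm when the job starts)
echo "CPUs per task:       $SLURM_CPUS_PER_TASK"
echo "Memory per CPU (MB): $SLURM_MEM_PER_CPU"

# GPUs visible to the session; should report one A100 with roughly 80 GB of memory
nvidia-smi --query-gpu=name,memory.total --format=csv
```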
Introduction to the Hardware Section
In high-performance computing (HPC) environments like Cheaha, efficient resource allocation is crucial. To help manage this, Cheaha employs a resource management system known as Slurm. Slurm essentially acts as a traffic cop, directing jobs to various computational resources based on a set of rules and policies. These policies are encapsulated in what is referred to as Quality of Service or QoS Restrictions.
The concept of Quality of Service (QoS) Restrictions is vital for maintaining a harmonious multi-user environment. In simple terms, QoS Restrictions set the boundaries for resource utilization—be it cores, memory, or GPUs—by each job submitted to the cluster. These restrictions ensure that resources are allocated fairly among all users, preventing any single job from monopolizing the system. But bear in mind that QoS limits are not a reservation of resources; they are more like guidelines that govern the maximum usage per user or job, helping to keep the system both available and equitable.
Now, let's delve into the table that follows. It outlines the various computational resources on Cheaha and the corresponding QoS Restrictions. The table also categorizes resources into Slurm partitions—a Slurm partition being essentially a collection of nodes with similar characteristics and constraints. Partitions have their own QoS limits on aspects like cores, memory, and GPUs, and these limits are applied to each partition independently. Additionally, each researcher is individually subject to these limits, offering a level playing field for all.
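To see the raw limits Slurm is enforcing, rather than the summarized table, you can ask the accounting database directly. This is a sketch using the standard Slurm accounting command; the field selection and column widths are just one reasonable choice.

```bash
# Each QoS with its wall-time limit and per-user resource caps (TRES = trackable resources)
sacctmgr show qos format=Name%20,MaxWall,MaxTRESPU%40
```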
With this background, you should find it easier to navigate Cheaha's computational landscape. Below are some practical examples to further elucidate how to interpret and make the most of the resource table.