====== The Lamarr GPU Cluster ======
We are building and expanding a GPU Cluster as part of the [[https://

2 AMD Epyc 7713 Processors (64 Cores each)\\
2TB System-RAM\\
28TB local SSD storage\\
8x A100 SXM4 HGX with 80GB GPU-Memory each, NVLink (600GB/s) connected\\
10GbE Network\\
selective groups of users via standard POSIX file permissions/groups.
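Making a directory readable for a specific group then looks roughly like the following sketch; the group name, the demo path, and the `chgrp` step are purely illustrative, not actual cluster conventions:

```shell
# Stand-in dataset directory; on the cluster this would be a real path,
# and the group one that the admins created for you.
mkdir -p /tmp/demo-dataset/sub
touch /tmp/demo-dataset/sub/data.bin

# On the cluster you would first hand the tree to the sharing group:
#   chgrp -R mylab /tmp/demo-dataset
# g+rX grants the group read everywhere, but execute (traverse) only on
# directories and files that are already executable.
chmod -R g+rX /tmp/demo-dataset

# Show the resulting mode of the top-level directory.
stat -c '%A' /tmp/demo-dataset
```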
All nodes have 28TB local SSD storage available, mounted under /
Please remove your data from there after you are finished with your calculations.
The workload management software/
[[https://
Full container support is enabled. We recommend the usage of enroot as
setup around your software, containers will be your tool of choice.
As a general rule, realistic resource allocation for your jobs is **mandatory**. You are only allowed to request the resources from the workload manager that your jobs actually need.
===== SLURM partitions =====

At this time, there is only one active partition. The timelimit for jobs in this partition is set to a maximum of 72h and a default of 3h.
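To run longer than the 3h default, request a higher limit explicitly at submission time; `--time` is the standard SLURM flag for this (the script name below is a placeholder):

```shell
# Request a 24-hour limit, well inside the partition's 72h maximum.
# "train.sh" stands in for your own batch script.
sbatch --time=24:00:00 train.sh
```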
===== Example of custom software environments using containers =====
Obviously you will not get administrative privileges on the cluster to set up your personal software environment. But you can import containers to set up any libraries or frameworks you need to use. We tested the cluster setup using [[https://
In enroot, you first import external container images, but those images are immutable. You have to create a container from an image and start it read/write to make persistent changes to it.
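The import/create/start cycle can be sketched as follows; the image name and tag are illustrative assumptions, not cluster defaults:

```shell
# Import an external image from Docker Hub; enroot writes an immutable
# squashfs file (here: ubuntu+22.04.sqsh).
enroot import docker://ubuntu:22.04

# Create a mutable container from that immutable image file.
enroot create --name mydev ubuntu+22.04.sqsh

# Start it read/write so installed packages persist in the container.
enroot start --rw mydev
```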
the .sqfs file from anywhere to the login node (e.g. via scp/sftp) and
start your container via srun just like shown above.
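Put together, the export-and-run round trip could look like this; the container/file names and the login-node address are made up, and the `--container-image` flag assumes the pyxis SLURM plugin, which the elided example above may or may not use:

```shell
# Export a container back into a portable squashfs file.
enroot export --output mydev.sqfs mydev

# From your workstation, copy it to the login node (address is fictional).
scp mydev.sqfs login.example.org:~/

# Run a job step inside the container (--container-image is pyxis syntax).
srun --container-image=./mydev.sqfs nvidia-smi
```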
===== Shared Storage =====
If you want to share bigger datasets with other users on the cluster, you can request shared storage. It will be located under /
===== Requesting GPUs =====
If you want to allocate a certain number of GPUs for your job, use the parameter '
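The parameter name is cut off above; on a stock SLURM setup, GPU allocation is typically requested with `--gpus` (or the older `--gres=gpu:N`) — which of the two this cluster documents is an assumption on our part:

```shell
# Request two GPUs for a single job step; --gpus is the generic SLURM
# flag, --gres=gpu:2 would be the older equivalent.
srun --gpus=2 nvidia-smi
```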
===== Email Notifications =====

If you want to receive email notifications about the status of your jobs, you can specify a recipient email address at job submission, e.g.:

  sbatch --mail-type=ALL --mail-user=someuser@somedomain.com ...

**Please specify a full (with domain!) and valid email address here.**
===== A Note about Interactive Jobs =====

Please remember that interactive jobs often leave resources unnecessarily idle (especially if forgotten and/or left unterminated), so please prefer batch jobs wherever possible.
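Where an interactive session is genuinely needed, it helps to give it an explicit short time limit so a forgotten session cannot idle for long; the flags below are standard SLURM, though whether the cluster wants `--pty` sessions at all is an assumption:

```shell
# Interactive shell with one GPU and a hard 1-hour limit; the session is
# killed automatically when the limit is reached.
srun --gpus=1 --time=01:00:00 --pty bash
```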
===== Exchange with other users =====
#