====== The Lamarr GPU Cluster ======
  
We are building and expanding a GPU Cluster as part of the [[https://lamarr-institute.org/|Lamarr Institute]]. At the moment, the cluster consists of nine nodes with the following basic configuration:
  
AMD Epyc 7713 Processors (2 sockets with 64 Cores × 2 Threads each => 256 logical Cores per Node, 248 available for jobs)\\
2TB System-RAM\\
28TB local SSD storage\\
container backend for SLURM. If you need a personalized library or OS setup around your software, containers will be your method of choice.
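
As a minimal sketch (the Apptainer runtime and the image name are assumptions for illustration; check which container backend is actually installed before relying on this), a containerized job step could look like this:

<code bash>
# Run a script inside a container image as a SLURM job step.
# Assumption: an Apptainer/Singularity-style runtime is available on the
# nodes and my_env.sif is an image you built or pulled yourself.
srun --cpus-per-task=4 --mem=16G apptainer exec my_env.sif python train.py
</code>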

As a general rule, realistic resource allocation for your jobs is **mandatory**. You are only allowed to request the resources from the workload manager that you actually need to complete your calculations. Wasting precious machine time and making other users wait in line needlessly will lead to revocation of access to the cluster.
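
As an illustration (all numbers and the script name are placeholders, not recommendations), an explicit, deliberately modest resource request in a batch script might look like this:

<code bash>
#!/bin/bash
# Sketch of a job script with explicit, realistic resource requests.
# Placeholder values: request only what your job measurably needs.
#SBATCH --job-name=example
#SBATCH --cpus-per-task=8     # CPU cores the program will actually use
#SBATCH --mem=32G             # main-memory ceiling for the job
#SBATCH --time=02:00:00       # wall-clock limit after which the job is ended
python train.py
</code>

Tight, honest limits also make it easier for the scheduler to backfill your job into otherwise idle slots, so it may even start sooner.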
  
===== SLURM partitions =====
If you want to share bigger datasets with other users on the cluster, you can request shared storage. It will be located under /home/Shared/. Please specify the usernames of all users that should be able to access this data (and any other helpful information regarding your request) in an [[gsg+lmgpu@informatik.uni-bonn.de|informal request email]]. We will then create a user group for the specified users, create the directory, and set up the necessary permissions for you.
  
===== Requesting GPUs =====
  
If you want to allocate a certain number of GPUs for your job, use the parameter ''--gres=gpu:COUNT'' in your srun/sbatch calls, where COUNT is the number of GPUs you need.
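
For example (the script name is a placeholder), requesting two GPUs for a job step looks like this:

<code bash>
# Allocate two GPUs for this job step; replace 2 with the COUNT you need.
srun --gres=gpu:2 python train.py
</code>

The same parameter also works as an ''#SBATCH --gres=gpu:COUNT'' directive inside a batch script.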

===== Email Notifications =====

If you want to receive email notifications about the status of your jobs, you can specify a recipient email address at job submission, e.g.:

<code bash>
sbatch --mail-type=ALL --mail-user=someuser@somedomain.com ...
</code>

**Please specify a full (with domain!) and valid email address here.**
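
The same options can also be set as directives inside the job script itself, for example (address and script are placeholders):

<code bash>
#!/bin/bash
# Sketch: mail notification options as in-script directives.
#SBATCH --mail-type=ALL                      # notify on begin, end, failure, etc.
#SBATCH --mail-user=someuser@somedomain.com  # full, valid address with domain
python train.py
</code>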

===== A Note about Interactive Jobs =====

Please remember that interactive jobs often leave allocated resources idle unnecessarily (especially if forgotten and/or left unterminated), so please think about your impatient colleagues in the queue behind you and try to avoid interactive jobs whenever possible.
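
If you only need to run a single command, a quick batch submission is usually a drop-in replacement for an interactive session, for example (the wrapped command is a placeholder):

<code bash>
# One-liner batch job instead of an interactive session: the allocation
# is released automatically as soon as the command finishes.
sbatch --wrap="python evaluate.py"
</code>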
  
===== Exchange with other users =====
  
 #lmgpu-tech:matrix.informatik.uni-bonn.de #lmgpu-tech:matrix.informatik.uni-bonn.de