Differences

This shows you the differences between two versions of the page.

pub:hpc:hellbender [2026/02/20 17:37] – [Moving Data] redmonp (previous revision)
pub:hpc:hellbender [2026/04/21 16:15] (current) – [What is Hellbender?] redmonp
Line 9: Line 9:
 **Hellbender** is the latest High Performance Computing (HPC) resource available to researchers and students (with sponsorship by a PI) within the UM-System.
  
-**Hellbender** consists of 222 mixed x86-64 CPU nodes providing 22,272 cores as well as 40 GPU nodes consisting of a mix of Nvidia GPUs (see hardware section for more details). Hellbender is attached to our Research Data Ecosystem ('RDE') that consists of 8PB of high performance and general purpose research storage. RDE can be accessed from other devices outside of Hellbender to create a single research data location across different computational environments.
+**Hellbender** consists of 263 mixed x86-64 CPU nodes providing 24,844 cores as well as 41 GPU nodes with a total of 132 Nvidia GPUs, including the Volta, Ampere, and Hopper generations (see [[https://docs.itrss.umsystem.edu/pub/hpc/hellbender#hardware]] for details). Hellbender is attached to our Research Data Ecosystem ('RDE') that consists of 8PB of high performance and general purpose research storage. RDE can be accessed from other devices outside of Hellbender to create a single research data location across different computational environments.
  
 ==== Investment Model ====
Line 25: Line 25:
  
 General access will be open to any research or teaching faculty, staff, and students from any UM System campus. General access is defined as open access to all resources available to users of the cluster at an equal fairshare value. This means that all users will have the same level of access to the general resource.
-Research users of the general access portion of the cluster will be given the RDE Standard Allocation to operate from. Larger storage allocations will be provided through RDE Advanced Allocations and are independent of HPC priority status.
+Research users of the general access portion of the cluster will be given the RDE Standard Allocation to operate from. Larger storage allocations can be obtained through investment and are independent of HPC priority status.
-
-=== Hellbender Advanced: Priority Access ===
-
-When researcher needs are not being met at the general access level, researchers may request an advanced allocation on Hellbender to gain priority access. Priority access will give research groups a limited set of resources that will be available to them without competition from general access users. Priority Access will be provided to a specific set of hardware through a priority partition which contains these resources. This partition will be created and limited to use by the user and their associated group. These resources will also be in an overlapping pool of resources available to general access users. This pool will be administered such that if a priority access user submits jobs to their priority access partition, any jobs running on those resources from the overlapping partition will be requeued and begin execution again on another resource in that partition if available, or return to wait in the queue for resources. Priority access users will retain general access status, and fairshare will still play a part in moderating their access to the general resource. Fairshare inside a priority partition determines which user’s jobs are selected for execution next inside this partition. The jobs running inside this priority partition will also affect a user’s fairshare calculations, even for resources in the general access partition, meaning that running a large number of jobs inside a priority partition will lower a user’s priority for the general resources as well.
-
-=== Traditional Investment ===
-
-Hellbender Advanced Allocation requests that are not approved for DRII Priority Designation may be treated as traditional investments, with the researcher paying for the resources used to create the Advanced Allocation at the defined rate. These rates are subject to change based on DRII's determination and on hardware costs.
  
 === Resource Management ===
Line 42: Line 34:
  
 Priority access resources will generally be made available from existing hardware in the general access pool, and the funds will be retained for a future time to allow a larger pool of funds to accumulate for expansion of the resource. This will allow the greatest return on investment over time. If the general availability resources fall below 50% of the overall resource, an expansion cycle will be initiated to ensure all users still have access to a significant amount of resources. If a researcher or research group contributes a large amount of funding, it may trigger an expansion cycle if that is determined to be advantageous at the time of the contribution.
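The expansion trigger above amounts to a threshold test on the share of the resource that remains in general access. A minimal sketch, assuming the "overall resource" is measured in cores (the policy does not name a unit) and using a hypothetical `expansion_needed` helper:

```python
def expansion_needed(general_cores: int, priority_cores: int) -> bool:
    """Return True when general access falls below 50% of the overall resource.

    Hypothetical helper: the policy states only the 50% threshold; counting
    the resource in cores is an assumption made for this sketch.
    """
    total = general_cores + priority_cores
    if total <= 0:
        raise ValueError("the overall resource must be positive")
    # "Less than 50%": exactly half does not trigger an expansion cycle.
    return general_cores / total < 0.5


if __name__ == "__main__":
    print(expansion_needed(general_cores=600, priority_cores=400))  # False
    print(expansion_needed(general_cores=450, priority_cores=550))  # True
```

A large contribution that the policy says "may" trigger an expansion is a discretionary decision and is deliberately not modeled here.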
+
+=== Hellbender Advanced: Priority Access - Investment ===
+
+When researcher needs are not being met at the general access level, researchers may request an advanced allocation on Hellbender to gain priority access via investment. Priority access will give research groups a limited set of resources that will be available to them without competition from general access users. Priority Access will be provided to a specific set of hardware through a priority partition which contains these resources. This partition will be created and limited to use by the user and their associated group. These resources will also be in an overlapping pool of resources available to general access users. This pool will be administered such that if a priority access user submits jobs to their priority access partition, any jobs running on those resources from the overlapping partition will be requeued and begin execution again on another resource in that partition if available, or return to wait in the queue for resources. Priority access users will retain general access status, and fairshare will still play a part in moderating their access to the general resource. Fairshare inside a priority partition determines which user’s jobs are selected for execution next inside this partition. The jobs running inside this priority partition will also affect a user’s fairshare calculations, even for resources in the general access partition, meaning that running a large number of jobs inside a priority partition will lower a user’s priority for the general resources as well.
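The requeue behaviour described above can be pictured with a toy model. This is a simplified sketch, not Slurm's scheduler: it assumes one job per node, ignores fairshare ordering entirely, and uses hypothetical node names (`g001`, `c001`) and partition labels (`general`, `priority`). It shows only the rule that a priority submission displaces a general-access job from a priority node, and that the displaced job restarts on another idle node or waits in the queue.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    partition: str  # "general" or "priority" (hypothetical partition labels)


@dataclass
class Cluster:
    # node name -> job running on it (None = idle); one job per node keeps the toy simple
    running: dict[str, Job | None]
    # nodes in the group's priority partition; they also serve general-access jobs
    priority_nodes: set[str]
    queue: list[Job] = field(default_factory=list)

    def submit_priority(self, job: Job) -> str | None:
        """Start a priority-partition job, requeueing a general job if needed."""
        # 1. Use an idle node in the priority partition if one exists.
        for node in sorted(self.priority_nodes):
            if self.running[node] is None:
                self.running[node] = job
                return node
        # 2. Otherwise displace a general-access job running on a priority node.
        for node in sorted(self.priority_nodes):
            current = self.running[node]
            if current is not None and current.partition == "general":
                self.running[node] = job
                self._requeue(current)
                return node
        # 3. Every priority node already runs priority work: wait in the queue.
        self.queue.append(job)
        return None

    def _requeue(self, job: Job) -> None:
        """Restart a displaced general job on another idle node, else queue it."""
        for node in sorted(self.running):
            if self.running[node] is None:
                self.running[node] = job
                return
        self.queue.append(job)


if __name__ == "__main__":
    cluster = Cluster(
        running={"g001": Job(1, "general"), "c001": None},
        priority_nodes={"g001"},
    )
    print(cluster.submit_priority(Job(2, "priority")))  # g001
    print(cluster.running["c001"])  # the displaced general job now runs on c001
```

In the real system the order in which waiting jobs start is governed by fairshare, which this sketch leaves out on purpose.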
  
 === Benefits of Investing ===
Line 70: Line 66:
 ==== How Much Does Investing Cost? ====
  
-See our rates for FY 2024-2025:
+See our rates for FY 2025-2026:
  
 ^ Service                              ^ Rate        ^ Unit         ^ Support        ^
Line 169: Line 165:
 **The 2025 pricing is: General Storage: $25/TB/Year, High Performance Storage: $95/TB/Year**
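Because storage is priced per terabyte per year, an annual estimate is a straight multiplication. The sketch below assumes linear pricing with no minimum allocation, tiering, or rounding of partial terabytes, none of which the page states; `annual_storage_cost` is a hypothetical helper.

```python
# 2025 rates from the pricing line above, in USD per TB per year.
RATES_PER_TB_YEAR = {"general": 25, "high_performance": 95}


def annual_storage_cost(general_tb: float = 0, high_performance_tb: float = 0) -> float:
    """Estimated yearly cost in USD, assuming purely linear per-TB pricing."""
    return (general_tb * RATES_PER_TB_YEAR["general"]
            + high_performance_tb * RATES_PER_TB_YEAR["high_performance"])


if __name__ == "__main__":
    # 20 TB general + 5 TB high performance: 20*25 + 5*95 = 500 + 475
    print(annual_storage_cost(general_tb=20, high_performance_tb=5))  # 975
```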
  
-To order storage, please fill out our [[https://missouri.qualtrics.com/jfe/form/SV_6zkkwGYn0MGvMyO| RSS Services Order Form]]
+To order storage, please fill out our [[https://tdx.umsystem.edu/TDClient/36/DoIT/Requests/ServiceDet?ID=605| RSS Services Order Form]]
  
 ==== Research Data Archive ====
Line 274: Line 270:
 === GPU nodes ===
  
-| **Model**   | **Nodes** | **Cores/Node** | **System Memory** | **GPU**  | **GPU Memory** | **GPUs/Node** | **Local Scratch** | **Cores** | **Node Names** |
+| **Model**   | **Nodes** | **Cores/Node** | **System Memory** | **GPU**  | **GPU Memory/GPU** | **GPUs/Node** | **Local Scratch** | **Cores** | **Node Names** |
 | Dell R750xa | 17        | 64             | 490 GB            | A100     | 80 GB          | 4        | 1.6 TB            | 1088    | g001-g017      |
 | Dell XE8640 | 2         | 104            | 2002 GB           | H100     | 80 GB          | 4        | 3.2 TB            | 208     | g018-g019      |
Line 284: Line 280:
 | Dell R760xa | 6         | 64             | 490 GB            | H100     | 94 GB          | 2        | 1.8 TB            | 384      | g029-g034  |
 | Dell R760 | 6         | 64             | 490 GB            | L40S     | 45 GB          | 2        | 3.5 TB            | 384      | g035-g040  |
-|        |                          |                            | Total GPU      | 124      | Total Cores       2476                   |
+| Dell XE9680 | 1         | 96             | 2048 GB            | H200     | 141 GB          | 8        | 28 TB            | 96      | g041  |
 +|        |                          |                            | Total GPU      | 132      | Total Cores       2572                   |
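The Cores column is Nodes × Cores/Node, and the totals row sums cores and GPUs over every row. The sketch below checks that arithmetic using only the rows visible in this excerpt (rows g020 to g028 sit between the displayed hunks, so the full totals cannot be recomputed here), and confirms that the added XE9680 row accounts exactly for the change from 124 to 132 GPUs and from 2476 to 2572 cores. The row tuples are transcribed from the table; `totals` is an illustrative helper.

```python
# (model, nodes, cores_per_node, gpus_per_node) for the GPU rows visible above.
# Rows g020-g028 fall outside the displayed hunks and are deliberately omitted.
VISIBLE_ROWS = [
    ("Dell R750xa", 17, 64, 4),   # g001-g017, A100
    ("Dell XE8640", 2, 104, 4),   # g018-g019, H100
    ("Dell R760xa", 6, 64, 2),    # g029-g034, H100
    ("Dell R760", 6, 64, 2),      # g035-g040, L40S
    ("Dell XE9680", 1, 96, 8),    # g041, H200 (row added in this revision)
]


def totals(rows):
    """Return (cores, gpus) summed as Nodes x Cores/Node and Nodes x GPUs/Node."""
    cores = sum(nodes * cpn for _model, nodes, cpn, _gpn in rows)
    gpus = sum(nodes * gpn for _model, nodes, _cpn, gpn in rows)
    return cores, gpus


if __name__ == "__main__":
    # Each row's Cores column should equal Nodes x Cores/Node.
    for model, nodes, cpn, _gpn in VISIBLE_ROWS:
        print(model, nodes * cpn)
    # The new g041 row explains the change in the totals row.
    new_cores, new_gpus = totals(VISIBLE_ROWS[-1:])
    print(2476 + new_cores, 124 + new_gpus)  # 2572 132
```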
  
 A specially formatted sinfo command can be run on Hellbender to report live information about the nodes and the hardware/features they have.
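The exact sinfo command used on Hellbender is not shown in this excerpt. As a hypothetical illustration, the sketch below runs `sinfo` in node-oriented mode (`-N`) without a header (`-h`) and with a custom output format (`-o`) built from the standard `%N` (node name), `%c` (CPUs), `%m` (memory in MB) and `%G` (generic resources) specifiers, then parses the result. The format string, the `|` separator, the helper names, and the sample line format are assumptions, not the page's command or real Hellbender output.

```python
import subprocess

# Hypothetical format: node name | CPUs | memory (MB) | generic resources (GRES).
# %N, %c, %m and %G are standard sinfo specifiers; the "|" separators are our choice.
SINFO_FORMAT = "%N|%c|%m|%G"


def parse_sinfo(text: str) -> dict[str, dict[str, object]]:
    """Parse `sinfo -N -h -o SINFO_FORMAT` output into a per-node dictionary."""
    nodes: dict[str, dict[str, object]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, cpus, mem_mb, gres = line.strip().split("|")
        # With -N a node appears once per partition; keep a single entry per node.
        nodes[name] = {"cpus": int(cpus), "memory_mb": int(mem_mb), "gres": gres}
    return nodes


def query_nodes() -> dict[str, dict[str, object]]:
    """Run sinfo (only works where Slurm is installed) and parse its output."""
    out = subprocess.run(
        ["sinfo", "-N", "-h", "-o", SINFO_FORMAT],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sinfo(out)


if __name__ == "__main__":
    # Illustrative sample only, shaped like the format above; not real cluster output.
    sample = "g001|64|501760|gpu:A100:4\ng041|96|2097152|gpu:H200:8\n"
    for node, info in parse_sinfo(sample).items():
        print(node, info)
```

On a login node, `query_nodes()` would return the live data; the parser is kept separate so it can be exercised without Slurm.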