This shows you the differences between two versions of the page.
| pub:hpc:hellbender [2025/01/31 16:04] – [Tutorial: Globus File Manager] bjmfg8 | pub:hpc:hellbender [2025/11/05 21:53] (current) – [Links] bjmfg8 | ||
|---|---|---|---|
| Line 3: | Line 3: | ||
| **Request an Account:** | **Request an Account:** | ||
| You can request an account for access to Hellbender by filling out the form found at: | You can request an account for access to Hellbender by filling out the form found at: | ||
| - | [[https://request.itrss.umsystem.edu/ | + | [[https://tdx.umsystem.edu/ |
| ==== What is Hellbender? ==== | ==== What is Hellbender? ==== | ||
| Line 9: | Line 9: | ||
| **Hellbender** is the latest High Performance Computing (HPC) resource available to researchers and students (with sponsorship by a PI) within the UM-System. | **Hellbender** is the latest High Performance Computing (HPC) resource available to researchers and students (with sponsorship by a PI) within the UM-System. | ||
| - | **Hellbender** consists of 208 mixed x86-64 CPU nodes (112 AMD, 96 Intel) | + | **Hellbender** consists of 222 mixed x86-64 CPU nodes providing |
| ==== Investment Model ==== | ==== Investment Model ==== | ||
| Line 73: | Line 73: | ||
| ^ Service | ^ Service | ||
| - | |Hellbender CPU Node* | $2,702.00 | Per Node/Year | Year to Year | | + | |Hellbender CPU Node | $2,702.00 | Per Node/Year | Year to Year | |
| - | |Hellbender GPU Node* | $7,691.38 | Per Node/Year | Year to Year | | + | |Hellbender |
| + | |Hellbender L40s GPU Node* | $4,785.00 | Per Node/Year | Year to Year | | ||
| + | |Hellbender H100 GPU Node* | $13, | ||
| |RDE Storage: High Performance | $95.00 | Per TB/Year | Year to Year | | |RDE Storage: High Performance | $95.00 | Per TB/Year | Year to Year | | ||
| |RDE Storage: General Performance | $25.00 | Per TB/Year | Year to Year | | |RDE Storage: General Performance | $25.00 | Per TB/Year | Year to Year | | ||
| - | ***Update | + | ***Update |
| Line 89: | Line 91: | ||
| * When running on the ' | * When running on the ' | ||
| * When running on the ' | * When running on the ' | ||
| - | * To get started please fill out our [[https://request.itrss.umsystem.edu/ | + | * To get started please fill out our [[https://tdx.umsystem.edu/ |
| - **Paid access (Investor) tier compute**: | - **Paid access (Investor) tier compute**: | ||
| Line 99: | Line 101: | ||
| * All accounts are given 50GB of storage in /home/$USER as well as 500GB in / | * All accounts are given 50GB of storage in /home/$USER as well as 500GB in / | ||
| * MU PIs are eligible for one free 5TB group storage allocation in our RDE environment | * MU PIs are eligible for one free 5TB group storage allocation in our RDE environment | ||
| - | * To get started please fill out our general [[https://request.itrss.umsystem.edu/ | + | * To get started please fill out our general [[https://tdx.umsystem.edu/ |
| - **Paid access (Investor) tier storage**: | - **Paid access (Investor) tier storage**: | ||
| Line 119: | Line 121: | ||
| | Dell C6525 | 112 | 128 | 490 GB | 1.6 TB | 14336 | c001-c112 | | Dell C6525 | 112 | 128 | 490 GB | 1.6 TB | 14336 | c001-c112 | ||
| - | **The 2024 pricing is: $2,702 per node per year.** | + | **The 2025 pricing is: $2,702 per node per year.** |
| ==== GPU Node Lease ==== | ==== GPU Node Lease ==== | ||
| Line 126: | Line 128: | ||
| The investment structure for GPU nodes is the same as for CPU nodes: per node, per year. If you have funds available that you would like to use to pay for multiple years up front, we can accommodate that. Once investor-owned nodes make up 50% of the total GPU nodes in the cluster, we will restrict additional leases until more nodes become available, either through purchase or through surrender by other PIs. The GPU nodes available for investment are the following: | The investment structure for GPU nodes is the same as for CPU nodes: per node, per year. If you have funds available that you would like to use to pay for multiple years up front, we can accommodate that. Once investor-owned nodes make up 50% of the total GPU nodes in the cluster, we will restrict additional leases until more nodes become available, either through purchase or through surrender by other PIs. The GPU nodes available for investment are the following: | ||
| - | | Model | # Nodes | Cores/Node | System Memory | GPU | GPU Memory | # GPU | Local Scratch | # Core | + | | Model | # Nodes | Cores/Node | System Memory | GPU | GPU Memory | # GPU/Node | Local Scratch | # Cores |
| - | | Dell R740xa | 17 | 64 | + | | Dell R740xa | 17 | 64 |
| + | | Dell R740xa | 6 | 64 | 490 GB | H100 | 94 GB | 2 | 1.8 TB | 384 | ||
| + | | Dell R760 | 6 | 64 | 490 GB | L40S | 45 GB | 2 | 3.5 TB | 384 | ||
| - | **The 2024 pricing is: $7,692 per node per year.** | + | |
| + | | ||
| + | | ||
| ==== Storage: Research Data Ecosystem (' | ==== Storage: Research Data Ecosystem (' | ||
| Line 138: | Line 144: | ||
| * Storage lab allocations are protected by associated security groups applied to the share, with group member access administered by the assigned PI or appointed representative. | * Storage lab allocations are protected by associated security groups applied to the share, with group member access administered by the assigned PI or appointed representative. | ||
| - | **What is the Difference between High Performance and General Performance Storage?** | + | **What is the Difference between High Performance and General Performance Storage? ** |
| - | On Pixstor, which is used for standard HPC allocations, | + | On Pixstor, which is used for standard HPC allocations, |
| On VAST, which is used for non-HPC and mixed HPC/SMB workloads, the disks are all flash, but general storage allocations have a QoS policy attached that limits IOPS so that a single share cannot saturate the disk pool to the point where high-performance allocations are impacted. | On VAST, which is used for non-HPC and mixed HPC/SMB workloads, the disks are all flash, but general storage allocations have a QoS policy attached that limits IOPS so that a single share cannot saturate the disk pool to the point where high-performance allocations are impacted. | ||
| Line 153: | Line 159: | ||
| * Workloads that require sustained low-latency read and write I/O at multiple GB/s, typically generated by jobs using multiple NFS mounts | * Workloads that require sustained low-latency read and write I/O at multiple GB/s, typically generated by jobs using multiple NFS mounts | ||
| + | |||
| + | **Snapshots** | ||
| + | |||
| + | * VAST default policy retains 7 daily and 4 weekly snapshots for each share | ||
| + | * Pixstor default policy retains 10 daily snapshots | ||
| **__None of the cluster-attached storage available to users is backed up in any way by us__**. This means that if you delete something and don't have a copy somewhere else, it is gone. Please note that data stored on cluster-attached storage is limited to Data Class 1 and 2 as defined by [[https:// | **__None of the cluster-attached storage available to users is backed up in any way by us__**. This means that if you delete something and don't have a copy somewhere else, it is gone. Please note that data stored on cluster-attached storage is limited to Data Class 1 and 2 as defined by [[https:// | ||
| - | **The 2024 pricing is: General Storage: $25/ | + | **The 2025 pricing is: General Storage: $25/ |
| To order storage please fill out our [[https:// | To order storage please fill out our [[https:// | ||
| Line 169: | Line 180: | ||
| **Costs** | **Costs** | ||
| - | The cost associated with using the RDE tape archive is $8/TB for short-term data kept inside the tape library for 1-3 years, or $140 per tape (rounded to a whole number of tapes) for tapes sent offsite for long-term retention of up to 10 years. We send these tapes to records management, where they are stored in a climate-controlled environment. Each current-generation LTO-9 tape holds approximately 18TB of data. These are flat one-time costs, and you can choose a short-term in-library copy, a longer-term offsite copy, or both, providing flexibility. | + | The cost associated with using the RDE tape archive is $8/TB for short-term data kept inside the tape library for 1-3 years, or $144 per tape (rounded to a whole number of tapes) for tapes sent offsite for long-term retention of up to 10 years. We send these tapes to records management, where they are stored in a climate-controlled environment. Each current-generation LTO-9 tape holds approximately 18TB of data. These are flat one-time costs, and you can choose a short-term in-library copy, a longer-term offsite copy, or both, providing flexibility. |
| **Request Process** | **Request Process** | ||
| Line 175: | Line 186: | ||
| To use the tape archive functionality that RSS has set up, the data to be archived must first be copied to RDE storage if it is not already there. This requires the following steps: | To use the tape archive functionality that RSS has set up, the data to be archived must first be copied to RDE storage if it is not already there. This requires the following steps: | ||
| * Submit an RDE storage request if the data resides locally and an RDE share is not already available to the researcher: [[http:// | * Submit an RDE storage request if the data resides locally and an RDE share is not already available to the researcher: [[http:// | ||
| - | * Create an archive folder or folders in the relevant RDE storage share to hold the data you would like to archive. The folder(s) can be named to signify the contents, but we ask that the name includes _archive at then end. For example, something akin to: labname_projectx_archive_2024. | + | * Create an archive folder or folders in the relevant RDE storage share to hold the data you would like to archive. The folder(s) can be named to signify the contents, but we ask that the name includes _archive at the end. For example, something akin to: labname_projectx_archive_2024. |
| * Copy the contents to be archived to the newly created archive folder(s) within the RDE storage share. | * Copy the contents to be archived to the newly created archive folder(s) within the RDE storage share. | ||
| - | * Submit an RDE tape archive request: [[https://missouri.qualtrics.com/ | + | * Submit an RDE tape archive request: [[https://archiverequest.itrss.umsystem.edu]] |
| - | * Once the tape archive jobs are completed ITRSS will notify you and send you a Archive job report after which you can delete the contents of the archive folder. | + | * Once the tape archive jobs are completed ITRSS will notify you and send you an Archive job report after which you can delete the contents of the archive folder. |
| - | * We request that subsequent archive jobs be added to a separate folder or the initial folder renamed to something that signifies the time of archive for easier retrieval *_archive2024, | + | * We request that subsequent archive jobs be added to a separate folder, or the initial folder renamed to something that signifies the time of archive for easier retrieval *_archive2024, |
| **Recovery** | **Recovery** | ||
| Line 221: | Line 232: | ||
| * **[[https:// | * **[[https:// | ||
| - | * **[[https:// | + | * **[[https:// |
| * **[[https:// | * **[[https:// | ||
| - | * **[[https:// | + | * **[[https:// |
| * **[[https:// | * **[[https:// | ||
| * **[[https:// | * **[[https:// | ||
| Line 233: | Line 244: | ||
| ==== Software ==== | ==== Software ==== | ||
| - | The Foundry | + | Hellbender |
| ==== Hardware ==== | ==== Hardware ==== | ||
| Line 249: | Line 260: | ||
| Dell C6420: 0.5 unit server containing dual 24-core Intel Xeon Gold 6252 CPUs with a base clock of 2.1 GHz. Each C6420 node contains 384 GB of DDR4 system memory. | Dell C6420: 0.5 unit server containing dual 24-core Intel Xeon Gold 6252 CPUs with a base clock of 2.1 GHz. Each C6420 node contains 384 GB of DDR4 system memory. | ||
| - | Dell R6620: 1 unit server containing dual 128-core AMD EPYC 9754 CPUs with a base clock of 2.25 GHz. Each R6620 node contains 1 TB of DDR5 system memory. | + | Dell R6625: 1 unit server containing dual 128-core AMD EPYC 9754 CPUs with a base clock of 2.25 GHz. Each R6625 node contains 1 TB of DDR5 system memory. |
| + |||
| + | Dell R6625: 1 unit server containing dual 128-core AMD EPYC 9754 CPUs with a base clock of 2.25 GHz. Each R6625 node contains 6 TB of DDR5 system memory. | ||
| | **Model** | | **Model** | ||
| | Dell C6525 | 112 | 128 | 490 GB | AMD EPYC 7713 64-Core | | Dell C6525 | 112 | 128 | 490 GB | AMD EPYC 7713 64-Core | ||
| - | | Dell R640 | 32 | 40 | + | | Dell R640 | 32 | 40 |
| - | | Dell C6420 | 64 | 48 | + | | Dell C6420 | 64 | 48 |
| - | | Dell R6620 | 12 | 256 | 1 TB | + | | Dell R6625 | 12 | 256 | 994 GB |
| - | | | | + | | Dell R6625 | 2 | 256 | 6034 GB | AMD EPYC 9754 128-Core Processor |
| + | | | | ||
| === GPU nodes === | === GPU nodes === | ||
| - | | **Model** | + | | **Model** |
| - | | Dell R740xa | + | | Dell R750xa |
| | Dell XE8640 | 2 | 104 | 2002 GB | H100 | 80 GB | 4 | 3.2 TB | 208 | g018-g019 | | Dell XE8640 | 2 | 104 | 2002 GB | H100 | 80 GB | 4 | 3.2 TB | 208 | g018-g019 | ||
| | Dell XE9640 | 1 | 112 | 2002 GB | H100 | 80 GB | 8 | 3.2 TB | 112 | g020 | | | Dell XE9640 | 1 | 112 | 2002 GB | H100 | 80 GB | 8 | 3.2 TB | 112 | g020 | | ||
| - | | Dell R730 | 4 | 20 | + | | Dell R730 | 4 | 20 |
| - | | Dell R7525 | 1 | + | | Dell R7525 | 1 |
| - | | Dell R740xd | 3 | 44 | 384 GB | V100 | 32 GB | 3 | 240 GB | 132 | g026-g028 | + | | Dell R740xd | 2 | 40 | 364 GB | V100 | 32 GB | 3 | 240 GB | 80 |
| - | | | + | | Dell R740xd | 1 | 44 | 364 GB | V100 | 32 GB | 3 | 240 GB | 44 | g028 | |
| + | | Dell R760xa | 6 | 64 | 490 GB | H100 | 94 GB | 2 | 1.8 TB | 384 | g029-g034* | ||
| + | | Dell R760 | 6 | 64 | ||
| + | | * = Available Oct 14 | ||
| A specially formatted sinfo command can be run on Hellbender to report live information about the nodes and the hardware/ | A specially formatted sinfo command can be run on Hellbender to report live information about the nodes and the hardware/ | ||
| Line 353: | Line 370: | ||
| ==== Open OnDemand ==== | ==== Open OnDemand ==== | ||
| - | * https:// | + | * https:// |
| - | * https:// | + | * https:// |
| OnDemand provides an integrated, single access point for all of your HPC resources. The following apps are currently available on Hellbender' | OnDemand provides an integrated, single access point for all of your HPC resources. The following apps are currently available on Hellbender' | ||
| Line 604: | Line 621: | ||
| Below is the process for setting up a class on the OOD portal. | Below is the process for setting up a class on the OOD portal. | ||
| - | - Send the class name, the list of students and TAs, and any shared storage requirements to itrss-support@umsystem.edu. | + | - Send the class name, the list of students and TAs, and any shared storage requirements to itrss-support@umsystem.edu. |
| - We will add the students to the group allowing them access to OOD. | - We will add the students to the group allowing them access to OOD. | ||
| - If the student does not have a Hellbender account yet, they will be presented with a link to a form to fill out requesting a Hellbender account. | - If the student does not have a Hellbender account yet, they will be presented with a link to a form to fill out requesting a Hellbender account. | ||
| Line 795: | Line 812: | ||
| **Documentation**: | **Documentation**: | ||
| + | |||
| + | ==== RStudio ==== | ||
| + | |||
| + | [[https:// | ||
| ==== Visual Studio Code ==== | ==== Visual Studio Code ==== | ||
| Line 908: | Line 929: | ||
| After you’ve signed up and logged in to Globus, you’ll begin at the File Manager. | After you’ve signed up and logged in to Globus, you’ll begin at the File Manager. | ||
| + | |||
| + | **note: | ||
| + | https:// | ||
| + | If symlinks need to be copied, consider using rsync on the DTN with the -l flag** | ||
| + | |||
| + | |||
| + | |||
| The first time you use the File Manager, all fields will be blank: | The first time you use the File Manager, all fields will be blank: | ||
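
The service rates on this page are flat per-unit, per-year figures, so a budget estimate is simple multiplication. The Python sketch below uses the 2025 rates shown in the service table. It assumes that paying for multiple years up front is just the annual total multiplied by the number of years (no discount), and it omits the H100 node rate and the footnoted caveats because both are cut off in this revision. The item keys such as `cpu_node` are hypothetical labels, not ITRSS identifiers; confirm current rates with ITRSS before budgeting.

```python
from __future__ import annotations

# Annual rates (USD) from the 2025 service table on this page.
# The H100 GPU node rate is truncated in this revision, so it is omitted.
ANNUAL_RATES = {
    "cpu_node": 2702.00,         # Hellbender CPU node, per node per year
    "l40s_gpu_node": 4785.00,    # Hellbender L40S GPU node, per node per year
    "rde_high_perf_tb": 95.00,   # RDE high-performance storage, per TB per year
    "rde_general_tb": 25.00,     # RDE general-performance storage, per TB per year
}


def estimate_cost(quantities: dict[str, float], years: int = 1) -> float:
    """Return the total cost in USD of `quantities` over `years` years.

    Assumes a multi-year payment is the annual total times the number of
    years, with no discount.
    """
    if years < 1:
        raise ValueError("years must be at least 1")
    unknown = sorted(set(quantities) - set(ANNUAL_RATES))
    if unknown:
        raise KeyError(f"no rate listed for: {unknown}")
    per_year = sum(ANNUAL_RATES[item] * qty for item, qty in quantities.items())
    return round(per_year * years, 2)


if __name__ == "__main__":
    # Example: two CPU nodes, one L40S GPU node and 10 TB of
    # high-performance storage, paid for three years up front.
    order = {"cpu_node": 2, "l40s_gpu_node": 1, "rde_high_perf_tb": 10}
    print(f"${estimate_cost(order, years=3):,.2f}")  # prints $33,417.00
```

Rejecting unknown keys rather than treating them as zero keeps a misspelled item from silently dropping out of the estimate.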
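
The tape archive charges combine a per-TB rate for the in-library copy with a per-tape rate for offsite copies. The Python sketch below estimates the one-time cost from the figures on this page. It assumes roughly 18TB per LTO-9 tape, that the in-library copy is billed on the exact TB figure, and that offsite charges are rounded up to whole tapes. Real tape counts depend on how the data packs onto tape, so treat the result as an estimate only.

```python
import math

# One-time rates from the RDE tape archive section of this page (USD).
IN_LIBRARY_RATE_PER_TB = 8.00    # short-term copy kept in the library, 1-3 years
OFFSITE_RATE_PER_TAPE = 144.00   # offsite copy, retained up to 10 years
TAPE_CAPACITY_TB = 18            # approximate capacity of one LTO-9 tape


def tapes_needed(data_tb: float) -> int:
    """Whole tapes needed, assuming each tape holds TAPE_CAPACITY_TB."""
    if data_tb <= 0:
        return 0
    return math.ceil(data_tb / TAPE_CAPACITY_TB)


def archive_cost(data_tb: float, in_library: bool = True, offsite: bool = False) -> float:
    """Estimated one-time cost in USD for the selected copies."""
    cost = 0.0
    if in_library:
        # Assumption: the in-library copy is billed on the exact TB figure.
        cost += data_tb * IN_LIBRARY_RATE_PER_TB
    if offsite:
        # Assumption: offsite charges are rounded up to whole tapes.
        cost += tapes_needed(data_tb) * OFFSITE_RATE_PER_TAPE
    return cost


if __name__ == "__main__":
    tb = 40
    print(f"{tb} TB needs {tapes_needed(tb)} tape(s)")                # 3 tapes
    print(f"in-library only: ${archive_cost(tb):,.2f}")                # $320.00
    print(f"offsite only: ${archive_cost(tb, in_library=False, offsite=True):,.2f}")  # $432.00
    print(f"both: ${archive_cost(tb, offsite=True):,.2f}")             # $752.00
```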
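
The first tape archive steps (create a folder whose name includes `_archive` inside the RDE share, then copy the data into it) can be scripted. This is a minimal Python sketch. The function names, lab and project names, and paths are hypothetical, and the folder name simply follows the page's `labname_projectx_archive_2024` example. It refuses to reuse an existing folder, in line with the request that later archive jobs go to a separate folder. Submitting the archive request itself is still done through the form linked above.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def archive_folder_name(lab: str, project: str, year: int) -> str:
    """Build a name following the page's labname_projectx_archive_2024 example."""
    return f"{lab}_{project}_archive_{year}"


def stage_for_archive(source: Path, rde_share: Path,
                      lab: str, project: str, year: int) -> Path:
    """Copy the directory `source` into a new archive folder in `rde_share`.

    Raises FileExistsError if the archive folder already exists, since
    subsequent archive jobs should go to a separate folder.
    """
    dest = Path(rde_share) / archive_folder_name(lab, project, year)
    if dest.exists():
        raise FileExistsError(f"{dest} already exists; use a separate archive folder")
    # copytree creates dest itself. With the default symlinks=False, the
    # files that symlinks point to are copied instead of the links.
    shutil.copytree(source, dest)
    return dest
```

For example, `stage_for_archive(Path("/path/to/results"), Path("/path/to/rde/share"), "labname", "projectx", 2025)` would create `labname_projectx_archive_2025` in the share; both paths are placeholders.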
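
The Globus note above recommends rsync on the DTN with the `-l` flag when symlinks need to be copied. Before starting a Globus transfer, it can help to list which entries in a directory tree are symlinks so they can be handled separately. This Python sketch does that with `os.walk`; the script's name and how it is invoked are up to you, and the first part of the Globus note is cut off in this revision, so the sketch assumes only that symlinks need separate handling.

```python
from __future__ import annotations

import os
import sys


def find_symlinks(root: str) -> list[tuple[str, str]]:
    """Return sorted (link path, link target) pairs for every symlink under root.

    os.walk is called with followlinks=False, so symlinked directories are
    reported but never descended into, and link cycles cannot cause recursion.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        # Symlinks to directories land in dirnames; symlinks to files and
        # broken symlinks land in filenames. Check both lists.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                found.append((path, os.readlink(path)))
    return sorted(found)


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    links = find_symlinks(root)
    for path, target in links:
        print(f"{path} -> {target}")
    print(f"{len(links)} symlink(s) under {root}; consider rsync -l for these")
```

In rsync, `-l` (`--links`) copies symlinks as symlinks rather than copying the files they point to.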