pub:hpc:start: revision 2024/10/15 17:41 (current) by keelerm; previous revision 2023/07/24 20:28 by bjmfg8.
  
====Hellbender====
Hellbender is a traditional HPC research cluster built at MU in 2023 to support research efforts for UM researchers. Mizzou's Division of Research, Innovation & Impact (DRII) is the primary source of funding for the hardware and support of Hellbender. Hellbender started as 112 compute nodes, each containing 2 AMD 7713 processors and 512GB of RAM, and 17 GPU nodes, each containing 4 Nvidia A100 80GB GPUs, 2 Intel Xeon 6338 processors, and 256GB of system RAM. Since then, it has expanded thanks to researcher investments and the repurposing of the newest portions of the previous HPC cluster. See a more detailed overview of the Hellbender architecture at {{ :pub:hpc:hellbender_system_overview.pdf |}}.

DRII has made it clear that the mission of Hellbender is to accelerate Mizzou Forward initiatives. There are 2 access levels available to UM researchers: general and priority access. General access is free and available to all UM researchers, and provides an equal share of at least 50% of the resources available to all users. Priority access provides dedicated access to some number of nodes on Hellbender and is available through investment.
  
Requesting access to Hellbender can be done through our [[https://request.itrss.umsystem.edu|request form]]. Each form entry will need a faculty sponsor listed as the principal investigator (PI) for the group, who will be the primary contact for the request and the responsible party for managing members. The form entry can also request access to our [[pub:rde:start|Research Data Ecosystem]] (RDE) at the same time as an HPC request, or the RDE request can be made separately later if you find a need for it.

[[pub:hpc:hellbender|Hellbender Documentation]]
  
====The Foundry====
In all publications or products resulting from work performed using the Foundry, the NSF grant which provided funding for the Foundry must be acknowledged. This can be done by adding a sentence such as the following to the publication: “This work was supported in part by the National Science Foundation under Grant No. OAC-1919789.”
  
[[pub:hpc:foundry|Foundry Documentation]]
  
====Nautilus====
Researchers with workflows that involve AI, machine learning, simulation, or similar computation that can be parallelized at the job level may be interested in another available HPC resource: the NSF National Research Platform Nautilus cluster. NSF grants orchestrated by faculty at the University of Missouri and in the Great Plains Network have contributed substantially to this resource.

The Nautilus HPC system is a public cluster utilizing Kubernetes containerization. Its resources include 1,352 compute nodes, 32 NVIDIA A100 nodes, 26 petabytes of DDN storage, and a 200 Gbps NVIDIA Mellanox InfiniBand interconnect.

Users will need to learn how to use Kubernetes and containers, a very different system from the SLURM HPC systems we have at the University, but may find more resource availability. Nautilus also differs from other NSF programs (like ACCESS) in that access is not based on proposals and approvals. All resources requested are expected to be used, so there is a strong need for users to understand what their jobs require. There is GitHub documentation to assist with these learning points.

Data is expected to be DCL 1 or 2; higher classifications of data are not appropriate for this cluster. Researchers should understand that the work they are doing on Nautilus should be considered open to the public.

If you think that Nautilus may be a resource that can help your research excel, please let us know of your interest.
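Work on Nautilus is expressed as Kubernetes Jobs rather than SLURM batch scripts. As a rough illustration of the difference, the sketch below builds a minimal Job manifest as a plain Python dictionary; the job name, container image, and resource sizes are hypothetical placeholders, and actual Nautilus namespaces, quotas, and required labels are covered in the NRP documentation.

```python
# Illustrative only: a minimal Kubernetes batch Job manifest built as a plain
# Python dict. The job name, container image, and resource sizes below are
# hypothetical placeholders, not Nautilus-specific values.
import json

job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "example-job"},  # hypothetical name
    "spec": {
        "backoffLimit": 1,  # retry a failed pod at most once
        "template": {
            "spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": "worker",
                    "image": "python:3.11",  # any image you can pull
                    "command": ["python", "-c", "print('hello')"],
                    # Nautilus expects requests/limits to match real usage:
                    "resources": {
                        "requests": {"cpu": "2", "memory": "4Gi"},
                        "limits": {"cpu": "2", "memory": "4Gi"},
                    },
                }],
            }
        },
    },
}

# Serialize so it could be submitted with: kubectl apply -f job.json
print(json.dumps(job, indent=2))
```

Unlike a SLURM batch script submitted with ''sbatch'', this manifest describes a containerized pod that Kubernetes schedules; since Nautilus expects requested resources to actually be used, size the requests/limits to what the job truly consumes.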
  
  
  
=====General Policies=====

The following are RSS policies and guidelines for different services and groups:
  
  
==Open Source Software==

Software installed cluster-wide must have an open source (https://opensource.org/licenses) license or be obtained utilizing the procurement process, even if there is no cost associated with it.
  
==Licensed Software==

Licensed software (any software that requires a license or agreement to be accepted) must follow the procurement process to protect users, their research, and the University. To ensure this, RSS must manage the license and the license server for any licensed software that RSS installs and supports.
  
==Singularity Support==
A majority of scientific software and software libraries can be installed in users’ accounts or in group space. We also provide limited support for Singularity (https://sylabs.io/docs/) for advanced users who require more control over their computing environment. We cannot knowingly assist users to install software that may put them, the University, or their intellectual property at risk.

====Use Policy====

====Data Policy====
  
=== Research Data Ecosystem Storage (RDE) Allocation Model ===
The Research Data Ecosystem (RDE) is a collection of technical resources supporting research data. These resources are intended to meet researcher needs for physical storage, data movement, and institutional management. They are provided in partnership with the Division of Research, Innovation & Impact (DRII) and are intended to work in conjunction with DRII policy drivers and priorities.

The Ecosystem brings together storage platforms, data movement, metadata management, data protection, technical and practice expertise, and administration to fully support the research data lifecycle.

DRII has invested in RDE as an institutional resource. Details of the specific underlying platforms may change over time, but changes will always be directed toward ease of use, access, and performance to purpose.

Throughout May 2023, portions of the RDE moved into production. These include:

  * On-premises high-performance and general-purpose research storage
  * Globus for advanced data movement
  * Specialized need consultation
  * Data archival

These resources work in conjunction with RSS services related to grant support, HPC infrastructure, and data management plan development. Capabilities that are not yet generally available but are in development include:

  * Data backup
  * Data analytics and reporting

Additionally, effective use of some resources may require changes to network architecture, so additional limitations may apply at this time.

We invite researchers needing solutions (including those dependent on resources not yet generally available) to consult with RSS. We may be able to find effective workarounds or make use of pilot projects when appropriate. We are committed to finding solutions supporting your research productivity needs!

=== Resources available to all researchers: RDE Standard Allocation ===

All researchers are eligible for the RDE Standard Allocation. The Standard Allocation provides a base level of storage suitable for use in connection with High Performance Computing clusters, or for general-purpose lab shares (SMB or NFS). The exact capacity is subject to change based on utilization and DRII direction. See the appendices for current specifications.

=== RDE Advanced Allocation ===

For needs beyond the RDE Standard Allocation, researchers may request one or more RDE Advanced Allocations. Advanced Allocations can provide for larger or specialized research storage needs. Storage is provided at a per-TB, per-year rate which is subject to change under DRII guidance. See the appendices for current rates and how the cost model is being implemented. RDE Advanced Allocations should be associated with research services or defined projects. Research Cores, Research Centers, labs providing services to other labs, and RSS may be considered research services. Defined projects include sponsored programs or otherwise well-defined initiatives.

All RDE allocations must include appropriate data management planning. A plan may be a formal Data Management Plan associated with a grant, or an operational workflow description appropriate for a core or service entity, as long as data protection, transfer, and disposition requirements are documented.

Advanced Allocations require consultation with RSS. RSS will work with researchers to match allocated resources with capacity, performance, protection, and connectivity needs.

**Priority Designation**

RDE Advanced Allocations are eligible for DRII Priority Designation. This means DRII has determined that the proposed use case (such as a core or grant-funded project) presents a strategic advantage or high-priority service to the University and agrees to subsidize the resources assigned in that designation. DRII is responsible for determining the criteria for Priority Designation.

**Traditional Investment**

RDE Advanced Allocations that are not approved for DRII Priority Designation, or that inherently receive funding for storage, may be treated as traditional investments, with the researcher paying for the allocation at the defined rate.

**Data Compliance**

By default, the RDE environment supports data classified as DCL 1 or 2. It may be possible to arrange for a higher DCL, but this must be vetted and approved by appropriate security and compliance authorities.
  
**Allocation Maintenance**

Researchers are expected to ensure allocation information is kept current. Annual confirmation that the allocation is still needed will be required for all Standard Allocations. For lab (group) and Advanced Allocations, annual vetting of group membership will be required, as well as updates to data management planning if changes (duration, disposition, etc.) are needed.
  
=== Appendix ===
  
**Appendix A: RDE Standard Allocation**
  
  * Individual researcher: 500GB (in addition to 50GB home directory space for HPC users)
  * Lab group: 5TB
  * Duration/renewal cycle: Annual
  
**Appendix B: RDE Advanced Allocation**
  
  * Capacities and duration determined in consultation.
  * Cost per TB per year (equipment/licensing): $95 for high performance storage, $25 for general purpose storage*
  * **Supplemental services**
    * Snapshotting (pending implementation and potential cost evaluation)
    * Performance optimization
    * Backup (pending implementation and potential cost evaluation)
    * Archival (pending implementation and potential cost evaluation)
    * Globus endpoint
  
**Note**: All currently available storage is high performance. As capacity is consumed, general purpose (lower tier) storage will be added to the hardware environment, and data priced as “general purpose” will be subject to automatic migration to the lower tier.
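As a quick back-of-envelope illustration of the Advanced Allocation cost model, assuming the Appendix B rates of $95 and $25 per TB per year (which are subject to change under DRII guidance), a minimal sketch:

```python
# Back-of-envelope RDE Advanced Allocation cost, using the Appendix B rates
# ($95/TB/year high performance, $25/TB/year general purpose). Rates are
# subject to change under DRII guidance; confirm with RSS before budgeting.

RATE_PER_TB_YEAR = {"high_performance": 95, "general_purpose": 25}

def annual_cost(tb, tier="high_performance"):
    """Return the yearly dollar cost for `tb` terabytes on the given tier."""
    return tb * RATE_PER_TB_YEAR[tier]

# Example: a hypothetical 20TB high-performance allocation for a 3-year project
print(annual_cost(20))      # 1900 per year
print(annual_cost(20) * 3)  # 5700 over three years
```

Note that data priced as “general purpose” may be migrated automatically to the lower storage tier as that tier comes online, so the tier chosen at allocation time is a pricing category rather than a permanent hardware placement.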
  
  
=====Getting Help=====
====Office Hours====
  
RSS office hours are now virtual. In-person library RSS office hours have been suspended until further notice. Our team will still be available to help during office hours via Zoom.
  
  
^Office Hours ^Date and Time ^Location ^
|RSS |Wed 10:00 - 12:00 |https://umsystem.zoom.us/j/910886067|
|Engineering/RSS |Mon, Tue, Thu 2:00 - 4:00 |https://umsystem.zoom.us/j/910886067|
|BioCompute |Please message RSS or join the Zoom above for BioCompute questions ||

Note that the Zoom links above are password protected; please contact us to receive the session password.
  
We are also available to answer questions outside of these hours via email: itrss-support@umsystem.edu
  
====Grant Proposal Assistance====
  
The RSS team is here to help with grants. We offer consultations and project reviews that include, but are not limited to:
  
  * **Security Reviews**
  * **Vendor Quotes**
    * We work with university-approved vendors to get preferred pricing and support
  * **Letters of Support**
    * Some grants require proof of campus IT knowledge/support of the proposal
  * **Regional Partnerships**
    * We are active members in several different regional network groups (Great Plains Network, CIMUSE, Campus Champions, and more) that can be assets in finding partnerships for multi-institution projects.
  * **Data Management Plans**
    * MU Libraries has good resources for DMPs for many of the most common granting agencies: [[https://libraryguides.missouri.edu/datamanagement| MU Libraries Data Management]]
  * **Facilities Description** {{ :pub:hpc:RSS Overview.docx |(available here) }}