====Hellbender====
Hellbender is a traditional HPC research cluster built at MU in 2023 to support research efforts for UM researchers.

DRII has made it clear that the mission of Hellbender is to accelerate Mizzou Forward initiatives. There are 2 access levels made available for UM researchers.

Requesting access to Hellbender can be done through our [[https://
====The Foundry====
In all publications or products resulting from work performed using the Foundry, the NSF grant that provided funding for the Foundry must be acknowledged. This can be done by adding a sentence to this effect to the publication: “This work was supported in part by the National Science Foundation under Grant No. OAC-1919789.”
====Nautilus====
Researchers with workflows that involve AI, machine learning, simulation, or similar computation that can be parallelized at the job level may be interested in another available HPC resource: the NSF National Research Platform Nautilus cluster. NSF grants orchestrated by faculty at the University of Missouri and in the Great Plains Network have contributed substantially to this resource.

The Nautilus HPC system is a public cluster utilizing Kubernetes containerization. Its resources include 1,352 compute nodes, 32 NVIDIA A100 nodes, 26 petabytes of DDN storage, and a 200 Gbps NVIDIA Mellanox InfiniBand interconnect.

Users will need to learn how to use Kubernetes and containers, a very different system from the SLURM-based HPC systems we have at the University, but may find more resource availability.

Data on Nautilus is expected to be DCL 1 or 2; higher classifications of data are not appropriate for this cluster.

If you think that Nautilus may be a resource that can help your research excel, please let us know of your interest.
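On Kubernetes clusters such as Nautilus, the unit of batch work is a //Job// object described in YAML rather than a SLURM batch script. The following is a rough sketch only; the names, image, and resource values are illustrative placeholders, not Nautilus-specific settings (consult the NRP documentation for real values):

```yaml
# Minimal Kubernetes Job sketch -- all names and values are illustrative
# placeholders, not Nautilus-specific configuration.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  template:
    spec:
      containers:
        - name: compute
          image: python:3.11        # the container image replaces "module load"
          command: ["python", "-c", "print('hello from the cluster')"]
          resources:
            requests:               # analogous to requesting cores/memory in SLURM
              cpu: "2"
              memory: 4Gi
            limits:
              cpu: "2"
              memory: 4Gi
      restartPolicy: Never
```

A manifest like this is submitted with `kubectl apply -f job.yaml`, the rough analogue of `sbatch`.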
=====General Policies=====

The following are RSS policies and guidelines for different services and groups:
==Open Source Software==

Software installed cluster-wide must have an open source (https://
==Licensed Software==

Licensed software (any software that requires a license or agreement to be accepted) must follow the procurement process to protect users, their research, and the University. To ensure this, for RSS to install and support licensed software, RSS must manage the license and the license server.
==Singularity Support==
A majority of scientific software and software libraries can be installed in users’ accounts or in group space. We also provide limited support for Singularity (https://
====Use Policy====

====Data Policy====
=== Research Data Ecosystem Storage (RDE) Allocation Model ===

The Research Data Ecosystem (RDE) is a collection of technical resources supporting research data. These resources are intended to meet researcher needs for physical storage, data movement, and institutional management. They are provided in partnership with the Division of Research Innovation and Impact (DRII) and are intended to work in conjunction with DRII policy drivers and priorities.

The Ecosystem brings together storage platforms, data movement, metadata management, data protection, technical and practice expertise, and administration to fully support the research data lifecycle.

DRII has invested in RDE as an institutional resource. Details of the specific underlying platforms may change over time, but are always directed towards ease of use, access, and performance to purpose.

Throughout May 2023, portions of the RDE are moving into production. These include:

  * On-premises high-performance and general-purpose research storage
  * Globus for advanced data movement
  * Specialized need consultation
  * Data archival

These resources work in conjunction with RSS services related to grant support and HPC infrastructure:

  * Data backup
  * Data analytics and reporting

We invite researchers needing solutions (including those dependent on resources not yet generally available) to consult with RSS. We may be able to find effective workarounds or make use of pilot projects when appropriate. We are committed to finding solutions supporting your research productivity needs!
=== Resources Available to All Researchers ===

All researchers are eligible for the RDE Standard Allocation. The Standard Allocation provides a base level of storage suitable for use in connection with High Performance Computing clusters, or for general-purpose lab shares (SMB or NFS). The exact capacity is subject to change; current capacities are listed in Appendix A.
=== RDE Advanced Allocation ===

For needs beyond the RDE Standard Allocation, researchers may request an RDE Advanced Allocation.

All RDE allocations must include appropriate data management planning. A plan may be a formal Data Management Plan associated with a grant, or an operational workflow description appropriate for a core or service entity, as long as data protection, transfer, and disposition requirements are documented.

Advanced allocations require consultation with RSS. RSS will work with researchers to match allocated resources with capacity, performance, and related requirements.

**Priority Designation**

RDE Advanced Allocations are eligible for DRII Priority Designation. This means DRII has determined that the proposed use case (such as a core or grant-funded project) presents a strategic advantage or high-priority service to the University and agrees to subsidize the resources assigned in that designation. DRII is responsible for determining the criteria for Priority Designation.

**Traditional Investment**

RDE Advanced Allocations that are not approved for DRII Priority Designation, or that inherently receive funding for storage, may be treated as traditional investments.

**Data Compliance**

By default, the RDE environment supports data classified as DCL 1 or 2. It may be possible to arrange for a higher DCL, but this must be vetted and approved by appropriate security and compliance authorities.

**Allocation Maintenance**

=== Appendix ===

**Appendix A: RDE Standard Allocation**
  * Individual researcher: 500GB (in addition to 50GB home directory space for HPC users)
  * Lab group: 5TB
  * Duration/

**Appendix B: RDE Advanced Allocation**

  * Capacities and duration determined in consultation.
  * Cost per TB per year (equipment/
  * **Supplemental services**
    * Snapshotting (pending implementation and potential cost evaluation)
    * Performance optimization
    * Backup (pending implementation and potential cost evaluation)
    * Archival (pending implementation and potential cost evaluation)
    * Globus endpoint

**Note**: All currently available storage is high performance. As capacity is consumed, general-purpose (lower-tier) storage will be added to the hardware environment, and data priced as “general purpose” will be subject to automatic migration to the lower tier.
=====Getting Help=====
====Office Hours====

RSS office hours are now virtual. In-person library RSS office hours have been suspended until further notice. Our team will still be available to help during office hours via Zoom.
^Office Hours ^Date and Time ^Location ^
|RSS |Wed 10:00 - 12:00 | |
|Engineering/ | | |
|BioCompute |Please message RSS or join the Zoom above for BioCompute questions ||

Note that the above Zoom links are password protected; please contact us to receive the session password.

We are also available to answer questions outside of these hours via email: itrss-support@umsystem.edu

====Grant Proposal Assistance====
The RSS team is here to help with grants. We offer consultations and project reviews that include but are not limited to:

  * **Security Reviews**
  * **Vendor Quotes**
    * We work with university-approved vendors to get preferred pricing and support.
  * **Letters of Support**
    * Some grants require proof of campus IT knowledge/
  * **Regional Partnerships**
    * We are active members in several regional network groups (Great Plains Network, CIMUSE, Campus Champions, and more) that can be assets in finding partnerships for multi-institution projects.
  * **Data Management Plans**
    * MU Libraries has good resources for DMPs for many of the most common granting agencies: [[https://
  * **Facilities Description**