====== The Mill ======
===== Request an account =====
You can request an account on the Mill by filling out the account request form at help.mst.edu in the HPC Cluster category: [[https://tdx.umsystem.edu/TDClient/48/Portal/Requests/TicketRequests/NewForm?ID=PEUIinShX0k_&RequestorType=Service|help.mst.edu]]
===== System Information =====
====DOI and Citing the Mill====
Please ensure you cite The Mill's DOI in any publications in which The Mill's resources were utilized. The DOI is https://doi.org/10.71674/PH64-N397

Please also feel free to use these files with your citation manager to create formatted citations.

BibTeX Citation:
<file bib mill_cluster_citation.bib>
@Article{Gao2024,
  author    = {Gao, Stephen and Maurer, Jeremy and Information Technology Research Support Solutions},
  title     = {The Mill HPC Cluster},
  year      = {2024},
  doi       = {10.71674/PH64-N397},
  language  = {en},
  publisher = {Missouri University of Science and Technology},
  url       = {https://scholarsmine.mst.edu/the-mill/1/},
}
</file>
RIS Citation:
<file ris mill_cluster_citation.ris>
TY  - JOUR
AU  - Gao, Stephen
AU  - Maurer, Jeremy
AU  - Information Technology Research Support Solutions
DO  - 10.71674/PH64-N397
LA  - en
PU  - Missouri University of Science and Technology
PY  - 2024
ST  - The Mill HPC Cluster
TI  - The Mill HPC Cluster
UR  - https://scholarsmine.mst.edu/the-mill/1/
ER  - 
</file>
==== Software ====
The Mill is built and managed with Puppet. The underlying OS for the Mill is Alma 8.9. For resource management and scheduling, we are using the SLURM workload manager, version 22.05.2.
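
If you want to confirm the software environment from a login node, here is a minimal sketch using standard Linux and SLURM commands (no Mill-specific tooling assumed):

<code bash>
# Show the operating system release (the Mill runs Alma 8.9)
cat /etc/os-release

# Show the SLURM version in use (22.05.x at the time of writing)
sinfo --version
</code>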
| Model | CPU Cores | System Memory | Node Count |
| Dell R6525 | 128 | 512 GB | 25 |
| Dell C6525 | 64 | 256 GB | 160 |
| Dell C6420 | 40 | 192 GB | 44 |
  
  
| Model | CPU cores | System Memory | GPU | GPU Memory | GPU Count | Node Count |
| Dell XE9680 | 112 | 1 TB | H100 SXM5 | 80 GB | 8 | 1 |
| Dell C4140 | 40 | 192 GB | V100 SXM2 | 32 GB | 4 | 6 |
| Dell R740xd | 40 | 384 GB | V100 PCIe | 32 GB | 2 | 1 |
  
The Mill home directory storage is served from an NFS share backed by our enterprise SAN, meaning your home directory is the same across the entire cluster. This storage provides 10 TB of raw capacity, limited to 50 GB per user. **This volume is not backed up; we do not provide any data recovery guarantee in the event of a storage system failure.** System failures where data loss occurs are rare, but they do happen. All this to say, you **should not** be storing the only copy of your critical data on this system. Please contact us if you require more storage and we can provide you with the currently available options.
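
To see how much of your 50 GB allocation you are currently using, here is a minimal sketch with standard tools (exact quota reporting on the NFS share may differ):

<code bash>
# Total size of everything under your home directory
du -sh $HOME

# Usage of the filesystem your home directory is mounted from
df -h $HOME
</code>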
  
==Scratch Directories==
  
In addition to your 50 GB home directory, you also have access to a high-performance, network-mounted scratch storage space. This is meant for temporary storage of files that are actively being used for computations, not for anything resembling permanent or project storage. If you have intermediate results or files that you need to keep between jobs submitted back-to-back, this is a good place to store them. To change directory to this storage you can use the "cdsc" command, which we have provided for ease of use. The path for this area is "/share/ceph/scratch/$USER", where $USER is the username you use to log in to the Mill. We may clean this space during maintenance, and we reserve the right to clean it immediately if that becomes necessary for continued quality of service, so make sure you are not storing anything there that you cannot afford to lose or do not have backed up elsewhere.
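
As an illustration, here is a sketch of a job script that stages work through this network scratch area; the input/output file names and the program being run are hypothetical:

<code bash>
#!/bin/bash
#SBATCH --job-name=net_scratch_example
#SBATCH --time=01:00:00

# Work in a per-job directory under your network scratch area
# (the same location the cdsc command takes you to)
SCRATCH=/share/ceph/scratch/$USER/$SLURM_JOB_ID
mkdir -p $SCRATCH

# Stage input from home, run in scratch, then copy results back to home
cp $HOME/input.dat $SCRATCH/
cd $SCRATCH
./my_program input.dat > results.out   # hypothetical program
cp results.out $HOME/

# Network scratch may be cleaned, so copy out anything you need to keep
</code>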
  
There is also local scratch on each compute node for use during calculations, at /local/scratch. On the Mill this is 1.5 TB in size. If you use this space, we request that you add a cleanup command to your job to delete your files when the job finishes, so the space remains available for others. Files stored in this space are only visible to processes running on the node where they were created: if you write a file to /local/scratch in a job, you won't be able to see it from the login node, and other processes won't be able to see it if they run on a different node than the one that created the file.
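
Below is a minimal sketch of using /local/scratch inside a job, including the cleanup step requested above; the program and file names are hypothetical:

<code bash>
#!/bin/bash
#SBATCH --job-name=local_scratch_example
#SBATCH --time=01:00:00

# Per-job directory on the node-local scratch disk
LOCAL=/local/scratch/$USER/$SLURM_JOB_ID
mkdir -p $LOCAL

# Copy inputs to the node, run there, then copy results back off the node
cp $HOME/input.dat $LOCAL/
cd $LOCAL
./my_program input.dat > results.out   # hypothetical program
cp results.out $HOME/

# Cleanup so the space stays available for others
cd && rm -rf $LOCAL
</code>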
  
==Leased Space==
  
If the home directory and scratch space aren't enough for your storage needs, we also lease out quantities of cluster-attached space. If you are interested in leasing storage, please contact us. Additional information on the STRIDE storage allocations can be found in the [[https://mailmissouri.sharepoint.com/:b:/s/MUandSTITRSS-Ogrp/EVgQcegipVtPgs4Rc9tcqAcBxWf2qEXeZawWYSrco7sKpQ|STRIDE storage model]].
Below is a cost model of our storage offerings:

Vast Storage Cluster:
| Total Size | 250 TB |
| Storage Technology | Flash |
| Primary Purpose | High Performance Computing Storage |
| Cost | $160/TB/Year |


Ceph Storage Cluster:
| Total Size | 800 TB |
| Storage Technology | Spinning Disk |
| Primary Purpose | HPC-attached Utility Storage |
| Cost | $100/TB/Year |
  
  
| general | 2 days | 800 MB |
| gpu     | 2 days | 800 MB |
| interactive | 4 hours | 800 MB |
| rss-class | 4 hours | 2 GB |
| any priority partition | 28 days | varies by hardware |
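
As a quick illustration, here is a minimal job header that targets the general partition and stays within its 2-day limit (the program name is hypothetical):

<code bash>
#!/bin/bash
#SBATCH --partition=general   # partition from the table above
#SBATCH --time=2-00:00:00     # at or below the 2-day limit
#SBATCH --ntasks=1

./my_program   # hypothetical program
</code>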
  
  

==== Priority Partition Leasing ====

For the full information on our computing model, please visit [[https://mailmissouri.sharepoint.com/:b:/s/MUandSTITRSS-Ogrp/EcDtEFkTU6xPr2hh9ES4hCcBfDOiGto7OZqbonsU9m6qdQ?e=owPLpd&xsdata=MDV8MDJ8fGE4ZjUwZGQzMDU0MTRlNDAzNzAxMDhkY2MyMjgyYmMyfGUzZmVmZGJlZjdlOTQwMWJhNTFhMzU1ZTAxYjA1YTg5fDB8MHw2Mzg1OTg3MjQ5NjgzOTY1NDV8VW5rbm93bnxWR1ZoYlhOVFpXTjFjbWwwZVZObGNuWnBZMlY4ZXlKV0lqb2lNQzR3TGpBd01EQWlMQ0pRSWpvaVYybHVNeklpTENKQlRpSTZJazkwYUdWeUlpd2lWMVFpT2pFeGZRPT18MXxMM1JsWVcxekx6RTVPbUUwT0dWbVkyVXlOREF6WmpRM1lUazRNbUV6WkdKaE56ZzNNakV4WkRGalFIUm9jbVZoWkM1MFlXTjJNaTlqYUdGdWJtVnNjeTh4T1RwaVl6SmlOak14T1RZMllXVTBZell3WWpCbU5qZzJObUUzTjJZeU1tVTRORUIwYUhKbFlXUXVkR0ZqZGpJdmJXVnpjMkZuWlhNdk1UY3lOREkzTlRZNU5qRXdOQT09fDFhYzdjZjQ4Mzg4YTQwODQzNzAxMDhkY2MyMjgyYmMyfDdkNTA4MmU3OGJmOTQ5YmZiZGI1ZGFhMjMyZWMzMmQx&sdata=cUJJZ2hxMjVZc1VNeVowajEyV29sNG5ZcDJVcGtSNHdIODZLY1EwZm1QRT0%3D|The Mill Computing Model]], which provides more information on what a priority partition is.

Below is a list of hardware which we have available for priority leases:

| | C6525 | R6525 | C4140 |
| CPU type | AMD 7502 | AMD 7713 | Intel 6248 |
| CPU count | 2 | 2 | 2 |
| Core count | 64 | 128 | 40 |
| Base Clock (GHz) | 2.5 | 2.0 | 2.5 |
| Boost Clock (GHz) | 3.35 | 3.675 | 3.2 |
| GPU | N/A | N/A | Nvidia V100 |
| GPU Count | 0 | 0 | 4 |
| GPU RAM (GB) | 0 | 0 | 32x4 |
| RAM (GB) | 256 | 512 | 192 |
| Local Scratch (TB) | 2.6 SSD | 1.6 NVMe | 1.6 NVMe |
| Network | HDR-100 | HDR-100 | HDR-100 |
| Internal Bandwidth | 100 Gb/s | 100 Gb/s | 100 Gb/s |
| Latency | <600 ns | <600 ns | <600 ns |
| Priority lease ($/year) | $3,368.30 | $4,379.80 | $7,346.06 |
| Current Quantity | 160 | 25 | 6 |

==== Researcher Funded Nodes ====
Researcher-funded hardware will gain priority access for a minimum of 5 years. Hosting fees start at $1,200 per year and are hardware dependent. The fees are broken down as follows:

| Fee | Cost | Annual Unit of Measure |
| Networking Fee | $90 | Per Network Connection |
| Rack Space | $260 | Per Rack U |
| RSS Maintenance | $850 | Per Node |
===== Quick Start =====
  
Now you should see an a.out executable in your current working directory; this is your MPI-compiled code that we will run when we submit it as a job.
  
==== Parallelizing your Code ====

The following link provides basic tutorials and examples for parallel code in Python, R, Julia, Matlab, and C/C++.

[[https://berkeley-scf.github.io/tutorial-parallelization/]]
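
The tutorials above cover parallelism inside your code; as a complementary sketch, here is one common SLURM-level pattern for running several independent tasks in parallel within a single job (the task command and inputs are hypothetical):

<code bash>
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --time=01:00:00

# Launch four independent job steps at once, one per allocated task slot
for i in 1 2 3 4; do
    srun --ntasks=1 --exact ./my_task input_$i &   # hypothetical task command
done
wait   # block until all background steps finish
</code>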
  
==== Submitting an MPI job ====
<code> sbatch array_test.sub </code>
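
After submitting, you can check on the job with a standard SLURM command; array tasks will appear individually as they are scheduled:

<code bash>
# List your queued and running jobs
squeue -u $USER
</code>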
  
===== Priority Access =====
More information on priority access leases is coming soon; see the Priority Partition Leasing section above for current offerings.
  
===== Applications =====