====== The Foundry ======
====EOL PLAN!!====
**THE FOUNDRY WILL BE DECOMMISSIONED IN JUNE 2024**
+ | |||
+ | The Foundry will no longer have compute resources as of June 1st 2024. | ||
+ | |||
+ | The login nodes will be shut down on June 3rd. | ||
+ | |||
+ | Scratch storage will be shut down on June 4th. | ||
+ | |||
+ | The Globus node will shut down on June 30th 2024. | ||
+ | |||
+ | You will be able to transfer data with Globus through June 30th 2024 from your home directory. | ||
+ | ===== System Information ===== | ||
As of 22 Jan 2024 we will not be creating new Foundry accounts. Please look into requesting an account on the replacement cluster instead.
+ | ==== Software ==== | ||
The Foundry was built and managed with Puppet. The underlying OS for the Foundry is Ubuntu 18.04 LTS. With the Foundry we made the conversion from CentOS to Ubuntu.
+ | |||
+ | ==== Hardware ==== | ||
+ | ===Management nodes=== | ||
+ | |||
The head nodes are virtual servers; the login nodes match the configuration of one of the compute node types.
+ | |||
+ | |||
+ | ===Compute nodes=== | ||
+ | |||
+ | The newly added compute nodes are Dell C6525 nodes configured as follows. | ||
+ | |||
Dell C6525: a 4-node chassis, with each node containing dual 32-core AMD EPYC Rome 7452 CPUs, 256 GB of DDR4 RAM, and six 480 GB SSDs in RAID 0.
+ | |||
+ | As of 06/17/21 we currently have over 11,000 cores of compute capacity on the Foundry. | ||
+ | |||
+ | ===GPU nodes=== | ||
+ | |||
+ | The newly added GPU nodes are Dell C4140s configured as follows. | ||
+ | |||
Dell C4140: a 1-node chassis with 4 Nvidia V100 GPUs connected via NVLink, interconnected with other nodes via HDR-100 InfiniBand. Each node has dual 20-core Intel processors and 192 GB of DDR4 RAM.
+ | |||
+ | As of 06/17/21 we currently have 24 V100 GPUs available for use. | ||
+ | |||
+ | ===Storage=== | ||
+ | |||
+ | ==General Policy Notes== | ||
+ | |||
None of the cluster-attached storage is backed up, so do not treat it as the only copy of important data.
+ | |||
+ | ==Home Directories== | ||
+ | |||
+ | The Foundry home directory storage is available from an NFS share backed by our enterprise SAN, meaning your home directory is the same across the entire cluster. | ||
+ | |||
+ | ==Scratch Directories== | ||
+ | |||
+ | Each user will get a scratch directory created for them at / | ||
+ | |||
Along with the networked scratch space, there is local scratch space on each compute node in /tmp for use during calculations. There is no quota placed on this space.
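Below is a minimal sketch of staging work through node-local /tmp scratch inside a job script; the input file, executable name, and paths are placeholders, not part of the Foundry documentation.
<file bash local_scratch.sub>
#!/bin/bash
#SBATCH --job-name=local_scratch_example
#SBATCH --ntasks=1
#SBATCH --time=0-01:00:00

# Stage input to node-local scratch, run there, and copy results back
# before the job ends (local /tmp is not reachable after the job finishes).
cp "$SLURM_SUBMIT_DIR/input.dat" /tmp/input.dat
cd /tmp
"$SLURM_SUBMIT_DIR/my_solver" input.dat > results.out
cp /tmp/results.out "$SLURM_SUBMIT_DIR/"
</file>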
+ | |||
+ | ==Leased Space== | ||
+ | |||
If home directory and scratch space availability aren't enough for your storage needs, we also lease out quantities of cluster-attached space. If you are interested in leasing storage please contact us. If you are already leasing storage but need a reference guide on how to manage it, please go [[ ~:storage | here]].
+ | |||
+ | |||
+ | |||
+ | ==== Policies ==== | ||
+ | ** | ||
+ | __Under no circumstances should your code be running on the login node.__** | ||
+ | |||
You are allowed to install software in your home directory for your own use. Know that you will **NOT** be given root/sudo access, so if your software requires it you will not be able to use that software without our help. Contact ITRSS about having such software installed for you.
+ | |||
User data on the Foundry is **not backed up**, meaning it is your responsibility to back up important research data to a location off site via any of the methods in the [[#moving_data|Moving Data]] section of this page.
+ | |||
If you are a student, your jobs can run on any compute node in the cluster, even the ones dedicated to researchers; however, jobs placed on dedicated hardware may be requeued when a higher-priority job needs those nodes.
+ | |||
If you are a researcher who has purchased a priority lease, you will need to submit your job to your priority partition; otherwise your job will fall into the same partition which contains all nodes. Jobs submitted to your priority partition will requeue any job running on the node you need in a lower-priority partition. This means that even your own jobs, if running in the requeue partition, are subject to being requeued by your higher-priority job. It also means that other users with access to your priority partition may submit jobs that will compete with yours for resources, but they will not bump yours into requeued status. If you submit your job to your priority partition it will run to completion, failure, or until it runs through the entire execution time you've given it.
+ | |||
+ | If you are a researcher who has purchased an allocation of CPU hours you will run on all nodes at the same priority as the students. Your job will not run on any dedicated nodes and will be susceptible to preemption by any other user unless you submit it to the non-dedicated pool of nodes. | ||
+ | |||
In all publications or products resulting from work performed using the Foundry, the NSF grant which provided funding for the Foundry must be acknowledged.
\\ | \\ | ||
+ | ==== Partitions ==== | ||
The hardware in the Foundry is split up into separate groups, or partitions. Some hardware is in more than one partition; if you do not define which partition to use, your job will fall into the default partition, requeue. However, there are a few cases where you will want to assign a job to a specific partition. Please see the table below for a list of the limits or default values given to jobs based on the partition. The most important thing to note is how long you can request your job to run.
+ | |||
+ | | Partition | Time Limit | Default Memory per CPU | | ||
+ | | requeue | 7 days | 800MB| | ||
+ | | general | 14 days | 800MB| | ||
+ | | any priority partition | 30 days | varies by hardware| | ||
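For example, to send a job to the general partition with a 10-day run time, you would add directives like the following to your submission script (the values here are illustrative, not requirements):
<code>
#SBATCH -p general
#SBATCH --time=10-00:00:00
</code>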
+ | |||
+ | |||
+ | ===== Quick Start ===== | ||
+ | |||
+ | We have created a quick start video, it can be found at. [[https:// | ||
+ | |||
+ | We also have provided written instruction below which you may use for quick reference if needed. | ||
+ | |||
+ | ==== Logging in ==== | ||
+ | === SSH (Linux)=== | ||
+ | |||
Open a terminal and type <code>ssh username@foundry.mst.edu</code>, replacing username with your campus SSO username.
+ | Enter your sso password | ||
+ | |||
+ | Logging in places you onto the login node. __Under no circumstances should you run your code on the login node.__ | ||
+ | |||
+ | If you are submitting a batch file, then your job will be redirected to a compute node to be computed. | ||
+ | |||
However, if you are attempting to use a GUI, ensure that you __do not run your session on the login node__ (example prompt: username@login-44-0). Use an interactive session to be directed to a compute node to run your software.
+ | |||
<code>sinteractive</code>
+ | |||
For a further description of sinteractive, see the [[#interactive_jobs|Interactive jobs]] section below.
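As a sketch, if you need more than the default resources you can pass Slurm-style options through sinteractive; the exact switches accepted by the local wrapper are an assumption here, so check <code>sinteractive --help</code> on the login node:
<code>
sinteractive --time=02:00:00 --ntasks=4
</code>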
+ | |||
+ | === Putty (Windows)=== | ||
+ | |||
+ | Open Putty and connect to foundry.mst.edu using your campus SSO. | ||
+ | |||
+ | {{: | ||
+ | |||
+ | === Off Campus Logins === | ||
+ | |||
Our off-campus logins use public key authentication only; password authentication is disabled for off-campus users unless they are connected to the campus VPN. To learn how to connect from off campus, please see our how-to on [[ ~:
+ | |||
+ | ==== Submitting a job ==== | ||
+ | |||
Using Slurm, you need to create a submission script to execute on the backend nodes, then use a command-line utility to submit the script to the resource manager. See below for the contents of a general submission script, complete with comments.
+ | == Example Job Script == | ||
+ | <file bash batch.sub> | ||
+ | #!/bin/bash | ||
+ | #SBATCH --job-name=Change_ME | ||
+ | #SBATCH --ntasks=1 | ||
#SBATCH --time=0-00:10:00
#SBATCH --mail-type=begin,end,fail
+ | #SBATCH --export=all | ||
+ | #SBATCH --out=Foundry-%j.out | ||
+ | |||
+ | # %j will substitute to the job's id | ||
+ | #now run your executables just like you would in a shell script, Slurm will set the working directory as the directory the job was submitted from. | ||
+ | #e.g. if you submitted from / | ||
+ | |||
+ | # | ||
+ | echo "this is a general submission script" | ||
+ | echo " | ||
+ | |||
</file>
+ | |||
+ | |||
+ | Now you need to submit that batch file to the scheduler so that it will run when it is time. | ||
+ | |||
<code>sbatch batch.sub</code>
+ | |||
You will see the output of sbatch after the job submission; it will give you the job number. If you would like to monitor the status of your jobs, you may do so with the squeue command described in the [[#monitoring_your_jobs|Monitoring your jobs]] section.
+ | |||
+ | == Common SBATCH Directives == | ||
| **Directive** | **Valid Values** | **Description**|
| --job-name=| string value, no spaces | Sets the job name to something more friendly; useful when examining the queue.|
| --ntasks=| integer value | Sets the number of CPUs requested for the job.|
| --nodes=| integer value | Sets the number of nodes you wish to use; useful if you want all your tasks to land on one node.|
| --time=| D-HH:MM:SS, HH:MM:SS | Sets the allowed run time for the job; accepted formats are listed in the valid values column.|
| --mail-type=| begin, end, fail, all | Sets which job events trigger an email to the address given in --mail-user.|
| --mail-user=| email address | Sets the mailto address for this job.|
| --export=| ALL, or specific variable names | By default Slurm exports the current environment variables, so all loaded modules will be passed to the environment of the job.|
| --mem=| integer value | Amount of memory in MB you would like the job to have access to; each queue has default memory-per-CPU values set, so unless your executable runs out of memory you will likely not need this directive.|
| --mem-per-cpu=| integer | Amount of memory in MB you want per CPU; default values vary by queue but are typically greater than 1000 MB.|
| --nice= | integer | Allows you to lower a job's priority if you would like other jobs set to a higher priority in the queue; the higher the nice number, the lower the priority.|
| --constraint= | please see the sbatch man page for usage | Used only if you want to constrain your job to run on resources with specific features; please see the next table for a list of valid features to request constraints on.|
| --gres= | name:count | Allows the user to reserve additional resources on the node, specifically GPUs on our cluster, e.g. --gres=gpu:2 requests two GPUs per node.|
| -p | partition_name | Not typically used; if not defined, jobs get routed to the highest-priority partition your user has permission to use. You may specify a lower-priority partition if it has higher resource availability.|
+ | |||
+ | == Valid Constraints == | ||
+ | | **Feature**| **Description**| | ||
+ | | intel | Node has intel CPUs | | ||
+ | | amd | Node has amd CPUs | | ||
| EDR | Node has an EDR (100Gbit/s) InfiniBand interconnect |
| FDR | Node has a FDR (56Gbit/s) InfiniBand interconnect |
| QDR | Node has a QDR (36Gbit/s) InfiniBand interconnect |
| DDR | Node has a DDR (16Gbit/s) InfiniBand interconnect |
+ | | serial | Node has no high speed interconnect | | ||
+ | | gpu | Node has GPU acceleration capabilities | | ||
+ | | cpucodename* | Node is running the codename of cpu you desire e.g. rome | | ||
+ | |||
Note that if some combination of your constraints and requested resources is unfillable, you will get a submission error when you attempt to submit your job.
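As an illustrative sketch (the job name and executable below are placeholders), a script that combines a feature constraint with a GPU reservation might look like this:
<file bash gpu_constraint_example.sub>
#!/bin/bash
#SBATCH --job-name=gpu_constraint_example
#SBATCH --ntasks=4
#SBATCH --time=0-02:00:00
#SBATCH --constraint=gpu       # restrict the job to nodes with GPU acceleration
#SBATCH --gres=gpu:1           # and reserve one GPU on that node

./my_gpu_program               # hypothetical executable
</file>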
+ | |||
+ | ==== Monitoring your jobs ==== | ||
+ | |||
To see the state of your queued and running jobs, use squeue:
<code>
squeue -u username
 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   719 ...
</code>
+ | |||
==== Cancel your job ====
+ | scancel - Command to cancel a job, user must own the job being cancelled or must be root. | ||
<code>scancel jobnumber</code>
+ | ==== Viewing your results ==== | ||
+ | |||
Output from your submission will go into an output file in the submission directory; this will either be slurm-jobnumber.out or whatever you defined in your submission script. In our example script we set this to Foundry-jobnumber.out, where jobnumber is the numeric id the scheduler assigned to your job.
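For example, to watch a job's output as it is being written (the job number below is hypothetical):
<code>
tail -f Foundry-123456.out
</code>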
+ | ==== Moving Data ==== | ||
+ | |||
+ | Moving data in and out of the Foundry can be done with a few different tools depending on your operating system and preference. | ||
+ | |||
+ | ===Globus=== | ||
+ | |||
The Foundry has a Globus endpoint configured, which will allow you to move data in and out if you sign in to [[https://www.globus.org|Globus]].
+ | |||
After signing in you will need to find the endpoint you are going to move data to/from. If you are moving data from one Globus endpoint to another, e.g. Forge to Foundry, you will need to find both of them using the search tool. They are named appropriately so that you can find them easily.
+ | |||
Once you have connected your account with these endpoints, you can start transfers between them through the web interface.
+ | |||
+ | You can install Globus software to create a personal endpoint on any number of your personal devices to move data back and forth from them to The Foundry as well. | ||
+ | |||
+ | Predrag has made a short video on using globus if you'd like to get a better idea of how this all looks. [[https:// | ||
+ | |||
+ | |||
+ | ===DFS volumes=== | ||
+ | |||
+ | Missouri S&T users can mount their web volumes and S Drives with the < | ||
+ | |||
+ | You can un-mount your user directories with the < | ||
+ | |||
+ | === Windows === | ||
+ | |||
+ | ==WinSCP== | ||
+ | |||
Using WinSCP, connect to foundry.mst.edu using your SSO just as you would with ssh or PuTTY, and you will be presented with the contents of your home directory. You can then drag files into the WinSCP window and drop them in the folder you want them in, and the copy will begin. It works the same way in the opposite direction to get data back out.
+ | |||
+ | ==Filezilla== | ||
+ | |||
+ | Using Filezilla you connect to foundry.mst.edu using your SSO and you will have the contents of your home directory displayed, drag and drop works with Filezilla as well. | ||
+ | |||
+ | ==Git== | ||
+ | |||
+ | git is installed on the cluster and is recommended to keep track of code changes across your research. See [[https:// | ||
+ | |||
+ | |||
+ | === Linux === | ||
+ | |||
+ | == Filezilla == | ||
+ | |||
+ | See windows instructions | ||
+ | |||
+ | == scp == | ||
+ | |||
scp is a command line utility that allows for secure copies from one machine to another through ssh; scp is available on most Linux distributions. If I wanted to copy a file in using scp, I would open a terminal on my workstation and issue the following command.
+ | |||
+ | < | ||
+ | |||
It will then ask me to authenticate using my campus SSO, then copy the file from its location on my local machine to my home directory on the Foundry.
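A minimal sketch of such a command, with placeholder path, filename, and username (only foundry.mst.edu is taken from this page):
<code>
scp /home/localuser/data.tar.gz username@foundry.mst.edu:~/
</code>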
+ | |||
+ | == rsync == | ||
+ | |||
rsync is a more powerful command line utility than scp; it has a simpler syntax and checks whether a file has actually changed before performing the copy. See the man page for usage details or [[http://
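A hedged example of mirroring a local project directory to the Foundry (paths and username are placeholders):
<code>
rsync -av /home/localuser/project/ username@foundry.mst.edu:~/project/
</code>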
+ | |||
+ | == git == | ||
+ | |||
See the git instructions under Windows above; usage on Linux is identical.
+ | |||
+ | |||
+ | ==== Modules ==== | ||
+ | |||
An important concept for running on the cluster is modules. Unlike a traditional computer, where you can run every program from the command line after installing it, on the cluster we install programs to a central software location and expose them through environment modules; you must load a program's module before you can use it.
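The day-to-day module commands look like the following; the openmpi module name is just an example of something you might load:
<code>
module avail            # list every module available to load
module load openmpi     # add a module to your environment
module list             # show the modules you currently have loaded
module unload openmpi   # remove a single module
module purge            # remove all loaded modules
</code>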
+ | |||
+ | Here is the output of module avail as of 03/18/2020 | ||
<code>
blspcy@login-14-42:/...$ module avail
... (available application, library, and compiler modules, grouped by module tree) ...
</code>
+ | |||
+ | ==== Compiling Code ==== | ||
+ | |||
There are several compilers available through modules; to see the full list of modules run <code>module avail</code>. The MPI-enabled compiler modules are named with the convention
+ | |||
+ | MPI_PROTOCOL/ | ||
+ | e.g openmpi/ | ||
+ | The exception to this rule of naming is the intelmpi, which is just intelmpi/ | ||
+ | |||
+ | After you have decided which compiler you want to use you need to load it. | ||
+ | |||
+ | < | ||
+ | |||
+ | |||
Then compile your code: use mpicc for C code and mpif90 for Fortran code. Here is an MPI hello world in C.
+ | |||
<file c helloworld.c>
/* C Example */
#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
  int rank, size;

  MPI_Init (&argc, &argv);                  /* start MPI */
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);    /* get the rank of this process */
  MPI_Comm_size (MPI_COMM_WORLD, &size);    /* get the total number of processes */
  printf( "Hello world from process %d of %d\n", rank, size );
  MPI_Finalize();
  return 0;
}
</file>
+ | |||
+ | Use mpicc to compile it. | ||
+ | |||
<code>mpicc helloworld.c</code>
+ | |||
Now you should see an a.out executable in your current working directory; this is your MPI-compiled code that we will run when we submit it as a job.
+ | |||
+ | |||
+ | **IMPORTANT NOTE!!** | ||
+ | |||
The **openmpi** based MPI libraries will throw errors about not being able to initialize the fabric when you use mpirun on your compiled code. These errors are **false errors**: your job is running, and running fine. The error is a bug introduced by having the newest InfiniBand cards, which aren't yet fully supported by the libraries.
+ | ==== Submitting an MPI job ==== | ||
+ | |||
You need to be sure that you have the same module loaded in your job environment as you did when you compiled the code, to ensure that the compiled executables will run correctly. You may either load the modules before submitting the job and use the directive <code>#SBATCH --export=all</code>, or load them inside your submission script as in the example below.
+ | |||
+ | <file bash helloworld.sub> | ||
+ | #!/bin/bash | ||
+ | #SBATCH -J MPI_HELLO | ||
+ | #SBATCH --ntasks=8 | ||
+ | #SBATCH --export=all | ||
+ | #SBATCH --out=Foundry-%j.out | ||
#SBATCH --time=0-00:10:00
#SBATCH --mail-type=begin,end,fail
+ | |||
+ | module load openmpi/ | ||
+ | mpirun ./a.out | ||
</file>
+ | |||
+ | Now we need to submit that file to the scheduler to be put into the queue. | ||
+ | |||
<code>sbatch helloworld.sub</code>
+ | |||
+ | You should see the scheduler report back what job number your job was assigned just as before, and you should shortly see an output file in the directory you submitted your job from. | ||
+ | |||
+ | |||
+ | ==== Interactive jobs ==== | ||
+ | |||
Some things can't be run with a batch script because they require user input, or you need to compile some large code and are worried about bogging down the login node. To start an interactive job simply use the <code>sinteractive</code> command.
+ | |||
If you will need a GUI window for whatever you are running inside the interactive job, you will need to connect to the Foundry with X forwarding enabled. For Linux this is simply adding the -X switch to the ssh command: <code>ssh -X username@foundry.mst.edu</code>
+ | |||
+ | ==== Job Arrays ==== | ||
+ | |||
If you have a large number of jobs you need to start, I recommend becoming familiar with job arrays; they allow you to submit one job file that starts up to 10000 jobs at once.
+ | |||
+ | One of the ways you can vary the input of the job array from task to task is to set a variable based on which array id the job is and then use that value to read the matching line of a file. For instance the following line when put into a script will set the variable PARAMETERS to the matching line of the file data.dat in the submission directory. | ||
+ | |||
<code>
PARAMETERS=$(awk -v line=${SLURM_ARRAY_TASK_ID} '{if (NR == line) { print $0; };}' ./data.dat)
</code>
+ | |||
+ | You can then use this variable in your execution line to do whatever you would like to do, you just have to have the appropriate data in the data.dat file on the appropriate lines for the array you are submitting. See the sample data.dat file below. | ||
+ | |||
+ | <file txt data.dat> | ||
+ | "I am line number 1" | ||
+ | "I am line number 2" | ||
+ | "I am line number 3" | ||
+ | "I am line number 4" | ||
+ | </ | ||
+ | |||
You can then submit your job as an array by using the --array directive, either in the job file or as an argument at submission time; see the example below.
+ | |||
+ | <file bash array_test.sub> | ||
+ | #!/bin/bash | ||
+ | #SBATCH -J Array_test | ||
+ | #SBATCH --ntasks=1 | ||
+ | #SBATCH --out=Foundry-%j.out | ||
#SBATCH --time=0-00:10:00
#SBATCH --mail-type=begin,end,fail
+ | |||
+ | |||
+ | PARAMETERS=$(awk -v line=${SLURM_ARRAY_TASK_ID} '{if (NR == line) { print $0; };}' ./data.dat) | ||
+ | |||
+ | echo $PARAMETERS | ||
+ | |||
</file>
+ | |||
+ | I prefer to use the array as an argument at submission time so I don't have to touch my submission file again, just the data.dat file that it reads from. | ||
+ | |||
<code>sbatch --array=1,2,4 array_test.sub</code>
+ | |||
+ | Will execute lines 1,2, and 4 of data.dat which echo out what line number they are from my data.dat file. | ||
+ | |||
+ | You may also add this as a directive in your submission file and submit without any switches as normal. Adding the following line to the header of the submission file above will accomplish the same thing as supplying the array values at submission time. | ||
+ | |||
<file bash array_test.sub>
#SBATCH --array=1,2,4
</file>
+ | |||
+ | Then you may submit it as normal | ||
+ | |||
<code>sbatch array_test.sub</code>
+ | |||
+ | |||
+ | ==== Checking your account usage ==== | ||
+ | |||
+ | If you have purchased a number of CPU hours from us you may check on how many hours you have used by issuing the < | ||
+ | |||
+ | **Note this is usage for your account, not your user.** | ||
+ | |||
+ | ===== Applications ===== | ||
** The applications portion of this wiki is a work in progress; not all applications are listed here, nor will they ever be, as the set of applications we support continually grows. **
+ | |||
+ | ==== Abaqus ==== | ||
+ | |||
+ | * Default Version = 2022 | ||
+ | * Other versions available: 2020 | ||
+ | |||
+ | |||
+ | === Using Abaqus === | ||
+ | |||
+ | Abaqus should not be operated on the login node at all. | ||
\\ | \\ | ||
Be sure you are connected to the Foundry with X forwarding enabled, and running inside an interactive job using the command
+ | sinteractive | ||
before you attempt to run Abaqus. Running sinteractive without any switches will give you 1 CPU for 10 minutes; if you need more time or resources you may request them. See [[pub:hpc:foundry#interactive_jobs|Interactive jobs]] for details.
+ | \\ | ||
+ | Once inside an interactive job you need to load the Abaqus module. | ||
+ | module load abaqus | ||
+ | Now you may run abaqus. | ||
+ | ABQLauncher cae -mesa | ||
+ | |||
+ | |||
+ | |||
+ | ====Anaconda==== | ||
+ | If you would like to install python modules via conda, you may load the anaconda module to get access to conda for this purpose. After loading the module you will need to initialize conda to work with your shell. | ||
<code>
module load anaconda
conda init
</code>
+ | This will ask you what shell you are using, and after it is done it will ask you to log out and back in again to load the conda environment. After you log back in your command prompt will look different than it did before. It should now have (base) on the far left of your prompt. This is the virtual environment you are currently in. Since you do not have permissions to modify base, you will need to create and activate your own virtual environment to build your software inside of. | ||
<code>
conda create --name myenv
conda activate myenv
</code>
+ | Now instead of (base) it should say (myenv) or whatever you have named your environment in the create step. These environments are stored in your home directory so they are unique to you. If you are working together with a group, everyone in your group will either need a copy of the environment you've built in $HOME/ | ||
+ | \\ | ||
Once you are inside your virtual environment you can run whatever conda installs you would like, and conda will install them and their dependencies inside this environment. If you would like to execute code that depends on the modules you install, you will need to be sure that you are inside your virtual environment: (myenv) should be shown on your command prompt, and if it is not, activate it with <code>conda activate myenv</code>.
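A short sketch of installing a package and running against it inside the environment (numpy and the script name are just examples):
<code>
conda activate myenv
conda install numpy          # installs numpy and its dependencies into myenv
python my_script.py          # hypothetical script that imports numpy
</code>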
+ | ==== Ansys ==== | ||
+ | |||
+ | * Default Version = 2019r2 | ||
+ | * Other versions available: none yet | ||
+ | |||
+ | === Running the Workbench === | ||
+ | |||
+ | Be sure you are connected to the Foundry with X forwarding enabled, and running inside an interactive job using command | ||
+ | sinteractive | ||
before you attempt to launch the workbench. Running sinteractive without any switches will give you 1 CPU for 1 hour; if you need more time or resources you may request them. See [[pub:hpc:foundry#interactive_jobs|Interactive jobs]] for details.
+ | \\ | ||
+ | Once inside an interactive job you need to load the ansys module. | ||
+ | module load ansys | ||
+ | Now you may run the workbench. | ||
+ | runwb2 | ||
+ | \\ | ||
+ | === Job Submission Information === | ||
+ | \\ | ||
+ | Fluent is the primary tool in the Ansys suite of software used on the Foundry.\\ | ||
+ | Most of the fluent simulation creation process is done on your Windows or Linux workstation.\\ | ||
+ | The ' | ||
+ | Fluent will output a lengthy file, based on the simulation being run and that lengthy output file would be used on your Windows or Linux Workstation to do the final review and analysis of your simulation. | ||
+ | |||
+ | === The basic steps === | ||
+ | \\ | ||
+ | 1. Create your geometry\\ | ||
+ | 2. Setup your mesh\\ | ||
+ | 3. Setup your solving method\\ | ||
+ | 4. Use the .cas and .dat files, generated from the first three steps, to construct your jobfile\\ | ||
+ | 5. Copy those files to the Foundry, to your home folder\\ | ||
+ | 6. Create your jobfile using the slurm tools on the Foundry Documentation page\\ | ||
+ | 7. Load the Ansys module\\ | ||
+ | 8. Submit your newly created jobfile with sbatch\\ | ||
+ | |||
+ | === Serial Example. === | ||
+ | |||
I used the Turbulent Flow example from Cornell's online ANSYS tutorials for this example.
+ | On the Foundry, I have this directory structure for this example. | ||
+ | |||
<code>
+ | TurbulentFlow/ | ||
+ | |-- flntgz-48243.cas | ||
+ | |-- flntgz-48243.dat | ||
+ | |-- output.dat | ||
+ | |-- slurm-8731.out | ||
+ | |-- TurbulentFlow_command.txt | ||
+ | |-- TurbulentFlow.sbatch | ||
</code>
+ | |||
The .cas file is the CASE file that contains the parameters defined by you when creating the model.\\
The .dat file is the data result file used when running the simulation.\\
The .txt file is the command-file equivalent of your model, in a form that Fluent on the Foundry understands.\\
The .sbatch file is the Slurm job file that you will use to submit your model for analysis.\\
The .out file is the output from the run.\\
The .dat file is the binary (Ansys-specific) file created during the solution that can be imported into Ansys back on the Windows/Linux workstation.\\
+ | |||
+ | |||
+ | === Jobfile Example. === | ||
+ | |||
+ | <file bash TurbulentFlow.sbatch> | ||
#!/bin/bash
+ | #SBATCH --job-name=TurbulentFlow.sbatch | ||
+ | #SBATCH --ntasks=1 | ||
+ | #SBATCH --nodes=1 | ||
#SBATCH --time=01:00:00
+ | #SBATCH -o foundry-%j.out | ||
+ | |||
+ | fluent 2ddp -g < / | ||
+ | </ | ||
+ | |||
+ | The SBATCH commands are explained in the [[pub: | ||
+ | |||
+ | The job-name is a name given to help you determine which job is which.\\ | ||
+ | This job will be in the --- **partition=requeue** queue.\\ | ||
+ | It will use 1 node --- **nodes=1**.\\ | ||
+ | It will use 4 processors in one node --- **ntasks=4**.\\ | ||
+ | It has a wall clock time of 1 hour --- **time=01: | ||
+ | It will email the user when it begins, ends, or if it fails. **--mail-type** & **--mail-user**\\ | ||
+ | **fluent** is the command we are going to run.\\ | ||
+ | **2ddp** is the mode we want fluent to use\\ | ||
+ | |||
//Modes//\\
The [mode] option must be supplied and is one of the following:
  * 2d runs the two-dimensional, single-precision solver
  * 3d runs the three-dimensional, single-precision solver
  * 2ddp runs the two-dimensional, double-precision solver
  * 3ddp runs the three-dimensional, double-precision solver
+ | **-g** turns off the GUI\\ | ||
+ | Path to the command file we are calling in fluent. | ||
+ | |||
+ | Contents of command file\\ | ||
+ | This file can get long. As it contains the .cas file & .dat file information as well as saving frequency and iteration count \\ | ||
+ | **NOTE**, this is all in one line when creating the command file\\ | ||
+ | |||
<code>
/file/rcd /...
</code>
+ | |||
When the simulation is finished, you will have a foundry-#####.out file in the directory you submitted from; its contents will look something like this:
<code>
Loading ...
Done.
Starting ...

This product is subject to U.S. laws governing export and re-export.
For full Legal Notice, see documentation.

Build Time: Apr 29 2014 13:56:31 EDT  Build Id: 10581

Loading ...
Done.

This is an academic version of ANSYS FLUENT. Usage of this product ...

Cleanup script file is ...

Reading ...
    3000 quadrilateral cells, zone 2, binary.
    5870 2D interior faces, zone 1, binary.
      30 2D velocity-inlet faces, zone 5, binary.
      30 2D pressure-outlet faces, zone 6, binary.
     100 2D wall faces, zone 7, binary.
     100 2D axis faces, zone 8, binary.
    3131 nodes, binary.
    3131 node flags, binary.

Building...
     mixture
     pipewall
     outlet
     inlet
     interior-surface_body
     centerline
     surface_body
Done.

Reading ...
Done.
  iter  continuity  ...
!  389 solution is converged
!  390 solution is converged

Writing ...
Done.
</code>
+ | |||
+ | |||
+ | ===Parallel Example=== | ||
+ | |||
+ | |||
To use fluent in parallel you need to set the PBS_NODEFILE environment variable inside your job. Please see the example submission file below.
+ | |||
+ | <file bash TurbulentFlow.sbatch> | ||
#!/bin/bash
+ | |||
+ | #SBATCH --job-name=TurbulentFlow.sbatch | ||
+ | #SBATCH --ntasks=32 | ||
#SBATCH --time=01:00:00
+ | #SBATCH -o foundry-%j.out | ||
+ | |||
+ | #generate a node file | ||
+ | export PBS_NODEFILE=`generate_pbs_nodefile` | ||
+ | #run fluent in parallel. | ||
+ | fluent 2ddp -g -t32 -pinfiniband -cnf=$PBS_NODEFILE -ssh < / | ||
+ | </ | ||
+ | |||
+ | ===Interactive Fluent=== | ||
+ | |||
If you would like to run the full GUI you may do so inside an interactive job; make sure you've connected to the Foundry with X forwarding enabled. Start the job with <code>sinteractive</code>.
+ | |||
Once inside the interactive job you will need to load the ansys module with <code>module load ansys</code>, then launch fluent.
+ | |||
+ | ==== Comsol ==== | ||
+ | |||
Comsol Multiphysics is available for general use on the Foundry; below is an example batch submission script.
+ | |||
+ | <file bash comsol.sub> | ||
#!/bin/bash
+ | #SBATCH -J Comsol_job | ||
+ | #SBATCH --ntasks-per-node=1 | ||
+ | #SBATCH --cpus-per-task=64 | ||
+ | #SBATCH --mem=0 | ||
#SBATCH --time=1-00:00:00
+ | #SBATCH --export=ALL | ||
+ | |||
+ | module load comsol/ | ||
+ | ulimit -s unlimited | ||
+ | ulimit -c unlimited | ||
+ | |||
+ | comsol batch -mpibootstrap slurm -inputfile input.mph -outputfile out.mph | ||
+ | |||
</file>
+ | |||
+ | ==== Cuda ==== | ||
+ | |||
+ | |||
Our login nodes don't have the CUDA toolkit installed, so to compile your code you will need to start an interactive job on the GPU nodes and do your compilation there.
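A sketch of what that could look like; the sinteractive flags are assumed to pass through to Slurm, and the source file name is a placeholder:
<code>
sinteractive -p cuda --gres=gpu:1 --time=01:00:00   # assumed flags; check sinteractive --help
nvcc my_kernel.cu -o a.out                          # compile on the GPU node, where the toolkit lives
</code>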
+ | |||
+ | To submit a job for batch processing please see this example submission file below. | ||
+ | <file bash cuda.sub> | ||
#!/bin/bash
+ | #SBATCH -J Cuda_Job | ||
+ | #SBATCH -p cuda | ||
+ | #SBATCH -o Forge-%j.out | ||
+ | #SBATCH --nodes=1 | ||
+ | #SBATCH --ntasks=1 | ||
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
+ | |||
+ | ./a.out | ||
</file>
+ | |||
This file requests 1 CPU and 1 GPU on 1 node for 1 hour; to request more CPUs or GPUs you will need to modify the values for ntasks and gres=gpu. It is recommended that you have at least 1 CPU for each GPU you intend to use. We currently only have 2 GPUs available per node; once we incorporate the remainder of the GPU nodes we will have 7 GPUs available in one chassis.
+ | |||
+ | ====Gaussian==== | ||
+ | |||
Gaussian has two different versions on the Foundry; the sample submission file below uses the g09 executable, but if you load the version 16 module you will need to use g16 instead.
+ | <file bash gaussian.sub> | ||
#!/bin/bash
+ | #SBATCH --job-name=gaussian | ||
+ | #SBATCH --nodes=1 | ||
+ | #SBATCH --ntasks=1 | ||
#SBATCH --time=10:00:00
+ | #SBATCH --mem-per-cpu=1000 | ||
+ | module load gaussian/ | ||
+ | g09 < Fe_CO5.inp | ||
</file>
+ | |||
+ | You will need to replace the file name of the input file in the sample provided with your own. | ||
+ | |||
+ | |||
+ | ==== Matlab ==== | ||
+ | |||
+ | **IMPORTANT NOTE** | ||
+ | Currently campus has 100 Matlab seat licenses to be shared between the Foundry and research desktops. | ||
+ | |||
+ | Matlab is available to run in batch form or interactively on the cluster. | ||
+ | * Default version = 2021a | ||
* Other installed version(s): 2019b, 2020a, 2020b (run "module avail" to see the current list)
+ | |||
+ | === Interactive Matlab === | ||
+ | |||
To get started with Matlab, run the following sequence of commands from the login node. This will start an interactive job on a backend node, load the default module for Matlab, and then launch Matlab. If you have connected with X forwarding, you will get the full Matlab GUI to use however you would like. By default, this limits you to 1 core for 4 hours maximum on one of our compute nodes. To use more than 1 core, or to run for longer than 4 hours, you will need to either add additional parameters to the sinteractive command or submit a batch job instead.
+ | |||
<code>
sinteractive
module load matlab
matlab
</code>
+ | |||
Please note that by default Matlab does not parallelize your code; it only uses multiple cores where you make explicitly parallel calls. If you have parallelized your code, you will need to first open a parallel pool to run your code in.
+ | |||
+ | === Batch Submit === | ||
+ | |||
If you want to use batch submissions for Matlab you will need to create a submission script similar to the ones above in the quick start, but you will want to limit your job to 1 node; please see the example below.
+ | |||
+ | <file bash matlab.sub> | ||
#!/bin/bash
+ | |||
+ | #SBATCH --nodes=1 | ||
+ | #SBATCH --ntasks=12 | ||
+ | #SBATCH -J Matlab_job | ||
+ | #SBATCH -o Foundry-%j.out | ||
#SBATCH --time=01:00:00
+ | #sbatch --mem-per-cpu=4000 | ||
+ | |||
+ | module load matlab | ||
+ | matlab < helloworld.m | ||
+ | |||
</file>
+ | |||
This submission asks for 12 processors on 1 node for an hour; the maximum per node we currently have is 64. Without using the distributed computing engine, which is outside the scope of this tutorial, you will only be able to use up to 64 processors in a 'local' parallel pool on a single node.
+ | |||
To make use of this newfound power you must open a parallel pool in your code, e.g. <code>parpool(12)</code>, before executing your parallel sections.
+ | |||
+ | ==== Python ==== | ||
+ | |||
+ | Python versions 2.7.17 and 3.6.9 are available, and users may install python modules for themselves via the pip modules available in python. Please note that the pip and pip3 commands are links to old wrapper scripts which come packaged in the OS and may not be able to install the newest version of whatever module you are trying to install. Because of this I will include instructions for how to use the pip utilities and also how to upgrade them for your user and use them with the new syntax. | ||
+ | |||
+ | For the old standard pip and pip3 utilities you would simply call them from the command line to install, uninstall, search for, or list installed modules. In the following examples you may swap pip with pip3 and it will perform operations on python3 instead of python2. | ||
+ | |||
+ | pip list #This lists all available modules and their versions. | ||
+ | pip install --user numpy #This will install the newest available version of the numpy module for your user. | ||
+ | pip uninstall --user numpy #This will uninstall the numpy module for your user. | ||
+ | pip install --user --upgrade numpy==1.18.5 #This will uninstall the old version and install the specified version of numpy. | ||
+ | pip search numpy #This will perform a search for all python modules that you could install that match the search term numpy. | ||
+ | |||
+ | Again, using pip3 the syntax is all the same but it will install modules for python3 instead. | ||
+ | |||
+ | Now to get the newest version of pip or pip3 you will need to run pip the new way and have it perform the upgrade on itself. Just like pip and pip3 were interchangeable in the examples above, in the following examples python and python3 will be interchangeable as well. python will perform operations on version 2, and python3 will perform operations on version 3. | ||
+ | |||
+ | python -m pip install --upgrade --user pip #This will upgrade pip to the newest version for your user. | ||
+ | python -m pip install --user numpy #This will install the newest available version of the numpy module for your user. | ||
+ | |||
+ | All of the syntax for pip is the same after calling python -m pip as it was for the pip and pip3 wrapper scripts. Also if you upgrade pip, you must use the new method of pip installing modules unless you uninstall your user's pip module. | ||
+ | |||
If your user's python environment gets broken by pip installing anything for your user, you may start over by removing or moving the $HOME/.local directory, which is where pip's --user installs are placed.
+ | |||
+ | |||
+ | ====Singularity==== | ||
+ | |||
With the Foundry we've introduced the ability to build your own software in a Singularity container, or use publicly available containers inside a Foundry job. Please keep in mind that you still need to abide by the rules of running either through interactive jobs or through batch submissions; Singularity does not automatically create a job environment for you, and like most other executables it runs where it is called from.
+ | |||
No module is needed to call singularity; the command is available directly on the nodes.
+ | |||
I highly suggest reading the Singularity documentation to get a feel for what it can do.
+ | |||
+ | An important thing to understand is that you may create these containers anywhere and then move them into the Foundry for execution which would give you the ability to configure the container as you see fit on your own computer where you have administrative privileges and execute your code as a user on the Foundry. | ||
+ | |||
+ | Running interactive in a singularity container inside an interactive session would look like the following set of commands. | ||
+ | sinteractive | ||
+ | singularity shell library:// | ||
+ | |||
+ | The first puts you interactively on a compute node. The second loads a singularity shell on the remote image from the singularity library. The singularity command has a large amount of help built into the command. I suggest starting with. | ||
+ | singularity --help | ||
+ | Which will give you a list of commands available and a description of what each could do for you. For example, if you know what command you need to run inside the container you don't need to drop into the shell of the container, you can simply run the command. | ||
+ | singularity exec library:// | ||
+ | Which will run the command pwd inside the container. | ||
+ | |||
Another thing to note is the flexibility of singularity: it can run containers from its own library, Docker, Docker Hub, Singularity Hub, and other sources.
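For example, pulling and running a public Docker Hub image looks like the following; the python image is only an illustration:
<code>
singularity pull docker://python:3.10-slim               # writes python_3.10-slim.sif to the current directory
singularity exec python_3.10-slim.sif python3 --version  # run a command inside the pulled image
</code>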
+ | |||
+ | |||
+ | ====StarCCM+==== | ||
+ | |||
+ | Engineering Simulation Software\\ | ||
+ | |||
+ | Default version | ||
+ | |||
+ | Other working versions: | ||
+ | * 2020.1 | ||
+ | * 12.02.010 | ||
+ | |||
+ | |||
+ | |||
+ | Job Submission Information | ||
+ | |||
+ | Copy your .sim file from the workstation to your cluster home profile.\\ | ||
+ | Once copied, create your job file. | ||
+ | |||
+ | Example job file: | ||
+ | |||
+ | <file bash starccm.sub> | ||
+ | |||
#!/bin/bash
+ | #SBATCH --job-name=starccm_test | ||
+ | #SBATCH --nodes=1 | ||
+ | #SBATCH --ntasks=12 | ||
+ | #SBATCH --mem=40000 | ||
+ | #SBATCH --partition=requeue | ||
#SBATCH --time=12:00:00
+ | #SBATCH --mail-type=BEGIN | ||
+ | #SBATCH --mail-type=FAIL | ||
+ | #SBATCH --mail-type=END | ||
+ | #SBATCH --mail-user=username@mst.edu | ||
+ | |||
+ | module load starccm/2021.2 | ||
+ | |||
+ | time starccm+ -batch -np 12 / | ||
</file>
+ | |||
** It is preferred that you keep the ntasks and -np set to the same processor count.**\\
+ | |||
+ | Breakdown of the script:\\ | ||
+ | This job will use **1** node, asking for **12** processors, **40,000 MB** of memory for a total wall time of **12 hours** and will email you when the job starts, finishes or fails. | ||
+ | |||
+ | The StarCCM commands: | ||
+ | |||
+ | |||
|-batch| runs StarCCM+ in batch (no GUI) mode|
+ | |-np| number of processors to allocate| | ||
+ | |/ | ||
+ | |||
+ | |||
+ | ====TensorFlow with GPU support==== | ||
+ | |||
+ | https:// | ||
+ | |||
We have been able to get TensorFlow to work with GPU support if we install it within an anaconda environment. Other methods do not seem to work as smoothly (if they even work at all).
+ | |||
+ | First use [[#Anaconda|Anaconda]] to create and activate a new environment (e.g. tensorflow-gpu). Then use anaconda to install TensorFlow with GPU support: | ||
+ | |||
+ | conda install tensorflow-gpu | ||
+ | |||
+ | At this point you should be able to activate that anaconda environment and run TensorFlow with GPU support. | ||
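A quick way to confirm that the GPU is visible (run inside a GPU job with the environment activated); this check is a suggestion, not part of the original instructions:
<code>
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
</code>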
+ | |||
+ | Job Submission Information | ||
+ | |||
+ | Copy your python script to the cluster. Once copied, create your job file. | ||
+ | |||
+ | Example job file: | ||
+ | |||
+ | <file bash tensorflow-gpu.sub> | ||
#!/bin/bash
+ | #SBATCH --job-name=tensorflow_gpu_test | ||
+ | #SBATCH --nodes=1 | ||
+ | #SBATCH --ntasks=1 | ||
+ | #SBATCH --partition=cuda | ||
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:1
+ | #SBATCH --mail-type=BEGIN | ||
+ | #SBATCH --mail-type=FAIL | ||
+ | #SBATCH --mail-type=END | ||
+ | #SBATCH --mail-user=username@mst.edu | ||
+ | |||
+ | module load anaconda/ | ||
conda activate tensorflow-gpu
+ | python tensorflow_script_name.py | ||
</file>
+ | |||
+ | ==== Thermo-Calc ==== | ||
+ | |||
+ | * Default Version = 2021a | ||
+ | * Other versions available: none yet | ||
+ | |||
+ | === Accessing Thermo-Calc === | ||
+ | |||
+ | Thermo-Calc is a restricted software. If you need access please email nic-cluster-admins@mst.edu for more info. | ||
+ | |||
+ | === Using Thermo-Calc === | ||
+ | |||
+ | Thermo-Calc will not operate on the login node at all. | ||
+ | \\ | ||
Be sure you are connected to the Foundry with X forwarding enabled, and running inside an interactive job using the command
+ | sinteractive | ||
before you attempt to run Thermo-Calc. Running sinteractive without any switches will give you 1 CPU for 10 minutes; if you need more time or resources you may request them. See [[pub:hpc:foundry#interactive_jobs|Interactive jobs]] for details.
+ | \\ | ||
+ | Once inside an interactive job you need to load the Thermo-Calc module. | ||
+ | module load thermo-calc | ||
+ | Now you may run thermo-calc. | ||
+ | Thermo-Calc.sh | ||
+ | |||
+ | ====Vasp==== | ||
+ | |||
To use our site installation of Vasp you must first prove that you have a license to use it by emailing your vasp license confirmation to the cluster administrators (nic-cluster-admins@mst.edu).
+ | |||
Once you have been granted access to using vasp you may load the vasp module with <code>module load vasp</code>
+ | |||
and create a vasp job file, in the directory where your input files are, that looks similar to the one below.
<file bash vasp.sub>
#!/bin/bash
#SBATCH --job-name=vasp_job
#SBATCH --ntasks=8
#SBATCH --time=0-01:00:00
#SBATCH --out=Foundry-%j.out

module load vasp
module load libfabric
srun vasp
</file>
This example will run the standard vasp compilation on 8 cpus for 1 hour. \\
If you need the gamma-only version of vasp, or the non-collinear version, launch the corresponding executable instead of the standard one. \\
It may also work to launch vasp with a different MPI launcher than srun. \\
There are some globally available pseudopotentials; the module sets the environment variable $POTENDIR to the global directory.