- Why should I specify an estimated time needed to complete a slurm job?
- How can I specify openMPI mca parameters using srun?
- How can I select (force) openMPI to use the infiniband HCAs on the compute node?
- How can I exclude network interfaces and suppress warnings resulting from an exclude clause if openmpi is invoked via srun?
Why should I specify an estimated time needed to complete a slurm job?
If a job is submitted without the option '--time=<timeString>' (see, for example, the srun man page for the format of 'timeString'), the scheduler assumes that the job requests the longest walltime allowed for the partition in question. The SLURM control daemon keeps track of all free time slots on the compute resources that arise from resource and priority reservations. A job with lower priority and lower resource demands can run in such a free slot (nodes, CPUs) while another, usually larger, job is waiting for its resources to become available, provided the scheduler can tell from the requested walltime that the smaller job will finish before the slot ends. In that case the smaller job overtakes the larger job without delaying it. A job without a time estimate is treated as requesting the maximum walltime and therefore rarely fits into such a slot.
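As an illustration, the following sketch writes a small batch script that requests a walltime of 30 minutes. The file name job.sh, the task count and the program name my_program are placeholders chosen for this example and are not taken from any particular cluster setup.

```shell
# Write a minimal batch script; job.sh and ./my_program are hypothetical names.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --ntasks=4          # hypothetical task count
#SBATCH --time=00:30:00     # estimated walltime as hours:minutes:seconds
srun ./my_program
EOF

# On the cluster, submit the script with:
#   sbatch job.sh
```

The same limit can be given on the command line, for example 'sbatch --time=00:30:00 job.sh'. An estimate slightly above the expected run time is usually the safer choice, because the job is terminated once the limit is reached.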
How can I specify openMPI mca parameters using srun?
Create the directory $HOME/.openmpi and save all parameters in the file $HOME/.openmpi/mca-params.conf, one 'name = value' entry per line.
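A minimal sketch of this step on the command line, assuming a bash-like login shell:

```shell
# Create the per-user Open MPI configuration directory if it is missing.
mkdir -p "$HOME/.openmpi"

# Create the parameter file if it does not exist; existing content is kept.
touch "$HOME/.openmpi/mca-params.conf"
```

The file can then be edited with any text editor.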
How can I select (force) openMPI to use the infiniband HCAs on the compute node?
Add the following line:
btl_openib_allow_ib = 1
to the file $HOME/.openmpi/mca-params.conf.
How can I exclude network interfaces and suppress warnings resulting from an exclude clause if openmpi is invoked via srun?
Some of the latest Dell hardware ships with NICs that have RoCE or iWARP capabilities. OpenMPI will detect these interfaces. To avoid using these devices (EDR infiniband provides better throughput and latency), add the lines:
btl_openib_if_exclude = qedr0
btl_openib_warn_nonexistent_if = 0
to file $HOME/.openmpi/mca-params.conf.
The first line excludes the ethernet adapter qedr0 from the openMPI interface list. If the parameter file is also used on a node without RoCE or iWARP ethernet adapters, the second line suppresses the flood of warnings that the exclude statement would otherwise cause.
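For a quick test without editing the parameter file, the same two settings can also be passed as environment variables, since Open MPI reads MCA parameters from variables named OMPI_MCA_<parameter>. The sketch below assumes that srun forwards the environment of the submitting shell to the tasks, which is its default behaviour; the program name in the comment is a placeholder.

```shell
# Same settings as in mca-params.conf, valid only for this shell or batch job.
export OMPI_MCA_btl_openib_if_exclude=qedr0
export OMPI_MCA_btl_openib_warn_nonexistent_if=0

# Afterwards launch the MPI program as usual, for example:
#   srun ./my_mpi_program    (hypothetical executable)
```

Values set in the environment take precedence over those in $HOME/.openmpi/mca-params.conf, so this is best suited to trying out a setting before making it permanent in the file.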