Advanced scripts for postprocessing with OverlayFS
Overview
Teaching: 30 min
Exercises: 30 min
Questions
How can I postprocess a large number of results?
Objectives
Use provided scripts to postprocess a large number of results without dramatically increasing the number of files in the system
0. Introduction
Postprocessing results in batches
- The workflow presented here is extremely similar to that in the previous episode (please study the previous one first)
- Some minor differences in this episode are: 1) the overlay* files are kept in the directory ./overlayFSDir, and 2) the renamed bak.processor* directories are moved to the directory ./bakDir
- And the main difference from the previous episode is: the reconstruction of the existing results can handle many time-results in the same job
- Nevertheless, in order to avoid the creation of a large number of files in the host file system, the extraction of the result-times from the overlayFS files is performed in batches of small size
- All the decomposed result-times of each small batch are postprocessed (reconstructed) and then deleted before the following batch is extracted/processed
Use of bash functions
- Another difference in the workflow scripts for this episode is the use of bash functions
- These functions are defined in a separate bash script contained in the directory named ../../A1_auxiliaryScripts
- That script is sourced from all the workflow (A,B,C..G) scripts
- In practice, users should keep this directory in a known location and source the definition of bash functions from there
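The sourcing pattern can be sketched as follows. This is a minimal illustration, assuming a hypothetical stand-in function file created on the fly in a temporary directory; `demoFunctions.sh` and `sayHello` are invented names and are not part of the workshop scripts.

```shell
#!/bin/bash
# Hypothetical stand-in for a permanent directory holding the auxiliary scripts
auxScriptsDir=$(mktemp -d)

# Stand-in function file (illustration only; not the real workshop script)
cat > "$auxScriptsDir/demoFunctions.sh" << 'EOF'
sayHello() {
    echo "Hello from $1"
}
EOF

# Source the definitions only if the file exists, otherwise stop early
functionsScript=$auxScriptsDir/demoFunctions.sh
if [ -f "$functionsScript" ]; then
    source "$functionsScript"
else
    echo "The script with the functions was not found: $functionsScript"
    exit 1
fi

# The function is now available in the current shell
greeting=$(sayHello "the workflow script")
echo "$greeting"
```

In a production workflow, `auxScriptsDir` would point to a fixed, known location instead of a temporary directory.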
0.I Accessing the scripts for this episode
In this episode, we make use of a series of scripts to cover a typical compilation/execution workflow. Let's start by listing the scripts.
List the content of the A1_auxiliaryScripts directory:
zeus-1:~> cd $MYSCRATCH/pawseyTraining/containers-openfoam-workshop-scripts
zeus-1:*-scripts> ls A1_auxiliaryScripts
ofContainersOverlayFunctions.sh
- This file contains the definition of several functions utilised within the workflow scripts
cd into the directory that contains the scripts for executing the workflow of this exercise. In this case we’ll use OpenFOAM-2.4.x.
zeus-1:*-scripts> cd 06_advancedScriptsForPostProcessingWithOverlayFS/example_OpenFOAM-2.4.x
zeus-1:*-2.4.x> ls
A.extractAndAdaptTutorial.sh  C.setupOverlayFoam.sh        F.extractFromOverlayIntoBak.sh  run
B.decomposeFoam.sh            D.runFoam.sh                 G.reconstructFromBak.sh
caseSettingsFoam.sh           E.reconstructFromOverlay.sh  imageSettingsSingularity.sh
Sections and scripts for this episode
- In the following sections, there are instructions for submitting these workflow scripts for execution in the supercomputer one by one:
- A.extractAndAdaptTutorial.sh (already pre-executed) is for copying and adapting a case to solve
- B.decomposeFoam.sh is for decomposing the case to solve
- C.setupOverlayFoam.sh is for creating the OverlayFS files to store the results
- D.runFoam.sh is for executing the solver (and writing results to the interior of the overlay files)
- E.reconstructFromOverlay.sh is for reconstructing several result-times in batches
- F.extractFromOverlayIntoBak.sh is for extracting a batch of result-times from the overlay files (no reconstruction)
- G.reconstructFromBak.sh is for reconstructing existing results in the bak.processor* directories
- caseSettingsFoam.sh is a script that defines the settings for using OpenFOAM within all the workflow scripts (it is sourced from all the A, B, C, ..., G scripts)
- imageSettingsSingularity.sh is a script that defines the settings for using Singularity within all the other scripts (it is sourced from all the workflow scripts)
- ../../A1_auxiliaryScripts/ofContainersOverlayFunctions.sh is the script that contains the definition of the bash functions utilised in the workflow scripts (this script is sourced from all the workflow scripts)
So how will this episode flow?
- The script of section “A.” has already been pre-executed.
- The usage of these scripts is extremely similar to that of the previous episode (we recommend practising the previous episode first)
- Additional explanations are dedicated to the use of the bash functions
- And on the reconstruction made in batches (section E.)
- Sections F. and G. are left to the user to explore by themselves
A. Extract and adapt the tutorial to be solved - [Pre-Executed]
The A.extractAndAdaptTutorial.sh script (main parts to be discussed):
#1. Loading the container settings, case settings and auxiliary functions (order is important)
source $SLURM_SUBMIT_DIR/imageSettingsSingularity.sh
source $SLURM_SUBMIT_DIR/caseSettingsFoam.sh
overlayFunctionsScript=$auxScriptsDir/ofContainersOverlayFunctions.sh
The imageSettingsSingularity.sh script (main sections to be discussed):
#Module environment
module load singularity
#Defining the container to be used
theRepo=/group/singularity/pawseyRepository/OpenFOAM
theContainerBaseName=openfoam
theVersion=2.4.x
theProvider=pawsey
theImage=$theRepo/$theContainerBaseName-$theVersion-$theProvider.sif
#Defining the path of the auxiliary scripts for dealing with overlayFS
#(Define the path to a more permanent directory for production workflows)
auxScriptsDir=$SLURM_SUBMIT_DIR/../../A1_auxiliaryScripts
The caseSettingsFoam.sh script (main sections to be discussed):
#Choosing the tutorial case
tutorialAppDir=incompressible/pimpleFoam
tutorialName=channel395
tutorialCase=$tutorialAppDir/$tutorialName
#Choosing the working directory for the case to solve
baseWorkingDir=$SLURM_SUBMIT_DIR/run
if ! [ -d $baseWorkingDir ]; then
    echo "Creating baseWorkingDir=$baseWorkingDir"
    mkdir -p $baseWorkingDir
fi
caseName=$tutorialName
caseDir=$baseWorkingDir/$caseName
A.I Steps for dealing with the extraction and adaptation of the case to be solved
Submit the job (no need for reservation as the script uses the copyq partition)
zeus-1:*-2.4.x> sbatch A.extractAndAdaptTutorial.sh
Submitted batch job 4632758
Check that the tutorial has been copied to our host file system
zeus-1:*-2.4.x> ls ./run/channel395/
0 0.org Allrun constant system
Read the controlDict dictionary:
zeus-1:*-2.4.x> view ./run/channel395/system/controlDict
~
~
~
:q
The settings that were adapted in ./run/channel395/system/controlDict:
- Use the binary writing format to accelerate writing and reduce the size of the files
writeFormat binary;
- Never use runTimeModifiable. This option makes the solver re-read the dictionaries at every time step, which overloads the shared file system.
runTimeModifiable false;
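A quick way to verify these two settings without opening an editor is sketched here. It is only an illustration: the dictionary is a stand-in fragment written to a temporary file, and `checkSetting` is a hypothetical helper, not one of the workshop functions.

```shell
#!/bin/bash
# Stand-in controlDict fragment written to a temporary file (illustration only)
controlDict=$(mktemp)
cat > "$controlDict" << 'EOF'
writeFormat     binary;
runTimeModifiable false;
EOF

# Hypothetical helper: returns 0 if the keyword has the expected value in the file
checkSetting() {
    local keyword=$1 expected=$2 file=$3
    grep -Eq "^[[:space:]]*${keyword}[[:space:]]+${expected};" "$file"
}

settingsOk=true
checkSetting writeFormat binary "$controlDict" || settingsOk=false
checkSetting runTimeModifiable false "$controlDict" || settingsOk=false
echo "Settings as expected: $settingsOk"
```

For the real case, `controlDict` would be set to ./run/channel395/system/controlDict.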
B. Decomposition
The B.decomposeFoam.sh script (main points to be discussed):
#1. Loading the container settings, case settings and auxiliary functions (order is important)
source $SLURM_SUBMIT_DIR/imageSettingsSingularity.sh
source $SLURM_SUBMIT_DIR/caseSettingsFoam.sh
overlayFunctionsScript=$auxScriptsDir/ofContainersOverlayFunctions.sh
#7. Perform all preprocessing OpenFOAM steps up to decomposition
srun -n 1 -N 1 singularity exec $theImage blockMesh 2>&1 | tee $logsDir/log.blockMesh
srun -n 1 -N 1 singularity exec $theImage decomposePar -cellDist 2>&1 | tee $logsDir/log.decomposePar
B.I Steps for dealing with decomposition:
Submit the decomposition script from the scripts directory (use the reservation for the workshop if available)
zeus-1:*-2.4.x> myReservation=containers
zeus-1:*-2.4.x> sbatch --reservation=$myReservation B.decomposeFoam.sh
- If a reservation is not available, do not use the option. (Or you can use the debugq: --partition=debugq instead.)
Submitted batch job 4632558
Check that the decomposition has been performed:
zeus-1:*-2.4.x> ls ./run/channel395/processor*
./run/channel395/processor0:
0 constant

./run/channel395/processor1:
0 constant

./run/channel395/processor2:
0 constant

./run/channel395/processor3:
0 constant

./run/channel395/processor4:
0 constant
- Note that one processor* directory was created per numberOfSubdomains (the number of subdomains is set in the system/decomposeParDict dictionary)
- Also note that this tutorial uses 5 subdomains (and 5 cores when executing the solver, below)
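The relation between numberOfSubdomains and the processor* directories can be checked with a few lines of bash. The sketch below is an illustration under assumptions: it builds a stand-in case in a temporary directory, and the way the value is parsed here is not necessarily how the workshop scripts obtain it.

```shell
#!/bin/bash
# Stand-in case directory with a minimal decomposeParDict (illustration only)
caseDir=$(mktemp -d)
mkdir -p "$caseDir/system"
cat > "$caseDir/system/decomposeParDict" << 'EOF'
numberOfSubdomains 5;
method          simple;
EOF
for ii in 0 1 2 3 4; do
    mkdir "$caseDir/processor$ii"
done

# Read the value of numberOfSubdomains (stripping the trailing semicolon)
nSubdomains=$(awk '$1 == "numberOfSubdomains" {gsub(";", "", $2); print $2}' \
              "$caseDir/system/decomposeParDict")

# Count the processor* directories present in the case
nProcessorDirs=$(find "$caseDir" -maxdepth 1 -type d -name 'processor*' | wc -l)
nProcessorDirs=$(( nProcessorDirs ))

if [ "$nSubdomains" -eq "$nProcessorDirs" ]; then
    echo "Decomposition looks complete: $nProcessorDirs processor directories"
else
    echo "Mismatch: expected $nSubdomains, found $nProcessorDirs"
fi
```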
You should also check for success/errors in:
- the slurm output file: slurm-<SLURM_JOBID>.out
- the log files created when executing the OpenFOAM tools in: ./run/channel395/logs/pre/
C. Setup the overlayFS
The C.setupOverlayFoam.sh script (main points to be discussed):
#SBATCH --ntasks=4 #Several tasks will be used for copying files. (Independent from the numberOfSubdomains)
#1. Loading the container settings, case settings and auxiliary functions (order is important)
source $SLURM_SUBMIT_DIR/imageSettingsSingularity.sh
source $SLURM_SUBMIT_DIR/caseSettingsFoam.sh
overlayFunctionsScript=$auxScriptsDir/ofContainersOverlayFunctions.sh
The imageSettingsSingularity.sh script (main sections to be discussed):
#Module environment
module load singularity
#Defining the container to be used
theRepo=/group/singularity/pawseyRepository/OpenFOAM
theContainerBaseName=openfoam
theVersion=2.4.x
theProvider=pawsey
theImage=$theRepo/$theContainerBaseName-$theVersion-$theProvider.sif
#Defining settings for the OverlayFS
overlaySizeGb=1
#Defining the path of the auxiliary scripts for dealing with overlayFS
#(Define the path to a more permanent directory for production workflows)
auxScriptsDir=$SLURM_SUBMIT_DIR/../../A1_auxiliaryScripts
The caseSettingsFoam.sh script (main sections to be discussed):
#Choosing the working directory for the case to solve
baseWorkingDir=$SLURM_SUBMIT_DIR/run
if ! [ -d $baseWorkingDir ]; then
    echo "Creating baseWorkingDir=$baseWorkingDir"
    mkdir -p $baseWorkingDir
fi
caseName=$tutorialName
caseDir=$baseWorkingDir/$caseName
#Defining the name of the directory inside the overlay* files at which results will be saved
baseInsideDir=/overlayOpenFOAM/run
insideName=$caseName
insideDir=$baseInsideDir/$insideName
#3. Create the directory where the OverlayFS files are going to be kept
if ! [ -d ./overlayFSDir ]; then
    echo "Creating the directory ./overlayFSDir which will contain the overlay* files:"
    mkdir -p ./overlayFSDir
else
    echo "For some reason, the directory ./overlayFSDir for saving the overlay* files already exists:"
    echo "Warning:No creation needed"
fi
#5. Rename the processor* directories into bak.processor* and move them into ./bakDir
#(OpenFOAM won't be able to see these directories)
#(Access will be performed through soft links)
echo "Renaming the processor directories"
rename processor bak.processor processor*
if ! [ -d ./bakDir ]; then
    echo "Creating the directory ./bakDir that will contain the bak.processor* directories:"
    mkdir -p ./bakDir
else
    echo "For some reason, the directory ./bakDir for containing the bak.processor* dirs already exists:"
    echo "Warning:No creation needed"
fi
if ! [ -d ./bakDir/bak.processor0 ]; then
    echo "Moving all bak.processor* directories into ./bakDir"
    mv bak.processor* ./bakDir
else
    echo "The directory ./bakDir/bak.processor0 already exists"
    echo "No move/replacement of bak.processor* directories will be performed"
    echo "Exiting";exit 1
fi
#6. Creating a first ./overlayFSDir/overlayII file (./overlayFSDir/overlay0)
createOverlay0 $overlaySizeGb;success=$? #Calling the function for creating the ./overlayFSDir/overlay0 file
if [ $success -eq 222 ]; then
    echo "./overlayFSDir/overlay0 already exists"
    echo "Exiting";exit 1
elif [ $success -ne 0 ]; then
    echo "Failed creating ./overlayFSDir/overlay0, exiting"
    echo "Exiting";exit 1
fi
- Note the use of the createOverlay0 function
- The function receives as an argument the size (in Gb) of the file to be created
- The returned value of the function is saved in the success variable and then checked
- Read the definition of the function in ../../A1_auxiliaryScripts/ofContainersOverlayFunctions.sh
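The pattern of returning a special code (222) when the target already exists, and checking it in the caller, can be sketched as follows. `createFileOnce` is a hypothetical stand-in that only touches an empty file; it is not the definition of createOverlay0, which should be read in the auxiliary script.

```shell
#!/bin/bash
# Hypothetical function: creates a file unless it already exists
createFileOnce() {
    local target=$1
    if [ -e "$target" ]; then
        return 222   # special code: the target already exists
    fi
    touch "$target" || return 1
    return 0
}

workDir=$(mktemp -d)

# First call creates the file, second call finds it already there
createFileOnce "$workDir/overlayDemo"; first=$?
createFileOnce "$workDir/overlayDemo"; second=$?

# Same style of checking as in the workflow script
if [ $second -eq 222 ]; then
    echo "$workDir/overlayDemo already exists"
elif [ $second -ne 0 ]; then
    echo "Failed creating $workDir/overlayDemo"
fi
echo "Return codes: first=$first second=$second"
```

Saving `$?` immediately after the call matters, because any later command would overwrite it.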
#7. Replicating the ./overlayFSDir/overlay0 file into the needed number of ./overlayFSDir/overlay* files (as many as processors*)
echo "Replicating ./overlayFSDir/overlay0 into the rest of the ./overlayFSDir/overlay* files"
for ii in $(seq 1 $(( foam_numberOfSubdomains - 1 ))); do
    if [ -f ./overlayFSDir/overlay${ii} ]; then
        echo "./overlayFSDir/overlay${ii} already exists"
        echo "Deal with it first and remove it from the working directory"
        echo "Exiting";exit 1
    else
        echo "Replicating ./overlayFSDir/overlay0 into ./overlayFSDir/overlay${ii}"
        srun -n 1 -N 1 --mem-per-cpu=0 --exclusive cp ./overlayFSDir/overlay0 ./overlayFSDir/overlay${ii} &
    fi
done
wait
#8. Creating the processor* directories inside the ./overlayFSDir/overlay* files
createInsideProcessorDirs $insideDir $foam_numberOfSubdomains;success=$? #Calling the function for creating the inside directories
if [ $success -eq 222 ]; then
    echo "$insideDir/processor0 already exists inside the ./overlayFSDir/overlay0 file"
    echo "Exiting";exit 1
elif [ $success -ne 0 ]; then
    echo "Failed creating the inside directories, exiting"
    echo "Exiting";exit 1
fi
- Note the use of the createInsideProcessorDirs function
- The function receives as arguments: 1) the path inside the OverlayFS files where to create the processor* dirs, and 2) the number of subdomains.
- The returned value of the function is saved in the success variable and then checked
- Read the definition of the function in ../../A1_auxiliaryScripts/ofContainersOverlayFunctions.sh
#9. Transfer the content of the ./bakDir/bak.processor* directories into the ./overlayFSDir/overlay* files
echo "Copying the OpenFOAM files inside ./bakDir/bak.processor* into the ./overlayFSDir/overlay* files"
copyIntoOverlayII './bakDir/bak.processor${ii}/*' "$insideDir/"'processor${ii}/' "$foam_numberOfSubdomains" "true";success=$?
if [ $success -ne 0 ]; then
    echo "Failed creating the inside directories, exiting"
    echo "Exiting";exit 1
fi
- Note the use of the copyIntoOverlayII function
- The function receives as arguments: 1) the source of the copy, 2) the destination, 3) the number of subdomains and 4) a boolean for replacing content or not
- Note the use of single quotes for passing the wildcard ‘*’ to the function without evaluation
- Also note the use of single quotes ‘…${ii}…’ in the place where the number of the overlay${ii} (or processor${ii}) is needed
- The returned value of the function is saved in the success variable and then checked
- Read the definition of the function in ../../A1_auxiliaryScripts/ofContainersOverlayFunctions.sh
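The effect of the single quotes can be illustrated with a small sketch. It assumes that the receiving function expands the `${ii}` placeholder itself inside a loop (here with `eval`); `expandPerSubdomain` is a hypothetical function written for this illustration, and the real mechanism should be checked in the definition of copyIntoOverlayII.

```shell
#!/bin/bash
# Hypothetical function: expands a template once per subdomain index
expandPerSubdomain() {
    local template=$1 nSubdomains=$2 ii
    for (( ii = 0; ii < nSubdomains; ii++ )); do
        # eval performs the delayed expansion of ${ii}; the inner double
        # quotes keep the wildcard * from being expanded at this point
        eval "echo \"$template\""
    done
}

insideDir=/overlayOpenFOAM/run/channel395

# Single quotes: ${ii} and * reach the function as literal text
sources=$(expandPerSubdomain './bakDir/bak.processor${ii}/*' 2)

# Double quotes for $insideDir (expanded now), single quotes for ${ii} (expanded later)
destinations=$(expandPerSubdomain "$insideDir/"'processor${ii}/' 2)

echo "$sources"
echo "$destinations"
```

Note that `eval` executes whatever string it receives, so a pattern like this is only appropriate for arguments written by the script author.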
#10. Mark the initial conditions time directory as already fully reconstructed
echo "Marking the time directory \"0\" as fully reconstructed"
touch 0/.reconstructDone
- A dummy empty and hidden file named .reconstructDone is used to mark those result-times that have been successfully reconstructed
- In this case, as the time 0 is originally reconstructed by default, it is marked
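A marker file of this kind makes it cheap to find which result-times still need work. The following sketch shows one possible way to list the time directories that lack the marker; the directory layout is a stand-in created in a temporary location, and the logic is an assumption about how such a marker can be used, not a copy of the workshop functions.

```shell
#!/bin/bash
# Stand-in case directory with three time directories (illustration only)
caseDir=$(mktemp -d)
mkdir "$caseDir/0" "$caseDir/0.2" "$caseDir/0.4" "$caseDir/constant"

# Mark two of them as fully reconstructed
touch "$caseDir/0/.reconstructDone" "$caseDir/0.2/.reconstructDone"

# Collect the time directories (names starting with a digit) without the marker
pending=()
for timeDir in "$caseDir"/[0-9]*; do
    [ -d "$timeDir" ] || continue
    if [ ! -f "$timeDir/.reconstructDone" ]; then
        pending+=("$(basename "$timeDir")")
    fi
done
echo "Times without the marker: ${pending[*]}"
```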
#11. List the content of directories inside the ./overlayFSDir/overlay* files
echo "Listing the content in ./overlayFSDir/overlay0 $insideDir/processor0"
srun -n 1 -N 1 singularity exec --overlay ./overlayFSDir/overlay0 $theImage ls -lat $insideDir/processor0/
C.I Steps for dealing with the Overlay setup
Submit the overlay setup script (from the scripts directory)
zeus-1:*-2.4.x> sbatch --reservation=$myReservation C.setupOverlayFoam.sh
- If a reservation is not available, do not use the option. (Or you can use the debugq: --partition=debugq instead.)
Submitted batch job 4642685 on cluster zeus
Check that the processor* directories were renamed/moved to ./bakDir/bak.processor*:
zeus-1:*-2.4.x> cd run/channel395
zeus-1:channel395> ls ./bakDir
bak.processor0 bak.processor1 bak.processor2 bak.processor3 bak.processor4
- All processor* directories have been renamed/moved to ./bakDir/bak.processor*
Check that ./overlayFSDir/overlay* files exist:
zeus-1:channel395> ls ./overlayFSDir
overlay0 overlay1 overlay2 overlay3 overlay4
- All ./overlayFSDir/overlay* files exist
Explore the content of one of the overlay files:
zeus-1:channel395> module load singularity
zeus-1:channel395> theImage=/group/singularity/pawseyRepository/OpenFOAM/openfoam-2.4.x-pawsey.sif
zeus-1:channel395> insideDir=/overlayOpenFOAM/run/channel395
zeus-1:channel395> singularity exec --overlay ./overlayFSDir/overlay1 $theImage ls -lat $insideDir/processor1/
total 16
drwxr-s---+ 4 espinosa pawsey0001 4096 May 24 20:38 .
drwxr-s---+ 2 espinosa pawsey0001 4096 May 24 20:38 0
drwxr-s---+ 3 espinosa pawsey0001 4096 May 24 20:38 constant
drwxr-s---+ 3 espinosa pawsey0001 4096 May 24 20:38 ..
D. Executing the solver
The D.runFoam.sh script (main points to be discussed):
#SBATCH --ntasks=5
#5. Defining OpenFOAM controlDict settings for this run
foam_endTime=40
foam_writeInterval=1
foam_purgeWrite=0 #Just for testing in this exercise. In reality this should have a reasonable value if possible
#foam_purgeWrite=10 #Just 10 times will be preserved
#7. Creating soft links towards directories inside the ./overlayFSDir/overlay* files
#These links and directories will be recognized by each mpi instance of the container
#(Initially these links will appear broken as they are pointing towards the interior of the ./overlayFSDir/overlay* files.
# They will only be recognized within the containers)
pointToOverlay $insideDir $foam_numberOfSubdomains;success=$? #Calling function to point towards the interior
if [ $success -ne 0 ]; then
    echo "Failed creating the soft links"
    echo "Exiting";exit 1
fi
- Note the use of the pointToOverlay function
- The function receives as arguments: 1) the path inside the OverlayFS files towards which the processor* soft links will point, and 2) the number of subdomains.
- The returned value of the function is saved in the success variable and then checked
- Read the definition of the function in ../../A1_auxiliaryScripts/ofContainersOverlayFunctions.sh
#8. Execute the case
echo "About to execute the case"
srun -n $SLURM_NTASKS -N $SLURM_JOB_NUM_NODES bash -c "singularity exec --overlay ./overlayFSDir/"'overlay${SLURM_PROCID}'" $theImage pimpleFoam -parallel 2>&1" | tee $logsDir/log.pimpleFoam.$SLURM_JOBID
echo "Execution finished"
- VERY IMPORTANT: Note that the singularity command is called inside a bash -c command
- This is the way we allow each MPI task to pick a different ./overlayFSDir/overlay* file through the SLURM_PROCID variable
- Here, SLURM_PROCID is a Slurm environment variable which needs to be evaluated when executing the container, so we use the section in single quotes '...' to allow the internal evaluation of that variable
- Here, theImage is not a global environment variable, so it is evaluated by the host shell in a section with double quotes "..." at the command line
- The total string passed to bash -c is the concatenation of two double-quoted sections with a single-quoted section in between
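The quoting can be tried outside of Slurm with a small sketch. Here `echo` stands in for `singularity exec`, the image path is a hypothetical placeholder, and SLURM_PROCID is set by hand to emulate what srun would provide to each task.

```shell
#!/bin/bash
theImage=/path/to/image.sif   # hypothetical path, expanded by the host shell

# Double quotes + single quotes + double quotes, concatenated into one string.
# $theImage is expanded now; ${SLURM_PROCID} stays literal until bash -c runs.
commandString="echo --overlay ./overlayFSDir/"'overlay${SLURM_PROCID}'" $theImage"
echo "String received by bash -c: $commandString"

# Emulate two MPI tasks; srun would set SLURM_PROCID for each task
task0=$(SLURM_PROCID=0 bash -c "$commandString")
task1=$(SLURM_PROCID=1 bash -c "$commandString")
echo "$task0"
echo "$task1"
```

Each emulated task ends up with its own overlay file name, while the image path is identical for both.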
#10. List the existing times inside the ./overlayFSDir/overlay0
echo "Listing the available times inside ./overlayFSDir/overlay0"
srun -n 1 -N 1 singularity exec --overlay ./overlayFSDir/overlay0 $theImage ls -lat processor0/
D.I Steps for dealing with the solver
Submit the solver script (from the scripts directory)
zeus-1:*-2.4.x> sbatch --reservation=$myReservation D.runFoam.sh
- If a reservation is not available, do not use the option. (Or you can use the debugq: --partition=debugq instead.)
Submitted batch job 4632685 on cluster zeus
Check that the solver is running:
zeus-1:*-2.4.x> squeue -u $USER
JOBID    USER      ACCOUNT     PARTITION  NAME          EXEC_HOST  ST  REASON     START_TIME  END_TIME  TIME_LEFT  NODES  PRIORITY
4632685  espinosa  pawsey0001  workq      D.runFoam.sh  n/a        PD  Resources  17:09:28    17:19:28  10:00      1      75190
Observe the output of the job with tail -f at runtime (use <Ctrl-C> to exit the command):
zeus-1:*-2.4.x> tail -f slurm-4632685.out
.
.
.
Time = 0.2
PIMPLE: iteration 1
smoothSolver: Solving for Ux, Initial residual = 0.0118746, Final residual = 1.89249e-06, No Iterations 3
smoothSolver: Solving for Uy, Initial residual = 0.0617212, Final residual = 1.68113e-06, No Iterations 4
smoothSolver: Solving for Uz, Initial residual = 0.0589944, Final residual = 9.70923e-06, No Iterations 3
Pressure gradient source: uncorrected Ubar = 0.13369, pressure gradient = -0.000964871
GAMG: Solving for p, Initial residual = 0.213844, Final residual = 0.00414884, No Iterations 2
time step continuity errors : sum local = 5.82807e-06, global = -1.41211e-19, cumulative = -1.41211e-19
Pressure gradient source: uncorrected Ubar = 0.133687, pressure gradient = -0.000947989
GAMG: Solving for p, Initial residual = 0.0222643, Final residual = 4.30412e-07, No Iterations 7
time step continuity errors : sum local = 5.63638e-10, global = -2.40486e-19, cumulative = -3.81697e-19
Pressure gradient source: uncorrected Ubar = 0.133687, pressure gradient = -0.000947874
ExecutionTime = 0.25 s ClockTime = 0 s
.
.
.
- You can see in the case directory that now there are several processor* soft links
zeus-1:*-2.4.x> cd run/channel395
zeus-1:channel395> ls -lat
total 5242952
drwxr-s---+ 12 espinosa pawsey0001 4096 May 25 14:55 .
drwxrws---+ 4 espinosa pawsey0001 4096 May 25 14:55 logs
lrwxrwxrwx 1 espinosa pawsey0001 42 May 25 14:55 processor0 -> /overlayOpenFOAM/run/channel395/processor0
lrwxrwxrwx 1 espinosa pawsey0001 42 May 25 14:55 processor1 -> /overlayOpenFOAM/run/channel395/processor1
lrwxrwxrwx 1 espinosa pawsey0001 42 May 25 14:55 processor2 -> /overlayOpenFOAM/run/channel395/processor2
lrwxrwxrwx 1 espinosa pawsey0001 42 May 25 14:55 processor3 -> /overlayOpenFOAM/run/channel395/processor3
lrwxrwxrwx 1 espinosa pawsey0001 42 May 25 14:55 processor4 -> /overlayOpenFOAM/run/channel395/processor4
drwxr-s---+ 2 espinosa pawsey0001 4096 May 25 14:55 system
drwxrws---+ 4 espinosa pawsey0001 4096 May 25 14:53 bakDir
drwxrws---+ 4 espinosa pawsey0001 4096 May 25 14:53 overlayFSDir
drwxr-s---+ 2 espinosa pawsey0001 4096 May 25 14:53 0
drwxr-s---+ 3 espinosa pawsey0001 4096 May 25 14:53 constant
drwxrws---+ 3 espinosa pawsey0001 4096 May 25 14:03 ..
drwxr-s---+ 2 espinosa pawsey0001 4096 May 25 14:03 0.org
-rwxrwx---+ 1 espinosa pawsey0001 483 May 25 14:03 Allrun
- The processor* soft links are pointing to the directory structure that “lives” inside the ./overlayFSDir/overlay* files
zeus-1:channel395> ls -la processor1/
ls: cannot access 'processor1/': No such file or directory
- The host shell cannot read the directory structure inside the ./overlayFSDir/overlay* files, and that is why the links appear broken
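Whether a link is dangling from the point of view of the host can be tested directly in bash. This sketch creates a link in a temporary directory towards a path that is assumed not to exist on the host (it would only exist inside a container with the overlay mounted).

```shell
#!/bin/bash
# Stand-in case directory (illustration only)
caseDir=$(mktemp -d)

# Target assumed to be absent on the host file system
ln -s /overlayOpenFOAM/run/channel395/processor0 "$caseDir/processor0"

# -L is true for any symbolic link; -e follows the link and fails if the target is missing
if [ -L "$caseDir/processor0" ] && [ ! -e "$caseDir/processor0" ]; then
    linkState="dangling"
else
    linkState="resolved"
fi
linkTarget=$(readlink "$caseDir/processor0")
echo "processor0 -> $linkTarget ($linkState on the host)"
```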
Check that the solver gave some results by listing the interior of an overlay file:
zeus-1:channel395> module load singularity
zeus-1:channel395> theImage=/group/singularity/pawseyRepository/OpenFOAM/openfoam-2.4.x-pawsey.sif
zeus-1:channel395> singularity exec --overlay ./overlayFSDir/overlay1 $theImage ls processor1/
0 10.2 12.4 14.6 16.8 19 20.2 22.4 24.6 26.8 29 30.2 32.4 34.6 36.8 39 5 7.2 9.4 0.2 10.4 12.6 14.8 17 19.2 20.4 22.6 24.8 27 29.2 30.4 32.6 34.8 37 39.2 5.2 7.4 9.6 0.4 10.6 12.8 15 17.2 19.4 20.6 22.8 25 27.2 29.4 30.6 32.8 35 37.2 39.4 5.4 7.6 9.8 0.6 10.8 13 15.2 17.4 19.6 20.8 23 25.2 27.4 29.6 30.8 33 35.2 37.4 39.6 5.6 7.8 constant 0.8 11 13.2 15.4 17.6 19.8 21 23.2 25.4 27.6 29.8 31 33.2 35.4 37.6 39.8 5.8 8 1 11.2 13.4 15.6 17.8 2 21.2 23.4 25.6 27.8 3 31.2 33.4 35.6 37.8 4 6 8.2 1.2 11.4 13.6 15.8 18 2.2 21.4 23.6 25.8 28 3.2 31.4 33.6 35.8 38 4.2 6.2 8.4 1.4 11.6 13.8 16 18.2 2.4 21.6 23.8 26 28.2 3.4 31.6 33.8 36 38.2 4.4 6.4 8.6 1.6 11.8 14 16.2 18.4 2.6 21.8 24 26.2 28.4 3.6 31.8 34 36.2 38.4 4.6 6.6 8.8 1.8 12 14.2 16.4 18.6 2.8 22 24.2 26.4 28.6 3.8 32 34.2 36.4 38.6 4.8 6.8 9 10 12.2 14.4 16.6 18.8 20 22.2 24.4 26.6 28.8 30 32.2 34.4 36.6 38.8 40 7 9.2
- (In this example the final time was set to be 40 to allow for the creation of more results)
- You should also check for success/errors in:
- the slurm output file: slurm-<SLURM_JOBID>.out
- the log files created when executing the OpenFOAM tools in: ./run/channel395/logs/run/
E. Reconstruction
Unfortunately, a container cannot mount more than one OverlayFS file at the same time
- This implies that the results need to be copied back to the host file system before reconstruction
- This is the inverse operation to the process of copying the initial decomposition into the OverlayFS files (explained at the beginning of this episode)
- But in order to avoid the presence of many files in the host, this should be done in small batches:
1. Copy a small batch of results from the interior of the overlay files to the bak.processor* directories
2. Now create processor* soft links to point to the bak.processor* directories and not to the OverlayFS interior
3. Reconstruct that small batch
4. Remove the already reconstructed result-times from the bak.processor* directories
5. Continue the cycle from step 1 until all the result-times needed have been postprocessed
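The cycle above can be sketched as a loop over slices of an array. This is a simplified illustration and not the loop of the workshop script: the three step functions are placeholders that only print what they would do, and the list of times and the batch size are arbitrary example values.

```shell
#!/bin/bash
# Result-times to process (illustrative values) and batch size
allTimes=(0.2 0.4 0.6 0.8 1 1.2 1.4)
batchSize=3

# Placeholders for the real steps (copy from overlays, reconstruct, clean up)
copyBatch()        { echo "copy:        $*"; }
reconstructBatch() { echo "reconstruct: $*"; }
removeBatch()      { echo "remove:      $*"; }

nBatches=0
for (( start = 0; start < ${#allTimes[@]}; start += batchSize )); do
    # Slice of at most batchSize elements starting at index "start"
    batch=("${allTimes[@]:start:batchSize}")
    copyBatch "${batch[@]}"
    reconstructBatch "${batch[@]}"
    removeBatch "${batch[@]}"
    nBatches=$(( nBatches + 1 ))
    lastBatch="${batch[*]}"
done
echo "Processed $nBatches batches"
```

With this structure, at most `batchSize` decomposed result-times exist in the host file system at any moment.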
The E.reconstructFromOverlay.sh script (main points to be discussed):
#SBATCH --ntasks=4 #Several tasks will be used for copying files. (Independent from the numberOfSubdomains)
#4. Create the reconstruction array, intended times to be reconstructed are set with the reconstructTimes var
#These formats are the only accepted by function "generateReconstructArray" (check the function definition for further information)
#reconstructTimes="all"
#reconstructTimes="-1"
#reconstructTimes="20"
#reconstructTimes="50,60,70,80,90"
reconstructTimes="0:10"
unset arrayReconstruct #This global variable will be re-created in the function below
generateReconstructArray "$reconstructTimes" "$insideDir";success=$? #Calling function to generate "arrayReconstruct"
if [ $success -ne 0 ]; then
    echo "Failed creating the arrayReconstruct"
    echo "Exiting";exit 1
fi
- Note the use of the generateReconstructArray function
- The function receives as arguments: 1) the string defining the result-times to be reconstructed, and 2) the path inside the OverlayFS files where the processor* dirs reside
- The returned value of the function is saved in the success variable and then checked
- The GLOBAL array arrayReconstruct is unset before calling the function, and the function will create a new one with the result-times to be reconstructed
- Read the definition of the function in ../../A1_auxiliaryScripts/ofContainersOverlayFunctions.sh
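As an illustration of how a "first:last" string could be turned into an array of result-times, the sketch below filters a list of available times. It assumes that the format denotes an inclusive range, which should be confirmed in the definition of generateReconstructArray; the list of available times and the array name `selected` are invented for this example.

```shell
#!/bin/bash
# Illustrative list of available result-times and the requested range
availableTimes=(0 0.2 5 10 10.2 40)
reconstructTimes="0:10"

# Split the string at the colon
firstTime=${reconstructTimes%%:*}
lastTime=${reconstructTimes##*:}

# Keep the times inside the range (awk handles the floating point comparison)
selected=()
for t in "${availableTimes[@]}"; do
    if awk -v t="$t" -v lo="$firstTime" -v hi="$lastTime" \
           'BEGIN { exit !(t >= lo && t <= hi) }'; then
        selected+=("$t")
    fi
done
echo "Times selected for reconstruction: ${selected[*]}"
```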
#5. Point the soft links to the ./bakDir/bak.processor* directories
pointToBak $foam_numberOfSubdomains;success=$? #Calling function to point towards the bak.processors
if [ $success -ne 0 ]; then
    echo "Failed creating the soft links"
    echo "Exiting";exit 1
fi
- Note the use of the pointToBak function
- The function receives as argument the number of subdomains.
- The returned value of the function is saved in the success variable and then checked
- Read the definition of the function in ../../A1_auxiliaryScripts/ofContainersOverlayFunctions.sh
maxTimeTransfersFromOverlays=10
- This variable sets the size of the batches to be postprocessed
- The following sections are executed inside a loop (not shown here) as many times needed to postprocess all the result-times required batch by batch:
## 9. Copy from the ./overlayFSDir/overlay* the full batch into ./bakDir/bak.processor*
unset arrayCopyIntoBak
arrayCopyIntoBak=("${hereToDoReconstruct[@]}")
replace="true"
copyResultsIntoBak "$insideDir" "$foam_numberOfSubdomains" "$replace" "${arrayCopyIntoBak[@]}";success=$? #Calling the function to copy time directories into bak.processor*
if [ $success -ne 0 ]; then
    echo "Failed transferring files into bak.processor* directories"
    echo "Exiting";exit 1
fi
- The array hereToDoReconstruct has the result-times to be processed in the current batch
- Note the use of the copyResultsIntoBak function
- The function receives as arguments: 1) the path inside the overlays, 2) the number of subdomains, 3) the indication for replacing or not already existing result-times in the bak directories and 4) the array with the result-times to process.
- The returned value of the function is saved in the success variable and then checked
- Read the definition of the function in ../../A1_auxiliaryScripts/ofContainersOverlayFunctions.sh
## 10. Reconstruct all times for this batch.
echo "Start reconstruction"
logFileHere=$logsDir/log.reconstructPar.${SLURM_JOBID}_${hereToDoReconstruct[0]}-${hereToDoReconstruct[-1]}
srun -n 1 -N 1 singularity exec $theImage reconstructPar -time ${timeString} 2>&1 | tee $logFileHere
- The command in the srun line executes the reconstruction of the batch
- The variable timeString has the list of the result-times to be reconstructed in each batch. This was set in a previous step (in step ## 8., not shown here)
- The output of the reconstruction is saved in the file logFileHere (this file needs to be checked when reconstruction errors happen)
- For more details of the logic of the loop refer to the full script itself
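Since step ## 8. is not shown, the following sketch illustrates one possible way of building timeString and the log file name from the batch array. It assumes a comma-separated list of times, and the batch values, the logs path and the job ID fallback are placeholders chosen for the illustration.

```shell
#!/bin/bash
# Illustrative batch of result-times and settings
hereToDoReconstruct=(0.2 0.4 0.6)
logsDir=./logs/post
SLURM_JOBID=${SLURM_JOBID:-1234567}   # placeholder value outside of a Slurm job

# Join the elements of the array with commas (IFS is changed only in the subshell)
timeString=$(IFS=,; echo "${hereToDoReconstruct[*]}")

# Log name built from the job ID and the first and last times of the batch
lastIndex=$(( ${#hereToDoReconstruct[@]} - 1 ))
logFileHere=$logsDir/log.reconstructPar.${SLURM_JOBID}_${hereToDoReconstruct[0]}-${hereToDoReconstruct[$lastIndex]}

echo "timeString=$timeString"
echo "logFileHere=$logFileHere"
```

Computing `lastIndex` explicitly gives the same element as the `[-1]` subscript used in the script, and also works on older bash versions.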
E.I Steps for dealing with reconstruction:
Submit the reconstruction script (from the scripts directory)
zeus-1:*-2.4.x> sbatch --reservation=$myReservation E.reconstructFromOverlay.sh
- If a reservation is not available, do not use the option. (Or you can use the debugq: --partition=debugq instead.)
Submitted batch job 4632899 on cluster zeus
Check that the reconstruction is being performed:
zeus-1:*-2.4.x> cd run/channel395/
zeus-1:channel395> watch ls ./bakDir/bak.processor0/
- The command watch executes the ls of the content of ./bakDir/bak.processor0 every 2 seconds
Every 2.0s: ls ./bakDir/bak.processor0/      Mon Jun 15 18:33:16 2020
0 0.2 0.4 0.6 0.8 1 1.2 constant
- You can watch the progress of the first batch being copied to the ./bakDir/bak.processor0 directory
- The same is happening for the rest of the ./bakDir/bak.processor* directories
- In this case maxTimeTransfersFromOverlays=10 (set within the script) is the size of the batches
- After the copy of the first batch is finished, those result-times will be reconstructed
- After successful reconstruction, those result-times will be removed from the host file system
Every 2.0s: ls ./bakDir/bak.processor0/      Mon Jun 15 18:35:36 2020
0 2.2 2.4 2.6 constant
- A few minutes later, the second batch is being copied to the bak.processor* directories
- Note that the first batch of files has already been removed from the system
Every 2.0s: ls ./bakDir/bak.processor0/      Mon Jun 15 18:39:16 2020
0 10 constant
- When finished, the earliest and the latest result-times are kept in the bak.processor* directories (although this can be modified within the scripts if desired)
<CTRL>-C (to exit the watch command)
- Check for the existence of the reconstructed times:
zeus-1:channel395> ls
0 0.org 1.6 2.6 3.6 4.6 5.6 6.6 7.6 8.6 9.6 bak.processor2 overlay0 processor0 system 0.2 1 1.8 2.8 3.8 4.8 5.8 6.8 7.8 8.8 9.8 bak.processor3 overlay1 processor1 0.4 10 2 3 4 5 6 7 8 9 Allrun bak.processor4 overlay2 processor2 0.6 1.2 2.2 3.2 4.2 5.2 6.2 7.2 8.2 9.2 bak.processor0 constant overlay3 processor3 0.8 1.4 2.4 3.4 4.4 5.4 6.4 7.4 8.4 9.4 bak.processor1 logs overlay4 processor4
- You should also check for success/errors in:
- the slurm output file: slurm-<SLURM_JOBID>.out
- the log files created when executing the OpenFOAM tools in: ./run/channel395/logs/post/ (check the reconstructPar command line in the script to understand the naming convention of the log files. In this case, they contain the SLURM_JOBID and the first-last result-times reconstructed in each batch.)
F. Extract from Overlay into Bak; and G. Reconstruct From Bak
F.G.I These scripts are left for the user to try by themselves
- F.extractFromOverlayIntoBak.sh is for extracting a batch of result-times from the overlay files (no reconstruction)
- G.reconstructFromBak.sh is for reconstructing existing results in the bak.processor* directories
Z. Further notes on how to use OpenFOAM and OpenFOAM containers at Pawsey
More on OverlayFS for singularity in: https://sylabs.io/guides/3.5/user-guide/persistent_overlays.html
The usage of OpenFOAM and OpenFOAM containers at Pawsey has already been described in our documentation: OpenFOAM documentation at Pawsey
and in a technical newsletter note: https://support.pawsey.org.au/documentation/display/US/Pawsey+Technical+Newsletter+2020-04
Key Points
Unfortunately, a container cannot mount more than one OverlayFS file at the same time
This implies that the results need to be copied back to the host file system before reconstruction
In order to avoid the presence of many files in the host, this should be done in small batches:
1. Copy a small batch of results from the interior of the ./overlayFSDir/overlay* files towards the ./bakDir/bak.processor* directories in the host file system
2. Now create processor* soft links to point to the ./bakDir/bak.processor* directories and not to the directory structure inside the OverlayFS files
3. Reconstruct that small batch
4. Remove the decomposed result-times from the ./bakDir/bak.processor* directories. Only the fully reconstructed result-times are kept in the host, and the original decomposed results are only kept inside the OverlayFS files.
5. Continue the cycle until all the result-times needed have been postprocessed