Large Jobs

PolyADCIRC runs batches of simulations of size num_of_parallel_runs. It relies on GNU Parallel to run a batch of serial jobs simultaneously, and on ibrun, a TACC-specific batch MPI launch command, to run a batch of parallel jobs simultaneously. Based on user input, PolyADCIRC writes the appropriate Bash scripts and runs them with subprocess.Popen(). Once the batch size (num_of_parallel_runs) gets too large, the user may encounter

tail: write error: Broken pipe

We are currently working to eliminate this problem. The current workaround is to reduce your batch size and break your job up into multiple smaller jobs. These jobs may then be submitted to the queue and run either independently or sequentially. When doing so, make sure that the run scripts specify a different save_dir, save_file, and script_name for each job (see the sketch below). After your jobs complete, the data may be concatenated into a single file; this file will have the same structure as if a single job had been run. You might also want to create a crontab entry to periodically clear out your .sge directory.
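As a rough illustration, one way to keep these names distinct is to index them by job number, which matches the naming pattern used in the concatenate_many example below. This is only a sketch; num_jobs and the script name pattern are placeholder assumptions:

num_jobs = 7  # hypothetical number of smaller jobs
for n in range(num_jobs):
    save_dir_n = save_dir + '_' + str(n)        # e.g. poly_wall_0
    save_file_n = save_file + str(n)            # e.g. py_save_file0
    script_name_n = 'run_job' + str(n) + '.sh'  # hypothetical script name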

concatenation_2

This file demonstrates concatenating data from two separate jobs.

Import necessary modules:

import numpy as np
import polyadcirc.run_framework.domain as dom
import polyadcirc.run_framework.random_wall as rmw

Specify directories for the jobs that were run:

base_dir = '/h1/lgraham/workspace'
grid_dir = base_dir + '/ADCIRC_landuse/Inlet/inputs/poly_walls'
save_dir1 = base_dir + '/ADCIRC_landuse/Inlet/runs/poly_wall1'
save_dir2 = base_dir + '/ADCIRC_landuse/Inlet/runs/poly_wall2'
basis_dir = base_dir + '/ADCIRC_landuse/Inlet/landuse_basis/gap/beach_walls_2lands'

Specify the save file:

save_file = 'py_save_file'

Load the data from both runs:

main_run, domain, mann_pts, wall_pts, points = rmw.loadmat(save_file, base_dir,
    grid_dir, save_dir1, basis_dir)
other_run, domain, mann_pts2, wall_pts2, points2 = rmw.loadmat(save_file,
    base_dir, grid_dir, save_dir2, basis_dir)

Concatenate the data from both runs; concatenate returns a tuple containing a runSet object and the combined points array:

cated = main_run.concatenate(other_run, points, points2)

Store the concatenated version of points in a dictionary object:

mdat = dict()
mdat['points'] = cated[1]

Concatenate and store mann_pts and wall_pts in the dictionary:

mdat['mann_pts'] = np.concatenate((mann_pts, mann_pts2),
    axis=points.ndim-1)
mdat['wall_pts'] = np.concatenate((wall_pts, wall_pts2),
    axis=points.ndim-1)

Save the data to a new save file in save_dir1 as cat_file.mat:

main_run.update_mdict(mdat)
main_run.save(mdat, 'cat_file')
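Since the concatenated file has the same structure as a single-job save file, it should be loadable with the same call used above. The following usage is a sketch based on the loadmat calls in this example:

# load the concatenated data back from save_dir1 (sketch)
main_run, domain, mann_pts, wall_pts, points = rmw.loadmat('cat_file',
    base_dir, grid_dir, save_dir1, basis_dir)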

concatenate_many

This file demonstrates concatenating data from seven different jobs.

Import necessary modules:

import polyadcirc.run_framework.domain as dom
import polyadcirc.run_framework.random_wall as rmw

Specify the directories for the jobs and the base strings for save_dir and save_file. The save files follow the pattern save_file+str(n) and the save directories follow save_dir+'_'+str(n), where n indicates the files and directory for the nth job:

base_dir = '/h1/lgraham/workspace'
grid_dir = base_dir + '/ADCIRC_landuse/Inlet/inputs/poly_walls'
save_dir = base_dir + '/ADCIRC_landuse/Inlet/runs/poly_wall'
basis_dir = base_dir + '/ADCIRC_landuse/Inlet/landuse_basis/gap/beach_walls_2lands'

save_file = 'py_save_file'

Load the data from the first job:

main_run, domain, mann_pts, wall_pts, points = rmw.loadmat(save_file+'0',
    base_dir, grid_dir, save_dir+'_0', basis_dir)

Load and concatenate the data for the remaining runs:

for i in xrange(1, 7):
    save_file2 = save_file + str(i)  # construct the save_file name
    save_dir2 = save_dir + '_' + str(i)  # construct the save_dir name
    # load the data
    other_run, domain, mann_pts2, wall_pts2, points2 = rmw.loadmat(save_file2,
        base_dir, grid_dir, save_dir2, basis_dir)
    # concatenate the data onto the accumulated main_run
    main_run, points = main_run.concatenate(other_run, points, points2)

Save the data to save_dir+'_0' as poly7_file.mat:

mdat = dict()
mdat['points'] = points

main_run.update_mdict(mdat)
main_run.save(mdat, 'poly7_file')

Notice that in this example mann_pts and wall_pts are NOT saved. These two arrays were already stitched together into the single points array by numpy.vstack((np.repeat(wall_points, s_p_wall, 1), mann_pts)) in polyadcirc.run_framework.random_wall.runSet.run_points().
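If the two pieces are ever needed separately, they can in principle be recovered by slicing points. This is a minimal sketch; it assumes points was assembled exactly as described above, with the repeated wall values occupying the first rows, and that wall_pts (from the first loadmat call) has the same number of rows as that repeated wall block:

num_wall_rows = wall_pts.shape[0]       # assumed row count of the wall block
wall_part = points[:num_wall_rows, :]   # repeated wall values
mann_part = points[num_wall_rows:, :]   # Manning's n values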