BIRD Cluster: General purpose batch farm

Sun N1 Grid Engine N1GE (BIRD)


Description

A very brief description

BIRD stands for "Batch Infrastructure Resource at DESY".
The BIRD Batch Facility provides:

  • more than 350 CPU cores for interactive and batch processing
  • group specific software and storage
  • submit and control facilities from PAL, BIRD and group specific hosts
  • fair share load distribution and quota handling
  • integration of DESY wide batch resources
  • AFS and Kerberos support for authentication and resource access
  • operation on Sun N1 Grid Engine Version 6 (N1GE)

N1GE is a batch system that provides job control, manual pages, a graphical user interface (GUI), and command-line tools.

A list of some basic commands:

  Setup the sge shell environment:

. /usr/sge/default/common/settings.sh        # Bourne shell users, AFS fair share support
source /usr/sge/default/common/settings.csh  # C shell users, AFS fair share support
. /usr/sge/c2/common/settings.sh             # Bourne shell users, non-AFS experimental support
source /usr/sge/c2/common/settings.csh       # C shell users, non-AFS experimental support

  To show current status of submitted jobs:

 qstat                          # Show your active or waiting jobs
 qstat -u '*'                   # Show all active or waiting jobs
 qstat -j <job_id>              # Show details for a specific active or waiting job
 qquota                         # Show your current quota usage
 qquota -u '*' -P '*' -q '*'    # Show quota usage for all users, projects and queues
 qacct -j <job_id>              # Show accounting for a specific job (after it has finished)
 qacct                          # Show your accounting information

  To submit a job:

 qsub -cwd myjob

 -cwd:  put output in the current working directory (otherwise output goes to the home directory)
 myjob: shell script

  To get information on the batch system:

 qstat -g c   # Get info on queues
 qhost        # Get info on hosts

  Temporary Writable File Space:

 See Shell Variables $TMP and/or $TMPDIR 

  Start the GUI:

 qmon & 

Other Quick Hints:

Register yourself for the registry resource BATCH to get queue access.

Do not use names like: sge, root, atlas, cms, desy, flc, h1, hasylab, herab, it, mpy, unihh2, zeus, opera, operator, batch.
Commonly Used Options and Values

  CLI (Command Line Interface): -l complex="value"
  JCL (Job Control Language):   #$ -l complex=value
  Values (Hints):  Requested resources for your job. Depends also on the configuration. Most names
                   also have a shortcut alias name. Regular expressions may be used for value
                   specification.
  Comment:         Use multiple -l options on the CLI or multiple lines in the JCL for requesting
                   different complex values. Use quotes on the CLI.

  CLI:     -l distro="Operating_System"
  JCL:     #$ -l distro=Operating_System
  Values:  sld3, sld4, sld5
  Comment: Alias is os.

  CLI:     -l arch="Architecture"
  JCL:     #$ -l arch=Architecture
  Values:  x86 (32bit), amd64 (64bit)
  Comment: Alias is a.

  CLI:     -l h_rt="Available_Job_Time"
  JCL:     #$ -l h_rt=Available_Job_Time
  Values:  Amount of wall time as hh:mm:ss (hours, minutes and seconds). Use e.g. 72:00:00 for a 3 day job.
  Comment: Alias is h_rt.

  CLI:     -l h_vmem="Available_Memory_Space"
  JCL:     #$ -l h_vmem=Available_Memory_Space
  Values:  Amount of memory (in K, M or G units) with default=256M; since 2009 default=1G.
  Comment: Alias is h_vmem.

  CLI:     -l h_fsize="Available_Disk_Space"
  JCL:     #$ -l h_fsize=Available_Disk_Space
  Values:  Amount of disk space (in K, M or G units) with default=3G.
  Comment: Alias is h_fsize.

  CLI:     -l h_stack="Available_Stack_Size"
  JCL:     #$ -l h_stack=Available_Stack_Size
  Values:  Amount of stack memory (in K, M or G units) with default=0 (unlimited); set h_stack=100M
           if you have multithread issues, e.g. with the command lfc-ls.
  Comment: Alias is h_stack.

Options and Values Under Test

  CLI:     -l idle=true
  JCL:     #$ -l idle=true
  Values:  Request queuing to the idle queue.
  Comment: Alias is idle.
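For illustration, several of these options can be combined on the command line. A minimal sketch,
assuming a placeholder script my_test_job.sh and example values (not site defaults):

 qsub -cwd -l os=sld5 -l arch=amd64 -l h_rt=72:00:00 -l h_vmem=2G -l h_fsize=10G my_test_job.sh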
 

Other Options:

#$ -P $GROUP          Project name for resource accounting in fair-share operation.
                      Normally this corresponds to the default project of the primary UNIX group.
#$ -o %path%          File name for standard output (stdout)
#$ -e %path%          File name for standard error (stderr)
#$ -j y               Join: standard error is written to standard output
#$ -N jobname         Job name; default: script file name (qsub)
#$ -cwd               Change to the directory from which the batch job was submitted
#$ -m ebas            Mail at end (e), begin (b), abort (a) or suspend (s) of the batch job; see also: -M
#$ -M mailaddress     Notification via e-mail.
                      Note: to generate deliverable mails, the delivery addresses are modified under
                      certain conditions. If the delivery address is user@host.desy.de, it is replaced
                      by user@mail.desy.de. This is the normal case when the -M option is omitted.
                      If the address is set to Firstname.Lastname@Site, user@Site or user@host.other.site,
                      the address is not changed.
#$ -v Variable=Value  Set environment variables.
                      May be necessary so that environment variables are also inherited by parallel processes.
#$ -w v               Verify the job, but do not submit it.
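A minimal sketch of a complete job script combining the directives above (job name, project name,
program and file names are placeholders, not site defaults):

 #!/bin/zsh
 #$ -N my_analysis         # job name (placeholder)
 #$ -P my_project          # project for fair-share accounting (placeholder)
 #$ -cwd                   # run in the submission directory
 #$ -j y                   # merge stderr into stdout
 #$ -o my_analysis.log     # file for stdout (and stderr)
 #$ -m ea                  # mail at end or abort of the job
 #$ -l h_rt=12:00:00       # 12 hours of wall time
 #$ -l h_vmem=1G           # 1 GByte of memory
 echo "Job $JOB_ID running on $HOSTNAME"
 ./my_program              # placeholder for the actual work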

Short job control examples:


Non-interactive batch:

 [my_submit_host] ~ qsub -cwd my_test_job.sh                          # Batch example 1 in working directory
 [my_submit_host] ~ qsub -l arch=x86 -l os=sld4 -cwd my_test_job.sh   # Batch example 2 with architecture request

Non-interactive batch with MPICH parallel environments:

 [my_submit_host] ~ qsub my_mpich2_job.sh     # Distributed PE batch example (currently only os=sld5)
 [my_submit_host] ~ qsub my_mpich2-1_job.sh   # Non-distributed PE batch example

Source for MPI code: SL4/5 binary, MPI C source, Makefile

Prerequisites for using mpich2:
- Create a file ~/.mpd.conf for the mpd process daemon containing a password like: MPD_SECRETWORD=....
- Restrict file access of ~/.mpd.conf to yourself only: chmod 600 ~/.mpd.conf
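
The two prerequisite steps can be done as follows; a sketch only, where the secret word shown is a
placeholder that must be replaced by your own:

 echo "MPD_SECRETWORD=choose_your_own_secret" > ~/.mpd.conf   # placeholder secret word
 chmod 600 ~/.mpd.conf                                        # restrict access to yourself only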

Interactive Batch (Under test):

Before migration from protected subnet (last hosts moved in November 2008):

 [my_host] ~ ssh bird                       # After login ...
 # If proper DISPLAY setting, free access to the X11/Xserver port, and an SGE cell with AFS token provider:
 [n1gexih] ~ /usr/sge/util/qterm            # -l os=sld3 -l arch=amd64 # ... get an xterm (on a special architecture)
 # If free access to the selected SGE ssh ports on the exec hosts:
 [n1gexih] ~ /usr/sge/util/qshell           # -l h_vmem=1G # ... or an interactive shell (with lots of memory)
 # and ...
 [sgehost] ~ xterm                          # ... start an X client, if X11 is needed and supported by the ssh configuration

After migration from protected subnet (will be better with SGE version 6.2 and os=sld5):

 [my_submit_host] ~ /usr/sge/util/qterm     # Get an xterm ...
 [my_submit_host] ~ /usr/sge/util/qshell    # -l h_vmem=1G # ... or an interactive shell

Typical Job Environment:

GROUP=%Group% 
GROUPROOT=/etc/local/groups/%Group%
HOST=%ExecHost%.desy.de
HOSTNAME=%ExecHost%.desy.de
HOSTTYPE=i386-linux
JOB_ID=%JobID%
JOB_NAME=%JobName%
JOB_SCRIPT=/scratch/sge/default/spool/%ExecHost%/job_scripts/%JobID%
LOGNAME=%User%
MANPATH=/usr/sue/man:/opt/products/man:/usr/share/man:/usr/afsws/man:/usr/X11/man:/usr/local/X11/man:/usr/local/man:/usr/kerberos/man
QUEUE=%Queue%
PATH=/scratch/sge/tmp/%JobID%.%Queue%:/usr/local/bin:/bin:/usr/bin
SGE_ACCOUNT=sge
SGE_ARCH=lx24-x86
SGE_BINARY_PATH=/usr/sge/bin/lx24-x86
SGE_CELL=default
SGE_CWD_PATH=/afs/desy.de/user/%UserI%/%User%/prgs/sge
SGE_JOB_SPOOL_DIR=/scratch/sge/default/spool/%ExecHost%/active_jobs/%JobID%
SGE_O_HOME=/afs/desy.de/user/%UserI%/%User%
SGE_O_HOST=%SubmitHost%
SGE_O_LOGNAME=%User%
SGE_O_MAIL=/var/mail/%User%
SGE_O_PATH=/usr/sge/bin/lx24-x86:/afs/desy.de/user/%UserI%/%User%/bin.Linux_RHEL:/afs/desy.de/user/%UserI%/%User%/bin:/etc/local/groups/%Group%/bin:/etc/local/groups/%Group%/scripts:/usr/sue/bin:/opt/products/scripts:/opt/products/bin:/bin:/usr/bin:/usr/bin/X11:/usr/local/X11/bin:/usr/local/bin:/usr/kerberos/bin:/cern/pro/bin:.:/usr/afsws/etc
SGE_O_SHELL=/bin/tcsh
SGE_O_WORKDIR=/afs/desy.de/user/%UserI%/%User%/prgs/sge
SGE_ROOT=/usr/sge
SGE_STDERR_PATH=/afs/desy.de/user/%UserI%/%User%/prgs/sge/sge_env.sh.e35
SGE_STDIN_PATH=/dev/null
SGE_STDOUT_PATH=/afs/desy.de/user/%UserI%/%User%/prgs/sge/sge_env.sh.o35
SHELL=/bin/zsh
TMP=/scratch/sge/tmp/%JobID%.%Queue%
TMPDIR=/scratch/sge/tmp/%JobID%.%Queue%
USER=%User%
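
As a sketch of how a job script can use this environment, the following stages data through the
node-local $TMPDIR and copies results back via $SGE_O_WORKDIR (program and file names are placeholders):

 #!/bin/zsh
 #$ -cwd
 cp my_input.dat "$TMPDIR"/                                  # stage input to node-local scratch space
 cd "$TMPDIR"
 "$SGE_O_WORKDIR"/my_program my_input.dat > my_output.dat    # placeholder for the actual work
 cp my_output.dat "$SGE_O_WORKDIR"/                          # copy results back to the submit directory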

Queues (until 8.9.2008):

 Class      Time Limit   Comment
 default.q  5 minutes    default without project/group
 short.q    1 day        available, default
                         (h_rt < 24:00:00)
 long.q     1 week       available, long runner
                         (24:00:00 < h_rt < 168:00:00)
 idle.q     3 weeks      under evaluation, jobs will be suspended while other jobs request the same host
                         (h_rt < 504:00:00, s_rt < 503:00:00 [the job is first "warned" via the SIGUSR1 signal])
 login.q    1 day        under evaluation, useful for debugging, startup may be difficult (see advice)

Queues (since 8.9.2008):

 Class      Time Limit   Slots                      Comment
 default.q  3 hours      100 % (@anyhost)           available as default
                                                    (h_rt < 3:00:00, h_vmem < 2G)
 short.q    1 day        appr. 85 % (/platform)     available for medium sized jobs
                         (incl. 50 % for long.q)    (h_rt < 24:00:00, h_vmem < 2G)
 long.q     1 week                                  available for long runners and high memory usage
                                                    (24:00:00 < h_rt < 168:00:00, h_vmem < 4G)
 idle.q     3 weeks      1 (@anyhost)               under evaluation, jobs will be suspended while other jobs request the same host
                                                    (h_rt < 504:00:00, s_rt < 503:00:00 [the job is first "warned" via the SIGUSR1 signal])
 login.q    1 day        4 (@somehost)              under evaluation, useful for debugging, startup may be difficult (see advice)
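
For idle.q jobs, the SIGUSR1 warning mentioned above can be caught inside the job script. A minimal
sketch, assuming a placeholder program and checkpoint step:

 #!/bin/zsh
 #$ -cwd
 #$ -l idle=true
 #$ -l h_rt=504:00:00
 #$ -l s_rt=503:00:00
 trap 'echo "SIGUSR1 received: writing checkpoint"; touch checkpoint.flag' USR1   # placeholder checkpoint step
 ./my_long_running_program &   # placeholder work, run in the background so the trap can fire promptly
 wait                          # note: wait returns early when the trap fires; a real script may need to wait again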

Using path aliases:

Problem:

If you are working, e.g., on a submit host in the directory /nfs/flc/lc3/pool/<myuserid>, which is an automount link to /pool/<myuserid>, and you then run "qsub -cwd", your batch job will try to use the working directory /pool/<myuserid>, which does not exist on the batch host. The following config file would prepare the correct mapping for the above situation:

~/.sge_aliases

# src-path   subm-host     exec-host   dest-path
/pool        lc3.desy.de   *           /nfs/flc/lc3/pool
/pool        lc4.desy.de   *           /nfs/flc/lc4/pool

More Information

Sun documentation


Quota and Fairshare:

We use quota and fair-share settings to keep the batch cluster in a state where all resources can be shared in a fair manner. In particular:

Quota

There are per-user quota settings ensuring that a single user cannot use more than 65 % of the cores of an OS flavor at a time. Projects are limited to 75 % of a core set.

There are queue-related quota settings allowing 100 % of the slots for jobs in the default queue, which ensures scheduling at least every 3 hours. The longer queues are limited: long to 65 %, short to 75 %, and long plus short together to 85 %. The interactive login queue is limited to 50 %.

Fairshare

BIRD is shared by all DESY users, and IT provides service and hardware for this facility. In addition, other DESY groups have dedicated project-specific hardware to the BIRD cluster. In this way, valuable resources have been made available. A contributing project is granted fair-share points: 10 for each compute core, 10 for each 2 GByte of memory, and 1 for each GByte of disk. These points guarantee the project members access to the corresponding relative amount of batch resources. Because IT gives its own fair share to the community, and because batch load is not continuous over time, this is a win-win situation for all, even for those without share points, who are allowed to use the idle times and/or idle resources. Unused project shares even enhance your job priorities for the following weeks, so typically a project may use more resources than it could in a stand-alone facility of its own.

Every user additionally has 100 functional points to prioritize their own jobs. Users who belong to more than one project should carefully set the job's project membership (see the sketch below).
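
A hedged example of selecting the project at submission time (the project name is a placeholder):

 qsub -P my_project -cwd my_test_job.sh   # account this job to the chosen project's fair share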

More Information

Sun documentation


Manuals and other Information Sources:

BIRD Flyer (from UCO):

Guides Current Version:

Administration(pdf): admin.pdf
User Guide(pdf): user_guide.pdf

Guides Next Version:

Starting.pdf
Planning.pdf
Upgrading.pdf
Installing.pdf
Administrating.pdf
Using.pdf


Info: Sun Source: LINK

N1 Grid Engine 6 Collection


Zeuthen Batch User Information


Contact:

 Address                Purpose                                              Comment
 bird.service@desy.de   Operational issues in HH                             Generates an official ticket in the RT system
 sge-global@desy.de     Discussion and questions on design and development
 sge-users@desy.de      User notifications and support HH                    Subscription is part of the registry resource