Bird Cluster: General Batch Farm (English)

Sun N1 Grid Engine N1GE (BIRD)

Up-to-date information is also available at http://bird.desy.de !


Description

A very brief description

BIRD stands for "Batch Infrastructure Resource at DESY".
The BIRD Batch Facility provides:

  • more than 350 CPU cores for interactive and batch processing
  • group-specific software and storage
  • submit and control facilities from PAL, BIRD, and group-specific hosts
  • fair-share load distribution and quota handling
  • integration of DESY-wide batch resources
  • AFS and Kerberos support for authentication and resource access
  • runs on Sun N1 Grid Engine Version 6 (N1GE)

N1GE is a batch system that provides job control, manual pages, a graphical user interface (GUI), and command-line tools.

A list of some basic commands:

  Setup the sge shell environment:

. /usr/sge/default/common/settings.sh        # Bourne shell users, for AFS fair-share support
source /usr/sge/default/common/settings.csh  # C shell users, for AFS fair-share support
. /usr/sge/c2/common/settings.sh             # Bourne shell users, for non-AFS experimental support
source /usr/sge/c2/common/settings.csh       # C shell users, for non-AFS experimental support

  To show current status of submitted jobs:

 qstat # Show your active or waiting jobs 
 qstat -u '*' # Show all active or waiting jobs 
 qstat -j <job_id> # Show details of a specific active or waiting job 
 qquota # Show your current quota usage 
 qquota -u '*' -P '*' -q '*' # Show quota usage for all users, projects and queues 
 qacct -j <job_id> # Show accounting of a specific job (when done) 
 qacct # Show your accounting information 

  To submit a job:

 qsub -cwd myjob
 # -cwd:  put output in the current working directory,
 #        otherwise output goes to the home directory
 # myjob: shell script

  To get information on the batch system:

 qstat -g c # Get info on queues
 qhost      # Get info on hosts

  Temporary Writable File Space:

 See Shell Variables $TMP and/or $TMPDIR 
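
  As a minimal sketch of how a job can use this space (the output file name and the copy-back step are hypothetical):

 #!/bin/sh
 #$ -cwd
 # TMP/TMPDIR point to a job-private scratch directory on the exec host
 cd "$TMPDIR"
 # ... create large temporary output here ...
 # Copy results back to the submit directory before the job ends
 # (SGE_O_WORKDIR is set by the batch system, see "Typical Job Environment")
 cp result.dat "$SGE_O_WORKDIR/"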

  Start the GUI:

 qmon & 

Other Quick Hints:

Register for the registry resource BATCH to get queue access ...

Do not use reserved names like: sge root atlas cms desy flc h1 hasylab herab it mpy unihh2 zeus opera operator batch
Commonly Used Options and Values

The general form is -l complex="value" on the CLI (Command Line Interface), or
#$ -l complex=value in the JCL (Job Control Language). This requests resources
for your job; the available values also depend on the configuration. Most
complex names also have a shortcut alias. Regular expressions may be used in
the value specification. Use multiple -l options on the CLI, or multiple lines
in the JCL, to request different complex values. Use quotes on the CLI.

 -l distro="Operating_System"        / #$ -l distro=Operating_System
   Values: sld3, sld4, sld5. Alias is os.

 -l arch="Architecture"              / #$ -l arch=Architecture
   Values: x86 (32-bit), amd64 (64-bit). Alias is a.

 -l h_rt="Available_Job_Time"        / #$ -l h_rt=Available_Job_Time
   Amount of wall time as hh:mm:ss (hours, minutes, seconds).
   Use e.g. 72:00:00 for a 3-day job. Alias is h_rt.

 -l h_vmem="Available_Memory_Space"  / #$ -l h_vmem=Available_Memory_Space
   Amount of memory (in K, M, or G units); default=256M, since 2009 default=1G.
   Alias is h_vmem.

 -l h_fsize="Available_Disk_Space"   / #$ -l h_fsize=Available_Disk_Space
   Amount of disk space (in K, M, or G units); default=3G. Alias is h_fsize.

 -l h_stack="Available_Stack_Size"   / #$ -l h_stack=Available_Stack_Size
   Amount of stack memory (in K, M, or G units); default=0 (unlimited).
   Set h_stack=100M if you have multithreading issues, e.g. with the command
   lfc-ls. Alias is h_stack.

Options and Values Under Test

 -l idle=true                        / #$ -l idle=true
   Request queuing to the idle queue. Alias is idle.
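
As an illustration, several complexes can be combined in a single submission (the script name reuses my_test_job.sh from the examples below; the resource values are arbitrary):

 qsub -l os=sld5 -l arch=amd64 -l h_rt=72:00:00 -l h_vmem=2G -cwd my_test_job.sh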
 

Other Options:

#$ -P $GROUP          Project name for resource accounting in fair-share
                      operation. Normally the default project corresponds to
                      the primary UNIX group.
#$ -o %path%          File name for standard output (stdout).
#$ -e %path%          File name for standard error (stderr).
#$ -j y               Join: standard error is written into standard output.
#$ -N jobname         Job name. Default: the script file name (qsub).
#$ -cwd               Change to the directory from which the batch job was
                      submitted.
#$ -m ebas            Send mail at end (e), begin (b), abort (a), or
                      suspend (s) of the batch job. See also: -M.
#$ -M mailaddress     Notification by e-mail.
                      Note: to produce deliverable mail, the delivery address
                      is modified under certain conditions. If the delivery
                      address is user@host.desy.de, it is replaced by
                      user@mail.desy.de; this is the normal case when the -M
                      option is omitted. If the address is set to
                      Firstname.Lastname@Site, user@Site, or
                      user@host.other.site, the address is not changed.
#$ -v Variable=Value  Set environment variables. May be necessary so that
                      environment variables are also inherited by parallel
                      processes.
#$ -w v               Verify the job, but do not submit it.
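
Putting several of these directives together, a minimal job script might look like the following sketch (the job name, output file, and script body are hypothetical; the resource values are just examples):

 #!/bin/sh
 #$ -N my_test_job        # job name (hypothetical)
 #$ -cwd                  # run in the directory the job was submitted from
 #$ -j y                  # merge stderr into stdout
 #$ -l h_rt=01:00:00      # request one hour of wall time
 #$ -l h_vmem=1G          # request 1 GByte of memory
 # The batch system sets JOB_ID, HOSTNAME, etc. (see "Typical Job Environment")
 echo "Job $JOB_ID running on $HOSTNAME"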


Short job control examples:


Non-interactive batch:

[my_submit_host] ~ qsub -cwd my_test_job.sh                        # Batch example 1 in working directory
[my_submit_host] ~ qsub -l arch=x86 -l os=sld4 -cwd my_test_job.sh # Batch example 2 with architecture request

Non-interactive batch with MPICH parallel environments:

[my_submit_host] ~ qsub my_mpich2_job.sh   # Distributed PE batch example (currently only os=sld5)
[my_submit_host] ~ qsub my_mpich2-1_job.sh # Non-distributed PE batch example

Source for the MPI code: SL4/5 binary, MPI C source, Makefile.

Prerequisites for using mpich2:
- Create a file ~/.mpd.conf for the mpd process daemon, containing a password like: MPD_SECRETWORD=....
- Restrict file access of ~/.mpd.conf to yourself only: chmod 600 ~/.mpd.conf
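
A rough sketch of such a job script follows; the parallel environment name "mpich2", the slot count, and the binary name are assumptions, not site-confirmed values (list the actual PE names with qconf -spl):

 #!/bin/sh
 #$ -cwd
 #$ -l os=sld5
 #$ -pe mpich2 4                       # PE name "mpich2" is an assumption
 # NSLOTS is set by the batch system to the number of granted slots
 mpiexec -n $NSLOTS ./my_mpi_program   # hypothetical binary built from the MPI C source above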

Interactive Batch (Under test):

Before migration from the protected subnet (the last hosts moved in November 2008):

 [my_host] ~ ssh bird
 # After login ...
 # If the DISPLAY is set properly, the X11/X server port is freely accessible,
 # and the SGE cell has an AFS token provider:
 [n1gexih] ~ /usr/sge/util/qterm    # -l os=sld3 -l arch=amd64 # ... get an xterm (on a special architecture)
 # If the selected SGE ssh ports on the exec hosts are freely accessible:
 [n1gexih] ~ /usr/sge/util/qshell   # -l h_vmem=1G # ... or an interactive shell (with lots of memory), and ...
 [sgehost] ~ xterm                  # ... start an X client, if X11 is needed and supported by the ssh configuration

After migration from the protected subnet (will be better with SGE version 6.2 and os=sld5):

 [my_submit_host] ~ /usr/sge/util/qterm    # Get an xterm ...
 [my_submit_host] ~ /usr/sge/util/qshell   # -l h_vmem=1G # ... or an interactive shell

Typical Job Environment:

GROUP=%Group%
GROUPROOT=/etc/local/groups/%Group%
HOST=%ExecHost%.desy.de
HOSTNAME=%ExecHost%.desy.de
HOSTTYPE=i386-linux
JOB_ID=%JobID%
JOB_NAME=%JobName%
JOB_SCRIPT=/scratch/sge/default/spool/%ExecHost%/job_scripts/%JobID%
LOGNAME=%User%
MANPATH=/usr/sue/man:/opt/products/man:/usr/share/man:/usr/afsws/man:/usr/X11/man:/usr/local/X11/man:/usr/local/man:/usr/kerberos/man
QUEUE=%Queue%
PATH=/scratch/sge/tmp/%JobID%.%Queue%:/usr/local/bin:/bin:/usr/bin
SGE_ACCOUNT=sge
SGE_ARCH=lx24-x86
SGE_BINARY_PATH=/usr/sge/bin/lx24-x86
SGE_CELL=default
SGE_CWD_PATH=/afs/desy.de/user/%UserI%/%User%/prgs/sge
SGE_JOB_SPOOL_DIR=/scratch/sge/default/spool/%ExecHost%/active_jobs/%JobID%
SGE_O_HOME=/afs/desy.de/user/%UserI%/%User%
SGE_O_HOST=%SubmitHost%
SGE_O_LOGNAME=%User%
SGE_O_MAIL=/var/mail/%User%
SGE_O_PATH=/usr/sge/bin/lx24-x86:/afs/desy.de/user/%UserI%/%User%/bin.Linux_RHEL:/afs/desy.de/user/%UserI%/%User%/bin:/etc/local/groups/%Group%/bin:/etc/local/groups/%Group%/scripts:/usr/sue/bin:/opt/products/scripts:/opt/products/bin:/bin:/usr/bin:/usr/bin/X11:/usr/local/X11/bin:/usr/local/bin:/usr/kerberos/bin:/cern/pro/bin:.:/usr/afsws/etc
SGE_O_SHELL=/bin/tcsh
SGE_O_WORKDIR=/afs/desy.de/user/%UserI%/%User%/prgs/sge
SGE_ROOT=/usr/sge
SGE_STDERR_PATH=/afs/desy.de/user/%UserI%/%User%/prgs/sge/sge_env.sh.e35
SGE_STDIN_PATH=/dev/null
SGE_STDOUT_PATH=/afs/desy.de/user/%UserI%/%User%/prgs/sge/sge_env.sh.o35
SHELL=/bin/zsh
TMP=/scratch/sge/tmp/%JobID%.%Queue%
TMPDIR=/scratch/sge/tmp/%JobID%.%Queue%
USER=%User%


Queues (until 8.9.2008):

 Class      Time Limit  Comment
 default.q  5 minutes   default without project/group
 short.q    1 day       available, default
                        (h_rt < 24:00:00)
 long.q     1 week      available, long runners
                        (24:00:00 < h_rt < 168:00:00)
 idle.q     3 weeks     under evaluation; jobs will be suspended while other jobs request the same host
                        (h_rt < 504:00:00, s_rt < 503:00:00 [the job is first "warned" via the SIGUSR1 signal])
 login.q    1 day       under evaluation; useful for debugging; startup may be difficult (see advice)

Queues (since 8.9.2008):

 Class      Time Limit  Slots                     Comment
 default.q  3 hours     100 % (@anyhost)          available as default
                                                  (h_rt < 3:00:00, h_vmem < 2G)
 short.q    1 day       appr. 85 % (/platform)    available for medium-sized jobs
                        (incl. 50 % for long.q)   (h_rt < 24:00:00, h_vmem < 2G)
 long.q     1 week      (see short.q)             available for long runners and high memory usage
                                                  (24:00:00 < h_rt < 168:00:00, h_vmem < 4G)
 idle.q     3 weeks     1 (@anyhost)              under evaluation; jobs will be suspended while other jobs request the same host
                                                  (h_rt < 504:00:00, s_rt < 503:00:00 [the job is first "warned" via the SIGUSR1 signal])
 login.q    1 day       4 (@somehost)             under evaluation; useful for debugging; startup may be difficult (see advice)
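
For example, a job whose requests exceed the short.q limits should end up in long.q (a sketch; the script name is reused from the examples above):

 qsub -l h_rt=100:00:00 -l h_vmem=3G -cwd my_test_job.sh   # h_rt > 24h and h_vmem > 2G: long.q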

Using path aliases:

Problem:

If you are working on a submit host in the directory /nfs/flc/lc3/pool/<myuserid>, which is an automount link to /pool/<myuserid>, and you then start "qsub -cwd", your batch job will try to use the working directory /pool/<myuserid>, which does not exist on the batch host. The following config file prepares the correct mapping for this situation:

~/.sge_aliases

# src-path  subm-host    exec-host  dest-path
/pool       lc3.desy.de  *          /nfs/flc/lc3/pool
/pool       lc4.desy.de  *          /nfs/flc/lc4/pool

More Information

Sun documentation


Quota and Fairshare:

We use quota and fairshare settings to keep the batch cluster in a state where all resources can be shared in a fair manner. This includes the mechanisms described below.

Quota

There are per-user quota settings ensuring that a single user cannot use more than 65 % of the cores of an OS flavor at a time. Projects are limited to 75 % of a core set.

There are queue-related quota settings allowing 100 % of the slots for jobs in the default queue, which ensures scheduling at least every 3 hours. The longer queues are limited: long, short, and long+short combined may use at most 65 %, 75 %, and 85 % respectively. The interactive login queue is limited to 50 %.

Fairshare

BIRD is shared by all DESY users, and IT provides the service and hardware for this facility. On top of this, other DESY groups have dedicated project-specific hardware to the BIRD cluster. This way, valuable resources have been made available. A contributing project is granted fair-share points: 10 for each compute core, 10 for each 2 GByte of memory, and 1 for each GByte of disk. These points guarantee the project members access to this relative amount of batch resources. Because IT gives its own fair share to the community, and because batch load is not continuous over time, this is a win-win situation for all, even for those without share points, who are allowed to use the idle times and/or idle resources. Unused project shares even enhance your job priorities for the following weeks, so typically a project may use more resources than it could in a stand-alone facility of its own.
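
As a hypothetical example of this point scheme: a project contributing a node with 8 compute cores, 16 GByte of memory, and 100 GByte of disk would be granted 8*10 + (16/2)*10 + 100*1 = 260 fair-share points.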

Every user additionally has 100 functional points to prioritize their own jobs. Users who belong to more than one project should take care to set the job's project membership correctly.
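
For example, the project can be selected at submission time via the -P option described above (the project name below is a placeholder):

 qsub -P my_project -cwd my_test_job.sh   # charge this job to the project "my_project"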



Manuals and other Information Sources:

BIRD Flyer (from UCO):

Guides Current Version:

Administration (pdf): admin.pdf
User Guide (pdf): user_guide.pdf

Guides Next Version:

Starting.pdf
Planning.pdf
Upgrading.pdf
Installing.pdf
Administrating.pdf
Using.pdf
 


Info: Sun source: N1 Grid Engine 6 Collection


Zeuthen Batch User Information


Contact:

 Address                    Purpose                                             Comment
 bird . service @ desy.de   Operational issues in HH                            Generates an official ticket in the RT system
 sge - global @ desy.de     Discussion and questions on design and development
 sge - users @ desy.de      User notifications and support in HH                Subscription is part of the registry resource