Wednesday, February 18, 2009

How to install Sun Grid Engine 6.2

Sun Grid Engine, Enterprise Edition software provides advanced resource management and policy administration for UNIX environments that are composed of multiple shared resources.
Sun Grid Engine provides the user with the means to submit computationally demanding tasks to the Sun Grid Engine, Enterprise Edition system for transparent distribution of the associated workload. The user can submit batch jobs, interactive jobs, and parallel jobs to the Sun Grid Engine.
The Sun Grid Engine accepts jobs—users’ requests for computer resources—from the outside world, puts them in a holding area until they can be executed, sends them from the holding area to an execution device, manages them during execution, and logs the record of their execution when they are finished.

Four types of hosts are fundamental to the Sun Grid Engine.
1. Master
2. Execution
3. Administration
4. Submit
The first two are installed; the latter two are added to the configuration.

To install the master host:
[root@rhel1 6.2.u5]# ./inst_sge -m
Do you agree with that license? (y/n) [n] >> y
Grid Engine admin user account -- sgeadmin
Do you want to install Grid Engine as admin user >sgeadmin< (y/n) [y] >>

Installing Grid Engine as admin user >sgeadmin<

Hit <RETURN> to continue >>

Checking $SGE_ROOT directory
----------------------------

The Grid Engine root directory is:

$SGE_ROOT = /common/sge/6.2.u5

If this directory is not correct (e.g. it may contain an automounter
prefix) enter the correct path to this directory or hit <RETURN>
to use default [/common/sge/6.2.u5] >>

Your $SGE_ROOT directory: /common/sge/6.2.u5

Hit <RETURN> to continue >>

Grid Engine TCP/IP communication service
----------------------------------------

The port for sge_qmaster is currently set as service.

sge_qmaster service set to port 536

Now you have the possibility to set/change the communication ports by using the
>shell environment< or you may configure it via a network service, configured
in local >/etc/service<, >NIS< or >NIS+<, adding an entry in the form

    sge_qmaster <port_number>/tcp

to your services database and make sure to use an unused port number.

How do you want to configure the Grid Engine communication ports?

Using the >shell environment<:                           [1]
Using a network service like >/etc/service<, >NIS/NIS+<: [2]

(default: 2) >>

Grid Engine TCP/IP service >sge_qmaster<
----------------------------------------

Using the service sge_qmaster for communication with Grid Engine.

Hit <RETURN> to continue >>

Grid Engine TCP/IP communication service
----------------------------------------

The port for sge_execd is currently set as service.

sge_execd service set to port 537

Now you have the possibility to set/change the communication ports by using the
>shell environment< or you may configure it via a network service, configured
in local >/etc/service<, >NIS< or >NIS+<, adding an entry in the form

    sge_execd <port_number>/tcp

to your services database and make sure to use an unused port number.

How do you want to configure the Grid Engine communication ports?

Using the >shell environment<:                           [1]
Using a network service like >/etc/service<, >NIS/NIS+<: [2]

(default: 2) >>

Grid Engine TCP/IP communication service
-----------------------------------------

Using the service

sge_execd

for communication with Grid Engine.

Hit <RETURN> to continue >>
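
The two port screens above rely on sge_qmaster and sge_execd entries in the services database. As a rough sketch (not part of the installer), the snippet below checks a scratch copy of a services-style file and appends whichever entry is missing. The helper name has_tcp_service is made up for illustration, and ports 536 and 537 are simply the values shown in this transcript; on a real host you would point it at /etc/services and pick unused ports.

```shell
#!/bin/sh
# Hypothetical helper (not an SGE tool): succeed if a services-style
# file defines a TCP entry for the given service name.
has_tcp_service() {
    # $1 = services file, $2 = service name
    awk -v name="$2" '$1 == name && $2 ~ /\/tcp$/ { found = 1 } END { exit !found }' "$1"
}

# Scratch file standing in for /etc/services (assumption: only the
# qmaster entry exists so far).
services_file=$(mktemp)
printf 'sge_qmaster 536/tcp\n' > "$services_file"

# Ports taken from the installer output in this post.
while read -r name port; do
    if has_tcp_service "$services_file" "$name"; then
        echo "$name: already defined"
    else
        echo "$name $port/tcp" >> "$services_file"
        echo "$name: added as $port/tcp"
    fi
done <<EOF
sge_qmaster 536
sge_execd 537
EOF

rm -f "$services_file"
```

Working on a scratch file keeps the sketch harmless; editing the real services database needs root and should be done by hand or through your configuration management.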

Grid Engine cells
-----------------

Grid Engine supports multiple cells.

If you are not planning to run multiple Grid Engine clusters or if you don't
know yet what is a Grid Engine cell it is safe to keep the default cell name

default

If you want to install multiple cells you can enter a cell name now.

The environment variable

$SGE_CELL=<your_cell_name>

will be set for all further Grid Engine commands.

Enter cell name [default] >> Rahway_SciComp

Using cell >Rahway_SciComp<.
Hit <RETURN> to continue >>

Unique cluster name
-------------------

The cluster name uniquely identifies a specific Sun Grid Engine cluster.
The cluster name must be unique throughout your organization. The name
is not related to the SGE cell.

The cluster name must start with a letter ([A-Za-z]), followed by letters,
digits ([0-9]), dashes (-) or underscores (_).

Enter new cluster name or hit <RETURN>
to use default [p536] >> usrylxap1
creating directory: /common/sge/6.2.u5/Rahway_SciComp/common

Your $SGE_CLUSTER_NAME: usrylxap1

Hit <RETURN> to continue >>
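
The naming rule on this screen (a letter first, then letters, digits, dashes or underscores) is easy to check before running the installer. The following is only a sketch of that rule as I read it; the function name is_valid_cluster_name is hypothetical, and I am assuming a single letter on its own is acceptable, which the installer text does not spell out.

```shell
#!/bin/sh
# Hypothetical check of the cluster-name rule quoted above:
# first character a letter, then letters, digits, '-' or '_'.
is_valid_cluster_name() {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z][A-Za-z0-9_-]*$'
}

# usrylxap1 and p536 come from this transcript; the others are
# made-up counterexamples.
for name in usrylxap1 p536 9cluster "bad name"; do
    if is_valid_cluster_name "$name"; then
        echo "$name: ok"
    else
        echo "$name: rejected"
    fi
done
```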

Grid Engine qmaster spool directory
-----------------------------------

The qmaster spool directory is the place where the qmaster daemon stores
the configuration and the state of the queuing system.

The admin user >sgeadmin< must have read/write access
to the qmaster spool directory.

If you will install shadow master hosts or if you want to be able to start
the qmaster daemon on other hosts (see the corresponding section in the
Grid Engine Installation and Administration Manual for details) the account
on the shadow master hosts also needs read/write access to this directory.

Enter a qmaster spool directory [/common/sge/6.2.u5/Rahway_SciComp/spool/qmaster] >>

Using qmaster spool directory >/common/sge/6.2.u5/Rahway_SciComp/spool/qmaster<.
Hit <RETURN> to continue >>

Windows Execution Host Support
------------------------------

Are you going to install Windows Execution Hosts? (y/n) [n] >> n

Verifying and setting file permissions
--------------------------------------

Did you install this version with >pkgadd< or did you already verify
and set the file permissions of your distribution (enter: y) (y/n) [y] >>

We do not verify file permissions. Hit <RETURN> to continue >>

Select default Grid Engine hostname resolving method
----------------------------------------------------

Are all hosts of your cluster in one DNS domain? If this is
the case the hostnames

   >hostA< and >hostA.foo.com<

would be treated as equal, because the DNS domain name >foo.com<
is ignored when comparing hostnames.

Are all hosts of your cluster in a single DNS domain (y/n) [y] >>

Ignoring domain name when comparing hostnames.

Hit <RETURN> to continue >>

Grid Engine JMX MBean server
----------------------------

In order to use the SGE Inspect or the Service Domain Manager (SDM)
SGE adapter you need to configure a JMX server in qmaster. Qmaster
will then load a Java Virtual Machine through a shared library.
NOTE: Java 1.5 or later is required for the JMX MBean server.

Do you want to enable the JMX MBean server (y/n) [y] >> n


Making directories
------------------

creating directory: /common/sge/6.2.u5/Rahway_SciComp/spool/qmaster
creating directory: /common/sge/6.2.u5/Rahway_SciComp/spool/qmaster/job_scripts
Hit <RETURN> to continue >>

Setup spooling
--------------
Your SGE binaries are compiled to link the spooling libraries
during runtime (dynamically). So you can choose between Berkeley DB
spooling and Classic spooling method.
Please choose a spooling method (berkeleydb|classic) [berkeleydb] >>

The Berkeley DB spooling method provides two configurations!

Local spooling:
The Berkeley DB spools into a local directory on this host (qmaster host)
This setup is faster, but you can't setup a shadow master host

Berkeley DB Spooling Server:
If you want to setup a shadow master host, you need to use
Berkeley DB Spooling Server!
In this case you have to choose a host with a configured RPC service.
The qmaster host connects via RPC to the Berkeley DB. This setup is more
failsafe, but results in a clear potential security hole. RPC communication
(as used by Berkeley DB) can be easily compromised. Please only use this
alternative if your site is secure or if you are not concerned about
security. Check the installation guide for further advice on how to achieve
failsafety without compromising security.
Do you want to use a Berkeley DB Spooling Server? (y/n) [n] >> n


Hit <RETURN> to continue >>

Berkeley Database spooling parameters
-------------------------------------

Please enter the database directory now, even if you want to spool locally,
it is necessary to enter this database directory.

Default: [/common/sge/6.2.u5/Rahway_SciComp/spool/spooldb] >> /home/sge
The spooling directory already exists! Do you want to delete it? [n] >> y


creating directory: /home/sge
Dumping bootstrapping information
Initializing spooling database

Hit <RETURN> to continue >>

Grid Engine group id range
--------------------------

When jobs are started under the control of Grid Engine an additional group id
is set on platforms which do not support jobs. This is done to provide maximum
control for Grid Engine jobs.

This additional UNIX group id range must be unused group id's in your system.
Each job will be assigned a unique id during the time it is running.
Therefore you need to provide a range of id's which will be assigned
dynamically for jobs.

The range must be big enough to provide enough numbers for the maximum number
of Grid Engine jobs running at a single moment on a single host. E.g. a range
like >20000-20100< means, that Grid Engine will use the group ids from
20000-20100 and provides a range for 100 Grid Engine jobs at the same time
on a single host.

You can change at any time the group id range in your cluster configuration.

Please enter a range [20000-20100] >>

Using >20000-20100< as gid range. Hit <RETURN> to continue >>
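
Since the range has to consist of unused group ids, it may be worth checking it against your group database first. The sketch below is my own illustration, not something the installer runs: gid_range_size and used_gids_in_range are hypothetical names, and the sample group file is invented. Note that counting both endpoints, 20000-20100 holds 101 ids, slightly more than the round figure of 100 quoted on the screen.

```shell
#!/bin/sh
# Hypothetical helpers for sanity-checking a gid range such as 20000-20100.

# Print how many ids the range contains, counting both endpoints.
gid_range_size() {
    echo $(( ${1#*-} - ${1%-*} + 1 ))
}

# List groups in a group(5)-style file whose gid falls inside the range.
# $1 = range "low-high", $2 = group file (name:passwd:gid:members)
used_gids_in_range() {
    awk -F: -v low="${1%-*}" -v high="${1#*-}" \
        '$3 + 0 >= low + 0 && $3 + 0 <= high + 0 { print $1 " (gid " $3 ")" }' "$2"
}

# Invented sample data standing in for /etc/group.
group_file=$(mktemp)
printf 'root:x:0:\nusers:x:100:\nstaff:x:20050:alice\n' > "$group_file"

echo "ids in range: $(gid_range_size 20000-20100)"
echo "already in use:"
used_gids_in_range 20000-20100 "$group_file"

rm -f "$group_file"
```

On a site that keeps groups in NIS or LDAP, a local file check like this is not enough; `getent group` output could be fed through the same awk filter instead.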

Grid Engine cluster configuration
---------------------------------

Please give the basic configuration parameters of your Grid Engine
installation:



The pathname of the spool directory of the execution hosts. User >sgeadmin<
must have the right to create this directory and to write into it.

Default: [/common/sge/6.2.u5/Rahway_SciComp/spool] >>

Grid Engine cluster configuration (continued)
---------------------------------------------



The email address of the administrator to whom problem reports are sent.

It is recommended to configure this parameter. You may use >none<
if you do not wish to receive administrator mail.

Please enter an email address in the form >user@foo.com<.

Default: [none] >> abc@xyz.com

The following parameters for the cluster configuration were configured:

execd_spool_dir /common/sge/6.2.u5/Rahway_SciComp/spool
administrator_mail abc@xyz.com
Do you want to change the configuration parameters (y/n) [n] >>

Creating local configuration
----------------------------
Creating >act_qmaster< file
Adding default complex attributes
Adding default parallel environments (PE)
Adding SGE default usersets
Adding >sge_aliases< path aliases file
Adding >qtask< qtcsh sample default request file
Adding >sge_request< default submit options file
Creating >sgemaster< script
Creating >sgeexecd< script
Creating settings files for >.profile/.cshrc<

Hit <RETURN> to continue >>

qmaster startup script
----------------------

We can install the startup script that will
start qmaster at machine boot (y/n) [y] >> y

cp /common/sge/6.2.u5/Rahway_SciComp/common/sgemaster /etc/init.d/sgemaster.usrylxap1
/usr/lib/lsb/install_initd /etc/init.d/sgemaster.usrylxap1

Hit <RETURN> to continue >>

Grid Engine qmaster startup
---------------------------

Starting qmaster daemon. Please wait ...
starting sge_qmaster
Hit <RETURN> to continue >>

Adding Grid Engine hosts
------------------------

Please now add the list of hosts, where you will later install your execution
daemons. These hosts will be also added as valid submit hosts.

Please enter a blank separated list of your execution hosts. You may
press <RETURN> if the line is getting too long. Once you are finished
simply press <RETURN> without entering a name.

You also may prepare a file with the hostnames of the machines where you plan
to install Grid Engine. This may be convenient if you are installing Grid
Engine on many hosts.

Do you want to use a file which contains the list of hosts (y/n) [n] >> n

Adding admin and submit hosts
-----------------------------

Please enter a blank seperated list of hosts.

Stop by entering <RETURN>. You may repeat this step until you are
entering an empty list. You will see messages from Grid Engine
when the hosts are added.

Host(s): abc123.xyz.com abc124.xyz.com abc125.xyz.com abc126.xyz.com

Above hosts are added as administrative and submit hosts.
Do you want to add your shadow host(s) now? (y/n) [y] >> n

Creating the default queue and hostgroup
-----------------------------------------------------------

root@abc123.xyz.com added "@allhosts" to host group list
root@abc123.xyz.com added "all.q" to cluster queue list

Hit <RETURN> to continue >>

Scheduler Tuning
----------------

The details on the different options are described in the manual.

Configurations
--------------
1) Normal
Fixed interval scheduling, report limited scheduling information,
actual + assumed load

2) High
Fixed interval scheduling, report limited scheduling information,
actual load

3) Max
Immediate Scheduling, report no scheduling information,
actual load

Enter the number of your preferred configuration and hit <RETURN>!
Default configuration is [1] >>

We're configuring the scheduler with >Normal< settings! Do you agree? (y/n) [y] >> y

changed scheduler configuration

Using Grid Engine
-----------------

You should now enter the command:

source /common/sge/6.2.u5/Rahway_SciComp/common/settings.csh

if you are a csh/tcsh user or

# . /common/sge/6.2.u5/Rahway_SciComp/common/settings.sh

if you are a sh/ksh user.

This will set or expand the following environment variables:

- $SGE_ROOT (always necessary)
- $SGE_CELL (if you are using a cell other than >default<)
- $SGE_CLUSTER_NAME (always necessary)
- $SGE_QMASTER_PORT (if you haven't added the service >sge_qmaster<)
- $SGE_EXECD_PORT (if you haven't added the service >sge_execd<)
- $PATH/$path (to find the Grid Engine binaries)
- $MANPATH (to access the manual pages)

Hit <RETURN> to see where Grid Engine logs messages >>
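
After sourcing the settings file it can help to confirm that the variables listed on this screen really are set in the current shell. Here is a minimal sketch of such a check; missing_sge_vars is a hypothetical helper of mine, and I only test the three variables that applied to this installation (a non-default cell, with the ports coming from the services database).

```shell
#!/bin/sh
# Hypothetical helper: print the name of each expected Grid Engine
# variable that is unset or empty in the current shell.
missing_sge_vars() {
    for var in SGE_ROOT SGE_CELL SGE_CLUSTER_NAME; do
        eval "value=\${$var:-}"
        [ -n "$value" ] || echo "$var"
    done
}

# Demonstration in a subshell, using values from this transcript and
# deliberately leaving SGE_CLUSTER_NAME unset.
(
    SGE_ROOT=/common/sge/6.2.u5
    SGE_CELL=Rahway_SciComp
    unset SGE_CLUSTER_NAME
    missing_sge_vars
)
```

In real use you would run `. /common/sge/6.2.u5/Rahway_SciComp/common/settings.sh` first and then call missing_sge_vars; no output means all three are set.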

Grid Engine messages
--------------------

Grid Engine messages can be found at:

/tmp/qmaster_messages (during qmaster startup)
/tmp/execd_messages (during execution daemon startup)

After startup the daemons log their messages in their spool directories.

Qmaster: /common/sge/6.2.u5/Rahway_SciComp/spool/qmaster/messages
Exec daemon: <execd_spool_dir>/<hostname>/messages


Grid Engine startup scripts
---------------------------

Grid Engine startup scripts can be found at:

/common/sge/6.2.u5/Rahway_SciComp/common/sgemaster (qmaster)
/common/sge/6.2.u5/Rahway_SciComp/common/sgeexecd (execd)

Do you want to see previous screen about using Grid Engine again (y/n) [n] >>

Your Grid Engine qmaster installation is now completed
------------------------------------------------------
Please now login to all hosts where you want to run an execution daemon
and start the execution host installation procedure.

If you want to run an execution daemon on this host, please do not forget
to make the execution host installation in this host as well.

All execution hosts must be administrative hosts during the installation.
All hosts which you added to the list of administrative hosts during this
installation procedure can now be installed.

You may verify your administrative hosts with the command

# qconf -sh

and you may add new administrative hosts with the command

# qconf -ah <hostname>

Please hit <RETURN> >>
sge_qmaster successfully installed!
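
Because every execution host must already be an administrative host, a quick check before starting the execution host installation can save a failed run. The sketch below filters a host list read from standard input; is_admin_host is a hypothetical helper, and the sample list just reuses two of the host names entered earlier. I am assuming `qconf -sh` prints one host name per line, which is how I have seen it behave.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given host appears as a whole
# line in the host list supplied on standard input.
is_admin_host() {
    grep -Fxq -- "$1"
}

# Sample list standing in for the output of `qconf -sh`.
admin_hosts='abc123.xyz.com
abc124.xyz.com'

for host in abc124.xyz.com abc999.xyz.com; do
    if printf '%s\n' "$admin_hosts" | is_admin_host "$host"; then
        echo "$host: administrative host"
    else
        echo "$host: not listed, add it with qconf -ah $host"
    fi
done
```

On the live cluster the same helper would be used as `qconf -sh | is_admin_host abc124.xyz.com`.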

Execution host installation

[root@abc124.xyz.com.u5]# ./inst_sge -x

Welcome to the Grid Engine execution host installation
------------------------------------------------------

If you haven't installed the Grid Engine qmaster host yet, you must execute
this step (with >install_qmaster<) prior the execution host installation.

For a sucessfull installation you need a running Grid Engine qmaster. It is
also neccesary that this host is an administrative host.

You can verify your current list of administrative hosts with the command:

   # qconf -sh

You can add an administrative host with the command:

   # qconf -ah <hostname>

The execution host installation will take approximately 5 minutes.

Hit <RETURN> to continue >>

Checking $SGE_ROOT directory
----------------------------

The Grid Engine root directory is:

$SGE_ROOT = /common/sge/6.2.u5

If this directory is not correct (e.g. it may contain an automounter
prefix) enter the correct path to this directory or hit <RETURN>
to use default [/common/sge/6.2.u5] >>

Your $SGE_ROOT directory: /common/sge/6.2.u5

Hit <RETURN> to continue >>

Grid Engine cells
-----------------

Please enter cell name which you used for the qmaster
installation or press <RETURN> to use [Rahway_SciComp] >>

Using cell: >Rahway_SciComp<

Hit <RETURN> to continue >>

Grid Engine TCP/IP communication service
----------------------------------------

The port for sge_execd is currently set as service.

sge_execd service set to port 537

Hit <RETURN> to continue >>

Checking hostname resolving
---------------------------

This hostname is known at qmaster as an administrative host.

Hit <RETURN> to continue >>

Execd spool directory configuration
-----------------------------------

You defined a global spool directory when you installed the master host.
You can use that directory for spooling jobs from this execution host
or you can define a different spool directory for this execution host.

ATTENTION: For most operating systems, the spool directory does not have to
be located on a local disk. The spool directory can be located on a
network-accessible drive. However, using a local spool directory provides
better performance.

FOR WINDOWS USERS: On Windows systems, the spool directory MUST be located
on a local disk. If you install an execution daemon on a Windows system
without a local spool directory, the execution host is unusable.

The spool directory is currently set to:
<>

Do you want to configure a different spool directory
for this host (y/n) [n] >>

Creating local configuration
----------------------------
sgeadmin@abc124.xyz.com modified "abc124.xyz.com" in configuration list
Local configuration for host >abc124.xyz.com< created.

Hit <RETURN> to continue >>

execd startup script
--------------------

We can install the startup script that will
start execd at machine boot (y/n) [y] >>

cp /common/sge/6.2.u5/Rahway_SciComp/common/sgeexecd /etc/init.d/sgeexecd.usrylxap1
/usr/lib/lsb/install_initd /etc/init.d/sgeexecd.usrylxap1

Hit <RETURN> to continue >>

Grid Engine execution daemon startup
------------------------------------

Starting execution daemon. Please wait ...
starting sge_execd

Hit <RETURN> to continue >>

Adding a queue for this host
----------------------------

We can now add a queue instance for this host:

- it is added to the >allhosts< hostgroup
- the queue provides 4 slot(s) for jobs in all queues
  referencing the >allhosts< hostgroup

You do not need to add this host now, but before running jobs on this host
it must be added to at least one queue.

Do you want to add a default queue instance for this host (y/n) [y] >>

root@abc124.xyz.com modified "@allhosts" in host group list
root@abc124.xyz.com modified "all.q" in cluster queue list

Hit <RETURN> to continue >>

Using Grid Engine
-----------------

You should now enter the command:

source /common/sge/6.2.u5/Rahway_SciComp/common/settings.csh

if you are a csh/tcsh user or

# . /common/sge/6.2.u5/Rahway_SciComp/common/settings.sh

if you are a sh/ksh user.

This will set or expand the following environment variables:

- $SGE_ROOT (always necessary)
- $SGE_CELL (if you are using a cell other than >default<)
- $SGE_CLUSTER_NAME (always necessary)
- $SGE_QMASTER_PORT (if you haven't added the service >sge_qmaster<)
- $SGE_EXECD_PORT (if you haven't added the service >sge_execd<)
- $PATH/$path (to find the Grid Engine binaries)
- $MANPATH (to access the manual pages)

Hit <RETURN> to see where Grid Engine logs messages >>

Grid Engine messages
--------------------

Grid Engine messages can be found at:

/tmp/qmaster_messages (during qmaster startup)
/tmp/execd_messages (during execution daemon startup)

After startup the daemons log their messages in their spool directories.

Qmaster: /common/sge/6.2.u5/Rahway_SciComp/spool/qmaster/messages
Exec daemon: <execd_spool_dir>/<hostname>/messages


Grid Engine startup scripts
---------------------------

Grid Engine startup scripts can be found at:

/common/sge/6.2.u5/Rahway_SciComp/common/sgemaster (qmaster)
/common/sge/6.2.u5/Rahway_SciComp/common/sgeexecd (execd)

Do you want to see previous screen about using Grid Engine again (y/n) [n] >>

Your execution daemon installation is now completed.
