Connecting to OrangeGrid

The Campus OrangeGrid Pool is a collection of compute resources available free of charge to SU researchers.

The compute power for the Syracuse University Campus OrangeGrid Pool comes from unused CPU cycles on desktop machines across the campus. When a desktop machine is idle and a job is queued, OrangeGrid launches a Linux virtual machine on it to run your compute jobs. This is handled by HTCondor, a specialized workload management system for compute-intensive jobs. To connect to the pool, first connect to your virtual host (zzzzzz.zzz.syr.edu) in the usual way, then ssh into the head node of the pool with the following command (replacing netid with your Syracuse NetID):

ssh netid@its-og-login1.syr.edu

OR

ssh netid@its-og-login2.syr.edu

The email you received when your account was created will indicate which one to use.

However, if using other environments such as smatter, please use this command:

ssh netid@smatter-condor-submit2.phy.syr.edu

The most widely used Condor commands are:

  • condor_status  –  To check the state of the machines available in the pool.
  • condor_userprio  –  To see who is using the pool.
  • condor_q  –  To see the status of jobs queued in the pool.
  • condor_submit  –  To submit a job to the pool.
  • condor_rm  –  To remove a submitted job.
  • man condor_submit  –  Will give you all the options for submitting a job and creating job submission files.

For much more information on the options available with these commands, you can view the Condor manual or use the UNIX man command to get help on a particular command.
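
As an illustration of how condor_submit is typically used, the sketch below writes a minimal submit description file and submits it. The file name hello.sub, the use of /bin/echo as the executable, and the output file names are assumptions chosen for this example; your jobs on OrangeGrid may need additional settings, so treat it as a starting point rather than a recommended configuration.

```shell
# Write a minimal submit description file.
# All file names here are hypothetical and chosen only for illustration.
cat > hello.sub <<'EOF'
universe = vanilla
executable = /bin/echo
arguments = "hello from OrangeGrid"
output = hello.out
error = hello.err
log = hello.log
queue 1
EOF

# Submit only when the HTCondor tools are available,
# i.e. when you are logged in to the head node of the pool.
if command -v condor_submit >/dev/null 2>&1; then
    condor_submit hello.sub
else
    echo "condor_submit not found; run this on the OrangeGrid login node" >&2
fi
```

After the job is submitted, condor_q shows it in the queue, and condor_rm followed by its job ID removes it.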

Using the OrangeGrid Pool

Once you are connected, you can view the current state of the OrangeGrid pool by typing:

condor_status

The output is likely to scroll off the screen. To read it one screen at a time, pipe the output to less:

condor_status | less

Press the space bar to see the next screen and q to exit less when you are done. At the end of the output, you will see a report showing the total number of machines, for example:

(Screenshot: Condor Status)

The number of machines varies throughout the day, and is highest at night and on weekends when the desktop machines are idle. The example above shows that there are currently 2545 CPUs available to run work, of which 2494 are claimed by other users and 51 are unclaimed. Don’t worry if the number of unclaimed machines seems low. Condor implements a fair-share policy, which means that you can claim some of the already claimed machines for yourself.
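
To make the arithmetic in this example concrete, the sketch below takes the total and claimed CPU counts and prints the unclaimed count and the percentage claimed. The function name pool_usage is hypothetical, and the numbers are typed in by hand from the example rather than read from condor_status, because the exact layout of the summary can differ between HTCondor versions. If you only want the summary, condor_status -total should print the totals without the per-machine listing.

```shell
# pool_usage TOTAL CLAIMED
# Print the number of unclaimed CPUs and the percentage of CPUs claimed.
# (pool_usage is a hypothetical helper, not an HTCondor command.)
pool_usage() {
    awk -v total="$1" -v claimed="$2" 'BEGIN {
        printf "unclaimed=%d claimed_pct=%.1f\n", total - claimed, 100 * claimed / total
    }'
}

# Numbers taken from the example in the text.
pool_usage 2545 2494   # prints: unclaimed=51 claimed_pct=98.0
```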

condor_userprio -all  –  Shows who is using the Condor pool

You will see output that looks like this: (Screenshot: Condor Priority)

Note that the state of the OrangeGrid pool changes from day to day, so you are unlikely to see the same output as above. There may be other users running jobs, or the users listed above may no longer be using the pool and so their names will not appear.

The column Res Used tells you how many CPUs a particular user is consuming.

The column Effective Priority tells you how much Condor will favor a given user when scheduling jobs. Lower numbers mean higher priority, so the user yffily has a higher priority than essedore. If yffily submits more jobs, they will take CPUs away from essedore until their priorities are equal. You can see that the effective priority is almost the same as the resources used, which means that the Condor pool tries to equalize the number of CPU cores that each user has.

So why does the user yffily have fewer CPU cores than the user essedore? So far we have only looked at the CPUs available and used. To answer this question, we need to look at the job queue. The state of the Condor pool may be different when you try this, so the output given below is just an example.

To see the jobs that yffily has sent to the pool, type:
condor_q yffily

Again, you may want to pipe the output to less if it is very long:
condor_q yffily | less

In the output of condor_q, the first column, ID, is a unique identification number assigned by Condor to a job. The OWNER column gives the user name of the job owner. The column labeled ST gives the job state.

The states that you will normally encounter are:

  • I for idle. Your job is in Condor’s job queue, but is not currently running.
  • R for running. Your job has been assigned to a CPU and is currently executing.
  • H for held. There is a problem with your job that requires manual intervention.
  • C for completed. Your job is finished and is ready to be removed from the queue.
  • X for removed. Your job has been removed from the queue, for example with condor_rm.
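
The most common state codes can be kept at hand as a small lookup. The sketch below is only an illustration: the function name describe_state is hypothetical, and it covers just the four codes I, R, H and C, reporting anything else as "other".

```shell
# describe_state CODE
# Print a short description of a condor_q ST code.
# (describe_state is a hypothetical helper, not an HTCondor command.)
describe_state() {
    case "$1" in
        I) echo "idle: queued, not currently running" ;;
        R) echo "running: assigned to a CPU and executing" ;;
        H) echo "held: needs manual intervention" ;;
        C) echo "completed: finished, ready to leave the queue" ;;
        *) echo "other: see the condor_q manual page" ;;
    esac
}

# Print a short legend for the four codes.
for code in I R H C; do
    printf '%s  %s\n' "$code" "$(describe_state "$code")"
done
```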

To investigate another user, replace yffily with a different user name (for example, your own). If you run condor_q without a user name, it will show the job queue for all users. In the example above, the user yffily has fewer jobs queued than essedore, and so is using fewer machines from the pool.

Looking at the last line of the condor_q output for each user, we see 452 jobs; 2 idle, 450 running, 0 held for user yffily, and 3647 jobs; 962 idle, 2685 running, 0 held for user essedore.
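
If you want to compare users without reading the whole listing, the summary line can be picked apart with a short filter. The sketch below assumes the summary line contains the word "jobs;" followed by counts written as "N idle", "N running" and "N held", as in the examples above. The function name summarize_queue is hypothetical, and the sample lines are copied from the text rather than produced by a live pool.

```shell
# summarize_queue
# Read condor_q output on stdin and print the idle, running and held counts
# taken from the summary line.
# (summarize_queue is a hypothetical helper, not an HTCondor command.)
summarize_queue() {
    awk '/jobs;/ {
        for (i = 2; i <= NF; i++) {
            word = $i
            gsub(/[,.;]/, "", word)          # strip trailing punctuation
            if (word == "idle")    idle = $(i - 1)
            if (word == "running") running = $(i - 1)
            if (word == "held")    held = $(i - 1)
        }
    }
    END { printf "idle=%d running=%d held=%d\n", idle, running, held }'
}

# Sample summary lines from the text.
echo "452 jobs; 2 idle, 450 running, 0 held" | summarize_queue
echo "3647 jobs; 962 idle, 2685 running, 0 held" | summarize_queue

# On the login node you could instead run, for example:
#   condor_q yffily | summarize_queue
```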