This distributed computing system, comprising some 12,000 cores, is used by SU faculty and researchers, particularly in the physical sciences and engineering, who need reliable high-throughput computing (HTC). The computers in the grid are optimized to run a large number of smaller jobs in parallel (typically each under 24 hours), providing high processing capacity over long periods of time.
OrangeGrid is supported by ITS, Syracuse’s central IT group, and is offered to researchers at no cost. The grid is distinctive in that approximately 20% of its nodes are upgraded each year as part of the regular campus desktop replacement cycle, which yields a notable annual increase in processing and memory capacity.
These components are distributed to desktop clients via Microsoft’s Active Directory. HTCondor, developed with support from the National Science Foundation, manages the grid’s workload. Each desktop’s task scheduler detects when the computer is idle, starts up CVMC, and connects to HTCondor to receive work. When user activity is detected, research operations are stopped immediately. Virtualization acts as a barrier that separates the researcher’s jobs and data from the desktop user’s information on the same computer.
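The idle-detection behavior described above amounts to a simple two-state policy. This Python sketch is a hypothetical illustration only, not OrangeGrid's actual code: the names `NodeState` and `decide` and the 15-minute idle threshold are assumptions, and in the real system the desktop task scheduler, CVMC, and HTCondor carry out these transitions.

```python
from enum import Enum


class NodeState(Enum):
    """Hypothetical states for a desktop that donates its idle cycles."""
    IN_USE = "in_use"       # a person is using the desktop; no research work runs
    RUNNING_VM = "running"  # the research VM is up and accepting HTCondor work


# Assumed value: how long the desktop must sit idle before research work starts.
IDLE_THRESHOLD_SECONDS = 15 * 60


def decide(state: NodeState, idle_seconds: float, user_active: bool) -> NodeState:
    """Return the next state from the current state and activity readings."""
    if user_active:
        # User activity always wins: research operations stop immediately.
        return NodeState.IN_USE
    if state is NodeState.IN_USE and idle_seconds >= IDLE_THRESHOLD_SECONDS:
        # Idle long enough: start the VM and connect to HTCondor for work.
        return NodeState.RUNNING_VM
    # Otherwise keep doing whatever the node was already doing.
    return state


if __name__ == "__main__":
    state = NodeState.IN_USE
    state = decide(state, idle_seconds=60, user_active=False)    # still IN_USE
    state = decide(state, idle_seconds=1200, user_active=False)  # now RUNNING_VM
    state = decide(state, idle_seconds=0, user_active=True)      # back to IN_USE
    print(state)
```

The key design point the sketch captures is the asymmetry: starting research work requires sustained idleness, while any user activity stops it at once.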
Header Image Credit:
Barrett Lyon / The Opte Project
Visualization of the routing paths of the Internet.
This guide provides enough information to submit a first job and watch it complete successfully.
A deep list of topics in the HTCondor Flightworthy Development Wiki.
View and subscribe to various HTCondor email lists.
Topics include matchmaking and ClassAds, workflows and DAGMan, resource management, and checkpointing, among others.
Article from ACI-REF (Advanced CyberInfrastructure – Research and Education Facilitators).
HTCondor tutorials and video tutorials from past HTCondor Weeks.
Presentations from the University of Wisconsin–Madison, May 17–20, 2016.
To illustrate running a job under Condor, you will use a Monte Carlo method for computing π. This is an example of an embarrassingly parallel problem that can easily be run on the Campus Condor Pool.
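The Monte Carlo idea can be sketched in a few lines of Python. This is an illustrative sketch, not the tutorial's own script: the function name `estimate_pi` and the sample count are assumptions. Points drawn uniformly from the unit square land inside the quarter circle x² + y² ≤ 1 with probability π/4, so four times the observed fraction estimates π. Because every sample is independent, the work splits cleanly into separate jobs, which is what makes the problem embarrassingly parallel.

```python
import random
from typing import Optional


def estimate_pi(num_samples: int, seed: Optional[int] = None) -> float:
    """Estimate pi by sampling points uniformly in the unit square.

    A point (x, y) lies inside the quarter circle of radius 1 when
    x*x + y*y <= 1, which happens with probability pi/4.
    """
    rng = random.Random(seed)  # private generator; a fixed seed makes runs repeatable
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    print(estimate_pi(100_000, seed=42))
```

The error shrinks roughly as one over the square root of the sample count, so quadrupling the samples only halves the error; that slow convergence is why spreading the sampling across many grid jobs is attractive.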
To see what happens when a job fails, let us deliberately break the script hello.sh. First, edit the Condor submit file hello.sub and change the number of jobs submitted in a cluster back to one:

queue 1

Next, edit the file hello.sh and change the line...
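When debugging a broken job, it can help to run the job's executable locally first and check its exit status, since a nonzero status is the conventional signal that a script failed. This Python sketch is a hypothetical helper, not part of HTCondor: the name `check_job_locally` is an assumption, and the demo uses a one-line Python command as a stand-in for a broken hello.sh.

```python
import subprocess
import sys
from typing import Sequence


def check_job_locally(command: Sequence[str]) -> int:
    """Run a job's command on this machine and return its exit status.

    Exit status 0 conventionally means success; any other value means the
    command failed. Captured stderr is echoed to help locate the problem.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"job failed with exit status {result.returncode}", file=sys.stderr)
        if result.stderr:
            print(result.stderr, file=sys.stderr, end="")
    return result.returncode


if __name__ == "__main__":
    # Hypothetical stand-in for a broken script: a command that exits with status 1.
    status = check_job_locally([sys.executable, "-c", "import sys; sys.exit(1)"])
    print("exit status:", status)
```

To try it on the tutorial's script, pass something like `["sh", "hello.sh"]` as the command from the directory containing the file.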
So far, we have only submitted a single job. The power of Condor is that it makes it simple to submit many jobs in one go. Edit the file hello.sub, change its last line to

queue 20

and resubmit the job with

condor_submit hello.sub

Now Condor...
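Submitting many jobs pays off when each job does an independent share of the work. With queue 20, HTCondor numbers the jobs in the cluster through the $(Process) macro, from 0 to 19, and a common pattern is to hand that number to the program, for example as a random seed. This Python sketch simulates the idea locally using the Monte Carlo π example: the function names `count_hits` and `combine` and the per-job sample count are hypothetical, and in a real run each `count_hits` call would be a separate job whose output is gathered afterwards.

```python
import random
from typing import List


def count_hits(num_samples: int, seed: int) -> int:
    """One job's share of the work: count points inside the quarter circle."""
    rng = random.Random(seed)  # seed plays the role of the job's process number
    hits = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def combine(hit_counts: List[int], samples_per_job: int) -> float:
    """Pool the independent job results into a single estimate of pi."""
    total_samples = samples_per_job * len(hit_counts)
    return 4.0 * sum(hit_counts) / total_samples


if __name__ == "__main__":
    NUM_JOBS = 20             # corresponds to "queue 20" in the submit file
    SAMPLES_PER_JOB = 50_000  # hypothetical per-job workload
    # Locally, loop over the process numbers HTCondor would assign (0..19).
    results = [count_hits(SAMPLES_PER_JOB, seed=process) for process in range(NUM_JOBS)]
    print(f"pi is approximately {combine(results, SAMPLES_PER_JOB):.4f}")
```

Pooling the raw hit counts, rather than averaging 20 separate estimates, gives the same answer here because every job draws the same number of samples, and it generalizes cleanly if job sizes differ.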