When entering the University’s datacenter, it’s natural to wonder about the seemingly infinite sources of information being computed there. Could these systems be performing cutting-edge thermodynamic modeling? Or are they perhaps helping to unravel some of the mysteries of quantum chemistry? Would your curiosity change if the source of this data were closer to home? In fact, it is much closer to home: the US Census Bureau or the Social Security Administration.
News
Spring 2017 Research Computing Colloquy Archive
About Research Computing Colloquies
These sessions will explore how computing resources help researchers take on new and greater computational tasks, enhance research productivity, increase the competitiveness of grant submissions, and advance scientific discovery across many disciplines.
Gaussian
Gaussian is used by chemists, chemical engineers, biochemists, physicists and other scientists worldwide. Starting from the fundamental laws of quantum mechanics, Gaussian predicts the energies, molecular structures, vibrational frequencies and molecular properties of molecules and reactions in a wide variety of chemical environments.
TrinityRNASeq
Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes.
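To make the de Bruijn graph idea concrete, here is a minimal Python sketch that builds such a graph from a handful of short reads. It is an illustration only and not Trinity's implementation: the function name `build_de_bruijn`, the toy reads, and the choice of k = 3 are all assumptions made for the example.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph (hypothetical helper, not part of Trinity).

    Nodes are (k-1)-mers. An edge joins the prefix of each k-mer found in a
    read to the suffix of that same k-mer.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Three overlapping toy reads; real RNA-seq reads are far longer and noisier.
reads = ["ATGGC", "TGGCA", "GGCAT"]
graph = build_de_bruijn(reads, k=3)

for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Overlapping reads collapse onto shared nodes, which is why a graph of this kind can represent many reads from one locus compactly.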
Trimmomatic
Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-ended data. The selection of trimming steps and their associated parameters is supplied on the command line. It works with FASTQ files (using phred+33 or phred+64 quality scores, depending on the Illumina pipeline used), either uncompressed or gzipped.
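The difference between phred+33 and phred+64 is only the ASCII offset used to encode each quality score as a character. The Python sketch below decodes a quality string under either offset. It is a standalone illustration, not Trimmomatic code; the function names `decode_phred` and `error_probability` are hypothetical.

```python
def decode_phred(quality, offset=33):
    """Convert a FASTQ quality string to a list of integer Phred scores.

    `offset` is 33 or 64, matching the phred+33 and phred+64 encodings.
    """
    if offset not in (33, 64):
        raise ValueError("offset must be 33 or 64")
    return [ord(char) - offset for char in quality]


def error_probability(score):
    """Standard Phred relation: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-score / 10)


# The same scores look different depending on the encoding in use.
scores33 = decode_phred("II5!", offset=33)
scores64 = decode_phred("h@", offset=64)
print(scores33)
print(scores64)
print(error_probability(20))
```

Decoding a file with the wrong offset shifts every score by 31, which is why the encoding has to match the pipeline that produced the data.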
Samtools
Samtools is a suite of programs for interacting with high-throughput sequencing data. It consists of three separate repositories:
- Samtools – Reading/writing/editing/indexing/viewing SAM/BAM/CRAM format
- BCFtools – Reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants
- HTSlib – A C library for reading/writing high-throughput sequencing data
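As a rough picture of what the SAM text format contains, the Python sketch below splits one alignment line into the eleven mandatory tab-separated fields defined by the SAM specification. It does not use Samtools or HTSlib; the `SamRecord` type, the `parse_sam_line` function, and the sample line are assumptions made for illustration.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The eleven mandatory SAM fields, in specification order."""
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str


def parse_sam_line(line):
    """Parse one alignment line (hypothetical helper). Optional tags are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq),
                     cigar, rnext, int(pnext), int(tlen), seq, qual)


# Made-up record: flag 16 (0x10) marks a read aligned to the reverse strand.
line = "read1\t16\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII"
record = parse_sam_line(line)
is_reverse = bool(record.flag & 0x10)
print(record.rname, record.pos, is_reverse)
```

For real data, the binary BAM and CRAM formats and indexed access are better handled by Samtools or HTSlib themselves.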
Proteinortho
Proteinortho is a tool to detect orthologous genes within different species. To do so, it compares similarities of given gene sequences and clusters them to find significant groups. The algorithm was designed to handle large-scale data and can be applied to hundreds of species at once. Details can be found in Lechner et al., BMC Bioinformatics. 2011 Apr 28;12:124 (http://www.biomedcentral.com/1471-2105/12/124). To enhance the prediction accuracy, the relative order of genes (synteny) can be used as an additional feature for the discrimination of orthologs. The corresponding extension, PoFF (manuscript in preparation), is already built into Proteinortho.
OrthoDB
Orthology is the cornerstone of comparative genomics and gene function prediction. OrthoDB aims to classify protein-coding genes from the increasing number of available sequenced genomes into groups of orthologs descended from a single gene of the last common ancestor (LCA) of each clade of species. Applying this concept to the hierarchy of LCAs along the species phylogeny results in multiple levels of orthology: the more closely related the species, the more finely resolved the orthologous relations.
HTSlib
Samtools is a suite of programs for interacting with high-throughput sequencing data. It consists of three separate repositories:
- Samtools – Reading/writing/editing/indexing/viewing SAM/BAM/CRAM format
- BCFtools – Reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants
- HTSlib – A C library for reading/writing high-throughput sequencing data
Samtools and BCFtools both use HTSlib internally, but these source packages contain their own copies of HTSlib so they can be built independently.
Software Details
Visit their website and view the support site
License: MIT/Expat License
Application: Bioinformatics
Platform: Linux-64
Citation: Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup (2009) The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25, 2078-9. [PMID: 19505943]
* Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011 Nov 1;27(21):2987-93. Epub 2011 Sep 8. [PMID: 21903627]
* Danecek P., Schiffels S., Durbin R. Multiallelic calling model in bcftools (-m), samtools.github.io/bcftools/call-m.pdf
* Li H. Improving SNP discovery by base alignment quality. Bioinformatics. 2011 Apr 15;27(8):1157-8. doi: 10.1093/bioinformatics/btr076. Epub 2011 Feb 13. [PMID: 21320865]
* Durbin R. Segregation based metric for variant call QC, samtools.github.io/bcftools/rd-SegBias.pdf
* Li H, Mathematical Notes on SAMtools Algorithms, www.broadinstitute.org/gatk/media/docs/Samtools.pdf
GAMESS-US
GAMESS is a program for ab initio molecular quantum chemistry. Briefly, GAMESS can compute SCF wavefunctions of the RHF, ROHF, UHF, GVB, and MCSCF types. Correlation corrections to these SCF wavefunctions include configuration interaction, second-order perturbation theory, and coupled-cluster approaches, as well as the density functional theory approximation. Excited states can be computed by CI, EOM, or TD-DFT procedures. Nuclear gradients are available for automatic geometry optimization, transition state searches, or reaction path following.
Computation of the energy Hessian permits prediction of vibrational frequencies, with IR or Raman intensities. Solvent effects may be modeled by the discrete Effective Fragment Potentials or by continuum models such as the Polarizable Continuum Model. Numerous relativistic computations are available, including infinite-order two-component scalar relativity corrections, with various spin-orbit coupling options. The Fragment Molecular Orbital method permits many of these sophisticated treatments to be used on very large systems by dividing the computation into small fragments. Nuclear wavefunctions can also be computed, in VSCF or with explicit treatment of nuclear orbitals by the NEO code.
Software Details
Visit their website and view the support site
License: User License Agreement from ISUQCG (http://www.msg.chem.iastate.edu/)
Application: Quantum chemistry, Molecular mechanics
Platform: Linux-64
Citation: “General Atomic and Molecular Electronic Structure System” M.W.Schmidt, K.K.Baldridge, J.A.Boatz, S.T.Elbert, M.S.Gordon, J.H.Jensen, S.Koseki, N.Matsunaga, K.A.Nguyen, S.Su, T.L.Windus, M.Dupuis, J.A.Montgomery J. Comput. Chem., 14, 1347-1363 (1993).
“Advances in electronic structure theory: GAMESS a decade later” M.S.Gordon, M.W.Schmidt pp. 1167-1189, in “Theory and Applications of Computational Chemistry: the first forty years” C.E.Dykstra, G.Frenking, K.S.Kim, G.E.Scuseria (editors), Elsevier, Amsterdam, 2005.