News

Spring 2017 Research Computing Colloquy Archive

About Research Computing Colloquies

These sessions will explore how computing resources help researchers take on new and greater computational tasks, enhance research productivity, increase the competitiveness of grant submissions, and advance scientific discovery across many disciplines. 

Gaussian

Gaussian is used by chemists, chemical engineers, biochemists, physicists and other scientists worldwide. Starting from the fundamental laws of quantum mechanics, Gaussian predicts the energies, molecular structures, vibrational frequencies and molecular properties of molecules and reactions in a wide variety of chemical environments.

TrinityRNASeq

Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules, Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes.
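To make the de Bruijn graph idea concrete, here is a toy Python sketch, assuming error-free reads and a small k. It is not Trinity's code: Inchworm, Chrysalis, and Butterfly also handle sequencing errors, graph partitioning, and isoform reconstruction. The function names `build_de_bruijn` and `walk_unambiguous` are hypothetical. Nodes are (k-1)-mers, and every k-mer in a read adds an edge from its prefix to its suffix.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Toy de Bruijn graph: nodes are (k-1)-mers; each k-mer in a read
    adds an edge from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)

def walk_unambiguous(graph, start):
    """Follow edges from `start` while each node has exactly one successor,
    spelling out a contig. Stops at branches, dead ends, or revisited nodes."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]          # the single successor
        if nxt in seen:               # avoid looping forever on cycles
            break
        contig += nxt[-1]             # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig

# Three overlapping (hypothetical) reads from one transcript.
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
g = build_de_bruijn(reads, k=4)
print(walk_unambiguous(g, "ATG"))  # ATGGCGTGCAA
```

Real assemblers must also cope with branches caused by errors, SNPs, and alternative splicing, which is where the per-locus graphs described above come in.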

Trimmomatic

Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-end data. The trimming steps and their associated parameters are supplied on the command line. It works with FASTQ files (using Phred+33 or Phred+64 quality scores, depending on the Illumina pipeline used), either uncompressed or gzipped.
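The difference between Phred+33 and Phred+64 is only the ASCII offset subtracted from each quality character. The Python sketch below decodes a quality string and trims low-quality bases from the 3' end, loosely in the spirit of a trailing-quality trim. It is a simplified illustration, not Trimmomatic's implementation, and `phred_scores` and `trim_trailing` are hypothetical names.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string: each character's ASCII code minus the
    offset (33 for Phred+33, 64 for Phred+64) is its Phred quality score."""
    return [ord(ch) - offset for ch in quality_string]

def trim_trailing(seq, qual, offset=33, min_quality=20):
    """Remove bases from the 3' end while their quality is below min_quality.
    Returns the trimmed (sequence, quality) pair."""
    scores = phred_scores(qual, offset)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return seq[:end], qual[:end]

print(phred_scores("I#+"))                      # [40, 2, 10]
print(trim_trailing("ACGTACGT", "IIIIII#+"))    # ('ACGTAC', 'IIIIII')
```

Decoding Phred+64 data with the Phred+33 offset inflates every score by 31, which is why the encoding must be known (or detected) before trimming.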

Samtools

Samtools is a suite of programs for interacting with high-throughput sequencing data. It consists of three separate repositories:

  • Samtools – Reading/writing/editing/indexing/viewing SAM/BAM/CRAM format
  • BCFtools – Reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants
  • HTSlib – A C library for reading/writing high-throughput sequencing data
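As an illustration of the SAM records these tools read and write, the Python sketch below parses the leading mandatory columns of one alignment line and decodes its bitwise FLAG field. The bit values follow the SAM format specification; the short labels and the helper names `decode_flag` and `parse_sam_line` are illustrative choices, and real pipelines would use Samtools or an HTSlib binding rather than hand-parsing.

```python
# Bit values of the SAM FLAG field (SAM format specification); labels are illustrative.
SAM_FLAGS = [
    (0x1, "PAIRED"),           # template has multiple segments
    (0x2, "PROPER_PAIR"),      # each segment properly aligned
    (0x4, "UNMAP"),            # segment unmapped
    (0x8, "MUNMAP"),           # next segment (mate) unmapped
    (0x10, "REVERSE"),         # sequence is reverse-complemented
    (0x20, "MREVERSE"),        # mate sequence is reverse-complemented
    (0x40, "READ1"),           # first segment in the template
    (0x80, "READ2"),           # last segment in the template
    (0x100, "SECONDARY"),      # secondary alignment
    (0x200, "QCFAIL"),         # not passing quality controls
    (0x400, "DUP"),            # PCR or optical duplicate
    (0x800, "SUPPLEMENTARY"),  # supplementary alignment
]

def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]

def parse_sam_line(line):
    """Parse the first five mandatory columns of a SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("SAM alignment lines have at least 11 tab-separated fields")
    return {
        "qname": fields[0],
        "flags": decode_flag(int(fields[1])),
        "rname": fields[2],
        "pos": int(fields[3]),   # 1-based leftmost mapping position
        "mapq": int(fields[4]),
    }

record = parse_sam_line("read1\t99\tchr1\t100\t60\t8M\t=\t300\t208\tACGTACGT\tIIIIIIII")
print(record["flags"])  # ['PAIRED', 'PROPER_PAIR', 'MREVERSE', 'READ1']
```

The same bits are what `samtools view -f` (require) and `-F` (exclude) test when filtering alignments.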

Proteinortho

Proteinortho is a tool to detect orthologous genes within different species. To do so, it compares similarities of given gene sequences and clusters them to find significant groups. The algorithm was designed to handle large-scale data and can be applied to hundreds of species at once. Details can be found in Lechner et al., BMC Bioinformatics. 2011 Apr 28;12:124 (http://www.biomedcentral.com/1471-2105/12/124). To enhance prediction accuracy, the relative order of genes (synteny) can be used as an additional feature for discriminating orthologs. The corresponding extension, PoFF (manuscript in preparation), is already built into Proteinortho.
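A common starting point for ortholog detection is the reciprocal best hit: gene a in species A and gene b in species B are paired if each is the other's highest-scoring match. The Python sketch below shows only that idea, assuming precomputed similarity scores (the toy scores are made up). Proteinortho's actual method adds adaptive similarity thresholds and graph clustering on top, so treat this as a simplified illustration; the function names are hypothetical.

```python
def best_hits(scores):
    """scores maps (query, target) -> similarity score (higher is better).
    Return each query's single highest-scoring target."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {q: t for q, (t, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    fwd = best_hits(a_vs_b)
    rev = best_hits(b_vs_a)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)

# Toy, made-up alignment scores between genes of species A and B.
a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 90, ("a2", "b2"): 180, ("a3", "b2"): 120}
b_vs_a = {("b1", "a1"): 245, ("b2", "a2"): 175, ("b2", "a3"): 118}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a2', 'b2')]
```

Gene a3 is excluded because its best hit, b2, prefers a2; such one-directional hits are exactly what the reciprocity check filters out.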

OrthoDB

Orthology is the cornerstone of comparative genomics and gene function prediction. OrthoDB aims to classify protein-coding genes from the increasing number of available sequenced genomes into groups of orthologs descended from a single gene of the last common ancestor (LCA) of each clade of species. Applying this concept to the hierarchy of LCAs along the species phylogeny results in multiple levels of orthology: the more closely related the species, the more finely resolved the orthologous relations.
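The "hierarchy of LCAs" can be pictured as walking up a species tree: the level at which a set of species is compared is their lowest common ancestor. The Python sketch below finds that clade in a small, simplified toy tree stored as a child-to-parent mapping. The tree and the helper names are assumptions for illustration only and do not reflect OrthoDB's taxonomy or code.

```python
def lineage(node, parent):
    """Return the path from node up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def lowest_common_ancestor(species, parent):
    """Deepest clade in the tree that contains every listed species."""
    paths = [lineage(s, parent) for s in species]
    common = set(paths[0]).intersection(*paths[1:])
    # Walking upward from the first species, the first shared node is the LCA.
    return next(n for n in paths[0] if n in common)

# Simplified toy species tree (child -> parent).
parent = {
    "human": "Primates", "chimp": "Primates",
    "mouse": "Rodentia", "rat": "Rodentia",
    "cow": "Laurasiatheria",
    "Primates": "Euarchontoglires", "Rodentia": "Euarchontoglires",
    "Euarchontoglires": "Boreoeutheria", "Laurasiatheria": "Boreoeutheria",
}
print(lowest_common_ancestor(["human", "chimp"], parent))         # Primates
print(lowest_common_ancestor(["human", "mouse", "cow"], parent))  # Boreoeutheria
```

Grouping orthologs at "Primates" resolves finer relations than grouping at "Boreoeutheria", mirroring the statement that closer relatives yield more finely resolved orthology.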

Software Details

Version: 1.6 – Visit their website and view the support site

License: GNU General Public License

Application: Bioinformatics

Platform: Linux-64

Citation:

OrthoDB v8: update of the hierarchical catalog of orthologs and the underlying free software. EV Kriventseva, F Tegenfeldt, TJ Petty, RM Waterhouse, FA Simao, IA Pozdnyakov, P Ioannidis, and EM Zdobnov NAR, Jan 2015, PMID:23180791

If you wish to use a different version of a program, or to request an upgrade to a newer version of existing software, please submit your request here.

HTSlib

Samtools is a suite of programs for interacting with high-throughput sequencing data. It consists of three separate repositories:

  • Samtools – Reading/writing/editing/indexing/viewing SAM/BAM/CRAM format
  • BCFtools – Reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants
  • HTSlib – A C library for reading/writing high-throughput sequencing data

Samtools and BCFtools both use HTSlib internally, but their source packages contain their own copies of HTSlib so they can be built independently.

Software Details

Visit their website and view the support site

License: MIT/Expat License

Application: Bioinformatics

Platform: Linux-64

Citation: Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup (2009) The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25, 2078-9. [PMID: 19505943]

* Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011 Nov 1;27(21):2987-93. Epub 2011 Sep 8. [PMID: 21903627]

* Danecek P., Schiffels S., Durbin R. Multiallelic calling model in bcftools (-m), samtools.github.io/bcftools/call-m.pdf

* Li H. Improving SNP discovery by base alignment quality. Bioinformatics. 2011 Apr 15;27(8):1157-8. doi: 10.1093/bioinformatics/btr076. Epub 2011 Feb 13. [PMID: 21320865]

* Durbin R. Segregation based metric for variant call QC, samtools.github.io/bcftools/rd-SegBias.pdf

* Li H, Mathematical Notes on SAMtools Algorithms, www.broadinstitute.org/gatk/media/docs/Samtools.pdf

GAMESS-US

GAMESS is a program for ab initio molecular quantum chemistry. Briefly, GAMESS can compute SCF wavefunctions of the RHF, ROHF, UHF, GVB, and MCSCF types. Correlation corrections to these SCF wavefunctions include configuration interaction, second-order perturbation theory, and coupled-cluster approaches, as well as the density functional theory approximation. Excited states can be computed by CI, EOM, or TD-DFT procedures. Nuclear gradients are available for automatic geometry optimization, transition state searches, and reaction path following.

Computation of the energy Hessian permits prediction of vibrational frequencies, with IR or Raman intensities. Solvent effects may be modeled by the discrete Effective Fragment Potentials or by continuum models such as the Polarizable Continuum Model. Numerous relativistic computations are available, including infinite-order two-component scalar relativity corrections, with various spin-orbit coupling options. The Fragment Molecular Orbital method allows many of these sophisticated treatments to be used on very large systems by dividing the computation into small fragments. Nuclear wavefunctions can also be computed, either with VSCF or with explicit treatment of nuclear orbitals by the NEO code.
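The link between the energy Hessian and vibrational frequencies is the standard harmonic approximation, summarized below. This is the textbook relation, not a description of GAMESS internals. Here $x_i$ are Cartesian nuclear coordinates, $m_i$ is the mass of the atom that coordinate $x_i$ belongs to, $\mathbf{q}_k$ are normal-mode vectors, and $c$ is the speed of light.

```latex
\tilde{H}_{ij} = \frac{1}{\sqrt{m_i m_j}}\,\frac{\partial^2 E}{\partial x_i\,\partial x_j},
\qquad
\tilde{H}\,\mathbf{q}_k = \lambda_k\,\mathbf{q}_k,
\qquad
\tilde{\nu}_k = \frac{\sqrt{\lambda_k}}{2\pi c}
```

For a nonlinear molecule six eigenvalues (five for a linear one) are near zero and correspond to overall translation and rotation. A negative $\lambda_k$ gives an imaginary frequency, the usual signature of a transition state found by a saddle-point search.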

Software Details

Visit their website and view the support site

License: User License Agreement from ISUQCG (http://www.msg.chem.iastate.edu/)

Application: Quantum chemistry, Molecular mechanics

Platform: Linux-64

Citation: “General Atomic and Molecular Electronic Structure System” M.W. Schmidt, K.K. Baldridge, J.A. Boatz, S.T. Elbert, M.S. Gordon, J.H. Jensen, S. Koseki, N. Matsunaga, K.A. Nguyen, S. Su, T.L. Windus, M. Dupuis, J.A. Montgomery. J. Comput. Chem., 14, 1347-1363 (1993).

“Advances in electronic structure theory: GAMESS a decade later” M.S. Gordon, M.W. Schmidt, pp. 1167-1189, in “Theory and Applications of Computational Chemistry: the first forty years” C.E. Dykstra, G. Frenking, K.S. Kim, G.E. Scuseria (editors), Elsevier, Amsterdam, 2005.

FASTX-Toolkit

The FASTX-Toolkit is a collection of command-line tools for preprocessing short-read FASTA/FASTQ files. Next-generation sequencing machines usually produce FASTA or FASTQ files containing multiple short-read sequences (possibly with quality information). The main processing of such FASTA/FASTQ files is mapping (aka aligning) the sequences to reference genomes or other databases using specialized programs. Examples of such mapping programs include:

  • Blat
  • SHRiMP
  • LastZ
  • MAQ (and many others)

However, it is sometimes more productive to preprocess the FASTA/FASTQ files before mapping the sequences to the genome, manipulating the sequences to produce better mapping results. The FASTX-Toolkit tools perform some of these preprocessing tasks.
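One simple preprocessing task of this kind is converting FASTQ records to FASTA while discarding reads that contain unknown (N) bases. The Python sketch below does this, assuming unwrapped FASTQ (exactly four lines per record). It is an illustration of the task, not FASTX-Toolkit code; `fastq_to_fasta` and its `drop_n` parameter are hypothetical names chosen for the example.

```python
def fastq_to_fasta(lines, drop_n=True):
    """Convert 4-line FASTQ records to FASTA lines (header, sequence).
    If drop_n is True, reads containing 'N' are discarded."""
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain complete 4-line records")
    fasta = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        record_no = i // 4 + 1
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record {record_no}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in record {record_no}")
        if drop_n and "N" in seq.upper():
            continue  # skip reads with unknown nucleotides
        fasta.append(">" + header[1:])  # FASTA header keeps the read name
        fasta.append(seq)
    return fasta

records = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "ACNT\n", "+\n", "IIII\n"]
print(fastq_to_fasta(records))  # ['>r1', 'ACGT']
```

Quality lines are dropped in the conversion, so any quality-based filtering has to happen before this step.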

Software Details

Visit their website and view the support site

License: Affero GPL (AGPL) version 3

Application: Bioinformatics

Platform: Linux-64

Citation: n/a