HOCOMOCO: a Collection of Transcription Factor Binding Models for Human and Mouse Based on ChIP-seq Data and its Application in Genetics and Systems Biology
HOCOMOCO (HOmo sapiens COmprehensive MOdel Collection) is a human-curated collection of position weight matrix (PWM) models of binding sites for 680 human and 453 mouse transcription factors (TFs). HOCOMOCO is mostly based on ChIP-seq data, which appear to be the most informative on the specificity of TF binding in vivo. We used five thousand ChIP-seq experiments as raw data. The experimental datasets were taken from the GTRD database, where they were uniformly processed within the BioUML framework using several ChIP-seq peak calling tools. ChIPMunk software was used for systematic motif discovery in the different peak sets. Motifs that displayed the best separation of the test (ChIP-seq peaks) and control datasets were selected. To reduce the number of irrelevant motifs that emerge due to indirect binding, we performed extensive computational assessment and human curation of the motifs found. As valid models, we selected those that were (i) similar to already known motifs, (ii) consistent within a TF family, or, at least, (iii) showed a clearly exhibited consensus (based on manually assessed LOGO representation). The current version of HOCOMOCO (v.11) includes 1,302 mononucleotide and 576 dinucleotide PWMs, which describe the primary binding motif of each TF and reliable alternative binding specificities. An interactive interface and bulk downloads are available at http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco11. The HOCOMOCO database can be used to explore different problems in genetics, medicine, and systems biology, and can support studies on the evolution of TF binding sites.
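
To illustrate how a mononucleotide PWM of the kind collected here is typically applied, the following is a minimal Python sketch that scores every window of a DNA sequence on both strands and reports the best hit. The matrix values, the names `TOY_PWM`, `score_window`, and `best_hit`, and the additive scoring scheme are assumptions made for illustration only; they are not taken from HOCOMOCO or from its tools, and a real model would be loaded from the bulk downloads instead.

```python
# Toy log-odds PWM (hypothetical values, NOT a HOCOMOCO model).
# Each entry is one motif position, mapping a base to its score.
TOY_PWM = [
    {"A": 1.2, "C": -1.0, "G": -0.8, "T": -1.5},
    {"A": -1.3, "C": 1.1, "G": -0.9, "T": -0.7},
    {"A": -1.1, "C": -1.2, "G": 1.3, "T": -0.6},
    {"A": -0.9, "C": -1.4, "G": -1.0, "T": 1.0},
]

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def score_window(pwm, window):
    """Sum the per-position scores of a window as long as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, seq):
    """Return (score, offset, strand) of the best-scoring window.

    For the "-" strand the offset refers to the reverse-complemented
    sequence. Ties are resolved in favour of the first window found.
    """
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif")
    best = None
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(len(strand_seq) - width + 1):
            score = score_window(pwm, strand_seq[offset:offset + width])
            if best is None or score > best[0]:
                best = (score, offset, strand)
    return best


if __name__ == "__main__":
    print(best_hit(TOY_PWM, "GGACGTGG"))
```

In practice the best score would be compared against a model-specific threshold to decide whether the sequence contains a putative binding site; the choice of threshold is outside the scope of this sketch.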