Ensemble clustering is the process of perturbing the data, the relationships between the data, or the metrics by which those relationships are judged, and then clustering many times under these perturbations (please see our review for a fuller explanation). The lab has had great success in developing ensemble clustering and applying it to biological data to gain insight into the underlying physical systems. Examples of biological insight gained from such approaches include:

  • Identifying transient, phosphotyrosine-directed interactions in the EGFR network by robustly clustering dynamic tyrosine phosphorylation data with our collaborator Forest White (paper)
  • Identifying context-specific interaction differences in the EGFR/HER2 networks (paper)
  • Separating the role of HDAC in driving gene expression changes, versus axon-end acetylation, from gene expression data with our collaborator Valeria Cavalli (paper)
  • Uncovering novel roles of proteins in the DDR2 (Discoidin Domain Receptor 2) network from phosphoproteomic data with our collaborator Paul Huang (paper)
  • Separating the responses of the EGF receptor system to different ligands and doses using luciferase complementation assay data with our collaborator Linda Pike (paper)

OpenEnsembles is our open-source Python project that implements ensemble clustering in order to make the approach more accessible. It is under ongoing development and incorporates work we initially developed in Matlab (MCAM). The project and code are here.
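
To make the idea of clustering under perturbation concrete, here is a minimal, standard-library-only sketch. It is not the OpenEnsembles API: the function names (`kmeans`, `co_occurrence`), the choice of k-means as the base algorithm, the Gaussian noise perturbation, and the toy data are all assumptions made for illustration. It perturbs the data many times, clusters each perturbed copy, and records how often each pair of points lands in the same cluster.

```python
import random


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with a deterministic farthest-point start."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Start from the first point, then repeatedly add the point that is
    # farthest from every center chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centers))
        )

    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j]))
                  for p in points]
        # Move each center to the mean of its members (skip empty clusters).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members)
                                   for c in zip(*members))
    return labels


def co_occurrence(points, k, n_runs=50, noise=0.1, seed=0):
    """Fraction of perturbed runs in which each pair of points co-clusters."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_runs):
        # Perturbation step (assumed here: additive Gaussian noise).
        perturbed = [tuple(x + rng.gauss(0, noise) for x in p)
                     for p in points]
        labels = kmeans(perturbed, k)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_runs for c in row] for row in counts]


# Toy data (made up): two well-separated groups of three points each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
co = co_occurrence(points, k=2)
```

In this toy case `co[0][1]` is 1.0 (the two points always co-cluster) and `co[0][3]` is 0.0 (they never do). On real data the intermediate values are the interesting ones: pairs that co-cluster only under some perturbations are the relationships that are not robust.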