Bioinformatics & cellular genomics

We are interested in solving the challenges of analysing and interpreting large-scale biological data. We develop methods and software for analysing medical image data, particularly mammograms for breast cancer detection, and data produced by modern sequencing technologies, especially single-cell genomic data. We are also interested in many biological applications of statistics and machine learning, including studying how changes in DNA affect gene expression in individual cells.

Research Overview

We are broadly interested in the ways computational approaches can drive biological discovery. We develop statistical and machine learning methods and software tools for the analysis of high-throughput sequencing data, with a focus on single-cell genomic data. We also study how DNA variation contributes to variation in gene expression at the level of individual cells. We pursue “single-cell genetics” in this sense by looking at single-cell quantitative trait loci and at the effects of somatic mutations in healthy ageing and cancer. Recently, we have expanded our research into deep learning methods for the analysis of medical image data, working closely with colleagues at St Vincent’s Breast Screen Clinic to identify breast cancer in mammogram images. We also work with a wide range of biological collaborators, contributing computational expertise to studies motivated by specific biological questions.

Honours and PhD Projects

If you are interested in our work and are seeking Honours or PhD opportunities, please contact Davis McCarthy on [email protected] to enquire about available projects.

Upcoming Positions

We are keen to hear from postdoctoral scientists with a PhD in a relevant computational discipline (up to 3 years of postdoctoral experience). The position, under the direction of Dr Davis McCarthy, is advertised on our Careers page.

Research Themes

Analysis methods for single-cell genetics

Clonal cell populations at the single-cell level

Single-cell quantitative trait locus mapping

Bioinformatics software development

Deep learning for medical image data

Student Projects


  • Dr Davis McCarthy
  • Dr Christina Azodi
  • Dr Puxue Qiao
  • Dr Cynthia Liu
  • Dr Carlos Pena Solarzano
  • Dr Jackson Kwok
  • Brendan Hill
  • Neke Ibeh, PhD student
  • Ruqian Lyu, PhD student
  • Sagrika Chugh, PhD student
  • Jeffrey Pullin, Masters student
  • Sam Tanner, Lab alumni

Publication Highlights

  1. Andrews, T. S., Kiselev, V. Y., McCarthy, D., & Hemberg, M. (2020). Tutorial: guidelines for the computational analysis of single-cell RNA sequencing data. Nature Protocols, 1–9.
  2. Mereu, E., Lafzi, A., Moutinho, C., Ziegenhain, C., McCarthy, D. J., Álvarez-Varela, A., Batlle, E., Sagar, Grün, D., Lau, J. K., Boutet, S. C., Sanada, C., Ooi, A., Jones, R. C., Kaihara, K., Brampton, C., Talaga, Y., Sasagawa, Y., Tanaka, K., … Heyn, H. (2020). Benchmarking single-cell RNA-sequencing protocols for cell atlas projects. Nature Biotechnology 38(6), 747–755.
  3. McCarthy, D. J.*, Rostom, R.*, Huang, Y.*, Kunz, D. J., Danecek, P., Bonder, M. J., Hagai, T., Lyu, R., HipSci Consortium, Wang, W., Gaffney, D. J., Simons, B. D., Stegle, O., & Teichmann, S. A. (2020). Cardelino: computational integration of somatic clonal substructure and single-cell transcriptomes. Nature Methods 17(4), 414–421.
  4. Cuomo, A. S. E.*, Seaton, D. D.*, McCarthy, D. J.*, Martinez, I., Bonder, M. J., Garcia-Bernardo, J., Amatya, S., Madrigal, P., Isaacson, A., Buettner, F., Knights, A., Natarajan, K. N., Vallier, L., Marioni, J. C., Chhatriwala, M., & Stegle, O. (2020). Single-cell RNA-sequencing of differentiating iPS cells reveals dynamic genetic effects on gene expression. Nature Communications 11(1), 1–14.
  5. Lähnemann, D., Köster, J., Szczurek, E., McCarthy, D. J., Hicks, S. C., Robinson, M. D., Vallejos, C. A., Campbell, K. R., Beerenwinkel, N., Mahfouz, A., Pinello, L., Skums, P., Stamatakis, A., Attolini, C. S.-O., Aparicio, S., Baaijens, J., Balvert, M., Barbanson, B. de, Cappuccio, A., … Schönhuth, A. (2020). Eleven grand challenges in single-cell data science. Genome Biology 21(1), 31.
  6. Huang, Y.#, McCarthy, D. J.#, & Stegle, O.# (2019). Vireo: Bayesian demultiplexing of pooled single-cell RNA-seq data without genotype reference. Genome Biology 20(1), 273.
  7. Buettner, F., Pratanwanich, N., McCarthy, D. J., Marioni, J. C., & Stegle, O. (2017). f-scLVM: scalable and versatile factor analysis for single-cell RNA-seq. Genome Biology 18(1), 212.
  8. Kilpinen, H., Goncalves, A., Leha, A., Afzal, V., Alasoo, K., Ashford, S., Bala, S., Bensaddek, D., Casale, F. P., Culley, O. J., Danecek, P., Faulconbridge, A., Harrison, P. W., Kathuria, A., McCarthy, D., McCarthy, S. A., Meleckyte, R., Memari, Y., Moens, N., … Gaffney, D. J. (2017). Common genetic variation drives molecular heterogeneity in human iPSCs. Nature 546(7658), 370–375.
  9. McCarthy, D. J., Campbell, K. R., Lun, A. T. L., & Wills, Q. F. (2017). Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33(8), 1179–1186.
  10. Fuchsberger, C., Flannick, J., Teslovich, T. M., Mahajan, A., Agarwala, V., Gaulton, K. J., Ma, C., Fontanillas, P., Moutsianas, L., McCarthy, D. J., Rivas, M. A., Perry, J. R. B., Sim, X., Blackwell, T. W., Robertson, N. R., Rayner, N. W., Cingolani, P., Locke, A. E., Tajes, J. F., … McCarthy, M. I. (2016). The genetic architecture of type 2 diabetes. Nature 536(7614), 41–47.
  11. Lun, A. T. L., McCarthy, D. J., & Marioni, J. C. (2016). A step-by-step workflow for low-level analysis of single-cell RNA-seq data. F1000Research 5.
  12. McCarthy, D. J., Humburg, P., Kanapin, A., Rivas, M. A., Gaulton, K., Cazier, J.-B., & Donnelly, P. (2014). Choice of transcripts and software has a large effect on variant annotation. Genome Medicine 6(3), 26.
  13. Lund, S. P., Nettleton, D., McCarthy, D. J., & Smyth, G. K. (2012). Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Statistical Applications in Genetics and Molecular Biology 11(5).
  14. McCarthy, D. J.*, Chen, Y.*, & Smyth, G. K. (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40(10), 4288–4297.
  15. Robinson, M. D.*, McCarthy, D. J.*, & Smyth, G. K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26(1), 139–140.
  16. McCarthy, D. J., & Smyth, G. K. (2009). Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics 25(6), 765–771.

* co-first authors; # co-corresponding authors