We are interested in solving the challenges of analysing and interpreting large-scale biological data. We develop methods and software for the analysis of medical image data, particularly the detection of breast cancer in mammograms, and of data produced by modern sequencing technologies, in particular single-cell genomic data. We are also interested in many biological applications of statistics and machine learning, including studying the effects of changes in DNA on gene expression in individual cells.
Research Overview
We are broadly interested in the ways in which computational approaches can drive biological discovery. We develop statistical and machine learning methods and software tools for the analysis of high-throughput sequencing data, with a focus on single-cell genomic data. We also study how DNA variation contributes to variation in gene expression at the level of individual cells. We approach “single-cell genetics” in this sense by mapping single-cell quantitative trait loci and by examining the effects of somatic mutations in healthy ageing and cancer. Recently, we have expanded our research into deep learning methods for the analysis of medical image data, working closely with colleagues at St Vincent’s Breast Screen Clinic to identify breast cancer in mammogram images. We also work with a wide range of biological collaborators, contributing computational expertise to studies motivated by specific biologically focused questions.
Honours and PhD Projects
If you are interested in our work and are seeking Honours or PhD opportunities, please contact Davis McCarthy at [email protected] to enquire about available projects.
Upcoming Positions
We are keen to hear from postdoctoral scientists with a PhD in a relevant computational discipline and up to 3 years of postdoctoral experience. The position, under the direction of Dr Davis McCarthy, is advertised on our Careers page.
Research Themes
Analysis methods for single-cell genetics
Clonal cell populations at the single-cell level
Single-cell quantitative trait locus mapping
Bioinformatics software development
Deep learning for medical image data
Student Projects
Staff
- Dr Davis McCarthy
- Dr Christina Azodi
- Dr Puxue Qiao
- Dr Cynthia Liu
- Dr Carlos Pena Solarzano
- Dr Jackson Kwok
- Brendan Hill
- Neke Ibeh, PhD student
- Ruqian Lyu, PhD student
- Sagrika Chugh, PhD student
- Jeffrey Pullin, Masters student
- Sam Tanner, Lab alumnus
Publication Highlights
- Andrews, T. S., Kiselev, V. Y., McCarthy, D., & Hemberg, M. (2020). Tutorial: guidelines for the computational analysis of single-cell RNA sequencing data. Nature Protocols, 1–9. https://doi.org/10.1038/s41596-020-00409-w
- Mereu, E., Lafzi, A., Moutinho, C., Ziegenhain, C., McCarthy, D. J., Álvarez-Varela, A., Batlle, E., Sagar, Grün, D., Lau, J. K., Boutet, S. C., Sanada, C., Ooi, A., Jones, R. C., Kaihara, K., Brampton, C., Talaga, Y., Sasagawa, Y., Tanaka, K., … Heyn, H. (2020). Benchmarking single-cell RNA-sequencing protocols for cell atlas projects. Nature Biotechnology, 38(6), 747–755. https://doi.org/10.1038/s41587-020-0469-4
- McCarthy, D. J.*, Rostom, R.*, Huang, Y.*, Kunz, D. J., Danecek, P., Bonder, M. J., Hagai, T., Lyu, R., HipSci Consortium, Wang, W., Gaffney, D. J., Simons, B. D., Stegle, O., & Teichmann, S. A. (2020). Cardelino: computational integration of somatic clonal substructure and single-cell transcriptomes. Nature Methods, 17(4), 414–421. https://doi.org/10.1038/s41592-020-0766-3
- Cuomo, A. S. E.*, Seaton, D. D.*, McCarthy, D. J.*, Martinez, I., Bonder, M. J., Garcia-Bernardo, J., Amatya, S., Madrigal, P., Isaacson, A., Buettner, F., Knights, A., Natarajan, K. N., Vallier, L., Marioni, J. C., Chhatriwala, M., & Stegle, O. (2020). Single-cell RNA-sequencing of differentiating iPS cells reveals dynamic genetic effects on gene expression. Nature Communications, 11(1), 1–14. https://doi.org/10.1038/s41467-020-14457-z
- Lähnemann, D., Köster, J., Szczurek, E., McCarthy, D. J., Hicks, S. C., Robinson, M. D., Vallejos, C. A., Campbell, K. R., Beerenwinkel, N., Mahfouz, A., Pinello, L., Skums, P., Stamatakis, A., Attolini, C. S.-O., Aparicio, S., Baaijens, J., Balvert, M., Barbanson, B. de, Cappuccio, A., … Schönhuth, A. (2020). Eleven grand challenges in single-cell data science. Genome Biology, 21(1), 31. https://doi.org/10.1186/s13059-020-1926-6
- Huang, Y.#, McCarthy, D. J.#, & Stegle, O.# (2019). Vireo: Bayesian demultiplexing of pooled single-cell RNA-seq data without genotype reference. Genome Biology, 20(1), 273. https://doi.org/10.1186/s13059-019-1865-2
- Buettner, F., Pratanwanich, N., McCarthy, D. J., Marioni, J. C., & Stegle, O. (2017). f-scLVM: scalable and versatile factor analysis for single-cell RNA-seq. Genome Biology, 18(1), 212. https://doi.org/10.1186/s13059-017-1334-8
- Kilpinen, H., Goncalves, A., Leha, A., Afzal, V., Alasoo, K., Ashford, S., Bala, S., Bensaddek, D., Casale, F. P., Culley, O. J., Danecek, P., Faulconbridge, A., Harrison, P. W., Kathuria, A., McCarthy, D., McCarthy, S. A., Meleckyte, R., Memari, Y., Moens, N., … Gaffney, D. J. (2017). Common genetic variation drives molecular heterogeneity in human iPSCs. Nature, 546(7658), 370–375. https://doi.org/10.1038/nature22403
- McCarthy, D. J., Campbell, K. R., Lun, A. T. L., & Wills, Q. F. (2017). Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics, 33(8), 1179–1186. https://doi.org/10.1093/bioinformatics/btw777
- Fuchsberger, C., Flannick, J., Teslovich, T. M., Mahajan, A., Agarwala, V., Gaulton, K. J., Ma, C., Fontanillas, P., Moutsianas, L., McCarthy, D. J., Rivas, M. A., Perry, J. R. B., Sim, X., Blackwell, T. W., Robertson, N. R., Rayner, N. W., Cingolani, P., Locke, A. E., Tajes, J. F., … McCarthy, M. I. (2016). The genetic architecture of type 2 diabetes. Nature, 536(7614), 41–47. https://doi.org/10.1038/nature18642
- Lun, A. T. L., McCarthy, D. J., & Marioni, J. C. (2016). A step-by-step workflow for low-level analysis of single-cell RNA-seq data. F1000Research, 5. https://doi.org/10.12688/f1000research.9501.1
- McCarthy, D. J., Humburg, P., Kanapin, A., Rivas, M. A., Gaulton, K., Cazier, J.-B., & Donnelly, P. (2014). Choice of transcripts and software has a large effect on variant annotation. Genome Medicine, 6(3), 26. https://doi.org/10.1186/gm543
- Lund, S. P., Nettleton, D., McCarthy, D. J., & Smyth, G. K. (2012). Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Statistical Applications in Genetics and Molecular Biology, 11(5). https://doi.org/10.1515/1544-6115.1826
- McCarthy, D. J.*, Chen, Y.*, & Smyth, G. K. (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research, 40(10), 4288–4297. https://doi.org/10.1093/nar/gks042
- Robinson, M. D.*, McCarthy, D. J.*, & Smyth, G. K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26(1), 139–140. https://doi.org/10.1093/bioinformatics/btp616
- McCarthy, D. J., & Smyth, G. K. (2009). Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics, 25(6), 765–771. https://doi.org/10.1093/bioinformatics/btp053
* co-first authors; # co-corresponding authors