Nikolai Slavov is an Allen Distinguished Investigator and Professor at Northeastern University. His work focuses on single cell proteomics and understanding proteins’ roles in life and pathology. Nikolai discusses mass spectrometry, machine learning and the importance of cross-disciplinary communication.
Please note the transcript has been edited for brevity and clarity.
FLG: Hello, and welcome to the latest “A Spotlight on” interview. Today, we’re going to be talking about single cell proteomics. I’m joined by Nikolai Slavov. Nikolai, could you please introduce yourself and tell everyone a little about what you do.
Nikolai Slavov: Thank you for inviting me. I’m delighted to join you. I’m an Allen Distinguished Investigator and Professor at Northeastern University. My laboratory is generally interested in proteome biology, understanding how proteins’ interactions and modifications give rise to life in normal physiology and pathology alike. Today, we are going to discuss technologies to quantify proteins in single mammalian cells that we develop and apply to various biomedical projects.
FLG: Thank you so much. What drew you to focus on single cell proteomics?
Nikolai Slavov: It is well appreciated and understood that most biological functions are performed by proteins. But our inability to analyse proteins in depth at high-resolution has resulted in biomedical research being focused primarily on analysing DNA and RNA molecules.
My background was previously focused on transcriptomics. I did my PhD in systems biology with David Botstein’s group, where we used the technology of the day, DNA microarrays. But as we were studying that, my data would often suggest that the interesting biological processes were taking place at the level of protein synthesis or protein degradation, and we were limited in our ability to analyse those.
Around 2011, I realised that existing mass spectrometry technologies could be applied to analyse proteins with much higher throughput, quantitative accuracy and sensitivity than had previously been achieved. This particular opportunity wasn’t being realised by anybody, as far as I could tell. This made me think that I could help accelerate the development of this part of biomedical sciences. I saw this as a very big opportunity. It resonated with me, because one of my most important criteria is the ability to accelerate the development of scientific research: not only to be a competent scientist who can help with a particular milestone research project that is obvious to many other colleagues in the field, but to do something unique that is going to make science proceed faster. Single cell proteomics seemed to be very much in this category.
I decided to give it a try to see if we could actually develop those technologies. The importance was very obvious to me and to anybody else: if we are able to achieve cheap, quantitative protein analysis of individual mammalian cells, that will have many applications. The significance was never in doubt. What was much more in doubt and controversial at the time was the feasibility. Many of the leading experts in the field believed that this was not possible and they were very sceptical. I was a newcomer, without a background in that particular technology. Not surprisingly, I did not convince the leaders in mass spectrometry overnight that we could do something that they believed was impossible. But I thought it was worthwhile giving it a try.
FLG: You have developed methods for high throughput single cell proteomics by mass spectrometry and you’re using them to quantify proteome heterogeneity during cell differentiation. It’s an emerging field, there’s been a variety of methods proposed by different groups. So, what makes your SCoPE2 project stand out from the crowd?
Nikolai Slavov: The approach of SCoPE2, and other methods that we have tried to develop… we emphasise its accessibility, from the very beginning.
Let’s take a step back and get a perspective on methods, how they have been developed and their differences. Looking from the outside, it is easy to see multiple different names being used and different methods. The field appears to be quite crowded, but the methods that are being used fall into only a couple of categories. And the methods within a category are quite similar to each other.
One approach is to analyse one cell at a time. In the jargon of mass spectrometry, we call this ‘label free’, because cells are not labelled, and we analyse only one cell at a time. This approach has been attractive to a number of colleagues. From my perspective, a major limitation of these approaches has been their limited throughput, because mass spec instruments are quite expensive and mass spec time is expensive. If we only analyse one cell per hour, that is going to have limited scope and biological applications.
Our strategy has been to use labels, in what are often called ‘multiplex approaches’. Instead of analysing one cell, we can barcode proteins from different single cells with single cell specific barcodes, and then we can analyse a dozen or more single cells at the same time. At the end of the experiment, we can tell which protein came from which single cell and therefore do the single cell quantification. This is one specific aspect that we introduced using isobaric mass tags, and until recently, all of the multiplex methods that anybody has used have relied on this technology. More recently, we have introduced a different kind of multiplexing, which will be published in Nature Biotechnology, using non-isobaric mass tags. This has a different set of advantages, which we’ll discuss later on. But one aspect of our first approach, SCoPE-MS, and its second generation, SCoPE2, is the use of multiplexing, which allows increased throughput.
Another aspect that was absolutely crucial at the time is the use of an isobaric carrier. In addition to the single cells, we barcoded a small bulk sample of cells, which allows us to reduce losses of single cell material on the various surfaces that the labelled cells interact with. This also increases our ability to assign amino acid sequences to the peptides, and that was key for being able to quantify proteins in single cells in our very first attempt. At this point, new technologies developed by my laboratory and other laboratories have made the use of an isobaric carrier less essential. But at the time, this was the first demonstration, to my knowledge, of being able to quantify hundreds of proteins across hundreds of single cells. That was very much enabled by using this approach of the isobaric carrier. In fact, it was enabled by using old equipment. We did not have access to cutting-edge, state-of-the-art mass spectrometry equipment. In fact, we did not have any equipment. I had to collaborate with a good friend of mine at Harvard, who made a key contribution by providing access to equipment that I did not have at Northeastern.
This whole approach of multiplexing, the isobaric carrier and other aspects that were introduced, made it very easy to implement in other laboratories. What has been a guiding philosophy for us, is not only to develop the best methods that we can possibly develop to give us the most accurate and highest throughput methods…all of that is great. But we have a significant constraint in everything we do. That constraint is that what we do should be reproducible in other laboratories; others should be able to do it. Ideally, we make it as easy as possible for others by using equipment that is widely available, that is relatively inexpensive. They’re not cheap, nothing is very cheap, but at least we can use equipment that is much cheaper; in some cases, orders of magnitude cheaper, than alternative equipment. That has been a very important guiding principle for us, trying to make the technology accessible, make it as inexpensive as possible, make it high throughput.
I think that these are distinctive aspects that have allowed many laboratories to adopt and start using SCoPE2. I know of a number of mass spectrometry facilities, both in the US and Europe, that have successfully implemented SCoPE2. There are several other methods using multiplexing, sometimes with slightly different names, but they’re really variations on the same approach that SCoPE-MS introduced. I mentioned other methods that exist – they are label free approaches. Another set of approaches, which we recently introduced, combines data-independent acquisition with multiplexing; non-isobaric labels are used in that context. I’m very enthusiastic about the potential of these approaches to inherit many of the advantages of SCoPE2, in terms of being accessible, being relatively inexpensive etc. But they have the potential to provide even deeper proteome coverage and substantially higher throughput.
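Editor’s illustration (not part of the interview): the multiplexing idea described above, where proteins from each single cell carry a cell-specific barcode and many cells share one instrument run, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the SCoPE2 pipeline. The tag names, cell names, intensities and the functions `demultiplex` and `cells_per_day` are all hypothetical, and the 60-minute run length simply reuses the ‘one cell per hour’ figure mentioned for label free analysis.

```python
from collections import defaultdict

# Hypothetical assignment of mass-tag channels (barcodes) to single cells
# within one multiplexed run. Real tag names and plex sizes will differ.
CHANNEL_TO_CELL = {"tag_1": "cell_A", "tag_2": "cell_B", "tag_3": "cell_C"}


def demultiplex(measurements):
    """Group (protein, channel, intensity) records by the cell each barcode marks."""
    per_cell = defaultdict(dict)
    for protein, channel, intensity in measurements:
        cell = CHANNEL_TO_CELL[channel]
        # Sum intensities if the same protein is reported more than once.
        per_cell[cell][protein] = per_cell[cell].get(protein, 0.0) + intensity
    return {cell: dict(proteins) for cell, proteins in per_cell.items()}


def cells_per_day(cells_per_run, run_minutes):
    """Cells analysed per instrument per day, assuming back-to-back runs."""
    runs_per_day = (24 * 60) / run_minutes
    return cells_per_run * runs_per_day


# Made-up intensities for two proteins measured across three barcoded cells.
run = [
    ("protein_X", "tag_1", 10.0),
    ("protein_X", "tag_2", 4.0),
    ("protein_X", "tag_3", 7.0),
    ("protein_Y", "tag_1", 2.0),
    ("protein_Y", "tag_2", 9.0),
]
by_cell = demultiplex(run)

label_free = cells_per_day(cells_per_run=1, run_minutes=60)    # 24.0
multiplexed = cells_per_day(cells_per_run=12, run_minutes=60)  # 288.0
```

Under these assumed numbers, labelling a dozen cells per run raises throughput twelvefold on the same instrument time, which is the accessibility argument made in the answer above.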
FLG: You’ve talked about the benefits and potentials there. What about some of the challenges? What are some of the key challenges in scaling up single cell analysis to the proteome?
Nikolai Slavov: Some of the challenges are very similar to the challenges of mass spectrometry proteomics. Any kind of mass spectrometry proteomics analysis is not as widely integrated with biomedical research as DNA or RNA sequencing methods. The reasons for that are numerous. Some of them are technological; I think a lot of them are societal and policy based. It’s the level of understanding of the technology by colleagues who drive biomedical research, it’s funding from various governments, institutions, and so on. All of these problems that have generally made mass spectrometry proteomics less accessible and less integrated with biomedical research also apply to single cell protein analysis.
Fortunately, these problems are not unsolvable. They certainly have solutions. They’re not simple. I cannot solve them overnight. But we try to help, certainly at the level of education; we’re very passionate about doing our best to explain the technology in an accessible manner to the wider community. Some of those have to do with articulating a compelling vision and justifying funding to develop standard operating procedures. We also articulate the problems in biology and medicine that really need proteomics, and why we should invest in doing the protein analysis as opposed to focusing on the more accessible transcriptomic and genomic analysis. In terms of adoption of SCoPE2 in existing facilities and laboratories that can already do protein analysis well, I think all of these laboratories and facilities that can do quantitative proteomics should be able to implement SCoPE2, so there are no major additional bottlenecks.
There is one disadvantage compared to single cell RNA sequencing, which has a high throughput. 10x Genomics has made it possible to analyse in the order of 10,000 single cells in a relatively convenient way, in a single sample. That level of throughput is more challenging for us to match. SCoPE2 throughput is much more comparable to the multi-well plate-based approaches such as CEL-Seq, SMART-Seq, and so on. And, to some extent, this reflects the current state of the field. It’s not a limitation of mass spectrometry proteomics or single cell analysis. It is simply the level of throughput that current technologies have achieved.
In fact, with the new multiplex data-independent acquisition framework that we’ve introduced, we believe that we can get to analysing the proteomes of five thousand single cells per day, per single instrument, and potentially scale that even further. The opportunity certainly exists to increase throughput substantially, though at the moment, current throughput is relatively low compared to the more mature droplet-based single cell RNA sequencing.
FLG: When it comes to analysis, we’re entering the data revolution, with potential data overwhelm. Machine learning is often touted as the solution. How realistic and attainable is it in your field? What are the limitations we may see?
Nikolai Slavov: Data analysis plays a very, very important role. Many of the gains that we have made have depended on introducing new data analytics. Actually, in the same way, with SCoPE2 we took advantage of a Bayesian framework to incorporate additional features in determining peptide sequences, such as retention time. Data analytics certainly holds much promise to further interpret the data that we collect and to further downstream analysis as well. Simply quantifying protein abundances, interactions or modifications is just the beginning of the project. It’s not the end goal. The end goal is biological interpretation, which again, requires various types of data analysis.
I tend to see data analysis as being a very exciting and productive component of what we do. It is true that we generate gigabytes, sometimes terabytes, of data, and those need to be analysed. Fortunately, we have access to clusters that make this analysis quite doable. I would say that data analysis in terms of the volume of the data is certainly not the bottleneck. Current algorithms are not extracting as much from the data as I believe we can extract, so one can say that they’re limiting, and this is the glass being half empty. But the half full part of the glass is the opportunity to advance those algorithms so that we can interpret a lot more from the data. We think that’s a very exciting opportunity. Big data, data interpretation, data analysis, machine learning – they’re certainly very important parts of the field. I think that we’ll see many of the advances ahead of us be driven, or at least aided by, improvements in data analysis, and we’re certainly not limited by computational power.
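Editor’s illustration (not part of the interview): the idea of using a Bayesian framework to bring an extra feature such as retention time into peptide identification can be sketched as a single application of Bayes’ rule. This is a simplified sketch under stated assumptions and not the method used with SCoPE2. It assumes that a correct identification elutes near an expected retention time with Gaussian error, and that an incorrect one could elute anywhere in the gradient with equal probability. The function names, parameter names and numbers are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def updated_confidence(prior_correct, observed_rt, expected_rt, rt_sd, gradient_minutes):
    """Posterior probability that a peptide identification is correct.

    prior_correct: probability of a correct match from the spectrum alone.
    observed_rt / expected_rt: retention times in minutes.
    rt_sd: assumed spread of retention times for correct matches.
    gradient_minutes: length of the run, used for the uniform alternative.
    """
    like_correct = normal_pdf(observed_rt, expected_rt, rt_sd)
    like_incorrect = 1.0 / gradient_minutes  # uniform over the gradient
    numerator = prior_correct * like_correct
    return numerator / (numerator + (1.0 - prior_correct) * like_incorrect)


# An ambiguous spectrum (prior 0.5) that elutes exactly where expected...
near = updated_confidence(0.5, observed_rt=30.0, expected_rt=30.0,
                          rt_sd=0.5, gradient_minutes=60.0)
# ...versus the same spectrum eluting ten minutes away from expectation.
far = updated_confidence(0.5, observed_rt=40.0, expected_rt=30.0,
                         rt_sd=0.5, gradient_minutes=60.0)
```

With these assumed values, agreement in retention time raises the confidence of an otherwise ambiguous identification, while a large disagreement lowers it, which is the sense in which an additional feature can help determine peptide sequences.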
FLG: At the Slavov laboratory, you seek to understand coordination across protein synthesis, metabolism, cell growth and differentiation. Can you run us through some case studies of what’s come out of your lab and the work you’re producing?
Nikolai Slavov: We identify ourselves as part of the systems biology community, where we try to understand some of the principles of biological behaviour and emergent behaviours. We do many molecular studies and we’re interested in mechanistic exploration of what underpins biological functions. But often, we see that it is not a simplistic picture of the single protein or a single transcription factor doing all of the regulation resulting in a change. It’s a more complex system that is underpinning life.
I can share with you a couple of specific, recent projects that have given interesting results. In one of these, we studied a process that is quite relevant to the beginning of my life and the beginning of your life: what first breaks the symmetry as a zygote gives rise to a two cell embryo and a four cell embryo? When do these cells become different?
For decades, developmental biologists believed that the two cells making up two cell stage human and mouse embryos are similar, perhaps identical and indistinguishable. More recently, evidence has emerged that they have different functional potential to develop into different fates, but it has been very difficult to identify molecules associated with this developmental bias. Despite the fact that multiple groups have applied state-of-the-art single cell RNA sequencing to study the cells, there are very few RNA molecules which show bias between those two blastomeres. When we approached the same question by measuring proteins instead of RNA molecules, we found many hundreds of proteins that were systematically and consistently differentially abundant between the two cells. We were able to find functionally coherent sets of proteins that follow that pattern. Furthermore, our data strongly suggested that what was breaking the symmetry in those early stages of our development is protein degradation and protein transport. This, in retrospect, should have been clearer, because in those early stages of development, transcription is not dominant. It’s not very active. Therefore, the way that cells distinguish between each other and break their symmetry is by working with the proteome that they have. They can either degrade or transport proteins to a different location of the zygote. So, that is a project that was quite encompassing in terms of different biological experiments, both functional and analytical, and clearly indicated a biological system where the most exciting and interesting processes would have been missed by RNA analysis alone. In fact, they had been missed.
Another recent project is studying polarisation in primary macrophages, in a model system that many people have traditionally considered highly homogeneous: bone marrow derived macrophages. Macrophages are well known for being polarisable. However, in certain conditions, when derived from bone marrow, people tend to think of them as not having a lot of single cell heterogeneity. We analysed those cells, either unstimulated or LPS stimulated, and to the surprise of our collaborators at Harvard Medical School, we found there are substantial molecular differences, even within a treatment group, and even before the cells were treated with lipopolysaccharide. Of course, we were surprised, so then the question emerges: are these differences just some molecular fluctuations that have no biological significance? Or do they have any relevance for the biological functions of the cells? We explored that by further measuring the ability of these macrophages to take up fluorescently labelled dextran particles. We found that the molecular differences are strongly coupled to the functional differences. Indeed, there was a 30-fold variability in endocytic activity of these primary macrophages.
Another aspect that also highlighted the benefit of using protein analysis, as opposed to just RNA analysis, was our ability to measure proteolysis. It is well known that many biological processes, including macrophage polarisation, are regulated in part by regulated proteolysis. And by mass spectrometry, we can specifically detect the proteolytically cleaved products from that regulatory signalling, and be able to associate this with inflammatory or anti-inflammatory processes. That’s a level of regulation that is very difficult to measure if one is simply sequencing RNA molecules.
FLG: Thank you for sharing these fascinating case studies. To link back to something you were talking earlier about – reproducibility and accessibility. I know you have a great YouTube channel featuring lots of your scientific presentations. So I’d love to hear your thoughts on open access research and the importance of cross disciplinary communication.
Nikolai Slavov: I think this is an example of a win-win strategy. I think that open research is very beneficial for the community, but also very beneficial for groups who practice it. One way it benefited us very significantly was in the early days, in establishing the credibility of this emerging field, where we, the newcomers, proposed that we could do something that the established leaders in the field couldn’t do, or claimed wasn’t possible to do. Of course, that resulted in a lot of scepticism. Part of what helped overcome the scepticism was that colleagues from other laboratories downloaded our data, repeated our analysis, and obtained results that were qualitatively identical, for all practical purposes, to what we had done. While reproducibility is not the same as accuracy, this ability to reproduce our results lent a very large degree of credibility to what we had done, and was very, very healthy for the field.
Another example of this benefiting us is, I think it sets the standard and the bar high for all the students and postdocs in the group. If they make the work easy for others to reproduce, it also becomes very easy for them to introduce new data to their pipelines and easily revise their papers in the process of peer review. Which is, unfortunately, not the standard in the community.
I was recently an editor for a paper where a reviewer suggested a very, very appropriate change to the paper’s data processing. The authors responded that they couldn’t do that because they would have to regenerate their figures. They said that was very difficult for them to do, and that this justified not doing it, which is quite pathetic, frankly, as a justification for not improving the analysis. It takes more time to begin with to establish a reproducible pipeline of data analysis. But in the long term, it actually saves you time because, for impactful papers, one has to revise the figures multiple times.
If everything is set up in a way that allows these revisions to be done simply and quickly in the long term, it enables you to benefit from constructive feedback from reviewers, and also allows us to incorporate new data for others to build upon it. Ultimately, that’s why we do science in my view. That’s perhaps a bit more idealistic. But I can do a lot of interesting things in life and my time is very precious. Why spend my time on something that is not going to have impact? It just doesn’t make sense to me to even do that – it’s not worth my time. If I’m going to spend my time doing science and research, I might as well do it in such a way that it really drives the global research enterprise, it really provides useful resources for others in the community to build upon. To me, it’s obviously the right thing to do. I think it’s right, not only philosophically, but I think practically it has helped us in a number of ways to attain high visibility and to help the field that we helped start.
FLG: What do you think is the endgame for these technologies and methods we’ve discussed today? What do you want to see in the future?
Nikolai Slavov: I’d love to see the technology being so accessible and so inexpensive that every biological project that can benefit from single cell protein measurements can do them easily, and more cheaply than is currently possible. Not only making it cheaper and easier, but extending the reach of what we can measure, because measuring proteins expands the scope of our analysis, but it’s clearly insufficient. We want to be able to measure protein interactions, protein modifications, protein localisation in the cell, protein activities, protein conformations… all of these very important layers of biological activity and regulation. Our technologies can be extended to make these accessible; they currently cannot do all of these things. We have not yet done it. But I see no reason why this cannot be developed.
I think it’s going to be a very exciting path of technology development towards achieving this. Once we have that technology, I think we’ll be in a much better position to catalyse a more mechanistic approach to single cell biology. Not only to identify different clusters of cells and to describe differences in cell states, in different pathophysiological conditions. But to be able to measure the molecular processes that ultimately underpin those different stages and contribute to either health or disease.
FLG: And to finish up as we’re talking about being excited about the future. What are you excited about at the upcoming Tri-Omics Summit?
Nikolai Slavov: I’m excited about networking with colleagues from different fields, who can benefit from using the technologies that we have helped develop, implementing them to solve their problems, and joining the field. I think we need experts with different backgrounds. We need a very interdisciplinary community to join and help realise the potential and the promise of single cell proteomics. We cannot do this alone. We need a broad spectrum of expertise from different fields.
FLG: Thank you so much for joining me today. I’ve learnt an incredible amount. And I know I want to hear more sneak peeks about what’s coming out of your lab as well. Thank you for joining me today.
Nikolai Slavov: Thank you for inviting me. I enjoyed our discussion.