With the ever-increasing potential of new technology and the exponential growth of the life sciences field, researchers are always running into new problems to solve. In this interview series, we get scientists’ opinions on the ‘Big Challenge’ in their field and the steps being taken to address it. From new and unique hurdles to fresh takes on common problems, we dive into the complexities of the research landscape.
In this interview, we chat to John Cole (Bioinformatics Research Scientist, University of Glasgow) about the big challenge in the bioinformatics space, including education, data analysis and automation.
FLG: What is your background and role?
John: My name is John Cole, and I’m a Bioinformatics Research Scientist at the University of Glasgow. I manage a small bioinformatics facility in the School of Infection and Immunity. It’s small because it’s literally just me. So, I support everyone in the building, which is probably about 50 labs.
I left school early and trained to become a landscape gardener. My career reached the lofty peaks of working in a graveyard in Sydney, a big graveyard, where I'd probably spend half my time just killing spiders. After a few years I went home and studied genetics in Glasgow, which I enjoyed, and I worked as a Research Assistant on ADRB2 for a couple of years, eventually realising I didn't really like the wet lab. So, I quit and became a curry chef! After about five years, I realised again that I'd always wanted to be a scientist, but I wanted to get into computational stuff. So, I studied bioinformatics at Glasgow, and that suited me quite well because it had very few exams and lots of coursework, which I preferred. From there, I got a job working in cancer research as a bioinformatician.
I've worked on all kinds of different things like leukaemia, melanoma and ageing, and every type of dataset under the sun (proteomics, metabolomics, transcriptomics, genomics, epigenomics, etc.), all these multi-omics projects. Then I moved into immunology, in a collaboration with AstraZeneca. What I built on there was that I've always liked helping other people; helping people with their research is what I've always wanted to do. I think, coming from a labouring background, I also always thought about efficiency. Lots of stuff in academia isn't as efficient as it could be. So, something I've worked quite hard on is ways to make sequencing analysis, particularly RNA-seq analysis, much quicker and easier. Another thing I've focused on is how to train people to do omics, and that's worked quite well.
FLG: What is the ‘big challenge’ in your field?
John: I honestly think the biggest challenge is time. Because most labs now do some omics, it’s very easy to do an omic experiment on the bench, right? It’s getting cheaper and cheaper and there’s a bigger range of things to do and more people are doing it. But there aren’t enough bioinformaticians, and there aren’t enough wet lab people that have been trained on how to do it. The analysis for lots of types of omics – maybe not for the most eye-catching types like single cell, but certainly for the ones that people do most frequently, like bulk RNA-seq – is just not automated enough. There’s been a real obsession with processing the data, historically, going from sequences to a table of values. But it’s actually quite frightening how little information there is on automating the actual analysis. People are like, ‘Oh, it can’t be done,’ or, ‘We wouldn’t want to do that because I already know how to do that in R.’ But no one has an issue with automating the processing step. Everybody automates that, whereas practically nobody automates the analysis, and it’s quite odd.
I think it partly comes from the fact that lots of the bigger and more serious omics groups, historically, have done genomics, and genomics is much more about processing than analysis. If I were to generalise, the analysis side just hasn't filtered through. If you look at guides for how to do omics, excluding single-cell, all the guides are 'how to process' and there's just so little about how to do the analysis properly. It's interesting because how to do the analysis is not very well standardised. But it's also not particularly difficult, because there's actually only a very limited number of things you could do for an omics project. You can plot it in an infinite number of different ways, though, and that makes it look more complicated than it is. Because it hasn't been automated, there's not enough time for people to analyse all these datasets thoroughly. Things don't get done as well as they could or as deeply as they could, or data gets swept under the carpet.
The second side to that, in terms of time, is training. Because bioinformatics is so different – or it's seemingly so different because you need to code – the generation above me are the ones who teach the undergraduate and Master's students, but they never had to learn it, so they can't teach it. It's only when the generation below me, who've been using omics in their wet lab work from the start, become the professors that it's really going to come through in a strong, meaningful way into undergraduate teaching.
A lot of the teaching is limited; to be honest, it could probably be a little better. For PhD students and postdocs, part of the problem is that there are just very few good courses out there that people can do. They're either not very well targeted, like making plots about car mileage to learn R, which just isn't relevant, or they're courses that start by teaching you command-line programming, which is horrendous to learn. No one should ever learn coding through the command line. It's also the bit you never spend your time on when you're in the wet lab; what you need to learn is the analysis. So, people get that wrong.
Also, people try to make courses that teach you coding in two days, which you can't do; you need weeks. They're also just really expensive. You can go to some courses where it's £2,000 for a week's training and they try to cover everything. By the time you reach Monday afternoon, you can't even remember what you learned in the morning, because it's gone too quickly. So, I think that is the problem: there's not enough bioinformatics time out there.
FLG: Why should people care about this challenge?
John: Because so many people are doing omics, and so few people have the proper training to do it, or access to a pipeline that does the analysis for them. And I think there's a bit of stigma around it: 'You shouldn't use a pipeline to do it for you; you should understand it.' Yeah, you should understand it, but you can still use the pipeline.
So, people need to care because you don't want to spend £100k on a dataset and not be able to analyse it, or not get very much out of it, or make mistakes. I've seen all these things happen – I had someone come to me recently who had spent £80,000 on different spatial, single-cell and proteomics data types, and they had no way to analyse it.
You might think you can pay a company to do this, but you can't. If it takes a year to analyse the data, you can't pay a company £100,000 to analyse it. They'll do a much worse job than you because they don't care, and they also know nothing about the biology. So, it's really important to have bioinformaticians who work with wet lab people, and wet lab people who know how to work with bioinformaticians. It's essential, and it's only going to become more so.
Bulk RNA-seq is now £80 a sample, which is nothing, right? There's no point in doing qPCR; you just do sequencing, because it's the same price. Proteomics is better now. It actually works: you get 10,000 genes and proteins, and they're good. Then you've also got this massive explosion in spatial omics. I'm excited about this; it's so good. But people need to be able to analyse the data.
FLG: What is being done to tackle the issue, or what should be done to tackle the issue?
John: I'm a bit cynical, actually. I think what's being done is not really being done in a deliberate way. People are starting to wake up and see that teaching should be a bit better. People are starting to wake up and see that if you're going into a PhD, then you might have to do this. People are just starting to wake up, rather than actively formulating some plan. I'm still a little dismayed by the training in the UK and elsewhere for PhD students and postdocs. In terms of pipelines, things like Seurat, which is widely used for single-cell analysis, are very good and quite nicely pipelined. But again, a lot of it could just be automated, and it's not presented in that way.
Another issue is that when lots of people try to present pipelines for things, it's like, 'Do you want a PCA? Yes or no. Do you want a heatmap? Yes or no.' You've got to click about 500 buttons. But you're always going to want a PCA, and you're always going to want a heatmap, so it might as well just give them to you without asking, and then you can choose whether to look at them or not. That sounds trivial, but it is a massive thing; if you're scared of omics, which lots of people are, being asked those questions is just going to turn you away.
So, I think what should be done is that the analysis should be more automated, and people need to pay attention to that. It needs to be culturally acceptable for you to just fire your data through a pipeline and get your figures more or less ready. I think people need to be trained – that's what needs to happen. The training shouldn't be delivered by computing scientists and people who don't do any biology; it needs to be delivered by people like me who have been in the wet lab, who understand what you need to do and that it's scary to learn a new thing, and who can sympathise and actually give people what they really need. A week of learning the command line is useless to most wet lab people, whereas a week or two of R is invaluable.
FLG: What is your advice to people breaking into the field?
John: I think bioinformatics is a good field to break into because there's a shortage of good people, so there are quite a lot of jobs. You can even skip a PhD and stay in academia or go into industry; that's quite doable. Probably the best advice generally is just: don't do what everybody else does. There are a thousand web platforms out there for pathway analysis, and so many things are just copies of other things, whereas a lot of the basic stuff actually still remains to be done. So, just think for yourself and don't do what other people do; just do what you think should be done. That's probably the best advice I could give. It worked for me, I suppose.
Enjoyed this interview? John Cole is one of 250+ speakers joining us at The Festival of Genomics & Biodata in January. Register here to hear more from John and the rest of our expert speaker faculty in London.