Data sharing & sources of bias in metabolomics

In this episode, Alice and Gabi Kastenmüller talk about how knowledge is power for data stratification, the importance of teamwork in a metabolomics project, and Gabi’s effort to keep data and bioinformatic tools available to the community.
Gabi Kastenmüller

Group leader Metabolomics at the Institute of Bioinformatics and Systems Biology (IBIS), Helmholtz Zentrum München, Germany (since 2011)

Kastenmueller group @Helmholtz

Favorite metabolite

Discussed paper by Arnold et al.
Sex and APOE ε4 genotype modify the Alzheimer’s disease serum metabolome

Database links discussed
AD Atlas
GWAS Server

Other resources
Talk about the Omicscience platform by Prof. Claudia Langenberg
(given at the biocrates Pan-Cohort Metabolomics event 2021)

More about The Metabolomist podcast

Episode Transcript

Alice: Gabi Kastenmüller is a group leader in metabolomics at the Helmholtz Center in Munich. You’ve been working on metabolomics for a while. Would you like to add something?

Gabi: I would say maybe even a bit more than a decade in metabolomics now. For me this was a perfect synthesis, given my background in chemistry and computer science: it is about data, but it also has to do with chemical molecules.

Alice: It converges on metabolomics quite well. From what I understand, your research focuses a lot on metabolomics, a bit on GWAS as well, mostly creating tools to better analyze and interpret this type of data. Is it mostly bioinformatics, or do you also do different things?

Gabi: I think the general driver of our research is that we want to understand what influences the human metabolome: genetics as an inborn factor of metabolism, but also influences from the microbiome, nutrition, and exercise. We want to understand these influences on the human metabolome, and the changes of the metabolome over time, by using metabolomics.

Alice: And the metabolome is a particularly interesting type of molecule to study in this context, isn’t it? A lot of people would study proteomics or maybe transcriptomics, but you made a choice at some point to focus on metabolomics. Is it because it’s so sensitive to these different influences?

Gabi: Exactly. It’s really at the intersection of all these influences, which we know are important for developing diseases and also for treating diseases. So it’s a molecular layer very close to the phenotypes or to the symptoms, and very close to what you can do against them in an easy way, like diet and exercise. Drugs and medication also often act on the layer of metabolism. So it’s the molecular layer where it all comes together: the genetic imprint and what we do to our bodies with our lifestyle.

Alice: In part of your work, you develop bioinformatic tools and databases, some of which are available online. Could you tell us a bit about this free access to data? People sometimes see sharing data, whether it’s metabolomics or another type of omics data, as a kind of chore, but I have a feeling it’s a different dynamic on your side, and that sharing the data is part of the process for you.

Gabi: We share the opinion that it’s an important part of the work. As I said, the driver of our work is to understand the metabolome systemically, and you won’t get to that understanding without collecting the results from very different angles of research and very different projects: some where you look into diseases, others where you look into nutrition, exercise and so on. What we try to do with our tools is bring these results together. It’s not only about sharing the actual data. In our opinion, it’s also important to share the wealth of results that you get from that data.

It’s much easier to share results than data because you don’t have all the restrictions coming from data protection. With data, you have to make sure that you don’t give out private, sensitive information. You don’t have this problem when you are sharing results, but the essence of what you do still has to come together. Sharing the results is as important as sharing the data.

Alice: In terms of results, do you mean, for example, the associations that you find when you combine metabolomics with GWAS? This is the kind of result that you share, right?

Gabi: Yes. We make these online tools that you already mentioned to bring all the results, not only the highlights that you focus on in a publication, to the people interested in them. So if someone is interested in a single, very particular gene, because he or she has been working on that gene for decades and is very interested in its function, then this person should be able to easily search for and see the associations that we, for example, see with metabolites for this gene. This can suggest new experiments for someone who is really interested in these very specific functional aspects. Usually, these types of results are hidden somewhere in supplements, and the people interested in them won’t even find them.

Alice: Yes, absolutely. And it is really useful. I’ve worked with a lot of different omics, not just metabolomics, but still I would have no chance working with GWAS. I’m not a bioinformatician. If the data were freely available on a server somewhere but I still had to mine it myself, I would have no chance. What helps is that you provide the associations, and that it’s really simple to start either from the metabolites or from the genes and find just what came out of the analysis. That can be an inspiration when you’re stuck, even just to see if there’s a direction you hadn’t thought about, or something in the literature that you wouldn’t have found by looking at the general topic because it’s maybe not a mainstream theory yet. You can find new, interesting things that way.

Gabi: That’s the goal of these tools: we try to make our association results as accessible as possible.

Alice: Do you want to name a few of the tools? We will put up a list of links so people can find them, but maybe you want to name a couple of them.

Gabi: In collaboration with the group from Cambridge now in Berlin, the group of Claudia Langenberg [A talk by Prof. Langenberg on omicscience can be found here], we set up a whole set of such online supplements for GWAS, which is called

It’s not only about metabolites; in part it’s also about proteins, and we will add further association results from big cohorts there. You can also find association results for metabolites with diseases and disease risk factors from a very large population of 11,000 participants.

Alice: These tools are still growing, so it’s not just a static result of a project or a paper: every new study can be added to them.

Gabi: We try to sustain these things as well as possible. It is always difficult because of funding. These kinds of things are not really funded well, but we try to keep those servers up, not only for one or two years after the publication but as a growing resource.

The other resource, which is already quite old but which we are still updating and developing further, is the one where we collect information on each and every genetic variant in the human genome. The associations we link in there are results from different GWASes that are also available in other places, but also metabolite associations and associations with proteins.

Maybe a third addition: our newest one, the AD Atlas. There we try not only to make single associations accessible one by one (for people coming from a particular metabolite or gene) and just give a list. In the AD Atlas we focus on bringing these lists and associations together into one network, to get a more systemic view of the interactions between the different molecular layers and the different disease phenotypes. In this case, we made it specifically for Alzheimer’s disease-related phenotypes. In general, the molecular backbone is not restricted to the disease. We use really broad data from big healthy cohorts, including a lot of gene-metabolite and gene-protein association analyses.

Alice: So you combine GWAS – metabolomics – proteomics. Other things as well?

Gabi: As it’s specifically focusing on AD, we are also including metabolite-disease phenotype associations and, for example, tissue-specific expression profiles that are particularly interesting for AD. So we have gene expression information for different brain tissues from a big US consortium working on AD, and that’s all brought together in this database.

Alice: Did you find new ideas about the pathology of AD from the associations that you now make in these networks?

Gabi: So far, we have mainly looked into it in a more explorative way, starting from a specific group of metabolites that we saw associated, or a set of genes that is interesting from a particular perspective, like the targets of a drug such as statins. Different statins have different targets besides the main target.

We start from questions like these, and then we create sub-networks and analyze them in terms of enrichment of genes that we have in the bigger network. From these enrichments, we were able to suggest drug repurposing by combining them with different databases that hold information on drug screens.

That’s what we have already tried, to show that this network is of value. In the next step we want to really mine the complete network. That’s what you were saying: we just want the data to speak to us. So the next step will be to provide the possibility to analyze this huge network of millions of data points with graph-based methods.

Alice: So at the moment, it’s not yet available to the public to use, but this is the plan.

Gabi: What people can already do with the Atlas now is explore, starting from a set of genes, a set of phenotypes, et cetera.

Alice: So now you can already filter and play around and see what comes up – it’s already available.

Gabi: Yes. It’s already available, and you can filter and search. The step that is missing, and that we want to take now, is to take the full network that we created and mine it in a more holistic way.

Alice: Are you using Alzheimer’s now as a kind of training disease, and then planning to apply the same strategy to other interesting diseases? Is that the plan?

Gabi: The backbone of molecular interactions will be the same anyway, because that’s coming from the very big, more or less healthy cohorts, right? We plan to couple this with different disease-specific databases, information, or consortia data in the future to make different atlases for different diseases, but also to use it for a better understanding of co-morbidity. That is also one of the interests of my group. We want to understand, on the metabolic level, why obesity and the linked metabolic pathways are a risk factor for basically all age-related diseases. Why does it lead to type 2 diabetes for some individuals and to Alzheimer’s for others? These pathways are all connected, and this is known, and we want to understand how.

Alice: That’s a lot of fascinating work ahead, I think.

To prepare this interview, I put your name in PubMed to see what comes up. What was interesting is that, just looking at the first two pages, you have a multitude of very different papers that each seem fascinating. But there’s one paper that I find particularly interesting, because it talks about a topic I’m very interested in: sex differences in metabolomics. In 2020, there was a paper where Matthias Arnold was first author and you were last author that talks about sex differences. The paper is called “Sex and APOE ε4 genotype modify the Alzheimer’s disease serum metabolome”.

Firstly, it clearly puts forward the idea that sex differences are a topic in metabolomics and that we shouldn’t avoid it. I also like the paper because it had two main figures which are radically different. The first figure is a highly complex visualization of the data, with really a lot of information. I don’t know if it’s a classical way of showing the data, but to me it seemed very creative to show the pooling of the male and female differences in the metabolome together with other factors, and then to manage to extract the few metabolites that seem to be really strongly influenced by sex.

So there is a lot of information condensed in that picture. And then the second figure is the complete opposite: very simple box plots that carry the most beautiful message I’ve ever seen. I don’t know if you remember that figure. I found the contrast between those two figures really interesting.

So, just to describe it quickly to the people who are listening: the second figure shows box plots of the levels of proline in the patients’ blood. Without any special stratification of the data, when you compare the proline levels in two groups, one that would be diagnosed with Alzheimer’s and one that wouldn’t, there’s no difference visible.

And then you start stratifying based on sex or APOE status. And again, not much to see: you see differences between the two sexes, but you don’t see a difference in Alzheimer’s. Then come the results of the twofold stratification, and suddenly it happens. For males there is not much relevant information, but for females you clearly see that there’s a difference between Alzheimer’s and control. Organizing the data in this way reveals this important difference.

Would you like to comment on what this figure means and what the overall topic of sex differences in metabolomics is about?

Gabi: We all know that there is a huge sex difference in metabolite levels, but usually people think of these differences as just different baseline layers. In the case of disease or treatment, people then think that on top of these different layers the effect [of the disease] is basically the same in both sexes. That might be true for most cases. We all function in a similar way biochemically, in terms of the types of reactions, right, even though the levels can be very different. But this example shows that some of the metabolic effects that we see in Alzheimer’s disease are only relevant for a small subgroup. You have to have the genetic risk and be female to see something in proline. That’s a very simple message, and that was what we wanted to convey with this figure. Otherwise, from the analysis alone, it’s sometimes a bit hard to explain why we are looking into these subgroups at all.

Because, on average, we haven’t seen any difference in proline. So why were we looking into that more deeply? That was what we wanted to show.
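The dilution Gabi describes here can be illustrated with a small simulation. The sketch below uses hand-built synthetic numbers, not data from Arnold et al.; the baseline levels, the effect size of 30, and the helper names `simulate` and `ad_vs_control` are assumptions chosen purely for illustration. An effect is placed only in female APOE ε4 carriers and is then compared in the pooled data and in each sex-by-genotype stratum.

```python
from itertools import product
from statistics import mean

# Fixed, symmetric within-group spread so the example is fully deterministic.
OFFSETS = [-10, -5, 0, 5, 10]


def simulate():
    """Build synthetic 'proline' levels for every sex x APOE4 x diagnosis group."""
    rows = []
    for sex, carrier, ad in product(["female", "male"], [True, False], [True, False]):
        base = 180.0 if sex == "female" else 200.0  # assumed sex difference in levels
        # Assumed disease effect, present ONLY in female APOE4 carriers with AD.
        effect = 30.0 if (sex == "female" and carrier and ad) else 0.0
        for offset in OFFSETS:
            rows.append({"sex": sex, "apoe4": carrier, "ad": ad,
                         "proline": base + effect + offset})
    return rows


def ad_vs_control(rows):
    """Difference in mean level between AD cases and controls."""
    cases = [r["proline"] for r in rows if r["ad"]]
    controls = [r["proline"] for r in rows if not r["ad"]]
    return mean(cases) - mean(controls)


rows = simulate()

# Pooled comparison: the subgroup effect is averaged with three unaffected strata.
overall = ad_vs_control(rows)

# Twofold stratification: one comparison per sex x APOE4 stratum.
by_stratum = {
    (sex, carrier): ad_vs_control(
        [r for r in rows if r["sex"] == sex and r["apoe4"] == carrier]
    )
    for sex, carrier in product(["female", "male"], [True, False])
}

print("pooled difference:", overall)
for stratum, diff in by_stratum.items():
    print(stratum, diff)
```

With these made-up numbers the pooled difference is 7.5, a quarter of the effect of 30 that appears in the female carrier stratum, while the other three strata show no difference. Real data would of course need a proper statistical test rather than a comparison of means.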

Alice: It is really shown nicely in that figure because the figure is so simple. Sometimes you need a complex visualization, like in the first figure, to show as much of the data as possible, and sometimes, like in the second figure, the simplicity is what carries the specific message.

Gabi: The first figure shows the outcome of the whole analysis, right? So this does not only happen for proline. You also have different effects for other subgroups and other combinations. And you also have homogeneous effects that are, apparently, really the same for everybody.

Alice: And this is important, then, when you’re interested in future treatments and which targets to choose.

Gabi: Yeah. And it’s also important to know about the heterogeneity of effects, because those associations can also come up in an unstratified analysis if your cohort happens to be biased towards a specific genotype, or for any other reason for this kind of bias.

Then the association can come up overall, but it’s really only relevant for a subgroup. And maybe targeting it in the other groups can be harmful. We don’t know. That’s why we wanted to show that we have to analyze the heterogeneity and not only say: oh, we see a particular global effect or p-value, we can replicate it, and that’s it.

It’s not the end of the story. We have to look into subgroups more carefully, especially in diseases where we already know that we have phenotypic heterogeneity, right?

Alice: Yes. And this also raises the question of how to find new markers or new targets, because in this example the way you stratify is based on known risk factors. So you know what to look for and you know how to stratify. Sometimes there might be a differential effect between subgroups that we don’t know about, because we simply don’t know that the subgroups exist. For instance, we don’t know that something is a risk factor because it hasn’t been discovered yet.

Gabi: At the end of working on this paper, that was exactly what we carved out for a new analysis: moving away from known risk factors to a stratification using the metabolomics data. So we are approaching things from the other end, asking whether we can use part of the metabolomic state, specific metabolites, to subgroup individuals, and then in a second step check whether we see associations with AD phenotypes. That’s something we are currently putting together in another publication, where we use this approach of sub-grouping by metabolomic state.

Alice: And this is one of the powers of metabolomics! Would you say that there is a similar kind of hope now with metabolomics as there was with genomics? Twenty years ago, genomics was going to solve biology, kind of. Lots of things were found out, and lots of things are still being done with genomics, but we now know that many things don’t rely just on our genetic code, or even on how it’s regulated at the epigenomic level. There are things that happen directly at the metabolite level, or that only become visible later at the metabolite level. Do you see a new kind of hope coming from metabolomics?

Gabi: Yes. I see the hope that we can get information on the different lifestyle and genetic influences that come together in an individual directly, with one measurement. That is, I think, the promise and the hope of metabolomics, and I think we are getting close. What always makes it very complicated is that we have this fluctuation of metabolism all the time. Anything will influence the metabolome. Be it what you ate two days ago, or whether you did a lot of exercise the week before, it will all influence your current metabolome. I think there are two solutions. One is to have a good knowledge of this fluctuation by looking into time-resolved data. That’s another branch of the group, where we are very interested in understanding when we see which metabolites affected by these things, and which ones are not. Getting really good knowledge about this [time resolution] is one solution, in my opinion. The other good news is that even if we use metabolomics data from all these different points during the day and all these different challenges, like exercise and different types of nutrition, if you take it all together, we have seen that the metabolome of an individual is a very stable thing. We have shown that even over years.

Not in each and every metabolite, but altogether, people stay closer to themselves than to other people. You stay closer to yourself no matter what you do to your metabolism. The other thing that we saw in the analysis of bigger cohorts over years (and the changes over years) is that when the whole metabolism changes dramatically in an individual, that’s a bad sign. That’s an alarm sign. And I think that is an argument for using metabolomics as a monitoring tool, to see when things are out of balance in a way that the individual’s body cannot cope with anymore.
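A minimal sketch of what “closer to themselves than to other people” can mean in practice: each follow-up profile is matched to the nearest baseline profile, and stability shows up when every person is matched to themselves. The three individuals, the three-metabolite profiles, the choice of Euclidean distance and the helper name `nearest_baseline` are all assumptions for illustration; the episode does not say which measure the group used.

```python
from math import dist

# Hypothetical metabolite profiles (arbitrary units) for three people,
# measured at baseline and again at a later follow-up.
baseline = {
    "A": [1.0, 5.0, 3.0],
    "B": [4.0, 1.0, 2.0],
    "C": [2.0, 2.0, 6.0],
}
followup = {
    "A": [1.5, 4.5, 3.2],
    "B": [3.6, 1.4, 2.5],
    "C": [2.3, 1.8, 5.5],
}


def nearest_baseline(profile, baseline_profiles):
    """Return the ID whose baseline profile is closest (Euclidean) to `profile`."""
    return min(baseline_profiles,
               key=lambda person: dist(profile, baseline_profiles[person]))


# For each follow-up profile, find whose baseline it resembles most.
matches = {person: nearest_baseline(profile, baseline)
           for person, profile in followup.items()}

# Fraction of people whose follow-up is closest to their own baseline.
self_match_rate = sum(person == match for person, match in matches.items()) / len(matches)
print(matches, self_match_rate)
```

In this toy example every follow-up profile lands on its own baseline even though no single value is unchanged, which is the sense in which the whole profile can be stable while individual metabolites fluctuate.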

Alice: You could imagine something like getting a metabolite panel checked every year, just so you can catch things early. After 10 years you have 10 pictures of your metabolome, and you might see an outlier when you start developing a disease, which might get caught relatively early that way.

Gabi: That’s helpful. When you look at the complete metabolome as a combination of metabolites, it gives you an idea, but you also have the individual metabolites, where you might still be in the normal range when you compare against a normal range (as is done in classical clinical chemistry). When you have your [own] monitored levels, you can see a trend much earlier. Maybe you have always been very low in that level, and high levels are bad, but you have a clear rising trend. Then you have much more relevant information. And that’s where I would hope that we get metabolomics into the clinic as a supporting tool.
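The contrast between a population normal range and a person’s own monitored levels can be sketched in a few lines. All numbers below are invented: the reference range, the ten yearly values, the five-year personal baseline and the function names are assumptions for illustration, not a clinical method.

```python
from statistics import mean, stdev

# Hypothetical population "normal range" for one metabolite (arbitrary units).
REFERENCE_RANGE = (50.0, 150.0)

# Hypothetical yearly measurements for one person over ten years.
yearly_levels = [60.0, 62.0, 61.0, 65.0, 70.0, 78.0, 88.0, 100.0, 115.0, 132.0]


def within_reference(values, low, high):
    """True if every measurement lies inside the population reference range."""
    return all(low <= value <= high for value in values)


def slope_per_year(values):
    """Ordinary least-squares slope of the level against the year index."""
    years = range(len(values))
    year_mean, level_mean = mean(years), mean(values)
    numerator = sum((year - year_mean) * (level - level_mean)
                    for year, level in zip(years, values))
    denominator = sum((year - year_mean) ** 2 for year in years)
    return numerator / denominator


def personal_z(values, baseline_years=5):
    """How far the latest value sits from the person's own early baseline."""
    own_baseline = values[:baseline_years]
    return (values[-1] - mean(own_baseline)) / stdev(own_baseline)


all_normal = within_reference(yearly_levels, *REFERENCE_RANGE)
trend = slope_per_year(yearly_levels)
z_score = personal_z(yearly_levels)
print(all_normal, round(trend, 2), round(z_score, 1))
```

Every value here stays inside the assumed reference range, so a single classical check would flag nothing, yet the slope is clearly positive and the latest value is many standard deviations above this person’s own early baseline.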

Alice: There’s much that can be seen through it. I mean, it’s used in certain applications in the clinic, especially newborn screening, but there’s really a lot more that could be done if metabolomics entered the clinic.

Gabi: The beauty of metabolomics, as I see it, is that we can also link specific metabolite classes, specific metabolites, or a combination of them to modifiable risks that can be related to your lifestyle or to the microbiome. We get the information that, for example, this may come from an imbalance in your microbiome.

Maybe you had antibiotic treatment, too many courses of it, or something similar. We can also see all the metabolites that show up when you look into liver problems. You see those combined with the ones indicating microbiome dysbiosis, and you can act on this. I think there metabolomics has a lot to give.

Alice: Yeah, I think so too. I like the idea of lipids being more than just membrane components, being active members of the metabolome and of our biology. What’s your view on this?

Gabi: I’m not a particular expert on lipids, but what I have noticed over the years is that lipids come up as associated with diseases almost everywhere. Whatever disease you look at, a lipid is involved as well. I’m pretty sure that lipids are much more than just things that you need for the membranes. They must be, because they are so diverse. Why should nature make all these different things? What makes it very difficult, in my opinion, when doing these lipid screens is that the classic biochemical functional information on each and every particular lipid that can be measured nowadays is missing. No one has done the hard work of finding out what a lipid with this particular chain length, and double bonds here and there, all these [chemical functional] things that we can imagine now, is really doing in comparison to one with the next length and the next double bonds.

All this hard biochemical work is missing in the literature, and that makes interpretation really difficult. The other thought I always have when I see lipids: in many cases I have seen really bad things happening in terms of interpretation, when people don’t know what the labeling of a measurement means.

Alice: What do you mean by the labeling? Like, they’re not sure which lipid they’re looking at?

Gabi: In lipidomics we have these different layers of precision. You may know the lipid class, but you don’t know exactly what the fatty acid chains behind the measurement are. For example, with a label like PC [phosphatidylcholine] something, it is hard to really capture what’s behind that label and to find the interpretation and literature on it, because the literature mostly covers very specific lipids in very fine detail, whereas what you measure is much more of a bag of things.

Alice: And the nomenclature that is used in a publication might be different from the one you’re using as well, which makes it difficult to find information.

Gabi: Exactly. I think this complexity of lipidomic measurements is not appreciated.

Alice: I think that’s also why it is so fascinating and exciting when we see a beautiful lipidomic study with a beautiful story of what the lipids are doing. Some of the most interesting metabolomics or lipidomics papers that I read are really about lipids, because it’s a whole new world, a new dimension that opens every time a cool story comes out about them. And it’s also because of this scarcity of information, both on the function and on the structure of what is measured, which makes interpretation difficult.

Speaking of interpretation, this is the last topic I want to address more specifically, though this whole podcast is about interpretation, of course. When we spoke before, I explained a bit of the picture I have of an interpretation project. You start by planning your project. Then you have execution steps, where you prepare your data, operate the tools that you want to use, and check that everything is fine. Finally, you can do the interpretation work: in your case, for example, you have found the associations that are significant between metabolites and genes, or this kind of thing, you’re confident you did good work, and now you can do the interpretation and try to figure out the story behind, let’s say, Alzheimer’s or whatever the topic of the paper is.

In your experience, what would you say is the most time-consuming step of the process? What’s your feeling about it?

Gabi: There are two steps that, at least in the projects we were involved in, always took most of the time. The first step is getting to know the data together with the experiment. That’s a very time-consuming step, because no matter whether you come from the experimental side, so you are the one who does the experiment and has the idea about the research question, or whether you come more from the metabolomics side and know the metabolites, it’s always hard to really capture the rest. The experimentalist, who has a clear idea of what the research question is, often has a very hard time capturing what’s in the metabolomics data. How much can I rely on this one? What is that metabolite? What does it tell me?

And from the other end, when you’re more the metabolomics expert, you first have to understand the research question in a very detailed way to really help people make the most of the metabolomics data. No matter which side you are on, all the studies we did were highly collaborative. You always have different parties involved.

Alice: Yes. In fairness, with metabolomics I’ve spoken with some people who do the planning, the measurements and the analysis [themselves], but I think these people are quite rare, and more often than not it’s a group effort.

Gabi: It’s possible, but everybody has to find this common ground somehow.

Alice: And you have to find a language you both understand, which is sometimes difficult. There are even words that biologists, informaticians and chemists all use, but for different things, like “feature”, for example. I always find it complicated. Even “interpretation”: some people think of interpretation as something like interpreting the signals of their measurements to know what the metabolite is, whereas other people, like me, from a biological point of view, say, “I’m going to interpret what’s going on at the biology level.” You can have conversations with people for a while before you realize you’re not talking about the same thing.

Gabi: I actually had this with a chemist and a computer scientist who were putting together a manuscript. For 20 minutes, I would say, they were discussing a graph, and after 20 minutes it turned out that the whole time the computer scientist was thinking of the graph in a very mathematically well-defined way, as a network to be set up for this publication, while the chemist was just talking about a conceptual figure for the paper. These things happen, at least in the studies we were involved in. It takes time, and I think we should give ourselves this time to really understand what we are doing together.

Alice: Yes. Because the risk, if you don’t do this, is that you go further down the steps and then, when you realize [you did not have a common understanding], you have to go back to the beginning. To my mind this is the worst outcome. It’s absolutely normal to take the time to prepare and plan and make sure that everyone understands what they have to do, for any project.

Gabi: Especially in big projects, even in the funding schemes, they always expect the best experts in each and every discipline: the best statistician, developing new algorithms or coming up with new ideas for the analysis, and the best analytical chemist. But they are not bringing them together. They sometimes expect that one part is done only by the analytical chemist and [the other part] only by a statistician, and that’s not necessarily the best outcome for the research question. You clearly need these experts, right? But you also need those who bring things together, and that’s sometimes a bit forgotten.

This is important. So that’s one part of the interpretation that I see as time-consuming. The other part is bringing together current knowledge in the literature and from other current screens with what my results are. That’s also a very difficult, time-consuming thing, because sometimes you have to read deep into the publications.

Alice: And this is more often a one-person job, isn’t it? Or do you manage to do this in the group? Because this is much more difficult to organize in a group.

Gabi: That’s true. You need at least one person who wants to dig into it deeply. Without that person it is usually very difficult, because it stays very shallow. It has to come together in one head first. Then, of course, discussion is needed with all the experts. Sometimes you find something and you think it’s important only because you are not so familiar with that particular field. So you need that one person who is willing to connect all aspects in her head first.

Alice: Then, once you’ve connected the dots for yourself, you need to go out and expose your theory to the world, as you can’t be an expert in everything or in every biological field. So you then need input to see if it makes sense, or if you maybe need to rework parts of the story.

Speaking of this interpretation part from the biological point of view, where do you see a place for creativity? Is it there? Is there also creativity involved in the previous steps for you or everywhere along the way, from an original scientific question to how to do the work to also how to form the team? What’s the place of creativity?

Gabi: I would see it in each and every step. I think that would be the ideal case. Of course, there is a lot of need for creativity in that last step, whereas before it you can maybe have more standardized processes. But if you really want to get the most out of the story, I think creativity at each step moves things forward.

Alice: My last question for you is what is your favorite metabolite and why?

Gabi: Yeah, I saw this question. Funny one. It made me realize that it changes with the studies. I can only say what my current favorite metabolite is. I would say it is beta-citryl glutamate. Why? Of course it has to do with a current project of ours where we had muscle biopsies from people doing resistance exercise. That’s a collaboration with exercise biologists from Cologne and Munich. This study was not the first time that I saw beta-citryl glutamate, but there we saw a change in muscle for this metabolite. And I think the metabolite is a very good example of why I like to do metabolomics instead of only very targeted things (like only clinical chemistry). This metabolite has been known for a while, but it has been described mostly in embryos, in the development of the brain, and later it has also been seen in spermatogenesis. Apart from that, it hasn’t been described much. In metabolite association studies with different phenotypes, mostly in exercise, it comes up, but it’s described basically nowhere as a key point or a highlight, because nobody knows what the thing is doing. I found it very interesting because I immediately, almost automatically, looked up: do we have a genetic association with that? Sure, we do. I could have found that in knowledge bases as well, because there is a particular enzyme dealing with it.

It synthesizes this metabolite, which always tells you that it must have an important role somewhere, and nobody knows where. And I think there metabolomics can help a lot to look into areas and fields where this metabolite might be of importance. We can get to the next steps or suggest experiments to see what the role of this metabolite is.

Alice: It’s a beautiful example of both types of applications of metabolomics as well. You might find new markers of a disease state, or of exercise or excessive exercise, but you might also better understand muscle physiology by looking into a new pathway that no one has looked into before in that context. It’s really cool. That was a great example.

I have the same. I always change my mind just depending on what I’m looking into. Is there something else you would like to discuss?

Gabi: I talked about all my favorite parts: time-resolved metabolomics, different metabolic challenges like exercise and nutrition, and the stability of metabolomes over time, so not only the fluctuations but also the stability. And also my favorite topic of making our association results, and all the information we have about metabolites, accessible to others in a way that they don’t need to be bioinformaticians.

Alice: On this beautiful note, I would like to thank you for joining me on the podcast. It was lovely to discuss with you.

Gabi: Thank you very much for the discussion. That was a great pleasure.