As far as statistical correlation goes, the p-value of 0.02 (lower, and further away from 0.05, is better) is pretty strong.
Where this article fails to convince me to start waving red meat in the faces of all the vegs I know is the limited sample size of a couple dozen people. Maybe there's some protective factor in red meat, or maybe their small sample size just overrepresented young carnivores and old vegetarians.
The small sample size is what most concerns me. Just from personal observation I have no doubt that genetics play a huge part in all of this, a part that is hard to believe could be accounted for in a study of 28 people.
Yes, the sample size is ridiculously small. Also, I couldn't find anywhere how they actually selected these individuals, i.e. it doesn't seem this was a randomized double-blind study.
Also the suggested causal association "unexpected effect of red meat" would need a lot more controls to figure out. There could easily be several confounding variables here.
P values can't save you if the number of diverse subpopulations in the general population is larger than your sample size. Also, they break the two dozen people into many groups, so they've implicitly tested many hypotheses.
Even if they did the stats right, there can be hidden causality at play with samples this small. Maybe 100% of the people that don't eat red meat in their sample grew up in poor households with poor nutrition, or maybe they all live in the arts district, which is close to an industrial polluter.
Anyway, it looks like it is worth further study with larger samples, but I don't think you can draw any actionable conclusions from the current study.
Even if this were a good study, you couldn't conclude that from it. The conclusion would be that red meat consumption is somehow associated with longer telomere length.
Some have speculated that extending telomeres or preventing them from shortening over time can reduce some of the effects of aging, so a diet that preserves telomere length might be good for you, but that also remains to be seen.
I don't think the study is very useful because they were essentially just data mining a tiny sample for any kind of apparent correlation from 10+ variables and found one that's barely statistically significant (although the effect size looks substantial, if real).
Note: as an observational study, this is not the final word on the subject. A p of 0.02 being the strongest evidence may also be the result of p-hacking. More research is needed. Causal studies should be performed to confirm.
Further, the sample size is exceptionally low (n=28).
It is even worse: there were only 7 male participants. Imagine (purely hypothetically) that men eat more red meat. Then the study may attribute the fact that men age faster to red meat.
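To make that hypothetical concrete, here is a minimal simulation sketch. Every number in it is invented for illustration (the meat-eating rates, the telomere means, the seed), and it uses a far larger sample than the study's 28 so that noise doesn't hide the effect. Red meat has no effect at all in this toy model, yet the pooled comparison makes it look harmful, simply because sex drives both variables.

```python
import random
import statistics

rng = random.Random(1)  # arbitrary fixed seed so reruns match

# All numbers below are invented for illustration, not taken from the study.
people = []
for _ in range(20_000):
    male = rng.random() < 0.5
    eats_meat = rng.random() < (0.8 if male else 0.3)  # men eat more red meat
    # Telomere length depends on sex only; red meat has NO effect here.
    telomere = rng.gauss(0.9 if male else 1.0, 0.1)
    people.append((male, eats_meat, telomere))

def mean_diff(rows):
    """Mean telomere length of meat eaters minus that of non-eaters."""
    eaters = [t for _, meat, t in rows if meat]
    others = [t for _, meat, t in rows if not meat]
    return statistics.fmean(eaters) - statistics.fmean(others)

pooled = mean_diff(people)
men_only = mean_diff([p for p in people if p[0]])
women_only = mean_diff([p for p in people if not p[0]])
print(f"pooled: {pooled:+.3f}  men only: {men_only:+.3f}  women only: {women_only:+.3f}")
```

The pooled difference comes out clearly negative while the within-sex differences hover around zero. With only 7 men in the real sample, there is little room to check for this kind of thing.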
I am extremely skeptical about the statistical analysis here.
It sounds like they tested nine food types, eight beverages and then even tried out different types of comparisons for each food/beverage. And the best p-value they observed was p=0.02:
"Among nine food types (cereal, fruits, vegetables, diary, red meat, poultry, fish, sweets and salty snacks) and eight beverages (juices, coffee, tea, mineral water, alcoholic- and sweetened carbonated beverages) only intake of red meat was related to T/S ratio. Individuals with increased consumption of red meat have had higher T/S ratio and the strongest significant differences were observed between consumer groups: “never” and “1–2 daily” (p = 0.02)."
Nowhere in their methods do they mention adjusting for multiple testing. If we do that using the standard Bonferroni approach, the p-value would have to be multiplied by a factor of at least 17 x 2 to account for the number of options that they tried. So the adjusted p-value would be 0.02 x 17 x 2 = 0.68 and thus far from statistical significance.
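For anyone who wants to check that arithmetic, a minimal sketch follows. The helper name `bonferroni_adjust` is made up for this example, and the count of 17 x 2 comparisons is the lower-bound estimate from the comment above, not a figure reported by the paper.

```python
def bonferroni_adjust(p_value, n_tests):
    """Bonferroni-adjusted p-value: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)

# 17 foods/beverages times (at least) 2 comparison types: an estimate, not a reported number
n_tests = 17 * 2
adjusted = bonferroni_adjust(0.02, n_tests)
threshold = 0.05 / n_tests  # equivalent view: the bar each raw p-value must clear

print(f"adjusted p = {adjusted:.2f}")
print(f"per-test threshold = {threshold:.5f}")
```

Either way you look at it, a raw p of 0.02 is nowhere near the corrected bar.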
Well I won't add anything on whether this particular study is flawed, though I recognise the same doubts as you.
What I can say is it seems like an awfully tiresome drudge to replicate someone's statistical analysis. Where's the raw data? How did you clean it? What tests did you run? With what parameters? And what library? It's different in R / Numpy, oh no, another week of sorting the haystack.
It would be easier if there were a GitHub for each study where you could just pull the exact same analysis that the authors did.
Also, I get the feeling most experts see themselves as experts in their particular field. Not surprising, given how people start by studying some subject and then get more and more degrees in it. But actually, the more I read papers from various fields, the more it seems statistics is the critical skill. Domain knowledge is necessary, but it means very little without stats.
Now it would be nice if researchers in every field were also experts in statistics, but I happen to have a friend whose post-doc is to assist biomedical researchers with their stats. So he's a biochem domain expert, but also a stats expert, and the other researchers come to his little group to make sure things are statistically well reasoned.
That would be really sad. Every bachelor student in particle physics learns about the "look-elsewhere effect", but in a way, our field is about nothing; we do it out of intellectual curiosity. But in medical studies, where lives are potentially at stake, you sometimes see such sloppy statistical analysis. Sometimes I get the impression that it is OK, even fashionable, in certain fields to know nothing about statistics. Like in school when it was not cool to be good at math. (Source: anecdotes from fellow medical and non-medical researchers)
Thankfully, there are also many good medical studies, and I know there are many researchers in medicine who are much better statisticians than I am. I haven't looked into this particular study, so I can't comment on it. But I'm a bit worried about the field in general...
Great catch. This is a super important factor. The real concern in science is how easy it can be to hide the fact that so many possibilities were tested, precluding this observation.
As another example, there are GWAS studies where people take surveys and medical histories of large groups of people and then sequence their DNA to look for correlations across all possibilities. Those studies only report results with p values less than 1e-4 or 1e-5.
Put simply: if you test a random event often enough, you're going to get lucky eventually.
In statistics, the question everybody asks is whether or not a given result is "significant". To answer this question, one calculates a p-value, which gives you the probability that your data are purely random (i.e. not due to any causal factor). By convention, if p < 0.05, one calls this result "significant".
There are various tests to calculate this p-value, depending on the type of data you have available. One of the most common is the t-test, which is used when you wish to compare two samples. (For example, you might want to compare the number of butterfly species on mountain meadows with the number of species in vineyards.) The t-test takes various factors into account, including the mean and the variance of your datasets. This makes it a pretty good tool for comparison, much better than looking merely at, say, the arithmetic mean.
The problem is that occasionally, the t-test is going to give you a false positive. That means you end up thinking that two things are causally linked that in reality are completely independent. The probability of such a wrong result rises with the number of tests you perform in a given study. That is why such studies need to adjust for multiple testing. Often this is done with the Bonferroni method, which basically means comparing each p-value against a/n instead of a (or, equivalently, multiplying each p-value by n), where a is your level of significance (0.05) and n is the number of tests you perform.
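To put numbers on how fast that risk grows, here is a rough sketch. It assumes the n tests are independent and that every null hypothesis is true, neither of which is guaranteed for the study discussed here. Under those assumptions the chance of at least one false positive is 1 - (1 - a)^n. The function name is just a label chosen for this example.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests

# 34 is the 17 x 2 estimate mentioned elsewhere in this thread
for n in (1, 5, 17, 34, 100):
    print(f"{n:>3} tests -> {familywise_error_rate(n):.2f}")
```

With a few dozen tests, getting at least one "significant" result by luck alone is the likely outcome rather than the exception.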
The concept I was missing was that testing two different random variables at the same time (do meat or vegetables affect telomere length in this group) has the same probability of yielding a false positive for one of the variables as testing the same variable twice (does meat affect telomere length in this group, does meat affect telomere length in that group) has of yielding a false positive for one of the tests.
A quick recap of the basics (multiple testing appears at the X):
Precisely, what a p-value means is: "Assuming the null hypothesis is true, the probability that I would get this test statistic from the null distribution is p," where p is your p-value.
When you call a p-value "significant" that means you're rejecting some null hypothesis. That null hypothesis entails some null distribution, and what you're saying is: "It is unlikely the test statistic I've calculated comes from that null distribution." Somehow 0.05 has become the p-value of significance, which means we say a test rejects the null hypothesis if the test statistic we calculated had less than a 1 in 20 chance of being drawn from the null distribution.
The t-test is the test the linked paper used. Essentially they compare lists of telomere lengths sorted by different habits to see if they're drawn from the same (normal) distribution. For example, the significant comparison was the list of telomere lengths of all subjects who didn't eat any red meat vs. the list of telomere lengths of all subjects who ate "1-2 [red meat] daily". Based on two lists of numbers, you calculate the t-statistic. Under the null hypothesis that the two lists were drawn from the same normal distribution, the probability distribution of the t-statistic is known. If you calculated a t-statistic that falls on the tail of that distribution (meaning, a t-statistic large enough to be very unlikely) then you say: "Aha! The null hypothesis is probably wrong. The two lists come from different distributions. The red meat eaters have significantly longer telomeres."
X Now let's use a really specific example to show why that doesn't scale. Suppose you generate two lists of normally-distributed numbers with the same mean and variance and run a t-test on them. What you're really testing is whether they're drawn from the same distribution which, in this case, we know they are (we drew them from the same distribution ourselves). Now do that 100 times. If you want (and if you've already downloaded scipy) you can follow along and verify what I'm saying:
from numpy.random import randn
from scipy.stats import ttest_ind

# 100 t-tests, each comparing two samples drawn from the same standard normal
N_tests = 100
N_samples = 500
sig_level = 0.05
results = [ttest_ind(randn(N_samples), randn(N_samples)) for _ in range(N_tests)]
# result[1] is the p-value; the null hypothesis is true in every single test
is_significant = [result[1] < sig_level for result in results]
print("The proportion of results that are significant is: ", sum(is_significant) / N_tests)
If you run this a few times, you'll see you get ~0.05 every time. If you increase the number of tests (the number of lists you compare), you'll be even more likely to get close to 0.05.
Now, the natural response to this is: "But that's absurd! You've just compared a bunch of random lists! That has nothing to do with a scientific inquiry looking for meaning and relation within a single dataset!" And that's almost true. But there are absolutely well-designed experiments where there are 100 perfectly reasonable hypotheses to test. What this tells you is that in that perfectly well-designed experiment you will get about 5 significant results (p < 0.05) even if none of the effects is real. Hell, you'll probably even get one "highly significant" result (p < 0.01). By definition.
That means in that case it's meaningless to publish your results at the 0.05 significance level. You correct for that with something like Bonferroni correction (which is probably far too conservative) or Tukey's range test [1], both of which say roughly: Given I tried this many tests, how likely was I to get this test statistic under the null hypothesis?
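Here is a rough sketch of what the Bonferroni correction does to an experiment like the 100 random comparisons described earlier. To keep it dependency-free it does not use scipy: the p-value comes from a normal (z) approximation to the t-test, which is a swapped-in simplification that is reasonable only because each sample has 500 points. The seed and the helper name are arbitrary choices for this example.

```python
import math
import random
import statistics

def approx_p_value(a, b):
    """Two-sided p-value for a difference in means, using a normal (z)
    approximation to the t-test; acceptable here because samples are large."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))

rng = random.Random(0)  # arbitrary fixed seed so reruns match
n_tests, n_samples, sig_level = 100, 500, 0.05

# Every pair is drawn from the same N(0, 1), so each "hit" is a false positive.
p_values = [
    approx_p_value(
        [rng.gauss(0, 1) for _ in range(n_samples)],
        [rng.gauss(0, 1) for _ in range(n_samples)],
    )
    for _ in range(n_tests)
]

raw_hits = sum(p < sig_level for p in p_values)
bonferroni_hits = sum(p < sig_level / n_tests for p in p_values)
print("uncorrected false positives:", raw_hits)
print("after Bonferroni:", bonferroni_hits)
```

The corrected count can never exceed the uncorrected one, since the corrected bar (0.05 / 100) is strictly lower. The price is that real but modest effects also struggle to clear it, which is the sense in which Bonferroni is conservative.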
One closing observation: This holds across experiments and papers. What I mean is, if every year you run 20 t-tests, at the 0.05 significance level you can expect to get 1 spurious significant result every year. I suspect most researchers run more than 20 t-tests per year. This is why a single "significant" result mined from a dataset should never be the basis of an academic paper. That's why experimental design and replicability are so important.
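The "20 t-tests a year" figure can be worked through with the binomial distribution. A minimal sketch, assuming the 20 tests are independent and that every null hypothesis is true (so any significant result is spurious):

```python
from math import comb

n_tests, alpha = 20, 0.05  # figures from the comment above

def prob_false_positives(k):
    """Binomial probability of exactly k false positives among n_tests true nulls."""
    return comb(n_tests, k) * alpha**k * (1 - alpha) ** (n_tests - k)

expected = n_tests * alpha
p_at_least_one = 1 - prob_false_positives(0)

print(f"expected spurious results per year: {expected:.1f}")
print(f"chance of at least one: {p_at_least_one:.2f}")
for k in range(4):
    print(f"P(exactly {k}) = {prob_false_positives(k):.3f}")
```

So "1 per year" is the average; in roughly a third of years you would get none, and in the rest you would get one or more.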
They claim that eating more meat increases longevity, against popular belief. But it is contradicted by what we can see: people who eat less meat, like those in blue zones, in fact live longer.
To be fair, amongst people who study diet, it is also a common belief that a ketogenic diet (which, while it can include lots of plant matter, is often high in meat) is almost as good for longevity as caloric restriction. :)
> Many factors can modify telomere length, among them are: nutrition and smoking habits, physical activities and socioeconomic status measured by education level.
All speculative but... I've assumed since hearing about telomeres that they were a failsafe against cancer; X cell divisions allowed, then apoptosis. If so, maybe shorter telomeres are something you can afford if you have the energy (etc) to engage in a lot of DNA maintenance.
This whole study strikes me as rather weak; I wouldn't assign too much importance to the results.
As mentioned below, the number of participants was rather low (though that is not uncommon in the life sciences and medicine). Also, the statistics appear to be sloppy, or at any rate not sufficiently documented to be verifiable. Thirdly, they tried to investigate too many things at once ("diet, smoking habit, physical activity and education"). And lastly, they base their whole study on the supposed link between telomere length and human longevity - a somewhat obsolete hypothesis.
> Study included 28 subjects (seven male and 21 female, age 18–65 years) completed the questionnaire and gave blood for testing peripheral blood mononuclear cells telomere length.
So, in other words anecdotal evidence, and an interesting sidebar, but really just some notes about one type of tissue sample across 28 people.
If it were any other look at any other tissue sample, what would you say about the sample size and the reliability of voluntary answers from subjects based on memory?
Funny how a shitty study about how butter is good for you funded by the meat and dairy industry is lavished with praise, but any study that says your bad habits are bad for you is met with "oh I am unconvinced. Probably p-hacked. Too small a sample size."
Actually, the thing I thought was funny is that some comments here interpret the study as saying "red meat is bad for you" and others interpret the study as saying "red meat is good for you." I actually couldn't quite figure out wtf it was saying about red meat's impact on you. And given the conflicting interpretations here, I feel like I am not the only one.
IGF-1 is also what the non-meat eating lobby use to connect red meat eating with cancer. IGF causes cells to grow. Anything that causes cell growth, even if it generally manifests as bigger muscles, stronger bones or longevity also helps cancer cells grow and gets tagged as bad by association, even though preventing cancer is more about having a strong immune system, early detection and avoiding mutagenic chemical pollutants.
There is a huge pro-vegetarian bias in nutrition studies. That's because there are a lot of people who want the world to switch to a vegetarian diet because it is more ecologically sustainable or they are intensely devoted to animal welfare and not necessarily because it is the most healthy.
> There is a huge pro-vegetarian bias in nutrition studies. That's because there are a lot of people who want the world to switch to a vegetarian diet because it is more ecologically sustainable or they are intensely devoted to animal welfare and not necessarily because it is the most healthy.
This is the first time I've heard people charge that nutrition studies are pro-vegetarian. Usually I hear people claiming food industry pressure causes the opposite bias.
No. Almost all diets have some positive and some negative effects. One diet is better than another if its overall effects are better.
Does the T/S ratio in peripheral blood mononuclear cells have a positive effect on human health or lifespan? How large is the effect? Is it smaller than the negative effects of eating meat? ... There are lots of questions to answer before we get into a diet.
The study doesn't say whether red meat is "good" or "bad", just that eating red meat correlated (within their study group) with shorter telomere lengths.
Telomeres are caps at the end of the chromosomes that serve as a protection against genetic damage. They decrease in length every time a cell divides until they are eventually used up altogether - at which point a healthy cell will initiate self-destruction. Some years back they were a much-hyped focus of aging research, as there was the hypothesis that if you could keep the telomeres long, your cells (and by extension, you) could live indefinitely.
In that context, eating red meat would appear to be an unhealthy choice.
However, said hypothesis is no longer current. Telomeres probably do play a role in the aging process, but the whole affair is much more complex than just "keep your telomeres long". So basically, never eating red meat again isn't going to make you immortal.
Honest, slightly OT question: what motivates researchers to perform and publish studies with low N? Can it have a positive or negative impact on their career?
> what motivates researchers to perform and publish studies with low N?
Consider the systemic perspective. There are problems where we have no idea what's going on. (The pool of possibilities is so big, relative to the budget, that it's practically infinite.)
In these cases, lots of low-N studies could be better than a few high-N ones. It's a balance between the odds of your study giving a false result against the odds of picking the wrong hypothesis.
That - or the results are based on one of those low quality medical research studies that is performed by medical students in order to get their MD in some countries.
> what motivates researchers to perform and publish studies with low N?
You mean a low number of observations/data points?
It's really simple in the medical world.
Money.
Longitudinal data means following people over years.
Imagine how much money we would have to pay 100 people for, say, just 5 years.
Medical data and experiments aren't cheap.
You can do it with rats, but they don't represent humans. And iirc the FDA found that only 50% of what happens in rats can be replicated in humans when they look at drug results.
They have assays now, but I'm not sure how well they represent real subjects.
The way the FDA does it, at least for NTCR, is that we test for liver toxicology using assay data instead of real human beings, to make things cheaper.
Performing studies with low N makes it more likely for spurious effects to pass the various statistical tests. Therefore, it makes it more likely to obtain a result that can be published (despite being wrong).
Traditionally not really, unfortunately. The problem is that negative results usually don't allow you to tell a story.
The solution to this is a publication process where you register studies before you do them, in exchange for guaranteed publication once you're done with them. Different fields are further along in establishing such processes. (I've never worked in an empirical field myself, just reporting what I hear from a psychologist friend.)