NCI publishes their grant funding outcome for FY2011

May 04 2012 Published by drugmonkey under NIH, NIH funding

This is huge. Previously the only IC that, to my knowledge, made their funding data available was the NIGMS. We grant geeks were big fans, even those of us who don't seek funding from that particular Institute of the NIH.

Well, apparently NCI has joined the party....I do hope this is a sign of things to come at other ICs. Do note that this comes in the wake of some announced policy changes from NCI head Varmus which caused some consternation. Check the comments over at writedit's pad. I concluded that this was just business as usual (i.e., as already practiced by numerous other ICs).

meh. he's actually talking normal stuff here. I read him as saying the payline is 7%ile (he says priority score but I suspect he means percentile) and then, as is totally normal business as usual for many ICs, he's talking about the gray zone wherein they violate the strict order of review for various Programmatic reasons.

Nothing to see here, save 7%ile is the lowest payline I've heard mentioned as such...

Given such practices, however, we are all intensely curious about the mysterious grey zone behavior. I have asserted in the past that I think the NIGMS data very likely stand as proxy for most, if not all, other ICs in the broad strokes. (The reason is that they dovetail nicely with the tiny bits of info that sneak out around the corners for the other ICs, if one is inclined to follow the breadcrumbs.) Importantly, the grey zone pickups are not randomly distributed. They are more likely the closer the score is to the payline. Well, now I have another data point...

First up, the Experienced Investigator graph:

yep. looks very familiar.

ok, how about the New Investigators?

hmm, notice that percentile skew? Now let's see about the ESI folks:

Yep.

Okay, so what? Well, I think this should continue to motivate people to keep the heat on whatever NIH representative happens to be listening: POs, or those poor, poor higher-ups who have to get up in front of a room of agitated PIs and put a happy face on things.

My point is that this skew shows that study sections are not responding to the clear intent and desire of the NIH, i.e., to treat newb investigators more fairly (or "generously", one might argue). And just like with any other initiative of the NIH with respect to review, I assume that they are serious about it. They've shown this by changing ESI paylines, making greyzone pickups more frequent, etc. So why not fix the problem at the point of review?

1) First off, you would think that both Program and the reviewers would see that the panels' refusal to treat ESI apps in the mix with the rest decreases the reviewers' input even further (since it forces Program to do more of the picking via greyzone exceptions). We all have the experience that the tightest discussion and the most agonized decision making as an assigned reviewer comes at the perceived payline. For the obviously top applications, all we're looking to do is make the obvious argument. For the ones that are going to get triaged, or nearly triaged, well, the tendency is to just hit the high points, slap on a few StockCritiques and assign a score. Even if the app is discussed way down in ~30-40%ile land, there isn't going to be so much argument about the exact score range. The other reviewers aren't going to be as engaged in trying to decide which end of the post-discussion score range they should go with, not like they will be with applications that appear to be right around the perceived payline.

2) Next, this is a symptom of a larger problem, i.e., the NIH's difficulty in getting the review panels on board with its broader goals. Take "Innovation". Despite a lot of hoopla in launching a new review approach, the data showed (thanks again, NIGMS) that review outcome was driven mostly by the same old, same old, i.e., Significance and Approach. The same problem applies if the ICs choose to fix this by scrutinizing the grey zone critiques for the ones that seem most "Innovative"....the panels haven't discriminated the pool very well on that factor. More variance, more influence of the PO.

3) I think a lot of reviewers have no concept of these broader statistical trends. They are unaware of the data, blind to study section cultural influences and generally just haven't thought things through very well. It may be that some of these people really believe in a different type of outcome for their study section and their subfield. But they have no notion of where the problem lies, nor that it is fixable.

It is most assuredly a fixable problem. I have two themes that I've pursued on the blog. First, education of study section members with respect to what they are doing. The funding data, such as the NCI and NIGMS charts posted and linked above, are the start of this. I'd like to see these outcomes made available to study section reviewers, right down to the level of their own review panel. Second, the solution of competing biases. Anytime there is human judgement, there is bias. Anytime. The only solution that offers high confidence of having an effect is the competition of biases. This is why the panels are explicitly representative of geography, sex, ethnicity and institution type/size. Where they are not representative is on the Newb/Experienced PI axis. (Also, one might argue that the Innovative!!!1111!!/Conservative PI axis has some skew, but that's a chat for another day.)

@boehninglab was skeptical that there was any point in talking about these issues. I, naturally, am of the opinion that the surest way to prevent things you want to see happen is to remain silent. Sure, there are never any guarantees that your position will change anything at the NIH, but if you don't say something then there is a guarantee you won't be heard.

So comment on the NIH sites, in this case the NCI one. Let them know how you see their behavior and why it is good or bad for the science in your subfield.

[h/t: @salsb]
__
ps. check the NCI page for the R21 data, also interesting.
update: pps, PhysioProf noted that the score distribution for Experienced/New Investigators is much more similar for R21s than for R01s. Interesting to consider why that might be so. I would point to the "starter grant" bias....

23 responses so far

  • Boehninglab says:

    I have been in science just long enough to have witnessed many of the changes which were supposed to improve peer review. The funny thing is, even though most of us know quite well *how* grants are supposed to be reviewed, in many cases we fall back on the old system. One obvious example is scoring grants, where we are supposed to use the full range (1-9), but rarely is a 6+ given. It is a strange grant world we live in right now, and I completely agree that any positive changes should be vocalized and burned into not only CSR and IC heads, but also those of the reviewers.

  • drugmonkey says:

    Using the range is a perfect example. The SROs are *constantly* on about this. There's a simple solution. Tell every reviewer (with 7-10 assignments) that they absolutely must have a score at each end of the range. Train a culture in which it is *expected* that the default will be an even spread within the pile.

    I think I've mentioned that when I was on a panel, this was my starting point: anchor each end and divide the range by the number of assignments (there's a minimal sketch of this at the end of this comment). Only as a secondary approach would I modify from there. There was no intent to match to some idealized bullshit about what score X really "means" in some absolute sense.

    The infamous chart on this scoring guidance document is totally wrongheaded. IMO, of course. I mean, it is nice to have some general reference for newbs. But grants are funded per round in large part and per-FY in absolute part. There is no frigging point to maintaining some sort of all-time, decades-covering standard of "best grant evah" and reserving a perfect score for this gem.

    Yet, I've had chairs insist that they were told explicitly in the Chair training sessions that a perfect score should be issued essentially once in a reviewer's reviewing lifetime. It doesn't add up and makes no sense.
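    For anyone who wants the "anchor each end and divide by the number of assignments" starting point spelled out, here is a minimal sketch in Python. It assumes the reviewer has already ordered the pile best-to-worst; the function name and the rounding to whole-point scores are my own illustration, not anything the SROs prescribe.

    ```python
    # Spread preliminary scores evenly over the 1-9 range for a pile of N
    # assignments, anchoring the best app at 1 and the worst at 9.
    def spread_scores(n_assignments, best=1, worst=9):
        if n_assignments == 1:
            return [best]
        step = (worst - best) / (n_assignments - 1)
        # NIH preliminary scores are whole numbers, so round each one.
        return [round(best + i * step) for i in range(n_assignments)]

    # Example: a typical pile of 8 assignments.
    print(spread_scores(8))   # [1, 2, 3, 4, 6, 7, 8, 9]
    ```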

  • zb says:

    Has there been a historical trend in this data? Are you saying that the pickups of ESI/NI grants at lower percentiles have caused a decrease in ESI/NI scores? (Because reviewers are downgrading those grants, since they know that a 10% or 15% ESI grant might have a higher pickup rate?)

    Or are you more generally commenting that ESI grants are graded with bias (because of, for example, the lack of representation of an anti-bias on the review committee? Though it should be noted that self-bias that reflects general biases in society -- against women/minorities/new investigators -- is not negated by including those folks in the group, especially if there's a tendency to pick out non-representative individuals from the group).

  • zb says:

    Oh, the question about historical trend is because I'm wondering if ESI investigators' scores have decreased as the "pickup" idea became more generally known.

  • Grumble says:

    You can educate your reviewers all you want. They are going to ignore you because they have their own agendas.

    Let's say you have 10 grants to review. You personally know 2 or 3 of the PIs. You are inclined to like one of them (a fun evening drinking beer at a conference a year ago) and really, really like another (a few beer evenings, plus the stuff coming out of her lab complements your own work really well). You know that out of 10 applications, only one or two are going to get funded.

    Wow, you say to yourself, all these applications are really, really good. Not terribly surprising: they all come from really really smart, accomplished people with PhDs. Now, be honest. Which ones are you going to score well?

    Of course, you have to come up with some bullshit justification for dissing the other 8. Easiest is to nail them on some stupid failure of methods/approach, because there is ALWAYS going to be an incomplete methods/approach section in a 12-page grant covering 5 years of funding.

    Yeah, yeah, I've never been on study section and you're going to tell me it doesn't entirely work that way. But even if it works that way some of the time, I count it as a huge failure of the system. And reviewer education is going to do diddly-fucking-squat to change anything.

  • Susan says:

    Amen to your entire post, DM.

    Together with the previous discussion re: preliminary data on R21s (that it is always noted as a positive inclusion, despite ideally not being considered), this does indicate that study sections (still) see R21s as baby R01s for noobs, to a great extent.

  • drugmonkey says:

    They are going to ignore you because they have their own agendas.

    false.

    Now, be honest. Which ones are you going to score well?

    I have given scores I knew were going to make the grant unfundable to people I really like. Also I have assigned easily-fundable scores to people I can't stand (on either personal or professional grounds). And I have given both kinds of scores to people I don't know at all, or don't know on any sort of personal level. I have battled unsuccessfully for apps that I really liked but the other reviewers didn't. I have, once or twice, managed to save a grant. I have battled to torpedo a bad app and lost. I have battled to torpedo a bad app and won.

    Your assertion is perhaps true in some tiny minority of cases, but it is hardly endemic to the system. Perhaps you are expressing your own biases and generalizing? Kinda like data fakers asserting that "everybody is doing it", to nearly universal disbelief?

    They review, what, some 60,000 apps per year? So if your assertion is true in maybe 10 cases, is this really a "huge failure"? And, more to the point, do you have a way to absolutely guarantee this never comes into play? How's that going to work? And please include the things that your schema would introduce into the system that would be bad relative to the current one.

  • drugmonkey says:

    if ESI investigators' scores have decreased as the "pickup" idea became more generally known

    Very early on in the "save the ESIs" thing, Zerhouni said that this was the case.

  • drugmonkey says:

    Or are you more generally commenting that ESI grants are graded with bias

    That is what I am saying, yes: that apps from newbs who do not yet have a grant (or even those who do have one) are scored lower than they would be if they were from a more experienced investigator, and to an unfair degree. That represents a bias which funds less-meritorious projects in preference to more-meritorious ones.

  • zb says:

    DM -- I don't think there's any data on how much Grumble's "favoritism" hypothesis works out. You point out your own anecdote, but who knows how much of a role personal knowledge/friendship/scientific compatibility plays in reviews? Your assertion that a data faker thinks everyone does it is no more valid than a non-data faker thinking that no one does it.

    We have poor data on all these questions. I, personally, am starting to get worried that data faking is actually much more common than my personal anecdotal experience would have suggested.

  • zb says:

    "That apps from newbs who do not yet have a grant (or even those who do have one) are scored lower than they would be if they were from a more experienced investigator. and to an unfair degree that represents a bias which funds less-meritorious project in preference to more-meritorious ones."

    But how would you show this? It seems to me that it's pretty likely that newb grants *would* actually be worse than experienced grants. I wouldn't have guessed offhand that they would be as different as the data here, but seeing it, I don't have good reason to believe that the underlying merit is that far off from what we're seeing. How would we test the idea?

  • Susan says:

    Remember that overall, there are so very many more applications than funding dollars. It's been said over & over that within the top, say, 20% -- pretty much everything is damn good science; and that there's very little discernible difference between a 6th percentile grant and a 12th percentile grant. Almost certainly not twice-as-good a difference. It's not hard to extrapolate that the top 20% of both ESI and "experienced" grants are all dollar-worthy science that we just don't have dollars for.

    Sure, newb grants don't have decades of smooth grantwriting, and decades of preliminary data and published results. But that's exactly what the SS is supposed to NOT count against them. I can see where that's easily what slides newb grants from the first decade of percentiles to the second, and out of funding.

  • whimple says:

    Sure, newb grants don't have decades of smooth grantwriting, and decades of preliminary data and published results. But that's exactly what the SS is supposed to NOT count against them.

    "Investigator" is a key criterion of grant evaluation. To say that an investigator with a track record of grant dollars well-spent is of equivalent value to an unproven newbie is to deny reality. Note that applications are to be evaluated on the merits. If Program then wants to do something else, that's obviously their prerogative.

  • Jeremy Berg says:

    I am delighted to see that NCI released these data. This is a step in the right direction and I hope that other ICs will follow.

    With regard to the established investigator/new investigator differences in the R01 scores, I suspect that this is largely due to better scores for type 2 (competing renewal) applications compared to type 1 (new) applications (see https://loop.nigms.nih.gov/index.php/2010/09/14/scoring-analysis-with-funding-and-investigator-status/ ). Since there are no competing renewal applications for R21s, one would expect the established/new score distributions to be more similar (as they are).

  • physioprof says:

    The infamous chart on this scoring guidance document is totally wrongheaded. IMO, of course. I mean, it is nice to have some general reference for newbs. But grants are funded per round in large part and per-FY in absolute part. There is no frigging point to maintaining some sort of all-time, decades-covering standard of "best grant evah" and reserving a perfect score for this gem.

    Dude, you are confusing the old, informal "infamous chart", with all the "walking on water", "better than sex", "once in your career" shitte, with this official one that you linked to here. Actually, this new official one is very good for calibrating, and it makes it clear that all the way down to an impact score of 5, the grants are good. We were recently exhorted by our SRO to give approximately half of our assigned grants scores of 6-9, with the expectation that these will be triaged. If people stick with this, then the top 50%ile will be spread between 10 and somewhere between 50 and 60.
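    To spell out that arithmetic with a quick sketch (assuming, as I understand it, that the overall impact score is just the mean of the panel's 1-9 votes multiplied by ten and rounded; the votes below are invented for illustration):

    ```python
    # Overall impact score = mean of the panel members' 1-9 votes, times 10,
    # rounded to the nearest whole number (10 = best possible, 90 = worst).
    def impact_score(votes):
        return round(10 * sum(votes) / len(votes))

    print(impact_score([1, 1, 2, 1, 2]))   # 14 -- top of the discussed pile
    print(impact_score([4, 5, 5, 4, 6]))   # 48 -- bottom end of the discussed half
    ```

    If the discussed half of the pile draws votes mostly in the 1-5 range, the resulting impact scores land right about where described above: roughly 10 up to the 50-60 neighborhood.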

  • physioprof says:

    And Jeremy is exactly right about the competing renewals. I had forgotten about this when I first looked at the NCI data.

  • Grumble says:

    Well, DM, I don't think either one of us has anything more than anecdote to offer regarding how much personal bias contributes to grant scores.

    I've heard lots of anecdotes in which the Applicant seems to think that having a Buddy on the study section really helped. And cases where having an Enemy (or even just not having a Buddy) seemed to tank the proposal. Most of this is just guesswork, and you've told me before that you think those guesses are likely wrong. But, in the absence of solid evidence, I use what I have to draw my own conclusions, and your anecdotes of Personal Nobility are no more or less credible than my own sources that suggest that not everyone is quite so Noble, at least not in all situations.

    Look. If everything else is more or less equal among the grants you are reviewing, and the only difference is that this project will yield results that support your Theory of How the Bunny Hops, and all the others either will lead in a different direction or are irrelevant to Bunny Hopping, which project will you push for? It's a reasonable guess that it's probably not a "tiny minority of cases" where this is what you are looking at when you get 10 grants to review. But the review officer says the scores have to be distributed evenly, AND you know that only 1 or 2 are going to get funded. You're telling me you'd just assign scores randomly? I'm glad you're that Noble. 'Cause I'm not, and I really doubt that anyone else is, either.

    (OK, maybe all 10 grants aren't "more or less equal", but what might be the case is there are 5 great grants and 5 not quite so great grants. Makes no difference to my point: you still have to decide which of the top 5 to push for.)

  • physioprof says:

    Oh, and one other thing. I suspect that the reason Innovation correlates less well with scoring than Significance and Approach do is that Innovation is really fucken arbitrary. How do you define "innovative"? I bet there is much more variance between reviewers in Innovation scores than in Significance and Approach scores. With Significance, the question is clear: how will the proposed research influence our understanding of some basic biological process or disease state, or lead to a clinical outcome?

  • Grumble says:

    I agree that "Innovation" is just a stupid meaningless buzzword. All the addition of this criterion does is drive a potentially counterproductive techniques arms race, without necessarily increasing the quality of the proposed science. In my view, it's a good thing that scores aren't being driven by assessment of Innovation.

    Take, for instance, optogenetics. Last year it was innovative. This year it's in everyone's grant, so presumably it no longer garners innovation scores as high as it used to. Yet it's still a young technique with lots of potential (not to mention kinks to be worked out), and grants that propose to use it shouldn't suffer because reviewers think it's no longer innovative.

  • Drugmonkey says:

    My anecdata from a four-year stint on a study section totally trumps your paranoid PI anecdata, Grumble. Seeing how things go down on the inside is superior to self-interested suspicion from the applicants.

  • Drugmonkey says:

    I am not confusing anything, PP. I am relating what a study section chair claimed was a clear directive from Scarpa about perfect scores. I disagree that anything should be calibrated against a lifetime distribution of possible apps assigned to a given reviewer.

  • physioprof says:

    I disagree that anything should be calibrated against a lifetime distribution of possible apps assigned to a given reviewer.

    Dumshitte, I'm with you on that. The scoring rubric you linked to has nothing to do with thatte shitte, so if you think it does, you are fucken confused.

  • Grumble says:

    "My anecdata from a four year stint on a study section totally trumps your paranoid PI anecdata"

    Really? So just by sitting there, you were able to discern whether other reviewers were biased or not? I'd love to know how you did that.
