SUMMARY: With this notice, the Office of Science and Technology Policy (OSTP) within the Executive Office of the President, requests input from the community regarding enhancing public access to archived publications resulting from research funded by Federal science and technology agencies. This RFI will be active from December 10, 2009 to January 7, 2010. Respondents are invited to respond online via the Public Access Policy Forum at http://www.whitehouse.gov/open, or may submit responses via electronic mail. Responses will be re-posted on the online forum. Instructions and a timetable for daily blog topics during this period are described at http://www.whitehouse.gov/open.
DATES: Comments must be received by January 7, 2010.
ADDRESSES: Submit comments by one of the following methods:
Public Access Policy Forum: http://www.whitehouse.gov/open.
Via E-mail: firstname.lastname@example.org.
Mail: Office of Science and Technology Policy, Attn: Open
Quick note: OSTP is requesting input on public access to publications resulting from federally funded research
This is the last of my practice essays before taking the real comps test at the end of July. I need to correct the record, though. Apparently, although all of these questions came from my advisor, he didn't write them all. These were ones proposed by committee members and rejected for inclusion in the exam. (The gaps in numbering you see are two essays that didn't go well.) This particular question might be by my advisor with an OK from the two STS committee members. I didn't have any STS questions to practice with, so he came up with this one, which I think is an excellent question.
Discuss the forces that move scientists towards open sharing of information and the countervailing forces that prevent scientists from sharing information or encourage them to actively guard information. You may want to distinguish between information on research problems and hypotheses, raw data / data sets, information on methods and apparatus, and information on results. Consider the role of technology in your answer.
I know, right?
In the past two decades, much controversy and discussion has centered on public access to scientific information, the cost of scholarly journals, and information sharing within science. There are many strong forces that encourage scientists to share and equally many countervailing forces that discourage scientists from sharing. This essay describes these forces and the role of technology. The essay ends by considering the role of various mandates in supporting information and data sharing in science.
1. Forces That Encourage Scientists to Share
There are many forces acting on scientists to encourage them to share information and data. These include:
- wider recognition
- finding collaborators and making information available within collaborations
- making scientific information available to the public, scientists not in research institutions, and for data mining or serendipitous location
- for generalized reciprocity, in order to get data
- to increase the speed of science or creatively solve problems
Science runs on reputation and recognition; that is, promotion, tenure, winning grants, and attracting graduate students all depend on successful publication of research results in prestigious journals and the citedness of those journal publications. Research has shown that there are many correlates to higher citation rates outside of the quality of the document. These include:
- article is on the cover
- article is discussed in the media
- article is a review article
- article is longer
- article is in a more prestigious publication
- article is open access.
This final correlate is somewhat disputed: some studies show that open access does favorably impact citedness, immediacy, and usage, while others show no statistically significant correlation between open access and citedness over the long term. Even if open access is not significant, we can see that being on the cover and being discussed in the media are both ways that the research is brought to the attention of other scholars. The point is that information sharing with the media increases article citation and recognition of the scientists.
Likewise the sharing of data, workflows, and algorithms in disciplinary repositories can lead to greater recognition of the scientist and his or her lab. Deposits to disciplinary repositories are signed, so high quality results are attributable to their source. The technology of the repository and standards for information structures within repositories make the shared information findable and useful.
In addition to recognition for promotion, tenure, and grants, recognition can also help in finding new collaborators and in sharing information within collaborations. By seeing the contributions of a person to a data, workflow, or e-print repository, a scientist looking for collaborators can judge the relevance of that person's experience and can also assess his or her expertise in an area.
Once scientists are in a collaboration, open and free information sharing is necessary for trust and to make the project work. This seems obvious but it must still be stated as the lack of information sharing within collaborations is frequently listed as a reason collaborations fail.
1.3 Making information available outside of the invisible college
Despite the frequent mentions in the literature that scientists do not want to consider the societal impact of their research (Polanyi, Merton) and do not want to communicate with the public (Weigold), recent surveys indicate that 75% of scientists do communicate with the media about their research and most scientists want their research to be useful and used. Forces moving scientists toward open information sharing include making information available and useful:
- to scientists who cannot afford toll access to the literature
- to scientists outside of the particular research area
- for data mining
- to the public.
1.3.1 Scientists without toll access
Scientists who are not in large research institutions do not have the same access to the literature because many of the abstracting and indexing services and journals are extremely expensive. Scientists publish in open access journals, post e-prints to their web pages or repositories, or respond favorably to reprint requests to make their research available to these scientists. There is some altruism involved, but the point of publication is to make the results of research available, so sharing of publications does this.
1.3.2 Serendipitous Finds
Scientists who share information in places indexed by major search engines enable serendipitous discovery by researchers outside of the invisible college. Scientists within the research area likely know what labs are doing which work and have access to new research results. Scientists outside of the research area might happen upon this work when looking for something else.
1.3.3 Data mining
In many if not most or all areas of science, computational methods that leverage large collections of data are being used to make new knowledge. Scientists are encouraged to share information and data without restrictive licenses to enable these new uses.
1.3.4. The Public
Open sharing of information with the public can have a positive impact on the government funding of research as well as showing return on past investments in science. Besides getting government funding, scientists can be altruistic, too. While rare, there are often-touted examples of parents researching the biomedical literature to assist in the diagnosis and treatment of their sick children.
Scientists might share data for specific or generalized reciprocity. In other words, scientists might share data in order to get data from another scientist in particular or in hopes of getting data in the future from some other scientist.
1.5 Speeding Up Science
Scientists might want to share information openly to get feedback and to solve problems and to speed up the cycle of science. Posting of data or publications on a web page prior to official publication makes that information usable sooner and to a larger group. Many scientific instruments output electronic information. This information can be shared in real time via the web to allow multiple simultaneous diverse uses.
2. Forces That Prevent or Discourage Scientists from Sharing
There are many forces acting on scientists to discourage them from sharing information and data. These include:
- fear of being scooped or ideas being stolen
- Ingelfinger-type rules preventing information sharing prior to publication
- intellectual property concerns of the organization
- sensitivities of information regarding human subjects or national security
- concern over misuse of information by anti-science groups
- effort required to describe or format information for deposit or reuse
2.1 Being Scooped
Scientists sometimes do not want to share data until they have "wrung" all of the possible publishable science out of it. The concerns are that another scientist will publish the same information more quickly without the expense of gathering the data or that another scientist will find different information in the data that the original scientist missed (Birnholtz).
Indeed, a cited form of misbehavior in peer review is that the reviewer who is a competitor might use information in the submitted article or might hold up publication of an article until his or her own article is published first.
Some conferences and small workshops do not consider information shared to be "published" and there are guidelines on how this information can be used. Nevertheless, attendees might act on the information and might publish first.
2.2 Ingelfinger Rules
The Ingelfinger rule from the New England Journal of Medicine states that the journal will not publish any information previously presented in any venue or discussed with the media. Similarly, many journals have an embargo on discussing findings with the media until the date of publication of the journal or the posting of the article on the journal's web page in "early view." Scientists might not share information if they fear that by sharing they will not be able to publish in a prestigious journal. Some of these rules were strengthened after the cold fusion episode, in which the scientists held press conferences before peer review of their work. Subsequent peer review and evaluation by other scientists found that their results were not reproducible.
2.3 Intellectual Property Concerns
Scientists might be prevented from sharing data or publishing if their organization intends to patent their discovery. Discussing a discovery or publishing the results starts a clock for patent application or can prevent a patent from being filed.
Scientists who work with human subjects or with national security information might be discouraged from sharing due to sensitivities about protecting the privacy of the subjects or concerns over export control or classified information. There are ways to anonymize human subjects data but this still presents a barrier. Likewise, scientific facts should not be classified, but the sensitivities of the research funder trump the forces encouraging the scientists to share information.
2.5 Concern Over Misuse
Open sharing of research using animal experimentation or stem cell research has endangered the physical security of the researchers. By publishing in obscure disciplinary journals, the information is available to other scientists but less likely to attract attention from anti-science groups who have reacted violently in the past. Short of these violent reactions, scientists might be concerned that their research will not be understood.
2.6 Effort Required
Finally, a force acting against the sharing of data is the effort required to describe and make data accessible for wider use. In some fields it is quite easy and straightforward to share data in pre-existing, established, and well-supported repositories. In other fields there might not be any repositories, or the repositories that exist might be fragmented, with uneven funding and support (Borgman). It is often easier to save the data on a CD-ROM in a box under your desk than to properly document it and find a place to store it online.
Despite the competing forces that encourage and discourage scientists from sharing information, there are mandates to share information coming from several sources. First, funding bodies may require submission of a journal publication to an open repository as a condition of accepting the grant and funders of big science projects require them to make resulting data freely available online. Second, research institutions may mandate submission of publications to the institutional repository for all of their scholars. Third and most successfully so far, groups of journals in a research area might require that the supporting data be submitted to a repository at the time of publication.
There are many competing forces moving scientists to share and not to share data, workflows, and publications. The salience of each of these depends on a number of factors not discussed explicitly above but including:
- the norms and the culture of the research area (sub-discipline)
- the existence of standards and established infrastructure to support sharing
- the funding source for the research
- the scientist's employer
- the scientist's place in his or her field (in other words, an established scientist might be less concerned with being scooped)
Information scientists can support information sharing by removing barriers related to finding a repository and making the deposit of data or publications. Likewise, we can address ways to secure information such that only those who should have access do. We can also help scientists discuss information sharing with publishers and other scientists to make these concerns explicit and to remove unnecessary barriers.
Technology has facilitated information sharing and discovery, but it alone does not address the cultural and social barriers to information sharing. Ultimately, understanding the social aspects of science along with the technological requirements for information sharing is needed to encourage scientists to share.
Michael J. Kurtz of the Harvard-Smithsonian Center for Astrophysics came to speak at MPOW at a gathering of librarians from across the larger institution (MPOW is a research lab affiliated with a large private institution). He's an astronomer but more recently he's been publishing in bibliometrics quite a bit using data from the ADS. You can review his publications using this search.
As an aside, folks outside of astro and planetary sciences might not be familiar with ADS, but it's an excellent and incredibly powerful research database. Sometimes librarians turn their nose up at it because it's all about being functional and not at all about being pretty, but it essentially rocks (I'm definitely going to have to do a post on freely available research databases besides PubMed).
Kurtz' talk was basically at the speed of light and broken down into two parts: bibliometrics using usage data as compared to citations in astro with ADS data, and then more on scholarly communication.
I only have hand-written notes so let me just try to capture some of his points in bullets:
- like Amazon, successful recommender systems use usage data
- not new, Derek J. de Solla Price graphed the obsolescence curve for articles (not cited, then get the most citations, then trails off, eventually flat with few citations after some period of time that depends on the subject)
- in an article he mapped the usage vs. age. He showed us graphs of 110 years, a few years (?), and then 90 days. (maybe doi: 10.1002/asi.20096 ). This can be modeled using exponentials with 4 different time scales.
- he showed the different usage-age graphs for traffic coming directly to ADS (presumably mostly professional scientists - this looked just like Price's model and the citation model), for people coming from Google Scholar (whom they take to be students), and for people coming from Google (flat across all years, taken to be random members of the public).
- astronomers read and cite the same things so you can use usage instead of citations to look at individuals, institutions, countries
- the MESUR project - gathering usage data from a pile of places. Problem is the quality of the data available - doesn't follow a user through what looked at, what linked to.
- ADS has a popular items algorithm: put in a search - it matches, people who have read also read, ranks those by # of usages
- should use citedness for tenure decisions - very unstable at about 7-10 years whereas usage data is pretty stable
- usage is better at measuring journals than citedness. example: medicine - clinicians read a lot more articles but don't write so much (if at all).
- page rank gets it right, IF gets it wrong (I think this was mapping various things like usage, citation, impact factor... on some big graph...)
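To make the "people who have read also read" bullet above a bit more concrete, here is a minimal sketch of that kind of ranking. This is my own illustration, not the actual ADS algorithm: the function name `also_read`, the paper and reader ids, and the shape of the reading log are all assumptions made up for the example.

```python
from collections import Counter


def also_read(matches, reading_log, top_n=5):
    """Rank papers co-read by readers of the search matches.

    matches: set of paper ids returned by a search (hypothetical ids)
    reading_log: dict mapping a reader id to the set of paper ids read
                 (a made-up structure, not real ADS usage data)
    """
    matches = set(matches)
    counts = Counter()
    for papers in reading_log.values():
        # Only readers who read at least one matching paper contribute.
        if papers & matches:
            # Count their other reads, excluding the matches themselves.
            counts.update(papers - matches)
    # Most-read first; ties broken by paper id so the order is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Toy reading log, invented for illustration only.
reading_log = {
    "reader1": {"A", "B", "C"},
    "reader2": {"A", "C"},
    "reader3": {"B", "D"},
    "reader4": {"E"},
}

print(also_read({"A"}, reading_log))
```

In this toy log, two of the readers of paper A also read C and one also read B, so C ranks ahead of B. A real system would presumably also weight by recency or normalize for very heavy readers, but the talk did not go into that.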
So that's the notes I have from the first section - here's the second.
- ADS has semantic links between scholarly papers, the observations they are based on, and other sources of data for that astronomical object (this is actually wicked cool)
- ADS also links to arXiv and has OpenURL linking so you can find a copy your institution subscribes to (I had them list our parent institution, but you have to set up your own preferences to turn it on; they don't register IPs with institutions)
- it's a hodgepodge now, but they're working on a virtual observatory that will make this more seamless
- elsewhere - he mentioned provenance (briefly - I saw more at the IEEE eScience conference) and the value of sharing workflows (like myExperiment) - and VisTrails
- he ended with an exhortation to support Open Access (this crowd already does - well at least the STEM folks)
I needed about 5 more minutes with each slide, but it was still a great talk. I'll have to go back and read/re-read his articles after this comps thing is over. BTW, if you're reading, Michael, I'm waving and thanks for coming!
Position Statement From University Press Directors on Free Access to Scholarly Journal Articles:
1. The undersigned university press directors support the dissemination of scholarly research as broadly as possible.
2. We support the free access to scientific, technical, and medical journal articles no later than 12 months after publication. We understand that the length of time before free release of journal articles will by necessity vary for other disciplines.
3. We support the principle that scholarly research fully funded by governmental entities is a public good and should be treated as such. We support legislation that strengthens this principle and oppose legislation designed to weaken it.
4. We support the archiving and free release of the final, published version of scholarly journal articles to ensure accuracy and citation reliability.
5. We will work directly with academic libraries, governmental entities, scholarly societies, and faculty to determine appropriate strategies concerning dissemination options, including institutional repositories and national scholarly archives.
University Press of Florida
University of Akron Press
University Press of New England
Athabasca University Press
Wayne State University Press
University of Calgary Press
The University of Michigan Press
Ann Arbor, MI
The Rockefeller University Press
New York, NY
Penn State University
University Park, PA
University of Massachusetts Press