Online Chemistry Nexus Proposal

From WikiChem
Revision as of 14:10, 29 July 2009 by Walkerma (talk | contribs) (An Open Online Nexus for Chemistry: further fmt - change bullets to asterisks for wikiformatting)

This is a rough draft of a proposal being prepared by User:Walkerma and others, to apply for funding through the NSF STCI program in August 2009.

An Open Online Nexus for Chemistry

The World Wide Web has transformed the way chemists work, and yet we are still a long way from realizing the full potential of this technology for an open network of chemical information. What is needed is some organization of resources, so as to create some major hubs for the new information landscape. This proposal outlines how one such "chemistry nexus" might be created, and what services it might provide for the community.

1. Introduction
a. Chemical information needs

Chemists will typically search for a variety of chemical information during a normal workday. The type of information depends greatly on the specialty of the chemist, but some general areas of information include:

  • Chemical literature – searches for relevant papers, reviews of specific topics, current awareness, patent searching, "grey information," as well as the actual primary source material.
  • Common properties of chemical compounds – structure, molecular weight, synonyms, melting point, solubility.
  • Chemical reaction information – different synthetic pathways, reagents & catalysts, reaction conditions.
  • Personal networking – job searches, other chemists working in your subject area or locality, conferences, grant advice.
  • Resources – grants available, graduate programs, sabbatical opportunities, government support, legal & business advice.
  • News and general chemical knowledge – chemical industry developments, "hot" subject areas, broad changes in law or government.
b. Current networks

To develop a successful network hub, we must consider what works well for chemists at present. We should not create a site and then try to persuade chemists to come; rather, we should examine the current needs (and frustrations!) of chemists, then aim to meet those needs.

  • Professional societies – organizations such as ACS and RSC already provide a superb array of information and resources to meet the needs of chemists, both personal and professional. These societies traditionally form the core of networks for chemists, though obviously many resources are closed to non-members. Ideally, any new information resource should be developed in collaboration with these organizations.
  • Successful free information "hubs" on the Web – besides Google, chemists frequently search for chemical information on websites such as Wikipedia, ChemSpider and government sites.
  • Many successful information hubs in chemistry require a fee, but they are available to some members of the chemistry community. The most powerful is Chemical Abstracts Service, which provides a remarkable array of information, particularly for searching the chemical literature. Other important resources include the Science Citation Index and Beilstein/Gmelin.
  • Chemists often use information hubs that have a broader scope than just chemistry, for example the Derwent World Patents Index, Lexis-Nexis and the sites of for-profit publishers such as Elsevier (Science Direct) and Wiley.
c. Why we need this Web nexus

With such great resources, why do chemists need yet another website?

  • Unfortunately, many of the existing information networks available to chemists are closed, and many involve a fee. Such networks do not work well on the modern Internet, and they simply cannot function as Web 2.0 sites. If chemistry is to capitalize on the full power of the Internet, we need new sites that work in ways that are different from the traditional providers of chemical information.
  • Younger chemists naturally turn to the Web for information, and they expect to find it there for free. A fellow scientist recently shared his frustration that his graduate students rarely think to go beyond a Google search and use the powerful fee-based resources that are available to them at no charge through the university. Rather than making a (fruitless) effort to "re-educate" every new student, we should adapt the resources to ensure that young researchers find the information they need.
  • Many existing open sites meet specific information needs for chemists, but there is no single site that brings together all of those needs under one "roof." ChemSpider provides property information on chemical compounds; Webreactions supplies chemical reaction listings. The Organic Chemistry portal describes the literature in that field, while webelements.com provides useful descriptions of the chemical elements and their basic compounds. Yet there is no site that links all of these sites together, to allow chemists to find all their answers from one chemistry portal.
  • Wikipedia is defined as a general encyclopedia, and it specifically excludes original research, specialist technical documents, opinion pieces, educational materials, etc. As such, it can never serve the broader needs of chemists, though it provides an excellent model.
  • We need to develop an online community of chemists. This is a means to an end (producing a large amount of chemistry content), not the purpose of the proposed site itself. If we can bring together even a few dozen chemists, and interest them in contributing their knowledge, we can create a paradigm-changing resource. Perhaps the most active online community at present is based at the Chemistry & Chemicals WikiProjects on Wikipedia, which number around 20-50 active contributors, but this group is limited to writing encyclopedic articles. ChemSpider has a smaller group of chemists curating chemical compound/structure information. There is, however, no online community of chemists generating broad content to meet the wider needs of the community.
d. What works well on the Web

The history of the Internet is littered with websites that have failed. We must learn what works well, and try to avoid the pitfalls. To be successful, we must:

  • Speak the language of chemists. It is vital to have any new information nexus organized by chemists for chemists.
  • Know the needs of chemists. The site should provide real and relevant content, not simply trivia. It should not be centered around an exciting program, algorithm or piece of technology - that may be "cool," but is it something chemists will really use a lot?
  • Use technical expertise. A website that is slow or unreliable will never flourish, even if it is tailored for chemists. Experts can also take full advantage of more advanced technology to provide additional features such as Jmol for structure displays. However, the technological wizardry should mainly lie in the background, and it should never be allowed to dominate the site.
  • Have a functional layout. There is the basic requirement that users be able to find what they want with as few clicks/scrolls as possible, without being overloaded with words and boxes. Successful sites such as Amazon.com, eBay.com and Etsy.com thrive because each site is designed around what the user wants and needs.
  • Have an attractive design. Aesthetics matter! We may consider ourselves scientists who don't worry about "trivia" such as eye-appeal, but in fact this can make the difference between a successful site and a failure.
  • Define a clear purpose. If the site tries to do too many things, or the purpose and scope of the site is unclear, it will fail.
e. What works well with a wiki

With the success of Wikipedia, wikis have become popular in recent years, yet many wikis fail to achieve even a basic level of use. Since we plan to use a wiki for the main infrastructure, it is critical to organize it so that it flourishes. It needs to have:

  • Clear scope. Although this is important for a traditional website, it is absolutely critical for a wiki, since people will only contribute their time if they see a very clear purpose for all their hard work. Wikipedia is an encyclopedia, not a sales brochure, a "how-to" site or a site for opinion pieces. Wikitravel provides travel information; it does not try to compete with Wikipedia.
  • A reason to contribute. Many wikis fail not because the purpose is poor, but because people don't care enough about helping that purpose. Wikipedia flourishes precisely because contributors have a passion for sharing their knowledge with the world, and many will work late into the night to serve that "higher purpose." Many chemists have a similar love of chemistry, but they will only share their valuable time if they can see that the site really captures that passion and adds value to their work.
  • A community of users, and a community of contributors. The user community in our case is clear. But for the wiki to succeed, we also need to identify people who will contribute content to get the site off the ground. Without contributors, a wiki will completely fail.
  • Critical mass. As well as needing contributors, a lot of work must be done to set up and publicize the site. Much of the early content may need to be written from scratch by paid employees, in order to build a body of content that makes the site viable.
  • Scalability. Any wiki that plans to become an information nexus must be able to handle growth and traffic. Beyond the obvious technology needs, there must be a definite infrastructure to organize the contributors and direct the growth effectively. There needs to be a group of paid staff to provide commitment and continuity during times of growth - voluntary contributors come and go.
  • A clear set of rules. Wikipedia would have failed but for certain rules such as "neutral point of view" that have served as community norms, and helped restrict the scope of the site. Any social networking site has to deal with people as well as content, and people often have strong opinions and feelings. A certain amount of disagreement is inevitable and even good, but clear policies can help to reduce unnecessary friction. Rules should define what type of content is inappropriate or outside the scope of the site.
  • A style guide. How should chemical structures be drawn? What chemical names are appropriate? Should the site use American English, British English, or both? How should pictures be formatted?
  • Open access, free content. In order to allow mashups with other sites, the site must be completely open. If a wiki is not completely open to the world, it will never become significant in size, and it will not be "noticed." If a wiki charges for any significant part of its content, then nearly all volunteer contributors will be completely alienated and the site will fail. All content needs to be clearly labeled with an open copyright; we plan to use the very successful Creative Commons license for this purpose.
2. Goals of the website

We would like to see the website serve two main purposes:

a. A repository of user-generated content

The site should allow for chemists to place information and data easily on the web, in a place where others can find it through a simple search. We envisage scientists providing a wide variety of information including experimental methods and results, topical reviews, news stories, physical data; more details are given in section 3.

Most chemistry resources currently have a "top-down" model, where the site administrators define what information is needed and provide blank spaces for it. This site would instead follow the Wikipedia model, where the contributors themselves decide what is presented and how; this should foster a culture where most contributors feel that their work is valued. It may mean that the site develops in unexpected ways, but that should be seen as an asset, not a handicap. A flourishing community of chemists of this sort will be necessary if the site is to succeed.

b. A free chemistry portal

We aim to provide chemists with a simple portal through which they can find chemical information on other sites. Core content available under open licenses (for example Wikipedia articles, and perhaps some ChemSpider content) might be provided on the site itself, so that it could be integrated into the site and formatted to meet our needs. Ideally we would use on-site specific "unvandalized versions" that are periodically updated, while all editing would be redirected to the host site, in order to reduce unnecessary work and "forks". Information outside the core would ideally be provided through mashups. By having one portal that allows a standard form for structure searching (and other semantic searches) of hundreds of chemistry sites, users will not need to download dozens of different Java scripts and learn the foibles of each site separately. If our portal should rise high in Google rank, as we hope, it will allow users to uncover information from small sites that may be otherwise hard to find via Google, and this will in turn also benefit lesser-known sites.

3. Structure of the site
a. General layout

In order to meet the dual goals of being both an information repository and a portal, the site should allow users to achieve either of these within a minimum of mouse-clicks, without overwhelming users with scores of buttons and search boxes. Functionality and mashups should be organized in a straightforward manner, yet they should work seamlessly around the user's needs. This means that the main features of the site will need to be laid out carefully in advance, rather than simply being allowed to evolve in a random way. The organic nature of the site means that (in time) new features may just evolve; if these become a core part of the website, the main layout may need to be changed in order to remain efficient.

We will assume that nearly all users will simply want to search the site for information. Searching therefore needs to be the most straightforward task, but not to the total exclusion of others, since we want to encourage users to contribute content. With the MediaWiki software, all pages have a text search box; the main page would also have a large area devoted to a structure drawing interface for inputting chemical structures.

Every page will have a simple, "human-friendly" URL (e.g., the page on methanol would be called "Methanol") that can serve as a permanent link. The MediaWiki software works this way by default, and even older versions of pages can be viewed and linked to.

Data will need to be organized in two ways: they must be easy for human users to read, but they should also be machine-readable in order to produce virtual databases of chemical information. Fortunately, the MediaWiki software allows this through the use of transclusion and bots, and many of the technical aspects are well understood by the chemistry community on Wikipedia.
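
As a rough illustration of the machine-readable side, the sketch below pulls property values out of a template call embedded in a page's wikitext, which is roughly what a bot would do when building a virtual database. The template name CompoundData and its parameter names are hypothetical, and the parser assumes a flat template with no nested templates or piped links; it is not a general wikitext parser.

```python
import re


def parse_flat_template(wikitext, name):
    """Return the parameters of the first {{name | key = value | ...}} call.

    Assumes a flat template: no nested templates and no piped links,
    both of which would need a real wikitext parser.
    """
    pattern = r"\{\{\s*" + re.escape(name) + r"\s*\|(.*?)\}\}"
    match = re.search(pattern, wikitext, re.DOTALL)
    if match is None:
        return {}
    fields = {}
    for part in match.group(1).split("|"):
        key, sep, value = part.partition("=")
        if sep:  # skip fragments that are not "key = value"
            fields[key.strip()] = value.strip()
    return fields


# Hypothetical page source: the template carries the data, the prose follows.
PAGE_SOURCE = """{{CompoundData
| Name = Methanol
| Formula = CH4O
| MolarMass = 32.04
}}
Methanol is the simplest alcohol.
"""

record = parse_flat_template(PAGE_SOURCE, "CompoundData")
print(record)
```

The same template call renders as a readable data box for human visitors, so one page source serves both audiences.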

b. Components of the site
  • Chemical compound pages – listing physical properties, links to other sites, and if possible, prose content.
  • Chemical reactions, reagents – these might link to relevant literature references
  • Experimental data – physical properties, reaction results, etc. – these pages could incorporate data from open notebook science groups.
  • Literature reviews, summaries
  • Experimental procedures
  • Educational materials – lesson plans, study problems, teaching materials, at every level from grade school to graduate school.
  • News – the latest information on important breakthroughs, industry takeovers, government regulations, as well as site news.
  • Chemistry connections – links to professional societies, blogs, scientific publishers, information on grants, conferences, etc.
  • Blog – commentary on new chemistry or news
  • Wikichem community – interest groups, technical help, rules & guidelines, etc.
4. Searching the site

We expect that most users entering via the main page will use the search feature by default. The search options will be as follows:

a. Text searches

This will be most suitable for topic searches, e.g. "nanorods" or "DSC", but it might also be used to search on simple chemical names. The standard option would be a general site search using the default MediaWiki search engine. This has two main buttons: the "Search" button gives the user a Google-type ranked list, while the "Go" button takes the user to a page called "Nanorods" or "DSC" if one exists (if not, it defaults to the Search mode). In addition, the site structure would allow users to search selectively for certain namespaces such as research/literature content, educational content, experimental procedures, data or news.

For example, a general search on nanorods might find an encyclopedic article on nanorods, a literature review of recent developments in nanoscience which mentions them, an experimental procedure for preparing nanorods, and an archived news story about a breakthrough in the technology.
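
The Go/Search behavior and the namespace filter described above can be sketched in a few lines. This is only a toy model under stated assumptions: the namespace names, the sample pages and the scoring rule (title hits count double) are all invented for illustration, and the real search engine would do its own ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    namespace: str  # hypothetical namespace names, for illustration only
    title: str
    text: str


PAGES = [
    Page("Main", "Nanorods", "Nanorods are rod-shaped nanoscale objects."),
    Page("Review", "Recent developments in nanoscience", "Covers nanotubes and nanorods."),
    Page("Procedure", "Preparation of gold nanorods", "Seed-mediated growth of nanorods."),
    Page("News", "Nanorod breakthrough", "A new route to nanorods was reported."),
    Page("Main", "Methanol", "Methanol is the simplest alcohol."),
]


def search(query, namespaces=None):
    """Ranked list of pages mentioning the query, optionally limited by namespace."""
    q = query.strip().lower()
    hits = []
    for page in PAGES:
        if namespaces and page.namespace not in namespaces:
            continue
        # Toy ranking: a hit in the title counts twice as much as one in the text.
        score = 2 * page.title.lower().count(q) + page.text.lower().count(q)
        if score:
            hits.append((score, page))
    hits.sort(key=lambda hit: (-hit[0], hit[1].title))
    return [page for _, page in hits]


def go(query, namespaces=None):
    """Jump to the page with exactly this title, else fall back to search()."""
    q = query.strip().lower()
    for page in PAGES:
        if page.title.lower() == q and (not namespaces or page.namespace in namespaces):
            return [page]
    return search(query, namespaces)


print([p.title for p in go("nanorods")])
print([p.title for p in search("nanorods")])
```

With this sample data, "Go" on "nanorods" lands directly on the encyclopedic page, while "Search" returns all four kinds of content mentioned in the example.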

b. Structure searches

Structure searches require the user to input one or more chemical structures. As with text searches, the results could be limited by namespace - substance, reactions, educational, experimental procedures, etc. These searches could be done in two ways: through a structure drawing, or through a machine representation of that structure such as an InChI or SMILES.

The latter would be the easier to provide. A simple text input would allow the user to search for InChI, SMILES or CAS number, and the search engine would be designed to recognize which type was being input. Ideally, the same box could also accept chemical or trade names, as is done on the ChemSpider search engine.
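
One way the single input box might recognize what was typed is sketched below. The function names are hypothetical and the approach is only a heuristic: an InChI is detected by its "InChI=" prefix, a CAS number by its digit pattern and check digit, and a SMILES string by a simplified token pattern. A production search engine would hand the SMILES candidate to a cheminformatics toolkit for real parsing, and short names made only of element-like letters will be misread as SMILES by this sketch.

```python
import re

# CAS Registry Number: 2-7 digits, 2 digits, 1 check digit.
_CAS = re.compile(r"(\d{2,7})-(\d{2})-(\d)")

# Simplified SMILES tokens: bracket atoms, organic-subset atoms, aromatic
# atoms, bond symbols, branches and ring-closure digits. This is a heuristic,
# not a full SMILES grammar.
_SMILES = re.compile(
    r"(?:\[[^\[\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|[-=#$:/\\.()]|%\d{2}|\d)+"
)


def cas_checksum_ok(query):
    """True if the query looks like a CAS number and its check digit is valid."""
    match = _CAS.fullmatch(query)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... from right to left and take the sum mod 10.
    total = sum(i * int(d) for i, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def classify_query(query):
    """Guess the identifier type: 'inchi', 'cas', 'smiles' or 'name'."""
    q = query.strip()
    if q.startswith("InChI="):
        return "inchi"
    if cas_checksum_ok(q):
        return "cas"
    if any(ch.isalpha() for ch in q) and _SMILES.fullmatch(q):
        return "smiles"
    return "name"  # fall back to a chemical or trade name lookup


for example in ["67-56-1", "InChI=1S/CH4O/c1-2/h2H,1H3", "CC(=O)O", "Methanol"]:
    print(example, "->", classify_query(example))
```

Checking the CAS pattern before the SMILES pattern matters, because digits and hyphens are also legal SMILES characters.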

The former input method would require a structure drawing interface, which in turn would require a JavaScript component such as the one available from Symyx. Ideally we would like users to be able to copy and paste from common drawing packages such as ChemDraw. We would also like to allow substructure searches, but that would probably not be possible in the early days of the site.

Users would select from various go/search options, where "Go" would go directly to a single page on that topic, whereas "Search" would list pages meeting the search criteria.

  • Chemical compound go/search – this would take the user to pages that contain information about the specific compound. A person wanting to know a melting point or a literature reference to the compound might choose this option.
  • Chemical reaction search – the user would indicate that a particular structure was a reactant, a catalyst, a solvent or a product. The search results would screen out everything but chemical reactions. This search feature would allow one or multiple structures to be input. In this way, the user could have a very broad search ("Give me all reactions using this compound as a reactant") or a narrow search ("Give me a method for converting this compound into that compound"). Initially, we might only be able to offer simple searches with exact matches (perhaps via InChIs), but in time more sophisticated searches might be offered that require atom-atom mapping.
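
The simple exact-match reaction search described above might look something like the following sketch, in which each reaction record stores a set of InChI strings per role and a query is satisfied only if every requested structure appears in the requested role. The record layout, the function name and the three sample reactions are assumptions made for illustration; no atom-atom mapping or substructure matching is attempted.

```python
from dataclasses import dataclass

ROLES = ("reactant", "catalyst", "solvent", "product")

# Standard InChI strings for the example compounds.
METHANOL = "InChI=1S/CH4O/c1-2/h2H,1H3"
FORMALDEHYDE = "InChI=1S/CH2O/c1-2/h1H2"
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
ACETIC_ACID = "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)"
ETHYL_ACETATE = "InChI=1S/C4H8O2/c1-3-6-4(2)5/h3H2,1-2H3"
WATER = "InChI=1S/H2O/h1H2"
SULFURIC_ACID = "InChI=1S/H2O4S/c1-5(2,3)4/h(H2,1,2,3,4)"


@dataclass
class Reaction:
    title: str
    roles: dict  # role name -> set of InChI strings


# Illustrative sample data only.
REACTIONS = [
    Reaction("Oxidation of methanol to formaldehyde",
             {"reactant": {METHANOL}, "product": {FORMALDEHYDE}}),
    Reaction("Oxidation of ethanol to acetic acid",
             {"reactant": {ETHANOL}, "product": {ACETIC_ACID}}),
    Reaction("Fischer esterification of acetic acid with ethanol",
             {"reactant": {ETHANOL, ACETIC_ACID},
              "catalyst": {SULFURIC_ACID},
              "product": {ETHYL_ACETATE, WATER}}),
]


def find_reactions(reactions, **wanted):
    """Return reactions in which every wanted InChI appears in the given role.

    Example: find_reactions(REACTIONS, reactant=[ETHANOL], product=[ACETIC_ACID])
    """
    for role in wanted:
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
    return [
        reaction for reaction in reactions
        if all(set(ids) <= reaction.roles.get(role, set())
               for role, ids in wanted.items())
    ]


broad = find_reactions(REACTIONS, reactant=[ETHANOL])
narrow = find_reactions(REACTIONS, reactant=[ETHANOL], product=[ACETIC_ACID])
print([r.title for r in broad])
print([r.title for r in narrow])
```

Supplying one structure gives the broad search, and adding a second structure in another role narrows it, which mirrors the two use cases quoted in the bullet.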
5. Resources needed

A site of this complexity cannot be built as a hobby project – it will require paid employees. We envisage the following full-time employees being needed on a permanent basis:

  • Overall site administrator – this person would be the figurehead of the organization. As well as guiding the overall direction and development of the site, she/he would work to build connections and promote the site within the chemical community.
  • Technical developer – this person would work to add valuable features to the site, organize the servers and handle bugs in the software.
  • Content administrator – this person would initially write some core content, and locate external resources that could be added to the site. He/she would also coordinate the user community and ensure proper copyright compliance.
  • Support during the development phase would also be provided through summer work by students (including undergraduates) and faculty.