Chemistry & the Internet 1999

CHEMINT99: CHEMISTRY AND THE INTERNET, SEPTEMBER 1999, WASHINGTON, DC

A conference report by Dr. Wendy A. Warr, Wendy Warr & Associates

Title
E-commerce - why bother?
Database Protection at the Crossroads: Recent Developments and the Long Term Perspective
Panel Discussion on E-Commerce
Collaborative Authoring with the Environmental Molecular Sciences (EMSL) Remote Document System
Ten Big Questions about Electronic Journals: Muddling our Way through Controversies and Uncertainties
Internet Journal of Chemistry - A Status Report
Panel Discussion on E-publishing
Teaching Chemistry in the Electronic Age
Gateways to Chemical Information - the MetaChem and Janus Projects Down-Under
Incorporating Multimedia into Chemistry Courseware. Examples from Oxford University
Interactive Web Page Development with Chime and JAVA
Learning Polymer Science Over the Internet: The Polymer Science Learning Center
SPARK - A Tool for Discovering Structure-Property-Activity Relationship Knowledge
Development and Growth of the Rohm and Haas Intranet - Chemists and Web Culture
Concept and Realization of Bayer’s Integrated Chemistry Information System on the Corporate Intranet
Panel Discussion on Corporate Internet/Intranets
The IBM Intellectual Property Network: An Internet Resource to Search, View, Retrieve Copies of Patents (and Patent Applications) from Around the World
The NIST Chemistry WebBook
ChemSymphony Beans for Chemistry Clients and Database Access
Enhanced CACTVS Browser of the Open NCI Database: Second Round
Closing Remarks

Most of the slides used in the presentations are available at the conference Web site. This report is designed to be complementary: it is detailed but purely textual.

E-commerce - why bother?

Bill Town, ChemWeb

Why does Amazon stock continue to rise when the company makes such losses? The answer lies in vision. Town gave three examples of lack of vision: the IBM executive who forecast that the world would need eight computers; an American small town mayor who said, in 1888, that one day every town in America would have a telephone; and Bill Gates, whose lack of vision and preparedness for the Internet is reported at http://www.thebee.com/bweb/iinfo58.htm. Admittedly, Gates has caught up and IE5 is now overtaking Netscape.

What is e-commerce? E-commerce is not new. EDI (Electronic Data Interchange) and EFT (Electronic Funds Transfer) were introduced in the late 1970s. Credit cards, ATMs (Automated Teller Machines) and telephone banking grew rapidly in the 1980s. E-commerce results from a convergence of technologies, namely telecommunications and computing. Today, the concept “e-commerce” is synonymous with the operation of Web sites which enable end users to make purchases of services or goods via the Internet using credit cards or other means of payment.

There are consumer Web sites for books (Amazon.com, Barnes & Noble, BOL, etc.), music (CDNOW, Amazon.com, etc.), travel (Expedia, Travelocity, go-fly.com, etc.), entertainment (Ticketmaster, etc.), stockbroking (Charles Schwab, etc.), software (Egghead, etc.), computers (Dell, Gateway, etc.), cars (Autobytel, etc.) and auctions (eBay, QXL, Amazon.com, etc.). Almost half the airline journeys in the US are ticketless. Amazon.com’s market capitalization is greater than that of News Corporation.

There are some e-commerce neologisms such as “e-tail”, a new term trying to separate e-commerce (business to business) from online retail shopping. “Clicks and Mortar” (Bricks and Mortar) means moving one’s capital from bricks and mortar (shops) to an online trading environment. A recent survey suggests that 50% of UK office workers eat their lunch at their desks. These people are “Mouse Potatoes” (Couch Potatoes) who are a captive audience for e-tailers. Will online advertising move to a TV-style prime time model?

E-commerce has also had an impact on chemistry. Online purchasing systems for chemicals, laboratory equipment, laboratory supplies, books and software may have an end-user or business-to-business perspective. Online purchasing systems for chemicals are available through ChemConnect, Chemdex, CheMatch, e-Chemicals, Eastman, fobchemicals.com, Polymerland.com and Sigma-Aldrich. Intranet/extranet solutions are offered by EMAX, MDL Information Systems (SMART), and CambridgeSoft (ChemACX). Online catalogs include ACD (from MDL Information Systems) and ACX (from CambridgeSoft). SciQuest.com, Chromatography.net, PowerPurchasing.com and selectscience.net offer online purchasing systems for laboratory equipment. Lab-X runs auctions of laboratory equipment.

Chemdex Corporation is a provider of business-to-business e-commerce solutions for the life sciences industry. It has an extensive online marketplace, powerful purchasing functionality, tailored to the needs of each customer, and comprehensive service and support. It was founded in 1997 and was valued at $863 million on flotation in 1999. It claims to “bring together life science enterprises, purchasing professionals, researchers and suppliers to buy and sell products, resulting in streamlined business processes, enhanced productivity and reduced costs”.

Primary goals for an e-commerce site are ease-of-use, customer value, customer service, branding, trust, speed, and good information. Ease-of-use is primarily a function of design: removal of barriers to access and removal of barriers to purchase. “A site should have a simple and intuitive interface that does not distract users from buying products.” Access to a wide selection of products and a personalized customer experience enhance customer value. “Users expect a better price for an online offering because they believe it shortcuts traditional distribution channels.”

Town listed some customer service features. Response time for emails should be less than 2 hours. Other requisites are fast, reliable deliveries; tracking the status of orders and deliveries online; publishing phone and fax numbers; and providing local customer service in the user’s own language. As regards branding, strong online brands develop rapidly in e-commerce (e.g., Amazon.com) but existing brands can be ported to the Web with care (e.g., Barnes & Noble, Charles Schwab). Trust is important. The customer must feel comfortable about the credibility of the merchant and the safety of the purchasing process. Privacy protection is a major issue in Europe. Security policy, privacy statements and consistent branding help to build user confidence.

Speed is primarily a function of design. The choice of Internet Service Provider (ISP) is critical. Programming has a major impact. “Good information” includes a wide range of accurate and reliable content, an intelligent search engine, and up-to-date prices. The most important elements of design are clear navigation, simplicity, download efficiency, consistency, personalization and interactivity.

Security and reliability are significant issues. Users should be able to encrypt their messages and orders to ensure privacy. Identities of purchasers and vendors must be established by the exchange of public keys. Digital certificates are the long term solution. In terms of reliability, leading edge e-commerce sites aim for 99.999% uptime (about 5 minutes of outage per year). A modular architecture with plenty of scope for redundancy is essential.
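The relationship between an uptime target and the outage it permits can be checked with a little arithmetic. The following is a minimal sketch, not something shown in the talk: the function name and the list of targets are illustrative assumptions, and a 365-day year is assumed.

```python
# Illustrative only: converts an uptime percentage into a yearly outage budget.
# Assumes a 365-day year (525,600 minutes); leap years are ignored.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Return the permitted outage, in minutes per year, for a given uptime percentage."""
    return MINUTES_PER_YEAR * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    # Hypothetical targets chosen for comparison; only 99.999% is quoted in the report.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes of outage per year")
```

For the 99.999% target the budget works out at a little over five minutes a year, which is consistent with the figure quoted above.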

The cost of building an e-commerce storefront ranges from $2 million to $40 million and the cost of maintaining it is $2 million to $50 million per year. So, why bother? The potential size of the market is one reason. Consumer e-commerce business was worth $13 billion in 1999; the business-to-business market was worth $68 billion. By 2008 the figures are expected to be $108 billion and $3200 billion, respectively. A leading US consultant on e-commerce has said “It’s not easy, it’s not cheap, and it’s not optional”.

Town concluded with a quotation from Lou Gerstner, head of IBM:

“The new dot.com start-ups are fireflies before the storm - all stirred up, giving off sparks. The storm that’s arriving - the real disturbance in the force - is when the thousands and thousands of institutions that exist today seize the power of this global computing and communications infrastructure and use it to transform themselves. That’s the real revolution.”

Discussion:

Someone suggested that “mom-and-pop show” ventures are possible. ChemConnect started with only three people. Steve Bachrach does not see the revolution mentioned in the above IBM quotation. He sees an evolution; Sears was the revolution. Town said he sees a rapid evolution. Another attendee said the revolution is in the information about the product: how much information do you give for free? Steve Heller said that you cannot always bypass the dealers: someone still has to fix your car. Not everything can be free. People have to be paid. There was a question about microcharges. Town agreed that short term subscriptions or microcharges are possible ways ahead.

There was some discussion about Amazon.com and its need for huge warehouses. Heller said the average size of an order (in dollars) is going down and Amazon is having to build buildings. Henry Rzepa said that the irony of Amazon is that 25% of an order is the delivery charge. Amazon.com is a temporary solution until e-books arrive. Heller said you will then order the book electronically and go round to Kinko’s to collect it. This will soon be the way with university courses too. Someone else pointed out that Amazon is already more than just a book store. Randy Marcinko said that WebVend is valued at $1.3 billion and it no longer delivers just groceries: it will deliver anything for your home. This might be revolutionary. Patrick van der Valk pointed out that buyers are now in control.

Database Protection at the Crossroads: Recent Developments and the Long Term Perspective

Jerome Reichman, Vanderbilt University, School of Law

Reichman represents the National Academy of Sciences (NAS) and others, as well as the International Council of Scientific Unions (ICSU). He therefore has a "point of view". First he addressed the situation as it is now. If you buy a chemistry handbook or article with extra data it is eligible for copyright. Copyright law is user-friendly and balanced with respect to scientific users: all of us can use the ideas and data since copyright protects the style, not the data. There is no "right to use" as in patent law, so you could write your own version as long as you do not "pass off". In European law, normal rights of attribution are very important. You can write a follow-on book from the ideas and data so long as you make your own selection and arrangement. You can combine lots of databases into one new database without paying the originators. You benefit from "fair use" (US) or "fair dealing" (Europe). There are compulsory licenses for classroom use and you can give away your book or lend it (under the first sale doctrine). You can also use it as often and as long as you like (cf. "private use" in Europe).

The sui generis model of database protection is a hybrid of copyright and patent law. There are licenses and database law strengthens the licenses. Suppose Europe brings in sui generis (bearing in mind that a US version is possible, see HR354, the latest version). The EU 1996 directive, the mother of all this, protects the database against extraction and reutilization of the whole or of substantial parts. Protection is normally for fifteen years. However, if the database is maintained, you can protect it for ever. There are no exceptions for reuse by academics. There are, however, limited exceptions for extraction for scientific purposes (but not in France).

In the US one lost sale is significant: small parts could be protected. Is the reciprocity clause illegal internationally? Publishers use this as a scare clause to suggest that European law applies in the US.

Publishers are protected if use or extraction would cause "harm" in the actual or neighboring market for the product or service that incorporates the collection. In the United States, static databases are probably not protectable after 15 years. There would be limited exceptions for scientific use but that is of little comfort.

Now look at the model again, in the new version. You can’t use the data: use is controlled by the site license. You pay to read the data, then there is a second price for downloading and a third price for reuse. You can’t write a new article based on the data because the data doesn’t enter the public domain. Independent creation is allowed, so scarce funds are used to duplicate knowledge. This contradicts the laws of science. Data may not be regeneratable in some cases. Sole source providers are pushing this through: competition in the database industry is the exception rather than the rule. The database owner is not obliged to give permission. From the NAS point of view, data from public and private databases cannot be provided without multiple permissions.

In the electronic environment there is never a sale, only a license. The “UCITA” model law would make shrink wrap licenses properly legal and perpetual. The price will be as high as can be obtained in a market with no competition. We can’t lend our data to other scientists even though we have paid to access data. There is no sale, only a license. We would be “harming” the market by lending our data, which will not be in the public domain until 15 years hence. In the past, in the US, data was (almost) free. In the future, data already paid for by the taxpayer will be sold at an inflated price. This is what is happening in Europe. The EC is charging for reading and downloading and reuse. There is no obligation to license the data, and the reader has no rights except for the right to use an insubstantial part. The price is what the monopoly market will stand. HR354 in a limited way recognizes the misuse of the right.

Science builds on science. The data most commonly used is disproportionately affected. All research will cost more and there will be more administration. There will be limits on which investigations can be done with data. Government data will be captured. West’s monopoly on case law has been broken so vendors such as West are pressing for HR354. This would lead to disincentives to reuse data, diminished cooperation between scientists, and lost opportunity costs. It would have a downstream effect on the US economy. A true unfair competition model is based on conduct. For example, trade secret is not an exclusive right but it protects you against bad behavior.

An important recent good development has been the production of government position papers supporting the scientific community’s view. The US Patent and Trademark Office (USPTO) has changed its attitude recently and the Department of Commerce’s General Counsel has taken a position. The intention is that any new legislation must prevent private parties capturing government data. Senator Hatch invited stakeholders to negotiate between August and October 1998. US academics used internal lobbying funds to participate. Academics are in favor of some legislation in the form of an unfair competition model. Publishers have alarmed staff members.

The Hatch database discussion draft is no longer secret: it is on the table. It has moved away from the EU model and the notion of “harm” has changed. There is a definition of significant harm, e.g., preventing investment. “Fair use” is a safety net whereas under EU law every case of possible fair use would have to go to court. Additional immunities have been put in for libraries, classrooms etc. There is a clearly worded duration clause but the dynamic database issue is still a problem. Publishers would not be able to overrule “scientific need” by a contract but they would be protected against piracy. However, not all the issues have been addressed. The “value-added” problem is not solved. Dun and Bradstreet oppose the plan because it would cost more to acquire the data than they could get by selling it. Disney is also opposed.

These provisions were in HR2281. Database protection was taken out and became HR354 in 1999. A coalition of universities and scientists could not get HR354 altered so HR1858, a truly minimalist bill, was produced in opposition. The two bills cancel each other out and there is stalemate. The Senate will intervene. Publishers can still get control even without legislation. The NRC (National Research Council) is to publish a new document on October 15 which can be downloaded on October 30. This is a serious study. This committee does make recommendations. (“Bits of Power” did not.) The “sharing ethos” must be encouraged. Database protection ignores the dual characteristics of data, impeding the genius of US innovation.

Panel Discussion on E-Commerce

Panel Chairman:
Wendy Warr, Wendy Warr & Associates

Panel Members:
Louis Culot, CambridgeSoft
John Custer, Sigma-Aldrich
Jeff Leane, Chemdex
Bill Town, ChemWeb
Patrick van der Valk, ChemConnect

Each of the five panelists first gave a short introduction to his own e-business. Culot said that EDI is not new. The Net's not new. You can do EDI on the Net. What’s really new is the Web browser. Everyone has the client. The need to get an EDI connection up and working used to be a serious barrier. In the past, only a handful of people in the purchasing departments at large companies could talk to the suppliers; now the Web browser provides an intimate connection between the supplier and consumer. A scientist can search CambridgeSoft’s ChemACX at his desk. The system is modeled on Amazon, with a shopping cart etc. The scientist can choose his brand and do chemical substructure searching. The purchasing department reviews the order and fires it off. The scientist is the structure-searching customer. CambridgeSoft places emphasis on scientific search tools: ChemDraw has been in use since 1986.

Custer said that customers are employees and shareholders. There are new jobs for employees. Shareholders need a profit. Custer likes panel discussions because he interacts with customers. The Sigma-Aldrich focus is threefold. The company wants to keep direct relationships with customers, so three things are needed from the Web site. The first is content. Sigma-Aldrich brands have been in business for 50 years. Tons of information are available, including Material Safety Data Sheets (MSDS) and technical bulletins. Ease of use of this information is important. You can search for text, structure, application and CAS Registry Number. To close any gaps, Sigma-Aldrich has strategic partnerships.

E-commerce has been around for a long time. Sigma-Aldrich has had EDI for many years. July 1995 saw the first Web order but this was faxed. Their philosophy is to emulate the business process. Thus you will see these components from the company: product availability, back order status and contractual pricing.

Leane said that Chemdex customers are in biotechnology and pharmaceutical research. There are three constituencies or groups of customers: individual scientists, the enterprise, and suppliers. The Internet creates an opportunity to cut the world up into pieces and rearrange the pieces in ways that are more beneficial – that is important to add. There are new opportunities for sellers and buyers. Buyers need procurement solutions, not “shopping”. Custer talked of non-discretionary purchases. Are you going to do it efficiently? The three types of users (end users, enterprise users and suppliers) have different needs.

The marketplace is robust. There is a critical mass in one place on the Net, and the tools for finding and buying are there. There are also collaborators in groups, for example, HighWire Press. Individual scientists work in an enterprise and a hosted procurement system can cut overheads. The usual business processes must work in the online system. Chemdex offers an outsourced service for customers but some customers have their own procurement software. They need to integrate their system with Chemdex’s. Chemdex offers training, support, and customer service. It is a professional service organization. It needs to do all these things to provide a complete solution for all three user groups. In September 1999, the company had 60 enterprise customers including Rhône-Poulenc Rorer, Bristol-Myers Squibb, Genentech, the University of California, San Francisco and Harvard University. Some 15,000 scientists have registered to use Chemdex; 700,000 products are offered. (Again, these are September figures.) This is critical mass.

Van der Valk started with some company history. At Betacyte he worked with ChemConnect for 2 years before they finally met face to face. In 1997 they combined the companies and in 1998 they went for venture funding. The company now has offices in San Francisco, Houston, Singapore and London. ChemConnect runs a global marketplace that deals with large quantities but never takes possession of the chemicals. The size of an average transaction on their site is around $200,000. People place offers and make bids. However, this is not a true auction since the host can choose the trading partner, unlike the “forced” partner with the best price in an auction model. Bids are made for lots or partial lots and ChemConnect takes a percentage of the completed transaction. There are no membership fees. There is no ad placement fee. Critical mass is very important to ChemConnect: they need a lot of traffic. Existing members have stayed with them. ChemConnect operates a true marketplace where people can see pricing, see demand etc.

Town said ChemWeb.com is an information aggregator with 127,000 members, 30% of whom visit at least once a month, while 10% visit more often. WebTrends for ChemWeb.com shows about 90,000 visits a month. A user session lasts about 15 minutes. Web traffic continues to increase: currently it is 800,000 page impressions per month. ChemWeb.com promotes traffic by offering free services such as Beilstein Abstracts and a live Webcast of the Ig Nobel awards ceremony.

Advertising revenues are growing rapidly. There is also growth in article sales and database subscriptions. Growth is 100% year on year. Full text journals and databases comprise about 16 million records, most of them in databases. There are also 3000 books, hundreds of software items and some equipment.

Henry Rzepa said that the three concepts one-stop shopping, relationships with customers, and critical mass are not reconcilable: you cannot have all three. Where do the various companies stand on this? Culot cited examples on the Internet where 5-12 companies service an area. The horizontal Web indexes Yahoo, Lycos, InfoSeek and AltaVista do have critical mass and they do have relationships with customers. CambridgeSoft does not manufacture chemicals but it does establish relationships with manufacturers. The same companies may be available through Chemdex but the interface or whatever may be different. The heart is in the supplier relationships.

Van der Valk talked about competition in real life, for example many bakeries in one district. Ultimately mergers will happen in the Internet business as well. Custer agreed with Rzepa that you cannot have all three features. He said that Sigma-Aldrich is not a one-stop shop. The company has over 200,000 products but there are some markets that it does not get into. It is not a distributor. It is, however, after the one-to-one relationships. Town said that ChemWeb.com aims to be a one-stop shop, although it is not there yet. It has aggregated over 30 publishers and data suppliers. It is aiming at a critical mass. Not only does it have relationships with customers: interaction between ChemWeb.com members also takes place in discussion groups and forums. The panelists’ companies all have a rather different focus.

Leane agreed that each company has a different approach to the problem. The customer definition of one-stop shop might be whether he can solve his problem better or more efficiently by using this service. Rzepa pointed out that there is a bunch of one-stop shops. Leane agreed - but there is a whole bunch of problems too. Tim Cook (Cherwell Scientific) asked a subsidiary question to the question of speed of consolidation in the market place and that is the issue of branding. How do you establish yourself as a premier brand in e-commerce?

Leane emphasized the speed and convenience of getting your job done effectively. Three hundred individual suppliers save money and their brands are all preserved within Chemdex. Van der Valk has had a new service up for only 8-9 weeks and is still figuring out a branding strategy. Culot said scientists understand brands from a chemical point of view. They are not after “CambridgeSoft hexane”. Smaller specialty chemical manufacturers get their name out there by using the CambridgeSoft (CS) site. CambridgeSoft’s strength is on the scientist’s desktop: the CS brand focuses on the scientist. Custer noted that Sigma and Aldrich are two big brands, entrenched in the marketplace. Sigma-Aldrich makes customers aware of the company’s other brands, for example, a chromatography brand. The corporate entity represents various brands. Town commented that no one knows the size of the market. He guesses that there are perhaps half a million or one million chemists world-wide. The success of ChemWeb.com’s branding shows in its penetration. This is somewhere between 10% and 20% now. But branding success also shows in the number of people who are linking to ChemWeb.com and how often the logo appears on other people’s sites.

Dick Wife asked if there is room for a fifth Chemdex and a tenth CambridgeSoft. Leane thinks there will be a very small number of winners. In the Chemdex aggregation model, every new buyer on board enhances Chemdex’s attractiveness to sellers and every new vendor on board enhances its attractiveness to buyers. This is a self-reinforcing phenomenon so businesses will either have a very high market share or a very low one. There is perhaps room for two Chemdexes but not many more. Competitors include the Anderson Unicom group, SciQuest, BioSpace and Cybersystems. Van der Valk says it depends on how you define the company. ChemConnect is not as broad as VerticalNet, which is too broad. PetrochemNet handles a small number of chemicals, only in the US. ChemConnect is broader than that. There will not be lots of ChemConnects. Culot said that the panelists work in different markets but there will be some consolidation in the industry. Town thinks that competition is healthy but people do want one place to find everything in the chemical area. The end user wants aggregation, not a multiplicity of publishers’ sites, for example. People can switch between vendors more easily nowadays, said Culot. Customers have mobility. The biggest natural barrier to entry is supplier relationships. CambridgeSoft couldn’t break into the Wal-Mart market but Wal-Mart does have competitors.

Bachrach asked what happens if you get critical mass. Custer suggested he define critical mass. Van der Valk stated that for ChemConnect critical mass is critical. They must have enough advertisers or the customers will not come back again. Chemdex sees critical mass differently: the customer does not see critical mass, Chemdex does.

Warr asked a question about US orientation. Custer replied that Sigma-Aldrich takes orders from all over the world. They will soon have a Japanese site, then a Chinese site. However, features such as order tracking will initially be for the US only. Leane recognizes that science is global. The problems Chemdex addresses are even more complex outside the US. About 35% of ChemConnect members are from the US but India and China have also embraced the system. ChemConnect is introducing a rating system. How do you rate companies across the whole world? There are also interesting legal issues such as a Ukrainian company selling TNT to a Chinese buyer. Is ChemConnect liable? Who knows? ChemWeb.com has global membership, said Town, and there has been rapid growth in Asian usage. Many publishers are based in the West and the Far East needs timely information. Culot said that the CambridgeSoft site uses multiple languages. Buying ChemOffice software is not a problem but there is a barrier to buying a chemical. There are problems with currencies and distribution since CS is not the supplier. It is hard to keep pricing current.

What about handling chemical structures? ChemConnect does not need them. Structures are critical to CambridgeSoft in the research marketplace. Custer said that there are 15,000 searches a day in total on the Sigma-Aldrich site and 10% of those searches are by structure, especially from users in the combinatorial chemistry field. Town said that structure searching in ChemWeb.com was vital for the end-user scientist, using, for example, the Available Chemicals Directory (ACD), the OHS MSDS database, and property databases on ChemWeb.com. Leane said that Chemdex has a life science orientation so structures are less important. However, precise search on biomolecules requires specialized search engines. Chemdex has a family of such search engines, e.g., for monoclonal antibodies and DNA probes.

Norman Schmuff asked whether trusted authorities such as Verisign were very important in e-commerce and, if so, how do you certify the certifying authority? Do we need them if IP tunneling is done? Culot has found that smaller biotechnology companies will do business with CambridgeSoft over the Internet but Procter & Gamble wants ChemACX in house, for security reasons. It protects their chemistry searches and orders. People want to keep their reagents secret. Van der Valk said his entire chemical exchange is on a secure server but even with secure servers there are problems. Microsoft Internet Explorer 5 had a security bug and a patch was needed. Secure email is still a problem. Leane says he will talk to anyone about this issue. Aggregators must “leverage” security. Leane has demonstrated to a customer how easy it was to hack into his site. It’s hard and expensive to build a secure site.

Collaborative Authoring with the Environmental Molecular Sciences (EMSL) Remote Document System

Chris Parkinson, Battelle Pacific Northwest National Laboratory

There are multiple environmental options: single user or multiple users, working on single documents or multiple documents, on one platform or multiple heterogeneous platforms or multiple homogeneous platforms. Producing a solution for one user on a PC on the desktop is no problem. Multiple users on a single platform cause no problem. For multiple users of multiple documents on the Macintosh, AppleTalk AFs are an easy solution. Multiple users on multiple heterogeneous platforms are more of a problem. Outlook can cope with more than two users, read-only, but there is no UNIX support. There are many options for a Web-based system but they are write-once, read-only and do not invite collaboration. So EMSL needed to solve the problem of multiple users reading and writing multiple documents, on multiple heterogeneous platforms.

Parkinson gave a diagram of the Remote Document System (RDS) in which multiple users have access to a Java applet on a Web server which links with a document server. He also showed an RDS applet screen. This was a somewhat clumsy solution. Document sharing involved document locking. Access via the applet offered no modern UI behavior. There was no application launching so editing was clumsy. Not surprisingly, the system was unused.

The EMSL Desktop extended the RDS system. There was a move away from the browser to the desktop. Drag and drop and application launching were allowed. The code is Java and the system works on Macintosh, PC and UNIX. It is like one big window with windows inside it: a desktop on a desktop. Parkinson showed a diagram of a user accessing the EMSL desktop application, which links to multiple file servers and lots of different hard drives. There is just one GUI. He then gave a diagram of the file server architecture. Sitting on top of a basic file server are the FTP file server, the Web space file server and the collaborative file server. The FTP file server can access any existing FTP server. The Web space file server allows the user to edit files directly on a Web server. Multiple authoring, version control and document tracking are handled by the collaborative file server, which can be accessed by PCs, Macintoshes and the UNIX boxes attached to NMR machines. There is user file space and project work space on the server, which is used by 2000 people from different machines. Instruments are actually linked. It is possible to drag and drop to a native operating system and to drag and drop between the EMSL desktop and native desktop. Or you can drag and drop between EMSL desktop windows. Thus you can drag and drop from a Macintosh to a PC, for example, on the other side of the world.

When a user opens a local application with a remote document the file is locked to other users until the first user has closed Word. What about chemistry? There is a built-in molecular viewer for collaboration on multi-author papers. Molecular viewers and editors can be added. For remote instrument data gathering, a file server is placed on the calculation machine to monitor jobs.
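The locking behavior described above can be sketched in a few lines. This is only an illustration in Python (the EMSL system itself is written in Java), and the class and method names are hypothetical rather than taken from the EMSL code.

```python
class DocumentLocks:
    """Toy registry: one user at a time may hold a document open for editing."""

    def __init__(self):
        self._holders = {}  # document path -> user currently editing it

    def open_for_edit(self, path, user):
        """Return True and lock the document if nobody else is editing it."""
        holder = self._holders.get(path)
        if holder is not None and holder != user:
            return False  # locked by someone else until they close it
        self._holders[path] = user
        return True

    def close(self, path, user):
        """Release the lock, but only if this user actually holds it."""
        if self._holders.get(path) == user:
            del self._holders[path]


locks = DocumentLocks()
first = locks.open_for_edit("paper.doc", "alice")   # alice opens the file
second = locks.open_for_edit("paper.doc", "bob")    # bob is refused
locks.close("paper.doc", "alice")                   # alice closes her editor
third = locks.open_for_edit("paper.doc", "bob")     # now bob can edit
```

A real server would also need to release locks held by users whose connection drops, which this sketch does not attempt.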

All technical issues from RDS have now been resolved and a full version of the new system will be deployed in November. Public release is scheduled for December. In the meanwhile the old version is on the Web.

Ten Big Questions about Electronic Journals: Muddling our Way through Controversies and Uncertainties

Robert Bovenschulte, American Chemical Society (ACS)

Bovenschulte's subtitle was “the problem we face”. He spoke about STM publishing and Web publishing, and not necessarily about ACS policies. Does electronic publishing reduce costs? In a way it does because it delivers process re-engineering but a large incremental investment is needed in infrastructure and staff to achieve the enhanced functionality. So there are two answers: in the short run, no savings are in sight, just more cost, while in the long run some moderate savings are possible. Today all STM publishers are pouring money in.

Is faster better? In principle, yes, but in practice, the answer is uncertain. Bovenschulte gave examples of ACS innovations: articles ASAP (as soon as publishable), electronic authoring and proofing tools, where the author proofs the final version on the Web (used only for Organic Letters at the moment), accelerated peer review and new database production systems and online distribution. Is it worth the cost? Publishers compete for the best authors and editors. Whether or not there is a social advantage, all publishers are going this way because of competition.

How should the electronic archive be maintained? The print analogy can be misleading. In the electronic arena, there are problems with different versions of articles over time, keeping pace with ever changing technologies, and migration. Then there is the question of funding the archive. Who will pay?

What is the outlook for pricing models? There are many approaches but there is little consensus:

  • Print and online: bundled versus unbundled.
  • Locus of value: print versus electronic. (Is the electronic version an advantageous add-on or is it the main product?)
  • Unlimited versus metered access.
  • All journals versus selected titles. (ACS gives the individual a choice.)
  • Subscriptions versus article packages.

The challenge is how to relate price to usage. Some librarians like the idea of article packages, particularly in corporates.

Who should own the copyright to journal articles? The tradition is for the author to assign copyright to the publisher but the tradition is being challenged. If the challenge prevails, the communication of scientific information will be retarded. ACS is inundated with requests for reuse. If they have to consult the author in every case, publication will be delayed and costs will be increased. Publishers could have a system in which the author decides which rights he needs to be asked about. Administrators of libraries want to break with tradition as a way of reducing costs but Bovenschulte thinks it would actually slow things down and add to costs. ACS has found that authors do not feel strongly about this issue: they think the current system works perfectly well.

Should all the sciences adopt the preprint/e-print model? The advantages are speed, free information and accessibility. The disadvantages are lower quality standards, more clutter, haphazard peer review and the problem of prior publication.

Should peer review be redefined? Is it a useless vestige of elitism or an indispensable certification of acceptable quality? The issue is informal and undirected commentary versus the tradition of solicited evaluation. Is the type of review carried out by LANL good enough? [LANL is the Los Alamos National Laboratory which handles e-prints in physics.] Disciplines differ. There is little prospect of sweeping changes soon. Chemists are much less willing than physicists to accept the LANL notion and ACS editors are unanimously against the idea of a repository of reprints, but who knows what the situation might be in two years’ time? [After Bovenschulte’s talk, Mark Doyle from the American Physical Society (APS) challenged his views on the LANL model and Bovenschulte agreed that he oversimplified because of time constraints.]

Who should link to whom, and how? There are many gaps in the “seamless Web”. The user’s ideal is a single interface and universal access. Technically there is complexity: all-to-all versus nodes. Business relationships and money are involved: publishers are concerned about their brand identity and their image. Issues of control of searching and access worry secondary publishers such as Chemical Abstracts Service (CAS) and the Institute for Scientific Information (ISI) and also primary publishers. Cooperation among competitors is also a concern for publishers: if one or two systems gain control, it could lead to a secondary service which would be in charge and dictate terms.

Will secondary publishers become obsolete? Almost certainly not. Searching is more important than ever; efficiency and effectiveness are crucial. Primary publishers are building search layers but secondary publishers own the tools and the expertise. Who is the “parasite” now? Perhaps this signals a shift in power. Those in power could dictate terms to primary publishers.

What is the government’s proper role in scientific publishing? Medline has been followed by services such as PubRef, PubMed, PubMed Central (forming E-biomed) and PubScience. PubMed Central is less threatening than E-biomed but is still a troubling prospect. Should the government be providing funds or taking over the process? The troubling prospect is centralized authority and control. There is value in multiplicity although “free” is always the customer’s favorite price.

The Internet Journal of Chemistry - A Status Report

Steven M. Bachrach, Trinity University

Bachrach started by answering the question “Why launch a new journal?”. By 1996 HTML was sufficiently full-featured and chemical MIME had appeared. Electronic chemistry journals were inferior to print ones: the print editions were the authority. Electronic journals used bad HTML and were featureless except for speed. The Internet Journal of Chemistry (IJC) was launched with several targets including full incorporation of multimedia (enhanced chemistry publication), promotion of Internet technologies, low cost, and liberal copyright policies. It is peer reviewed and covers all areas of chemistry.

Bachrach gave a demonstration. You don’t need to register. The opening screen had 5 frames: a logo (link to home) top left, text top right, references bottom right, and two navigation frames on the left, one for the site and one for the current page. You can change the layout if you want. When you click on a molecule, MDL Chime is launched. Thus you can manipulate molecules. You can annotate, although this is restricted to your space at the moment. Here is an advantage of registration: you get some disk space for annotation. Bachrach showed a table of vibration frequencies. These are hyperlinked so that you can get a different unit of measurement if you like. In the personal preferences area you can customize the journal. You can use Chime or a Java applet or another viewer or just GIF. On-the-fly units conversion can be set up or units enforcement. You can have some control of the reference styles. This sort of arbitrary decision should be in the hands of the user. This is what Bachrach means by “enhanced chemistry publication”: the user can change the screen layout, frames or colors.

Submission of articles, review and revision are all electronic, using Web pages or email. The author submits a manuscript in “standard” HTML with no special tags. The journal’s editors had to devise a way of tracking articles. Bachrach gave a diagram.

AUTHOR                  WEB SERVER                            EDITOR

Article in HTML    →    M/s submission form              →    Assign referees
                                                              ↓
Write review       ←    Article tracking form            ←
↓
                   →    Referee report submission form   →    Accept/reject
                                                              ↓
Revise article     ←    Article tracking form            ←
↓
                   →    Revision submission form         →

The only piece of paper is the copyright licensing agreement.

Bachrach gave another chart to show how customization works. The author submits HTML which goes into the IJC parser (a Perl script) which makes an IJC metafile. XML-like tags are used for units and the original value is included in the form of an HTML comment. Page requests go to the Web server request handler. The server generates HTML and sends information to the browser. An “article” is actually a database. The Web server is written in Java. Bachrach gave an example where the server-produced HTML operated with a user profile which indicated display of energies in kJ/mol, hot-linking unit conversion and the energies directed to a certain frame.
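The customization step can be illustrated with a minimal sketch. It is written in Python for brevity (the report says the real parser is a Perl script and the server is Java), and the `<energy unit="...">` tag syntax and the `render` function are invented for illustration, since the report says only that the tags are “XML-like”. The sketch converts each tagged energy to the reader's preferred unit and keeps the author's original value in an HTML comment.

```python
import re

KCAL_TO_KJ = 4.184  # 1 kcal = 4.184 kJ (thermochemical calorie)
UNITS = ("kcal/mol", "kJ/mol")

# Hypothetical tag syntax: the report says only that the tags are "XML-like".
ENERGY_TAG = re.compile(
    r'<energy unit="(kcal/mol|kJ/mol)">([-+]?\d+(?:\.\d+)?)</energy>'
)


def render(metafile_text, preferred_unit):
    """Replace each tagged energy with a value in the reader's preferred unit."""
    if preferred_unit not in UNITS:
        raise ValueError(f"unsupported unit: {preferred_unit}")

    def convert(match):
        unit, raw = match.group(1), match.group(2)
        value = float(raw)
        if unit == "kcal/mol" and preferred_unit == "kJ/mol":
            value *= KCAL_TO_KJ
        elif unit == "kJ/mol" and preferred_unit == "kcal/mol":
            value /= KCAL_TO_KJ
        # Keep the author's original value as an HTML comment.
        return f"{value:.1f} {preferred_unit}<!-- original: {raw} {unit} -->"

    return ENERGY_TAG.sub(convert, metafile_text)


html = render('Barrier: <energy unit="kcal/mol">10.0</energy>', "kJ/mol")
```

Because the original value travels with the page, a different reader profile can be served from the same metafile without any loss of precision.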

IJC does not have "issues" as such but special issues are possible as overlays of the normal journal. Color highlighting in the Table of Contents indicates participation in the special issue and articles may be sequential or non-sequential within the journal. The library republishes "lost" articles. For abstracting and indexing, IJC relies on other vendors. CAS has indexed it since launch, ISI since Volume 2. (ISI originally had problems with electronic journals and was unable to print if there was no PDF format.)

How do you cite an IJC article? There is no page number but you can use the article number in the page number field. The URL is readily automated. Bachrach prefers to use the URL rather than a citation e.g., http://www.IJC.org/articles/YEARvV/article#/. He also showed some IJC meta tags.

A number of collaborations are in progress. IJC is available through ChemWeb.com but is not on the ChemWeb.com server. Full use is made of ChemWeb.com search and indexing tools. Individual article access may perhaps be available through ChemWeb.com. IJC and SPARC are working together to make IJC a “leading edge partner”.

IJC licenses copyright from an author for $10. It obtains commercial rights for redistribution but the author can do non-commercial distribution. An individual subscription costs $48 a year. Educational institutions pay a site license of $289 and a corporate site license costs $489. Neither license places restrictions on downloading etc. Pricing is aggressive and use agreements are liberal. IJC published 38 articles last year and has published 22 in 1999 (up to September). The number of registrants is 4708 but remember that there are non-registering users.

In the question and answer session, someone asked about archives. Bachrach replied that IJC is no worse than ACS or Springer. Has Bachrach approached his tenure committee? The idea seems to have worked well enough for him. He was asked why PDF downloading was not implemented. He agrees that this could now be reexamined but he disagrees with the idea of a standard print version. Someone suggested hyperlinking IR wavelengths to a database of IR spectra. Bachrach would love to do this if he had the resources. He was asked what the minimum size or type for an article was. One reaction is perhaps not what he is looking for. There were comments about the need for more depositing of supplementary data in the literature in general. Someone asked if it were possible to steal an author’s X-ray structure. Bachrach replied that this is data, which is not copyrightable. The current copyright law is fine as far as IJC is concerned. Finally, someone asked about the costs of the whole operation. Bachrach replied that it is many orders of magnitude less than the numbers people bandy about. IJC does not have to generate print.

Panel Discussion on E-publishing

Panel Chairman:
Stephen R. Heller

Panel Members:
Mark Doyle, APS
Lorrin Garson, ACS
Randy Marcinko, HighWire Press
Henry Rzepa, Imperial College

Doyle finds the ACS position puzzling and gave the APS’s answers to Bovenschulte’s 10 questions. APS is a non-profit organization of 40,000 physicists. It publishes Reviews of Modern Physics and Physical Review, the latter being divided into 7 distinct journals. Doyle once worked with Ginsparg of LANL. LANL [the Los Alamos National Laboratory which handles e-prints in physics] covers much more than physics. It also has a computer science archive and a mathematics archive. Some articles are reprints, already reviewed, not preprints. Conference proceedings, talks, and lecture notes are handled. There are 15 academic mirror sites around the world. Electronic access is very good: better than for electronic journals. You see a snapshot of your whole field, not just a few journals. Disintermediation matters. APS provides links from APS journals to LANL and has an agreement with LANL to eventually implement links from LANL to APS journals. An advantage is that the Indian physicist gets the same exposure as the Princeton physicist.

APS will print articles that have been on the LANL server. Indeed it accepts submission directly from the LANL server so that the author only has to upload once. APS agrees with ACS that the publisher needs to hold copyright but APS hands more rights back to the author. The Society allows authors to update their e-prints after publication, except for APS formatted articles. An APS article can go on your own Web site. APS is looking into making journals an overlay on the e-print archive. APS costs are reduced because they don’t have to send out hard copy for review etc. No-one wants to do away with peer review. The e-print archive has no impact on subscriptions even for Physical Review D (a high energy physics journal).

Garson said that electronic publishing does not necessarily reduce costs. Eighty per cent of costs are first copy costs, 20% is for “paper, postage and distribution” (PP&D). The medium is irrelevant. So with electronic journals you still have 80% of the costs. There may also be costs for electronic distribution e.g., file conversion. This is not a problem for ACS, but for many publishers there is a cost involved. There are hardware and software costs. Fault tolerant systems are needed if uptime is to be 99.99%. Multiple routers, and multiple connections to the Net are needed. Staffing is costly: a computer geek is expensive. Disaster recovery procedures are also an expensive exercise: you have to contract for computer facilities and staff.

The expectations of customers constantly increase. Putting in new links from primary publications to CAS had a significant cost. In the long term, electronic publishing could be very profitable but in the short term a publisher could go bankrupt without a financially responsible pricing policy. How should we and will we maintain the electronic archive? Historically this was the role of libraries. This is not true for electronic journals. The CORE project showed the cost of doing this. Publishers have to assume some sort of role. How will the archive be funded? File format conversion costs money. Conversion from SGML to XML will be trivial but what will come in 10 years’ time? There have been three media conversions in the last 5 years because of obsolescent hardware. This sort of work is not cheap if you have gigabytes of information. Subscription records have to be managed: ACS is on its fourth subscription fulfillment system in the 20 years Garson has been at ACS. What happens when Colossus Company buys Mom’s Little Company?

There are various business users. Will secondary publishers become obsolete? No. The scientific literature in chemistry continues to grow exponentially: 7% a year compounded. Every 10 years chemical information doubles. This cannot go on forever so filtering tools are needed. Secondary services are geared to handle this. There are now 20 million CAS Registry Numbers. No one publisher dominates the Scientific, Technical and Medical (STM) area and secondary publishers save you from having to go from publisher A to publisher B to publisher C etc. looking for what you want. Secondary publishers have services such as substructure searching.
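The link between the 7% growth rate and the ten-year doubling time can be checked with a short calculation. The sketch below is an illustration in Python, not something presented at the meeting, and the function name is arbitrary.

```python
import math


def doubling_time(annual_growth):
    """Years for a quantity to double at a compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


years = doubling_time(0.07)   # about 10.2 years at 7% a year
factor = 1.07 ** 10           # growth over exactly ten years, about 1.97
```

So 7% compounded is, to a good approximation, the same statement as doubling every decade.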

Marcinko said we might ask what an entrepreneur such as himself was doing as a consultant to Stanford. Stanford, through HighWire, has nearly 200 high impact journals in the life sciences. It does electronic publishing for 60 publishers. Yes, data conversion is a high cost venture. What is an electronic archive? You have to keep the electronic journal environment, and have to maintain links etc. This is not like a paper repository in a library. HighWire has National Science Foundation (NSF) funding for a program concerning electronic redundancy asking questions such as “What is an archive?” and “What is an e-print?”. HighWire will treat this as an experiment. They will look at the properties of the e-print but will not do the archiving themselves. HighWire has set a launch date of Spring 2000 for a new product. Out of 53 publishers, most (35) are happy with e-prints/preprints. Maybe the review(s) can be attached to the e-print.

What next? What comes after linking? HighWire can already link the primary literature to gene sequences. Can any electronic journal be an island? Is there any merit in a linked set of journals from just one publisher? The British Medical Association (BMA) doesn’t want to be in such a restricted (one publisher) environment.

Rzepa is a user, and a reader, and an author. The author is not viewed as a powerful figure. The reader is confused with the librarian but it is the reader who has to cope. There was no librarian on this discussion panel. Rzepa was trained as a mechanistic chemist. He became a computational chemist, also generating data. This colors his perception of electronic journals. Reichman talked of “transformability” of data, and transformability in an interdisciplinary sense too. This is what drives Rzepa’s interest. In order to transform data you have to capture it in the correct form. Rzepa trains young chemists to value data. The reader needs to be taught how to use data. Separating the data from the style in which it appears may be the publisher’s responsibility; querying the data is the reader’s responsibility. This is how the XML community has developed. XML is important because of separations between data, style and querying. This teases out interesting copyright issues. Data is not copyrightable. Style is copyrightable. As authors, how do we represent impact factors? They are awful things.

The first question in the open discussion session concerned the information explosion and tenure review. Garson said that in print publication, economics restrict article size. Electronically, discussion should be concise but there should be lots of data. ACS has a supporting (supplementary) information program. Electronic publishing doesn’t do much to solve the problem. He doesn’t know how tenure will be handled in the new era. Marcinko said you can rank journals in other ways: there must be more creative approaches than impact factors. Bachrach says that just because an article has been read, it doesn’t mean that it’s good.

Having mentioned evaluation of a Web site by means of the sites which link to it, and the snags of citation for negative reasons, Doyle addressed the issue of why scientists publish. The scholar is publishing to contribute to science not to earn royalties. He wants to be read by his peers. He may also need tenure, of course. Any restriction on publishing prevents your work being read. Garson agreed that if you take a non-cynical view, a scientist wants to make a contribution. Speed of publication is important to prestigious authors. They want circulation. There is also the issue of tenure or promotion.

Someone asked Doyle about the costs of archiving and the issue of all those mirror sites. Doyle replied that TeX (the file format used) is ASCII so it will be available in the long term, but he conceded that LANL does not do manually-intensive conversion to fully tagged archival formats (SGML or XML). They do convert TeX to PDF and PS. Maybe a centralized service is needed. Garson said mirroring is a form of data protection but it is not archiving. No-one knows the cost of archiving yet. There are two non-trivial costs: file conversion and maintaining subscriber information. Doyle says it costs $10,000 to mirror the Los Alamos site. This is not a big cost: it is so low that people volunteer to do mirroring. Ideally, authors who submit files to journals which are hard to convert should pay to cover the additional costs. Marcinko said that HighWire journals are free to everyone after 2 years, so there is no need to maintain subscription data. Mirroring is a problem. If one site is hacked, all sites may be affected.

Rzepa stated that nine different types of Word files have been sent in for his electronic conferences. Office 2000 will be much easier to handle. Some lessons have been learned. Heller said that the concept of an 80% first copy cost assumes you don’t re-engineer the process. When IJC wanted to set an individual subscription of $25, one organic chemist announced that his accounting department charged $27 for internal billing. So IJC couldn’t work with this organization. You have to do things differently sometimes. If you reduce costs, then reduce the price, you can still make a profit.

Norman Schmuff (of the Food and Drug Administration) wants a single file format for chemical structures in New Drug Application submissions. Rzepa said that formats must be self-defining but they need not be unique. They must be transformable. APS first copy costs are about 80% (stated Doyle); authors must inform APS whether their articles have been converted correctly.

Richard Wife wondered whether electronic publishing could help capture the 80% of chemistry that is performed but lost to posterity. Heller accepts that proprietary chemistry will be “lost” but he questioned how much of the 80% is good chemistry. Rzepa wonders how to persuade authors to submit (and use) supplementary data. One answer is to push costs back onto the author, suggested one delegate. Another did not favor this solution since publishers are in competition for authors. Commercial publishers don’t have page charges. If you make it inconvenient for an author he may go to a competitor. But, retorted Rzepa, authors now type everything and don’t regard this as a burden, so why should they worry about paying?

Doyle suggested, somewhat tongue-in-cheek, that if you publish in an expensive journal your library should charge you for that journal’s subscription. He emphasized that to get rid of the subscription model requires not localized, unilateral solutions, but global solutions requiring the cooperation of authors, librarians, publishers, and funding agencies. He also emphasized the importance of closing the feedback loop between authors and the cost of where they publish (hence the somewhat idealistic suggestion above) as a way to move things along. Page charges are linked to lower subscriptions. However, there is talk of getting rid of subscriptions altogether. Page charges are an unfair burden on theoretical chemists.

Teaching Chemistry in the Electronic Age

Donald DeCoste, University of Illinois at Urbana-Champaign

Teaching should not be a question of “pouring water from a jug into students” but should make the students interactive if possible. Manuals are not ideal. We should have students struggle with a novel problem. With some homework assignments the student just keeps hitting “enter”. Better homework assignments are on the University of Illinois at Urbana-Champaign Web site.

Students usually have one a week to do. An example is calculating the atomic mass of M in the dichloride of M. If you just type in a number it won’t give you the answer. The system hints “Write and balance the equation for the reaction”. The Help function tries to lead the student to find an answer without giving too much away. Problems are individualized for students. The instructor can read the grade book. Another example is a stoichiometry problem. The hint screen has boxes for balancing the equation. Thus the instructor can see whether the student has problems with balancing equations or with molar mass or some other concept.

DeCoste turned to a different topic: a game that they have been developing for one year, especially for high school use. He showed a screen where you walk through the world, pick up tokens etc. There was a chat room box at bottom left. You have to answer chemistry questions to get through doors. The mole world is underground. Various people have to be battled. DeCoste gave a demonstration of the game.

Gateways to Chemical Information - the MetaChem and Janus Projects Down-Under

Alan Arnold, University College, University of New South Wales - Australian Defense Force Academy

In five years’ time (i.e., in 50 Internet years’ time) we won’t be having another ChemInt meeting. We might have “Molecular Sciences on the Internet” but we won’t call the scientists “chemists”. The metaphor we call the Net will change. You won’t need a wire or optic fiber to connect your computers: you will use wireless transmission. Linking back to earlier discussion, Arnold wondered why the young chemist of the future will publish.

Arnold’s job title is now “Director of Flexible Education”. His talk covered a snapshot of Australian University Chemistry, the MetaChem project and the Janus project which is currently on the drawing board. Australia has 36 universities. (Is this too many for 19 million people?) There is a group of 8 traditional, research-based universities; others are newer and more regional. Funding is from the Commonwealth not from the State. The universities are autonomous bodies, responsible for their own governance. They are responsible for 60% of R&D in Australia (1.8% of GDP, about the same as the EU average). Ten per cent of R&D is done in industry.

Arnold showed a map of Australia and its universities. There are great expanses of land with no university. The universities are in areas which have a big population. There are 520,000 full-time equivalent (FTE) students, 63,000 full time staff and 11,000 part time staff. Sixty per cent of students are under 24 and 55% are female. FTE students in science courses number 56,000 EFTS (11%). Teaching and research staff in science number 3,200 (5% of total staff, and decreasing). Of the FTE science students, 9,000 are in the chemical sciences. There are 45 departments with B.Sc. and Ph.D. programs in the chemical sciences. All schools offer PhDs but not all of them are viable. Schools have 3-25 FTE academic teaching staff, compared with 3-40 in 1990. There are about 1200 chemical majors and about 150 PhDs a year. The largest chemical library had a serials budget of 0.5 million Australian dollars in 1999. There are 200,000 Australian dollars worth of cancellations for 2000. Chemical Abstracts is beyond the budget of most libraries; Beilstein is extinct and CrossFire is too expensive. Constant factors are change, and increasing student load, decreasing staff numbers, decreasing job security and conditions, decreasing resources, and flexible teaching and learning, also known as “teaching by the Web”.

So, what about a shared national chemical information infrastructure? In 1996, the National Library of Australia (NLA) met with CAUL (Council of Australian University Libraries), the Royal Australian Chemical Institute (RACI), the Commonwealth Scientific and Industrial Research Organization (CSIRO) and a few chemists including the OzChemNet coordinator. This led to a grant from the Australian Research Council research infrastructure and equipment program (RIEFP) for the MetaChem project. MetaChem partners include most of the “gang of 8” and most big universities. There was moral support from NLA and RACI.

What is MetaChem? It is a WWW gateway to national and international print and electronic sources of chemical information: a single starting point for efficient resource recovery and retrieval. Sites were to be evaluated, described, classified and indexed so a database of metadata was set up. Information about each site, links to it and its level of authority are included. MetaChem is a searchable, extendable, value-added model for other disciplines.

MetaChem tools include the metadata database (public domain in mSQL), the metadata harvester, metadata creators and editors, and a metadata aware search engine. The harvester (“can we harvest your site?”) is from DSTC, a Distribution Systems and Technology Center spun off from University of Queensland. Editors include Reggie in Java (Mac-free zone) and Reg in Javascript. These are also from DSTC. Search engines are HotMeta (DSTC to HTML) and a browser, HyperIndex.

Which pages are of high value to the community? Whose metadata do we trust? That of chemists, who are creators of the original resources but are not metadata aware, or that of subject specialist librarians, or that of professional societies such as RACI, ACS, and RSC? Metadata is Dublin Core (qualified), EdNA and AdminCore. Dublin Core includes Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage and Rights. EdNA includes Function, Availability (whether it is licensed etc.), UserLevel, and Review. AdminCore is metadata about metadata: CreatorPersonal, and DateCreated. Rzepa has worked on chemical metadata; have others? Librarians seem to be resistant to a chemical scheme; they just use a subject.
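To make the Dublin Core description concrete, here is a minimal sketch of how such a record might be embedded in a Web page as HTML meta tags. It assumes the common `DC.` name prefix for Dublin Core in HTML; the report does not say how MetaChem itself stores or serializes its records, the function name is arbitrary, and the example values are invented.

```python
from html import escape

# The fifteen Dublin Core elements.
DC_ELEMENTS = [
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
]


def dublin_core_meta(record):
    """Render a dict of Dublin Core values as HTML <meta> tags."""
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    lines = []
    for element in DC_ELEMENTS:  # keep a stable, predictable order
        if element in record:
            value = escape(record[element], quote=True)
            lines.append(f'<meta name="DC.{element}" content="{value}">')
    return "\n".join(lines)


# Invented example values, for illustration only.
tags = dublin_core_meta({
    "Title": "Laboratory safety notes",
    "Creator": "A. Chemist",
    "Language": "en",
})
```

A harvester can then read these tags from a page without needing to understand the rest of its content.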

Web sites, research reports, course work, personnel data, (e)journals, theses, conferences, software, molecules, spectra etc. were selected. Evaluation was then needed. A PICS filter is one way of describing the quality of a Web page (no nudity, no sex, no violence, no bad language). You need a PICS label on your site if Australian students are to have access. But who does the evaluation and what are the rating criteria?

Arnold described the HotMeta search engine and HyperIndex browser and showed some screens. A little icon at bottom left of the MetaChem page shows Bobby approval (accessibility for the disabled). Arnold showed a laboratory safety example and some metadata for one of the Web sites (a readable, meaningful display of description, publisher etc.). The HyperIndex browser allows you to refine your search. Arnold showed the form that can be used by authorized metadata creators.

What next? They have been working on MetaChem for a year. Now there has been a new proposal from the players. The government wanted to set up a network of collaborative information centers, consortia of libraries, and one-stop shops for access to research information. A pilot project was set up. The players were in the National Scholarly Communications Forum (NSCF) which includes vice chancellors, academics, CSIRO, NLA, CAUL etc. They propose to set up Janus centers for chemistry, agriculture, and philosophy. (Janus is the god of doors, gateways, and beginnings.) They will draw on international developments and enhance Internet links (e.g., iMesh, an Internet community of collaborative centers). These centers will have two components, Subject Gateways and Call Centers. A Subject Gateway will give direct access to virtual collections. It will perhaps evolve from MetaChem and will allow collaborative purchasing. A Call Center will offer mediated help from subject experts. It will be available 24 hours a day, 7 days a week (unlike chemistry librarians).

Issues to be resolved are funding for sustainability (MetaChem was funded for just one year); quality of research content; integrated print and electronic access; duplication of effort; cross-gateway searching and tools; and archiving and persistent naming (there are PURLs but no-one uses them).

Incorporating Multimedia into Chemistry Courseware. Examples from Oxford University

Karl Harrison, Oxford University

Harrison’s focus is on teaching: this is unusual for a UK university. He incorporates multimedia in chemical courseware, using, for example, Apple Quicktime, MDL Chime, Macromedia Flash and Shockwave. The courseware questions students’ understanding of chemistry. Harrison gave a courseware example: the structure of solids. This is a “virtual practical”. It used to be done with classic models. Atoms are shown in a cube and the cell is rotated. The student is asked “With reference to Figure 4 confirm that percent space filling = 74%” etc. This is the traditional approach. How can we move this forward and why should we? The object is to give the student instant feedback, to complement factual knowledge (testing through drills), to introduce problem solving, and to illustrate difficult concepts.
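
The 74% figure in the student exercise can be checked with a short calculation. The sketch below assumes that Figure 4 shows a face-centred cubic (cubic close-packed) cell of touching hard spheres; the report does not say which structure the figure depicts, so that is an assumption, and the function name is hypothetical.

```python
import math


def fcc_packing_fraction() -> float:
    """Fraction of a face-centred cubic cell filled by touching hard spheres."""
    r = 1.0  # sphere radius; the result does not depend on its value
    # Spheres touch along a face diagonal: 4r = a * sqrt(2), so a = 2 * sqrt(2) * r.
    a = 2.0 * math.sqrt(2.0) * r
    # 8 corner atoms shared by 8 cells plus 6 face atoms shared by 2 cells.
    atoms_per_cell = 8 * (1 / 8) + 6 * (1 / 2)
    sphere_volume = (4.0 / 3.0) * math.pi * r ** 3
    return atoms_per_cell * sphere_volume / a ** 3


if __name__ == "__main__":
    print(f"{100 * fcc_packing_fraction():.1f}% of the cell is filled")
```

The same result follows analytically as pi divided by (3 times the square root of 2), which is about 0.74.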

Methods could have included the use of off-the-shelf packages and development environments e.g., CASTLE, WebCT and Question Mark to make customized WWW pages. WebCT is quite expensive and needs a change in teaching culture. Question Mark is not cheap and all departments need to collaborate well. Also, 3D information is not easily incorporated. CASTLE was free to academics in the UK but is not good enough.

There are other ways of creating your own tests. HTML and JavaScript, DHTML, and embedded multimedia (using plug-ins such as Director) were easiest for Harrison since he is not a programmer. CGIs and Java, Microsoft Active Server Pages (ASP) plus SQL databases, and advanced programming are other options. In the early days, HTML gave static WWW pages. Nowadays dynamic pages are possible: the Web server program code sends a newly generated HTML page to each different client, and the client Web browser sends data to the Web server and requests new dynamic HTML pages.
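
The request and response cycle behind a dynamic page can be sketched in a few lines. The example below uses a Python WSGI application purely as a stand-in for the CGI and ASP technologies named in the talk; the function name, the query parameters and the one-question answer key are all hypothetical.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical answer key: question id -> correct option.
ANSWER_KEY = {"q1": "b"}


def quiz_app(environ, start_response):
    """Build a fresh HTML page from the data the browser sent with this request."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    student = query.get("student", ["anonymous"])[0]
    question = query.get("question", ["q1"])[0]
    answer = query.get("answer", [""])[0].strip().lower()
    verdict = "Correct" if ANSWER_KEY.get(question) == answer else "Try again"
    # Escape user-supplied text before placing it in the page.
    page = f"<html><body><p>{escape(student)}: {verdict}</p></body></html>"
    body = page.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    # Serve on a local port for manual testing with a browser.
    from wsgiref.simple_server import make_server
    make_server("localhost", 8000, quiz_app).serve_forever()
```

Requesting /?student=Ann&question=q1&answer=b from this server would return a page generated for that one request, which is the difference from a static HTML file.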

Harrison has used HTML and JavaScript in the following ways:

  • simple URL link
  • simple URL link within frames
  • simple URL link with JavaScript alert
  • image maps
  • Forms
    • radio button with JavaScript
    • check box with JavaScript
    • data entry numbers with JavaScript
    • data entry text with JavaScript

He gave some examples. The first used a simple URL link within frames and concerned complex ions. It uses a Quicktime video library of 70 reactions. The student clicks on cobalt and gets test tubes for cobalt chloride and sodium hydroxide. There is a dark precipitate; is it (a), (b) or (c)? The student can go backwards and forwards in the virtual experiment. This is part of the “virtual laboratory” but it does not replace wet chemistry. Experiments do go wrong and students need to know that.

The second example, using forms, radio button and JavaScript, was also from the virtual chemistry laboratory. It concerns rates of chemical reactions. This is a chapter of a text book. There is an animation illustrating activation energy: a little ball goes over a hill then runs down a slope. Multichoice questions are not ideal but they do reinforce concepts. Wrong answers and repeated attempts are not monitored and there is no feedback.

The third example, using the Director tool and data entry text with JavaScript, concerned electron counting. This is a whole lecture course online, a course on mechanisms of organometallic reactions and catalysis. It teaches electron counting and the 18-electron rule. The quiz at the end tests the student’s knowledge. Harrison tested the audience’s knowledge and his own. He tried three questions and got only the third one right. A score of 33% was displayed. The score is not recorded.

In another example, DHTML plus JavaScript with frames were used to keep score. JavaScript controlled the quiz. Harrison showed a picture of an element from the Royal Society of Chemistry’s periodic table, Visual Elements. He entered “H”, guessing, correctly, that this was a picture of hydrogen. A score of 100% was displayed in a “thermometer”. A related example questions the student’s knowledge of Valence Shell Electron Pair Repulsion (VSEPR). At the end of the test the student prints off a WWW page and his practical tutor signs it off. There are 200 molecules in the test and the student must answer 20. The test is designed so that it is easier to get it right first time than to do it over and over again.
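
The two mechanics described here, drawing 20 questions from a pool of 200 and showing a running percentage, can be sketched as follows. Harrison's quizzes were written in JavaScript; this Python version only illustrates the logic, and the function names are hypothetical.

```python
import random


def draw_questions(pool, count=20, seed=None):
    """Pick `count` distinct questions at random from the pool."""
    rng = random.Random(seed)  # a seed makes a given test reproducible
    return rng.sample(list(pool), count)


def percent_score(correct, attempted):
    """Score as a whole-number percentage, as shown on the 'thermometer'."""
    if attempted == 0:
        return 0
    return round(100 * correct / attempted)


if __name__ == "__main__":
    pool = [f"molecule-{n}" for n in range(1, 201)]  # stand-in for the 200 molecules
    print(draw_questions(pool, seed=1)[:3])
    print(percent_score(1, 3))  # one right out of three attempts
```

Because each test is a fresh random draw, repeating the test means facing mostly new molecules, which fits the stated aim of making it easier to get it right first time.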

Harrison also demonstrated the use of ASP and SQL databases. Dynamic HTML creation and use of the database monitors student information. If all the students get a certain type of structure wrong in the VSEPR test, it may be that some part of the course is not being taught well. This is where monitoring comes in useful. Chemscape Chime is used for rotating VSEPR shapes and models. The quiz requires that you type in your name and password. Harrison showed 15 molecules generated. He was asked for the shape of boron trifluoride. He chose “trigonal” from a pull down menu and chose and rotated the 3D shape. During this sort of procedure, the system is recording where the student is going wrong.
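
A minimal sketch of this kind of monitoring is shown below. It uses Python's built-in sqlite3 module in place of the ASP and SQL database setup Harrison described; the table layout, function name and sample attempts are hypothetical.

```python
import sqlite3


def error_rates(attempts):
    """Return (shape, error_rate) rows, worst-answered shape first.

    `attempts` is an iterable of (student, molecule, shape, correct) tuples,
    where `correct` is 1 for a right answer and 0 for a wrong one.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE attempts "
        "(student TEXT, molecule TEXT, shape TEXT, correct INTEGER)"
    )
    conn.executemany("INSERT INTO attempts VALUES (?, ?, ?, ?)", attempts)
    rows = conn.execute(
        "SELECT shape, 1.0 - AVG(correct) AS error_rate "
        "FROM attempts GROUP BY shape "
        "ORDER BY error_rate DESC, shape"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    sample = [
        ("ann", "BF3", "trigonal", 1),
        ("bob", "BF3", "trigonal", 0),
        ("ann", "CH4", "tetrahedral", 1),
        ("bob", "CH4", "tetrahedral", 1),
    ]
    for shape, rate in error_rates(sample):
        print(f"{shape}: {rate:.0%} wrong")
```

A shape with a high error rate across many students points at the teaching rather than at any one student, which is the use of monitoring described above.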

Interactive Web Page Development with Chime and JAVA

Robert Lancashire, University of the West Indies

Lancashire showed a traditional chemistry department home page. His own department’s page was developed around local themes. He uses Jamaican themes for his lecture material, e.g., extraction of bauxite, chemistry of spices, jerk pork, the chemistry of ginger. Chime structures are embedded. Coffee and exotic fruit and vegetables are featured.

He showed some course related lecture material: his first year class is run using Netscape frames. Notes are on the left and Chime examples on the right. He talked about his paper in the Summer 1999 CONFCHEM conference concerning spectroscopy of first row transition elements. Tanabe Sugano diagrams need ratios of heights of lines. You jump to a Java applet for clicking on a graph to get the required ratios. The University of the West Indies gets many visitors because the JCAMP-DX data viewer for Windows (95, 98, and NT) is from there. Chime spectra only work on PCs at the moment but much of Chime 3 will be Java based and that will solve the current problems. Lancashire displayed a spectrum on a page with molecule vibrations on the left.

SPARTAN, HyperChem, and Gaussian can be used to predict the vibration modes. There is a paper in J. Chem. Ed. about this and details are also on the university Web sites. Lancashire showed some script. Then he showed mass spectra animations using Chime 2 with JCAMP DX MS files. He referred to Prof Eric Martz’ site at the University of Massachusetts which has the MORPH program. In the demonstration a fragment of mass 42 broke off (in a Chime box) alongside a spectrum with a peak at 42.

Lancashire’s site has been going for five years during which time users of 180,000 machines have visited. There are 4000-5000 files on site. Lancashire is starting a trial with staff at the Nathan Boddington building in Leeds. This will have a Java server, Microsoft SQL Server, and dynamic pages.

Learning Polymer Science Over the Internet: The Polymer Science Learning Center

Lon J. Mathias, University of Southern Mississippi

Mathias spoke first about Macrogalleria on the University of Southern Mississippi Web site. This started as a shopping mall of polymer stores and information. The Macrolab is an online laboratory, and Polydelphia, a city, will be developed over the next few years. They really want to change the world: make polymer information free, make it useful, interesting and fun.

Macrogalleria is the polymer exploratorium; the Macrolab is a pre-lab. They use Hot Potatoes, a quiz generating program from Half-Baked Software. A materials database is being generated. Virtual field trips are polymer expeditions. The Story of Rubber (done in conjunction with the Chemical Heritage Foundation) can go back in time, and can go to all sorts of countries virtually. The designers hired a “videographer” to help with this.

A site called “How Polymers Work” is for high school students. “Contests” is also used by high schools: it contains a competitive aspect. A Kids Stuff page is being developed including safe labs for use in the kitchen. There is a Training Guide to which other teachers can contribute. Mathias demonstrated with a tour of the Macrogalleria: a cyberwonderland of polymer fun. Translations into other languages are being carried out.

Level 1, “polymers are everywhere”, is set up like a shopping mall. Polyisobutylene is manipulated in Level 2. Level 3 introduces Chime concepts. Level 4 is synthesis and mechanism; Level 5 characterization. Levels 6 (industrial processes) and 7 are under development. The developers also want to tie in engineering and kinetics.

Is the site useful? Dozens of colleges use it and several high schools. The site generates thousands of hits. Hundreds of individuals buy CD copies at $30. People offer to translate it free. Industrial sites buy it (or steal it). Why does it work? It meets a real need. Multimedia is changing education. This site is focused on content, not on flashiness. It is conversational and fun. People like it. The designers are not out to make money and they don’t have a hidden agenda. The site is fun and students love it. Fancy, flashy stuff does not work; the site must be rich in content. It is the multimedia and the browser that matters, not the Internet which is just a wire. You don’t have to have Internet access to use HTML. This product is system independent. It requires exploration and interactivity. You must pay attention to nonlinear linking. There are many different pathways to the same information, providing reinforcement in learning. The system is multisensory, viewer centered, multidimensional and multilevel. Education is learning, not teaching. Next, Polydelphia, the city of the future, will be built.

SPARK - A Tool for Discovering Structure-Property-Activity Relationship Knowledge

Yvonne C. Martin, Abbott Labs

Computational chemists have to cope with medicinal chemists regularly coming in and wanting, for example, CLOGP run on a compound. Such programs are often too complicated for bench scientists to run. For the Computer-Aided Molecular Design (CAMD) group the problem is that only one person may know how to run a certain program and he may not be available. Capabilities are not reusable in a different context. So, the goals of the SPARK project were to provide validated desktop calculations of molecular properties for project scientists, to provide tools to discover relationships between properties and activity, and to integrate lots of programs.

The first solution was a Web page. More than 300 scientists have requested passwords for this and logP calculations appear to be done routinely. The technology used was Perl scripts (DayPerl) generating HTML. Maintenance is time consuming and every new calculation becomes harder to integrate, resulting in “spaghetti code”. Components are not reusable. There are no graphs and statistics and the system is tied to the Daylight view of the world.

The system currently being developed uses Java beans. It is being done in collaboration with Cherwell Scientific: Abbott is mainly wrapping applications. They purchased charting and graphing from the KL Group. They are on the hunt for statistics software but if they find nothing else they may wrap SAS functions.

Martin gave an overview of SPARK capabilities, marking those functions that are being handled by Abbott. Others are done by Cherwell Scientific. Files and manually edited data, results of database searches and merging of disparate sources form the structure and data input. (Abbott handles search.) SPARK structure processing includes substructure search, similarity search, searching and counting, transformation to products, and calculation of molecular properties. The structure processing functions are being integrated by Abbott, except counting which is done by Cherwell. Data can be viewed as a spreadsheet, a compound sheet, or master details. Output is in the form of HTML and files such as SMILES and SD files. Output to databases has not yet been written. The statistics and graphs functions are all missing. Scatter plots, histograms, fit to a line, comparing lines and distributions, and cluster observations all need to be added.

Property calculations being handled by Abbott include those done by Biobyte, Syracuse, ACD, CONCORD and Tripos software. Property calculations from Cherwell include substructure count, Daylight fingerprints, molecular weight and molecular formula. Structure searching is performed with RS-cubed and ORACLE: structures retrieved by Abbott number are read in from a file or typed in. Substructure and similarity searches are possible. The structure transformation software reads in a list of structures and uses software from Barnard Chemical Information to make Daylight fingerprints. SMIRKS from reactions are entered into ChemDraw. Structures are then transformed using the Daylight toolkit into parallel synthesis libraries and substituents.
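
To give a feel for the simplest of these property calculations, the sketch below computes a molecular weight from a molecular formula. It is not the Cherwell or Abbott code, which the report does not show: the function name is hypothetical, only a handful of elements are included, and formulas with parentheses or charges are not handled.

```python
import re

# Rounded standard atomic weights for a few common elements only.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007,
    "O": 15.999, "F": 18.998, "Cl": 35.45,
}

_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")  # whole formula, e.g. C2H6O
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")     # one element symbol plus count


def molecular_weight(formula: str) -> float:
    """Sum atomic weights for a simple formula such as 'C2H6O'."""
    if not _FORMULA.fullmatch(formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    total = 0.0
    for symbol, count in _TOKEN.findall(formula):
        if symbol not in ATOMIC_WEIGHTS:
            raise ValueError(f"no atomic weight stored for {symbol!r}")
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


if __name__ == "__main__":
    print(f"{molecular_weight('C2H6O'):.3f}")  # ethanol
```

Wrapping each calculation as a small function with a fixed input and output, as here, is the kind of reusable component that the Perl-generated Web pages lacked.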

Martin showed a screen with properties on the left alongside a spreadsheet containing structure SMILES and caco-2 numeric results. Numbers such as number of carboxylic acid groups can be shown as bars using Cherwell software, alongside structures. Next Martin showed a screen of bars alone: no correlation could be seen in these properties. The user may, however, really want a graph. Martin showed a plot and clicked on a menu to produce a log/log plot. The font can be changed in this plotting package, color can be changed and you can have more or fewer lines. No programming is needed. You can click on a point and get a highlighted row in a spreadsheet. This is the advantage of the Java tool. Wouldn’t it be nice to be able to click on a structure and do a literature search?

Abbott have been implementing the demonstrated system for a year. The experience has had its negative aspects. In practice it can take two weeks to wrap an existing program, although it should be easy. There can be performance problems with such a multi-component system: everything must be up and running. The language is immature and there are no statistical beans yet. There have been many pluses, though. Wrapped programs are running on SGI, Sun and NT but the user sees only one platform. Several programmers can be working on a project at once without getting in each other’s way. The reuse of components has been demonstrated. Purchased components fit in nicely and provide good functionality. Uses of SPARK and related systems include property calculations and statistics graphing for project scientists, forming and analyzing QSAR, using SAR in the design of parallel libraries, and organizing CAMD software.

Development and Growth of the Rohm and Haas Intranet - Chemists and Web Culture

Thomas Pierce, Rohm & Haas

Goals of the Rohm and Haas (RH) intranet are communication, collaboration, and continuous education. Are there any effective strategies or tactics? You tend to use new technologies for what you did yesterday. The good news about Web communities at RH is the diverse set of users; the bad news is that little chemical information is stored or shared through the Web. Existing communities are communicating externally and internally across geographically diverse sites. They are collaborating globally with like-minded colleagues and share the same collaboration technology.

Pierce outlined some history. In 1993, at RH there were more than 3 Web servers, all under UNIX. In 1995, there were more than 25 UNIX, Novell and NT Web servers. In 1997, there were more than 150 Web servers, handling 50,000 documents for 5000 users. By 1999, more than 250 servers under NT and UNIX are handling 120,000 documents, plus 20,000-40,000 confidential ones. There are 11,000 RH staff. About 8000 used the intranet every month this year, i.e., nearly all the knowledge workers in the company, including about 500 Ph.D. chemists and 300-400 engineers. Some 35% of usage is at 3 research sites; 65% is spread across about 20 sites. There are 12,000 people at Morton, in Chicago, which RH bought. The Morton intranet had one Web page. Now the intranet gets two million hits a month and everyone is using it about 10 times a month.

The intranet is being used for tactics rather than strategy. Web usage requires a culture change which may take years. Tactics work for today; strategies can go wrong. Successful tactics include communication (mail lists, Web sites, search engines) and collaboration (NetMeeting). Chemical information sharing is a failed tactic. Web applications include mailing lists, distribution lists, USENET mailing lists and Web archives, INN for News, Majordomo and MHonArc. The search engine is Ultraseek. There is some scientific and engineering software available: POLYPAK for co-polymer sequences, legacy software, part of a polymer toolkit, DIPPR, and a Web front end. Subject search engines (now called “portals”) include a Yahoo-like hand built one, hand updated and database driven. There is also Gulliver. Pierce showed an opening screen with the corporate home page at the top and the bottom two thirds customizable.

One Web application is Web video for scientific seminars, executive speeches, and diverse corporate training. For Web video, Real Network G2 was evaluated against Microsoft Netshow/mediaplayer. The corporate group decided on mediaplayer in August 1998. Video usage at RH is very popular: 5000 videos were started in August 1999 and usage has risen from 40 to 600 people. People prefer these presentations (with slides etc.) to plain video.

Pierce gave some history showing the increasing implementation of Web applications. In 1996 the phone book and a search engine were offered. In 1997, MSDSs, POLYPAK and ocWeb appeared. In 1999, with IE4, Active X and XML were developed and released to the desktop. There are 20+ applications now and the number of business applications with databases is growing monthly. New tools and desktop software matter.

In future, the company hopes to blend all components of RH with similar computer technologies: Internet Explorer, NetMeeting, Video etc. Desktop computers control which technologies are shareable. Legacy systems will be made available through a Web interface, but will chemical structures and molecular graphics be used? There will be more collaborative technologies and tactics but Pierce is not sure that chemical structures are considered important.

Concept and Realization of Bayer’s Integrated Chemistry Information System on the Corporate Intranet

Achim Zielesny, Bayer AG

Zielesny summarized the evolution of computing: mainframe computing in 1960; personal computing in 1980; worldwide computing in 2000; intelligent computing in 2020; and self aware computing in 2050. Scientific IT systems have also evolved:

1990 terminal/host (ORAC, CASP, Resy at Bayer)

1995 PC-Client/server (Integrated Chemical Information System, ICS)

1996 Web technology (ICS Web - static then dynamic applications)

A strategic decision was made to develop the Bayer intranet, ICS Web. It reduced maintenance efforts and offered easy and worldwide access, the highest level of integration and user friendliness. Upper management have looked at the system.

Zielesny showed the design goals in the form of a diagram with ICS in a circle of arrows linked up, with chemical structures, patents, research reports, library services, images, CD-ROM services, reactions, and substance properties all around. In his talk he did not cover the technical problems: IT people tend to live in the future and talk about “real soon now” systems. So, Zielesny showed things that had been set up by early 1999, are used by 500 chemists, and really work.

First of all he talked about chemical structure based systems, based on real connection tables, not pictures. Substance property prediction is done in collaboration with ACD/Labs. So are names and spectra. He showed crossover to STN Easy and text based systems. The user can construct Boolean searches or use a profile set up by an expert. Results are displayed in tables and there is a link to archive systems of 2.5 million scanned patents and research reports and 1 million other documents. This is not the world’s biggest archive but it is one of the fastest, using a fast juke box and caching services to retrieve a document within one second. Fast printing is very important.

The CD-ROM systems are very heavily used. There are 150 concurrent users worldwide producing 5000 to 10,000 sessions a day. Zielesny demonstrated literature tracking, and display of a PDF file or an order form for sending to the library. The virtual library would be a lecture in itself. Zielesny showed the bookstore and order desk, a Table of Contents for electronic journals and a very attractive display with an image which can be clicked.

Next he turned to ICS Web on the Bayer intranet. The “glue” is the ICS Switchboard which is usually invisible. There are 8 million static HTML pages on the intranet. Zielesny gave a (canned) demonstration. First the system checks that you have the right browser and plug-ins. Access to 1.5 million searchable structures requires Chime. More than 2000 chemists have been trained in ISIS/Draw. Zielesny cut a structure from a results table and rotated the 3D structure.

The system showed that the compound was made in 1987 and it gave a boiling point. Say the user doesn’t believe the boiling point on file: he can predict it. There are pull-down menus for the properties offered by ACD/Labs. Zielesny ran ACD/Name then started an NMR calculation with a Java applet. He demonstrated structure spectrum correlation: when you click on a proton the related signal in the spectrum is marked. These are systems for the use of laboratory chemists.

The results table said that a report is available, so Zielesny went to ICS Research Reports to see an abstract, then displayed the scanned original. He went back to the original display and saw that there was a CAS Registry Number. He left the intranet and went to the Internet to access STN. Unfortunately it takes five to ten times longer to search the Internet compared with the intranet. Zielesny showed an STN Easy search (with the real waiting time) then logged off and showed that the cost was 40 DM.

He showed an e-commerce application on the intranet. Authentication was carried out to get supplier data. A Material Safety Data Sheet can be printed. Zielesny demonstrated the ordering system. He searched for a vacuum pump: 26 vacuum pumps were listed. He “bought” a vacuum pump and another item then looked in his shopping basket before really buying any items. He checked the status of his previous purchase orders.

He returned to the ICS home page and searched patents. There is a neural network classification and class population can be shown as histograms. The ICS page locator was demonstrated. Although this is a local application it works with the Web applications. All the possible connections that Zielesny demonstrated are available today.

Cheminformatics has to be linked to bioinformatics, biology and biotechnology. In future a bioinformatics integrated layer will be built and also a cheminformatics integration layer. Chemistry is done with ICS and the electronic technical library links to Bayer research reports and patent information. Thus intellectual property and knowledge management are included in the system. All this would be impossible without Web technology.

An organizational consequence of all this is that Zielesny’s team has become an intranet/Internet Competence Center. They have made a successful step into worldwide computing. Next will come the semantic Web and XML systems. Then they will have to make a step into intelligent computing. Bayer wants all this free on the Web. So far it has involved about 15 man years of work.

Panel Discussion on Corporate Internet/Intranets

Panel Chairman:
Stephen R. Heller, NIST

Panel Members:
Joe Cerro, Bayer USA
Phil Kutzenco, Cytec
Yvonne Martin, Abbott Labs
Thomas Pierce, Rohm & Haas

Cerro said that social/cultural issues are as important as technical ones. Bayer US has been added to the corporate intranet. Cerro mentioned documentation, communications (a discussion board), a corporate data repository and third party issues. Data from Lion AG and Incyte is accessible.

Kutzenco gave a Cytec profile. There are 5000 employees including about 3000 knowledge workers (judging from email registrations). The Cytec Internet site is centralized and controlled by Public Relations. It is “brochureware”. It is designed by graphic artists. It is in English only. The company is preparing to add i-Business and extranets. In contrast, the corporate intranet is distributed, both static and active in content, and developed by providers and users, although it too is in English only.

Cytec has a home page banner at the top of its intranet site as does Rohm and Haas. Kutzenco listed selected uses: a laboratory procedure safety system, plant trial design and analysis software distribution and tutorial, remote operation of plant analytical equipment, remote sensor monitoring of LabView modules, CySmart sales and cost data warehouse (via Cognos WebReports and PowerPlay), Xerox Docushare, O’Reilly WebBoard, and Net-It Central. The last three applications are relatively inexpensive collaborative facilities. This sort of thing is replacing Lotus Notes.

Kutzenco listed some topics concerning the opportunity offered by an intranet. The first is the impact of browser standardization. Content management is much less centralized or policed by Rohm and Haas than by other companies. There may be concern about legal or licensing issues: the Cytec logo must not be near other graphics and the company wants care to be exercised in the use of other people’s logos.

Some topics for panel discussion were listed. What is our area of responsibility: only the technical community (chemists and engineers)? How are we using the intranet and what problem does it solve? Who “owns” the Web in your company? Is it supported by Public Relations and IT geeks? XML is becoming popular in e-commerce. Who are the publishers and how do people publish in your company? Everyone publishes in Rohm and Haas. How big is your intranet? Compare your corporate intranet with your corporate Internet, which might only offer brochureware.

Martin described the Abbott Computer-Aided Molecular Design intranet Web page. They want to put up content: structures, ChemDraw etc. The system calculates lots of properties. It generates tautomers, ionization states, and CONCORD 3D structures. More than 300 people have requested passwords. It is used in drug discovery, drug safety and pharmaceutics.

Martin listed some internal issues: What security is needed? Who administers this? Management thinks the system is less secure than the VAX system, for example, for giving out a structure for an Abbott number. Who keeps the servers running? Who can set up a server? Who keeps the links current? Who establishes rules for publishing and control of content? There are legal issues such as copyright, revisions, and out-of-date items. Who owns the files and who can edit them?

Kutzenco also wondered who checks for copyright violations etc. Cerro said that there is an IBM spider doing it but Bachrach pointed out that the crawler won’t get into Abbott’s site. Rzepa said that metadata is useful for checking on copyright but it is little used. Most people doing pages are amateurs. Cerro agreed that probably there are all sorts of “illegal” things going on.

Bachrach asked which companies have formal training concerning Web use. Cerro said that in Bayer even the CEO is supposed to be trained if he uses Microsoft FrontPage. Kutzenco said that in Cytec the Webmaster at each location is trained. Martin knew of no help in Abbott concerning intellectual property. In Rohm and Haas, management is responsible for what their staff publish but there is no requirement to take classes. Training in FrontPage is offered but it is not obligatory.

In response to a question, Pierce said that discussion forums are used even less in house than publicly. Discussion boards have a six month life time. It is hard to get people to write things “permanently”. Cerro said collaborations give rise to extranets for discussions but separation is necessary for secrecy reasons. Martin agreed that email, personal contacts, and meetings are used for communications rather than discussions. Email threads die out after four messages.

Kutzenco finds discussion groups are useful within an active team especially if there is a geographic spread. The method works well for the life of a team. Cerro confirmed that Bayer also uses O’Reilly WebBoard. The advantage over email is that you can see the history of a discussion. The product has an entry level price of only $60 and it runs on a low end Pentium. It works on the Internet. Nine out of ten messages on GCG are spam. You can embed HTML links in messages. Someone asked about collaborative whiteboards. Cytec uses NetMeeting but the slowest link is the rate-determining step. Rohm and Haas has 1000 users. There was a question about archiving of discussion. Kutzenco replied about document retention (and destruction) policy. Cerro pointed out that there are always the back-up tapes and quoted a case from Ernst and Young where the project failed.

Kutzenco turned the discussion towards browser standardization, brand loyalty and features. Martin said that Abbott’s IT people decided on Internet Explorer. The decision makes VRML impossible. Internet Explorer cannot be used on SGI and some UNIX boxes. Some people feel that you should write to the common denominator. Kutzenco says that people keep both browsers on the machine so they can use VRML. Someone in the audience pointed out that you want to reach everybody so you have to test with many different environments. It is a big problem to get something that works on every platform. And what about the 5 versions of Java? Cerro agreed that there has to be a balance as regards the bleeding edge. Do you need to use this neat new technology? Rzepa feels we are paying the price of creeping featuritis in Internet Explorer and Netscape because of competition in the mid-nineties. Cerro said that application service providers such as E-bioinformatics could make an infrastructure available. This could be used to provide a Bayer-type system.

The IBM Intellectual Property Network: An Internet Resource to Search, View, Retrieve Copies of Patents (and Patent Applications) from Around the World

Stephen Boyer, IBM Almaden Research Center

Boyer outlined the history of the IBM Intellectual Property Network, beginning with US bibliographic data, then addition of maintenance status in 1997. Patent Cooperation Treaty (PCT) and European Patent Office (EPO) patents were added beginning in 1998. Japanese Patent Office (JPO) patents, Technical Disclosure Bulletins, and Inpadoc data have been added this year, as well as e-commerce, pink dots, class browsing, assignee browsing, machine translation and clustering. You can download a copy of S/3 for e-commerce. A pink dot is a link to the inventor or assignee, which helps people license technologies. Inventors can tag their products. All IBM patents are pink-dotted. There are two sites, a free one and the Gold site (IP for business). There is auto-translation into 8 languages.

Today the site is used as a document retrieval service. Tomorrow it has the potential to evolve into a more productive resource. We need tools to access and integrate with other remote collections, to perform business-to-business transactions and to provide increasing awareness (via alerting).

As a resource for business professionals, IBM is introducing Mapuccino, a new visualization tool. Boyer showed a big star diagram from this and the citation linking tool. These tools are available on the business site. Boyer gave a data mining example. He did an asthma search from which 241 documents were listed. He clicked on “classify” and the documents were put into 26 “buckets”. He clicked on “map”, which started a Java frame. In topic 12, 6 documents were listed, all of them SRS antagonists. Remember, these were clustered not using the term “SRS antagonist” but just on the search term “asthma”.

Low-end charting is being introduced. The most referenced patents, the most active assignees in any class etc. can be visualized. Boyer showed a more complicated chart, the 3 axes being number, International Patent Classification (IPC) class, and assignee. Subclasses can be viewed in the same way. The work is done on the server and the search is downloaded.

English can be translated into French, German, Spanish, Italian, Danish, Japanese and two versions of Chinese. IBM are working on reverse translation with another company. Translation is not exposed to the main site: with one million hits a day, scalability has to be considered. Four new servers were needed just to address translation. This new service should be available soon.

Boyer talked about patent families, the class browser, assignee browsing, and technology neighborhood, which is related to C/R. For example, AHA software has just one patent. C shows the ones from other companies in the same class as AHA’s. R is the number of references in a patent. The ratio of the number of references to the number referred is the technology neighborhood for a company.

Boyer concluded with a big network diagram. A clone of the IBM site has been sold to the patent offices in Canada, The Netherlands, the UK, Taiwan, and Switzerland. The Swiss wanted to offer the gold service to customers. Getting data in for 53 countries is a phenomenal effort. If IBM handled 192 countries, it would be a mammoth task. Why not let the individual patent offices do what IBM did but for their own customers or country? Above the green line is an e-commerce service.

The ISI Web of Science patent links are another example. Every patent in the database is hyperlinked with respect to references. ISI can do document delivery on non-patent documents. Boyer indicated the business-to-business level and retail level in the diagram. A prototype has been built for you to do a search on “x” then click to search “x” on other sites. For mapping across sites we need to develop ICE standards: metacrawlers can then work. You have to build thesauri so that mapping can be done across all the sites and languages. The biggest challenge is getting cooperation and establishing standards.

Between nine and fifteen people do daily production for IBM. Patents in on Tuesday are available on Thursday. There is a Verity index and there are dedicated modes for gold customers who access secure pages.

The NIST Chemistry WebBook

Gary Mallard and Peter Linstrom, National Institute of Standards and Technology (NIST)

The goals were to produce a simple tool for access to all NIST data, an access point for high quality computed data and a tool for evaluation, to distribute data automatically to software and to be a central point for chemical data. They stay away from synthetic organic chemistry because of lack of resources. The definition of “simple” in the American Heritage Dictionary includes the terms “easy to use”, “not adorned”, and “not elaborate or luxurious”. The NIST WebBook aims to be “simple”. It can be used effectively even with a fairly slow modem. The Web has the most diverse audience ever served. What is simple to some may be complex to others. What is user friendly to some may be time wasting to others.

Information without context is not useful. Mallard discussed some changes in functionality in the WebBook. The original tool accessed thermodynamic data by name and formula. The facility has now been moved to fully hyperlinked searching of all compounds. Partial formula search has been added as well as full search of all data in a given citation, graphical display of spectral data, graphical display of numerical data as a function of temperature and pressure, graphical and tabular display of equations of state calculations, search by author of a paper, and exact and substructure search using a user-drawn structure.

Recent additions include gas phase ion energetics data, gas phase thermochemical data, condensed phase thermochemical data, phase change thermochemical data, and reaction thermodynamic data. The following are being slowly added: IR spectra, mass spectra, fluid property data sets, vibrational electronic spectra and electronic levels, spectral constants of diatomic molecules and UV spectra. There are about 32,000 compounds. The number of compounds is not increasing but the amount of data is.

The number as printed in the literature can be misleading. Compiled data is literature data corrected for changes in reference state, auxiliary values, and standard states. Data is also evaluated. In the preliminary evaluation, the computer checks for data consistency. In the detailed evaluation, an expert does cross correlations and extensive calculations, and refers to the original literature.

The current edition of WebBook has been online for 300 days. Mallard quoted some user statistics for this period. One page with a lot of images is only one piece of information, so a total number of successful requests of 11,463,619 actually equates to 4,809,049 transactions. There are 300,000 users a year; 10,633 users a week. There were 6,138 new users (hosts) in the week before the ChemInt conference. WebBook is used all over the world. IP addresses are broken down as follows:
 

Origin          IP address    Visits
Unresolved      22.3%         21.5%
Nominally US    53.7%         52.9%
Non US          24.1%         25.6%
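
The usage figures quoted above lend themselves to a quick arithmetic check. The following is a minimal sketch in Python using only the numbers reported in this section; the function and variable names are invented for illustration and have nothing to do with how NIST compiles its statistics. It works out how many successful requests make up one transaction on average, the daily transaction rate over the 300 days, and the column totals of the breakdown (the IP address column appears to sum to slightly more than 100%, presumably because of rounding).

```python
# Figures quoted by Mallard for the 300 days the current edition has been online.
successful_requests = 11_463_619  # every file served, images included
transactions = 4_809_049          # pieces of information actually delivered
days_online = 300


def requests_per_transaction(requests: int, delivered: int) -> float:
    """Average number of successful requests behind one transaction."""
    return requests / delivered


def per_day(total: int, days: int) -> float:
    """Average daily rate over the reporting period."""
    return total / days


# Percentage shares from the breakdown: (by IP address, by visits).
breakdown = {
    "Unresolved": (22.3, 21.5),
    "Nominally US": (53.7, 52.9),
    "Non US": (24.1, 25.6),
}


def column_totals(table: dict) -> tuple:
    """Sum each percentage column, rounded to one decimal place."""
    ip_total = sum(ip for ip, _ in table.values())
    visit_total = sum(visits for _, visits in table.values())
    return round(ip_total, 1), round(visit_total, 1)


if __name__ == "__main__":
    ratio = requests_per_transaction(successful_requests, transactions)
    print(f"{ratio:.2f} requests per transaction")
    print(f"{per_day(transactions, days_online):.0f} transactions per day")
    print("column totals (IP address, visits):", column_totals(breakdown))
```

On these figures a transaction corresponds to a little over two successful requests on average, which is consistent with the remark that a page with many images counts as only one piece of information.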

Mallard charted three years' use of WebBook. There is a dip every Christmas and use is lower in the summer when academics access data less, but there is an underlying steady rise in use. Usage may be reaching saturation point: Mallard does not believe that there are one million chemists, and not all of them need the WebBook. About 40% of the IP addresses are from visitors returning, about 60% are new. The people who come back more than once take more of the data each time. System uptime is 99.9%. Minor maintenance has been done at each change of versions: downtime is usually less than 20 minutes. Use was continuous during firewall installation.

Data to be added in the immediate future includes additional entropies and heat capacities; a revised set of heat capacities for liquids; half cell potentials; limited mixture data; more mass, IR and UV spectra; chromatographic retention data; and gas phase chemical kinetics data. Prediction methods for thermochemistry by revised Benson methods and prediction methods for vapor pressure of pure compounds will be added. Name search will be improved with partial names. There will be a link to the computational site at NIST.

ChemSymphony Beans for Chemistry Clients and Database Access

Lewis Jardine, Cherwell Scientific

The aim for ChemSymphony is to become the standard Java toolkit solution for chemistry. ChemSymphony beans are chemistry-aware Java Beans components. They are meant for building clients for corporate intranets and for creating advanced interactive Web pages. Benefits are cross-platform portability, flexibility, easy integration and lower risk.

ChemSymphony Beans 1.1 (the current version) has 2D and 3D renderers and sketchers; other display components; controls and customizers for the rendering, sketching and display beans; a periodic table and a 3D fragments library; load file, save file, and data transfer beans; and file format filters for MOL, XYZ, CML etc. Very soon Rasmol scripting support, SQL database access, and configuration facilities will be added. Access into the system will be better, for example, rotational scaling of renderers will be accessible.

MetaSymphony has a novel data relationship model. It captures relational linkages between data and allows assimilation and collation of data from different sources, e.g., Daylight, SQL databases via JDBC, flat files and computational engines. The integration of computational engines is at the planning stage at Cherwell Scientific but Abbott have already done some work in this area (see Martin’s paper above).

The user interface involves a spreadsheet viewer. Combined schema are displayed as a tree. Simple drag and drop selection is offered. The user interface supports access to database searches and allows integration of existing programs. Jardine demonstrated the directory tree of schema. SMILES strings can be changed into chemical structures. He pulled an item from the directory tree onto the right hand side of the screen and it made a new column in the spreadsheet. A column may contain bars rather than numbers, to aid in understanding. There are simple statistical tools, e.g., bars can be moved around the arithmetic mean. Thus the user can pick out immediately what is close to center or what deviates most.
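
The idea of showing a column as bars around the arithmetic mean can be sketched in a few lines. The Python below is only an illustration of the general technique, not Cherwell's implementation (ChemSymphony is a Java toolkit); the function names and the text rendering are assumptions. It computes each value's signed deviation from the mean, draws a crude text bar to the left or right of a center line, and reports which entry lies closest to the center and which deviates most.

```python
from statistics import mean


def deviations_from_mean(values: dict) -> dict:
    """Map each label to its signed deviation from the arithmetic mean."""
    centre = mean(values.values())
    return {name: value - centre for name, value in values.items()}


def closest_and_furthest(values: dict) -> tuple:
    """Return the labels nearest to and furthest from the mean."""
    devs = deviations_from_mean(values)
    closest = min(devs, key=lambda name: abs(devs[name]))
    furthest = max(devs, key=lambda name: abs(devs[name]))
    return closest, furthest


def text_bars(values: dict, width: int = 20) -> list:
    """Render one text bar per entry, left of '|' if below the mean, right if above."""
    devs = deviations_from_mean(values)
    largest = max(abs(d) for d in devs.values()) or 1.0  # avoid dividing by zero
    lines = []
    for name, dev in devs.items():
        length = round(abs(dev) / largest * width)
        left = ("#" * length).rjust(width) if dev < 0 else " " * width
        right = "#" * length if dev > 0 else ""
        lines.append(f"{name:<10}{left}|{right}")
    return lines


if __name__ == "__main__":
    # Made-up numbers standing in for one spreadsheet column.
    column = {"cmpd-1": 1.0, "cmpd-2": 2.0, "cmpd-3": 4.0}
    print("\n".join(text_bars(column)))
    print("closest, furthest:", closest_and_furthest(column))
```

The values in the usage example are invented purely to exercise the functions.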

Jardine gave a diagram of the MetaSymphony architecture and connectivity. In the center is the data relationship model. There are bidirectional links to adapters (such as the Daylight and SQL database adapters) and to flat files. Front end depictors include the spreadsheet viewer. A pointer to the data is held, rather than the data itself, until the actual data needs to be displayed.
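
Holding a pointer rather than the data itself is a form of lazy loading. A minimal sketch of that pattern follows, written in Python for brevity although MetaSymphony itself is Java; the class names `LazyValue` and `DictAdapter` are hypothetical and are not taken from the product. The adapter hands out pointers cheaply, and the underlying source is read only the first time a value actually has to be displayed.

```python
class LazyValue:
    """Holds a way to fetch a value (the 'pointer') instead of the value itself."""

    def __init__(self, fetch):
        self._fetch = fetch      # zero-argument callable that reads the source
        self._loaded = False
        self._value = None

    @property
    def loaded(self) -> bool:
        return self._loaded

    def get(self):
        """Fetch the value on first use, then reuse the cached copy."""
        if not self._loaded:
            self._value = self._fetch()
            self._loaded = True
        return self._value


class DictAdapter:
    """Hypothetical stand-in for a data source adapter; counts actual reads."""

    def __init__(self, records: dict):
        self._records = records
        self.reads = 0

    def pointer(self, key) -> LazyValue:
        def fetch():
            self.reads += 1
            return self._records[key]

        return LazyValue(fetch)


if __name__ == "__main__":
    adapter = DictAdapter({"row-1": "CCO", "row-2": "c1ccccc1"})
    cells = [adapter.pointer(key) for key in ("row-1", "row-2")]
    print(adapter.reads)    # nothing has been read from the source yet
    print(cells[0].get())   # the first display triggers the read
    print(adapter.reads)
```

The design choice is that building a large spreadsheet costs little up front, because only the cells that are actually shown touch the database or file behind them.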

Jardine listed some features of the system. It saves results as flat files (e.g., SDfiles) or HTML. Images are output using JPEG or PNG. Users can enter and apply equations. Configurations can be stored between sessions. You can use database computational engines on data from other sources without needing to create a database (although creating a database might speed things up) and use functionality on the fly.

The system has considerable flexibility. Any type of data can be captured. The architecture is modular, with many hooks. Other display components can be added. Programmers can integrate the data from existing programs using Java interfaces. Non-Java programs can be integrated. Jardine acknowledged the efforts of Yvonne Martin’s team of experts at Abbott in establishing functional requirements and doing focused beta testing. He thanked Daylight Chemical Information Systems for providing access to Merlin and Thor, and for technical support. In summary, Cherwell Scientific can offer an intuitive user interface, an adaptable component architecture, platform independence, a generic data relationship model, and integration with other programs, adding up to a powerful solution.

Enhanced CACTVS Browser of the Open NCI Database: Second Round

Marc C. Nicklaus, NIH/NCI

Co-authors were Wolf-Dietrich Ihlenfeldt and Frank Oellien at the University of Erlangen-Nürnberg. The National Cancer Institute (NCI) data has been put into a browser for public use. (Some compounds obviously have to be excluded and the Developmental Therapeutic Program is separate.) The system is called Erlangen/Bethesda Data and Online Services. The Spring 98 version is still the current one on the Web. Nicklaus was using a new URL which is for the developers only.

He demonstrated form-based query entry. Also available is a Structure Based Focusing (SBF) toolkit for subsetting from large databases etc. New features include a new GUI, continuation of search, new substructure search facilities, more and improved searchable criteria, logP data (KOWWIN) and experimental logP for some compounds, Wagener’s druglikeness factor and Poroikov’s PASS activity spectrum predictions. There are links to other services for processing of search results. MOL files can be output. Chime, a 3D Java viewer, VRML, and spreadsheet formats for hitlists are offered. SMILES output is now unique. 3D pharmacophore search allows up to 25 conformers per compound.

Nicklaus demonstrated (live) the Query Form page of the new version. There are 4 search fields. NSC number, CAS Registry Number and Formula options all have pull-down menus and the three data fields can be searched in one go. There are many criteria in a pull down list, e.g., molecular weight, PASS, logP. Exact structure etc. is under NSC, CAS Number, and Formula boxes. At the top of the screen, to the right, are “Negate”, “Query” and “Value”. Nicklaus demonstrated NSC input producing lots of displayed data and a structure. If there are many hits, you get a list. A display button allows you to put structures into the list/table. There are various viewing options, e.g., ChemSymphony, Chime, VRML. The buttons “data retrieval”, “visualization”, and “external service” appear at the top of the screen above the spreadsheet.

Next, Nicklaus demonstrated substructure search. The Ertl structure editor is used for query entry. He clicked to transfer the structure to the query form. The relevant substructure is highlighted in the hit display. It is also highlighted in the VRML and Chime displays. CORINA is used for 3D structure generation. At the moment, 3D pharmacophore search requires an external file construction method, e.g., Catalyst or MDL software. The team is working on overlays. The alpha version will be finished in about a month; beta-testing will then be carried out (by volunteers from the public) until year end.

Future plans include a hit list manager, extended 3D search (centroids, points, lines etc.), visualization of 3D search criteria in hit structures, more physicochemical data, and a “rule of 5” facility. Complete PASS data will be made available: 100 million data points. There will be more complete support of SMARTS and absolute SMILES. The new Ertl structure editor (a Java molecular editor) will allow disconnects. More 3D capabilities and embedded COMPARE-like functionality will be added. (COMPARE is an NCI program for activity patterns for 60 cell lines.) Unfortunately the NCI raw data had no stereochemistry.

Closing Remarks

Henry Rzepa, Imperial College, London

Yogi Berra said that in a time of change, the future isn’t what it used to be. Today we are in the world of macro and micro business electronic aggregators. Henry has bought shareware through an aggregator that can charge as little as $10. Will there be a university aggregator? He watches this area with great interest. What about virtual courses in education: is there scope for e-commerce there? We are into a data market where we will share data too. “Tidy” will change your Web site to XHTML. There will be many one-stop shops.

Now that journals and databases are coalescing, how do you distinguish them? In the world of macro and micro publishing, why publish at all? Should information be totally free? In this last conference session we saw a quarter of a million free molecules. This is reminiscent of Weininger’s cult dividers at last year’s meeting.

Rzepa turned to the infamous impact factor. We should separate the impact factor for good science from the impact factor for bad science. We need tools, adaptive tools, and components from more than one vendor. Retraining costs with changing software are a big problem. There will be major advances here.

The lack of chemistry in corporate intranets, and “brochureware” in Internets were mentioned at this year’s conference. Henry is intrigued that recorded lectures are so popular. How many talks like those this weekend have been missed? How do you index and search for talks? The Internet and TV will become indistinguishable. Good high quality multimedia is ferociously expensive in the education sector. Sociological issues have an impact on the need to publish. Yogi Berra said when you come to a fork in the road, take it!

This page updated on 8th May 2000