Warrzone 2005


Twenty Five Years of Progress in Cheminformatics


Paper given at the Spring 2005 ACS National Meeting held in San Diego, California, in the symposium honouring Peter Willett, winner of the ACS Award for Computers in Chemical and Pharmaceutical Research.

Wendy Warr & Associates, 6 Berwick Court, Holmes Chapel, Cheshire, CW4 7HZ, England. Tel/fax +44 (0)1477 533837. Email: wendy@warr.com

It is unusual for a consultant to talk about the past: it is better for business to talk about the future. Another danger in this presentation is the impossibility of covering 25 years of scientific achievement in 30 minutes. This, therefore, is just a personal selection, omitting, amongst other things, progress in chemical reaction searching and prediction, synthesis design, online information, patents, computer-aided structure elucidation, and QSAR. The reason for picking a 25-year period was Peter Willett's arrival at Sheffield in 1979, about 25 years ago. I have been involved with J. Chem. Inf. Comput. Sci. (now J. Chem. Inf. Model.) for about 18 years (on the Editorial Advisory Board from 1987 and Associate Editor since 1989) so my talk is probably biased towards papers published in that journal, the leading publication in cheminformatics. The Department of Information Studies in Sheffield, founded by Michael Lynch in 1965, figures prominently in this presentation and it should be noted that it would still have been prominent even if this talk had not been specifically dedicated to Peter Willett.

From 1972 until 1991 I worked for ICI Pharmaceuticals Division. Back in 1979, many, if not most, pharmaceutical companies stored their chemical structures in the form of Wiswesser Line Notation (WLN).

WLN used only the letters and numbers of a standard typewriter keyboard and the "one structure-one WLN" canonicalisation was achieved by application of an esoteric set of rules. WLNs could be searched using the CROSSBOW system (Computerised Retrieval of StructureS Based On Wiswesser) developed at ICI Pharmaceuticals but also used widely outside that company. CROSSBOW search featured two screening steps prior to the CPU-intensive atom-by-atom search stage. Fragments and connection tables were generated from the line notation. The first screening stage was fragment search and the second was search of the WLNs. A highly significant feature (for a system of that period) was display of the hit structures, again done using keyboard characters.

It is worth mentioning here that much of the early research into efficient fragment screening was done (before 1979) by Lynch's team at Sheffield. During the 1970s, controversy continued over the pros and cons of WLN and connection tables; use of WLN was on the decline by the early 1980s. Chemical notations, however, are not dead: SMILES is still in widespread use (Weininger, D. SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31-36; Weininger, D.; Weininger, A.; Weininger, J. L. SMILES. 2. Algorithm for Generation of Unique SMILES Notation. J. Chem. Inf. Comput. Sci. 1989, 29, 97-101) and users of Tripos' SYBYL system use SYBYL Line Notation (SLN).

The "compact connection table" was first developed by D. J. Gluck of du Pont (Gluck, D. J. A Chemical Structure Storage and Search System Developed at Du Pont. J. Chem. Doc. 1965, 5 (1), 43-51). Gluck stated that they had disclosed the details of the system for the benefit of the entire chemical community and offered them to the American Chemical Society (Chemical Abstracts Service) for the design of their system for storage and retrieval of chemical structures. Harry L. Morgan at CAS then published the well-known algorithm (Morgan, H. L. The Generation of a Unique Machine Description for Chemical Structures - a Technique Developed at Chemical Abstracts Service. J. Chem. Doc. 1965, 5, 107-113) that became fundamental to the CAS Registry System.

Another connection table (Wipke, W. T.; Dyott, T. M. Stereochemically Unique Naming Algorithm. J. Am. Chem. Soc. 1974, 96(15), 4834-4842) formed the basis of the MACCS system of Molecular Design Limited (later MDL Information Systems, and now Elsevier MDL). In the 1980s, many pharmaceutical companies converted their files of WLNs into MACCS connection tables. Incidentally, the CAS connection table does, of course, handle stereochemistry nowadays (Blackwood, J. E.; Blower, P. E.; Layten, S. W.; et al. Chemical Abstracts Service Chemical Registry System. 13. Enhanced Handling of Stereochemistry. J. Chem. Inf. Comput. Sci. 1991, 31, 204-212).

Two other early systems must also be mentioned. The NIH-EPA Chemical Information System, developed by Feldmann et al. (Feldmann, R. J.; Milne, G. W. A.; Heller, S. R.; Fein, A.; Miller, J. A.; Koch, B. An Interactive Substructure Search System. J. Chem. Inf. Comput. Sci. 1977, 17 (3), 157-163), was in operation well before the launch of CAS ONLINE but I will not dwell on it since it is pre-1979 and I have excluded online information from this talk. Another early system was DARC (Description, Acquisition, Recherche et Corrélation, according to one early version of the acronym) developed by the team of Dubois in Paris (Dubois, J. E. French National Policy for Chemical Information and the DARC System as a Potential Tool of this Policy. J. Chem. Doc. 1973, 13, 8-13). Early publications about DARC appeared in French in Bull. Soc. Chim. France in 1968. Using DARC, the CAS Registry could be searched on a single mainframe even before the earliest incarnation of CAS ONLINE. DARC was a potential competitor to MACCS (and DARC-RMS to REACCS): that MDL won the battle for the market was in large part due to marketing inexperience rather than technical inferiority on the part of DARC.

To bring the structure representation story right up to date, I have to mention the IUPAC International Chemical Identifier, InChI (see the paper by Heller in this report). Canonical numbering for InChI was modified from McKay, B. D. Practical Graph Isomorphism. Congressus Numerantium 1981, 30, 45-87, by Dmitrii Tchekhovskoi, working for Steve Stein at NIST.

Graphics systems for displaying structures were also being developed in the 1970s (and even 1960s). Corey and Wipke's seminal publication (Corey, E. J.; Wipke, W. T. Computer-assisted Design of Complex Organic Syntheses. Science 1969, 166 (3902), 178-192) on a system later called LHASA, precedes my 1979 timeline. Todd Wipke, Jeff Howe, Dick Cramer and Bill Jorgensen were all involved early on in the LHASA project. In the middle stages, Harry Orf, Alan Long, Peter Johnson and Stewart Rubenstein worked with Corey (see http://lhasa.harvard.edu/?page=LHASApublications.htm). The project had a significant impact on my own career when chemists learnt of the possibilities for interacting with chemical search systems themselves without the intermediation of an expert in the black art of WLN. This led to pressure to implement a system such as MACCS in-house.

Early graphics systems used very expensive terminals, such as the Imlac terminal we first used with MACCS at ICI. At £20,000 each (a lot of money in 1982), these were far too costly for putting on the benches of even a few chemists. Later implementation of MACCS on the reasonably priced DEC VT640 terminal was part of the strategy of ICI and Molecular Design Limited. Equally or more important was the development of a faster version of MACCS (called "Big MACCS" in-house) done by MDL but using the substructure searching techniques well understood at Sheffield University and ICI.

The next significant hardware development was that of microcomputers, with their WIMP (Windows Icons Mice Pointers) technology. Stuart Marson, CEO of MDL, first introduced me to the mouse in 1984 before we made our way to a Chemical Structure Association conference at the University of Sheffield. The days of the VT640 were numbered. The early IBM PC read real floppy disks (the ones "the mailman could bend"). Tetrahedron Computer Methodology, a revolutionary electronic journal, using MDL's "Chemist's Personal Software Series" (CPSS) was issued on these floppy disks. It was ahead of its time. It perhaps failed through human factors rather than technological ones, e.g., the time taken to review articles does not depend only on the availability of software. The last issue appeared in 1992 (with a highly significant issue as will be seen later), although it was dated 1990, to meet the calendar requirements of a more old-fashioned publishing culture.

The method of searching 2D structures is based on graph theory techniques, in which the nodes and edges of a graph are used to denote the atoms and bonds of a molecule. A subgraph isomorphism algorithm is used to carry out an atom-by-atom search, after an initial screen search. As long ago as 1977, Gund had suggested that such methods could be applied to 3D structure searching, with the nodes and edges of a graph being used to represent the atoms and inter-atomic distances in a molecule. Unfortunately, at that time, few 3D structures of interest were available for searching and the algorithm proposed was, in any case, too slow.
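The two-stage idea described above (a cheap screen followed by an atom-by-atom search) can be illustrated with a minimal Python sketch. It is not the algorithm of any system named in this paper: the element-count screen, the simple backtracking matcher and the names adjacency, screen and substructure_match are all assumptions made for illustration, and bond orders, rings and stereochemistry are ignored.

```python
from collections import Counter

def adjacency(atoms, bonds):
    """Build a neighbour table from a list of (atom, atom) bonds."""
    adj = {a: set() for a in atoms}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def screen(q_atoms, t_atoms):
    """Cheap screen: the target must hold at least the query's element counts."""
    need, have = Counter(q_atoms.values()), Counter(t_atoms.values())
    return all(have[el] >= n for el, n in need.items())

def substructure_match(q_atoms, q_bonds, t_atoms, t_bonds):
    """Backtracking atom-by-atom search.

    Atoms are dicts {index: element}; bonds are lists of index pairs.
    Returns one query-to-target atom mapping, or None if there is no match.
    """
    if not screen(q_atoms, t_atoms):
        return None  # rejected by the screen, no expensive search needed
    q_adj, t_adj = adjacency(q_atoms, q_bonds), adjacency(t_atoms, t_bonds)
    order = list(q_atoms)

    def extend(mapping):
        if len(mapping) == len(order):
            return dict(mapping)
        q = order[len(mapping)]
        for t in t_atoms:
            if t in mapping.values() or q_atoms[q] != t_atoms[t]:
                continue
            # every query neighbour mapped so far must also be bonded in the target
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping):
                mapping[q] = t
                found = extend(mapping)
                if found is not None:
                    return found
                del mapping[q]
        return None

    return extend({})

# Hydrogen-suppressed ethanol (C-C-O) as the target, C-O as the query
ethanol_atoms = {0: "C", 1: "C", 2: "O"}
ethanol_bonds = [(0, 1), (1, 2)]
hit = substructure_match({0: "C", 1: "O"}, [(0, 1)], ethanol_atoms, ethanol_bonds)
```

The screen costs only a count of elements, so structures that cannot possibly match never reach the backtracking stage, which is the expensive part.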

Experimentally measured 3D structures have been stored for many years in the Cambridge Structural Database (CSD), the world repository of small molecule crystal structures:

Kennard, O.; Watson, D. G.; Town, W. G. Cambridge Crystallographic Data Centre. I. Bibliographic File. J. Chem. Doc. 1972, 12, 14-19.
Allen, F. H.; Bellard, S.; Brice, M. D.; Cartwright, B. A.; Doubleday, A.; Higgs, H.; Hummelink, T.; Hummelink-Peters, B. G.; Kennard, O.; Motherwell, W. D. S.; Rodgers, J. R.; Watson, D. G. The Cambridge Crystallographic Data Centre: Computer-based Search, Retrieval, Analysis and Display of Information. Acta Cryst. 1979, B35, 2331-2339.
Allen, F. H. The Cambridge Structural Database: a Quarter of a Million Structures and Rising. Acta Cryst. Section B 2002, 58, 380-388.

The Cambridge Crystallographic Data Centre, which builds CSD, was founded in 1965 (the same year as the Sheffield department). The database now holds 335,276 structures for 303,733 different compounds. This number is, however, very small compared with the number of known compounds so the appearance of programs to generate 3D structures from 2D ones was a significant development. CONCORD (Pearlman, R. S. Rapid Generation of High Quality Approximate 3D Molecular Structures. Chemical Design Automation News, 1987, 2, 1, 5-7) and CORINA (Hiller, C.; Gasteiger, J. Ein Automatisierter Molekülbaukasten. In Software-Entwicklung in der Chemie, Gasteiger, J., Ed.; Springer: Berlin, 1987; Vol. 1; pp. 53-66; Gasteiger, J.; Rudolph, C.; Sadowski, J. Automatic Generation of 3D Atomic Coordinates for Organic Molecules. Tetrahedron Comput. Methodol. 1990, 3, 537-547) are still in regular use today.

Willett's team designed a screening system for searching large databases of chemical structures (Jakes, S. E.; Willett, P. Pharmacophoric pattern matching in files of 3D chemical structures: selection of inter-atomic distance screens. J. Mol. Graph. 1986, 4, 12-20; Cringean, J. K.; Pepperrell, C. A.; Poirrette, A. R.; Willett, P. Selection of Screens for Three-dimensional Substructure Searching. Tetrahedron Comput. Methodol. 1990, 3, 37-46) and demonstrated the applicability of the Ullmann algorithm to the 3D subgraph isomorphism problem (Brint, A. T.; Willett, P. Pharmacophoric Pattern Matching in Files of 3D Chemical Structures: Comparison of Geometric Searching Algorithms. J. Mol. Graph. 1987, 5, 49-56).

Teams at Merck and CAS adopted similar approaches in the late 1980s. ALADDIN (Van Drie, J. H.; Weininger, D.; Martin, Y. C. ALADDIN: an Integrated Tool for Computer-assisted Molecular Design and Pharmacophore Recognition from Geometric, Steric, and Substructure Searching of Three-dimensional Molecular Structures. J. Comput.-Aided Mol. Design 1989, 3 (3), 225-251) and MENTHOR (Martin, Y. C.; Danaher, E. B.; May, C. S.; Weininger, D. MENTHOR, a Database System for the Storage and Retrieval of Three-dimensional Molecular Structures and Associated Data Searchable by Substructural, Biologic, Physical, or Geometric Properties. J. Comput.-Aided Mol. Design 1988, 2 (1), 15-29) were other early systems in the field. Willett and Martin reviewed 3D searching in Martin, Y. C.; Willett, P. Three-dimensional Chemical Structure Handling. Tetrahedron Comput. Methodol. 1990, 3, 525-774. (Note that I have now made several references to Tetrahedron Computer Methodology.)

These early systems took no account of flexibility of molecules, searching only one low energy conformation for each molecule. Systems such as ChemDBS3D from Chemical Design Limited overcame this problem by storing a small number of low energy conformers. Willett's team investigated the use of upper and lower bounds for each edge of the graph. They also investigated a range of techniques for the final stage of 3D searching: a conformational searching procedure to ensure that the hits at the previous stage are truly feasible (Clark, D. E.; Jones, G.; Willett, P.; Kenny, P. W.; Glen, R. C. Pharmacophoric Pattern Matching in Files of Three-dimensional Chemical Structures: Comparison of Conformational-searching Algorithms for Flexible Searching. J. Chem. Inf. Comput. Sci. 1994, 34, 197-206). They concluded that the so-called "directed tweak" approach (Hurst, T. Flexible 3D Searching: the Directed Tweak Technique. J. Chem. Inf. Comput. Sci. 1994, 34, 190-196) seemed to be the most effective. Later, the Catalyst system used the alternative approach: storing a small number of conformers to represent a quasi-exhaustive set, as opposed to generating additional conformers on the fly at the search stage (Smellie, A.; Kahn, S. D.; Teig, S. L. Analysis of Conformational Coverage. 2. Applications of Conformational Models. J. Chem. Inf. Comput. Sci. 1995, 35, 295-304).

Similarity searching presents an alternative to traditional substructure searching (Willett, P.; Barnard, J. M.; Downs, G. M. Chemical Similarity Searching. J. Chem. Inf. Comput. Sci. 1998, 38, 983-996). Adamson and Bush were the first to suggest that compounds with fragments in common might be similar (Adamson, G. W.; Bush, J. A. A Method for the Automatic Classification of Chemical Structures. Information Storage and Retrieval 1973, 9, 561-568) but some twelve years passed before two papers appeared within months of each other describing the use of fragments in similarity searching (Carhart, R. E.; Smith, D. H.; Venkataraghavan, R. Atom Pairs as Molecular Features in Structure-activity Studies: Definition and Applications. J. Chem. Inf. Comput. Sci. 1985, 25, 64-73; Willett, P.; Winterman, V.; Bawden, D. Implementation of Nearest-neighbour Searching in an Online Chemical Structure Search System. J. Chem. Inf. Comput. Sci. 1986, 26, 36-41). Since then, the use of fragments to represent chemical structures, together with the Tanimoto coefficient to measure the similarity between pairs of molecules, has been adopted very widely in operational systems.
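As an illustration of fragment-based similarity, the following Python sketch computes the Tanimoto coefficient c / (a + b - c), where a and b are the numbers of fragments in two molecules and c is the number they share, and uses it to rank a small file against a query. The fragment labels, the toy database and the function names are invented for the example, and returning zero for two empty fragment sets is an assumed convention.

```python
def tanimoto(frags_a, frags_b):
    """Tanimoto coefficient between two collections of fragment identifiers."""
    a, b = set(frags_a), set(frags_b)
    if not a and not b:
        return 0.0  # assumption: two empty descriptions are treated as dissimilar
    common = len(a & b)
    return common / (len(a) + len(b) - common)

def nearest_neighbours(query, database, k=3):
    """Rank database entries by decreasing similarity to the query."""
    scored = [(tanimoto(query, frags), name) for name, frags in database.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]

# Hypothetical fragment sets; the labels are placeholders, not real screen codes
database = {
    "mol1": {"C-C", "C-O", "C=O"},
    "mol2": {"C-C", "C-N"},
    "mol3": {"C-C", "C-O", "C=O", "C-N"},
}
ranking = nearest_neighbours({"C-C", "C-O", "C=O"}, database, k=2)
```

Here the query is identical to mol1 (similarity 1.0) and shares three of four fragments with mol3 (similarity 0.75), so those two head the ranking.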

Most of the research on 3D similarity searching has focused on the use of either maximal common substructures (MCS) or molecular field overlaps. Both methods not only measure the similarity between two molecules but also provide an alignment. A paper by Crandell and Smith (Crandell, C. W.; Smith, D. H. Computer-assisted Examination of Compounds for Common Three-dimensional Substructures. J. Chem. Inf. Comput. Sci. 1983, 23, 186-197) inspired Willett to develop a better algorithm for identifying common 3D substructures, eventually demonstrating that the Bron-Kerbosch clique detection algorithm was the best generally applicable solution (Brint, A. T.; Willett, P. Algorithms for the Identification of Three-dimensional Maximal Common Substructures. J. Chem. Inf. Comput. Sci. 1987, 27, 152-158; Gardiner, E. J.; Artymiuk, P. J.; Willett, P. Clique Detection Algorithms for Matching Three-dimensional Molecular Structures. J. Mol. Graph. Model. 1998, 15, 245-253). Recent work on 2D similarity is discussed in Willett's award address later in this report.
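The clique-detection approach to common 3D substructures can be sketched as follows. This is a simplified illustration, not the published implementation: a "correspondence graph" is built whose nodes pair an atom of one molecule with a like atom of the other, two nodes are joined when the corresponding inter-atomic distances agree within a tolerance, and the basic Bron-Kerbosch procedure (without pivoting) then enumerates the maximal cliques. The function names, the molecule format and the 0.5 Å tolerance are assumptions.

```python
from itertools import combinations
from math import dist

def correspondence_graph(mol_a, mol_b, tol=0.5):
    """Molecules are lists of (element, (x, y, z)) tuples.

    Nodes pair atoms of the same element; edges join pairs whose
    inter-atomic distances in the two molecules agree within tol.
    """
    nodes = [(i, j) for i, (ea, _) in enumerate(mol_a)
             for j, (eb, _) in enumerate(mol_b) if ea == eb]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i != k and j != l:
            d_a = dist(mol_a[i][1], mol_a[k][1])
            d_b = dist(mol_b[j][1], mol_b[l][1])
            if abs(d_a - d_b) <= tol:
                adj[(i, j)].add((k, l))
                adj[(k, l)].add((i, j))
    return adj

def bron_kerbosch(adj, r=frozenset(), p=None, x=frozenset()):
    """Yield every maximal clique of the graph given as {node: set of neighbours}."""
    if p is None:
        p = frozenset(adj)
    if not p and not x:
        yield r  # r cannot be extended, so it is maximal
    for v in list(p):
        yield from bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v])
        p = p - {v}
        x = x | {v}

# Two hypothetical three-point patterns, the second a translated copy of the first
mol_a = [("N", (0.0, 0.0, 0.0)), ("O", (3.0, 0.0, 0.0)), ("C", (0.0, 4.0, 0.0))]
mol_b = [("N", (1.0, 1.0, 1.0)), ("O", (4.0, 1.0, 1.0)), ("C", (1.0, 5.0, 1.0))]
best = max(bron_kerbosch(correspondence_graph(mol_a, mol_b)), key=len)
```

The largest clique gives the atom-to-atom correspondence of the maximal common substructure, which is why this approach yields an alignment as well as a similarity measure.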

The 1990s were a period of an immense data explosion from genomics, high throughput screening, and combinatorial chemistry. It was during this period that the term "cheminformatics" (or "chemoinformatics") was born to describe the new discipline of storing, retrieving and analysing all these data. This was also the time when the study of molecular diversity (the "opposite" of similarity) came to the fore. Not surprisingly, the team at Sheffield was soon applying its ideas to molecular diversity.

As an example of compound selection, Weininger has been credited with commenting that there are 10^180 possible drugs, 10^18 likely drugs, 10^7 known compounds, 10^6 commercially available compounds, 10^6 compounds in corporate databases, 10^4 compounds in drug databases, 10^3 commercial drugs and 10^2 profitable drugs. (The "Weininger number" of 10^200 possible compounds is sometimes used nowadays.) Apart from the impossibility of testing billions of compounds, random screening poses other challenges: it is too expensive, its hit rate is low, false positives may be a problem, and it consumes expensive compounds. So, computational chemistry methods are used to choose those compounds most likely to be hits.

Selection of diverse subsets may be carried out by clustering (a subject much researched at Sheffield), dissimilarity-based selection (first described by Bawden and Lajiness), partitioning or cell-based approaches (as in Pearlman's Diverse Solutions program, or the 3D approach of Mason's team at Rhône-Poulenc Rorer), and optimisation-based methods (e.g., Martin, E. J.; Blaney, J. M.; Siani, M. A.; Spellmeyer, D. C.; Wong, A. K.; Moos, W. H. Measuring Diversity: Experimental Design of Combinatorial Libraries for Drug Discovery. J. Med. Chem. 1995, 38, 1431-1436).
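Dissimilarity-based selection can be illustrated with a short Python sketch of one common variant, usually called MaxMin: starting from a seed compound, each step adds the candidate whose nearest already-selected compound is furthest away. This is offered as an assumption-laden example rather than as the procedure of any of the authors named above; dissimilarity is taken as one minus the Tanimoto coefficient, and the fragment sets and function names are invented.

```python
def tanimoto(frags_a, frags_b):
    """Tanimoto coefficient between two sets of fragment identifiers."""
    common = len(frags_a & frags_b)
    total = len(frags_a) + len(frags_b) - common
    return common / total if total else 0.0

def maxmin_select(fingerprints, n, seed):
    """Pick n compound names, starting from seed, by the MaxMin rule."""
    selected = [seed]
    while len(selected) < min(n, len(fingerprints)):
        candidates = [name for name in fingerprints if name not in selected]

        def distance_to_subset(name):
            # dissimilarity to the closest compound already selected
            return min(1 - tanimoto(fingerprints[name], fingerprints[s])
                       for s in selected)

        selected.append(max(candidates, key=distance_to_subset))
    return selected

# Hypothetical compounds described by integer fragment codes
fingerprints = {
    "a": {1, 2, 3},
    "b": {1, 2, 4},
    "c": {7, 8, 9},
    "d": {1, 2, 3, 4},
}
subset = maxmin_select(fingerprints, 3, seed="a")
```

Each step compares every remaining candidate with every selected compound, so the cost grows with both the size of the file and the size of the subset; that cost is one reason faster alternatives such as partitioning are attractive for large collections.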

Early clustering work at Sheffield concentrated on the Jarvis-Patrick clustering method (Willett, P. Similarity and Clustering in Chemical Information Systems; Research Studies Press: Letchworth, 1987) but later work favoured Ward's clustering, given an efficient algorithm for its implementation (Downs, G. M.; Willett, P.; Fisanick, W. Similarity Searching and Clustering of Chemical Structure Databases Using Molecular Property Data. J. Chem. Inf. Comput. Sci. 1994, 34, 1094-1102). Cluster-based compound selection is a two-stage process: after the compounds in the data set have been clustered, a representative compound has to be picked from each cluster (Willett, P.; Winterman, V.; Bawden, D. Implementation of Non-hierarchic Cluster Analysis Methods in Chemical Information Systems: Selection of Compounds for Biological Testing and Clustering of Substructure Search Output. J. Chem. Inf. Comput. Sci. 1986, 26, 109-118).

I invited the speakers in this symposium to send me a list of 3-5 papers that they considered had been most significant over recent years. Two papers in the diversity field were mentioned by more than one of us: Brown, R. D.; Martin, Y. C. Use of Structure-activity Data to Compare Structure-based Clustering Methods and Descriptors for Use in Compound Selection. J. Chem. Inf. Comput. Sci. 1996, 36, 572-584, and Brown, R. D.; Martin, Y. C. The Information Content of 2D and 3D Structural Descriptors Relevant to Ligand-receptor Binding. J. Chem. Inf. Comput. Sci. 1997, 37, 1-9. Brown and Martin demonstrated the effectiveness of Ward's clustering and MDL keys and established the 85% similarity threshold above which similar compounds are likely to have similar biological properties.

Reagent-based selection involves choosing subsets of reagents whereas product-based selection involves enumerating the final library and selecting from the resulting products, a more computationally demanding procedure in all. Willett et al. showed that a product-based approach results in a more diverse subset (Gillet, V. J.; Willett, P.; Bradshaw, J. The Effectiveness of Reactant Pools for Generating Structurally Diverse Combinatorial Libraries. J. Chem. Inf. Comput. Sci. 1997, 37, 731-740). This work was confirmed in another study (Jamois, E. A.; Hassan, M.; Waldman, M. Evaluation of Reagent-based and Product-based Strategies in the Design of Combinatorial Libraries. J. Chem. Inf. Comput. Sci. 2000, 40, 63-70).

It was not long before researchers began to realise that diversity in itself was not enough: other factors such as synthetic accessibility, cost of reagents, layout of synthesis blocks, etc., must be considered. In particular, the concept of "druglikeness", as defined in the Lipinski "Rule of Five", became widely recognised (Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings. Adv. Drug Delivery Rev. 1997, 23 (1-3), 3-25).
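A "Rule of Five" filter is simple enough to sketch in a few lines of Python. The thresholds are the commonly quoted ones (molecular weight over 500, calculated log P over 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors); the property values are assumed to have been calculated elsewhere, the class and function names are invented, and flagging compounds with more than one violation is one usual convention rather than a fixed part of the rule.

```python
from dataclasses import dataclass

@dataclass
class Properties:
    """Precomputed properties of one compound (hypothetical container)."""
    mol_weight: float
    log_p: float
    h_bond_donors: int
    h_bond_acceptors: int

def rule_of_five_violations(p):
    """Count how many of the four Rule of Five limits a compound exceeds."""
    return sum([
        p.mol_weight > 500,
        p.log_p > 5,
        p.h_bond_donors > 5,
        p.h_bond_acceptors > 10,
    ])

def is_druglike(p, max_violations=1):
    """Assumed convention: tolerate at most one violation."""
    return rule_of_five_violations(p) <= max_violations

# Illustrative values only, not measurements for real compounds
small = Properties(mol_weight=180.2, log_p=1.2, h_bond_donors=1, h_bond_acceptors=4)
large = Properties(mol_weight=812.0, log_p=6.3, h_bond_donors=7, h_bond_acceptors=12)
flags = [is_druglike(small), is_druglike(large)]
```

In library design a filter of this kind is typically applied alongside a diversity measure rather than instead of it.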

Later still, Hann and Oprea suggested that not "druglikeness" but "leadlikeness" might be a significant issue (Hann, M. M.; Leach, A. R.; Harper, G. Molecular Complexity and its Impact on the Probability of Finding Leads for Drug Discovery. J. Chem. Inf. Comput. Sci. 2001, 41, 856-864; Oprea, T. I.; Davis, A. M.; Teague, S. J.; Leeson, P. D. Is There a Difference Between Leads and Drugs? A Historical Perspective. J. Chem. Inf. Comput. Sci. 2001, 41, 1308-1315; Teague, S. J.; Davis, A. M.; Leeson, P. D.; Oprea, T. The Design of Leadlike Combinatorial Libraries. Angew. Chem., Int. Ed. Engl. 1999, 38 (24), 3743-3748). The scene was set for the development of "fragment-based" screening methods to detect smaller lead molecules with scope for a growth in size at the lead optimisation stage.

High throughput screening was originally oversold as a panacea for providing large numbers of hits that might produce leads but as disillusion started to set in, enthusiasm for virtual high throughput screening increased. Virtual approaches included 2D similarity, 3D database searching and the use of pharmacophores, and docking of ligands into a protein of known structure. Docking actually precedes the field of molecular diversity by many years: Kuntz' team developed the precursor of the DOCK program back in 1982 (Kuntz, I. D.; Blaney, J. M.; Oatley, S. J.; Langridge, R.; Ferrin, T. E. J. Mol. Biol., 1982, 161, 269; Shoichet, B. K.; Kuntz, I.D. Protein Docking and Complementarity. J. Mol. Biol. 1991, 221, 327-46; Shoichet, B. K.; Stroud, R. M.; Santi, D. V.; Kuntz, I. D.; Perry, K. M. Structure-based Discovery of Inhibitors of Thymidylate Synthase. Science 1993, 259, 1445-1450).

Docking presents two particular challenges: producing a method that is fast enough to use for screening very large collections of potential ligands, and developing an accurate scoring function to rank the ligands in terms of best fit in the binding site or to measure the binding energy. The Böhm scoring function is one well known scoring method (Böhm, H. J. The Development of a Simple Empirical Scoring Function to Estimate the Binding Constant for a Protein-ligand Complex of Known Three-dimensional Structure. J. Comput.-Aided Mol. Des. 1994, 8 (3), 243-56). A great many docking methods have been published (AutoDock, DOCK, FlexX, Glide, etc.). GOLD, the program developed by Willett's team (Jones, G.; Willett, P.; Glen, R. C.; Leach, A. R.; Taylor, R. Development and Validation of a Genetic Algorithm for Flexible Docking. J. Mol. Biol. 1997, 267, 727-748) is reckoned to be one of the better methods.

If I had time to venture into further detail about genetic algorithms, pharmacophore mapping, and multi-objective optimisation, I would be starting to move from the history of cheminformatics into systems which are currently being developed. Some of these developments are discussed by other speakers in this symposium. I mentioned earlier that I had invited all the speakers to send me a list of 3-5 papers that they considered had been most significant over recent years. Two papers that were suggested by one speaker but are not in the selection of papers I have cited above are Taylor, R. Simulation Analysis of Experimental Design Strategies for Screening Random Compounds as Potential New Drugs and Agrochemicals. J. Chem. Inf. Comput. Sci. 1995, 35, 59-67; and Young, S. S.; Sheffield, C. F.; Farmen, M. Optimum Utilisation of a Compound Collection or Chemical Library for Drug Discovery. J. Chem. Inf. Comput. Sci. 1997, 37, 892-899. (I have excluded a number of suggestions in the field of QSAR.)

To conclude this presentation, I thought it would be interesting to see which articles in J. Chem. Inf. Comput. Sci. have been most cited and to see if there is any agreement with the papers I myself chose to cite in this presentation. I am deeply indebted to CAS for producing the following list for me. CAS holds citations from journal and other non-patent documents published in the Roman alphabet from 1997 that have been selected for coverage in CAplus, and examiner citations from basic patents from the US, EPO, WIPO, and German patent offices, starting in 1998. Beginning in 2003, patent examiner citations from British and French basic patents are also included.

Up to March 3, 2005, CAS found 14 papers that had received more than 100 citations (ranked from 1, the most cited paper, down to 14, the least cited):

14. Gillet, V. J.; Willett, P.; Bradshaw, J. The Effectiveness of Reactant Pools for Generating Structurally Diverse Combinatorial Libraries. J. Chem. Inf. Comput. Sci. 1997, 37, 731-740.
13. Hall, L. H.; Mohney, B.; Kier, L. B. The Electrotopological State: Structure Information at the Atomic Level for Molecular Graphs. J. Chem. Inf. Comput. Sci. 1991, 31, 76-82.
12. Hall, L. H.; Kier, L. B. Electrotopological State Indices for Atom Types: a Novel Combination of Electronic, Topological, and Valence State Information. J. Chem. Inf. Comput. Sci. 1995, 35, 1039-1045.
11. Ghose, A. K.; Crippen, G. M. Atomic Physicochemical Parameters for Three-dimensional-Structure-directed Quantitative Structure-activity Relationships. 2. Modelling Dispersive and Hydrophobic Interactions. J. Chem. Inf. Comput. Sci. 1987, 27, 21-35.
10. Carhart, R. E.; Smith, D. H.; Venkataraghavan, R. Atom Pairs as Molecular Features in Structure-activity Studies: Definition and Applications. J. Chem. Inf. Comput. Sci. 1985, 25, 64-73.
9. Wessel, M. D.; Jurs, P. C.; Tolan, J. W.; Muskal, S. M. Prediction of Human Intestinal Absorption of Drug Compounds from Molecular Structure. J. Chem. Inf. Comput. Sci. 1998, 38, 726-735.
8. Brown, R. D.; Martin, Y. C. The Information Content of 2D and 3D Structural Descriptors Relevant to Ligand-receptor Binding. J. Chem. Inf. Comput. Sci. 1997, 37, 1-9.
7. Willett, P.; Barnard, J. M.; Downs, G. M. Chemical Similarity Searching. J. Chem. Inf. Comput. Sci. 1998, 38, 983-996.
6. Rogers, D.; Hopfinger, A. J. Application of Genetic Function Approximation to Quantitative Structure-activity Relationships and Quantitative Structure-property Relationships. J. Chem. Inf. Comput. Sci. 1994, 34, 854-866.
5. Weininger, D. SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31-36.
4. Viswanadhan, V. N.; Ghose, A. K.; Revankar, G. R.; Robins, R. K. Atomic Physicochemical Parameters for Three-dimensional Structure-directed Quantitative Structure-activity Relationships. 4. Additional Parameters for Hydrophobic and Dispersive Interactions and their Application for an Automated Superposition of Certain Naturally Occurring Nucleoside Antibiotics. J. Chem. Inf. Comput. Sci. 1989, 29, 163-172.
3. Brown, R. D.; Martin, Y. C. Use of Structure-activity Data to Compare Structure-based Clustering Methods and Descriptors for Use in Compound Selection. J. Chem. Inf. Comput. Sci. 1996, 36, 572-584.
2. Fletcher, D. A.; McMeeking, R. F.; Parkin, D. The United Kingdom Chemical Database Service. J. Chem. Inf. Comput. Sci. 1996, 36, 746-749.
1. Allen, F. H.; Davies, J. E.; Galloy, J. J.; Johnson, O.; Kennard, O.; Macrae, C. F.; Mitchell, E. M.; Mitchell, G. F.; Smith, J. M.; Watson, D. G. The Development of Versions 3 and 4 of the Cambridge Structural Database System. J. Chem. Inf. Comput. Sci. 1991, 31, 187-204.

Allowing for the fact that I deliberately excluded QSAR-related papers, it seems that we speakers made a fairly good job of selecting popular publications. I will leave readers to draw their own conclusions about the perceived "value" of these 14 papers but conclusions could certainly be drawn about authors' reasons for citing an article. The top two papers are "obligatory citations": authors cite them when they use the CSD or the UK Chemical Database Service.

The progress of science depends on researchers building on what has been discovered in the past. In seeing the work of others we get new ideas of our own. Sometimes this may be an example of "standing on the shoulders of giants". In the circumstances of this symposium, it seems fitting to quote an "Irishism" of Mike Lynch's: "Here we sit side by side with those on whose shoulders we stand". Peter Willett has been a worthy successor to Mike Lynch and has brought accolade and success to the department where Mike was an early leader. I offer him my sincerest congratulations on winning the ACS Award for Computers in Chemical and Pharmaceutical Research.

This page last updated 26th February 2006.