The final version of this manuscript is published in “Vegetation History and Archaeobotany”, Online First with Open Access, and is free to download.
R M Fyfe, J-L de Beaulieu, H Binney, R H W Bradshaw, S Brewer, A Le Flao, W Finsinger, T Giesecke, G Gil-Romera, P Kunes, N Kühl, M Leydet
Pollen stratigraphies are the most spatially extensive data available for the reconstruction of past land-cover change. Detailed knowledge of past land cover is becoming increasingly important for evaluating present trends in, and drivers of, vegetation composition. The European Pollen Database (EPD) was established in the late 1980s and developed in the early 1990s to provide a structure for archiving, exchanging, and analysing pollen data from throughout Europe. It provides a forum for scientists to meet and engage in collaborative investigations or data analysis. In May 2007 several EPD support groups were formed to assist in the task of maintaining and updating the database. The mapping and data accuracy work group (MADCAP) aims to produce an atlas of past plant distributions for Europe, in order to meet the growing need for these data from palaeoecologists and the wider scientific community. Because of data handling problems in the past, a significant number of datasets in the EPD contain errors. The initial task of the work group, therefore, was a systematic review of pollen sequences in order to identify and correct errors. The EPD currently (June 2008) archives 1001 pollen sequences, of which 671 have age–depth models that allow chronological comparison. Many errors have been identified and corrected, or flagged for users, most notably errors in the pollen count data. We discuss here the types of errors encountered. The application of spatial analyses to pollen data depends on the number of data points that are available for analysis. We therefore take this opportunity to encourage the submission of pollen analytical results to the relevant pollen database. Only in this way will the wider scientific community be able to gain a better understanding of past vegetation dynamics.
The potential value of palaeoecological and geological databases has increased considerably in recent years, driven by increasing amounts of data and the use of dynamic vegetation models to study the past and forecast the future (Miller et al. 2008, Sitch et al. 2008). Databases such as the European Pollen Database (EPD; www.europeanpollendatabase.net) are now considerably more than long-term data repositories and have become important tools in multi-disciplinary research projects. The large body of European pollen data is widely dispersed in the literature, but when organised into a common format it becomes accessible for research into broad-scale vegetation dynamics and their interactions with climate and the long-term development of human societies. The intention of this paper is (i) to review the development of the EPD and (ii) to highlight efforts made and solutions found to improve this data archive such that it may better serve the wider scientific community in the future. The decision to produce such a paper was driven by the combined aims of (1) presenting the work of a Mapping and Data Accuracy Support Group established in 2007; and (2) setting a background for important discussions surrounding the future development of the EPD. It is not intended as a definitive review, and parts may not reflect the views of the EPD community or those involved in its management.
Pollen stratigraphies are probably the most spatially extensive data available for the reconstruction of past changes in terrestrial and aquatic vegetation composition. In addition to using pollen records to investigate vegetation dynamics at individual sites through time, palaeoecologists have used the large amount of information stored in the database to address a range of scientific questions at regional or continental scales, such as i) the reconstruction of patterns of past climate change through time and space (e.g. Davis et al. 2003), which in turn is important in hindcasting studies evaluating general circulation models (Bonfils et al. 2004); ii) studies on the spread of plants, especially trees, since the last glaciation (e.g. Brewer et al. 2002; Terhürne-Berson et al., 2004; Giesecke and Bennett, 2004; Conedera et al., 2004; Krebs et al., 2004; van der Knaap et al., 2005; Magri, 2008); iii) reconstructions of past plant distribution patterns, which allow us to test our understanding of the factors limiting these distributions and the models that attempt to capture them (e.g. Giesecke et al. 2007); and iv) increasing the precision of reconstructions, as in the POLLANDCAL activities. In addition, knowledge of pollen-inferred past land-cover changes makes it possible to evaluate the consequences and legacies of past land use, and it provides information on the dynamic responses of vegetation to a constantly changing environment. This may allow us to evaluate threats to our natural environment and define aims for the conservation and management of Europe’s landscape (Anderson et al. 2006; Willis et al. 2007).
The increasing understanding of these topics, and the need to analyse spatial patterns, makes it necessary to draw information from more than a single pollen record. Many of these investigations require the availability of datasets because only limited information can be extracted from published printed pollen diagrams. Data archives such as the European Pollen Database (EPD) and other global databases thus play an important role in the collation and archiving of data at the extensive spatial scales needed for analyses at regional or continental scales. The storage of original pollen count data and their associated metadata is therefore important for between site comparisons and spatial analyses. The EPD has become the main archive for provision of the above functions for pollen analytical results from western Eurasia. In addition to serving as a data archive for extensive spatial analyses, the EPD also plays an important role for individual data contributors in mitigating against the inevitable metadata loss (“information-entropy”, sensu Michener et al., 1997) that occurs through time (Figure 1). Both count data and metadata have a natural decay function through time as a result of memory recall, accidental data loss, changes in storage media and subsequent incompatibility, and retirement or death of the original investigators.
Towards the end of the 1980s the IGCP 158b working group led by B. Berglund and M. Ralska-Jasiewiczowa, and palynologists involved in EU research programs on palaeoclimatology (A. Pons, W. Watts, B. Huntley), both needed improved maps of palaeovegetation based on the large sets of pollen data acquired since the 1960s. The work of Huntley and Birks (1983) had demonstrated the value of spatial analysis of pollen data, and had a great influence on many European palynologists. These groups realised the need for a pollen database (the EPD). In 1988 the first EPD-related discussions and meetings took place involving, amongst others, J. Guiot, C. Prentice, B. Huntley, B. Berglund and G. Jacobson. These discussions resulted in a 1989 meeting, convened by B. Berglund, to discuss the organisation of the database, which was attended by representatives of 18 European countries. The proposal by A. Pons to host the EPD in Arles was accepted. Two subsequent workshops in Arles (1989) and Wilhelmshaven (1990) led to the definition of the database software, the administrative structure (an Advisory Board representative of the different European regions, and an Executive Committee of three persons invited to meet every year to review EPD progress: Table 1) and the “protocol” governing the rights and duties of data contributors and users. The EPD and the North American Pollen Database (NAPD) were established simultaneously through active collaboration with Eric Grimm and John Keltner. This was done to ensure compatibility and to contribute towards the ultimate goal of a Global Pollen Database. In 1990, thanks to European and French national funding, R. Cheddadi was appointed to work alongside Joel Guiot as EPD Manager, and in January 1991 the first newsletter was sent to European Quaternary palynologists asking them to contribute their data to the EPD.
The EPD was constructed to provide an inclusive and permanent archival facility to all palynologists for storing the basic data that had been generated within European research. It was anticipated that the EPD would also become a tool by means of which further research on biogeographical, palaeoclimatological and palaeoecological problems could be addressed, at a variety of different spatial and temporal scales. The role of past environmental archives in the understanding of global climate change was clear from the early 1980s; contemporary societal concerns surrounding climate change have resulted in an even greater role for archives of past environments.
The 1990s was a very busy time for the EPD, with numerous training courses organised in Arles and elsewhere. The BIOME6000 initiative stimulated palynologists to share pollen data, and much of the former Soviet Union and Mongolian data was compiled as part of this project, facilitated by funds obtained from the EU (INTAS) to promote the participation of these partners (Prentice et al., 1996; Tarasov et al., 1998). Unfortunately, the team hosting the EPD (IMEP) did not succeed in securing a permanent position for a database manager from their French administration. From 1995 to 2003 funding for the further development and management of the EPD was dependent upon collaboration with foresters involved in EU research projects (Fairoak, Cytofor and Fossilva) that used the EPD as a tool linking phylogeography and palaeobiogeography of forest trees (Petit et al., 2002; Cheddadi et al., 2006; Magri et al., 2006). A consequence of linking the funding of the EPD to research projects was that the database manager became the principal user of the EPD, resulting in less of his time being available to undertake data compilation tasks. When the Fossilva project ended in the early 2000s the EPD was unfunded. It managed to survive thanks to the altruistic contribution of R. Cheddadi, whose position was now supported by other projects, thus limiting his ability to commit time to the EPD. The EPD became a relict database, with no development or incorporation of new data. At the end of 2006 IMEP obtained a permanent position for a new database manager (M. Leydet) from the University of Aix-Marseille and data compilation resumed, with the support of the NOE EVOLTREE project.
In May 2007 a special open meeting to discuss the future of the EPD was convened by Richard Bradshaw in Arbois (France) under the auspices of a EuroCLIMATE workshop. The workshop, attended by 78 European palynologists, had a range of outputs that can be reviewed on the EPD website (http://www.europeanpollendatabase.net). One output was the formation of a range of support and working groups to help maintain and update the database. The Mapping and Data Accuracy working group (MADCAP) is one of these EPD support groups formed at the 2007 meeting, with representatives from across Europe (Table 1). The aim of MADCAP is the production of a palaeovegetation atlas, based on the EPD. These working groups reported to a well-attended open meeting of the EPD at the International Palynological Congress, Bonn in 2008 where a new administrative structure for the EPD was proposed and accepted. It was decided that the EPD would be managed by a board comprising an elected chairperson and the spokespersons of the working groups. The term of office of the chairperson would be four years.
Since its establishment in 1992, many European palynologists have submitted pollen counts to the EPD and in June 2008 a total of 1001 pollen sequences from 849 sites were held in the database; together these sum to well over one million individual pollen counts. Of these sequences, referred to in the database as entities, 671 have an associated chronology that is in most cases based on an age–depth model that makes use of radiocarbon-dated samples from a series of known depths (Figures 2, 3 and 4). The full original information on age determinations, most commonly radiocarbon, is stored in archival tables of the database. Age–depth models derived from this information are stored in research tables and several age–depth models may be stored for any individual entity. Currently these models are generally based on the un-calibrated radiocarbon time scale. However, efforts are being made to construct age–depth models using calibrated radiocarbon age determinations. The EPD also gives palynologists the option to archive datasets but restrict access to the count data. This option may be used by authors who are still in the process of publishing their data but at the same time wish to highlight its existence because this may lead to the establishment of new collaborations. The EPD currently holds data for 143 sites that have a restricted status.
As a result of the considerable collaborative effort from the outset, the database is largely compatible with other continental databases and the Global Pollen Database. It currently contains 57 tables divided into 5 categories: archival; look-up; research; system; and views. Of these, the archival tables contain the original data (e.g. counts), the look-up tables contain reference information (e.g. plant taxonomy) and the research tables contain information relating to analyses of the data (e.g. age–depth models). The database is currently managed using Paradox®, although it has also been transferred to PostgreSQL® to allow web access and there is a general agreement to migrate to a new database system, expected in late-2008. The complex table structure of the EPD was designed to make full use of the power of a relational database, so that all entries in the database can be queried in parallel. It is thus possible, for example, to find all sites with chronological information that had more than 1% Plantago lanceolata pollen 5000 radiocarbon years ago. In order to be able to execute such a query, the user needs to download the full database. Users less experienced with working with Paradox or PostgreSQL databases may find it easier to work with the database in Microsoft Access®, and an MS Access version of the database has been provided for download from the EPD web site.
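The kind of parallel query described above can be illustrated with a minimal sketch. The three tables and all column names below (`sites`, `samples`, `counts`, `age_bp`, `n`) are hypothetical simplifications invented for this example and do not reproduce the actual EPD table structure; the percentage is calculated against the sum of all counted taxa in a sample, which is also an assumption, since the pollen sum used in practice usually excludes some taxa.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sites (site_id INTEGER PRIMARY KEY, site_name TEXT);
        CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, site_id INTEGER, age_bp REAL);
        CREATE TABLE counts (sample_id INTEGER, taxon TEXT, n INTEGER);
    """)
    conn.executemany("INSERT INTO sites VALUES (?, ?)",
                     [(1, "Site A"), (2, "Site B"), (3, "Site C")])
    # age_bp is NULL for samples without chronological information.
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?)",
                     [(10, 1, 5000), (20, 2, 4950), (30, 3, None)])
    conn.executemany("INSERT INTO counts VALUES (?, ?, ?)", [
        (10, "Plantago lanceolata", 6), (10, "Alnus", 294),   # 2.0 %
        (20, "Plantago lanceolata", 1), (20, "Alnus", 299),   # about 0.3 %
        (30, "Plantago lanceolata", 30), (30, "Alnus", 270),  # 10 %, but undated
    ])
    return conn


def sites_above_threshold(conn, taxon, target_age, tolerance, min_percent):
    """Names of dated sites where `taxon` exceeds `min_percent` near `target_age`."""
    query = """
        SELECT DISTINCT s.site_name
        FROM samples AS sm
        JOIN sites AS s ON s.site_id = sm.site_id
        JOIN counts AS c ON c.sample_id = sm.sample_id
        JOIN (SELECT sample_id, SUM(n) AS total
              FROM counts GROUP BY sample_id) AS t ON t.sample_id = sm.sample_id
        WHERE c.taxon = ?
          AND sm.age_bp IS NOT NULL
          AND ABS(sm.age_bp - ?) <= ?
          AND 100.0 * c.n / t.total > ?
        ORDER BY s.site_name
    """
    rows = conn.execute(query, (taxon, target_age, tolerance, min_percent))
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(sites_above_threshold(db, "Plantago lanceolata", 5000, 100, 1.0))
```

Because a sample rarely falls on exactly 5000 radiocarbon years, the sketch accepts samples within a tolerance window (here ±100 years); the appropriate window would depend on the research question.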
Users who are interested in working with a small number of sites or particular sites may find it easier to use the online facilities. Currently the data in the EPD can be browsed, queried, visualised and selected datasets downloaded using the ‘Fossil Pollen Database Viewer’ developed by Nicolas Garnier at MEDIAS-France in close co-operation with the African Pollen Database.
Developments in approaches to storing and accessing palaeoecological datasets will result in changes to the underlying structure of the EPD in the near future. There is a need to migrate the EPD to a modern SQL-compatible database structure and to change the table structure in line with other palaeoecological databases. These developments will offer the opportunity to combine other types of palaeoecological data into the database. For example, the growing macrofossil databases with European data could be easily incorporated. A downloadable MS Access version of the database will become available with a range of different example queries that allow inexperienced database users to gain full use of all aspects of the database. These developments will make it simpler for all palaeoecologists to access and use the database. It is hoped that greater use of the EPD will also lead to positive feedback and user-led improvements, for example, improved age–depth models generated within research projects could be submitted. Maintenance of the database in both the current and projected future format requires substantial effort from the database Manager. Collaborative community effort through new initiatives such as the EPD support groups will inevitably lead to a database that is better positioned to serve the needs of European palynologists and other potential users.
The mapping and data accuracy work group (MADCAP) of the EPD was established at the open meeting in France in 2007 and aims to make the data in the EPD more available to the scientific community and thus to enhance their use. The key goal of the group is to produce a new web-based version of a European palaeovegetation atlas that provides maps of past pollen percentages for visualisation, teaching purposes and as a basis for data–model comparisons. In order to achieve this goal the group has undertaken a systematic review of the data currently held in the EPD with the aim of identifying problems with individual site records, and of flagging errors for correction within the database (see below). This process has followed a standardized protocol. Data have been downloaded from the EPD, pollen diagrams constructed and, wherever possible, checked against the original publications. In the first instance sites that have some chronological control were targeted, and their age–depth models were checked as part of the review process, as these will form the basis for the palaeovegetation atlas. Members of the group combine different regional expertise so that diagrams from most European regions were checked by a person with knowledge of the regional vegetation history. Where data handling errors of the types described below have been identified, this has been fed back to the database Manager, who is a member of the group, for flagging or, where possible, correction. To date, MADCAP members have checked 711 pollen sequences.
Generation of the new palaeovegetation atlas is in progress using the dated sites from within the EPD. At the present time age–depth models are being constructed. The final dataset used to compile the atlas will be made available to the wider community following completion of the project, as will gridded results for each taxonomic unit.
The EPD evolved at a time when personal computing was an emerging growth area rather than a matter of routine. A consequence of this is that there are a number of errors within the EPD that result from data entry, handling, and conversion. Errors are manifest in both the metadata and in the raw count data. Although these errors are more common within older datasets, they still occur within datasets that have been submitted more recently, and are inevitable. The most severe errors in the metadata are incorrect latitude/longitude information that may lead to site offsets of hundreds of kilometres, and incorrect or missing site references. These errors were encountered in ca. 1.1% and in ca. 6.6% of the sequences checked, respectively (Table 2). A less severe error for the production of a palaeovegetation atlas is the incorrect or missing elevation of a site. With regional knowledge these errors generally are easily detected and correctable.
|Category||Error type / correction made||# of errors||% of entities checked||% of entities with errors|
|Errors in the count data||Error in counts||26||3.7||56.5|
|Errors in the count data||Samples switched or assigned incorrect depth||5||0.7||10.9|
|Chronology||New chronology made||9||1.3||-|
|Coordinates||Incorrect long/lat corrected||8||1.1||-|
|Coordinates||Incorrect elevation corrected||3||0.4||-|
|References||Incorrect/missing reference corrected||47||6.6||-|
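The percentages in the table can be reproduced from the error counts and the 711 sequences reported as checked. The short sketch below does this arithmetic. The denominator of 46 used for the last column is not stated in the text; it is our inference from the two values given there (26/46 ≈ 56.5% and 5/46 ≈ 10.9%) and should be treated as an assumption.

```python
ENTITIES_CHECKED = 711      # sequences checked, as stated in the text
ENTITIES_WITH_ERRORS = 46   # assumption: inferred from the last table column

# Number of errors per type, taken from the table.
error_counts = {
    "error in counts": 26,
    "samples switched or incorrect depth": 5,
    "new chronology made": 9,
    "incorrect long/lat": 8,
    "incorrect elevation": 3,
    "incorrect/missing reference": 47,
}


def percent(n_errors, denominator):
    """Percentage rounded to one decimal place, as reported in the table."""
    return round(100.0 * n_errors / denominator, 1)


if __name__ == "__main__":
    for name, n in error_counts.items():
        print(f"{name}: {percent(n, ENTITIES_CHECKED)} % of entities checked")
```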
Errors within the raw counts are typically the result of the process of conversion of the dataset into the EPD. Errors may be systematic within sites (e.g. switching of count data between taxon A and B), or random (e.g. one taxon for an individual sample swapped for another). These errors are usually obvious: The count for Artemisia may be switched for Alnus for a single sample, for example, resulting in an isolated high value associated with an atypical low value in the other. In very few cases, entire samples were switched (e.g. switching of depth between sample X and Y) or assigned an incorrect depth. These errors can be confirmed by checking against the original publication. It is possible that count data for individual taxa are missed through the data conversion. In cases such as this the original data contributor may need to be contacted to refresh the dataset, where that is still possible.
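The signature of a single-sample swap between two taxa, an isolated high value in one curve paired with an atypical low value in the other, can be screened for automatically. The following is a minimal sketch of one possible heuristic, not the protocol used by the work group: for each sample it asks whether exchanging the two values would fit the neighbouring samples much better than the values as stored. The function name, the threshold `min_improvement` and the example percentages are all invented for illustration.

```python
def suspect_swaps(series_a, series_b, min_improvement=0.8):
    """Indices of interior samples where swapping the two taxa fits neighbours far better.

    series_a, series_b: values (e.g. pollen percentages) for two taxa,
    ordered by depth. The first and last samples are not tested because
    they have only one neighbour.
    """
    flagged = []
    for i in range(1, len(series_a) - 1):
        # Expected values: mean of the samples directly above and below.
        expected_a = (series_a[i - 1] + series_a[i + 1]) / 2
        expected_b = (series_b[i - 1] + series_b[i + 1]) / 2
        misfit_as_stored = abs(series_a[i] - expected_a) + abs(series_b[i] - expected_b)
        misfit_if_swapped = abs(series_b[i] - expected_a) + abs(series_a[i] - expected_b)
        if misfit_as_stored > 0:
            improvement = (misfit_as_stored - misfit_if_swapped) / misfit_as_stored
            if improvement >= min_improvement:
                flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical percentages: the third sample looks like a swap.
    artemisia = [2, 3, 40, 2, 3]
    alnus = [38, 41, 3, 39, 42]
    print(suspect_swaps(artemisia, alnus))
```

A flagged sample is only a candidate. Genuine abrupt vegetation changes can produce similar patterns, so any suspected swap would still need to be confirmed against the original publication.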
The errors described so far are grave and unfortunate because they are situated in the archival tables. Less serious for the database, but important for users, are errors, misjudgements or misinterpretations in the construction of age–depth models. Age–depth models that are based on very few age determinations often interpolate and/or extrapolate over many thousands of years. This can result in errors that are sometimes obvious, for example where a late-glacial pollen spectrum is assigned a Holocene age or vice versa. To date, MADCAP members have suggested changes to the chronologies, or have already made new ones, for a number of sequences (Table 2). Chronologies based on only one or two radiocarbon dates have been flagged because of their potentially low precision. Users are nevertheless encouraged to be critical of the available age–depth models and, where necessary, to construct their own. Users who construct new age–depth models are encouraged to submit these to the EPD, where they will be stored in research tables.
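To make the interpolation and extrapolation issue concrete, the sketch below builds the simplest possible age–depth model: linear interpolation between dated depths. This is an assumption chosen for illustration; the models stored in the EPD may have been constructed with other methods. The function names, the example dates and the `max_dates` threshold are hypothetical, and the dated depths are assumed to be distinct and sorted.

```python
from bisect import bisect_right


def age_at_depth(depth, dates):
    """Estimate the age at `depth` by linear interpolation.

    dates: list of (depth_cm, age_bp) tuples sorted by depth, with at
    least two entries and distinct depths. Returns (age, extrapolated),
    where `extrapolated` is True when `depth` lies outside the dated range.
    """
    if len(dates) < 2:
        raise ValueError("need at least two dated depths")
    depths = [d for d, _ in dates]
    # Use the segment that brackets the depth, or the nearest end segment.
    i = min(max(bisect_right(depths, depth), 1), len(dates) - 1)
    (d0, a0), (d1, a1) = dates[i - 1], dates[i]
    age = a0 + (a1 - a0) * (depth - d0) / (d1 - d0)
    extrapolated = depth < depths[0] or depth > depths[-1]
    return age, extrapolated


def is_low_precision(dates, max_dates=2):
    """Flag chronologies that rest on only one or two age determinations."""
    return len(dates) <= max_dates


if __name__ == "__main__":
    # Hypothetical radiocarbon dates: (depth in cm, uncalibrated age BP).
    dates = [(100, 2000), (300, 6000), (500, 9000)]
    print(age_at_depth(200, dates))   # within the dated range
    print(age_at_depth(600, dates))   # below the deepest date: extrapolated
    print(is_low_precision(dates))
```

Returning an explicit extrapolation flag alongside the age makes it easy to exclude, or treat with caution, samples whose ages rest on no bracketing date.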
All contributors and users of the EPD are strongly encouraged to report any errors they encounter to the database Manager. Each error should be clearly described and, if possible, suggestions made as to how it may be resolved.
Submission of pollen data offers a range of benefits to individual European palynologists and to the community as a whole. Over time, memories fade, storage media evolve and data loss occurs. The EPD provides an archive that can mitigate against at least part of this loss. Submission of data may act as an additional tool for wider dissemination of findings, especially in cases where the data are published in books or regional journals, or in languages other than English. In the past, submission of data has led to increased collaboration between data contributors, resulting in innovative projects, joint publications and greater citation of work. In order further to facilitate this, the international peer-reviewed journal Grana now offers short publications of pollen diagrams that have been submitted to the EPD in a new ‘Contributions to the European Pollen Database’ section (Bradshaw, 2007; recent examples include Jankovská et al. 2007 and Stefanova et al. 2008). The EPD has also dedicated space on the website for a wiki (www.europeanpollendatabase.net/wiki/). This space has been developed to foster discussion, community building, participation and knowledge exchange amongst European pollen analysts.
The precision of spatial analyses of pollen data is related to the number of data points that are available for analysis. Such studies, and their application to data from within the EPD outlined above, are made possible by the willingness of European pollen analysts to place their data in public-access archives. The original aims of the EPD were to provide an archival facility for the pollen analytical community on the understanding that data would be accessible and open for use. This remains true, and the importance of such an archive has been seen in recent trends towards using observations of past environmental changes and their consequences to improve forecasts of future changes and their potential impacts. We wish to take this opportunity to encourage the submission of pollen analytical results as a routine part of research. Only in this way will we be able to gain a better understanding of the past and contribute to a sustainable future.
We wish to thank the efforts of all the individuals who have played a role in getting the EPD to the position it is in today and in particular Joel Guiot, Rachid Cheddadi, Eric Grimm, John Keltner, Brian Huntley and all those who have served as members of the EPD Advisory Board and Executive Committee. A number of these provided valuable constructive comments on an earlier version of this commentary, including Marie-José Gaillard, Sheila Hicks, Wim Hoek, Brian Huntley, Andy Lotter and Pavel Tarasov. The open meeting of the EPD in 2007 was supported by the European Science Foundation EuroCLIMATE programme. We also wish to thank all those who have contributed data to the EPD.