Beginner's Guide to SRA Results
The GBIF data cache, like other natural history collections, is an ad hoc data set that has developed from the efforts of multiple collectors over long periods of time. Natural history collections have been shown to give accurate species richness estimates, though they may show skewed abundance distributions due to preferential selection of under-represented species (Guralnick and Van Cleve 2005). A major strength of natural history collections is the potential to determine changes in species richness through time, especially at regional and continental scales.
Determining taxon sampling effort in a user-selected geographic region or time period is a necessary first step towards appropriate use of occurrence datasets for further hypothesis testing of species richness change over time and space. Methods have been developed to examine potential sampling bias and make better predictions of species richness given sampling issues. Colwell and Coddington (1994) and Gotelli and Colwell (2001) discuss some approaches for different kinds of sampling schemes. Natural history data best fit the description of individual-based assessments as defined by Gotelli and Colwell (2001), who suggest the use of rarefaction curves as a means to begin assessing species richness and its variance due to sampling. These curves are the basis of the calculations performed by the SRA tool.
The SRA tool outputs the following:
1. Summary statistics
2. Two tables of non-parametric richness estimators in comma-separated files, corresponding to randomization with and without replacement
3. Statistical feedback
4. Seven figures, as both EPS and GIF image formats
These calculations may alternatively be made in EstimateS. See the additional documentation for a comparison of the analyses carried out in EstimateS and the GBIF-MAPA SRA tool. The user manual for EstimateS provides a more in-depth discussion of the non-parametric species richness estimators. A list of references for further information is provided below.
Summary statistics
These are calculated directly from the sample. Figure 6, an octave abundance plot, and Figure 7, a rank abundance plot, also present information about the sample (see below).
Incidence statistics:
Variable | Description
S obs | Total number of species observed in all samples pooled
S rare | Number of rare species (10 or fewer individuals) when all samples are pooled
S abund | Number of abundant species (more than 10 individuals) when all samples are pooled
S infr | Number of infrequent species (found in 10 or fewer samples)
S freq | Number of frequent species (found in more than 10 samples)
m infr | Number of samples that have at least one infrequent species
Q 1 | Number of species that occur in exactly 1 sample (the frequency of uniques)
Q 2 | Number of species that occur in exactly 2 samples (the frequency of duplicates)
N rare | Total number of individuals in rare species
N infr | Total number of incidences (occurrences) of infrequent species
C ice | Sample incidence coverage estimator
Abundance statistics
Variable | Description
m | Total number of samples
F 1 | Number of species that have exactly 1 individual when all samples are pooled (the frequency of singletons)
F 2 | Number of species that have exactly 2 individuals when all samples are pooled (the frequency of doubletons)
C ace | Sample abundance coverage estimator
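The counts in the two tables above can be sketched in a few lines of Python. This is only an illustration, not the SRA tool's code: the function name `summary_statistics`, the `cutoff` parameter, and the input layout (a list of samples, each a list of per-species individual counts) are assumptions made for the example. The coverage estimators C ice and C ace are left out.

```python
def summary_statistics(matrix, cutoff=10):
    """Summary counts from a sample-by-species matrix.

    matrix[i][j] is the number of individuals of species j in sample i
    (an assumed input layout, chosen for this illustration).
    """
    m = len(matrix)
    n_species = len(matrix[0]) if m else 0
    # Pooled abundance and number of samples occupied, per species.
    abundance = [sum(row[j] for row in matrix) for j in range(n_species)]
    incidence = [sum(1 for row in matrix if row[j] > 0) for j in range(n_species)]
    observed = [j for j in range(n_species) if abundance[j] > 0]
    rare = [j for j in observed if abundance[j] <= cutoff]
    infrequent = [j for j in observed if incidence[j] <= cutoff]
    return {
        "m": m,
        "S_obs": len(observed),
        "S_rare": len(rare),
        "S_abund": len(observed) - len(rare),
        "S_infr": len(infrequent),
        "S_freq": len(observed) - len(infrequent),
        "m_infr": sum(1 for row in matrix if any(row[j] > 0 for j in infrequent)),
        "F1": sum(1 for j in observed if abundance[j] == 1),
        "F2": sum(1 for j in observed if abundance[j] == 2),
        "Q1": sum(1 for j in observed if incidence[j] == 1),
        "Q2": sum(1 for j in observed if incidence[j] == 2),
        "N_rare": sum(abundance[j] for j in rare),
        "N_infr": sum(incidence[j] for j in infrequent),
    }


# Three samples, four candidate species (the last one never observed).
example = [
    [1, 0, 2, 0],
    [0, 1, 3, 0],
    [0, 0, 12, 0],
]
stats = summary_statistics(example)
```

In this toy data set two species are singletons and uniques, while the third species is abundant (17 individuals) yet still counts as infrequent because it occurs in only three samples.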
Tabular data – estimates of true species richness
These estimators are calculated through repeated random subsampling of the species occurrence data (note that some of the estimators and standard deviations are calculated analytically). They include abundance-based estimators—Chao 1 (Chao, 1984) and ACE (Chao and Lee, 1992; Chao et al., 1993)—and incidence-based estimators—Chao 2 (Chao, 1987), ICE (Lee and Chao, 1994), jackknife estimators Jack 1 and Jack 2 (Burnham and Overton, 1978, 1979; Heltshe and Forrester, 1983; Smith and van Belle, 1984), and Bootstrap (Smith and van Belle, 1984).
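As a rough guide to how some of these estimators use the summary counts, the sketch below implements the classic textbook forms of Chao 1, Chao 2, Jack 1 and Jack 2. The function names are hypothetical, and the SRA tool or EstimateS may use bias-corrected or otherwise modified variants, so treat the numbers as illustrative only. Falling back to the bias-corrected form when there are no doubletons or duplicates is a choice made for this sketch to avoid division by zero.

```python
def chao1(s_obs, f1, f2):
    """Classic Chao 1 from singletons (f1) and doubletons (f2)."""
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Assumed fallback: bias-corrected form, defined even when f2 == 0.
    return s_obs + f1 * (f1 - 1) / 2


def chao2(s_obs, q1, q2):
    """Classic Chao 2 from uniques (q1) and duplicates (q2)."""
    if q2 > 0:
        return s_obs + q1 * q1 / (2 * q2)
    # Assumed fallback, mirroring the one used for Chao 1 above.
    return s_obs + q1 * (q1 - 1) / 2


def jack1(s_obs, q1, m):
    """First-order jackknife for m samples."""
    return s_obs + q1 * (m - 1) / m


def jack2(s_obs, q1, q2, m):
    """Second-order jackknife for m samples (requires m >= 2)."""
    return s_obs + q1 * (2 * m - 3) / m - q2 * (m - 2) ** 2 / (m * (m - 1))


# Example: 10 observed species, 4 uniques/singletons, 2 duplicates/doubletons, 5 samples.
estimates = {
    "Chao1": chao1(10, 4, 2),
    "Chao2": chao2(10, 4, 2),
    "Jack1": jack1(10, 4, 5),
    "Jack2": jack2(10, 4, 2, 5),
}
```

All four estimates exceed the observed richness of 10, because each one adds a correction driven by the number of species seen only once or twice.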
The SRA tool always performs 100 runs (though the underlying engine, available from Eco-tools, may be set to any number). This data may be downloaded as two tables, in comma-separated data files, corresponding to randomization performed with and without replacement. Several estimators are shown in Figure 1 (see below).
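The difference between the two randomization schemes can be illustrated with a small sketch. It is not the SRA engine: the function name `accumulation_runs`, its parameters, and the sample-by-species input layout are assumptions for the example, and only the observed species count is tracked. In each run the samples are drawn in random order, either as a shuffle (without replacement) or as independent draws (with replacement), and the number of species seen so far is recorded after each sample.

```python
import random
import statistics


def accumulation_runs(matrix, runs=100, replace=False, seed=0):
    """Mean and SD of accumulated species counts over randomized runs.

    matrix[i][j] is the count of species j in sample i (assumed layout).
    """
    rng = random.Random(seed)
    m = len(matrix)
    curves = []
    for _ in range(runs):
        if replace:
            order = [rng.randrange(m) for _ in range(m)]  # samples may repeat
        else:
            order = rng.sample(range(m), m)  # a random permutation
        seen = set()
        curve = []
        for i in order:
            seen.update(j for j, count in enumerate(matrix[i]) if count > 0)
            curve.append(len(seen))
        curves.append(curve)
    means = [statistics.mean([c[k] for c in curves]) for k in range(m)]
    sds = [statistics.stdev([c[k] for c in curves]) for k in range(m)]
    return means, sds


samples = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
mean_without, sd_without = accumulation_runs(samples, replace=False)
mean_with, sd_with = accumulation_runs(samples, replace=True)
```

Without replacement, every run ends with all samples pooled, so the last point of the curve always equals the observed richness and its standard deviation is zero. With replacement, some samples are typically missed in a run, so the final mean can fall below the observed richness and the spread among runs stays informative.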
Estimator | Description
Samples (Qd) | (Analytical) Number of samples (Quadrats) accumulated
Individuals | (Analytical)
Sobs (Mao Tau) | (Analytical) Number of species expected in the pooled Qd samples
Sobs Mean | Number of species in the pooled samples (mean among runs)
Sobs SD | (Analytical)
Sobs 95% CI | (Analytical)
Singletons Mean | Number of species with only one individual, mean among runs
Singletons SD | Standard deviation of Singletons, among randomizations
Doubletons Mean | Number of species with only two individuals, mean among runs
Doubletons SD | Standard deviation of Doubletons, among randomizations
Uniques Mean | Number of species that occur in only one sample, mean among runs
Uniques SD | Standard deviation of Uniques, among randomizations
Duplicates Mean | Number of species that occur in only two samples, mean among runs
Duplicates SD | Standard deviation of Duplicates, among randomizations
ACE Mean | Abundance-based Coverage Estimator, mean among runs
ACE SD | Standard deviation of ACE, among randomizations
ICE Mean | Incidence-based Coverage Estimator, mean among runs
ICE SD | Standard deviation of ICE, among randomizations
Chao1 Mean | Chao 1 richness estimator, mean among runs
Chao1 95% CI | Chao 1 log-linear confidence interval
Chao1 SD | (Analytical) Chao 1 standard deviation (by Chao's formulas)
Chao2 Mean | Chao 2 richness estimator, mean among runs
Chao2 95% CI | Chao 2 log-linear confidence interval
Chao2 SD | (Analytical) Chao 2 standard deviation (by Chao's formula)
Jack1 Mean | First-order Jackknife richness estimator, mean among runs
Jack1 SD | Standard deviation of Jack1, among randomizations
Jack2 Mean | Second-order Jackknife richness estimator, mean among runs
Jack2 SD | Standard deviation of Jack2, among randomizations
Bootstrap Mean | Bootstrap richness estimator, mean among runs
Bootstrap SD | Standard deviation of Bootstrap, among randomizations
IRC | Individual-based rarefaction curve
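For readers unfamiliar with individual-based rarefaction, the sketch below computes the expected number of species in a random subsample of n individuals using the standard hypergeometric formula, E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N_i is the pooled abundance of species i and N is the total number of individuals. Whether the SRA tool computes its IRC column in exactly this way is an assumption, and the function name `rarefy` is hypothetical.

```python
from math import comb


def rarefy(abundances, n):
    """Expected species count in a random subsample of n individuals."""
    total = sum(abundances)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total number of individuals")
    denom = comb(total, n)
    # comb(total - a, n) is 0 when n > total - a, i.e. the species cannot be missed.
    return sum(1 - comb(total - a, n) / denom for a in abundances if a > 0)


pooled = [5, 3, 2]  # pooled abundances of three species
curve = [rarefy(pooled, n) for n in range(sum(pooled) + 1)]
```

The resulting curve starts at 0 species for 0 individuals, passes through 1 species for a single individual, and reaches the observed richness (3 here) when all 10 individuals are included.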
Statistical feedback
The SRA tool performs two tests of the assumption of sample set homogeneity that underlies these methods. The tests only make sense if the samples are ordered in some way; if they are not, you can ignore the results of the tests.
-The first test is for a trend in the rate of observation per unit effort, which is most useful when your samples are ordered in time. A trend may indicate immigration, emigration, or perhaps depletion caused by the sampling process itself. A large trend will invalidate any estimates of species richness.
-The other test is for a trend in community composition, using canonical correspondence analysis with sample order as a predictor. This is relevant when your samples are ordered in either space or time (or both). A trend here indicates progressive change in community composition over time (e.g., seasonality) or space (e.g., an ecotone), and should also cause the researcher to re-examine his or her assumptions.
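The first test can be pictured with the following sketch. It is not the SRA tool's method, whose details are not given here: it assumes a least-squares line fitted to the number of records per sample against sample order, defines percent change as the relative difference between the fitted first and last values, and obtains a p-value from a permutation test. The function names `slope` and `effort_trend` and all parameters are hypothetical.

```python
import random


def slope(ys):
    """Least-squares slope of ys against positions 0, 1, 2, ..."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx


def effort_trend(records_per_sample, permutations=999, seed=0):
    """Return (percent change along the fitted line, permutation p-value)."""
    ys = list(records_per_sample)
    n = len(ys)
    b = slope(ys)
    first = sum(ys) / n - b * (n - 1) / 2  # fitted value at the first sample
    last = first + b * (n - 1)             # fitted value at the last sample
    percent_change = 100.0 * (last - first) / first if first else float("nan")
    # Permutation test: how often does a shuffled order give a slope this steep?
    rng = random.Random(seed)
    shuffled = ys[:]
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if abs(slope(shuffled)) >= abs(b) - 1e-12:
            extreme += 1
    p_value = (extreme + 1) / (permutations + 1)
    return percent_change, p_value


declining = effort_trend([20, 18, 16, 14, 12, 10, 8, 6, 4, 2])
flat = effort_trend([5, 5, 5, 5, 5, 5, 5, 5])
```

A steadily declining series such as the first example gives a large negative percent change with a small p-value, the pattern that would suggest depletion if the samples are ordered in time. A constant series gives no change and a p-value of 1.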
Rate of observation per unit effort test
result | message
% change ≥10, p<0.05 | The data show a large and significant change in the number of records per unit effort over the sequence of samples (percent change = [value], p=[value]). If your samples are ordered in space (e.g., a transect), this raises a question about the assumption of homogeneity underlying the estimation of species richness. If the samples are ordered in time, and the change is negative, consider whether it might be due to the removal of individuals in the sampling process (depletion).
% change <10, p<0.05 | The data show a statistically significant but nevertheless small change in the number of records per unit effort over the sequence of samples (percent change = [value], p=[value]). If your samples are ordered in space (e.g., a transect), this raises a question about the assumption of homogeneity underlying the estimation of species richness. If the samples are ordered in time, and the change is negative, consider whether it might be due to the removal of individuals in the sampling process (depletion). However, the trend is small enough that it will have little impact on the analyses below.
% change <10, p≥0.05 | No statistically significant trend was detected in the number of records per unit effort (percent change = [value], p=[value]). This is good!
Community composition test
result | message
% var. explained ≥10, p<0.05 | Canonical correspondence analysis of the data indicates a substantial change in community composition over the sequence of samples (percent variance explained = [value], p=[value]). If your samples are ordered in space (e.g., a transect), or spread out over an interval of time (e.g., showing seasonal effects), the assumption of homogeneity underlying the estimation of species richness may not be valid.
% var. explained <10, p<0.05 | Canonical correspondence analysis of the data indicates a significant but nevertheless small change in community composition over the sequence of samples (percent variance explained = [value], p=[value]). If your samples are ordered in space (e.g., a transect), or spread out over an interval of time (e.g., showing seasonal effects), you might want to question the assumption of homogeneity underlying the estimation of species richness. However, the change is small enough that its effect on the analyses below will be negligible.
% var. explained <10, p≥0.05 | Canonical correspondence analysis of the data indicates no substantial, significant change in community composition over the sequence of samples (percent variance explained = [value], p=[value]). This is good!
Figures
The figures, except the octave and rank abundance plots, may be reproduced using the CSV output tables.

Figure 1 uses the estimates generated without replacement, but the confidence intervals for each graph are based on the standard deviations calculated in the with-replacement graph (the without-replacement SDs trend towards zero as the number of accumulated samples approaches the total number of samples, because every run then pools the same data). Do the curves reach their asymptotes? A curve that plateaus does not imply that the underlying data are well sampled and representative of the whole region of interest; rather, it only implies that the data set well represents a subset of data defined by the limits and biases of the sampling methods used to collect it.

Figure 2 - IRC. The black line is the individual-based rarefaction curve. The dark blue line below it is the sample rarefaction curve (exp τ ). The confidence interval shown in this figure (light blue) assumes that the true species richness is given by the Chao 2 estimator (as per EstimateS). The red line is S-hat Chao2, and the light pink region is the confidence interval for Chao 2.

Figure 3 - IRC. The black line is the individual-based rarefaction curve. The dark blue line below it is the sample rarefaction curve (exp τ ). The confidence interval shown in this figure (light blue) assumes that the true species richness is given by the upper bound of the 95% confidence interval for the Chao 2 estimator. This is a conservative approach, probably an overly conservative one. The red line is S-hat Chao2, and the light pink region is the confidence interval for Chao 2.

Figures 4 and 5 show several estimators from the randomization runs. Figure 4 is taken from the without-replacement table, Figure 5 from the with-replacement table. The key for each figure may be used to pull the relevant information from the tables.

Figure 6 – Octave abundance plot. The data used to construct this plot is not output to the user.

Figure 7 – Rank abundance plot. The data used to construct this plot is not output to the user.
References
Burnham, K.P. and W.S. Overton. 1978. Estimation of the size of a closed population when capture probabilities vary among animals. Biometrika 65: 623-633.
Burnham, K.P. and W.S. Overton. 1979. Robust estimation of population size when capture probabilities vary among animals. Ecology 60: 927-936.
Chao, A. 1984. Non-parametric estimation of the number of classes in a population. Scandinavian Journal of Statistics 11: 265-270.
Chao, A. 1987. Estimating the population size for capture-recapture data with unequal catchability. Biometrics 43: 783-791.
Chao, A., Chazdon, R. L., Colwell, R. K. and Shen, T.-J. 2005. A new statistical approach for assessing similarity of species composition with incidence and abundance data. Ecology Letters 8: 148–159.
Chao, A. and S.-M. Lee. 1992. Estimating the number of classes via sample coverage. Journal of the American Statistical Association 87: 210-217.
Chao, A., M.-C. Ma, and M.C.K. Yang. 1993. Stopping rules and estimation for recapture debugging with unequal failure rates. Biometrika 80: 193-201.
Colwell, R. K. and J. A. Coddington. 1994. Estimating terrestrial biodiversity through extrapolation. Phil. Trans. Roy. Soc. Lond. B 345: 101–118.
Colwell, R. K. 1994–present. EstimateS: statistical estimation of species richness and shared species from samples. http://viceroy.eeb.uconn.edu/estimates. [Persistent URL: http://purl.oclc.org/estimates.]
Colwell, R. K., C. X. Mao and J. Chang. 2004. Interpolating, extrapolating, and comparing incidence-based species accumulation curves. Ecology 85: 2717–2727.
Gotelli, N. and R. K. Colwell. 2001. Quantifying biodiversity: procedures and pitfalls in the measurement and comparison of species richness. Ecology Letters 4: 379–391.
Guralnick, R.P. and J. Van Cleve. 2005. Strengths and weaknesses of museum and national survey data sets for predicting regional species richness: comparative and combined approaches. Diversity and Distributions 11: 349-359.
Heltshe, J. and N.E. Forrester. 1983. Estimating species richness using the jackknife procedure. Biometrics 39: 1-11.
Lee, S.-M. and A. Chao. 1994. Estimating population size via sample coverage for closed capture-recapture models. Biometrics 50: 88-97.
Smith, E.P. and G. van Belle. 1984. Nonparametric estimation of species richness. Biometrics 40: 119-129.