The process of rarefaction is necessary in order to compare species number (S) from samples of different sizes.
Remember that
Logically, the sum of the Ni values must be equal to N.
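The formula this refers to is presumably the usual rarefaction expression. What follows is a reconstruction from the columns of the tables below, not a quotation from the text, so treat the notation as an assumption:

$$E(S) = \sum_{i=1}^{S} \left[ 1 - \frac{\binom{N - N_i}{n}}{\binom{N}{n}} \right]$$

Here N is the total number of individuals in the sample, N_i is the number of individuals of species i, S is the number of species in the sample, and n is the size of the subsample being rarefied to. The fraction is the share of possible subsamples of size n that contain no individual of species i.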
Number of fish from three lakes

| Species of fish | North America | Central America | Argentina | |
|---|---|---|---|---|
| A | 12 | | | |
| B | 5 | | | |
| C | 4 | 33 | | |
| D | 3 | 32 | | |
| E | 1 | 34 | | |
| F | | 33 | | |
| G | | | 42 | |
| H | | | 23 | |
| I | | | 16 | |
| J | | | 14 | |
| K | | | 6 | |
| L | | | 5 | |
| total | 25 | 132 | 106 | = N |
| # of species | 5 | 4 | 6 | = S |

Each cell is an Ni.
To compare all three lakes, we need to rarefy the samples from Central America and Argentina to the size of the smallest sample, North America.
The book does not say so, but n must be THE SMALLEST SAMPLE SIZE.
The criterion is that N > n; when N < n, you will not be able to do the combinatorials.
Therefore, rarefaction always adjusts down, never up.
So, we can only ask, "How many species would I have gotten in this sample if it had been as small as the smallest sample?"
We will use n = 25 from the North American sample and rarefy the Central American and Argentine samples.
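The calculation set up here can be sketched in Python. This is a minimal sketch, assuming the standard rarefaction formula (the sum over species of 1 minus the combinatorial of (N - Ni, n) divided by the combinatorial of (N, n)), which is what the table columns below appear to compute. The function name `rarefied_richness` is made up for illustration.

```python
from math import comb

def rarefied_richness(counts, n):
    """Expected number of species in a random subsample of n individuals.

    counts: list of Ni values (individuals per species) for one sample.
    """
    N = sum(counts)  # the Ni values must add up to N
    if n > N:
        # rarefaction only adjusts down, never up
        raise ValueError("n must not be larger than the sample size N")
    all_subsamples = comb(N, n)
    # For each species: 1 - (fraction of subsamples that miss it entirely)
    return sum(1 - comb(N - Ni, n) / all_subsamples for Ni in counts)

n = 25  # size of the smallest sample (North America)
central_america = [33, 32, 34, 33]
argentina = [42, 23, 16, 14, 6, 5]

print(round(rarefied_richness(central_america, n), 2))
print(round(rarefied_richness(argentina, n), 2))
```

If the assumption about the formula is right, the two printed values should agree with the totals of the Central America and Argentina tables that follow.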
Central America

| Species | N | n | Ni | N-Ni | combinatorial (N-Ni, n) | combinatorial (N, n) | fraction | 1 - fraction |
|---|---|---|---|---|---|---|---|---|
| C | 132 | 25 | 33 | 99 | 1.82E+23 | 6E+26 | 0.0003 | 1 |
| D | 132 | 25 | 32 | 100 | 2.43E+23 | 6E+26 | 0.0004 | 1 |
| E | 132 | 25 | 34 | 98 | 1.36E+23 | 6E+26 | 0.0002 | 1 |
| F | 132 | 25 | 33 | 99 | 1.82E+23 | 6E+26 | 0.0003 | 1 |
| Total = | | | | | | | | 4 species |
In the Central American lake sample, we do not get much of a correction (too small to show up). Why?
The situation is a bit different for the Argentine sample, where the sample is not so even, although the richness is greater.
Argentina

| Species | N | n | Ni | N-Ni | combinatorial (N-Ni, n) | combinatorial (N, n) | fraction | 1 - fraction |
|---|---|---|---|---|---|---|---|---|
| G | 106 | 25 | 42 | 64 | 4.01E+17 | 1E+24 | 3E-07 | 1 |
| H | 106 | 25 | 23 | 83 | 1.08E+21 | 1E+24 | 0.0008 | 1 |
| I | 106 | 25 | 16 | 90 | 1.16E+22 | 1E+24 | 0.0091 | 0.99 |
| J | 106 | 25 | 14 | 92 | 2.2E+22 | 1E+24 | 0.0172 | 0.98 |
| K | 106 | 25 | 6 | 100 | 2.43E+23 | 1E+24 | 0.1902 | 0.81 |
| L | 106 | 25 | 5 | 101 | 3.22E+23 | 1E+24 | 0.2528 | 0.75 |
| Total = | | | | | | | | 5.53 species |
Here, there is a noticeable correction. Why?
The last example rearranges the Argentine data, but keeps the number of species (6) and total sample size the same (106). What it does is make species G more dominant at the expense of all other species (look at the Ni column here and compare with the previous Argentina table).
Argentina - with almost all fish from species G

| Species | N | n | Ni | N-Ni | combinatorial (N-Ni, n) | combinatorial (N, n) | fraction | 1 - fraction |
|---|---|---|---|---|---|---|---|---|
| G | 106 | 25 | 80 | 26 | 26 | 1E+24 | 0.00 | 1.00 |
| H | 106 | 25 | 9 | 97 | 1.01E+23 | 1E+24 | 0.08 | 0.92 |
| I | 106 | 25 | 7 | 99 | 1.82E+23 | 1E+24 | 0.14 | 0.86 |
| J | 106 | 25 | 5 | 101 | 3.22E+23 | 1E+24 | 0.25 | 0.75 |
| K | 106 | 25 | 3 | 103 | 5.64E+23 | 1E+24 | 0.44 | 0.56 |
| L | 106 | 25 | 2 | 104 | 7.42E+23 | 1E+24 | 0.58 | 0.42 |
| Total = | | | | | | | | 4.50 species |
I included this example for two reasons
The problem arises when N - Ni is smaller than n. With N - Ni = 24 and n = 25, for instance, the ((N - Ni) - n) factorial becomes the factorial of 24 - 25 = -1. I might just set the combinatorial to 0 to do the calculations in the table, because there is no such thing as a factorial of a negative number and any combinatorial with this is 0 by definition.
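The point about negative factorials can be checked directly. The sketch below assumes Python's `math.comb`, which returns 0 when asked to choose more items than are available, so the zero does not have to be set by hand. The helper `species_term` and the 82-fish species are hypothetical, chosen only to produce N - Ni = 24.

```python
from math import comb, factorial

def species_term(N, Ni, n):
    """1 - (fraction of size-n subsamples that miss species i entirely)."""
    # comb(a, b) is 0 whenever b > a, so no negative factorial is ever formed
    return 1 - comb(N - Ni, n) / comb(N, n)

# Hypothetical case: 82 of 106 fish in one species leaves N - Ni = 24 < n = 25
try:
    factorial(24 - 25)
except ValueError as err:
    print("factorial of -1 is undefined:", err)

print(comb(24, 25))               # 0
print(species_term(106, 82, 25))  # 1.0

# The rearranged Argentine data from the table above
dominated = [80, 9, 7, 5, 3, 2]
print(round(sum(species_term(106, Ni, 25) for Ni in dominated), 2))
```

A species that dominant is certain to appear in every subsample of 25, which is why its term is exactly 1.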
So, we see that rarefaction is a bit more involved than the text makes out, but still a worthy exercise for the mind.
In addition, honesty makes me disclose that this is not the only way to rarefy a sample.
For example, one might use a bootstrap approach (made possible by the speed of computers).
In this approach, one subsamples a large sample repeatedly and then calculates the parameter of interest based on the subsamples:
- For the above example on the Argentine lake, one would make a subsample by taking the 106 individuals in the sample and randomly choosing only 25 of the 106 to be in the subsample. One would then count the number of species in the subsample, and that would be one bootstrap estimate of species richness.
- Next, one would resample the original 106 individuals, choosing another 25 at random (some of those in the second subsample could have been in the first), and recount the number of species.
- Repeat the subsampling many times, each time getting an estimate of the richness (this is why a computer is necessary; 10,000 is a good number of subsamples).
- Finally, average the 10,000 richness values for the 10,000 subsamples and use that average as your rarefied estimate of richness [ = E(S) in the formula above].
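The subsampling steps above can be sketched as follows. This is an illustration under stated assumptions, not the only way to do it: each subsample of 25 is drawn without replacement using `random.sample`, the seed is fixed only so the run is repeatable, and the function name `subsample_richness` is made up for this example.

```python
import random

def subsample_richness(counts, n, draws=10_000, seed=0):
    """Average species count over repeated random subsamples of n individuals.

    counts: dict mapping species label -> number of individuals (Ni).
    """
    rng = random.Random(seed)  # fixed seed: an assumption, for repeatability
    # One list entry per individual fish, labelled by its species
    individuals = [sp for sp, Ni in counts.items() for _ in range(Ni)]
    total = 0
    for _ in range(draws):
        subsample = rng.sample(individuals, n)  # n fish, without replacement
        total += len(set(subsample))            # species seen in this subsample
    return total / draws

argentina = {"G": 42, "H": 23, "I": 16, "J": 14, "K": 6, "L": 5}
estimate = subsample_richness(argentina, 25)
print(round(estimate, 2))
```

With 10,000 subsamples the average should land close to the 5.53 species obtained from the combinatorial calculation for the Argentine lake, though the last digit will vary with the seed.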
Last Updated on April 10, 2006