Chapter 16 Handout on Rarefaction Calculation

BIOL 4120

Principles of Ecology

Harned Hall 301

(615) 963 - 5782

 

[Photo: Eucalypt trees from Australia growing in Brazil to provide fiber for disposable diapers. The newly planted trees are in the foreground, and the dark green band behind them is the forest after only 5 years! The native ecosystem, the Atlantic Forest, is one of the most diverse and threatened terrestrial ecosystems.]

The process of rarefaction is necessary in order to compare species number (S) among samples of different sizes, because larger samples tend to contain more species just by including more individuals.

Remember that:

  • N = total sample size
  • S = number of species
  • n = standard sample size used for comparison
  • Ni = number of individuals in the ith species

    Logically, the sum of the Ni values must be equal to N.
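For reference, the calculation in the tables below corresponds to the standard rarefaction formula for the expected number of species, E(S), in a random subsample of n individuals drawn from a sample of N. It is written here with the symbols defined above (Ni as N_i), as a sketch matching the table columns:

```latex
E(S) = \sum_{i=1}^{S} \left[ 1 - \frac{\binom{N - N_i}{n}}{\binom{N}{n}} \right],
\qquad
\binom{a}{b} = \frac{a!}{b!\,(a - b)!}
```

Each ratio C(N - Ni, n) / C(N, n) is the probability that a subsample of n individuals contains no individual of species i, so 1 minus that ratio is the probability that species i shows up, and E(S) is the sum of those probabilities over all S species.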

Number of fish of each species in samples from three lakes (each cell is an Ni)

Species                   North America   Central America   Argentina
A                                    12
B                                     5
C                                     4                33
D                                     3                32
E                                     1                34
F                                                      33
G                                                                  42
H                                                                  23
I                                                                  16
J                                                                  14
K                                                                   6
L                                                                   5
Total (= N)                          25               132         106
Number of species (= S)               5                 4           6
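As a quick check of the table, here is a minimal Python sketch that stores the counts and recomputes the N and S rows. The dictionary name `lakes` and its layout are illustrative choices, not anything from the book.

```python
# Number of fish (Ni) of each species in each lake, copied from the table above.
lakes = {
    "North America": {"A": 12, "B": 5, "C": 4, "D": 3, "E": 1},
    "Central America": {"C": 33, "D": 32, "E": 34, "F": 33},
    "Argentina": {"G": 42, "H": 23, "I": 16, "J": 14, "K": 6, "L": 5},
}

for lake, counts in lakes.items():
    N = sum(counts.values())  # total sample size: the Ni values must sum to N
    S = len(counts)           # number of species recorded in the sample
    print(f"{lake}: N = {N}, S = {S}")
```

It prints N = 25, 132, and 106 and S = 5, 4, and 6, matching the last two rows of the table.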

To compare all three lakes, we need to rarefy the samples from Central America and Argentina down to the size of the smallest sample, North America.

The book does not say so, but n must be no larger than THE SMALLEST SAMPLE SIZE; here we set it equal to the smallest sample size.

The criterion is that N ≥ n for every sample. When N < n, you will not be able to do the combinatorials, because there is no way to draw n individuals from fewer than n.

Therefore, rarefaction always adjusts down, never up.

So, we can only ask, "How many species would I have gotten in this sample if it had been as small as the smallest sample?"

We will use n = 25, the size of the North American sample, and rarefy the Central American and Argentine samples.
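A small Python sketch of the rule that n is the smallest sample size and that N must be at least n. It relies only on the standard-library function `math.comb`; the variable names are illustrative.

```python
import math

# Total sample sizes (N) from the table of fish counts.
sample_sizes = {"North America": 25, "Central America": 132, "Argentina": 106}

# Rarefaction only adjusts down, so the standard size n is the smallest N.
n = min(sample_sizes.values())
print("n =", n)

for lake, N in sample_sizes.items():
    # N >= n holds for every lake, so C(N, n) is a positive number of subsamples.
    print(f"{lake}: C({N}, {n}) = {math.comb(N, n):.3g}")

# Rarefying "up" is impossible: there is no way to choose 132 individuals
# from 25, and math.comb returns 0 whenever the second argument is larger.
print(math.comb(25, 132))  # 0
```

Note that North America itself gives C(25, 25) = 1: there is only one possible "subsample" of 25, the whole sample, so its rarefied richness is simply its observed S = 5.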

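Central America (N = 132, rarefied to n = 25)

C(a, b) is the combinatorial "a choose b", a! / [b! (a - b)!]. The fraction C(N-Ni, n) / C(N, n) is the chance that species i is missing from a subsample of n individuals, and each species contributes 1 - fraction to the total.

Species  N     n    Ni   N-Ni   C(N-Ni, n)   C(N, n)    fraction  1 - fraction
C        132   25   33   99     1.82E+23     5.88E+26   0.0003    1
D        132   25   32   100    2.43E+23     5.88E+26   0.0004    1
E        132   25   34   98     1.36E+23     5.88E+26   0.0002    1
F        132   25   33   99     1.82E+23     5.88E+26   0.0003    1

Total = 4.00 species (3.999 before rounding)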
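The table above can be reproduced with a short Python sketch. The function name `rarefy` is hypothetical (not from the book); it applies the formula row by row, using the standard-library `math.comb` so the combinatorials are exact integers rather than the rounded values shown in the table.

```python
from math import comb


def rarefy(counts, n):
    """Expected number of species, E(S), in a random subsample of n individuals
    drawn without replacement from a sample whose species counts are `counts`."""
    N = sum(counts)
    if n > N:
        # Rarefaction only adjusts down; C(N, n) would be 0 here.
        raise ValueError(f"n = {n} exceeds the sample size N = {N}")
    total_ways = comb(N, n)  # C(N, n): number of possible subsamples of size n
    expected = 0.0
    for Ni in counts:
        # C(N - Ni, n) / C(N, n) = probability that species i is absent from the
        # subsample; math.comb returns 0 when N - Ni < n (species i must appear).
        fraction = comb(N - Ni, n) / total_ways
        expected += 1 - fraction
    return expected


central_america = [33, 32, 34, 33]  # Ni for species C, D, E, F
print(f"E(S) = {rarefy(central_america, 25):.3f}")
```

Running it prints E(S) = 3.999, which rounds to the 4 species in the table; `rarefy([42, 23, 16, 14, 6, 5], 25)` handles the Argentine sample the same way.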

In the Central American lake sample, we do not get much of a correction (too small to show up). Why?

  • This sample is very even, and if you reduce the sample size, all four species should be sampled, as all are about equally likely to be sampled

This situation is a bit different for the Argentine sample, where the sample is not so even, although the richness is greater.

Argentina (N = 106, rarefied to n = 25)

Species  N     n    Ni   N-Ni   C(N-Ni, n)   C(N, n)    fraction  1 - fraction
G        106   25   42   64     4.01E+17     1.27E+24   3E-07     1
H        106   25   23   83     1.08E+21     1.27E+24   0.0008    1
I        106   25   16   90     1.16E+22     1.27E+24   0.0091    0.99
J        106   25   14   92     2.20E+22     1.27E+24   0.0172    0.98
K        106   25   6    100    2.43E+23     1.27E+24   0.1902    0.81
L        106   25   5    101    3.22E+23     1.27E+24   0.2528    0.75

Total = 5.53 species
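In a spreadsheet the combinatorials quickly reach 1E+24 and beyond. One way around that, shown here as a Python sketch, is to compute each fraction directly as a product of n small ratios, using the identity C(N - Ni, n) / C(N, n) = [(N - Ni)(N - Ni - 1)...(N - Ni - n + 1)] / [N(N - 1)...(N - n + 1)]. The function name `fraction_absent` is hypothetical; the loop should reproduce the Argentina table and its total of 5.53 species.

```python
def fraction_absent(N, Ni, n):
    """C(N - Ni, n) / C(N, n) as a running product of small ratios:
    the probability that n draws without replacement all miss species i."""
    p = 1.0
    for j in range(n):
        remaining_other = N - Ni - j  # non-i individuals left before draw j + 1
        if remaining_other <= 0:
            # Fewer than n individuals belong to other species, so species i
            # cannot be missed (the N - Ni < n case).
            return 0.0
        p *= remaining_other / (N - j)  # chance that draw j + 1 also misses species i
    return p


argentina = {"G": 42, "H": 23, "I": 16, "J": 14, "K": 6, "L": 5}
N = sum(argentina.values())  # 106
n = 25
expected = 0.0
for species, Ni in argentina.items():
    f = fraction_absent(N, Ni, n)
    expected += 1 - f
    print(f"{species}: Ni = {Ni:2d}  fraction = {f:.4f}  1 - fraction = {1 - f:.2f}")
print(f"E(S) = {expected:.2f}")
```

Because no intermediate value exceeds 1, this form also works in a spreadsheet column without overflow.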

Here, there is a noticeable correction. Why?

  • The less common species are much more rare than are the most common, and so, they might not be sampled at all in a smaller sample.

The last example rearranges the Argentine data, but keeps the number of species (6) and total sample size the same (106). What it does is make species G more dominant at the expense of all other species (look at the Ni column here and compare with the previous Argentina table).

Argentina, with most fish (80 of 106) from species G (N = 106, rarefied to n = 25)

Species  N     n    Ni   N-Ni   C(N-Ni, n)   C(N, n)    fraction  1 - fraction
G        106   25   80   26     26           1.27E+24   0.00      1.00
H        106   25   9    97     1.01E+23     1.27E+24   0.08      0.92
I        106   25   7    99     1.82E+23     1.27E+24   0.14      0.86
J        106   25   5    101    3.22E+23     1.27E+24   0.25      0.75
K        106   25   3    103    5.64E+23     1.27E+24   0.44      0.56
L        106   25   2    104    7.42E+23     1.27E+24   0.58      0.42

Total = 4.50 species
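To see the effect of evenness in isolation, this Python sketch (the name `expected_species` is illustrative) rarefies both versions of the Argentine data, which share N = 106 and S = 6, and prints each species' row for the G-dominated version so it can be checked against the table above.

```python
from math import comb


def expected_species(counts, n):
    """E(S) = sum over species of [1 - C(N - Ni, n) / C(N, n)]."""
    N = sum(counts)
    return sum(1 - comb(N - Ni, n) / comb(N, n) for Ni in counts)


# Same N (106) and S (6) in both versions; only the evenness differs.
original = [42, 23, 16, 14, 6, 5]   # species G..L, first Argentina table
dominated = [80, 9, 7, 5, 3, 2]     # species G..L, species G dominant

for label, counts in [("original", original), ("G dominant", dominated)]:
    print(f"{label}: N = {sum(counts)}, S = {len(counts)}, "
          f"E(S) at n = 25 = {expected_species(counts, 25):.2f}")

# Row-by-row values for the G-dominated version.
N = sum(dominated)
for species, Ni in zip("GHIJKL", dominated):
    f = comb(N - Ni, 25) / comb(N, 25)
    print(f"{species}: Ni = {Ni:2d}  fraction = {f:.2f}  1 - fraction = {1 - f:.2f}")
```

The two totals come out as 5.53 and 4.50: the same six species, rarefied to the same n, lose about one more expected species when one species dominates.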

I included this example for two reasons:

  • Reason 1 - notice that the effect here is much more drastic because the community is much less even. Now, you expect to get only 4.50 species when you sample only 25 individuals.
  • Reason 2 - Suppose that the number of species G was 82 (and there were two fewer of species K, to keep the total the same). When you calculate this rarefaction, you run into an impossible situation: the combinatorial in the numerator for species G is C(24, 25), "24 choose 25". You can't calculate this, because it becomes 24! / [25! (24 - 25)!] = 24! / [25! (-1)!].

The -1 from the ((N - Ni) - n) factorial (24 - 25) is the problem. There is no such thing as a factorial of a negative number, and any combinatorial with this term is 0 by definition, so I would just set it to 0 to do the calculations in the table. The fraction for species G is then 0, and G contributes a full 1 to the total: with only 24 non-G fish in the sample, every subsample of 25 must contain at least one G.
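Python's standard library handles the two sides of this problem differently, which this sketch illustrates: `math.factorial` refuses a negative argument, while `math.comb` follows the convention that C(a, b) = 0 when b > a. The counts below are the hypothetical case described above (G = 82, K reduced from 3 to 1, N still 106).

```python
import math

# math.factorial rejects negative arguments: (-1)! does not exist.
try:
    math.factorial(-1)
except ValueError as err:
    print("factorial(-1) ->", err)

# math.comb instead returns 0 when asked to choose 25 items from 24.
print(math.comb(24, 25))  # 0

# Hypothetical counts: species G raised to 82, species K lowered to 1 (N stays 106).
counts = {"G": 82, "H": 9, "I": 7, "J": 5, "K": 1, "L": 2}
N, n = sum(counts.values()), 25
total_ways = math.comb(N, n)
expected = 0.0
for species, Ni in counts.items():
    fraction = math.comb(N - Ni, n) / total_ways  # 0 for G, since N - Ni = 24 < 25
    expected += 1 - fraction
print(f"N = {N}, E(S) = {expected:.2f}")
```

Setting the impossible combinatorial to 0 is therefore exactly what `math.comb` does on its own, and species G contributes 1 to E(S).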

So, we see that rarefaction is a bit more involved than the text makes out, but still a worthy exercise for the mind.

In addition, honesty makes me disclose that this is not the only way to rarefy a sample.

For example, one might use a bootstrap approach (made possible by the speed of computers).

In this approach, one subsamples a large sample repeatedly and then calculates the parameter of interest from the subsamples.

  • For the above example on the Argentine lake, one would make a subsample by taking the 106 individuals in the sample and randomly choosing only 25 of the 106 to be in the subsample. One would then count the number of species in the subsample, and that would be one bootstrap estimate of species richness.

Next, one would resample the original 106 individuals again, choosing another 25 at random (some of those in the second subsample could have been in the first), and recalculate the number of species.

Repeat the subsampling many times (this is why a computer is necessary; 10,000 is a good number of subsamples), each time getting an estimate of the richness.

Finally, average the 10,000 richness values for the 10,000 subsamples and use that average as your rarefied estimate of richness [ = E(S), the quantity computed in the tables above].
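A sketch of the resampling approach just described, in Python. The function name `subsample_richness`, the fixed random seed, and the default of 10,000 subsamples are illustrative choices. As described here, each subsample draws 25 distinct individuals (without replacement) from the 106, so strictly speaking this is repeated random subsampling rather than a classical bootstrap with replacement; its average should approach the 5.53 species from the Argentina table as the number of subsamples grows.

```python
import random


def subsample_richness(counts, n, reps=10_000, seed=1):
    """Average number of species over `reps` random subsamples of n individuals,
    each drawn without replacement from all individuals in the sample."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    # One list entry per individual, labelled by its species (106 for Argentina).
    individuals = [sp for sp, Ni in counts.items() for _ in range(Ni)]
    total = 0
    for _ in range(reps):
        subsample = rng.sample(individuals, n)  # n distinct individuals
        total += len(set(subsample))            # species present in this subsample
    return total / reps


argentina = {"G": 42, "H": 23, "I": 16, "J": 14, "K": 6, "L": 5}
print(round(subsample_richness(argentina, 25), 2))
```

The printed average differs slightly from 5.53 because of random sampling; the exact formula gives the value this average converges to.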

Last Updated on April 10, 2006