After initially confirming that machine learning algorithms are a valid way to predict the invasive potential of weeds, we chose to continue with the Trajan neural network simulator for the following reasons:
1. Trajan had the highest classification rate (approx. 70-74%).
2. The ability to perform an exhaustive search for the best neural network.
3. The ability to specify different types of variables (numeric, nominal, boolean).
Since we modified the set of input variables in the training set after the initial experiments, we ran another search for the best-performing neural networks. The Trajan neural network simulator returned a set of 10 networks (4 Radial Basis Function (RBF) and 6 Multilayer Perceptron (MLP) networks):
| # | Type | Error | Inputs | Hidden units | Performance |
|---|---|---|---|---|---|
| 01 | RBF | 0.4754761 | 17 | 39 | 0.6451613 |
| 02 | RBF | 0.4628958 | 17 | 36 | 0.6451613 |
| 03 | MLP | 0.4342738 | 13 | 2 | 0.7096774 |
| 04 | MLP | 0.3698627 | 2 | 3 | 0.7741935 |
| 05 | MLP | 0.3179705 | 2 | 3 | 0.8387097 |
| 06 | RBF | 0.4221949 | 3 | 11 | 0.9032258 |
| 07 | RBF | 0.4037507 | 2 | 11 | 0.8709677 |
| 08 | MLP | 0.3279516 | 3 | 3 | 0.8387097 |
| 09 | MLP | 0.2834223 | 4 | 3 | 0.9032258 |
| 10 | MLP | 0.2310261 | 5 | 2 | 0.9677419 |
We chose network #10 as the best, based on its training-subset accuracy of 95.2%, cross-verification error of 23.1%, and test-subset accuracy of 83.3% (see table below).
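The selection step can be reproduced from the table values. The sketch below is an illustration only, not part of the Trajan workflow. It treats the Error column as the cross-verification error, which matches the 23.1% quoted for network #10. It picks the candidate with the lowest error, and breaking ties on the Performance column is an added assumption.

```python
# Candidate networks from the Trajan search:
# (id, type, error, inputs, hidden units, performance)
CANDIDATES = [
    (1, "RBF", 0.4754761, 17, 39, 0.6451613),
    (2, "RBF", 0.4628958, 17, 36, 0.6451613),
    (3, "MLP", 0.4342738, 13, 2, 0.7096774),
    (4, "MLP", 0.3698627, 2, 3, 0.7741935),
    (5, "MLP", 0.3179705, 2, 3, 0.8387097),
    (6, "RBF", 0.4221949, 3, 11, 0.9032258),
    (7, "RBF", 0.4037507, 2, 11, 0.8709677),
    (8, "MLP", 0.3279516, 3, 3, 0.8387097),
    (9, "MLP", 0.2834223, 4, 3, 0.9032258),
    (10, "MLP", 0.2310261, 5, 2, 0.9677419),
]


def select_best(candidates):
    """Return the candidate with the lowest error; ties go to higher performance (assumed rule)."""
    return min(candidates, key=lambda c: (c[2], -c[5]))


best = select_best(CANDIDATES)
net_id, net_type, error, n_in, n_hidden, perf = best
# Single output node, as stated in the text.
print(f"#{net_id:02d} {net_type} {n_in}-{n_hidden}-1 error={error:.1%} performance={perf:.1%}")
# prints "#10 MLP 5-2-1 error=23.1% performance=96.8%"
```

In this table the lowest-error network (#10) also has the highest Performance value, so both criteria point to the same choice.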
This network has a 5-2-1 topology: 5 input nodes, one hidden layer with 2 nodes, and 1 output node.
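Assume a fully connected 5-2-1 MLP with one bias per hidden and output unit. It then has 5·2 + 2 = 12 input-to-hidden parameters and 2·1 + 1 = 3 hidden-to-output parameters, 15 in total. The sketch below computes that count and runs a single forward pass. The logistic activations and all weight values are hypothetical, because the source does not report Trajan's activation functions or the trained weights of network #10.

```python
import math


def parameter_count(layers):
    """Weights plus one bias per non-input unit for a fully connected feed-forward net."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass through a 5-2-1 MLP with logistic units (assumed activations)."""
    hidden = [logistic(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return logistic(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Hypothetical weights for illustration only (NOT the trained network #10).
W_HIDDEN = [[0.4, -0.2, 0.1, 0.3, -0.5],
            [-0.3, 0.6, -0.1, 0.2, 0.4]]
B_HIDDEN = [0.0, 0.1]
W_OUT = [1.2, -0.8]
B_OUT = -0.1

print(parameter_count([5, 2, 1]))  # 15
x = [0.5, 0.7, 0.2, 0.0, 1.0]      # hypothetical scaled values of the 5 inputs
p = forward(x, W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
print(f"output = {p:.3f}")         # a score in (0, 1), e.g. read as "potentially noxious" above 0.5
```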
The five most effective input attributes were:
- Native range: Latitudinal range where native in Europe/Asia
- Absolute range: Latitudinal range where native and exotic in Europe/Asia
- Number of counties: Number of counties where the species' first record falls within the 1951-2000 period
- Noxious elsewhere: Number of states or provinces outside Idaho (ID) and Montana (MT) that have declared the plant noxious
- Lifeform: Forb, tree, shrub, grass-like, or spore-bearing vascular plant
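The five attributes can be held in a small record type. In the sketch below, latitudinal range is taken as the northern limit minus the southern limit, in degrees, and Lifeform is treated as a nominal variable with the five categories listed. The class names, the helper `latitudinal_range`, and all example values are hypothetical and are not data from this study. The sketch also does not say how Trajan encoded the nominal variable internally.

```python
from dataclasses import dataclass
from enum import Enum


class Lifeform(Enum):
    """Nominal lifeform categories from the attribute list."""
    FORB = "forb"
    TREE = "tree"
    SHRUB = "shrub"
    GRASS_LIKE = "grass-like"
    SPORE_BEARING = "spore-bearing vascular plant"


def latitudinal_range(south, north):
    """Extent in degrees between the southern and northern latitude limits (assumed definition)."""
    if north < south:
        raise ValueError("northern limit must not lie south of the southern limit")
    return north - south


@dataclass
class WeedRecord:
    native_range: float     # degrees, native range in Europe/Asia
    absolute_range: float   # degrees, native + exotic range in Europe/Asia
    n_counties: int         # counties whose first record falls in 1951-2000
    noxious_elsewhere: int  # states/provinces outside ID and MT listing it noxious
    lifeform: Lifeform


# Hypothetical species; every value below is invented for illustration.
rec = WeedRecord(
    native_range=latitudinal_range(40.0, 60.0),
    absolute_range=latitudinal_range(35.0, 65.0),
    n_counties=12,
    noxious_elsewhere=3,
    lifeform=Lifeform.FORB,
)
print(rec)
```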
The network was trained with Back Propagation (50 epochs) and Conjugate Gradient Descent (91 epochs) algorithms.
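The sketch below covers only the back propagation part of this training, for a 5-2-1 MLP run for 50 epochs. It assumes logistic units, squared error, online (per-example) updates, and a fixed learning rate. The source does not give Trajan's learning rate, momentum, or weight initialization, and the conjugate gradient descent phase is not reproduced here. The toy dataset is synthetic, with inputs scaled to [-1, 1] and a label of 1 when the first two inputs sum to more than 0.

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_backprop(data, n_in=5, n_hidden=2, epochs=50, lr=0.5, seed=0):
    """Online back propagation for an n_in-n_hidden-1 MLP with logistic units and squared error."""
    rng = random.Random(seed)
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b_h = [0.0] * n_hidden
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_o = 0.0
    history = []  # mean squared error per epoch
    for _ in range(epochs):
        sse = 0.0
        for x, target in data:
            # Forward pass
            h = [logistic(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w_h, b_h)]
            y = logistic(sum(w * hj for w, hj in zip(w_o, h)) + b_o)
            err = y - target
            sse += err * err
            # Backward pass: error terms for the output and hidden units
            delta_o = err * y * (1.0 - y)
            delta_h = [delta_o * w_o[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            # Gradient-descent updates (hidden deltas were computed with the old w_o)
            for j in range(n_hidden):
                w_o[j] -= lr * delta_o * h[j]
                for i in range(n_in):
                    w_h[j][i] -= lr * delta_h[j] * x[i]
                b_h[j] -= lr * delta_h[j]
            b_o -= lr * delta_o
        history.append(sse / len(data))
    return (w_h, b_h, w_o, b_o), history


def make_toy_data(n=40, seed=1):
    """Synthetic 5-input data (hypothetical): label 1 when x[0] + x[1] > 0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in range(5)]
        data.append((x, 1.0 if x[0] + x[1] > 0.0 else 0.0))
    return data


params, history = train_backprop(make_toy_data())
print(f"MSE: epoch 1 = {history[0]:.3f}, epoch 50 = {history[-1]:.3f}")
```

A second, conjugate-gradient phase would continue from the returned `params` with a different optimizer.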
This network classified the training set instances as follows:
After selecting this best trained neural network, we used it to analyze the set of 120 exotic species (our prediction set) reported to have been introduced after 1950. This set also contains 16 species that have already been declared noxious. This is particularly useful because it allows a further assessment of the accuracy of the selected neural network model.
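One way to use the 16 already-declared species is to count how many of them the network flags as potentially noxious. This gives the model's recall on that known-positive subgroup. The sketch below assumes that both the predictions and the known listings are available as sets of species names. The names are placeholders, and the source does not say which accuracy measure was actually applied.

```python
def known_noxious_recall(predicted_noxious, known_noxious):
    """Fraction of already-declared noxious species that the model also flags, plus the misses."""
    known = set(known_noxious)
    if not known:
        raise ValueError("known_noxious must not be empty")
    hits = known & set(predicted_noxious)
    return len(hits) / len(known), sorted(known - hits)


# Hypothetical placeholder names, not species from this study.
predicted = {"species_A", "species_B", "species_C", "species_D"}
known = {"species_A", "species_C", "species_E"}
rate, missed = known_noxious_recall(predicted, known)
print(f"recall on known noxious species: {rate:.2f}; missed: {missed}")
# prints "recall on known noxious species: 0.67; missed: ['species_E']"
```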
The list of 29 species predicted to have the potential to become noxious, along with a brief discussion, follows on the next page.