Initial Experiments
• Trajan
For the first experiment we used the Trajan Neural Network Simulator by Trajan Software, Ltd. This package can construct a large number of neural networks and retain only those with the best performance (usually networks with a performance of 50% or better). The software can also search for a good subset of input attributes, discarding unimportant ones, and can run a sensitivity analysis that gives insight into the usefulness of individual variables.

Before the search was initiated, we specified a ratio of 2:1:1 for the random selection of training, cross-verification, and testing sets. After converting the training set to an appropriate format, an exhaustive search for Linear, Radial Basis Function, and Multilayer Perceptron neural networks was performed. After the search, Trajan retained a set of 20 networks. The best two neural networks were Multilayer Perceptron networks with performance between 0.7096774 and 0.7419355:

Type   Error       Inputs   Hidden   Performance
MLP    0.4091044   15       64       0.7096774
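The 2:1:1 random split described above can be sketched as follows. This is a minimal illustration, not Trajan's own procedure: the function name and seed are hypothetical, and the case count of 124 is an assumption taken from the backprop totals reported elsewhere on this page. Under that assumption the cross-verification subset would hold 31 cases, which is consistent with the reported performance values (22/31 = 0.7096774 and 23/31 = 0.7419355).

```python
import random


def split_2_1_1(n_cases, seed=0):
    """Randomly assign case indices to training, cross-verification,
    and test subsets in a 2:1:1 ratio (hypothetical helper)."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    n_train = n_cases // 2                  # two parts of four
    n_verify = (n_cases - n_train) // 2     # one part of four
    train = indices[:n_train]
    verify = indices[n_train:n_train + n_verify]
    test = indices[n_train + n_verify:]     # remaining part
    return train, verify, test


# Assumed data set size: 124 cases.
train, verify, test = split_2_1_1(124)
print(len(train), len(verify), len(test))  # 62 31 31
```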

We decided to keep the last neural network, since it had:
1. The smallest cross-verification error (0.4091044).
2. A cross-verification classification rate of 0.709677.
3. An area under the ROC curve of 0.719823.
This neural network used all 15 attributes as input nodes, 64 hidden nodes, and one output node (topology: 15-64-1).
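To make the 15-64-1 topology concrete, here is a small sketch of a forward pass through a network of that shape. The logistic activation, the random initial weights, and all function names are assumptions for illustration; the source does not state which activation or weights Trajan used.

```python
import math
import random

N_INPUTS, N_HIDDEN, N_OUTPUTS = 15, 64, 1  # topology 15-64-1


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """One weight row per unit; the last entry of each row is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def layer_forward(weights, inputs):
    # zip stops after the n_in input weights; row[-1] is the bias term.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + row[-1])
            for row in weights]


def forward(hidden, output, inputs):
    """Return the single output activation for one case."""
    return layer_forward(output, layer_forward(hidden, inputs))[0]


rng = random.Random(0)
hidden = init_layer(N_INPUTS, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, N_OUTPUTS, rng)

n_params = sum(len(row) for row in hidden) + sum(len(row) for row in output)
print(n_params)  # 1089 weights and biases

score = forward(hidden, output, [0.0] * N_INPUTS)
```

A network of this shape has 1089 adjustable weights and biases, far more than the number of training cases, which is one reason a held-out cross-verification subset is useful when selecting a network.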

• Backprop
After the automated neural network search with Trajan, we decided to apply the same topology using the backprop machine learning algorithm available at the University of Montana Computer Science Department. For this experiment we ran the following configurations (each with ten-fold cross-validation):

1. Default option for the command file (result: Test Set Fraction Correct: Total results = 60/124 = 0.484).
2. Change the learning rate in the command file to 0.25 (result: Test Set Fraction Correct: Total results = 74/124 = 0.597).
3. Change percent validation in the command file to 0 (result: Test Set Fraction Correct: Total results = 54/124 = 0.435).
Looking at the results, it is evident that the most favorable prediction rate (0.597) was achieved using a learning rate of 0.25 and a percent_validation value of 0.1 (run #2). Below is the resulting confusion matrix:

Test Set Confusion Matrix (total results)

Training Data
Correct      30   44
Incorrect    32   18
Total        62   62

Test Set Fraction Correct: Total results = 74/124 = 0.597
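The reported fraction correct follows directly from the counts in the matrix. The short check below reproduces the arithmetic; the two columns are treated simply as "column 1" and "column 2", since their class labels are not given here.

```python
# Counts copied from the confusion matrix above; the column meanings
# are not labeled in the source, so they are kept anonymous.
correct = [30, 44]
incorrect = [32, 18]

totals = [c + i for c, i in zip(correct, incorrect)]
per_column = [c / t for c, t in zip(correct, totals)]
overall = sum(correct) / sum(totals)

print(totals)                             # [62, 62]
print(round(overall, 3))                  # 0.597
print([round(p, 3) for p in per_column])  # [0.484, 0.71]
```

The overall rate of 0.597 therefore hides a marked difference between the two columns (about 48% versus 71% correct).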

This result is considerably better than the other two backprop runs (an improvement of more than 10 percentage points in the correct classification rate) and is more consistent with the initial results from Trajan.
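The ten-fold cross-validation used for each backprop run can be sketched as below. This is a generic illustration rather than the actual backprop command file logic: the helper names are hypothetical, and `train_and_count_correct` stands in for whatever routine trains on one partition and counts the correct predictions on the held-out fold.

```python
import random


def k_fold_indices(n_cases, k=10, seed=0):
    """Yield (train, test) index lists so each case is tested exactly once."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


def cross_validate(n_cases, train_and_count_correct, k=10, seed=0):
    """Sum correct test predictions over all folds (the 'total results')."""
    total_correct = 0
    for train, test in k_fold_indices(n_cases, k, seed):
        total_correct += train_and_count_correct(train, test)
    return total_correct, n_cases


# With 124 cases, four folds hold 13 cases and six hold 12.
fold_sizes = [len(test) for _, test in k_fold_indices(124)]
print(fold_sizes)  # [13, 13, 13, 13, 12, 12, 12, 12, 12, 12]
```

Because every case appears in exactly one test fold, the denominators in the results above are all 124.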

• C4.5
The last experiment with this training data set used C4.5, the improved version of the ID3 decision tree building algorithm. We built the decision tree and ran the data set with the following two options:
1. Information Gain Ratio (with and without pruning, using default options when pruning).
2. Information Gain (with and without pruning, using default options).
Below are the final ten-fold cross validation results:
              Before Pruning      After Pruning
              Size    Errors      Size    Errors    % pruned
Gain Ratio:

From these results it is evident that using Gain considerably reduced the error, especially for the test runs. Pruning the tree did not reduce its size significantly. The training error increased slightly after pruning for both Gain and Gain Ratio. However, Gain produced a smaller percentage error than Gain Ratio for both the training and test runs. Although the training error for Gain increased after pruning, the test error did not change. We conclude that Gain is the more useful option, with reduced error and the smallest tree after pruning.
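For readers unfamiliar with the two splitting criteria, the sketch below computes information gain and gain ratio for a single attribute. The toy attribute values and class labels are invented purely for illustration and are not taken from the training set.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def information_gain(values, labels):
    """Reduction in class entropy after splitting on the attribute values."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain divided by the split information of the attribute."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


labels = ["weed", "weed", "benign", "benign"]   # made-up classes
two_valued = ["a", "a", "b", "b"]               # splits the classes cleanly
many_valued = [1, 2, 3, 4]                      # one value per case

print(information_gain(two_valued, labels), gain_ratio(two_valued, labels))    # 1.0 1.0
print(information_gain(many_valued, labels), gain_ratio(many_valued, labels))  # 1.0 0.5
```

Both attributes have the same information gain, but gain ratio penalizes the many-valued one. The two criteria can therefore choose different splits and produce different trees, as in the experiment above.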

• Summary:

All three experiments produced roughly similar classification results. Backprop and Trajan achieved a total correct classification rate of approximately 60% (Trajan produced a 70% classification rate for the cross-verification subset; its average classification rate over all subsets was about 60%). The decision tree algorithm C4.5 reported an error just under 40% using Gain, which suggests it would classify roughly 60% of new cases correctly. These results are very encouraging considering the limited type of data in this training set and the modest number of training cases. According to the sensitivity analysis with Trajan, the five most important attributes were:
    1. Whether or not a plant is annual.
    2. Number of counties in the five northwestern states reporting infestations.
    3. Whether or not a plant is perennial.
    4. Number of European/Asian countries where the plant is exotic.
    5. The native latitudinal range in Europe and Asia.
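One common way to rank attributes like this is to make each attribute uninformative in turn and measure how much the prediction error grows. The sketch below does so by replacing an attribute with its mean value. This is a generic illustration under stated assumptions, not necessarily the procedure Trajan implements; the function names, the toy model, and the toy data are all hypothetical.

```python
def mean_squared_error(predict, rows, targets):
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def sensitivity(predict, rows, targets):
    """Return, per attribute, the increase in error when that attribute
    is replaced by its mean over all cases (larger = more important)."""
    baseline = mean_squared_error(predict, rows, targets)
    increases = []
    for a in range(len(rows[0])):
        mean_a = sum(r[a] for r in rows) / len(rows)
        masked = [r[:a] + [mean_a] + r[a + 1:] for r in rows]
        increases.append(mean_squared_error(predict, masked, targets) - baseline)
    return increases


# Toy model that depends only on attribute 0 (purely illustrative).
def toy_model(row):
    return row[0]


rows = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
targets = [0.0, 1.0, 0.0, 1.0]

print(sensitivity(toy_model, rows, targets))  # [0.25, 0.0]
```

Attributes whose masking raises the error most would appear at the top of such a ranking.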

Based on the results of the experimental work, we made a number of refinements to increase our prediction rate. These refinements consisted of introducing additional variables. One attribute that has shown significance in earlier studies (Reichard & Hamilton, 1996) is whether or not a plant is known to invade elsewhere (i.e. “Invades elsewhere: Yes/No”). Since not all exotic plants use the same method of reproduction, Rejmánek’s theory of seed analysis as an indicator of invasiveness could not be applied. The pines that Rejmánek studied reproduce only from seed. For many noxious weeds, vegetative reproduction from rhizomes is an important means of population increase and spread. A variable that addresses reproduction is therefore appropriate (reproduction: seed, vegetative).
