
Centre for Geo-Information Thesis Report GIRS-2003-08

A Comparison Assessment Between Supervised and Unsupervised Neural Network Image Classifiers

Author: Senait Dereje Senay


January 2003

Supervisor: Dr. Monica Wachowicz


WAGENINGEN UR

Center for Geo-Information Thesis Report GIRS-2003-08

A Comparison Assessment Between Supervised and Unsupervised Neural Network Image Classifiers

Senait Dereje Senay

Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Geoinformation Science at Wageningen University and Research Center

Supervisor: Dr. Monica Wachowicz

Examiners:
Dr. Monica Wachowicz
Drs. A.J.W. de Wit
Dr. Ir. Ron van Lammeren

January 2003
Wageningen University

Center for Geo-Information and Remote Sensing
Department of Environmental Sciences

To my parents: Lt.Col. Yeshihareg Chernet and Ato Dereje Senay, And my brother: Daniel Dereje Thank you for everything you have been to me.


Acknowledgements

I am indebted to my supervisor, Dr. Monica Wachowicz, who gave me continuous professional support during all stages of undertaking this thesis. I would like to sincerely thank her for the invaluable advice and support she gave me. I am very grateful to Dr. Gerrit Epema and Dr. Ir. Ron van Lammeren, who helped me in facilitating the field trip to the study area for ground control point collection, and of course for the continuous moral support I received from Dr. Gerrit Epema. I would sincerely like to thank Dr. Gete Zeleke and Mr. Meneberu Allebachew, who helped me by arranging a vehicle and other necessary data and support while I was in the study area for field data collection; without their help the field trip would not have been successful at all. I would also like to express my gratitude to Mr. Wubshet Haile and Mr. Getachew, who assisted me throughout the field work, enabling me to finish it within the very limited time I had. I would like to extend my heartfelt thanks to Mr. John Stuiver and Drs. Harm Bartholomeus, who supported me whenever I needed professional help in the preprocessing of data; without their support the data preprocessing stage of my thesis would definitely have taken more time. I would also like to thank the cartography section of Alterra, who helped me in printing and scanning the maps used in producing the report as well as in the analysis. I would like to extend my heartfelt thanks to my friends Achilleas Psomas and Krzysztof Kozłowski (Dziękuję) for all the friendly moral support and invaluable friendship; thanks for making my stressful days easier. I gratefully thank Dawit Girma for all the help I got whenever I needed it. I would also like to show my gratitude to my uncle, Mr. Tesfasilassie Senay, for providing me a family atmosphere while I was on the field trip. I would not pass without expressing my gratitude and sincere thanks to my friends Giuseppe Amatulli (Grazie), Mauricio Labrador-Garcia and Sonia Barranco-Borja (Gracias), Nicolas Dosselaere (Dank u wel), Izabela Witkowska (Dziękuję), Fanus Woldetsion (Yekenyeley) and Adrian Ducos (Merci) for creating a pleasant working atmosphere, and much more, which helped us through the difficult times of working on the thesis and which is unforgettable. I wish you all the best in the future. Last but not least, I would like to extend my admiration to the whole GRS 2001 batch for the respect and friendship between us; I wish you all the best, nothing but the best. It has been an honor and a pleasure to know you. Finally, I would like to extend my heartfelt thanks to NUFFIC for covering my study costs and offering me this experience.


Abstract

Neural networks are a recently emerged field of science that developed as part of artificial intelligence, and they are used to solve complex problems in various disciplines. The application of neural networks in remote sensing, particularly in image classification, has become very popular in the last decade. The motivation to use neural networks arose from the limitations of the conventional parametric image classifiers, as the source, data structure, scale and amount of remotely sensed data became highly varied. Neural networks have been found to compensate for the drawbacks these conventional classifiers have in image classification. Neural networks offer two kinds of image classification, supervised and unsupervised. In this study, both kinds of neural networks were tested to evaluate which results in a more accurate image classification and which method handles poor quality data better. Finally, a land cover map of the southern part of the Lake Tana area, situated in the north-western part of Ethiopia, was produced with the best classifier.

Key words: Neural Networks, neuron, ANN, KWTA, LVQ, BP, Image classification


Abbreviations

ANN — Artificial Neural Networks
ASTER — Advanced Spaceborne Thermal Emission and Reflection Radiometer
BP — Back Propagation
KWTA — Kohonen's Winner Take All
LVQ — Learning Vector Quantization
MIR — Middle Infrared
MLNFF — Multi-Layer Normal Feed Forward
MLP — Multi-Layer Perceptron
NN — Neural Networks
NDVI — Normalized Difference Vegetation Index
SOFM — Self-Organizing Feature Maps
SWIR — Short Wave Infrared
TIR — Thermal Infrared
VNIR — Visible and Near Infrared
WTA — Winner-Take-All


Table of Contents

Acknowledgements
Abstract
Abbreviations
List of Figures
List of Tables
1 Introduction
  1.1 Background
  1.2 Study area
  1.3 Objectives
  1.4 Research questions
  1.5 Research outline
2 Artificial Neural Networks (ANN)
  2.1 Overview of the main concepts
    2.1.1 Biological concepts
    2.1.2 Historical development
    2.1.3 Basic neural network processor
    2.1.4 Neural networks and image classification
  2.2 Types of neural networks
    2.2.1 Supervised neural network classifiers
      2.2.1.1 Description of supervised neural network classifiers
      2.2.1.2 Architecture and algorithm
    2.2.2 Unsupervised neural network classifiers
      2.2.2.1 Description of unsupervised neural network classifiers
      2.2.2.2 Architecture and algorithm
3 Methodology
  3.1 Field data acquisition
  3.2 Data preprocessing
    3.2.1 Datasets
      3.2.1.1 ASTER
      3.2.1.2 Landsat TM
    3.2.2 Dataset preparation
    3.2.3 Training and test set preparation
  3.3 Supervised neural network classification
  3.4 Unsupervised neural network classification
  3.5 Accuracy assessment and validation
  3.6 Sensitivity analysis
  3.7 Implementation aspects
4 Results and Discussion
  4.1 Accuracy of back propagation classifier trained with ASTER or Landsat TM datasets
  4.2 Accuracy of back propagation classifier trained with ASTER and Landsat TM input dataset
  4.3 Accuracy of Kohonen/LVQ classifier trained with ASTER or Landsat TM datasets
  4.4 Accuracy of Kohonen/LVQ classifier trained with ASTER and Landsat TM combined datasets
  4.5 Validation of the results obtained from the back propagation supervised neural network classifier
  4.6 Validation of the results obtained from the Kohonen/LVQ unsupervised neural network classifier
  4.7 Improving the training data quality
  4.8 Sensitivity analysis
5 Conclusions
6 Recommendation
References
Appendices
  Appendix 1: Dataset projection information
  Appendix 2: Results of input sensitivity analysis
  Appendix 3: Neural network parameters used


List of Figures

Figure 1: Map of Ethiopia
Figure 2: Overview of the study area
Figure 3: Signal path of a single human neuron
Figure 4: The basic neural network processor: the neuron and its functions
Figure 5: Design of the Multi-Layer Normal Feed Forward (MLNFF) architecture
Figure 6: A Kohonen Self Organizing Grid with a two-dimensional output layer
Figure 7: Decreasing neighborhood of a winner neuron in a WTA output layer
Figure 8: Design of the Learning Vector Quantization architecture
Figure 9: Overview of the main procedures involved in the methodological process
Figure 10: ASTER bands superimposed on a model atmosphere
Figure 11: Landsat TM bands superimposed on a model atmosphere
Figure 12: Study area after Lake Tana is masked out of the image
Figure 13: Spectral signatures of the six classes (before ASTER image rescaling)
Figure 14: Spectral signatures of the six classes (after ASTER image rescaling)
Figure 15: Training/test data preparation procedure
Figure 16: Design of the Back Propagation neural network
Figure 17: The design of the KWTA/LVQ network


List of Tables

Table 1: Spectral range of bands and spatial resolution for the ASTER sensor
Table 2: Spectral range of bands and spatial resolution for the TM sensor
Table 3: Training and test sets for the ASTER dataset
Table 4: Training and test sets for the Landsat TM dataset
Table 5: Training and test sets for the combination of ASTER and Landsat TM datasets
Table 6: Training data set-up for the Back Propagation neural network
Table 7: Training data set-up for the Kohonen Winner Take All/LVQ network
Table 8: Accuracy of the back propagation classifier using ASTER data
Table 9: Accuracy of the back propagation classifier trained with Landsat TM data
Table 10: Accuracy of the back propagation classifier trained with ASTER and Landsat TM combined datasets
Table 11: Accuracy of the Kohonen/LVQ classifier trained with ASTER data
Table 12: Performance of the Kohonen/LVQ classifier trained with Landsat TM data
Table 13: Accuracy of the Kohonen/LVQ classifier trained with ASTER and Landsat TM data
Table 14: Confusion matrix for the back propagation network classification using ASTER and Landsat TM images
Table 15: Percentage accuracy of the classes of the supervised classified image using the ASTER and Landsat TM data source
Table 16: Confusion matrix for the unsupervised network
Table 17: Percentage accuracy of the six classes of the unsupervised classified image
Table 18: Classification result for ASTER-Landsat TM data into 5 classes
Table 19: Confusion matrix for the supervised classification with five output classes
Table 20: Percentage accuracy of the various classes of the supervised classified image (with 5 classes)
Table 21: Result of sensitivity analysis of the ASTER dataset
Table 22: Accuracy of supervised and unsupervised neural network classifiers


1 Introduction

1.1 Background

Artificial Neural Networks (ANNs) are systems that make use of some of the known or expected organizing principles of the human brain. They consist of a number of independent, simple processors: the neurons. These neurons communicate with each other through weighted connections (REF1, 2002)¹. The study of neural networks is also referred to as Artificial Neural Networks or connectionism (Roy, 2000). The use of artificial neural networks for various applications is becoming common nowadays. The benefits of using this newly developing technology range from less subjectivity in our analysis to full automation of processes, so that less manual interference is needed. Neural networks are a very new technology, although its basis dates back to the 1940s, when Warren McCulloch, a neurophysiologist, and Walter Pitts, a young mathematician, wrote a paper on how neurons might work, describing their model, a simple neural network built with electrical circuits (Anderson et al, 1992). Ever since, the science faced a lot of obstacles before becoming popular for the different applications it is used in today. The idea of imitating the structure of the human brain, i.e. neurons, in order to invent thinking machines was raised as a moral issue in the 1970s. Much criticism was directed towards the development of this science, concerning how the development of neural networks would affect human beings. People were concerned about what the world would look like with machines doing everything man can do. These movements ended up reducing much of the funding assigned for the development of the science, hence drawing the pace of development of neural networks backwards (Anderson et al, 1992). However, this did not last long, and interest was renewed when different scientists showed that the idea of neural networks is not simply to model brains but to use them in a way

¹ (Ref #) refers to references taken from the Internet; the path to the sites is listed in the reference section.

that makes our way of life easier, in terms of computation, analysis in different applications, and less manual involvement in complex processes. This gave a promising lead to the neural networks of today. A lot of fields apart from artificial intelligence and computer science attempt to make use of neural networks in their applications, including data warehousing, data mining, robotics, wave physics, remote sensing and GIS (Roy, 2000). Although remote sensing is not one of the primary fields to use ANNs for analysis, recently neural networks have been used in several applications of GIS and remote sensing. Some of the most common applications include data fusion, land suitability evaluation, spatio-temporal analysis, and land cover classification of satellite images. However, neural networks have not fully replaced the conventional ways of analysis in these applications; they are still being tested, since the technology has not been exhaustively tested on all kinds of remote sensing data. Neural networks are of special interest to today's remote sensing, where the problem is no longer the absence or insufficiency of data but the accumulation of multi-scale, multi-source, and multi-temporal data. It is highly important to incorporate the information found in these data, originating from different media and scales, in order to achieve a better, higher-accuracy classification. There are some limitations in using conventional parametric (statistical) classifiers like maximum likelihood, such as the need for a normal (Gaussian) distribution in the data, the absence of flexibility in the classification process, the inability to deal with multi-scale data without standardization of the data to the same scale, and the inefficiency of the image classification process in terms of time. These limitations motivated scientists to look for alternatives where these drawbacks could be overcome. Neural networks were found to be one of the soundest choices, since they are very appropriate for image classification due to their processing speed, ease in dealing with high-dimensional spaces, robustness, and ability to deal with a varied array of data regardless of the variation in statistical distribution, scale and type of data (Vassilas et al, 2000).


Like parametric classifiers, neural classifiers offer two kinds of classification, supervised and unsupervised neural network classification; both have their own advantages and disadvantages with regard to their use for image classification. However, the advantage of one over the other depends on the type of data at hand, on time, and on expertise. In this study, ANNs are used in the processing and classification of multispectral remote sensing data. This study aims at investigating the difference in accuracy between supervised and unsupervised neural network classifiers, and at evaluating the significance of the difference between the two classifiers.

1.2 Study area

The study area is located in the north-western part of Ethiopia. In an administrative sense, the area is found in the Amhara Regional State, between the Gojam and Gonder provinces. The area is of high importance in terms of irrigation, hydroelectric power and tourism. Its importance became indispensable especially after 1992, after Ethiopia became landlocked; since then, all fish resources have come only from inland water bodies. Wudneh (1998) stated that Lake Tana is the least exploited fish resource in the country; he also explained that the reasons are the bad road connection with the capital city, Addis Ababa, and the absence of the highly marketable fish species, Nile perch, in this lake. The study area has some patches of forest; although not very big, these forest areas fulfill the fuel wood demand of Bahir Dar, which is the second largest city of Ethiopia. Wild coffee production is also an essential economic activity in the forested areas. Recently, high human encroachment has been noted in the forested areas. With the expanding fishery industry, the high population growth rate, and the deforestation of the meager forest resource remaining in the area, degradation of this resourceful area can easily be forecast unless management intervention is employed. In order to manage the area in a sustainable manner, basic geographic information about the area is very important. This study will provide a basic land cover map for the area.



Figure 1: Map of Ethiopia. Map copyright by Hammond World Atlas Corp. #12576

Figure 2: Overview of the study area


1.3 Objectives

The main goal of this study is three-fold:
- Investigate the advantages and disadvantages of supervised and unsupervised neural network classifiers in the field of remote sensing, in particular for land cover classification of multispectral and multi-scale satellite images;
- Evaluate the difference in accuracy between supervised and unsupervised neural network image classification of multispectral and multi-scale satellite images;
- Produce a land cover map of the study area, located in the Amhara Regional State, Ethiopia.

1.4 Research questions

Is there a significant difference in the accuracy of supervised neural network classification and unsupervised neural network classification? Which type of neural network classification, supervised or unsupervised, will handle poor quality data better?

1.5 Research outline

Chapter 1: gives an introduction to the main theme of the thesis; it also describes the study area, objectives and research questions.
Chapter 2: describes the basic concepts of neural networks, including the biological background, the historical development and the basic neural network processor.
Chapter 3: covers the methodological aspects of the study; detailed procedures for the neural network classification process are given.
Chapter 4: reports the results obtained from the data analysis and processing; it also includes a discussion of the results.
Chapter 5: contains the conclusions drawn from the results obtained.
Chapter 6: gives recommendations on how the results from this study can be improved and/or applied.

2 Artificial Neural Networks (ANN)

2.1 Overview of the main concepts

2.1.1 Biological concepts

The major source of inspiration for the creation of artificial neural networks is the human brain, arguably the most powerful computing engine in the known universe. Both from a computational and an energy perspective, the brain has an enormously efficient structure (Alavi, 2002). The most basic element of the human brain is a specialized cell called the neuron. The brain consists of some 100 billion neurons that are massively interconnected by synapses (estimated at about 60 trillion) and that operate in parallel. In order to understand the basic operation of the brain, it is necessary to know the neuron in detail. This was originally undertaken by neurobiologists, but has lately become an interest of physicists, mathematicians and engineers. As a background to this study, it is enough to review that each neuron cell consists of a nucleus surrounded by a cell body (soma), from which extends a single long fiber (axon), which eventually branches into a tree-like network of nerve endings that connect to other neurons through further synapses. This is illustrated in Figure 3.

Figure 3: Signal path of a single human neuron


Information is transmitted from one neuron to another by a complex chemical process, based on sodium-potassium flow dynamics, whose net effect is to activate an electrical impulse (action potential) that is transmitted down the axon to other cells. When this happens, the neuron is said to have fired. Firing only occurs when the combined voltage impulses from the preceding neurons add up to a certain threshold value. After firing, the cell needs to rest for a short time (the refractory period) before it can fire again (Alavi, 2002). The brain is understood to use massively parallel computation, where each computing element (the neuron) in the system is supposed to perform a very simple computation (Roy, 2000). The basis of artificial neural networks also came from this understanding: each node (the analog of the neuron in our brain) performs this simple computation, while building up complex parallel computations together with other neurons.

2.1.2 Historical development

The story of neural networks can be traced back to a scientific paper by McCulloch and Pitts, published in 1943, that described a formal calculus of networks of simple computing elements (Anderson et al, 1992). Many of the basic ideas developed by them survive to this day. The next big development in neural networks was the publication in 1949 of the book The Organization of Behavior by Donald Hebb (Alavi, 2002). Hebb argued that if two connected neurons are simultaneously active, then the connection between them should strengthen proportionally, which means the more frequently a particular neural connection is activated, the greater the weight between them. This has implications for machine learning, since the tasks that had been better learnt had a much higher frequency (or probability) of being accessed. This gave a clear definition to the learning process by indicating that learning occurs through the readjustment of the weighted connections between neurons. A minimal sketch of this rule in code is given below.
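The following sketch, which is not taken from the thesis, illustrates Hebb's rule: the weight between two units grows in proportion to the product of their activations, so frequently co-active connections end up with the largest weights. The learning rate and the toy activation values are assumptions chosen only for illustration.

```python
# Minimal sketch of Hebb's rule: the connection between two simultaneously
# active neurons strengthens in proportion to their joint activity.
# The learning rate and activations are illustrative assumptions.
import numpy as np

eta = 0.1                      # learning rate (assumed)
x = np.array([1.0, 0.0, 1.0])  # pre-synaptic activations
y = 1.0                        # post-synaptic activation
w = np.zeros(3)                # weights onto the post-synaptic neuron

for _ in range(5):             # repeated co-activation...
    w += eta * x * y           # ...keeps strengthening the active connections

print(w)                       # only the co-active inputs gained weight
```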


In the late 1950s, Rosenblatt (Alavi, 2002) developed a class of neural networks called the perceptron. He furthermore introduced the idea of layered networks. A layer is simply a one-dimensional array of artificial neurons. Most current problems to which ANNs are applied use multi-layer networks with different kinds of interconnections between these layers. The original perceptron, however, was a simple one-layer architecture (Chen, 2000). As a result, this architecture has not been able to deal with most of the complex problems in the various fields of study that use neural networks. Furthermore, Rosenblatt developed a mathematical proof, the Perceptron Convergence Theorem, which showed that algorithms for learning (or weight adjustment) would not lead to ever-increasing weight values under iteration (Alavi, 2002). However, this was followed by a demonstration in 1969 (by Minsky and Papert) of a class of problems for which the Convergence Theorem was inapplicable. This work led to a considerable downsizing of interest in neural networks, which was to continue until the early 1980s (Alavi, 2002). In 1982, John Hopfield, a Caltech physicist, developed the idea of recurrent networks, i.e. networks that have self-feedback connections. The Hopfield net, as it has come to be known, is capable of storing information in dynamically stable networks and of solving constrained optimization problems. The other key event was the back propagation algorithm, which showed that it was possible to train a multi-layer neural architecture using a simple iterative procedure. These two events have proved to be the ones most responsible for the revival of interest in neural networks in the 1980s, up to the explosive growth industry that is today shared between physicists, engineers, computer scientists, mathematicians and even psychologists and neurobiologists (Alavi, 2002; Anderson et al, 1992). All in all, the science of neural networks faced a lot of ups and downs before evolving to its present state; the fact that it involves modeling the human brain raised a lot of moral


issues that slowed the pace of its development. The whole historical development of neural networks is given in the report Artificial Neural Networks Technology.¹

¹ The report can be found at http://www.dacs.dtic.mil/techs/neural/neural_ToC.html, or as a PDF at ftp://192.73.45.130/pub/dacs_reports/pdf/neural_nets.pdf

2.1.3 Basic neural network processor

An artificial neuron is a simple computing element that sums the input from other neurons; a network of neurons is interconnected by adaptive paths called weights. Each neuron computes a linear sum of the weighted inputs acting upon it, and gives an output depending on whether this sum exceeds a preset threshold value or not (see Figure 4). A positive value of a weight increases the chance of a 1 and is considered excitatory; a negative value increases the chance of a zero and is considered inhibitory (real biological neurons have this property too, but with analogue output values rather than binary ones) (Alavi, 2002).


Figure 4: The basic neural network processor: the neuron and its functions.

The basic functions of each neuron in the whole network are: to evaluate all the input vectors directed towards the neuron; to calculate the sum of all inputs; to compare the sum of the inputs to the threshold value at the neuron (node) (for a neuron to produce an output, the weighted sum of all its inputs should exceed the threshold value θ; see Figure 4); and, lastly, to determine the output through the non-linear function provided at the neuron (Chen, 2000). The output can be an input for the next node in the next layer, or it could simply be the final output, depending on the architecture and learning rule of the network the neuron belongs to. These four distinctive functions must be carried out at the neuron level for the network to learn properly according to the specified function. Even though the mechanism seems simple at the neuron level, the way all the neurons interact through weight adjustment during learning makes the whole set-up complex, which enables them to solve real, complicated problems in an organized way. The mathematical representation is given as follows:

$$y_i = f(z_i) \qquad (1)$$

$$z_i = \sum_j w_{ij} x_{ij} - \theta_i \qquad (2)$$

Where:
$i$ — indexes a single neuron performing a particular learning task
$z_i$ — the net input to the $i$th neuron, assumed to be real valued
$y_i$ — either a binary or a real-valued output of the $i$th neuron
$f$ — a non-linear function, also called a node function
$w_{ij}$ — the connection weight, expressing the strength of the input $x_{ij}$
$x_{ij}$ — the series of inputs to the $i$th neuron
$\theta_i$ — the threshold value of the $i$th neuron (Roy, 2000)
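As an illustration of equations (1) and (2), the following minimal sketch computes a single neuron's output with a binary (step) node function. The weights, inputs and threshold are assumed values chosen only to show the mechanism; they do not come from the study.

```python
# Minimal sketch of equations (1) and (2): a neuron sums its weighted
# inputs, subtracts the threshold, and applies a node function f.
# All numbers are illustrative assumptions.
import numpy as np

def neuron_output(x, w, theta):
    """y_i = f(z_i), with z_i = sum_j w_ij * x_ij - theta_i."""
    z = np.dot(w, x) - theta
    return 1.0 if z > 0 else 0.0    # binary step node function f

x = np.array([0.2, 0.9, 0.5])       # inputs x_ij
w = np.array([0.4, 0.8, -0.3])      # weights w_ij (negative = inhibitory)
print(neuron_output(x, w, theta=0.5))  # fires (1.0) only above the threshold
```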

The most important issues in neural networks are the training and the design of networks. Training involves determining the connection weights ($w_{ij}$) and the threshold values ($\theta_i$) from a set of training samples. Network design is concerned with determining the layers, the number of neurons in each layer, the connectivity pattern between layers and neurons, and the mode of operation, e.g. feedback vs. feed-forward (Roy, 2000).

2.1.4 Neural networks and image classification

Applying ANN technology in remote sensing is a very recent phenomenon. Among other reasons, the generation of extremely varied remote sensing data is the main reason for the consideration of ANNs for image classification. There are a lot of advantages that ANNs offer to image classification. Primarily, neural networks are characterized by a much faster classification time compared to the conventional statistical image classifiers; the time it takes to train a network depends on the size of the training data presented to it, and classifying the whole image with an already trained network is generally much faster than with any other image classifier available. The other very important feature of neural networks, which is very helpful in image classification, is the ability to incorporate ancillary and GIS information together with the spectral information in the image classification process. This enables us to use all the information we have about the area, beyond the spectral information from the image, which increases the accuracy of the classification result. Yet the major use of neural networks is the possibility of using data of different scales together in a single classification process. For instance, the thermal band of Landsat TM is not usually used in maximum likelihood image classification due to its different ground resolution from the visible and infrared bands. This is not a problem for neural networks, since they have the ability to analyze multi-scale data. Neural networks learn the pattern and relations within the input vector, so the resolution or data structure of each individual input does not affect the image classification process. Similarly, multi-date or temporally different data can also easily be analyzed in a single image classification process, and the network maximizes the amount of information used in the classification by learning patterns from the different-date images. Last but not least, the


neural networks have the ability to deal with data regardless of the statistical distribution of the dataset. This distribution-free nature of neural networks allows us to deal with various remote sensing data that do not have a normal (Gaussian) distribution. Berberoglo et al (1999), Fauzi et al (2001), Kumar et al (2001) and Luo et al (2000) provide a background to ANNs in a remote sensing context. This powerful analyzing nature of neural networks in image classification grants faster, more accurate and more reliable land cover classifications from remote sensing data compared to the results obtained from other image classifiers (Kumar et al, 2001). There are many kinds of neural networks, ranging from the simple perceptron to more developed many-layered networks, and many of them are used in image classification. Some of the commonly used neural networks are: Adaptive Resonance Theory (ART), Multi-layer Perceptrons (MLP), Reduced Coulomb Energy (RCE), Radial Basis Function (RBF), Delta Bar Delta (DBD), Extended Delta Bar Delta (EDBD), Directed Random Search (DRS), Higher Order Neural Networks (HNN), Self Organizing Map (SOM), Learning Vector Quantization, Counter-propagation, Probabilistic neural networks, Hopfield, Boltzmann Machine, Hamming network, Bi-directional associative memory, Spatio-temporal pattern recognition and many others (Roy, 2000; Anderson et al, 1992). The uses and advantages of these networks depend on how we want to use them and for what purpose. The most common areas are prediction, data association, data conceptualization, data filtering and classification.

2.2 Types of neural networks

Neural networks are commonly categorized in terms of their training algorithms. Basically, there are three types of neural nets: fixed weight neural networks, unsupervised neural networks and supervised neural networks (REF5, 2002). Fixed weight NNs are not very common, since there is no learning involved. Supervised NNs and unsupervised NNs both involve training, but with completely different approaches. Supervised training involves a mechanism of providing the network with the desired output, either by manually grading the network's performance or by providing the desired outputs together with the inputs (Anderson et al, 1992). Unsupervised training deals with the network training itself: the network must recognize the patterns within the inputs and decide on an outcome without any outside help.

2.2.1 Supervised neural network classifiers

2.2.1.1 Description of supervised neural network classifiers

Supervised learning networks are the mainstream of neural network development, and most applications of neural networks implement supervised training. In supervised training, the training data consist of many pairs of input/output training patterns, i.e., both inputs and outputs are provided. The network analyses the inputs and compares the result with the desired output given with the input. The network is thus enabled to calculate the error by comparison. The error is then propagated back through the network and better training takes place. The greater the number of iterations, the smaller the error of the network becomes. This is known as convergence: the output of the network converges to resemble the desired output provided. The network attempts to pick up the pattern by comparing the inputs and desired outputs, and approximates its outputs to the learned pattern. However, this might not always happen; there are factors that prevent a network from learning properly, the most important being when the data provided are not sufficient, or when they do not contain the kind of information or pattern needed to solve the problem involved. A test dataset, which has not been used in the training process, should always be set aside in order to reliably ascertain the accuracy of the learning of a network (a sketch of such a split is given below). Apart from the quality and quantity of data, the design or architecture of a network and the learning rule used for training affect the rate and extent of network learning. Finally, the type and size of the data to be processed by the neural network is an important factor that should be considered when choosing the architecture and learning rules.
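The following minimal sketch shows the idea of holding out a test set that takes no part in training. The 70/30 split ratio and the sample count are illustrative assumptions; the actual training/test set sizes used in this study are listed in Tables 3-5.

```python
# Minimal sketch of setting aside a test set, as described above.
# The split ratio and sample count are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)
samples = np.arange(1000)          # indices of labelled training pixels

rng.shuffle(samples)
cut = int(0.7 * len(samples))
train_idx = samples[:cut]          # used to adjust the network weights
test_idx = samples[cut:]           # set aside to ascertain learning accuracy
```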


2.2.1.2 Architecture and algorithm

The Multi-Layer Normal Feed Forward (MLNFF) network, a kind of Multi-Layer Perceptron (MLP), is the most commonly used architecture in supervised neural networks; however, there are also many other architectures that can be used. The MLNFF architecture is reviewed in this section because it is the architecture used in this study. The MLNFF should have at least three layers: the input layer, the hidden layer and the output layer. It is very common to have one hidden layer, although the architecture can have more than one hidden layer, according to the data size and type and the kind of application it is going to be used for. There is no set maximum number of hidden layers for the architecture; however, care should be taken when structuring the network, since too many layers induce over-learning, or memorization of the training data, which makes the network useless on new data. The basic network design of the MLNFF architecture is shown below. The more complex our data and the relation between input and output classes get, the greater the number of layers we need to solve the problem.

Figure 5: Design of the Multi-Layer Normal Feed Forward (MLNFF) architecture. Source: Carol E. Brown and Daniel E. O'Leary, 1995.


Several algorithms can be used with this architecture to perform image classification. One of the most used algorithms is the back propagation algorithm (Kulkarni et al, 1999). The back propagation learning rule is the most popular, effective and easy-to-learn rule for complex multi-layered networks; in fact, this network is used more than all the other networks combined (Anderson et al, 1992). Its greatest advantage is the ability to provide non-linear solutions to problems. Back propagation has been developed by many researchers over time; hence the representation of the algorithm differs slightly between literature sources. The learning procedure is given as follows, for a network that has three layers composed of the input, hidden and output layers.¹

Let:
L1 represent the input layer
L2 represent the hidden layer
L3 represent the output layer

The number of neurons (processing elements, units) in the input layer represents the number of input data used. The number of neurons in the output layer represents the number of land cover classes into which the image is going to be classified. There is no clearly set rule for the number of processing elements to be assigned to the hidden layer; for this study, Kolmogorov's theorem was used, which states that the number of neurons in the hidden layer should be 2N + 1, where N is the number of nodes in the input layer (Rangesneri et al, 1998). The number of neurons assigned to the hidden layer directly affects the performance of the network. The presence of too many neurons might lead the network to memorize the training set instead of learning. If memorization takes place, the network will not generalize the pattern it learned, but will only recognize the pattern from the training set, which makes it useless, since it will not be able to classify the whole image (Anderson et al, 1992). The net input and output for neurons in layers L2 and L3 are given by:

$$net_i = \sum_j out_j \, w_{ij} \qquad (1)$$

Where $net_i$ is the net input, $out_j$ is the output of unit $j$ in the preceding layer, and $w_{ij}$ represents the weight between units $i$ and $j$.

$$out_i = \frac{1}{1 + \exp[-(net_i + \theta)]} \qquad (2)$$

Where $out_i$ is the output of neuron $i$ and $\theta$ is a constant. The network works in two phases: the training phase and the decision-making phase. During the training phase, the weights between layers L1–L2 and L2–L3 are adjusted so as to minimize the error between the desired and the actual output. The back propagation learning algorithm is described below.

1. Present a continuous-valued input vector $X = (x_1, x_2, \ldots, x_n)^t$ at layer L1 and obtain the output vector $Y = (y_1, y_2, \ldots, y_m)^t$ at layer L3. In order to obtain the output vector $Y$, the calculation is done layer by layer, from layer L1 to layer L3.

2. Calculate the change in the weights. To do this, the output vector $Y$ is compared with the desired output vector, or target vector, $d$, and the error is then propagated backward to obtain the change in the weights, $\Delta w_{ij}$, that is used to update them. For the weights between layers L2 and L3, $\Delta w_{ij}$ is given by:

$$\Delta w_{ij} = -\eta \, \frac{\partial E}{\partial w_{ij}} \qquad (3)$$

This can be reduced to:

$$\Delta w_{ij} = \eta \, \delta_i \, o_j \qquad (4)$$

Where $\eta$ is a training rate coefficient (typically 0.01 to 1.0), $o_j$ is the output of neuron $j$ in layer L2, and $\delta_i$ is given by:

$$\delta_i = F'(net_i)(d_i - o_i) = o_i (1 - o_i)(d_i - o_i) \qquad (5)$$

In equation (5), $o_i$ represents the actual output of neuron $i$ in layer L3, and $d_i$ represents the target, or desired, output at neuron $i$ in layer L3. Layer L2 has no target vector, so equation (5) cannot be used for it. The back propagation algorithm trains hidden layers by propagating the output error back, layer by layer, adjusting the weights at each layer. The change in the weights between layers L1 and L2 can be obtained as:

$$\Delta w_{ij} = \eta \, \delta_{Hi} \, o_j \qquad (6)$$

Where $\eta$ is a training rate coefficient for layer L1 (typically 0.01 to 1.0), $o_j$ is the output of neuron $j$ in layer L1, and $\delta_{Hi}$ is given by:

$$\delta_{Hi} = o_i (1 - o_i) \sum_k \delta_k w_{ik} \qquad (7)$$

In equation (7), $o_i$ is the output of neuron $i$ in layer L2, and the summation term represents the weighted sum of all the $\delta$ values corresponding to neurons in layer L3, which are obtained using equation (5).

3. Update the weights:

$$w_{ij}(n+1) = w_{ij}(n) + \Delta w_{ij} \qquad (8)$$

Where $w_{ij}(n+1)$ represents the value of the weight at iteration $n+1$ (after adjustment) and $w_{ij}(n)$ represents the value of the weight at iteration $n$.

4. Obtain the error for the neurons in layer L3:

$$\varepsilon = \sum_i (o_i - d_i)^2 \qquad (9)$$

If the error is greater than some minimum $\varepsilon_{min}$ (user defined, depending on the accuracy needed), then repeat steps 2 through 4; otherwise, terminate the training process.

¹ The algorithm description for back propagation is modified from Kulkarni (1999).
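The following minimal sketch implements the steps above (equations 1-9, with θ = 0 in equation 2) for a single three-layer network in NumPy. The layer sizes (with a 2N + 1 hidden layer, following the Kolmogorov rule quoted above), the learning rate and the synthetic training pair are illustrative assumptions, not the parameters used in this study.

```python
# Minimal sketch of the back propagation algorithm above (equations 1-9).
# Layer sizes, learning rate and the synthetic training pair are
# illustrative assumptions, not the values used in the thesis.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hid, n_out = 4, 9, 6      # e.g. 4 bands in, 2N+1 hidden, 6 classes
eta = 0.5                          # training rate coefficient (0.01-1.0)
w1 = rng.uniform(-0.5, 0.5, (n_in, n_hid))   # weights L1 -> L2
w2 = rng.uniform(-0.5, 0.5, (n_hid, n_out))  # weights L2 -> L3

def f(net):
    """Logistic node function, equation (2) with theta = 0."""
    return 1.0 / (1.0 + np.exp(-net))

x = rng.random(n_in)               # one input vector (pixel band values)
d = np.eye(n_out)[2]               # desired output: class 3 as a 0/1 vector

for epoch in range(10000):
    # forward pass, layer by layer (step 1; equations 1 and 2)
    o_hid = f(x @ w1)
    o_out = f(o_hid @ w2)
    # output-layer deltas (equation 5) and hidden-layer deltas (equation 7)
    delta_out = o_out * (1 - o_out) * (d - o_out)
    delta_hid = o_hid * (1 - o_hid) * (w2 @ delta_out)
    # weight updates (equations 4, 6 and 8)
    w2 += eta * np.outer(o_hid, delta_out)
    w1 += eta * np.outer(x, delta_hid)
    # stop when the squared error (equation 9) is small enough
    if np.sum((o_out - d) ** 2) < 1e-4:
        break
```

With a full training set, the same loop would simply iterate over all input/target pairs per epoch before checking the error criterion.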

2.2.2 Unsupervised neural networks classifiers

2.2.2.1 Description of unsupervised neural network classifiers

Unsupervised neural networks are networks that organize the input vectors into similar groups by learning the patterns of the input vectors themselves. This addresses a major limitation of supervised neural networks, where training with many input and output examples is necessary. Even though training a network with an existing example set gives reliable results, it is not practical in cases where training data are not available; in such cases, unsupervised or adaptive neural networks are of great help. The learning rule of unsupervised neural networks is supposed to perform learning in an unsupervised or self-organizing manner (Chen, 2000). This leads to a relevant output by learning patterns from the redundant training data. In unsupervised neural networks, only input vectors are presented to the network, and the network adjusts its own weights without any additional (external) help in deciding what particular output to assign to a given input. Usually, unsupervised neural networks classify the input data into distinct or discrete groupings. Unsupervised neural networks can be ideal where seemingly uncorrelated data have to be classified and, most importantly, when there are no training (example) data available (Alavi, 2002). There are two major kinds of unsupervised learning: Competitive Learning Networks and Self-Organizing Feature Maps (SOFM). Competitive Learning Networks involve a process in which the output layer neurons compete among themselves to acquire the ability to fire in response to given input vectors (patterns). The basic learning rules used to perform unsupervised learning are the Hebbian and the competitive rules, both inspired by neurobiological considerations (Chen, 2000). When an input pattern is presented, a winning output neuron $k$ is selected and the activations are reset such that:

$$y_k = 1 \quad \text{and} \quad y_j = 0 \ \ \text{for } j \neq k$$

Such an output layer is referred to as Winner-Take-All. The other type of unsupervised network, which is only slightly different from the competitive layer, is the Self-Organizing Feature Map (SOFM), sometimes known as an auto-associative network (Anderson et al, 1992). The leading researcher in unsupervised learning, Teuvo Kohonen, developed this network. It relies on the use of competitive learning but with different emergent properties. Its unique feature is preserving the topology of the input vectors (patterns). SOFMs are intended to map a continuous high-dimensional space into a discrete one- or two-dimensional space (Chen, 2000).

2.2.2.2 Architecture and algorithm

In this study, a SOFM network is used for the unsupervised neural network classification; hence this review focuses on the architecture and learning rules of a SOFM network. The typical architecture of an unsupervised neural network (SOFM) comprises two layers: the input layer and the output layer (Figure 6).

Figure 6: A Kohonen Self Organizing Grid with a two-dimensional output layer. Source: (REF4, 2000).

In a SOFM network there are two kinds of weight connections (REF5, 2002): a feed-forward connection, between the input layer and the output layer, and a lateral feedback weight connection within the output layer. In the feed-forward connection, the weight connection


is usually excitatory, which activates the neurons in the output layer so that their weights get updated. On the other hand, the lateral feedback, the weight connection within the output layer, is inhibitory, which holds neurons back from being activated. Therefore, not all the neurons get updated; instead, only the winner neuron from the lateral feedback connection, or competitive layer, gets updated. This learning rule of the output layer of unsupervised neural networks is called Winner Take All (WTA). The WTA learning rule is the most widely used learning rule for SOFM. As its name implies, only the winner neuron from the output layer is activated and gets its weights updated. This rule is common both to SOFM and to other Competitive Learning Networks. What makes SOFM unique is that it is not only the winner neuron that gets its weights updated: its neighboring neurons also get their weights updated, so that, as mentioned earlier, the SOFM is capable of preserving the topology of the output layer with respect to the input layer. This is very important when dealing with geo-spatial data, where topology matters greatly. The winner is selected by looking at the distance between each neuron in the output layer and the input vector. Many kinds of distance can be considered; the most widely used is the Euclidean distance. The WTA learning rule procedure can be described as follows:

1. Select the winner neuron $i^*$, the one with the smallest Euclidean distance to the input vector:

$$i^* = \arg\min_j \, \| x - w_j \| \qquad (1)$$

Where $w_j$ denotes the weight vector corresponding to the $j$th output neuron.

2. Let $i^*$ denote the index of the winner, and let $I^*$ denote a set of indexes corresponding to a defined neighborhood of the winner $i^*$. Then the weights associated with the winner and its neighboring neurons are updated by:

$$\Delta w_j = \eta \, (x - w_j) \qquad (2)$$

for all indices $j \in I^*$, where $\eta$ is a small positive learning rate. The amount of updating may be weighted according to a pre-assigned neighborhood function $\Lambda(j, i^*)$:

$$\Delta w_j = \eta \, \Lambda(j, i^*) \, (x - w_j) \qquad (3)$$

for all $j$. For example, the neighborhood function $\Lambda(j, i^*)$ may be chosen as:

$$\Lambda(j, i^*) = \exp\left( - \frac{\| r_j - r_{i^*} \|^2}{2 \sigma^2} \right)$$

Where $r_j$ represents the position of neuron $j$ in the output space. The convergence of the feature map depends on a proper choice of $\eta$; one choice is $\eta = 1/t$. The size of the neighborhood (or $\sigma$) should decrease gradually, as shown in the next figure.

Figure 7: Decreasing neighborhood of a winner neuron in a WTA output layer.

3. The weight update should be immediately followed by the normalization of $w_i$. The rate at which the weight of a winner neuron is updated depends on a small, user-defined positive constant, referred to as alpha ($\alpha$).
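The sketch below strings the three steps together: winner selection by Euclidean distance (equation 1), a Gaussian-weighted neighborhood update (equation 3) and the normalization of step 3. The grid size and the decay schedules for η and σ are illustrative assumptions, not the settings used in this study.

```python
# Minimal sketch of the WTA/SOFM procedure described above.
# Grid size and the eta/sigma schedules are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
grid = 8                                  # 8 x 8 output layer
n_in = 4                                  # e.g. four spectral bands
w = rng.random((grid * grid, n_in))       # one weight vector per output neuron
pos = np.array([(r, c) for r in range(grid) for c in range(grid)], float)

X = rng.random((500, n_in))               # unlabelled input vectors (pixels)

for t, x in enumerate(X, start=1):
    eta = 1.0 / t                         # decreasing learning rate
    sigma = max(grid / 2 / t ** 0.5, 0.5) # shrinking neighborhood size
    winner = np.argmin(np.linalg.norm(x - w, axis=1))     # equation (1)
    lam = np.exp(-np.sum((pos - pos[winner]) ** 2, axis=1)
                 / (2 * sigma ** 2))      # neighborhood function
    w += eta * lam[:, None] * (x - w)     # equation (3)
    w /= np.linalg.norm(w, axis=1, keepdims=True)  # step 3: normalization
```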


In some cases, another constant, theta ($\theta$), is applied to the network to avoid neurons that never get their weights updated. Theta is the rate at which a neuron that did not win loses; it represents the losing rate. When there is no losing, the rate is set to zero. Although a design containing two layers is the most used architecture, it is possible to use the Kohonen layer, or Self-Organizing Map (SOFM), as a hidden layer by providing an extra output layer, usually a Learning Vector Quantization (LVQ) layer, which helps in unsupervised classification in the case of complex data. The same person who developed the SOFM, Teuvo Kohonen, also created this architecture. The architecture is very similar to the SOFM; it is a form of supervised learning adapted from Kohonen unsupervised learning. It uses the Kohonen layer with the WTA transfer function, which is capable of sorting items into similar categories (Anderson et al, 1992). However, some important modifications have been added to this architecture, which make it more robust in handling classification and image segmentation problems.

Figure 8: Design of the Learning Vector Quantization architecture. Source: Anderson et al, 1992.

Learning Vector Quantization classifies its input data into classes that are determined by the user. Essentially, it maps an n-dimensional space into an m-dimensional space; that is, it takes n inputs and produces m outputs. We can refer to this learning rule as semi-unsupervised, since it gives us the freedom to choose the number of classes into which we want to group the input data. The network can be trained to classify inputs while preserving the topology of the training set. This occurs by preserving the nearest-neighbor relationships in the training set, such that input patterns which have not been previously learned will be categorized by their nearest neighbors in the training data. The training mechanism of the LVQ network is the same as for the Kohonen network: the WTA transfer function is used to process the input data in the hidden Kohonen layer, and there is only one winner in the layer for one input vector (for each iteration). The only extra step in LVQ involves re-assigning the output found from the Kohonen layer to another output layer, for which the number of neurons is user defined (this is where the number of classes into which we want to classify the input vector is given). The LVQ output layer re-assigns the Kohonen layer outputs by adjusting the connection weights between the output neurons and the Kohonen layer; that is, if a winner neuron from the Kohonen layer is not assigned the appropriate class (the network learns after training which class the neuron should be classified into), the connection weights entering the neuron are moved away from the training vector, so that it does not get classified into the wrong output class (Anderson et al, 1992). This network is of special interest for this study, since it is the most appropriate network for image classification purposes due to its topology-preserving nature.
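The following minimal sketch shows the LVQ correction described above in the common LVQ1 form: the winning codebook vector moves toward the training vector when its class is right and away from it when it is wrong. The class count, learning rate and toy data are assumptions chosen for illustration.

```python
# Minimal sketch of the LVQ correction described above (LVQ1 form).
# Class count, learning rate and the toy data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
n_classes, n_in, eta = 6, 4, 0.05
w = rng.random((n_classes, n_in))          # one codebook vector per class
labels = np.arange(n_classes)              # class assigned to each neuron

X = rng.random((200, n_in))                # training vectors
y = rng.integers(0, n_classes, 200)        # their known classes

for x, cls in zip(X, y):
    winner = np.argmin(np.linalg.norm(x - w, axis=1))  # WTA selection
    if labels[winner] == cls:
        w[winner] += eta * (x - w[winner])  # right class: move toward x
    else:
        w[winner] -= eta * (x - w[winner])  # wrong class: move away from x
```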


3 Methodology

The whole methodological process in this study is divided into three main procedures. The first procedure comprises: preparation of the data collected in the field into an appropriate format to be used as input in the image classification process; preprocessing of the main datasets to be used, with standardization of the data types so as to make them compatible during the classification procedure; and preparation of training and test sets from the field data and subsets of the input datasets. The second procedure deals with performing the supervised and unsupervised neural network classifications, first by classifying the training dataset and then, once the classification accuracy is satisfactory, classifying the whole image. The last step of the methodology deals with the validation of the neural network classifications. After the accuracy assessment, validation is carried out in order to draw a concrete conclusion from the analysis performed. To explain shortly how the whole process was executed: the primary datasets, the ASTER and Landsat TM images, were preprocessed and standardized. Training/test sets were prepared from the field data and the primary datasets, and then data transformation was carried out; this part of the process took considerable time because of the large size of the data used (the images). The image data were transformed into ASCII file format in order to make the data compatible with the neural network processor (a sketch of this step is given below). Then both the supervised and unsupervised neural network classifications were carried out. After evaluating the outcome on the training data, the whole image was classified by the networks learned from the training set. Then the classifications resulting from the supervised and unsupervised classifiers were compared. The overall process is illustrated in Figure 9.
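A minimal sketch of the ASCII transformation step is given below: a stacked multi-band image is flattened so that each pixel becomes one row of band values in a plain text file. The array shape and file name are assumptions for illustration; they do not describe the actual datasets of the study.

```python
# Minimal sketch of the ASCII transformation described above: one text
# row of band values per pixel. Shape and file name are assumptions.
import numpy as np

bands, rows, cols = 4, 100, 100
image = np.random.rand(bands, rows, cols)        # stand-in for a stacked image

pixels = image.reshape(bands, -1).T              # one row per pixel
np.savetxt("training_pixels.txt", pixels, fmt="%.6f")  # plain ASCII output
```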


[Flow chart omitted. It shows the primary datasets (ASTER image, Landsat TM image) passing through preprocessing (NDVI calculation, layer stacking, masking, with field data as intermediate corrected data); preparation and ASCII transformation of the training and test datasets; training of the supervised and unsupervised neural network classifiers, each with an accuracy assessment loop and a sensitivity analysis; and, finally, the comparison between the supervised and unsupervised classification results.]

Figure 9: Overview of the main procedures involved in the methodological process

3.1 Field data acquisition

The study area covers 1558 km²; it lies between 36.99°E and 37.40°E longitude, and 11.48°N and 11.95°N latitude. Representative ground truth samples were marked on the image for all the output classes. The output classes are:
- Arable land
- Forest
- Settlement
- Shrub land and scrubland
- Swampy area
- Water
These land classes were chosen based on the major classes used in the available topographic map of the area; the topographic map has also been used as a source of additional control data during the validation stage. Ground truth sets were taken from the study area. Adequate ground truth was needed both for the training of the network and for testing (validation) after the classification was performed. GPS was used to mark the geographical position of the ground control (ground truth) points.


3.2 Data preprocessing

3.2.1 Datasets

There are two primary datasets used for the study:
- a satellite image from the TERRA satellite (ASTER);
- a satellite image from the Landsat 5 satellite (Landsat TM).

3.2.1.1 ASTER

ASTER (Advanced Spaceborne Thermal Emission and Reflection Radiometer) is an imaging instrument aboard the TERRA satellite. ASTER is used to obtain detailed maps of land surface temperature, emissivity, reflectance and elevation. It consists of three high-performance optical radiometers with 14 spectral channels. Its spectral channels lie in the visible and near infrared (VNIR), the short wavelength infrared (SWIR) and the thermal infrared (TIR) bands (REF3, 2002). The major features of ASTER are: simultaneous earth surface images from the visible to the thermal infrared; higher geometric and radiometric resolution in each band than current satellite sensors; near infrared stereoscopic image pairs collected during the same orbit; optics that allow the instrument axis to move as much as ±24 degrees in the cross-track direction from nadir for SWIR and TIR; and highly reliable cryocoolers for the SWIR and TIR sensors (Vani, 2000).


Table 1: Spectral range of bands and spatial resolution for the ASTER sensor

ASTER Bands               Wavelength (micrometers)   Resolution (meters)
Band 1                    0.52 - 0.60                15
Band 2                    0.63 - 0.69                15
Band 3 nadir looking      0.76 - 0.86                15
Band 3 backward looking   0.76 - 0.86                15
Band 4                    1.600 - 1.700              30
Band 5                    2.145 - 2.185              30
Band 6                    2.185 - 2.225              30
Band 7                    2.235 - 2.285              30
Band 8                    2.295 - 2.365              30
Band 9                    2.360 - 2.430              30
Band 10                   8.125 - 8.475              90
Band 11                   8.475 - 8.825              90
Band 12                   8.925 - 9.275              90
Band 13                   10.25 - 10.95              90
Band 14                   10.95 - 11.65              90

Figure 10: ASTER bands superimposed on a model atmosphere. Source: Jet Propulsion Laboratory (JPL), ASTER homepage.


3.2.1.2 Landsat TM

The Thematic Mapper (TM) sensor is an advanced, multispectral scanning Earth resources instrument designed to achieve higher image resolution, sharper spectral separation, improved geometric fidelity, and greater radiometric accuracy and resolution than the Multispectral Scanner (MSS) sensor. The TM data are scanned simultaneously in seven spectral bands. Band 6 scans thermal (heat) infrared radiation. All TM bands are quantized as 8-bit data (REF2, 1999) (Figure 11).

Table 2: Spectral range of bands and spatial resolution for the TM sensor

Landsat 5 Bands   Wavelength (micrometers)   Resolution (meters)
Band 1            0.45 - 0.52                30
Band 2            0.52 - 0.60                30
Band 3            0.63 - 0.69                30
Band 4            0.76 - 0.90                30
Band 5            1.55 - 1.75                30
Band 6            10.40 - 12.50              120
Band 7            2.08 - 2.35                30

Figure 11: Landsat TM bands superimposed on a model atmosphere. Background image source: Remote Sensing Basics lecture note, Wageningen University.

3.2.2 Datasets preparation

Both the ASTER and Landsat TM images were geo-referenced¹ according to the 1:50,000 topographic map of the study area. The large water body found in the area, the southern part of Lake Tana, was removed from both images, since it is a known feature (Figure 12). Keeping this lake area would have increased the data processing and analysis time significantly.

¹ Projection and datum information can be found in Appendix 1.

Figure 12: Study area after Lake Tana is masked out of the image.
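As an illustration of this masking step, a boolean mask might be applied per band as in the sketch below. The mask source and the no-data fill value are assumptions; the actual masking in this study was done during image preprocessing, not with this code.

```python
import numpy as np

NO_DATA = 0  # assumed fill value for masked-out pixels

def mask_lake(bands, lake_mask):
    """Set all band values inside the lake area to a no-data value,
    so the known water body is excluded from further processing."""
    out = bands.copy()
    out[:, lake_mask] = NO_DATA   # lake_mask is True over Lake Tana
    return out

# Toy usage: a 3-band image with a rectangular "lake" region.
bands = np.random.default_rng(5).random((3, 200, 200))
lake_mask = np.zeros((200, 200), dtype=bool)
lake_mask[150:, 80:160] = True
masked = mask_lake(bands, lake_mask)
```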

The Short Wave Infrared bands of the ASTER image (six in number) have very low DN values, which makes it difficult to detect variation. To solve this problem, rescaling was performed on the whole range of bands from visible to SWIR (Figure 13 and Figure 14).
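The exact rescaling applied in the preprocessing software is not documented here, so the following is only an assumed per-band min-max stretch to the full 8-bit range, given as a minimal sketch:

```python
import numpy as np

def rescale_band(band, out_min=0, out_max=255):
    """Linearly stretch one band's DN values to [out_min, out_max]."""
    b = band.astype(np.float64)
    lo, hi = b.min(), b.max()
    if hi == lo:                  # constant band: nothing to stretch
        return np.full_like(b, out_min)
    return (b - lo) / (hi - lo) * (out_max - out_min) + out_min

# Example: a SWIR band whose DNs occupy only a narrow low range.
swir = np.random.default_rng(1).integers(0, 12, size=(100, 100))
stretched = rescale_band(swir)    # variation is now spread over 0-255
```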



Figure 13: Spectral signature of the six classes (before ASTER image rescaling)

Figure 14: Spectral signature of the six classes (after ASTER image rescaling)

Due to the large spatial extent of the study area, it was not possible to process the whole image at once during the analysis and the preparation of the training and test sets. As ThinksPro, the software used for the neural network analysis, accepts only ASCII files, it was necessary to subset the image into four sub-study areas. This avoided extremely large ASCII files, which could not have been edited in Notepad, WordPad or MS Access for the preparation of the input dataset.

3.2.3 Training and test sets preparation

Ground control points taken from the field work were merged with reference points taken from the image and the 1:50,000 topographic maps by visual interpretation and expert knowledge; this increased the number of ground control points sufficiently to provide adequate training and test data. The technical procedure of training and test data preparation is shown in Figure 15.

Figure 15: Training/test data preparation procedure

After the training and test data were prepared, different possible data combinations were tested for both the ASTER and Landsat TM images to find out which combination of bands would give better neural network classification accuracy. Although testing data quality for neural network classification was not the primary objective of this study, it was necessary to find out the best band combination in order to get the best out of the available information, since the classification was based on spectral information only. Therefore three pre-classification training and test set evaluations were made: for the ASTER image, for the Landsat TM image, and for the combination of the ASTER and Landsat TM images, respectively (see the tables below for more details).

Table 3: Training and test sets for the ASTER dataset

No   No of bands used   Visible   Near infrared   SWIR   Additional information   Total No of inputs
1    9                  2         1               6      NDVI                     10
2    7                  2         1               4      NDVI                     8

Table 4: Training and test sets for Landsat TM dataset

No   No of bands used   Visible   Near infrared   Mid infrared   Thermal infrared   Additional information   Total No of inputs
1    7                  3         1               2              1                  NDVI                     8
2    6                  3         1               2              -                  NDVI                     7

Table 5: Training and test sets for the combination of ASTER and Landsat TM datasets

No   No of bands used   Type of bands used                                   Additional information   Total No of inputs
1    14                 ASTER: 2 Visible, 1 Near infrared, 4 Shortwave       2 NDVI                   16
                        infrared; Landsat TM: 3 Visible, 1 Near infrared,
                        2 Mid infrared, 1 Thermal infrared
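NDVI, listed as additional information in all the training sets above, is derived per pixel from the red and near-infrared bands. A minimal sketch follows; the band arrays and the epsilon guard against division by zero are illustrative assumptions.

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)

# Example: one NDVI layer computed from two co-registered bands.
rng = np.random.default_rng(7)
layer = ndvi(rng.random((100, 100)), rng.random((100, 100)))
```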

The images were converted into ASCII files in order to make their format compatible with the neural network processors of ThinksPro, where the values from the spectral bands were fed as an input vector to the neural network. The neural network maps the feature space of the input (the image data) into a category space, which in our study consists of land cover classes. The dimension of the feature space equals the number of spectral bands provided.
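A sketch of this kind of transformation, flattening a stack of band layers into one ASCII row of band values per pixel, is given below. The exact ThinksPro file layout is not reproduced; the column order, delimiter and number format are assumptions.

```python
import numpy as np

def image_to_ascii(bands, path):
    """Write a (n_bands, rows, cols) stack as one whitespace-delimited
    line per pixel, each line holding that pixel's band values."""
    n_bands, rows, cols = bands.shape
    # Each pixel becomes one input vector whose dimension = n_bands.
    pixels = bands.reshape(n_bands, rows * cols).T
    np.savetxt(path, pixels, fmt="%.4f")

# Example: a 16-band stack (7 VNIR + 6 SWIR + 1 TIR + 2 NDVI), 50x50 pixels.
stack = np.random.default_rng(2).random((16, 50, 50))
image_to_ascii(stack, "training_input.txt")
```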

3.3 Supervised neural network classification

The Multi-layer Normal Feed Forward (MNFF) architecture was used for the supervised classification. This architecture is a typical example of the Multi-layer Perceptron (MLP) architecture. The network comprises three layers: an input layer, one hidden layer and an output layer. Both the hidden and output layers have a BP¹ learning rule. The network design is shown in Figure 16.

¹ Back propagation of error.

Figure 16: Design of the Back Propagation neural network

The number of nodes in each layer of the architecture used for the Back Propagation neural network was assigned as follows (the nodes, transfer functions and input functions vary according to the different input datasets tested):
- The number of nodes in the input layer equals the number of inputs.
- The number of nodes in the hidden layer is assigned based on the Kolmogorov theorem as 2N+1, where N is the number of input nodes (Rangsaneri et al., 1998).
- The number of nodes in the output layer equals the number of output classes, which is 6 for this study.

All the network parameters used for both the supervised and unsupervised NN classifications are listed in Appendix 3.
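To make the layer sizing above concrete, the following is a minimal sketch of a three-layer network sized by the 2N+1 rule, with sigmoid transfer functions in both layers. It shows only the forward pass; the back propagation training loop and any ThinksPro-specific options are omitted, and all names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def build_network(n_inputs, n_classes, rng):
    """Three-layer MLP: input -> hidden (2N+1 nodes) -> output."""
    n_hidden = 2 * n_inputs + 1                    # Kolmogorov-style sizing
    w1 = rng.normal(0, 0.1, (n_inputs, n_hidden))
    w2 = rng.normal(0, 0.1, (n_hidden, n_classes))
    return w1, w2

def forward(x, w1, w2):
    """Forward pass with sigmoid transfer in both layers."""
    hidden = sigmoid(x @ w1)
    return sigmoid(hidden @ w2)

rng = np.random.default_rng(3)
w1, w2 = build_network(n_inputs=16, n_classes=6, rng=rng)  # ASTER+TM case
scores = forward(rng.random(16), w1, w2)  # one pixel -> 6 class activations
```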

Table 6 illustrates the five datasets created for the classification using the back propagation neural network. Since the supervised classification was needed for a comparison with the unsupervised classifier, a neural network that has proved to be a good image classifier was needed; back propagation was chosen because it fulfills this criterion. Both the hidden and the output layers of the supervised network were set to the back propagation learning rule. All the datasets described in Tables 3, 4 and 5 were used for the supervised neural network classification. The percentage accuracy of each data combination (input vector) was recorded in order to choose the best result for the final image classification.

Table 6: Training data set up for the Back Propagation neural network

Training set  Dataset       Architecture     Learning rule (hidden/output)  Hidden layers  Inputs  Hidden nodes  Outputs
1             ASTER         Multi-Layer NFF  Back prop. / Back prop.        1              10      21            6
2             ASTER         Multi-Layer NFF  Back prop. / Back prop.        1              8       17            6
3             Landsat TM    Multi-Layer NFF  Back prop. / Back prop.        1              8       17            6
4             Landsat TM    Multi-Layer NFF  Back prop. / Back prop.        1              7       15            6
5             ASTER + LSTM  Multi-Layer NFF  Back prop. / Back prop.        1              16      33            6

3.4 Unsupervised neural networks classification

The Multi-Layer Normal Feed Forward (MLNFF) architecture was used for the unsupervised classification. The Kohonen Winner Take All (KWTA) and Learning Vector Quantization (LVQ) learning rules were used for the unsupervised NN classification. The network design comprises three layers: an input layer, one hidden layer with the KWTA learning rule, and an output layer with the LVQ learning rule. The network design is shown in Figure 17.

Figure 17: The design of the KWTA/LVQ network

The assignment of the number of nodes in each layer is similar to the supervised classification. Table 7 illustrates the five datasets created for the unsupervised image classification using the KWTA/LVQ network. The hidden layer of the unsupervised network is Kohonen WTA, while the output layer is LVQ. These learning rules were chosen because they are topology preserving in nature, which is very appropriate for the kind of data we are dealing with.

Table 7: Training data set up for the Kohonen Winner Take All/LVQ network

Training set  Dataset       Architecture     Learning rule        Hidden layers  Inputs  Hidden nodes  Outputs
1             ASTER         Multi-Layer NFF  Kohonen WTA and LVQ  1              10      21            6
2             ASTER         Multi-Layer NFF  Kohonen WTA and LVQ  1              8       17            6
3             Landsat TM    Multi-Layer NFF  Kohonen WTA and LVQ  1              8       17            6
4             Landsat TM    Multi-Layer NFF  Kohonen WTA and LVQ  1              7       15            6
5             ASTER + LSTM  Multi-Layer NFF  Kohonen WTA and LVQ  1              16      33            6

3.5 Accuracy assessment and validation

The accuracy, or performance, of the supervised neural network was evaluated by the built-in testing mechanism of ThinksPro. While training a network, a test set of pairs of inputs and desired outputs was given to the network for the evaluation of the correct learning percentage. Validation was then carried out by comparing a set of ground control points with the results of the neural network classifier. A confusion matrix and a table of accuracy percentages were generated based on the validation results.
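A sketch of how such a confusion matrix can be accumulated from validation points is given below, assuming integer class codes for both the classifier output and the ground control labels; the function name and toy data are illustrative.

```python
import numpy as np

def confusion_matrix(predicted, reference, n_classes):
    """Count validation points per (classified, ground control) pair;
    rows = classified as, columns = ground control."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for p, r in zip(predicted, reference):
        cm[p, r] += 1
    return cm

# Toy usage with 6 land cover classes coded 0..5 and 1264 points.
rng = np.random.default_rng(6)
pred = rng.integers(0, 6, size=1264)
ref = rng.integers(0, 6, size=1264)
cm = confusion_matrix(pred, ref, n_classes=6)
```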

3.6 Sensitivity analysis

Sensitivity analysis is a method which helps to determine the importance, or contribution, of each input towards the generation of the final output. This information enables us to determine which input is more important, or provides more information, to the overall image classification process. It is stated in the ThinksPro guide that eliminating inputs that have little effect can improve the performance of the neural network on test data, since lowering the input dimension can enhance generalization (Logical Designs, 1996). Sensitivity analysis can thus be used as a decision-making tool to separate useful inputs from noise. For this study a built-in procedure in ThinksPro (the software used for neural network processing) was used to carry out the sensitivity analysis. There are many ways of calculating sensitivity; the method used in ThinksPro is to replace each input by its average value over the training set and calculate its effect on the output. The magnitude of the output change is then averaged over the whole training set, and this is done for all the inputs in the training set. Finally, the effect of each input is reported in the log file (Logical Designs, 1996).
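The following sketch reimplements the mean-substitution idea described in the ThinksPro guide; it is not ThinksPro's internal code, and `predict` stands for any trained network's forward function (an assumption of this example).

```python
import numpy as np

def sensitivity(predict, X):
    """For each input, replace it by its training-set mean, and average
    the magnitude of the output change over the whole training set."""
    base = predict(X)                     # outputs on unmodified inputs
    effects = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_mod = X.copy()
        X_mod[:, j] = X[:, j].mean()      # mean-substitute input j
        effects[j] = np.abs(predict(X_mod) - base).mean()
    # Normalize so that equal contributions would all score 1.0.
    return effects, effects / effects.mean()

# Toy usage with a stand-in "network": a fixed random linear map.
rng = np.random.default_rng(4)
W = rng.random((10, 6))
predict = lambda X: X @ W
X = rng.random((200, 10))                 # 200 samples, 10 inputs
effects, normalized = sensitivity(predict, X)
```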

3.7 Implementation Aspects

Five software packages were used for the implementation of the methodology described in the previous sections. They are listed below:
- Arc/Info: used to standardize projection information for all the datasets, including preparation of the training and test sets;
- ArcView: used for the production and visualization of the training/test sets and the land cover map;
- ERDAS Imagine: used for image preprocessing and transformation of the images into the ASCII file format;
- ThinksPro: used for neural network processing;
- GPS: used to retrieve the geographical position of ground control points during the fieldwork.


4 Results and Discussion

4.1 Accuracy of Back Propagation classifier trained with ASTER or Landsat TM datasets

As explained in the previous chapter, different combinations of bands of ASTER were tested in order to find the best input vector. The input vector plays an important role because it should provide the appropriate information for the neural network perceptrons, from which the network generates a classification. Table 8 shows that the two classification cases carried out on the ASTER image using the back propagation classifier had a very significant difference in accuracy. The first network, trained with 3 VNIR, 6 SWIR and 1 NDVI inputs, resulted in 66.60% correct training, whereas the second network, trained with 3 VNIR, 4 SWIR and 1 NDVI (after the last 2 SWIR bands of the ASTER image were removed), resulted in 82.64% correct training.

Table 8: Accuracy of the back propagation classifier using ASTER data

Case  Data type: ASTER            Correct training %  Correct test %  Error training  Error test
1     3 VNIR, 6 SWIR and 1 NDVI   66.60               65.78           0.134           0.137
2     3 VNIR, 4 SWIR and 1 NDVI   82.64               74.67           0.223           0.258

The big leap in accuracy can be explained by the noise reduction in the input data. In other words, the two last SWIR bands which were removed from the second neural network (case 2 in Table 8) can be considered noise, since the recorded (DN) values of these bands were very poor: most of the pixels of the image were represented as zero for these bands (more than 75% were zero or blank). The presence of noise in the input vector affects the overall accuracy of the neural network's performance.

Table 9 shows the Landsat TM band combination datasets used to carry out the training and testing of the back propagation classifier. In the first case the back propagation classifier was trained and tested with an input vector containing NDVI data and all bands of Landsat TM; the accuracy was 79.87% correct training and 73.73% correct test. In the second case, the back propagation classifier was trained and tested with an input vector containing NDVI data and all bands of Landsat TM except the thermal band. The accuracy results were very low: 57.54% correct training and 56.57% correct testing.

Table 9: Accuracy of the back propagation classifier trained with Landsat TM data

Test  Data source: Landsat TM           Correct training %  Correct test %  Error training  Error test
1     4 VNIR, 2 MIR, 1 TIR and 1 NDVI   79.87               73.73           0.057           0.078
2     4 VNIR, 2 MIR and 1 NDVI          57.54               56.57           0.146           0.151

The accuracy obtained from the network trained with the thermal band included is significantly higher than that of the same classifier trained without the thermal band. This confirms one of the most important advantages of using neural networks for image classification: the possibility of using multi-scale data in a classification process. Usually the thermal bands are not used in many of the conventional parametric classification methods due to their low spatial resolution, which differs from that of the VNIR bands. The neural network overcomes this problem because of its multi-scale nature, where bands of different resolution can be used as input for image classification. This enables the use of information in the input vector that would otherwise have been lost with the thermal band.

4.2 Accuracy of Back propagation classifier trained with ASTER and Landsat TM input dataset

The back propagation network trained with the combination of the two datasets and their NDVI derivatives showed a better result (85.07%) than any of the results obtained from the networks trained with only one of the datasets, as shown in the table below.


Table 10: Accuracy of the back propagation classifier trained with ASTER and Landsat TM combined datasets.

Data source: both ASTER and Landsat TM   Correct training %  Correct test %  Error training  Error test
7 VNIR, 6 SWIR, 1 TIR and 2 NDVI         85.07               77.14           0.034           0.069

The result shows that it was possible to improve upon the already high training accuracy obtained from the ASTER and Landsat TM data individually. This demonstrates the ability of a neural network classifier to extract information from multi-source datasets. The advantage of this approach is twofold: it provides a quick and efficient mechanism for using different datasets in a multispectral image classification, while offering very high flexibility during the image classification process. To explain this flexibility: in the image classification process, any particular input within the input vector can easily be removed from the network if it is later considered noise, and similarly additional data can be included in the network if needed at a later stage, after the classification process has started. This avoids the time loss of preparing multiple input vectors with different input data sources, i.e. of going through training and dataset preparation every time we want to try a larger or smaller number of inputs.

4.3 Accuracy of Kohonen/LVQ classifier trained with ASTER or Landsat datasets

Table 11 shows that the results of the Kohonen/LVQ classifier trained with ASTER data confirmed the previous results found using the back propagation classifier for the same dataset. The Kohonen/LVQ classifier trained with the noise-reduced ASTER dataset gave a better result than the one trained with all SWIR bands of ASTER. This indicates that the noise reduction affected the unsupervised classifier as well. A significant decrease in error and increase in correct test percentage was observed for the training done after the last two SWIR bands of ASTER were removed.


Table 11: Accuracy of the Kohonen/LVQ classifier trained with ASTER data

Test  Data source: ASTER          Correct test %  Error test
1     3 VNIR, 6 SWIR and 1 NDVI   27.42           0.242
2     3 VNIR, 4 SWIR and 1 NDVI   42.16           0.193

Table 12 shows the correct test percentage of the Kohonen/LVQ classifier trained with the Landsat TM data. Once again the unsupervised classifier confirmed the result found with the supervised one: the dataset without the thermal band gave a lower result than the classifier trained with the thermal band.

Table 12: Performance of the Kohonen/LVQ classifier trained with Landsat TM data

Test  Data source: Landsat TM           Correct test %  Error test
1     4 VNIR, 2 MIR, 1 TIR and 1 NDVI   37.03           0.210
2     4 VNIR, 2 MIR and 1 NDVI          29.19           0.236

4.4 Accuracy of Kohonen/LVQ classifier trained with ASTER and Landsat TM combined datasets

The Kohonen/LVQ classifier trained with the combination of bands from the ASTER and Landsat TM datasets gave a better result than those obtained from the datasets individually (Table 13). The classifier also returned the lowest error value of all the unsupervised trainings carried out; this is an indication that the network benefited from the merged input vector, which provided more information that helped it to better detect patterns in the input.

Table 13: Accuracy of the Kohonen/LVQ classifier trained with ASTER and Landsat TM data

Data source: ASTER and Landsat TM   Correct test %  Error test
7 VNIR, 6 SWIR, 1 TIR and 2 NDVI    47.15           0.176


In the unsupervised classification only the correct test set percentage is given; a correct training evaluation is not available, since no desired output is provided in the training of the unsupervised classification.

4.5 Validation of the results obtained from the back propagation supervised neural networks classifier

The validation was carried out by using a set of test ground control points that were not used in the training process. Validation was also carried out for the classification of the entire image that resulted in the highest correct training percentage (since the most accurate classifier will be used for the final land cover classification of the study area); in this case, the classifier trained with the combined data from the ASTER and Landsat TM images. A total of 1264 points were used in the validation process. The confusion matrix for the supervised back propagation neural network classification is given in the table below.

Table 14: Confusion matrix for the Back propagation network classification using ASTER and Landsat TM images

Classified as \ Ground control   Agriculture (1)  Forest (2)  Settlement (3)  Shrub (4)  Swamp (5)  Water (6)  Total (Y)
Agriculture (1)                  372              0           23              14         14         4          427
Forest (2)                       0                102         0               0          1          0          103
Settlement (3)                   0                0           0               0          0          0          0
Shrub (4)                        25               0           31              402        19         8          485
Swamp (5)                        3                1           6               0          134        11         155
Water (6)                        0                1           24              0          3          66         94
Total (X)                        400              104         84              416        171        89         1264


Accuracy assessment formulas (example for class 1, agriculture), where $a_{ij}$ denotes the element of the confusion matrix in row $i$ (classified as) and column $j$ (ground control), $X_j$ the column total and $Y_i$ the row total:

Accuracy for each class (producer's accuracy): $\text{Accuracy}_1 = a_{11} \times 100 / X_1$
Error of omission: $100\% - \text{Accuracy}_1$
Error of commission (complement of user's accuracy): $(a_{12} + a_{13} + a_{14} + a_{15} + a_{16}) \times 100 / Y_1$
Overall accuracy: $(a_{11} + a_{22} + a_{33} + a_{44} + a_{55} + a_{66}) \times 100 / \sum_j X_j$
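The same calculations can be expressed as a short sketch over the confusion matrix of Table 14 (rows = classified as, columns = ground control); the variable names are illustrative.

```python
import numpy as np

# Confusion matrix from Table 14.
cm = np.array([
    [372,   0,  23,  14,  14,   4],   # Agriculture
    [  0, 102,   0,   0,   1,   0],   # Forest
    [  0,   0,   0,   0,   0,   0],   # Settlement
    [ 25,   0,  31, 402,  19,   8],   # Shrub
    [  3,   1,   6,   0, 134,  11],   # Swamp
    [  0,   1,  24,   0,   3,  66],   # Water
])
col_totals = cm.sum(axis=0)   # X: ground control points per class
row_totals = cm.sum(axis=1)   # Y: classified points per class
diag = np.diag(cm)

# Settlement's row total is 0, so its commission error is undefined (nan).
with np.errstate(divide="ignore", invalid="ignore"):
    class_accuracy = 100 * diag / col_totals             # e.g. 93% agriculture
    omission = 100 - class_accuracy                      # producer's error
    commission = 100 * (row_totals - diag) / row_totals  # user's error

overall = 100 * diag.sum() / cm.sum()                    # 85.13% here
```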

The validation of the result of the back propagation classifier revealed that one of the classes, settlement, was not classified at all, while all the other classes were classified with a very high accuracy (Table 15). Forest is the best classified class in this case, with 98.1% class accuracy and 0.97% error of commission; this indicates that the land cover map to be produced from this classification will be a useful basis for studies concerning forest cover in the study area. The limited number of settlement-class ground control points used to train the network explains why the classifier could not recognize the pattern for the settlement class. The result also indicates that the quality and the size of the training data affect accuracy during training of the neural networks.

Table 15: Percentage accuracy of the classes of the supervised classified image using the ASTER and Landsat TM data sources

Class        Class accuracy  Error of omission       Error of commission
                             (producer's accuracy)   (user's accuracy)
Agriculture  93.0            7.00                    12.88
Forest       98.1            1.92                    0.97
Settlement   0               100.00                  0.00
Shrub        96.6            3.37                    17.11
Swamp        78.4            21.64                   13.55
Water        74.2            25.84                   29.79

Overall accuracy = 85.13%


4.6 Validation of the results obtained from the Kohonen/LVQ unsupervised neural network classifier

For ease of comparison of the results from the two classifiers, the same set of test points used to validate the supervised classifier's result was used for the validation of the result from the unsupervised classifier. The validation was carried out for the classification that resulted in the highest correct test percentage, i.e. from the dataset containing both ASTER and Landsat TM bands. The confusion matrix for the KWTA/LVQ network classification is given below.

Table 16: Confusion matrix for the unsupervised network

Classified as \ Ground control   Agriculture (1)  Forest (2)  Settlement (3)  Shrub (4)  Swamp (5)  Water (6)  Total (Y)
Agriculture (1)                  232              2           57              64         34         0          389
Forest (2)                       0                102         1               6          0          0          109
Settlement (3)                   54               0           1               5          1          3          64
Shrub (4)                        88               10          41              193        78         1          411
Swamp (5)                        39               72          15              53         13         2          194
Water (6)                        15               5           5               13         4          55         97
Total (X)                        428              191         120             334        130        61         1264

The same accuracy assessment formulas as given in Section 4.5 were applied.

Table 17 shows that the unsupervised network classified some settlement-class points, unlike the back propagation network, which did not recognize the class at all. This indicates an absence of sufficient information on that particular class in the training dataset. This is highly probable, since the areas labeled settlement are spatially very small. For example, a village of 10 to 15 cottages was labeled as settlement in order to be able to obtain information on the amount of settlement in the area. However, due to the very limited number of training data available for this class, the back propagation network could not learn or recognize the pattern for this class.

Table 17: Percentage accuracy of the six classes of the unsupervised classified image

Class        Class accuracy  Error of omission       Error of commission
                             (producer's accuracy)   (user's accuracy)
Agriculture  54.21           45.79                   40.36
Forest       53.40           46.60                   6.42
Settlement   0.83            99.17                   98.44
Shrub        57.78           42.22                   53.04
Swamp        10.00           90.00                   93.30
Water        90.16           9.84                    43.30

Overall accuracy = 47.15%

The result from the KWTA/LVQ network did not give a high overall accuracy, although, as noted above, it did classify some settlement-class points that the back propagation network missed entirely.

4.7 Improving the training data quality

According to the results obtained from the validation above, the neural networks could not learn the pattern for the output class settlement. There is a high possibility that this occurred because of a lack of information in the training dataset for this particular class. If the poor training points taken for this class were the cause, they would also have lowered the overall accuracy of the classification. To confirm whether the training data quality affected the classification accuracy of the neural networks, another classification was carried out with the entire sample data of the settlement class removed from the training data. The classification was carried out using both the supervised and unsupervised classifiers as described in the previous sections. The results are given below.

Table 18: Classification result for the ASTER-Landsat TM data into 5 classes

No  Data type: both ASTER and Landsat TM  Correct training %  Correct test %  Error training  Error test
1   Supervised Back Propagation           90.15               81.5            0.031           0.074
2   Unsupervised KWTA                     -                   46.5            -               0.176

With the removal of the settlement class from the training set, the desired output set and the network output, the correct training percentage of the back propagation classifier increased, while the performance of the Kohonen/LVQ classifier was not affected. The accuracy obtained was 46.5% with a 0.176 mean absolute error, which indicates that the Kohonen/LVQ layer was not affected by the removal of the settlement-class training points. The number of validation points used decreased to 1181; 84 points were removed because they represented the settlement class (Table 19). Points representing the settlement class were likewise removed from the test set.

Table 19: Confusion matrix for the supervised classification with five output classes

Classified as \ Ground control   Agriculture (1)  Forest (2)  Shrub (3)  Swamp (4)  Water (5)  Total (Y)
Agriculture (1)                  356              0           15         13         1          385
Forest (2)                       0                104         1          2          0          107
Shrub (3)                        37               0           394        21         4          456
Swamp (4)                        5                0           6          133        1          145
Water (5)                        2                0           0          2          84         88
Total (X)                        400              104         416        171        90         1181


The results obtained from the classification were very satisfactory, with 100% class accuracy for the class forest, and very reliable results for the classes shrub and water (Table 20). Although it will not be possible to obtain the geographical locations and distribution of the settlements in the area, for other purposes which do not necessarily include settlements in their objective, the land cover map that will be produced from this classifier will be adequately accurate.

Table 20: Percentage accuracy of the various classes of the supervised classified image (with 5 classes)

Class        Class accuracy  Error of omission       Error of commission
                             (producer's accuracy)   (user's accuracy)
Agriculture  89.00           11.00                   7.53
Forest       100.00          0.00                    2.80
Shrub        94.71           5.29                    13.60
Swamp        77.78           22.22                   8.28
Water        93.33           6.67                    4.55

Overall accuracy = 90.69%


4.8 Sensitivity Analysis

The result of the sensitivity analysis is given in Table 21¹. The figures in the effect column show the average change in the output over the training set due to the particular input being tested. The normalized effect column is calculated so that, if all the inputs had an equal effect, the normalized effect would be 1.0; inputs with a normalized effect larger than 1 contribute more than average to the network output.

Table 21: Result of the sensitivity analysis of the ASTER dataset

Sensitivity analysis for the back propagation network

Input             Effect     Effect normalized
1 Visible green   0.343842   1.264904
2 Visible Red     0.282187   1.038093
3 Infrared        0.401328   1.476381
4 SWIR 1          0.32859    1.208798
5 SWIR 2          0.26671    0.981157
6 SWIR 3          0.277654   1.021417
7 SWIR 4          0.400053   1.471692
8 SWIR 5          0.114453   0.421041
9 SWIR 6          0          0
10 NDVI           0.303505   1.116517

Sensitivity analysis for the KWTA/LVQ network

Input             Effect     Effect normalized
1 Visible green   0.37995    1.505496
2 Visible Red     0.310228   1.229232
3 Infrared        0.383452   1.519371
4 SWIR 1          0.23123    0.916215
5 SWIR 2          0.258523   1.02436
6 SWIR 3          0.278438   1.103269
7 SWIR 4          0.278438   1.103269
8 SWIR 5          0.115615   0.458108
9 SWIR 6          0          0
10 NDVI           0.287879   1.140679

We can see that the 8th and 9th inputs, which are the last two SWIR bands of ASTER, did not contribute much in either the supervised or the unsupervised network; in particular, the last SWIR band (the 9th input) returned 0 in the sensitivity analysis, which means it did not count in the learning process at all. For the Landsat TM data, the inputs have effect and normalized-effect values very close to 1, which shows that all bands of Landsat TM and the NDVI contributed more or less the same proportion to the learning process in both the supervised and unsupervised networks. This indicates that removing any input from this input vector would affect the accuracy of the output, since it would remove information used to classify the image.

¹ The results of the sensitivity analysis for the other inputs and networks are given in Appendix 2.


5 Conclusions

Neural networks gave a good result for land cover classification of the ASTER and Landsat TM images, considering that only spectral information was used for the classification. Both the supervised and unsupervised neural network classifiers confirmed that noise reduction in the input data affects the accuracy of image classification significantly. It is very difficult to find a compromise between data quality and quantity, since the higher the data quality we want, the more data we must remove as noise. Noise reduction might therefore lead to the removal of valuable data or information which might be useful for the classification process. Careful examination of the input data is thus very important before deciding on noise reduction.

The power of the neural networks to extract information from multi-scale and multispectral datasets, in order to come up with a better classification result, was observed in both the supervised and unsupervised neural network classifiers. It was possible to use the TIR band of the Landsat TM and the SWIR bands of the ASTER image, which have a different ground resolution than the rest of the bands of the images. The information extracted from the thermal band increased the accuracy of the image classification significantly.

The back propagation supervised neural network classifier proved to be a more accurate classifier than the Kohonen/LVQ unsupervised classifier. The comparison between the results of the supervised and unsupervised classifications showed that the supervised neural network gave a better class result for image classification; however, this does not give enough ground to conclude that unsupervised neural networks are not useful for image classification, because it is known that neural networks might not converge to the pattern into which the data should be classified when there is not enough information in the input vector. Basically, the presence of more data such as a DEM, soil type and other thematic information increases the accuracy of unsupervised network image classification.


Research question: Is there a significant difference in the accuracy of supervised neural network classification and unsupervised neural network classification?

In summary, the unsupervised neural network gave the most inaccurate results, which cannot be used for image classification for this particular dataset and area. However, it would be premature to interpret this as a general failure of unsupervised classification, since many other factors, such as the availability of ancillary data, affect the accuracy of unsupervised neural network classification. Fauzi et al. (2001) explained in their results that adding ancillary information such as a digital elevation model (DEM) is proven to increase classification accuracy. This confirms that a digital elevation model is a valuable input that gives additional information to improve the accuracy of a neural network classifier in image classification. Unfortunately, due to the unavailability of DEM data, it was not possible to see the difference it would make if used in the classification process for our study area. The supervised neural network resulted in a high-accuracy classification result, which can successfully be used for making the land cover map of the study area. To conclude the answer to the first research question of this study: the results of supervised and unsupervised neural network classification have a significant difference, and the supervised neural network classifier proved to be robust, generalizing and accurate, as illustrated in Table 22.

Table 22: Accuracy of supervised and unsupervised neural network classifiers

Class             Class accuracy (%)        Class accuracy (%)          Class accuracy (%)
                  Supervised, six classes   Unsupervised, six classes   Supervised, five classes
Agriculture       93                        54.21                       89.00
Forest            98.1                      53.40                       100.00
Settlement        0                         0.83                        -
Shrub             96.6                      57.78                       94.71
Swamp             78.4                      10.00                       77.78
Water             74.2                      90.16                       93.33
Overall accuracy  85.13                     47.15                       90.69


Research question: Which type of neural network classification, supervised or unsupervised, will handle poor quality data better?

Both indicated that less noisy data generalizes faster and with better accuracy. This was detected from the classification done on the ASTER image: both the back propagation and the KWTA/LVQ network gave an increased correct training percentage after the last two SWIR bands¹ were removed from their input vector. The ability to utilize multi-scale data in order to extract information that maximizes classification was detected in both networks with the Landsat TM training data; both networks gave a better correct percentage for the input with the thermal band included than for the input without the thermal band.

When it comes to handling a poor quality dataset, the unsupervised neural network indicated that it was less affected than the supervised network. The additional classification carried out without the settlement class proved this clearly. The result of the back propagation network increased after the settlement class was removed, while this did not change or affect the unsupervised Kohonen/LVQ classifier, since it had already learned all the information it could get in unsupervised mode and was not forced to recognize a pattern enforced by a desired output. As a result it could not perform better after what were considered poor training pairs (the training points representing the settlement class) were removed (see Table 18). This can be explained by the learning mechanisms with which the two networks operate. In the supervised classification a set of desired outputs is provided, which is supposed to correspond to the input data in a certain pattern. The supervised neural networks learn² in such a way that their error is propagated back after every iteration, so that their output resembles the desired output according to some transfer function pre-assigned to the networks. This restricts the networks from recognizing any existing pattern or relationship that is not supported by the provided desired output. Even though having training data is very good for getting a more accurate image classification, this does not always hold true, since this data might either not be available or not be reliable. In the former case, where training data is not available, unsupervised neural networks can be offered as an alternative; in the latter case, where the training data is not reliable, unsupervised neural networks can be used to test the reliability of the training data. This concludes the second research question: unsupervised neural networks can utilize poor or less correlated data better than supervised networks.

¹ Why the last two SWIR bands are considered noise was explained in Chapter 4.
² The learning mechanisms of the supervised and unsupervised networks are given in detail in Chapter 2.


6 Recommendations

The neural network classification (both supervised and unsupervised) undertaken in this study used only spectral information from satellite images. Since neural networks can accommodate different GIS and ancillary data in the image classification process, the classification accuracy of the datasets can be maximized by using additional information such as a Digital Elevation Model (DEM), soil map, geology map, etc. during the image classification process.

The other important point, which could not be covered in this study, was incorporating the five thermal bands of ASTER into the neural network classification. It was not possible to use all 14 bands of the ASTER image: due to the large size of the study area, having more input layers would have slowed down the data preparation and processing time beyond the time available for this study. The thermal bands of ASTER might give more classification accuracy in both the supervised and unsupervised cases.

The application of fuzzy logic in the neural network image classification process has been shown to improve image classification in other studies (Abuelgasim et al., 1999). Application of these systems is expected to increase the accuracy of the neural network classifiers significantly. More research has to be done in this respect to find out whether fuzzy logic/systems will increase the accuracy of image classification for the datasets used.

Finally, detailed land cover maps and other kinds of GIS data such as soil maps, hydrology maps, etc. of the areas adjacent to the study area should be produced in order to have sufficient geographic information, which can be used for sustainable resource management and development planning of the area.


References

Abuelgasim, A.A., Ross, W.D., Gopal, S. and Woodcock, C.E. 1999. Change Detection Using Adaptive Fuzzy Neural Networks: Environmental Damage Assessment after the Gulf War. Remote Sensing of Environment 70: 208-223. Elsevier Science Inc., NY.

Alavi, F. 2002. "A Survey of Neural Networks: Part I." http://www.iopwe.org/jul97/neural1.html.

Anderson, D. and McNeill, G. 1992. Artificial Neural Networks Technology. Utica, N.Y., Kaman Sciences Corporation.

Badran, F., M. C. and Crepon, M. Remote sensing operations. Paris, France, CEDRIC.

Berberoglu, S., Lloyd, C.D., Atkinson, P.M. and Curran, P.J. 2000. The Integration of Spectral and Textural Information Using Neural Networks for Land Cover Mapping in the Mediterranean.

Chen, Z. 2000. Data Mining and Uncertain Reasoning: An Integrated Approach. New York, John Wiley & Sons, Inc.

Fauzi, A., Hussin, A.Y. and Weir, M. 2001. A Comparison Between Neural Networks and Maximum Likelihood Remotely Sensed Data Classifiers to Detect Tropical Rain Logged-over Forest in Indonesia. 22nd Asian Conference on Remote Sensing, Singapore.

Gahegan, M., German, G. and West, G. 1999. "Improving Neural Network Performance on the Classification of Complex Geographic Datasets." Journal of Geographical Systems 1: 3-22.

Kulkarni, A.D. and Lulla, K. 1999. Fuzzy Neural Network Models for Supervised Classification: Multispectral Image Analysis. Geocarto International 14(4). Geocarto International Centre, Hong Kong.

Kumar, M. and Srinivas, S. 2001. Unsupervised Image Classification by Radial Basis Function Neural Network (RBFNN). 22nd Asian Conference on Remote Sensing, Singapore.

Luo, J. and Tseng, D. 2000. Self-Organizing Feature Map for Multi-Spectral SPOT Land Cover Classification. GISdevelopment.net, Taiwan.

Logical Designs. 1996. ThinksPro: Neural Networks for Windows. User's Guide.

REF1. 2002. Neuro-Fuzzy Systems. http://www.cs.berkeley.edu/~anuernb/nfs/, University of California at Berkeley.

REF2. 1999. Landsat Thematic Mapper. http://edc.usgs.gov/glis/hyper/guide/landsat_tm#tm8.

REF3. 2002. ASTER. http://asterweb.jpl.nasa.gov/

REF4. 2000. Neusciences, Intelligent Technologies. http://www.neusciences.com/technologies/intelligent_technologies.htm.

REF5. 2002. Supervised and Unsupervised Neural Networks. http://www.gc.ssr.upm.es/inves/ann1/concepts/Suunsupm.htm.

Roy, A. 2000. "Artificial Neural Networks - A Science in Trouble." SIGKDD Explorations 1(2): 33-38.

Vani, K. 2000. "Fusion of ASTER Image Data for Enhanced Mapping of Land Cover Features." GISdevelopment.net. http://www.gisdevelopment.net/application/environment/pp/envp0005pf.htm.

Vassilas, N., Perantonis, S., Charou, E., Varoufakis, S. and Moutsoulas, M. 2000. Neural Networks for Fast and Efficient Classification of Multispectral Remote Sensing Data. 5th Hellenic Conference on Informatics, University of Athens, Greece.

Wudneh, T. 1998. Biology and Management of Fish Stocks in Bahir Dar Gulf, Lake Tana, Ethiopia. Wageningen Institute of Animal Science, Wageningen University: 144.


Appendices

Appendix 1: Dataset Projection Information

All the datasets used in this study are map-registered to the Transverse Mercator projection.

Projection:              Transverse Mercator
Datum:                   Adindan (30th Arc)
Spheroid:                Clarke 1880 (Modified)
Unit of Measurement:     Meter
Meridian of Origin:      39°00' East of Greenwich
Latitude of Origin:      Equator
False Easting:           500,000 m
False Northing:          0 meters (nil northing)
Scale Factor at Origin:  0.9996
Grid:                    U.T.M. Zone 37


Appendix 2: Results of Input Sensitivity Analysis

2.1 Sensitivity analysis of the ASTER dataset inputs for the Back propagation classifier

Input             Effect     Effect normalized
1 Visible green   0.343842   1.264904
2 Visible Red     0.282187   1.038093
3 Infrared        0.401328   1.476381
4 SWIR 1          0.32859    1.208798
5 SWIR 2          0.26671    0.981157
6 SWIR 3          0.277654   1.021417
7 SWIR 4          0.400053   1.471692
8 SWIR 5          0.114453   0.421041
9 SWIR 6          0          0
10 NDVI           0.303505   1.116517

2.2 Sensitivity analysis of the Landsat TM dataset inputs for the Back propagation classifier

Input            Effect     Effect normalized
1 Visible blue   0.609463   0.958399
2 Visible green  0.624913   0.982694
3 Visible Red    0.740876   1.165049
4 Near infrared  0.514623   0.80926
5 Mid infrared   0.699164   1.099457
6 TIR            0.67013    1.053799
7 Mid infrared   0.646624   1.016836
8 NDVI           0.581552   0.914508

2.3 Sensitivity analysis of the combined ASTER and Landsat TM dataset inputs for the Back propagation classifier

Input                      Effect     Effect normalized
1 Visible blue             0.488368   0.990814
2 Visible green            0.564402   1.145074
3 Visible Red              0.592826   1.202741
4 Near infrared            0.451129   0.915263
5 Mid infrared             0.517241   1.049392
6 TIR                      0.628186   1.27448
7 Mid infrared             0.558333   1.132761
8 NDVI (from Landsat TM)   0.395887   0.803186
9 Visible green            0.549178   1.114187
10 Visible Red             0.525436   1.066019
11 Infrared                0.396506   0.804442
12 SWIR 1                  0.492817   0.99984
13 SWIR 2                  0.431338   0.87511
14 SWIR 3                  0.442284   0.897318
15 SWIR 4                  0.407337   0.826416
16 NDVI (from ASTER)       0.445065   0.902959


2.4 Sensitivity analysis of the ASTER dataset inputs for the KWTA/LVQ classifier

Input             Effect     Effect normalized
1 Visible green   0.37995    1.505496
2 Visible Red     0.310228   1.229232
3 Infrared        0.383452   1.519371
4 SWIR 1          0.23123    0.916215
5 SWIR 2          0.258523   1.02436
6 SWIR 3          0.278438   1.103269
7 SWIR 4          0.278438   1.103269
8 SWIR 5          0.115615   0.458108
9 SWIR 6          0          0
10 NDVI           0.287879   1.140679

2.5 Sensitivity analysis of the Landsat TM dataset inputs for the KWTA/LVQ classifier

Input            Effect     Effect normalized
1 Visible blue   0.370968   1.002597
2 Visible green  0.355568   0.960977
3 Visible Red    0.301042   0.81361
4 Near infrared  0.436122   1.178687
5 Mid infrared   0.341115   0.921915
6 TIR            0.433549   1.171733
7 Mid infrared   0.387201   1.046469
8 NDVI           0.334491   0.904012

2.6 Sensitivity analysis of the combined ASTER and Landsat TM dataset inputs for the KWTA/LVQ classifier

Input                      Effect     Effect normalized
1 Visible blue             0.277849   1.085128
2 Visible green            0.245799   0.95996
3 Visible Red              0.19504    0.761721
4 Near infrared            0.275828   1.077236
5 Mid infrared             0.236521   0.923722
6 TIR                      0.229315   0.895582
7 Mid infrared             0.236521   0.923722
8 NDVI (from Landsat TM)   0.21155    0.826202
9 Visible green            0.302894   1.182941
10 Visible Red             0.252535   0.986265
11 Infrared                0.28774    1.123756
12 SWIR 1                  0.291602   1.138841
13 SWIR 2                  0.295414   1.153728
14 SWIR 3                  0.304735   1.190133
15 SWIR 4                  0.234143   0.914438
16 NDVI (from ASTER)       0.21934    0.856625


Appendix 3: Neural network parameters used

3.1 Parameters used for the Back Propagation neural network classifier

Case  Architecture  Error type  Batch size  Input nodes  Input preprocessing  Sample arrangement  Hidden layers  Hidden nodes (max)  Hidden: rule / input / transfer  Output nodes  Output: rule / input / transfer
1     MNFF          MAE         1           10           (Mean/SD)            Normal              1              21                  BPN / RBF / Sigmoid              6             BPN / DP / Sigmoid
2     MNFF          MAE         1           8            (Mean/SD)            Normal              1              17                  BPN / DP / Sigmoid               6             BPN / DP / Sigmoid
3     MNFF          MAE         1           8            (Mean/SD)            Normal              1              17                  BPN / DP / Sigmoid               6             BPN / DP / Sigmoid
4     MNFF          MAE         1           7            (Mean/SD)            Normal              1              15                  BPN / RBF / Sigmoid              6             BPN / DP / Sigmoid
5     MNFF          MAE         1           16           (Mean/SD)            Normal              1              33                  BPN / DP / Sigmoid               6             BPN / DP / Sigmoid
6     MNFF          MAE         1           16           (Mean/SD)            Normal              1              33                  BPN / DP / Sigmoid               5             BPN / DP / Sigmoid

3.2 Parameters used for the Kohonen/LVQ neural network classifierError type batch size input layer input preprocessing sample arrangement # of hidden layer max nodes learning rule input function transfer function output nodes learning rule input function transfer function

case

Architecture

nodes

1 2 3 4 5 6

MNFF MNFF MNFF MNFF MNFF MNFF

MAE MAE MAE MAE MAE MAE

1 1 1 1 1 1

10 8 8 7 16 16

(Mean/SD) (Mean/SD) (Mean/SD) (Mean/SD) (Mean/SD) (Mean/SD)

Normal Normal Normal Normal Normal Normal

1 1 1 1 1 1

21 17 17 15 33 33

21 17 17 15 33 33

KWTA KWTA KWTA KWTA KWTA KWTA

L2 RBF L2 L2 L2 L2

WTA WTA WTA WTA WTA WTA

6 6 6 6 6 5

LVQ LVQ LVQ LVQ LVQ LVQ

RBF RBF RBF RBF RBF RBF

WTA WTA WTA WTA WTA WTA

