Krista Lagus
Self-organizing maps for studying the similarities in term usage
Data-oriented research on natural language use has grown remarkably in recent years. In the statistical approach, the use of terms is studied, e.g., by collecting information about the frequencies of the terms' contextual features. These features may be, for example, the syntactic classes of surrounding words, morphological properties of surrounding terms, other terms in the close vicinity, or even all the other terms in the whole document.
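As a concrete illustration of contextual feature frequencies, the minimal Python sketch below counts, for every term, which terms occur right next to it. The feature naming (`L1:`/`R1:` prefixes for relative positions), the helper names `context_features` and `to_vectors`, and the normalization to relative frequencies are illustrative assumptions. They are not a description of the features behind the maps in the lecture.

```python
from collections import Counter, defaultdict


def context_features(tokens, window=1):
    """For every term, count the terms found at each relative position
    up to `window` tokens to its left ("L1:", "L2:", ...) and right ("R1:", ...)."""
    feats = defaultdict(Counter)
    for i, term in enumerate(tokens):
        for off in range(1, window + 1):
            if i - off >= 0:
                feats[term][f"L{off}:{tokens[i - off]}"] += 1
            if i + off < len(tokens):
                feats[term][f"R{off}:{tokens[i + off]}"] += 1
    return feats


def to_vectors(feats):
    """Turn per-term counters into relative-frequency vectors over one shared feature list."""
    names = sorted({f for counts in feats.values() for f in counts})
    vectors = {}
    for term, counts in feats.items():
        total = sum(counts.values())
        vectors[term] = [counts[n] / total for n in names]
    return names, vectors


tokens = "the cat sat on the mat the dog sat on the rug".split()
feats = context_features(tokens, window=1)
names, vectors = to_vectors(feats)
print(vectors["cat"] == vectors["dog"])  # True: identical immediate contexts
```

Here "cat" and "dog" end up with identical feature vectors because they occur in identical immediate contexts. Vectors of this kind are the sort of input a SOM can then organize.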
With standard methods it is easy to answer specific questions, such as how typical this term is in connection with that other term, or which terms occur most typically within a two-term window of this term. However, the number of possible specific questions is vast, and knowing which questions are the most interesting requires a good general understanding of the problem.
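Each such specific question reduces to a simple count. The sketch below assumes whitespace-tokenized text and reads "two-term window" as up to two tokens on either side of the target. It answers the second question. The function name `window_neighbors` is hypothetical.

```python
from collections import Counter


def window_neighbors(tokens, target, window=2):
    """Count the terms occurring within `window` positions of each occurrence of `target`."""
    counts = Counter()
    for i, term in enumerate(tokens):
        if term != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target occurrence itself
                counts[tokens[j]] += 1
    return counts


tokens = "the cat sat on the mat the dog sat on the rug".split()
near_sat = window_neighbors(tokens, "sat", window=2)
print(near_sat.most_common(2))  # [('the', 4), ('on', 2)]
print(near_sat["on"])           # 2: how often "on" appears near "sat"
```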
Forming such a general understanding can be made much easier with proper tools for data exploration. Self-Organizing Maps (SOMs) can be used to explore similarities and differences of terms and their usage. The SOM organizes the given data, be it terms or whole documents, so that similar terms (or documents) appear near each other on the map display. Similarity is defined by the features used. The properties of each term can then be studied in more detail by exploring the map. In addition, the map display can serve as a background for visualizing various properties of the data set.
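The ordering behavior of the SOM comes from a simple training loop. For each input, the closest prototype (the best-matching unit) is located, and it and its grid neighbors are pulled toward the input. The sketch below is a minimal pure-Python illustration of the standard online SOM algorithm with a Gaussian neighborhood. It is not the software used for the maps in the lecture. The grid size, the linear decay schedules, the function names, and the toy term vectors are all assumptions chosen for brevity.

```python
import math
import random


def best_unit(protos, x):
    """Return the grid coordinates of the prototype closest to x (squared Euclidean distance)."""
    return min(protos, key=lambda u: sum((w - v) ** 2 for w, v in zip(protos[u], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM on equal-length vectors with online updates.

    Returns a dict mapping grid coordinates (r, c) to prototype vectors.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Start from random prototypes in [0, 1).
    protos = {(r, c): [rng.random() for _ in range(dim)]
              for r in range(rows) for c in range(cols)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood radius shrinks
            br, bc = best_unit(protos, x)
            for (r, c), w in protos.items():
                # Gaussian neighborhood measured on the map grid, not in feature space.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return protos


# Hypothetical context-feature vectors for four terms.
terms = {
    "cat":  [1.0, 0.9, 0.0, 0.0],
    "dog":  [0.9, 1.0, 0.0, 0.1],
    "run":  [0.0, 0.1, 1.0, 0.9],
    "walk": [0.0, 0.0, 0.9, 1.0],
}
protos = train_som(list(terms.values()), rows=3, cols=3)
for term, vec in terms.items():
    print(term, best_unit(protos, vec))
```

Because each update moves a prototype only part of the way toward the input, terms with similar feature vectors tend to land on the same or neighboring units. The printed coordinates are the map positions one would then explore or use as a background for visualization.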
Different selections of features yield different kinds of maps. In the lecture, various examples of such maps and their use will be provided to illustrate the potential usefulness of the approach as a tool for terminological research.
Krista Lagus, D.Sc.(Tech)
Neural Networks Research Centre
Helsinki University of Technology
P.O.Box 5400, FIN-02015 HUT, Finland
tel. +358-9-451 3276
Krista.Lagus@hut.fi
http://www.cis.hut.fi/krista/