GML in the Classroom

Therese Røsholdt
M.Sc. Student
Østfold University College
Halden
Norway
therese.rosholdt@hiof.no

Biography

Therese is studying computer science at Østfold University College, Halden, Norway. She is especially interested in human-computer interaction and graphical programming. She is a member of the Project OneMap Team.

Gunnar Misund
Associate Professor
Østfold University College
Halden
Norway
gunnar.misund@hiof.no
http://www.ia.hiof.no/~gunnarmi/

Biography

Gunnar is teaching and researching at Østfold University College, Halden, Norway. His main research interests are digital (web) mapping and distributed computing. He is the founder of Project OneMap.


Abstract


This paper first explains one of the reasons for using GML (Geography Markup Language) as a development tool: the need for a tool to bridge the gap between cartographers and software developers. It then describes the Digital Maps course that ran in the spring of 2003 at Østfold University College and the projects related to this course [DM]. Finally, it summarizes the experiences the students gained from using GML in their projects.


Table of Contents


1. Why GML?
2. The Digital Maps course
     2.1 Project OneMap
     2.2 GML editor
     2.3 Tiger/Line to GML conversion
3. GML experiences made during the projects
     3.1 Starting with GML
     3.2 XML Schema
     3.3 GML as a tool
Bibliography

1. Why GML?

Since computers entered the scene, more and more trades and professions have found themselves having to use software applications and the internet. Today many professionals rely on one or more specialized software tools, in addition to ordinary office applications, to do their job. This is also true for cartographers; cartography is a field that has adopted modern technology to a large extent. GPS (Global Positioning System) technology is used for surveying, databases are used for storage, and maps are edited using sophisticated software.

Problems always arise when software applications are developed; a clash between the professions involved can be one cause. In cartography it is often the case that the “data-people” know little or nothing about maps, and the “map-people” know little or nothing about programming or software development. The result is visible on the many web sites that include maps. For instance, the maps can be confusing and hard to read, navigation can be troublesome both within the site and within the map, or the maps can be out of date.

One way of bridging the gap between the professions involved can be the use of GML. GML was partially created by cartographers and therefore follows several cartographic traditions. Using GML as a development tool can build on this traditional knowledge, and it should give cartographers better insight into what software developers are doing and how they are doing it. Indeed, they can use GML themselves and become even more integrated in the development process than before.

2. The Digital Maps course

The Digital Maps course ran in the spring of 2003 at Østfold University College, with six students finishing the course. The lecturer was Gunnar Misund, associate professor at Østfold University College. The main objective was to give the students a practical introduction to digital maps and geodata in general, and in the context of the Web in particular. The course was carried out as individual student projects, all of which were subprojects of a larger project called OneMap [MIS1]. Most of the student projects involved the use of GML.

The course ran over a total of seven weeks, and the students had only this course to concentrate on. The first three weeks consisted of lectures held by Gunnar Misund, two excursions and deciding on the projects. The lectures covered subjects such as GIS (Geographic Information Systems), Project OneMap, cartography, Norwegian mapping standards, GML, XML (eXtensible Markup Language) and SVG (Scalable Vector Graphics). The excursions took the students to the local city’s surveying department and to the Norwegian Institute of Land Inventory [NIJOS]. The following weeks were dedicated to the projects. The class met once or twice a week; the students then presented progress and problems within their projects, and subjects of interest were discussed and highlighted. During these weeks the students also had individual meetings and continuous mail contact with Gunnar Misund about their projects. The course used no textbook, but had its own homepage that supplied useful material and links.

More Digital Maps courses will be held in the future, carrying on the work with Project OneMap. The Digital Maps course is part of the Computer Science master study at Østfold University College, and is linked to the specialization in Environmental Computing.

Section 2.1 describes Project OneMap; Section 2.2 and Section 2.3 describe three of the student projects carried out during the course.

2.1 Project OneMap

Project OneMap is a long-term project fathered by Gunnar Misund. In short, the project aims to build a large, global map stored and processed in a scalable and redundant distributed architecture. The core idea is to build the map incrementally, from many uncoordinated submissions. The repository will serve researchers and organizations that are in need of free-of-charge geodata with global, consistent coverage. The framework relies heavily on the work of the OGC (Open GIS Consortium), in particular GML and the web services WFS (Web Feature Service) and WMS (Web Mapping Service). In addition, related XML technologies such as SVG will be used, for example in user interfaces. By using GML (and SVG) as part of the toolbox for OneMap, the hope is that the project will be one of the bridges between cartographers and software developers.

2.2 GML editor

One of the main objectives of OneMap is to make it possible to build a huge map in an incremental and uncoordinated manner with contributions from a wide variety of parties, while still guaranteeing a reasonable level of reliability and quality. The main problem is to integrate submissions with the existing geodata. This problem is addressed by a framework based on the well-known principles of peer review. To realize the peer review process, which is to take place in the Clearinghouse of OneMap, there was a need for an application that could merge the existing data in storage with the data that the user submits as "New".

The development of this application, the GML editor, comprised two student projects. Mats Lindh’s project “Realization of the first part of the OneMap Peer Review Process” [LIN] took care of the merging and provision of data for Henning Kristiansen’s project “OneMap.Submission.GUI” [KRI], which is a browser-based GUI (Graphical User Interface). The projects also crafted a specific OMPacket schema, which was in turn inspired by the GMLPacket format described in the GML 2.0 specification [GML20]. Mats and Henning worked in close contact during the development.

the_application_v2.png

Figure 1: GML editor GUI

The merging and provision of data was done to identify possible conflicts, that is, geographic features that intersect with the area of the new data, and more generally to provide a simple means of merging the data into one file. Newly submitted data is checked for conflicts with the already stored data (in terms of intersecting features) and is then passed on to the GUI. The user makes adjustments to the submitted data while having the original data in a separate layer. This requires a simple way of selecting which features to export to the GUI, and also a simple way of selecting features that are "close" to the new data. These problems have been solved to a certain degree by implementing a straightforward feature-versus-feature intersection check and resolving the intersections for all features in the new and old data. The result is provided to the GUI in small parts by a separate module, both to make editing easier and to avoid overloading the GUI with information. All the different applications (except for the OneMapResponseParser) accept the same form of input: XML-based files conforming to the OMPacket schema. Figure 2 shows how this creates a quite flexible tool chain which may be reused in other projects in the future.
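
A feature-versus-feature intersection check of the kind mentioned above can be sketched as follows. This is only an illustration and not the students' implementation: the class and method names are hypothetical, rings are assumed to be closed coordinate arrays (first point repeated at the end), and the test only detects crossing boundaries, so a ring lying entirely inside another is not reported.

```java
/** Hypothetical sketch of a brute-force ring-versus-ring intersection check. */
class RingIntersection {

    /** Sign of the cross product (b - a) x (c - a): 1 = left turn, -1 = right turn, 0 = collinear. */
    static int orientation(double[] a, double[] b, double[] c) {
        double cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]);
        return cross > 0 ? 1 : (cross < 0 ? -1 : 0);
    }

    /** True if c lies inside the bounding box of segment ab (only meaningful when collinear). */
    static boolean onSegment(double[] a, double[] b, double[] c) {
        return Math.min(a[0], b[0]) <= c[0] && c[0] <= Math.max(a[0], b[0])
            && Math.min(a[1], b[1]) <= c[1] && c[1] <= Math.max(a[1], b[1]);
    }

    /** Classic orientation test for two line segments p1-p2 and q1-q2. */
    static boolean segmentsIntersect(double[] p1, double[] p2, double[] q1, double[] q2) {
        int o1 = orientation(p1, p2, q1);
        int o2 = orientation(p1, p2, q2);
        int o3 = orientation(q1, q2, p1);
        int o4 = orientation(q1, q2, p2);
        if (o1 != o2 && o3 != o4) {
            return true; // proper crossing
        }
        // Collinear special cases: an endpoint lies on the other segment.
        return (o1 == 0 && onSegment(p1, p2, q1))
            || (o2 == 0 && onSegment(p1, p2, q2))
            || (o3 == 0 && onSegment(q1, q2, p1))
            || (o4 == 0 && onSegment(q1, q2, p2));
    }

    /** Compares every segment of ring a with every segment of ring b. */
    static boolean ringsIntersect(double[][] a, double[][] b) {
        for (int i = 0; i + 1 < a.length; i++) {
            for (int j = 0; j + 1 < b.length; j++) {
                if (segmentsIntersect(a[i], a[i + 1], b[j], b[j + 1])) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

The pairwise comparison is quadratic in the number of segments; a bounding box comparison before the segment loop would be a natural first optimization.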

structure.png

Figure 2: Tool chain structure

When the projects started, the plan was to use the GMLPacket format as the output format for the editor. Mats and Henning found that they needed several features that the GMLPacket format did not provide; hence the OMPacket was born. The most important addition is the possibility of including LinearRings directly, since this is the format that OneMap uses for all its geographic structures. The OMPacket can also store an OmLinearRing, which consists of one or more orderedPointSets, each of which can carry a number of properties. This makes the format conform to the internal OneMap format and retains the properties and other information about the structures, even through the stages where the user makes edits. The students also wanted a format that could include the different multi*-structures, so those were added as well. Finally, they added another feature that the original schema did not provide: an OMPacket can include several "Layers" in one file, each with its own boundingBox. This makes it possible to join OMPacket files in a flexible way and still retain the information from the original packets. The layers may also be used to hold different sorts of information in a single packet, e.g. buildings, land, roads and phone lines. The OMPacket is now a quite flexible and feature-rich exchange format for GML-compliant data. All the different structures from GML are accepted, so a parser can simply run through the original file and emit the features as it finds them. The tools of the projects do, however, rely only on the LinearRing property of the features, so far from all the possibilities are used at the moment.
The main point was that the OMPacket format should be usable for most applications that require some sort of data exchange with other units, and that an application may use a subset of the functionality provided by the OMPacket schema.
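
The layer idea can be illustrated with a small in-memory model. The sketch below is an assumption made for illustration only: the class names, fields and the join operation are hypothetical and are not taken from the OMPacket schema itself. It shows how joining two packets can keep every layer, and its own bounding box, intact.

```java
import java.util.ArrayList;
import java.util.List;

/** Axis-aligned bounding box; field names are illustrative only. */
class BoundingBox {
    final double minX, minY, maxX, maxY;

    BoundingBox(double minX, double minY, double maxX, double maxY) {
        this.minX = minX;
        this.minY = minY;
        this.maxX = maxX;
        this.maxY = maxY;
    }

    /** Smallest box that covers both this box and the other one. */
    BoundingBox union(BoundingBox o) {
        return new BoundingBox(Math.min(minX, o.minX), Math.min(minY, o.minY),
                               Math.max(maxX, o.maxX), Math.max(maxY, o.maxY));
    }
}

/** One named layer with its own bounding box and a list of closed rings. */
class Layer {
    final String name;
    final BoundingBox box;
    final List<double[][]> linearRings = new ArrayList<double[][]>();

    Layer(String name, BoundingBox box) {
        this.name = name;
        this.box = box;
    }
}

/** Hypothetical packet model: an ordered list of layers. */
class PacketSketch {
    final List<Layer> layers = new ArrayList<Layer>();

    /** Joins two packets by carrying over their layers unchanged. */
    static PacketSketch join(PacketSketch a, PacketSketch b) {
        PacketSketch joined = new PacketSketch();
        joined.layers.addAll(a.layers);
        joined.layers.addAll(b.layers);
        return joined;
    }

    /** Overall extent derived from the layers' own boxes; null for an empty packet. */
    BoundingBox extent() {
        BoundingBox all = null;
        for (Layer layer : layers) {
            all = (all == null) ? layer.box : all.union(layer.box);
        }
        return all;
    }
}
```

Because the join never merges layers, the stored data and the "New" submission remain distinguishable after the packets have been combined.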

The GML editor is implemented with SVG and DOM (Document Object Model) scripting, and is designed as a lightweight client application able to run in a standard Web browser, such as MS Internet Explorer. The server side of the system takes care of translating GML-compliant documents into carefully structured SVG instances. After the SVG document has been modified in a DOM-supporting plug-in (e.g. Adobe SVG Viewer 3.0 [ADO]), the changes are transferred back to the server, which applies the modifications to the original GML file. As indicated in Figure 3, the OMPacket and the GMLPacket are currently the two schemas accepted as defining the valid input formats for the editor.
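
The core of a GML-to-SVG translation is turning a coordinate list into SVG path data. The following is a minimal sketch under stated assumptions, not the server code of the editor: the class and method names are hypothetical, and the input is assumed to use the default GML 2 coordinate syntax (tuples separated by whitespace, x and y separated by a comma). No y-axis flip is applied here; since the SVG y-axis points downwards, a real translator would typically add one, for instance with a transform on the enclosing group.

```java
/** Hypothetical helper that turns a GML 2 coordinate string into SVG path data. */
class GmlToSvgPath {

    /** Converts e.g. "0,0 10,0 10,10 0,0" into "M 0 0 L 10 0 L 10 10 Z". */
    static String toPathData(String gmlCoordinates) {
        String trimmed = gmlCoordinates.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] tuples = trimmed.split("\\s+");
        int n = tuples.length;
        // A LinearRing repeats its first tuple at the end; SVG closes the path with "Z" instead.
        boolean closed = n > 1 && tuples[0].equals(tuples[n - 1]);
        if (closed) {
            n--;
        }
        StringBuilder d = new StringBuilder();
        for (int i = 0; i < n; i++) {
            String[] xy = tuples[i].split(",");
            d.append(i == 0 ? "M " : " L ").append(xy[0]).append(' ').append(xy[1]);
        }
        if (closed) {
            d.append(" Z");
        }
        return d.toString();
    }
}
```

The coordinate values are copied as text rather than parsed into numbers, so the precision of the original GML is preserved when the changes are later written back.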

architecture.jpg

Figure 3: Architecture

A GML feature is defined by a set of properties. A feature property falls into one of two categories: geometric or thematic (non-geometric) information. The editor is capable of handling modifications of both types of information. The geometric editing is based on moving, deleting and inserting points and groups of points. Thematic properties are edited in tables.

In its main mode the editor handles a GML file where the modifications and the original data are embedded as separate layers. The editor treats the layers independently, and is able to perform locking and visibility operations on each layer. All modifications are recorded in a history list, and it is possible at any point in the editing process to revert to an arbitrary earlier point and undo or redo the corresponding changes to the document.
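
A history list with arbitrary revert can be sketched as a list of document states plus a cursor. This is an assumed design for illustration, written in Java rather than in the DOM scripting used by the editor; the class name and the choice of storing whole states as strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical history list: recorded document states plus a cursor into them. */
class EditHistory {
    private final List<String> states = new ArrayList<String>();
    private int current;

    EditHistory(String initialState) {
        states.add(initialState);
        current = 0;
    }

    /** Records a new state; any states that were undone are discarded first. */
    void record(String state) {
        while (states.size() > current + 1) {
            states.remove(states.size() - 1);
        }
        states.add(state);
        current = states.size() - 1;
    }

    /** Steps one state back; returns false if already at the first state. */
    boolean undo() {
        if (current == 0) {
            return false;
        }
        current--;
        return true;
    }

    /** Steps one state forward; returns false if there is nothing to redo. */
    boolean redo() {
        if (current >= states.size() - 1) {
            return false;
        }
        current++;
        return true;
    }

    /** Jumps directly to an arbitrary recorded point in the editing process. */
    void revertTo(int index) {
        if (index < 0 || index >= states.size()) {
            throw new IndexOutOfBoundsException("no history entry " + index);
        }
        current = index;
    }

    String currentState() {
        return states.get(current);
    }

    int size() {
        return states.size();
    }
}
```

Storing full states keeps the sketch short; recording only the individual point and property changes would use less memory for large documents.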

A separate paper has been written on the SVG part of the editor, “Distributed GML Management with SVG Tools” [MKL].

The main rationale for designing and developing the GML editor was to provide a flexible and efficient tool for presenting and editing data represented in the XML format GML, as defined by the OGC specification GML 2.1 [GML21]. This version of GML has recently been replaced by the more complex set of GML 3.0 schemas (which are backwards compatible with 2.1) [GML3]. GML 3.0 is expected to be adapted and adopted as an ISO (International Organization for Standardization) standard, ISO TC211/19136, hopefully some time during 2004.

2.3 Tiger/Line to GML conversion

In Project OneMap, GML is used as the basis for all data. To be able to include the census data from the USA, there was a need for a method and a tool to convert from the TIGER/Line format [TL] used by the U.S. Census Bureau to GML.

Knut-Erik Johnsen conducted the project “Tiger/Line Conversion” [JOH]. The goal was to make a parser that can take a file in the Tiger/Line 2002 format and translate it into a GML 2.0 compliant file. Since topology is coming in the near future in GML 3.0 and especially GML 4.0, the intention was not to make a full topological implementation. Instead the project set out to make two implementations: one that writes the GML file in the same order as the Tiger/Line file is read, and one that sorts the GML file by feature name. The implementation, developed in Java, was limited to cover only the roads in the Tiger/Line format.

In the Tiger/Line format the data for each county is stored in a single compressed file which includes up to 19 files. In counties where there are no data for some of the file types, these files are not included. The Tiger/Line files contain data describing three major feature types: Line Features, Landmark Features and Polygon Features. Data about the different features are spread over multiple files. To find all the relevant information about an object, data from several files need to be linked.

To successfully map from the Tiger/Line format to GML, the parser needs an internal data structure that models the structure used in the Tiger/Line files, where roads are represented as chains and nodes. Since the data in the Tiger/Line files are arranged in strict topological order, the application builds up a network of edges and nodes, where each chain represents an edge and each start and stop coordinate is a node. This means that ordinary network algorithms can be used to traverse the data and produce the GML. The main purpose of this part of the system is to build a FeatureCollection object. This object contains a Vector with all the features in a county, and a CoordinateCollection with information about all the coordinates and the bounding box of the county. The purpose of building such an object is to make a universal GMLmaker that can take any data that are modeled topologically. Hence it should not be hard to develop parsers for formats other than Tiger/Line. One big problem with this kind of modeling is the use of resources. In a large county with many roads and coordinates there will be a lot of objects in the system. The properties in particular will take a lot of memory, since they are stored as strings.
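
The edge-and-node network can be sketched with a map from node coordinates to the chains that touch them. The code below is an illustration only, not the project's FeatureCollection: all names are hypothetical, and it assumes that chains meeting at a node carry exactly equal coordinate values, so that the coordinates can serve as node keys.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical network of chains (edges) joined at their start and end coordinates (nodes). */
class ChainNetwork {
    // Node key -> ids of all chains that start or end at that node.
    private final Map<String, List<Integer>> chainsAtNode = new HashMap<String, List<Integer>>();
    // Chain id -> its two node keys {start, end}.
    private final List<String[]> endpoints = new ArrayList<String[]>();

    /** Assumes shared nodes have exactly equal coordinates in the source data. */
    static String key(double x, double y) {
        return x + "," + y;
    }

    /** Adds a chain and returns its id (its position in insertion order). */
    int addChain(double startX, double startY, double endX, double endY) {
        int id = endpoints.size();
        String[] keys = {key(startX, startY), key(endX, endY)};
        endpoints.add(keys);
        for (String k : keys) {
            List<Integer> chains = chainsAtNode.get(k);
            if (chains == null) {
                chains = new ArrayList<Integer>();
                chainsAtNode.put(k, chains);
            }
            chains.add(id);
        }
        return id;
    }

    int nodeCount() {
        return chainsAtNode.size();
    }

    /** Breadth-first search: can toChain be reached from fromChain via shared nodes? */
    boolean connected(int fromChain, int toChain) {
        boolean[] seen = new boolean[endpoints.size()];
        Deque<Integer> queue = new ArrayDeque<Integer>();
        queue.add(fromChain);
        seen[fromChain] = true;
        while (!queue.isEmpty()) {
            int chain = queue.remove();
            if (chain == toChain) {
                return true;
            }
            for (String nodeKey : endpoints.get(chain)) {
                for (int next : chainsAtNode.get(nodeKey)) {
                    if (!seen[next]) {
                        seen[next] = true;
                        queue.add(next);
                    }
                }
            }
        }
        return false;
    }
}
```

Note that string keys of this kind add to the memory pressure described above; a more compact node key would be one place to start when reducing the footprint.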

Figure 4 shows how the structure of the objects is built up. Since the files are structured in a very ordered way, they can be traversed one at a time and the properties simply added to the complete chain. A complete chain is defined in a Line object. This Line object has a GeoProperty object containing the Line’s coordinates and srsName. It has a Vector containing all the feature identifiers for the chain; these can be alternate names for roads, such as Highways and Interstates which have other local names as well. It also has a Vector containing all information about address ranges on both sides of the chain. Finally, all other properties are put into a property Vector, which in some cases grows very large.

objectmodel.jpg

Figure 4: Object structure

The project spawned two different implementations of how the data taken from the FeatureCollection can be modeled in GML. The first implementation is the simplest one; it just dumps the complete chains in the order in which they are parsed from the file. This “spaghetti” modeling does not make it easy to do clever operations on the data. If one, for instance, would like to go from one point to another using different roads, one must traverse up and down the DOM tree to find the next section of the road. Another approach is to model the topology in a separate file, which would make such network traversals very easy. However, this model has not been implemented in this project. The second implementation is a middle approach; it sorts the complete chains based on the feature names. This means that all roads having the same name with interconnecting coordinates will come in a continuous order. Both implementations follow the OMPacket schema defined by Mats Lindh and Henning Kristiansen.
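
The difference between the two output orders can be illustrated with a stable sort on the feature name. This is a simplified sketch with hypothetical names: it only groups chains by name and keeps the parse order within each name, and it does not attempt to order chains by their interconnecting coordinates.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: orders chains by feature name, keeping parse order within a name. */
class ChainOrdering {

    /**
     * Takes the feature name of each chain in parse order and returns the chain
     * indices in output order. Arrays.sort on objects is stable, so chains with
     * the same name stay in the order in which they were parsed.
     */
    static int[] sortedByFeatureName(final String[] names) {
        Integer[] order = new Integer[names.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, new Comparator<Integer>() {
            @Override
            public int compare(Integer a, Integer b) {
                // Chains without a name are treated as having an empty name and sort first.
                String left = names[a] == null ? "" : names[a];
                String right = names[b] == null ? "" : names[b];
                return left.compareTo(right);
            }
        });
        int[] result = new int[order.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = order[i];
        }
        return result;
    }
}
```

The "spaghetti" implementation corresponds to writing the chains in index order 0, 1, 2 and so on, without this sorting step.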

The parser is not limited to handling one county at a time. There should be no problem merging counties if a good solution can be found to the memory problem mentioned earlier. The parser currently has problems when the zip file containing the data is larger than 3 MB, and some counties are by themselves more than 20 MB. Because of this, a more memory-efficient method of modeling the data needs to be found. When the memory problem is overcome, it should be easy to merge counties and states and, in the most extreme case, merge the entire USA into one file.

3. GML experiences made during the projects

A small survey was carried out among the three students responsible for the projects described in Section 2.2 and Section 2.3. This was done to collect some of the experiences the students had gained with GML during the work on their projects.

3.1 Starting with GML

Only one of the students had encountered GML previously, in an earlier project that was also connected to Project OneMap, but at that time he only transferred GML data and converted it to SVG. All of the students had prior knowledge of XML, gained during other courses at Østfold University College.

Since the Digital Maps course took place in a time span as short as seven weeks, it was of great interest to find out how the students found working with GML considering the time aspect. None of them thought GML was hard to understand, and none of them actually looked upon the short time available as a problem. They found the 2.0 specification, which they used, easy to follow and clearly set out, but had some concern about whether they could have fully grasped the 3.0 specification in the same amount of time, partly because of its size.

Beyond what the lectures on cartography had given them, none of the students felt that GML took their understanding of maps any further, but they did not consider that a problem. The students found the GML help available on the internet useful to varying degrees.

3.2 XML Schema

Project OneMap uses XML Schema as its schema language, but in all earlier work the students had been working with DTDs. So, what did they think about using the XML Schema language instead?

The students liked working with the GML 2.0 schemas; the main opinion was that schemas are easier both to understand and to implement than DTDs. They found that the schemas gave them a good overview of the file structure, and they preferred the syntax of the schemas to that of DTDs (string versus PCDATA and so forth). They also liked the schemas' close connection with relational databases and standard database schemas. They considered studying the schemas a good way to familiarize themselves with GML.

3.3 GML as a tool

The students were enthusiastic about how easy they found it to implement geographical data in their projects with the use of GML.

The developers of the OMPacket expressed excitement over the uniform structure of GML documents; without it they would have had to learn a lot more about parsing a given format than just implementing a simple SAX parser. Since their project extended the suggested GMLPacket format with a layer structure and other simple extensions, they thought it very handy to have an existing infrastructure in place. This infrastructure made the development go much faster, since someone had already thought of most of the problems they encountered and which they would otherwise have had to solve themselves.
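
To indicate how little code a simple SAX parser for GML geometry requires, the following sketch collects the text of every coordinates element in the GML namespace. It is not the students' parser: the class name is hypothetical, only the standard Java SAX API is used, and everything except the coordinate strings is ignored.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical SAX handler that collects the content of all gml:coordinates elements. */
class CoordinateExtractor extends DefaultHandler {
    static final String GML_NS = "http://www.opengis.net/gml";

    private final List<String> found = new ArrayList<String>();
    private StringBuilder buffer; // non-null only while inside a coordinates element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (GML_NS.equals(uri) && "coordinates".equals(localName)) {
            buffer = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver the text in several chunks, so append rather than assign.
        if (buffer != null) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (buffer != null && GML_NS.equals(uri) && "coordinates".equals(localName)) {
            found.add(buffer.toString().trim());
            buffer = null;
        }
    }

    /** Parses the document and returns the coordinate strings in document order. */
    static List<String> extract(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed to receive uri and localName
            SAXParser parser = factory.newSAXParser();
            CoordinateExtractor handler = new CoordinateExtractor();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.found;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```

Because the handler matches on namespace and local name only, the same code works regardless of which packet schema wraps the geometry.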

Bibliography

[ADO]
Adobe SVG Viewer 3.0. Adobe Systems Incorporated. http://www.adobe.com/svg/overview/whatsnew.html.
[DM]
Digital Maps. Gunnar Misund, Østfold University College http://www.ia.hiof.no/digmap/.
[GML20]
OpenGIS® Geography Markup Language (GML) Implementation Specification, version 2.0. Open GIS Consortium, Inc., 2001-20-02 (OpenGIS Project Document Number 01-029) http://www.opengis.net/gml/01-029/GML2.html.
[GML21]
OpenGIS® Geography Markup Language (GML) Implementation Specification, version 2.1.2. Open GIS Consortium, Inc., 2003-17-09 (OpenGIS Project Document Number 02-069) http://www.opengis.net/gml/02-069/GML2-12.html.
[GML3]
OpenGIS® Geography Markup Language (GML) Implementation Specification, version 3.0. Open GIS Consortium, Inc., 2003-01-29 (OGC 02-023r4) http://www.opengis.org/techno/documents/02-023r4.pdf.
[JOH]
Tiger/Line Conversion. Knut-Erik Johnsen, Østfold University College.
[KRI]
OneMap.Submission.GUI. Henning Kristiansen, Østfold University College.
[LIN]
Realization of the first part of the OneMap Peer Review Process. Mats Lindh, Østfold University College.
[MIS1]
The One Map Project. Gunnar Misund and Knut-Erik Johnsen. In online proceedings from GML Dev Days, Vancouver, July 2002 http://www.gmldev.org/GMLDev2002/presentations/MakingMapsfromGML/johnsen/one_map_misund_johnsen.html.
[MKL]
Distributed GML Management with SVG Tools. Gunnar Misund, Henning Kristiansen, Mats Lindh, Østfold University College.
[NIJOS]
The Norwegian Institute of Land Inventory. http://www.nijos.no/English/index_e.htm.
[TL]
Tiger/Line Files 2002 Technical Documentation. U.S. Census Bureau http://www.census.gov/geo/www/tiger/tiger2002/tgr2002.pdf.
