The One Map Project

Gunnar Misund
Associate Professor
Østfold University College, Halden, Norway
gunnar.misund@hiof.no
http://www.ia.hiof.no/~gunnarmi

Knut-Erik Johnsen
Master Student
Østfold University College, Halden, Norway
knut.e.johnsen@hiof.no

Abstract

This paper describes a long term project called One Map: its background, objectives, current status and directions for the future. In short, the project aims to build a large, global map stored and processed in a scalable and redundant distributed architecture. The core idea is to build the map incrementally and in an uncoordinated fashion, from many small submissions. The repository will serve researchers and organizations in need of free-of-charge geodata with global, consistent coverage. The framework relies heavily on the OGC work, in particular GML and the web services (WFS and WMS). In addition, related XML technologies such as SVG will be used, for example in user interfaces.

Keywords: Distributed geodata storage, distributed geodata processing, web services, incremental map construction, world map, GML, XML, WFS, WMS, SVG.

1  Background

Dealing with geodata is not easy. Here are some of the reasons:

  1. Geodata is inherently difficult to obtain and/or too expensive (at least if you are interested in non-US data).
  2. If you are lucky enough to get your hands on some relevant geodata, you often have to find appropriate conversion tools and/or buy some expensive piece of GIS software.
  3. If your area or information of interest requires data from diverse sources and/or in different (and most likely incompatible) formats, you have to spend time stitching the patches together, repairing broken topological structures, etc.
  4. If you take the risk of carrying out some non-standard operations and analyses on your data, you may need to write a lot of software on your own, or learn yet another cryptic scripting language bundled with your GIS.
  5. When you finally have processed your data, you may encounter severe problems in exporting the results in an appropriate data format. If not, you or someone else will experience the problems in 2).

The scene is gradually changing. Large vendors and geodata suppliers are joining efforts to develop and implement standards encompassing the entire geodata field, from storage formats to web services. The OGC work is rapidly gaining support and there is already a variety of products and data conforming to the different specifications. The international standardization organization, ISO, has been working in parallel, synchronizing threads from both regional (e.g. the CEN TC 287 standards, Europe) and national standardization work (e.g. the SOSI standard, Norway). Luckily, there are also signs of harmonization between OGC and ISO.

Many GIS people are of the opinion that the most important piece of standardization is GML. In many ways, GML (and the family of XML technologies in general) can be considered the glue that could potentially keep the fragmented GIS world together.

In order to investigate and exploit the new standards and technology, a long term project called One Map will be launched this August, initiated and coordinated by Østfold University College, Department of Environmental Computing, Halden, Norway. This paper describes the project. In section 2 we present some evidence of the rapidly growing market for geodata, and in section 3 we give some examples of web sites meeting this demand. Some core challenges in global geodata management are identified in section 4. Based on a few strategy fragments, we then present in section 5 a vision of a global, detailed world map constructed by and accessible to the public. In section 6 we give the current status of the project, and close the paper with some remarks.

2  Demand

It is not easy to identify, describe and assess exactly the needs for publicly accessible global geodata. Here we give some examples of the diverse driving forces as a backdrop for this white paper.

2.1  Digital Earth

In 1998 US Vice President Al Gore gave a famous (at least in the geodata communities) speech titled The Digital Earth: Understanding our planet in the 21st Century [Gore]. Gore envisions a "...multi-resolution, three-dimensional representation of the planet, into which we can embed vast quantities of geo-referenced data." Digital Earth (DE) is seen as common ground for both users and producers of a wide variety of geodata:

"... A Digital Earth could provide a mechanism for users to navigate and search for geospatial information - and for producers to publish it. The Digital Earth would be composed of both the "user interface" - a browsable, 3D version of the planet available at various levels of resolution, a rapidly growing universe of networked geospatial information, and the mechanisms for integrating and displaying information from multiple sources."

Further, Gore emphasizes the non-bureaucratic aspect of a Digital Earth:

"... Obviously, no one organization in government, industry or academia could undertake such a project. Like the World Wide Web, it would require the grassroots efforts of hundreds of thousands of individuals, companies, university researchers, and government organizations...Like the Web, the Digital Earth would organically evolve over time, as technology improves and the information available expands."

The visionary Digital Earth white paper initiated a set of interesting activities and projects, with participants from academia, vendors and political bodies. However, there seems to be a long way to go before DE comes alive.

2.2  United Nations

The environmentally oriented UN activities, in particular the United Nations Environment Programme (UNEP) and the Rio conference in 1992 on sustainable development, have spawned several important geodata projects. Eight chapters of Agenda 21 deal with the importance of geodata. In particular, chapter 40 aims at decreasing the gap in availability, quality and standardization of geodata between nations [UN1]. The need for global mapping with public access is further emphasized in [UN2].

As a follow-up to the recommendations of the 1987 World Commission on Environment and Development, the United Nations Environment Programme (UNEP) established a total of 14 environmental information centers throughout the world, designated to build and maintain a large and heterogeneous Global Resource Information Database (GRID) [GRID]. The GRID network is "... facilitating the generation and dissemination of key environmental geo-referenced and statistical data-sets and information products, focusing on environmental issues and natural resources. GRID centers typically have the ability, expertise and specialized information technology (environmental data management, remote sensing/Geographic Information Systems) to prepare, analyze and present environmental data and information, which are the basis for reliable environmental assessments." Clearly, the GRID system is deeply dependent on geodata with global coverage in order to function optimally.

2.3  Location Based Services

In recent years we have experienced a rapidly growing market for Location Based Services. Advances in wireless telecommunications and the availability of the Global Positioning System (GPS) make it possible for mobile devices such as PDAs and cellular phones to determine their geographical location. Based on these advances, a multitude of value adding services are being introduced to the market, such as direction finding, route planning and dynamic yellow pages. According to a market study by Strategy Analytics, Western Europe can expect over $9 billion in revenues by 2005 from location based services on mobile devices [SA]. The Finnish cellular giant Nokia expects location based services to be the fastest growing segment of the mobile phone market in the years to come [NOKI].

It should be obvious that the full potential of location based services cannot be reached without digital maps of the appropriate quality. For further details, see e.g. the ESRI white paper on the role of geodata in location based services [ESRI].

2.4  Open GIS Consortium

During the last decade, considerable effort has been put into developing geodata standards and guidelines, at national (e.g. SOSI [KAR1]), regional (e.g. CEN/TC 287 [CEN]) and international levels (ISO/TC 211 [ISO]). One of the initiatives with considerable impact is the Open GIS Consortium (OGC) [OGC], founded in 1994. OGC uses approximately the same model as the successful Object Management Group (OMG) [OMG]. The international consortium has more than 220 members from companies, government agencies and universities "... participating in a consensus process to develop publicly available geoprocessing specifications. Open interfaces and protocols defined by OpenGIS© Specifications support interoperable solutions that "geo-enable" the Web, wireless and location-based services, and mainstream IT, and empower technology developers to make complex spatial information and services accessible and useful with all kinds of applications."

The OGC vision is "... a world in which everyone benefits from geographic information and services made available across any network, application, or platform". The vision is implemented by "... delivering spatial interface specifications that are openly available for global use". The work is highly consensus driven, and a set of specifications and recommendations is already being used by vendors and content providers to develop conformant applications and data.

2.5  European Commission

Traditionally, geodata has been produced and disseminated by public sector organizations such as national mapping agencies. The US policy regarding public sector information (PSI) is simple and clear: there are no government copyrights, fees are limited to recouping the cost of dissemination and there are no restrictions on reuse. This is however not the case in most European countries. PSI is in general difficult to locate and access, often overpriced and burdened with strict limitations on reuse. This applies in particular to the geographical sector.

In order to facilitate commercial exploitation of PSI, the European Commission has launched the eContent Programme, aiming at "... supporting the production, dissemination and use of European digital content". One of three main strands of action is to "... improve access to and expand the use of public sector information" [EU1]. The report "Commercial exploitation of Europe's public sector information" points out that geographical information is by far the single largest class of PSI [EU2]. Great Britain spends roughly 60% of its total PSI investment on the geographical sector. Further, the economic value of PSI geodata was EUR 36 billion in 1999.

3  Supply

It is convenient to separate digital geographic information into two main categories:

Maps
This is the typical consumer product, as we know it from e.g. world atlases and street maps. It is the digital equivalent of the "paper map", distributed as an image. The map is a static snapshot based on a selection from a richer set of underlying geographic data.
Geodata
More advanced users, typically found in research, academic, governmental and commercial organizations, often need more than the images that traditional maps really are. In order to perform analyses and make decisions, they need "intelligent" geodata in the form of vector and/or raster representations. Use of geodata requires specialized software, often called Geographic Information Systems (GIS). Traditionally, these systems have been quite complex to use, and quite expensive. However, the situation is currently changing dramatically. There is now a wide variety of easy-to-use and cheap (if not free) software aimed at geodata browsing and presentation, and, to a limited extent, analysis. In addition, many existing tools, like spreadsheets, statistical programs and web browsers, are being "geodata enabled".

We take a closer look at a few selected services in both categories.

3.1  MapQuest

MapQuest is one of the most popular and comprehensive online map servers. More than 20 million maps are downloaded from the site every day, it has 10 million unique visitors per month and over 2000 business partners. Moreover, every month 1 out of 5 Internet users accesses MapQuest content [MAPQ]. In addition to generating nice looking maps, there are options for finding driving directions, road trip planning and dynamic yellow pages. Navigation is either by address, airport, zip code, city, area code or geographic coordinates, or by zooming and panning.

There is also a special service for downloading maps to a PDA (Personal Digital Assistant). MapQuest also offers a limited world atlas, with nation level topographic maps and key figures like area, population, currency etc. For US destinations, you may choose to display nearby business locations by selecting the desired categories. It is also possible for the user to incorporate symbols to highlight a chosen location, e.g. a small house to mark the location of your residence. Many of the MapQuest services have truly global coverage. The services are in general user friendly and response times are quite adequate.

3.2  Norgesglasset

Norgesglasset is provided by the Norwegian Mapping Authority, and is based on raster map series ranging from 1:2M to 1:5K [KAR2]. Thus, you are able to find your own house and its immediate surroundings. The coverage is national, and no data is provided outside the Norwegian border. Navigation is based on zooming/panning and address search. The service is fairly straightforward to use, but the user is barred from directly downloading the map.

3.3  Digital Chart of the World

This is the public domain global geodata source: a comprehensive 1:1M scale vector base map. The 1500 Mbytes of information are organized in 17 thematic layers. The DCW was developed in the beginning of the 90s in an international effort by the agencies producing the Operational Navigation Charts (ONC) map series: the United States Defense Mapping Agency, the Australian Army Survey Directorate, the Canadian Directorate of Geographic Operations, and the United Kingdom Military Survey. They were supported in the DCW design process by more than forty participating agencies. DCW is available online in a variety of formats, e.g. from the Penn State University Libraries [PENN].

The National Imagery and Mapping Agency (NIMA) offers a revised and updated DCW version, renamed Vector Map Level 0, from their Geospatial Engine [NIMA]. From this site you also have access to digital terrain elevation data (DTED level 0) with global coverage, in addition to various other geodata.

3.4  Massachusetts Geographic Information System

At the state focused Massachusetts Geographic Information System (MassGIS) there are huge amounts of both raster and vector data for free download, ranging from statewide layers to local 1:5K features such as 3 meter (!) elevation contours [MASS].

3.5  NORUT

The Geographical Information Networks Project (GIN) from NORUT, Tromsø, Norway, has resulted in an infrastructure for managing maps stored on a set of distributed and homogeneous geodata servers [NOR1]. By exploiting the parallel aspects of distributed storage and processing, they have demonstrated that it is possible to interact with huge sets of geodata in real time. The design and implementation largely conform to relevant OGC specifications [OGC]. They have demonstrated the principles by developing both thin and thick map clients [NOR2]. The GIN project has been one of the main sources of inspiration for the ideas developed in this white paper. The One Map project is a natural extension and enhancement of parts of the NORUT work.

4  Challenges

As indicated by the presentation of the demand and supply situation, there is obviously room for improvements in the geodata arena.

4.1  Maps

Currently, there are quite a few Web sites offering online maps, and the number is rapidly growing. Thus, it is possible, although not always easy, to generate and download maps at a wide variety of levels of detail and locations. The first problem facing the average map surfer is in fact to choose the appropriate service in the jungle of map sites.

MapQuest and Norgesglasset are typical representatives of two main categories of map services. MapQuest offers global coverage for most of its services. While Norgesglasset offers street level detail, the best accuracy from MapQuest is city block level in their "main" areas, like the US. In more "remote" parts of the world, like Scandinavia, the best quality is typically at the regional level. This trade-off seems to be a typical problem when implementing online map services. The consequence for the user is that it is very difficult to make detailed maps of areas intersecting two regions or countries. One solution is to try to combine maps from different servers, but this can turn out to be an overwhelming task, calling for extensive post processing like cutting, pasting and resizing of screen shots. The use of different datums and map projections will in fact make it impossible to make maps by "cutting and pasting".

Very few of the mainstream servers have options for choosing what kind of information is to be displayed in the map, i.e. what kind of features or thematic layers to use. Consequently, the average user is not able to make truly customized maps. This seriously reduces the benefits of using such services.

Surfing map sites can be a confusing and frustrating experience. There are as many interfaces as there are sites, and few are really well functioning and intuitive. Many servers suffer from long response times, in particular during peak hours, and some occasionally fail to function at all.

Information on the origin of the map data, quality parameters and other types of metadata is rarely available. For the average user this is a minor problem, but in cases of more advanced usage this lack could make the service useless.

Based on these observations, we identify some steps that would imply significant improvements:

  1. Make it easier for the user to find the appropriate service.
  2. Offer large, contiguous and consistent coverage and detail level, preferably global scope down to house level.
  3. Provide options for selecting what kind of information is to be displayed, from a rich set of themes.
  4. Make it possible to select datum and projection.
  5. Provide consistently acceptable performance, e.g. decent response times and availability.
  6. Design simpler and more intuitive user interfaces, preferably complying with some common look-and-feel "standard".
  7. Provide access to relevant metadata.

4.2  Geodata

Not surprisingly, the scene of online geodata access resembles that of map services. However, geodata management represents some additional challenges.

There are already vast amounts of geodata freely available on the Web. However, very few sites offer global coverage, the different Digital Chart of the World sites being notable exceptions. Disappointingly, some of the DCW servers offer the global data only in regional or national chunks, which makes it unnecessarily difficult to use the data in cross regional or cross national applications. The majority of sources offer only US related data. Third world geodata is hard to find. Likewise, there is not much geodata available with street level accuracy. An outstanding exception is the MassGIS site, offering statewide 1:5K coverage for many themes.

The data is offered in a multitude of more or less well documented and supported formats. In addition, some of these formats are proprietary and require the use of software from specific vendors. This is a major problem and effectively bars widespread use of the data. The problem becomes almost unsolvable when trying to combine geodata in different formats. If the formats are based on differing conceptual models, it can turn out to be quite impossible to convert from one format to another without losing significant parts of the information.

As with the online maps, there is no predominant standard combination of datum and projection used by the geodata sites. This is another major obstacle when trying to fuse data from different sources.

In general, there is a considerable time lag between data acquisition and Web distribution. In addition, the updating frequency is low compared to the development of the market and changes in the real world. Hence, there is a lot of online but outdated information.

To further illuminate the problems, we revisit the Digital Earth (DE) initiative. A prerequisite for DE is a detailed global map. Gore emphasized the map aspect in his initial DE white paper: "In the first stage, we should focus on integrating the data from multiple sources that we already have...Next, we should endeavor to develop a digital map of the world at 1 meter resolution." We now suggest some reasons for the problems in realizing DE. The quotes are from a paper by Michael Goodchild, Digital Earth: A Research Agenda [Good], and from the original Gore white paper [Gore].

  1. Underestimation of the complexity of information integration and fusion. Different formats, incompatible quality measures, multiple versions of the same physical features, inconsistencies between adjacent data sets, complex query and retrieval mechanisms etc. constitute an immense Tower of Babel.
  2. Underestimation of the complexity of processing and integrating existing geodata to work in an extreme multiscale setting. Goodchild: "The range of scales implied is over at least four orders of magnitude, from a resolution of 10km that would be appropriate for rendering of the entire globe, to the 1m resolution needed to render a local neighborhood. Cartographers have long struggled with relationships between maps at different scales, but not over this large a range." Further: "DE requires a consistent data structure and indexing scheme that can support zoom over 4 orders of magnitude. The scheme that is optimal for display will likely not be optimal for modeling Earth surface processes and compromises will be necessary."
  3. Lack of public domain data with global coverage and sufficient detail. This applies in particular to the traditional map data that would be the foundation for the DE. Goodchild: "The mapping needed for DE is not fully available. Although satellite imagery can be composited for large areas of the Earth, topographic information varies widely in scale and availability. Thus DE will require the development of a robust global spatial data infrastructure, and the appropriate organizations to coordinate it."
  4. Monolithic and bureaucratic project structures. Imagine if somebody had tried to design and develop the Web in a top-down controlled, government funded and democratic manner; clearly, the Web would never have become a reality. As Gore points out: "Like the World Wide Web, it would require the grassroots efforts of hundreds of thousands of individuals, companies, university researchers, and government organizations."
  5. T3: Things Take Time. The idea of establishing DE in two or three years was perhaps a little too optimistic. Gore also warned of this: "Clearly, the Digital Earth will not happen overnight".
  6. The vast amounts of data constituting a detailed global map are a major challenge. Goodchild: "DE requires new techniques for overcoming the limitations of bandwidth. These include new methods of compression, and of progressive transmission of various forms of geographic data. Progressive transmission of vector data is an open research issue."

4.3  Strategy Fragments

We summarize the bottleneck discussion by proposing some strategy fragments for improved public access to global geographic information. Since maps basically are graphical representations of underlying geodata, the problems concerned with online map services are a subset of the geodata problems. Thus, we mainly focus on the latter.

1: One Portal
The first obstacle for the average user of geographic information is to locate and select which source(s) to use. In the EU eContent Programme, the finding and accessing problem is listed as one of the main barriers to exploiting public sector information [EU2]. In general, this is of course a common Web problem. For making online geodata services easy to find, the portal approach currently seems most appropriate. Other alternatives, like web spiders or automated web services, will most likely fail to work properly in this context, due to the notorious complexity of geographic information.
One multipurpose portal should be built to provide easy access to detailed geographic information in the form of maps or geodata.
2: User Friendliness
There are no de facto standards for look-and-feel of geodata interfaces. Bad and confusing design seriously reduces the quality of many present geodata services.

Careful attention has to be paid to the design of the different user interfaces in the geodata portal. Common best practices should be identified and followed, hopefully resulting in services that are easy to understand and use, even for first-timers.
3: Free Access
The majority of existing geodata is far from being in the public domain, especially outside the US. However, there are clear signs of changing policies in this field. An example of this is the European Union eContent Programme [EU1], which strongly promotes policies more aligned with the US traditions. For the national European geodata producers, this will most probably imply radically changed pricing and copyright policies in the years to come.
If possible, all information should be freely available for any use, no strings attached. If not, the user should be thoroughly informed of the nature of the limitations prior to starting any interaction with the geodata.
4: Standards
Lack of widespread standards is recognized as a major problem when exploiting public sector information [EU2]. Geographical information is highly complex compared to some other PSI data, e.g. social data; thus the lack of standards, or the use of too many non-compatible formats, is indeed a serious obstacle.

Current standards, de facto and/or de jure, should be applied when they do not represent major impediments. This should apply not only to the geographical content, but also to the infrastructure as a whole, including services, software etc. For specific geodata issues, it is important to be fairly compliant with the work of the Open GIS Consortium and ISO/TC 211. On the more general level, the standards of the World Wide Web Consortium (W3C) should be applied.
5: Redundant Distributed Storage
Obviously, the amount of data constituting a detailed global map would by far exceed the capacity of any single location storage device. As storage technology develops, the amount of data that is a candidate for inclusion in a world map will likely grow at a similar rate. Hence, the only solution is distributed storage. To ensure full scalability, the storage servers should be distributed on the Web. There are already a number of such virtual storage systems operating, e.g. FreeNet [FREE].
The global map should be partitioned into chunks and distributed over a large set of Web servers. The same chunk should be stored on multiple servers to ensure a high level of fault tolerance.
6: Distributed Processing
Obviously, distributed storage implies distributed processing to a certain degree. The distributed network maintained by distributed.net currently has a combined processing capacity equivalent to 160000 desktop computers continuously working in parallel [DIST]. SETI@home uses a gigantic network of computers to analyze deep space signals collected by advanced radio telescopes around the world [SETI]. So far, 3.5M computers have participated, and the grand total of CPU time spent is close to 900K years.
Query and retrieval processes should to the largest possible extent be delegated to the distributed storage servers.
7: Incremental Map Construction
Construction of a detailed world map, to cite Al Gore in his Digital Earth Vision, "... will not happen overnight" [Gore]. The only sustainable and scalable model is to let the map grow from a seed, based on a multitude of contributions.
Additions and revisions should be allowed to be carried out in small and uncoordinated steps. Any individual or organization should be welcome to submit contributions. The submissions should be cleared by peer review processes before being incorporated into the map.
8: One Map
The principles of distributed map storage and processing have been explored in the GIN project (section 3.5) [NOR1]. The approach in this project was to integrate independent geodata sources, originally implemented as stand-alone databases, following standard interfaces and cataloguing principles defined by OGC [OGC]. However, there are some potential problems with the GIN approach.

Consider the following (constructed) case. We are interested in the Skagerrak shoreline, i.e. as defined by the coast of southern Norway and southwestern Sweden. The GIN infrastructure would perhaps discover three relevant geodata sources: a Digital Chart of the World database with the entire global coastline at a level of detail corresponding to 1:5M, and the databases of the Norwegian and Swedish mapping authorities, respectively. The national databases have only national coverage, except for some overlap at border zones. Let us further assume that the local coastlines are modeled with different levels of accuracy, e.g. 1:50K versus 1:10K. In addition, two different projections are used, based on different datums. There are now two choices: 1) use the global shoreline with little detail, or 2) merge the two local datasets, both with considerably more detail. By choosing the last alternative, geometrical inconsistency problems such as gaps and overlaps would most certainly arise. In addition, we would probably experience problems related to different sets of attributes describing the two pieces of coastline. The alternatives are thus consistent geodata with little detail, or inconsistent data with better but varying accuracy.

This is just an indication of the many kinds of problems encountered when trying to merge inconsistent pieces of geodata. There are ongoing efforts in academia, industry and standardization organizations aiming to solve this very complex integration problem. However, the authors are of the opinion that these integration problems can only be solved efficiently if consensus on a common geodata model is reached at some basic level, and if all data involved in a system is harmonized according to the chosen model.

To guarantee a consistent map, each thematic layer of the map must be modeled with global extent. All generalized (less detailed) versions of a layer should be derived from one single layer, the one with the best quality.
9: Additional Services
Many believe the next Web technology wave will be driven by Web Services technology [HOL]. Thus, it is important to offer data and additional services not only to users, but also to other applications.
Additional services to users and applications should be available, following the emerging Web Services guidelines and standards.
10: Open Project
Clearly, construction of a huge geodata repository is not a task for any single organization or project. Thus, a project aiming at constructing a detailed global map should only act as a loosely coordinating umbrella organization allowing a multitude of collaborating contributors.
Construction and maintenance of One Map should be loosely coordinated in a consensus driven project based on peer review and open content/open source principles.

5  One Map

As indicated in the previous sections, there is a substantial need for detailed geodata and maps with global coverage and free access. The required technology to handle huge and complex data sets seems to be more or less available, along with widely adopted standards and guidelines for storing and transferring geodata. However, relatively little progress has been made in actually implementing such services. Based on the strategy fragments in the previous section, we describe an infrastructure, One Map (XM), based on Distributed Geodata Management (DGM). There are three main infrastructure components constituting XM: the Gateway (access), the Clearinghouse (construction/maintenance) and the Repository (storage).

5.1  Gateway

The Gateway will be the main entry point for the consumers of the XM services. Through the Gateway, users should be able to browse the map by zooming and panning, select which feature types to display, query individual features and download the underlying geodata.

In addition, the Gateway should offer access to relevant metadata, as well as services aimed at other applications rather than human users, in line with the Web Services strategy fragment in section 4.3.

5.2  Clearinghouse

The Clearinghouse is the focal point of all activities concerned with building, updating and revising the XM geodata.

5.2.1  Distributed Geodata Management

Maps, both analogue and digital, are commonly constructed and maintained by large organizations such as national mapping agencies, defense departments and surveying companies. Typically, each map, or map series, is constructed during a project. Once finished, the map remains static until a revision. A revision may only include minor updates or imply a total reconstruction. Map management is typically governed by high level decisions by some governmental, federal, military or commercial body. In other words, this is a typical top-down and hierarchical process, where the very important decisions on where, what, when, in what detail and how to map are taken by a small number of high level decision units, counting relatively few people.

The XM Clearinghouse will support an orthogonal approach: a bottom-up construction and maintenance process, termed Distributed Geodata Management (DGM), which is the core concept of the XM infrastructure. The main rationale for the strategy is based on the facts and assumptions outlined in the previous sections.

We now take a closer look at the processes involved in constructing and maintaining the geodata Repository.

5.2.2  Geodata Clearing

The XM Clearinghouse will be the main interface between geodata producers and XM. It will coordinate and facilitate incremental construction and maintenance of the XM geodata stored in the distributed Repository. The Clearinghouse should be a consensus driven service, based on peer review principles and relatively self-organizing with minimal top-down coordination.

We use the term author for any person or party submitting or having submitted geodata to XM. Note that the author term also applies to those submitting data already in the public domain. A sheet is a self contained geodata unit. Each sheet should contain geodata belonging to one single feature type. The feature types are described and defined in the XM Feature Catalog. A sheet may contain either geometric or thematic information, or both. Sheets may have arbitrary geographic extent, ranging from micro scale to global coverage. The concept is illustrated in Fig. 1.

Fig. 1: Authors and sheets
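To make the author and sheet concepts concrete, the following is a minimal sketch of how they might be represented in the Java based repository software; all class and field names are invented for this illustration and are not part of any XM specification.

    // Minimal sketch of the sheet concept from Fig. 1 (hypothetical names, not an XM specification).
    import java.util.List;

    public class SheetModel {

        /** Geographic bounding box in decimal degrees (WGS84 assumed for the sketch). */
        public record Extent(double minLon, double minLat, double maxLon, double maxLat) { }

        /** The person or party submitting the sheet. */
        public record Author(String name, String contact) { }

        /**
         * A sheet: a self contained geodata unit holding instances of exactly one
         * feature type, with arbitrary geographic extent, submitted by one author.
         * Feature instances are kept here simply as GML fragments.
         */
        public record Sheet(Author author, String featureType, Extent extent,
                            List<String> gmlFeatures) { }
    }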

There are basically four types of submissions, all of which require more or less complicated referee processes (a sketch of how submissions might be routed to these processes follows the list):

1: New instance of existing feature type
As an example, this could be a newly built school building, and the inclusion in the Repository should be fairly straightforward. The referees are selected from authors of feature instances in the vicinity of the new building.
2: New feature type
If the feature type of the submitted sheet is not present in the Feature Catalog, the author will be notified and asked to give a more detailed description of the new feature type. The information will be dispatched to a set of peers with a call for advice on whether and how to include the new type in the Catalog.
3: Refinement of existing feature
In many cases, this may involve a fairly complex referee process; see the following example. Corrections are considered special cases of refinement.
4: Updating of existing feature
Updates are temporally based changes. An example could be a new annex to an existing school building; both the geometry and the attributes of the original feature have to be changed. The Repository should be able to represent temporal changes in order to retrieve features based on a point or interval in time.
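As an illustration only, the routing of an incoming sheet to one of these four referee processes could look roughly like the following sketch; the interfaces and names are invented and do not reflect an actual Clearinghouse implementation.

    // Hypothetical sketch of how the Clearinghouse might route a submission to one of
    // the four referee processes described above. All names are invented for illustration.
    public class SubmissionRouter {

        public enum SubmissionType { NEW_INSTANCE, NEW_FEATURE_TYPE, REFINEMENT, UPDATE }

        /** Assumed lookup of feature types already defined in the Feature Catalog. */
        public interface FeatureCatalog { boolean contains(String featureType); }

        /** Assumed check for existing features of a given type within a bounding box. */
        public interface Repository { boolean hasFeatures(String featureType, double[] bbox); }

        private final FeatureCatalog catalog;
        private final Repository repository;

        public SubmissionRouter(FeatureCatalog catalog, Repository repository) {
            this.catalog = catalog;
            this.repository = repository;
        }

        /**
         * Classify a submitted sheet before starting the appropriate peer review.
         * @param featureType the feature type claimed by the sheet
         * @param bbox        the sheet's extent (minLon, minLat, maxLon, maxLat)
         * @param temporal    true if the sheet describes a change over time (an update)
         */
        public SubmissionType classify(String featureType, double[] bbox, boolean temporal) {
            if (!catalog.contains(featureType)) {
                return SubmissionType.NEW_FEATURE_TYPE;   // the type itself must be reviewed first
            }
            if (!repository.hasFeatures(featureType, bbox)) {
                return SubmissionType.NEW_INSTANCE;       // nothing existing to reconcile against
            }
            return temporal ? SubmissionType.UPDATE : SubmissionType.REFINEMENT;
        }
    }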

5.2.3  Example

We briefly illustrate a typical refinement submission:

  1. An author submits a sheet with a part of the globally covering feature type Coastline. The content is equivalent to the outline of Manhattan Island, NY, at a scale of approximately 1:70K, see Fig. 2A.
  2. The Clearinghouse queries the Repository and retrieves the existing version of the Coastline feature, with approximately 1:250K accuracy, Fig. 2B.
  3. By matching the submitted sheet with the existing data, Fig. 2C, the modification steps are identified and implemented. In this case, this means altering the old coastline to fit the new island, Fig. 2D.
  4. The Clearinghouse then makes a draft integration by "merging" the submitted sheet with existing data, Fig. 2E.
  5. The draft is dispatched back to the contributor and to the authors that are affected by the changes of the "old" coastline.
  6. The group of authors starts a peer review process which leads to consensus on whether and how to accept the submitted sheet.
  7. The final version of the submission is sent back to the Clearinghouse, and is incorporated into the map.
  8. The Repository is updated by distributing patches to the affected storage servers.

Fig. 2: Coastline Refinement

Please note that this process is far from trivial. Methods and tools are not readily available to fully support such integration processes; thus there is a need for dedicated R&D efforts.
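To indicate what such support might look like, the following fragment sketches only the control flow of a refinement submission; the matching, patching and review operations are hypothetical placeholders for the non-trivial algorithms that the R&D effort would have to provide.

    // Sketch of the control flow of a refinement submission (section 5.2.3).
    // The helper interfaces are hypothetical placeholders, not existing components.
    import java.util.List;

    public class RefinementWorkflow {

        public interface Repository {
            List<String> fetch(String featureType, double[] bbox);     // existing features, e.g. as GML
            void applyPatches(List<String> patches);                    // distribute accepted changes
        }
        public interface Matcher {
            List<String> match(List<String> existing, List<String> submitted); // draft integration
        }
        public interface ReviewBoard {
            boolean approve(List<String> draft, List<String> affectedAuthors); // peer review outcome
        }

        private final Repository repository;
        private final Matcher matcher;
        private final ReviewBoard reviewers;

        public RefinementWorkflow(Repository r, Matcher m, ReviewBoard b) {
            this.repository = r; this.matcher = m; this.reviewers = b;
        }

        /** Returns true if the submitted sheet was accepted and incorporated. */
        public boolean process(String featureType, double[] bbox,
                               List<String> submitted, List<String> affectedAuthors) {
            List<String> existing = repository.fetch(featureType, bbox);   // step 2
            List<String> draft = matcher.match(existing, submitted);       // steps 3-4
            if (!reviewers.approve(draft, affectedAuthors)) {              // steps 5-6
                return false;
            }
            repository.applyPatches(draft);                                // steps 7-8
            return true;
        }
    }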

5.3  Repository

The vision of a publicly accessible virtual world map with great detail and large coverage inevitably raises some serious technical challenges, mainly concerned with the size of the information involved and the processing resources required to handle a massive number of complex request and retrieval transactions. An inherent characteristic of geodata is the never ending increase in both demand and supply. This calls for a totally scalable infrastructure, able to digest a constantly growing content and access frequency.

The volume of geodata constituting a 1 meter resolution world map is bound to exceed by far the capacity of the most powerful existing single location mass storage. The only solution to this problem is distributed storage, i.e. that the data is in some way tiled into manageable chunks that can be stored on a set of servers coordinated over the Web. An important requirement is that it must be possible to retrieve the scattered information without losing precision or consistency. The distributed storage infrastructure is theoretically fully scalable: by constantly adding new storage servers, there is practically no limit to the amount of data managed by the system. If, in addition, the chunks are stored redundantly on multiple servers, the infrastructure becomes highly fault tolerant.

Fig. 3: Redundant Distributed Storage

A typical use of the map would be zooming in on a smaller area, selecting the type of information to be displayed, retrieving the data at a certain resolution etc. All these queries and retrievals are inherently computationally intensive. Thus, the majority of the processing should be carried out by the distributed servers. In addition, the distributed infrastructure will in practice act as a parallel computing environment, thus potentially speeding up the query and retrieval requests.
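A minimal sketch of how a bounding-box query might be fanned out to the distributed storage servers and evaluated in parallel is given below; the StorageServer interface is invented for the illustration, and fault tolerance is reduced to simply ignoring servers that fail, relying on redundant copies elsewhere.

    // Sketch of delegating a bounding-box query to distributed storage servers in parallel.
    // The StorageServer interface is hypothetical; real servers would be reached over the Web.
    import java.util.*;
    import java.util.concurrent.*;

    public class DistributedQuery {

        /** Assumed remote interface of a storage server holding one or more map chunks. */
        public interface StorageServer {
            boolean covers(double[] bbox);                    // does the server hold relevant chunks?
            List<String> query(String featureType, double[] bbox, int resolutionLevel);
        }

        private final List<StorageServer> servers;
        private final ExecutorService pool = Executors.newFixedThreadPool(8);

        public DistributedQuery(List<StorageServer> servers) { this.servers = servers; }

        public List<String> query(String featureType, double[] bbox, int level)
                throws InterruptedException {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (StorageServer s : servers) {
                if (s.covers(bbox)) {                         // only ask servers holding relevant chunks
                    tasks.add(() -> s.query(featureType, bbox, level));
                }
            }
            List<String> result = new ArrayList<>();
            for (Future<List<String>> f : pool.invokeAll(tasks)) {
                try {
                    result.addAll(f.get());                   // collect partial results
                } catch (ExecutionException e) {
                    // A failing server is ignored; redundant copies of the same chunks
                    // are expected to exist on other servers.
                }
            }
            return result;
        }
    }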

5.4  Project Profile

The One Map Project will basically be designed as an open inter-academic project. Non-academic participants are also welcome, provided they are willing to follow the project guidelines. One Map will be consensus driven and based on formal and informal peer review processes. The structure of the World Wide Web Consortium (W3C) may serve as a good example to follow [W3C].

All content developed in the project should be considered Open Content, e.g. distributed under the Open Content License v1.0 [CONT]. Likewise, all software developed in the project should be considered Open Source, following e.g. the OpenSource guidelines [SOUR].

Relevant open standards and guidelines, both de jure and de facto, should be followed, provided they do not represent major impediments to reaching the overall goals. Where standards, guidelines or best practices are lacking, the project should take initiatives in the direction of establishing them.

The project will be broken down into a number of work packages called threads, which will be loosely synchronized. One Map is considered an ongoing effort to achieve its main goals. Thus, there will be no detailed plans for when to start or complete the different threads.

The One Map Project will support a bottom-up construction and maintenance process termed Distributed Geodata Management. Any person or party will be allowed to contribute geodata in order to expand, refine or revise the virtual world map. To ensure consistency and quality, each geodata submission will be subject to a peer review process before being accepted. The reviewers will typically be a relevant selection of "owners" of previously accepted data.

6  Current Status

The "official" project start is scheduled for early autumn 2002. A pre-project has been running for six months with the main goal of investigating new technologies and carrying out smaller feasibility tests. One of the results of this work is an alpha version of the Gateway and the Repository.

6.1  Gateway 0.1

The first version of the Gateway is a bare-bones JSP (Java Server Pages) based thin web client. It is a simple web map browser, where the main map window is an SVG plug-in. Interaction is limited to zooming, manipulation of colors via the legend and simple feature queries. A smaller overview window is also supplied in order to simplify the zooming process. In addition to the SVG rendering, the user may also download the corresponding GML file.

A zoom action spawns an HTTP request to the server, which responds with on-the-fly generation of an SVG file for the main map window (and also an overview file if needed). The SVG file is generated by XSLT conversion of a corresponding GML file. This GML file is constructed based on the request parameters defining the area of interest, a set of feature types and a resolution level, similar to the GetMap request in the OGC Web Map Service protocol.
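As a rough illustration of this conversion step, the server side could apply a precompiled stylesheet with the standard Java XML transformation API (JAXP); the stylesheet file name is an example only, and the actual Gateway code may be organized quite differently.

    // Sketch of server-side GML-to-SVG conversion using the standard JAXP transform API.
    // The stylesheet file name and the way the GML source is obtained are examples only.
    import java.io.*;
    import javax.xml.transform.*;
    import javax.xml.transform.stream.*;

    public class GmlToSvg {

        private final Templates templates;   // compiled gml2svg.xsl stylesheet (hypothetical file)

        public GmlToSvg(File stylesheet) throws TransformerConfigurationException {
            templates = TransformerFactory.newInstance()
                                          .newTemplates(new StreamSource(stylesheet));
        }

        /** Transform a GML document (built for the requested bbox, feature types and
         *  resolution level) into an SVG document written to the HTTP response. */
        public void transform(Reader gml, Writer svgOut) throws TransformerException {
            Transformer t = templates.newTransformer();
            t.transform(new StreamSource(gml), new StreamResult(svgOut));
        }
    }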

The request parameters are retrieved by accessing the SVG DOM with JavaScript procedures. Manipulation of legend colors and simple feature queries are implemented in the same way.

Fig. 4: Gateway client

6.2  Repository 0.1

The Repository is conceptually one server, but the server function is distributed over a set of sub-servers of two categories. "Assistant" servers accept requests from a central "Dispatcher" server and in turn delegate the data retrieval to "Storage" servers. The central server handles requests from user clients, currently limited to Gateway 0.1.

When the Dispatcher receives a request from a client, it decides which Assistant to forward the request to. The decision is based on availability (whether the Assistant is online), current workload (idle Assistants are preferred) and performance parameters (mainly processor speed, internal and external memory).
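The selection logic could be as simple as the following hypothetical scoring of Assistants; the fields and the scoring formula are invented for the illustration and do not describe the actual Dispatcher implementation.

    // Hypothetical sketch of how the Dispatcher might pick an Assistant.
    // Fields and weights are invented; the real decision could be arbitrarily more refined.
    import java.util.List;

    public class Dispatcher {

        /** Snapshot of an Assistant's state as seen by the Dispatcher;
         *  relativePerformance is a normalized CPU/memory score. */
        public record Assistant(String endpoint, boolean online,
                                int pendingRequests, double relativePerformance) { }

        /** Choose the online Assistant with the best performance-to-load ratio, or null if none. */
        public Assistant choose(List<Assistant> assistants) {
            Assistant best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (Assistant a : assistants) {
                if (!a.online()) continue;                    // availability first
                double score = a.relativePerformance() / (1 + a.pendingRequests()); // prefer idle, fast nodes
                if (score > bestScore) {
                    bestScore = score;
                    best = a;
                }
            }
            return best;                                      // null means no Assistant is available
        }
    }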

All Assistants carry an inventory of all the Storage servers. Each Storage server stores one or more geodata fragments. Together, the Storage servers constitute the "One Map", i.e. one global, multiscale, seamless and consistent map.

The communication between the Dispatcher, Assistants and Storage servers is SOAP (Simple Object Access Protocol) based. In addition, all information content and messages are passed as XML documents.

Fig. 5: Repository architecture

The current hardware configuration consists of a main server and three sub-servers. Each physical server hosts both Assistants and Storage servers in order to simulate a larger and distributed network. The main server is a mid range PC workstation, and the sub-servers are older low end desktop PCs. The One Map repository software is implemented in Java, and all required server software is also Java based. This makes it trivial to install and run the servers on different platforms. Currently our servers run on both Linux and Windows.

6.3  Data

The initial population of One Map is the (free) full resolution version of the DCW (Digital Chart of the World) from [GSHHS]. Currently only the coastlines are available. The original data set consists of around 10 million points and 200,000 polygon features.

The population procedure started with writing a parsing utility for transforming the original data to GML. The parser is a combination of Unix scripts and Java software. The GML data was then generalized into a set of files corresponding to 20 precision levels, ranging from 1 meter resolution at the finest level to around 5 degrees at the coarsest. In order to ensure consistency through the different levels and after updating procedures, it is important to select an appropriate generalization method. In this case we used a very simple "grid approximation" procedure. Each point in the dataset is "snapped" to a grid corresponding to the desired resolution. Redundant (equal) points are then removed, and features with an extent less than the resolution are removed. Some additional thinning steps are also carried out. This is an extremely simplistic form of generalization, but it has a number of very interesting properties.
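The following sketch illustrates the grid approximation on a single polyline; the names are invented, and the removal of features smaller than the resolution as well as the additional thinning steps are left out.

    // Sketch of the "grid approximation" generalization on a single polyline (invented names).
    // Each point is snapped to a grid of the target resolution and consecutive duplicates dropped.
    import java.util.ArrayList;
    import java.util.List;

    public class GridGeneralizer {

        /** A coordinate in decimal degrees. */
        public record Point(double lon, double lat) { }

        /** Snap every point to a grid with the given cell size (in degrees) and remove
         *  points that become identical to their predecessor after snapping. */
        public static List<Point> generalize(List<Point> line, double cellSize) {
            List<Point> out = new ArrayList<>();
            Point previous = null;
            for (Point p : line) {
                Point snapped = new Point(Math.round(p.lon() / cellSize) * cellSize,
                                          Math.round(p.lat() / cellSize) * cellSize);
                if (!snapped.equals(previous)) {      // drop redundant (equal) points
                    out.add(snapped);
                    previous = snapped;
                }
            }
            // Features whose total extent is below cellSize would be removed in a separate pass.
            return out;
        }
    }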

Fig. 6: Resolution Levels

Each resolution file was then partitioned into a set of smaller files, each below a given size threshold. The partition is quadtree based, and the resulting files were organized into a quadtree directory/file structure. The quadtrees were then partitioned into subtrees, which in turn were distributed to the Storage servers.
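A sketch of such a size-bounded quadtree partition is given below; the names are invented, and in the actual system each leaf tile corresponds to a GML file that is assigned to one or more Storage servers.

    // Sketch of the quadtree partitioning of one resolution level (invented names).
    // A tile is split into four quadrants until its estimated size is below a threshold.
    import java.util.ArrayList;
    import java.util.List;

    public class QuadtreePartitioner {

        /** Assumed access to the data of one resolution level. */
        public interface Level {
            long sizeInBytes(double[] bbox);        // estimated size of the data inside bbox
        }

        /** Recursively collect the bounding boxes of the leaf tiles. */
        public static List<double[]> partition(Level level, double[] bbox, long maxBytes) {
            List<double[]> leaves = new ArrayList<>();
            split(level, bbox, maxBytes, leaves);
            return leaves;
        }

        private static void split(Level level, double[] bbox, long maxBytes, List<double[]> leaves) {
            if (level.sizeInBytes(bbox) <= maxBytes) {
                leaves.add(bbox);                   // small enough: becomes one storage chunk
                return;
            }
            double midLon = (bbox[0] + bbox[2]) / 2;
            double midLat = (bbox[1] + bbox[3]) / 2;
            // bbox = { minLon, minLat, maxLon, maxLat }
            split(level, new double[]{bbox[0], bbox[1], midLon, midLat}, maxBytes, leaves); // SW
            split(level, new double[]{midLon, bbox[1], bbox[2], midLat}, maxBytes, leaves); // SE
            split(level, new double[]{bbox[0], midLat, midLon, bbox[3]}, maxBytes, leaves); // NW
            split(level, new double[]{midLon, midLat, bbox[2], bbox[3]}, maxBytes, leaves); // NE
        }
    }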

Fig. 7: Geodata Fragments

7  Some Remarks

The GML specification from OGC is one member of a large family of related standards, formats and tools. The full potential of GML will only be released when combining it with matching XML based technology; using GML just as a final formatting utility will not work wonders. The One Map Project is an example of an environment built on the principles of XML and web services. So far, we have experienced that the XML setting significantly speeds up the development process. The main reason for this is that we use the same tools and patterns for a multitude of different purposes, from data parsing and conversion to user interfaces. It also makes it easier to reuse software, both home-made and public domain.

There is one (obvious) potential pitfall with XML oriented development, and that is performance. XML documents are inherently verbose (and ASCII!), and may cause severe overhead in both storage needs and processing time. In the One Map project we try to avoid these problems as far as possible. The distributed storage strategy is, for example, one way to deal with the volume problem. By exploiting the parallel potential in distributed retrieval and processing we are also able to speed up processing.

The open nature of XML and the availability of open source tools make this an ideal basis for the One Map Project, which in many ways may be considered a "grass root" project. The use of expensive and complicated GIS software would make it difficult to realize the project. So far, not one penny has been spent on software licenses. All tools used, from editors to visualization systems, are open source and/or free. We hope this will encourage people to participate in the project.

8  References

[CEN] The Geographic Information European Prestandards and CEN Reports, European Committee for Standardization,
http://comelec.afnor.fr/servlet/ServletForum?form_name=cForumPage&session_id=0.4245179346392154&file_name=Z13C%2FPUBLIC%2FWEB%2FENGLISH%2Fpren.htm
[CONT] OpenContent License v1.0, opencontent.org,
http://opencontent.org/opl.shtml
[DIST] distributed.net, distributed.net,
http://www1.distributed.net/index.html.en
[ESRI] What Are Location Services? The GIS Perspective, Environmental Systems Research Institute, Inc.,
http://www.esri.com/library/whitepapers/pdfs/gis_and_location.pdf
[EU1] THE eCONTENT PROGRAMME, European Commission,
http://www.cordis.lu/econtent/home.html
[EU2] Commercial exploitation of Europe’s public sector information - Executive summary, Pira International Ltd., University of East Anglia and KnowledgeView Ltd.,
ftp://ftp.cordis.lu/pub/econtent/docs/2000_1558_en.pdf
[FREE] The Free Network Project, The Free Network Project,
http://freenetproject.org
[Good] Digital Earth: A Research Agenda, Michael Goodchild,
http://www.digitalearth.net.cn/de99paper/Class1/Michael%20F.%20Goodchild.doc
[Gore] The Digital Earth: Understanding our planet in the 21st Century, Al Gore,
http://www.digitalearth.gov/VP19980131.html
[GRID] GRID centres around the world, United Nations Environment Programme,
http://www.grida.no/about/nodesjs.htm
[GSHHS] GSHHS, A Global Self-consistent, Hierarchical, High-resolution Shoreline Database,
http://www.ngdc.noaa.gov/mgg/shorelines/gshhs.html
[HOL] The Wide World of Web Services: The Next Frontier, Steve Holbrook,
http://www.developer.ibm.com/multimedia/holbrook.pdf
[ISO] ISO/TC 211 Geographic information/Geomatics, International Organization for Standardization,
http://www.isotc211.org
[KAR1] SOSI Standarden (in Norwegian), Statens Kartverk.
http://www.statkart.no/standard/sosi/html/welcome.htm
[KAR2] Norgesglasset (in Norwegian), Statens Kartverk,
http://ngis2.statkart.no/norgesglasset/default.html
[MAPQ] MapQuest, MapQuest,
http://www.mapquest.com
[MASS] Massachusetts Geographic Information System, Massachusetts Executive Office of Environmental Affairs,
http://www.state.ma.us/mgis
[NIMA] Geospatial Engine, National Imagery and Mapping Agency,
http://geoengine.nima.mil
[NOKI] Mobile Location Services, Nokia Mobile Phones/Nokia Networks,
http://nds1.nokia.com/press/background/pdf/mlbs.pdf
[NOR1] Geographical Information Networks, NORUT Information Technology Ltd.,
http://www.itek.norut.no/gin
[NOR2] GIN Demos, NORUT Information Technology Ltd.,
http://www.itek.norut.no/gin/demo.htm
[OGC] Open GIS Consortium, Open GIS Consortium, Inc.,
http://www.opengis.org
[OMG] Object Management Group, Object Management Group,
http://www.omg.org
[PENN] Digital Chart of the World Server, Penn State University Libraries,
http://www.maproom.psu.edu/dcw/
[SA] PRESS RELEASE 25 February 2000, Strategy Analytics,
http://www.strategyanalytics.com/press/PRCR005.htm
[SETI] SETI@home, SETI@home,
http://setiathome.ssl.berkeley.edu
[SOUR] The Open Source Definition, opensource.org,
http://opensource.org/docs/definition_plain.html
[UN1] Information For Decision-making, United Nations, Agenda 21, chapter 40, United Nations Conference on Environment and Development, Rio de Janeiro, 1992,
http://www.igc.org/habitat/agenda21/ch-40.html
[UN2] Programme for the Further Implementation of Agenda 21, United Nations General Assembly,
http://www.un.org/documents/ga/res/spec/aress19-2.htm
[W3C] World Wide Web Consortium, World Wide Web Consortium,
http://www.w3.org