Annotating Mobile Multimedia Messages with Spatiotemporal Information

Gunnar Misund and Mats Lindh

Faculty of Computer Science, Østfold University College, Halden, Norway

Abstract

Mobile messaging started with Short Messaging Service (SMS). Multimedia Messaging Service (MMS) was recently introduced, allowing users to send and receive messages composed of text, audio, images and even movie clips. In this paper, we propose a new step in the evolution of mobile messaging by introducing the Annotated Multimedia Messaging Service (AMMS). In short, an AMMS is an MMS document augmented with metadata that facilitates efficient structuring, storage, search and retrieval of mobile media content. We currently restrict the metadata to describe time and place, and show how to include this information in an MMS document by using the existing specification without changes. We suggest some applications that may be realized on the foundations of this simple, yet powerful, concept, both in the personal market and in the professional domain. A prototype framework called Been-There-Done-That has been implemented, where users can generate AMMS documents and upload them to an Internet server. The messages are accessible through a web-based map application.

Keywords: Annotated Multimedia Messaging Service, Metadata, Mobile Content Creation, Mobile Positioning, Smart Phones, Web Map Service.

I. Introduction

The mobile phone has become an integral part of everyday life in many countries. In 2002, the number of cellular subscribers in Taiwan reached 22.6 million, meaning that there is more than one handset for each person (Possi, 2004). From being simple voice call devices, cell phones have developed into advanced multimedia communication tools. High-end mobile phones, often called smart phones, have storage and processing capabilities comparable to those of a five-year-old desktop computer (P900, 2003). Smart phones run full-blown operating systems, facilitating the development of a multitude of third-party solutions, such as games and infotainment applications. Protocols and standards facilitate access to Internet services, and enable users to roam freely, independent of local operators. Last, but not least, contemporary handsets offer a range of multimedia features, like built-in cameras, camcorders and audio recorders.

The Short Messaging Service (SMS), which allows subscribers to send and receive short textual messages, was first demonstrated in 1992 (Karuturi, 2002). After a while it was embraced by users and ignited an explosion in network traffic. In 2002, approximately 30 billion SMS messages were sent globally each month (GSM, 2004). The SMS protocol was later extended to include transfer of small pictures, sounds and animations. This version of SMS is known as Enhanced Messaging Service (EMS).

SMS and EMS were originally designed for use in Global System for Mobile Communications (GSM) networks, also referred to as second-generation (2G) mobile networks. GSM is currently the most widely used system, with a bandwidth of around 10 kbps. The specification of a faster network system, General Packet Radio Service (GPRS), was released in 1997 as an additional GSM service. The speed is typically 150 kbps, or more than 250 kbps in the case of the enhanced GPRS service (EDGE). GPRS is available from most GSM operators, and is often characterized as 2.5G. The third-generation networks (3G) offer a significant jump in bandwidth, ranging from 150 kbps when driving a car up to 2 Mbps for stationary usage (Oney, 2001). 3G networks were first rolled out in Japan in 2001 (CNN, 2001), and presently there are more than 132 million 3G subscribers worldwide (UMTS, 2004).

The advent of faster networks, together with the boost in capabilities and performance of handsets, led to the specification of the Multimedia Messaging Service (MMS). MMS is a natural evolution from text messaging, and makes it possible to exchange messages with rich media content such as graphics, images, audio and video (3GPP, 2003). An MMS message has a number of slides and may be viewed as a “PowerPoint-style” presentation on the mobile device. The first MMS service was launched in Norway in 2002 (ITU, 2004). MMS usage is growing fast. In the second quarter of 2004, 20 million MMS messages were sent in Norway, or approximately five messages for each inhabitant, as opposed to 6 million in the first quarter (VG, 2004).

Most MMS applications assume that users produce simple MMS documents, such as a single image or a single video clip, optionally followed by a text note. Typically, the users are invited to take advantage of some central server, offering functionality for storage, structuring and presentation of mobile snapshots or small video clips. An example of such a “your personal mobile album on the web” service is the award-winning Foneblog from NewBay (NewBay, 2003). The majority of these services structure the submitted MMS documents according to a single metadata property, albeit perhaps the most significant one: the time of capture.

In this paper, we take advantage of the new and coming generations of mobile handsets, high-speed networks and open formats and standards. Our main contribution is a simple enhancement of the MMS specification (CMG, 2002) which we have termed Annotated Multimedia Messaging Service (AMMS). We review some of the previous work on multimedia and metadata relevant to MMS applications in Section II. The specification of the Annotated Multimedia Messaging Service is given in Section III. We then outline some scenarios based on the AMMS concept. In Section V we describe a proof-of-concept implementation called “Been-There-Done-That”, and finally give some concluding remarks.

II. Mobile Multimedia and Metadata

Metadata is essentially additional information about some artifact. Metadata facilitates organizing, searching, browsing, and sharing; see for instance (Naaman, et al., 2004a), where the role of metadata in searching large image repositories is investigated. Metadata management has for a long time been an important aspect of the work in museums, libraries and archives. The rise of the World Wide Web has added another important arena for metadata management. The problems of searching and browsing the massive amounts of information on the Internet are the main focus of the large field of research and development referred to as “The Semantic Web”. The cornerstone of this effort is the Resource Description Framework (RDF), in essence a tool for describing and interchanging metadata about web resources. RDF is frequently used together with a multipurpose, general metadata specification called The Dublin Core Metadata Initiative (DCMI, 2003). For an overview of the semantic web, RDF, DCMI and related topics, see (Miller, 2004). The semantic web relies heavily on the use of Extensible Markup Language (XML) as the underlying metalanguage.

The authors are of the opinion that consistent use of metadata is a key to successful deployment of future MMS applications. One of the reasons for this is that it is far more difficult to handle multimedia content than traditional (well-)structured information such as text-only data. However, little, if any, attention has been paid to MMS metadata management. As a backdrop for our proposed specification of annotated multimedia messages, we outline some relevant specifications and then review some work concerning generation of media metadata.

Many metadata standards and frameworks may be applied to multimedia. However, most of them are designed for use in museum collections, libraries and archives of analogue media items. In the following, we briefly outline a few research and standardization initiatives supporting digital media metadata.

Not surprisingly, the bulk of media metadata work seems to be applied to still images, as for instance the “DIG35 Specification” from the International Imaging Industry Association (I3A, 2001). This standard defines a flexible XML-based framework to describe location, date and time of capture, focus distance, light levels, GPS location, image type, copyright, subject matter, etc.

Another large area of metadata research concerns audio and video, where the major projects are conducted under the Moving Picture Experts Group (MPEG) umbrella. MPEG is a working group of ISO/IEC in charge of the development of standards for coded representation of digital audio and video. Their MPEG-7 specification, “Multimedia content description interface” (ISO/IEC, 2002), offers a rich XML vocabulary for providing metadata covering both content and context. MPEG-7 is an approved ISO standard. It is highly complex, and it is difficult to find examples of real-life use. Another video metadata specification, far simpler to use than MPEG-7, is the “Dublin Core Application Profile for Digital Video” from the Video Development Initiative (ViDe, 2001).

The above-mentioned initiatives all target a single media type, and there are not many projects covering mixed media. MMS is indeed an example of mixed media, combining audio, video, images and text. MMS uses a subset of Synchronized Multimedia Integration Language (SMIL) for encoding and presentation (3GPP, 2003). SMIL is an XML grammar similar to HTML. It is essentially a way of choreographing rich, interactive multimedia content for real-time presentation over the web and over low bandwidth connections (W3C, 2001).

SMIL comes in two flavors, SMIL 1 and SMIL 2, the latter an extension of SMIL 1. Both versions offer metadata capabilities. SMIL 1 has a meta element with two attributes, name and content. There may be several meta elements in one document. The element is generic in the sense that users may define their own metadata properties. Each property has a corresponding meta element with a given name/content pair. The following XML snippet states that there is a metadata type termed “Creator”, and that the value of the type is “Mats”:

<meta name="Creator" content="Mats" />

This is a very simplistic way of modeling metadata, and it does not support interoperability. Without detailed knowledge of the application that generated the SMIL document, it is impossible for external components to treat the annotations properly. In addition, this is an example of “bad” XML modeling. There is widespread agreement on not using the attributes of the elements to carry essential information, but rather embedding it in the content of the elements.
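To illustrate how generic these name/content pairs are, the following Python sketch collects every meta element of a SMIL 1 document into a dictionary. The sample document and the function name read_meta are assumptions made for illustration; only the “Creator”/“Mats” pair comes from the text above. The sketch also shows the interoperability problem: the reader obtains the strings, but nothing in the document tells it what a given name means.

```python
import xml.etree.ElementTree as ET

# Hypothetical SMIL 1 document; only the Creator/Mats pair is taken from the text.
SAMPLE = """<smil>
  <head>
    <meta name="Creator" content="Mats" />
  </head>
  <body />
</smil>"""


def read_meta(smil_text):
    """Return all SMIL 1 meta name/content pairs as a dictionary."""
    root = ET.fromstring(smil_text)
    pairs = {}
    for meta in root.iter("meta"):
        name = meta.get("name")
        if name is not None:
            # The content attribute is treated as an opaque string.
            pairs[name] = meta.get("content", "")
    return pairs


print(read_meta(SAMPLE))
```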

SMIL 2 offers a more advanced metadata model. In addition to the SMIL 1 meta element, there is also a metadata element. This element is a container for well-structured information formatted according to the Resource Description Framework (RDF) model and the element set defined by the Dublin Core Metadata Initiative (DCMI). This enables a growing number of Semantic Web based applications to read and “understand” the metadata information.

It is one thing to find an appropriate metadata specification or standard; another issue, perhaps a more difficult one, is to generate the actual metadata. In traditional media collections, annotations are added after the time of creation, and typically in an archiving context. For personal media applications, such as private digital image collections, growing with hundreds of entries each month, this might become an overwhelming and daunting task.

Recent work has focused on using time and location of capture as major metadata components for digital images; see for instance (Naaman, et al., 2004b), where two systems for metadata-based browsing in large photo collections are compared. Two projects work with advanced frameworks for semi-automatic metadata generation for digital images: the LOCALE system from Stanford University, California (Naaman, et al., 2003), and the MMM (Mobile Media Metadata) system from Berkeley, California (Davis, et al., 2004). In the LOCALE project, the users have digital cameras connected to GPS devices so that each image is stamped with time and position. In addition, each snapshot is labeled with an arbitrary string that the user may or may not fill in. When images with empty labels are uploaded to the storage server, the system automatically assigns a label based on other photographs taken in the same area. The MMM system manages metadata for images from mobile camera phones. The initial annotations are username, date, time and the cell ID retrieved from the GSM service provider. Then the system searches a central storage server for images with “similar” metadata, and suggests additional annotations. The process is iterative such that the metadata may be refined in a user/system loop.

The bulk of research and development in media metadata management focuses on the personal market segment, typically snapshot collections. In addition, only simple multimedia content is considered, the overwhelming majority being single images. To the knowledge of the authors, work on complex multimedia documents, such as messages composed of images, video clips, voice notes and text, is non-existent in the literature. Further, examples of work targeting professional applications seem to be scarce.

III. Annotated Multimedia Messaging Service

An MMS message is coded by using a subset of SMIL (see Section II). In order to ensure interoperability among different network operators and handset vendors, a group of major companies has published a conformance document describing the subset of SMIL allowed in MMS (CMG, et al., 2001). One of the optional components of an MMS document is the meta element, as defined in the SMIL 1 specification. The conformance document does not include the richer RDF based metadata module in SMIL 2.

In this work, we restrict the metadata to describe the most essential information, where and when. Several different technologies make it possible to retrieve the position of a given cellular unit (Chan, 2003). There are reasons to believe that all mobile phones in the near future will be location enabled. As an example, the introduction of the enhanced 911 system (E911) implies that in the near future, all US wireless carriers must provide precise location information for all emergency calls from mobile phones (FCC, 2004).

We define an annotated multimedia message (AMMS) as an MMS document augmented with spatiotemporal metadata. One of our design goals is to take full advantage of existing MMS applications and services, and the only way to do this is to make sure that the AMMS is fully compliant with the current conformance document. As pointed out in Section II, the metadata capability of SMIL 1 is inferior to SMIL 2, but to keep the AMMS conformant, our only option is to use the meta element and its attributes to carry the necessary spatiotemporal information. The three proposed meta elements are defined as follows:

Name | Content
time | The time of the recording, given in the same notation as used by existing e-mail applications.
place | A text string describing the place where the MMS was composed. This name may later be resolved through a gazetteer service to an actual position.
position | A set of one or more coordinate pairs describing a point or a track somewhere in the world. Currently we assume geographic coordinates and the WGS84 datum. Coordinate pairs are delimited by one or more white space characters, and a comma separates the two coordinates within a pair.
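As a minimal sketch of how a consumer might decode the time and position values defined above, the Python fragment below parses both strings. It rests on two assumptions that the definitions do not settle: that the e-mail notation means an RFC 2822 style date, and that the sample coordinates are written latitude first. The function names and sample values are hypothetical.

```python
from email.utils import parsedate_to_datetime


def parse_position(content):
    """Split a position string into a list of (first, second) float pairs."""
    pairs = []
    for token in content.split():         # any run of white space separates pairs
        first, second = token.split(",")  # a comma separates the two coordinates
        pairs.append((float(first), float(second)))
    return pairs


def parse_time(content):
    """Parse a time value, assuming an RFC 2822 style e-mail date."""
    return parsedate_to_datetime(content)


# Hypothetical values: a two-point track and a time stamp with a UTC offset.
print(parse_position("59.12,11.38   59.13,11.39"))
print(parse_time("Mon, 16 Aug 2004 14:30:00 +0200"))
```

A single pair is then simply a track of length one, so points and tracks can share the same code path.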

An example of an AMMS follows, where we have expanded the sample document from the MMS conformance specification with spatiotemporal annotations:

<smil>
<head>
...
<layout>
<root-layout width="352" height="144" />
...
</layout>
</head>
<body>
<par>
...
</par>
<par>
...
</par>
</body>
</smil>

When embedding the metadata as specified, any MMS compatible device may treat the AMMS as a standard MMS, by just ignoring the additional meta elements. On the other hand, all AMMS enabled applications must understand the metadata and be able to process the spatiotemporal content.
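The backward compatibility described here can be sketched in Python as follows. The fragment adds the proposed meta elements to the head of a plain MMS document and then lists the media references the way a reader unaware of AMMS might, showing that the result is unchanged. The sample document, file name, metadata values and the function names annotate and media_sources are all assumptions made for illustration, and the sketch assumes the document has a head element.

```python
import xml.etree.ElementTree as ET

# Hypothetical plain MMS document with a single slide.
PLAIN_MMS = """<smil>
  <head>
    <layout>
      <root-layout width="352" height="144" />
    </layout>
  </head>
  <body>
    <par dur="8s">
      <img src="FirstImage.jpg" />
    </par>
  </body>
</smil>"""


def annotate(smil_text, time, place=None, position=None):
    """Return the document with AMMS meta elements added to its head."""
    root = ET.fromstring(smil_text)
    head = root.find("head")
    values = {"time": time, "place": place, "position": position}
    index = 0
    for name, value in values.items():
        if value is not None:
            meta = ET.Element("meta", {"name": name, "content": value})
            head.insert(index, meta)  # meta elements go before the layout
            index += 1
    return ET.tostring(root, encoding="unicode")


def media_sources(smil_text):
    """List media references; meta elements carry no src and are skipped."""
    root = ET.fromstring(smil_text)
    return [el.get("src") for el in root.iter() if el.get("src") is not None]


amms = annotate(PLAIN_MMS, "Mon, 16 Aug 2004 14:30:00 +0200", position="59.12,11.38")
print(media_sources(PLAIN_MMS) == media_sources(amms))
```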

We strongly recommend that the next version of the MMS conformance document should include the SMIL 2 metadata module, which would bring MMS metadata management a significant step forward.

IV. AMMS Applications

Annotated multimedia messages may be used in a wide variety of applications. We are of the opinion that additional metadata, in particular spatiotemporal information, may lay the foundation for novel and interesting usage. Up until now, MMS services have in general targeted the private sphere, in particular picture messaging and mobile gaming. However, by enhancing MMS messages with significant metadata, many professionals might consider this a useful tool in their daily work.

Apart from picture messaging, i.e. when subscribers take a picture and upload it to a central server, the majority of MMS applications push content to the users, like games and advertisements. In the following overview of AMMS scenarios, we will focus on the user as a content creator, rather than a content consumer.

Blogging

The most obvious usage of the AMMS is as input to a web log. A web log is a semi-interactive personal diary that a user keeps on his or her website. The blog (as it is called for short) is updated with small notes about what goes on in that person’s life (Blogger, 2004). Recently the traditional blogs have been extended to handle richer content by adding multimedia elements, as outlined below.

Photo Blogs

A photo blog builds on the same concept as a regular blog. The photo blog, however, puts images in focus, and an entry is typically a single photograph with comments. One prominent example is hosted by photographer Emese Gaal (Gaal, 2001).

Moblogs

While regular blogs are usually operated from a personal computer, there is a trend to extend the blogging activity to include messages from mobile phones. The main point is still the same: people create posts about what happens in their everyday lives, take pictures of views and events that interest them, and so on. The main difference is that everything is operated from a cell phone, but the blog is still viewed from a personal computer. The concept is not widespread at the moment, but is apparently gaining momentum. Nokia has developed a tool called Lifeblog, a sort of scrapbook used to keep track of photos, videos, text messages and other mobile media uploaded to a website (Nokia, 2004). Joi Ito maintains an informal site about the status of mobile blogging (Ito, 2004). One (of many) commercial actors in the field is New Bay Software with their award-winning FoneBlog service (NewBay, 2003).

Location Blogs

The natural evolution of blogs and the ever-increasing development efforts put into these projects have resulted in location-aware blogs. In addition to being able to store multimedia content, they offer functionality for storing the location where the content was recorded. The JPEG image format makes it possible to store the geographical coordinates of the place of capture, provided manually or by the device in real time. Several systems exploit this option, for instance WWMX (Toyoma, et al., 2003) and WaveBlog (WaveMarket, 2003). In addition, a system called Blogmapper has been developed, which provides additional location services to already existing blogs (Harlan, 2003).

AMMS Blogs

Obviously, the AMMS concept may support advanced blogging applications, combining the functionality of traditional blogs, moblogs and location blogs. In Section V, we present an enhanced mobile blogging framework, called Been-There-Done-That.

Field reports

While advanced blogging mainly targets the personal market segment, it is easy to apply the AMMS concept to various professional domains. A typical application area is field reporting. We briefly outline two candidate cases.

Power line maintenance

Recently a prototype has been presented where power line malfunctions are reported using a helicopter, a PDA (Personal Digital Assistant) and a digital camera. The camera transfers images to the PDA, and a description of the situation is recorded using a registration form on the PDA. Finally, the complete report is transmitted to the headquarters using a GPRS-enabled network (Langdahl, 2004).

In our opinion, this procedure would be greatly simplified using the AMMS concept. First of all, only one recording device would be needed, and the standardized AMMS message would make it simpler to build a modular system, in particular at the server/browser end. However, it would be necessary to extend our proposed simple metadata capabilities to include error messages relevant to the power line case. This could easily be achieved by using the SMIL 2 specification and its RDF/DCMI extension (see Section II).

Biodiversity research

A group at Stanford University, California, has used a location blogging tool to archive records of plants in the surroundings of the Rocky Mountain Biological Laboratory (Garcia-Molina, et al., 2004). The browser uses a wide range of metadata to facilitate easy access to the photographs of the plants and their environment. However, the majority of the additional information is added manually in retrospect, including the location. Clearly, a more automated approach, for example by using the AMMS protocol, would improve and simplify the registration procedure.

Where R U

As an educated guess, a large number of all mobile conversations contain one or two questions of the type “Where are you?” By applying the AMMS principles, such questions could be answered very precisely. Assume the recipient of a request takes a snapshot of her surroundings, and then sends it as an AMMS to a “Where R U” server, together with the calling number of the asking unit. Then the server would retrieve the location and produce an appropriate image map with the help of a WMS server (OGC, 2002). Then it would forward the map and the snapshots to the person asking. Clearly, a map and an image (or more) of the location would, in general, be much more informative than a verbal description.

Mobile journalism

Journalists have gradually entered the digital domain, and are now using digital cameras (stills and video), digital audio recorders and digital communications such as GSM, GPRS and WLAN. However, the current way of working is characterized by transferring disparate and unsynchronized items covering the same event. Considering that the capabilities of handheld devices will soon satisfy newspaper (and to some extent radio and television network) standards with regard to technical quality, it is easy to imagine the mobile unit as the main, not to say the single, tool for a field reporter. In fact, taking full advantage of the SMIL capabilities, a news AMMS might provide a full story of an event, including the location. When received at the news desk server, the story could be automatically converted to fit various channels, such as conventional newspapers, online newspapers, radio and television, and, of course, be accompanied by an informative map produced by a WMS server.

Collaborative mapping

While most of today’s maps are generated and provided by central mapping authorities, small groups of people or individuals see the need for a more “grass-root” oriented mapping. Building and updating maps from a collection of GPS points and annotations is a conceivable concept. Figure 1 shows a map of the walkways in a park in Gävle, Sweden, generated by the experimental Been-There-Done-That application presented in Section V.

Figure 1: Collaborative mapping

Another, more elaborate example is the Amsterdam Real-Time project. During a two-month project, a number of Amsterdam citizens were equipped with portable devices connected to GPS receivers, and their movements were traced (Figure 2):

“This way an ever-changing, very recent, and very subjective map of Amsterdam will come about” (Waag, 2002).

Figure 2: Amsterdam realtime (from Waag, 2002)

One of many rationales for the community mapping approach is the fact that in some parts of the world, it is difficult and/or expensive to obtain good and accurate geospatial data. Collaborative mapping makes it possible to generate highly customized maps, taking into account the specific location and particular needs of the persons involved.

Mobile games

One of the major areas of large revenue expectations is mobile gaming. By adding the spatiotemporal aspect, the expectations should grow even larger, in particular if one does not restrict the domain to the large bulk of “shoot-and-kill” applications. In the following, we briefly outline an application under development by the Norwegian Automobile Association (NAF) together with a major GIS vendor, Bravida Geomatikk (Høseggen, 2004). For a long time NAF has provided their members with a “Michelin Guide” type book of maps and roadside information like restaurants, tourist attractions, scenic routes etc. Gradually, the content of this guide has become digitally available, and the organization wants to exploit the digital content beyond the services offered in the hardcopy version. One proposal is to design a suite of “back seat games”, mainly targeting children of various ages during longer car trips. The idea is to combine location-based information, quizzes, track visualization, communication with buddies in cars on the same (or another) route, and similar “keep-the-kids-in-the-backseat-happy-and-active-with-some-meaningful-challenges” activities.

An AMMS application connected to various databases, like attractions, historical information, etc., could be the backbone of such a system.

V. Been-There-Done-That

For real-life testing of the AMMS concept, we have developed a framework called Been-There-Done-That. The infrastructure consists of three main modules, as illustrated in Figure 3. A client application runs on a smart phone linked to a Bluetooth GPS unit. Bluetooth is a short-distance radio communication protocol (Kardach, 2000). The phone client assists the user in composing annotated messages, and sends them to a server-side dispatcher. The dispatcher processes the messages and stores the content for later use. Finally, the user may access the stored messages through a web browser interface.

Figure 3: Been-There-Done-That architecture

The modules are relatively easy to implement, and highly interchangeable, due to extensive use of open standards and specifications. We give some details in the following sections.

Smart Phone Client

The smart phone client provides an interface where the user can compose and send annotated messages. The implementation consists of two simple applications, an AMMS manager called BTDT and a location acquisition program called Where. The software is implemented on a Sony-Ericsson P900 (P900, 2003). This device runs the Symbian operating system, and C++ is used as programming language.

BTDT

BTDT is responsible for composing the annotated message and transferring it to the dispatcher. The user may choose one of two methods for composing the message. In manual mode, the user specifies the media elements (images, movie clips, audio recordings or text notes) by browsing the file system, as illustrated in Figure 4. In order to provide the spatiotemporal metadata, the user has to select a file that contains the appropriate annotations.

Figure 4: Manual selection of multimedia elements

The message may also be generated in a more automatic way, which can be viewed as a recording session. The user pushes the start button, and then generates the multimedia elements by using the built-in standard applications (camera, camcorder, audio recorder). The spatiotemporal annotations are created by the Where program (see next section). When the user pushes the send button, the program traverses the file system on the phone and retrieves all media files that have been created since the application was started. The files can be text, images, sound or video, depending on the media capabilities of the phone. This “two-click” process is illustrated in Figure 5.

Figure 5: Message recording

When it comes to media creation, different smart phones offer completely different services for both users and developers, concerning available media types and types of multimedia applications. By using such a simple method as checking the creation date of a file and comparing it to the time when the application was started, one is able to solve a set of complex problems concerning media retrieval across a variety of platforms.
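The time-stamp comparison described above can be sketched in a few lines of Python. This is an illustration only, not the prototype's Symbian C++ code: the function name files_since is hypothetical, and the sketch uses the modification time as a stand-in for the creation time, since a portable creation time is not available on every file system.

```python
import os
import tempfile
import time


def files_since(directory, start_time):
    """Return sorted paths of files modified at or after start_time."""
    found = []
    for folder, _subdirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(folder, name)
            # Assumption: modification time approximates creation time.
            if os.path.getmtime(path) >= start_time:
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    # Simulate a recording session with one old and one new media file.
    with tempfile.TemporaryDirectory() as root:
        session_start = time.time()
        old = os.path.join(root, "old.jpg")
        new = os.path.join(root, "new.jpg")
        for path in (old, new):
            open(path, "w").close()
        os.utime(old, (session_start - 3600, session_start - 3600))
        os.utime(new, (session_start + 1, session_start + 1))
        print([os.path.basename(p) for p in files_since(root, session_start)])
```

Because the check relies only on file system time stamps, it needs no knowledge of which application produced a file, which is the portability argument made in the text.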

When the message composition is completed, it is transferred to the dispatcher using a regular TCP/IP connection. On current GSM-based phones, this transmission takes place by GPRS. The BTDT application also provides functions for setting options such as the URL address of the dispatcher.

Where

The Where application is responsible for the spatiotemporal component of the AMMS. The location can be specified by a place name or as a point or track of geographic coordinates. Coordinates may be provided either manually or from an optional Bluetooth GPS module. The program may easily be extended to retrieve location information from other sources, such as the network operator. Date and time of day are obtained either from the operating system or from the GPS data. The application generates the appropriate SMIL meta elements, and stores them in a text file. In the case where a place name has been given as location, it is assumed that the dispatcher or another component will use a gazetteer service to look up the geographic position.
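A rough Python sketch of the kind of output the Where program produces is given below. It formats a time stamp and an optional place name or track as SMIL meta elements, one per line, ready to be written to a text file. The function names, the five-decimal coordinate precision and the sample values are assumptions made for illustration; the time is formatted as an RFC 2822 style e-mail date, which is our reading of the time definition in Section III.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from xml.sax.saxutils import quoteattr


def meta_line(name, content):
    """Format one SMIL meta element with safely quoted attribute values."""
    return "<meta name=%s content=%s />" % (quoteattr(name), quoteattr(content))


def where_annotations(when, place=None, track=None):
    """Return the meta element lines for a time, a place name and/or a track."""
    lines = [meta_line("time", format_datetime(when))]
    if place is not None:
        lines.append(meta_line("place", place))
    if track:
        # Comma within a pair, a single space between pairs.
        position = " ".join("%.5f,%.5f" % pair for pair in track)
        lines.append(meta_line("position", position))
    return lines


# Hypothetical session: a two-point track recorded at UTC+2.
stamp = datetime(2004, 8, 16, 14, 30, tzinfo=timezone(timedelta(hours=2)))
print("\n".join(where_annotations(stamp, track=[(59.12, 11.38), (59.13, 11.39)])))
```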

Dispatcher

The dispatcher works as a server towards the smart phone clients and provides a central storage system for the annotated messages. When a message arrives at the dispatcher, the content is examined and processed according to a given set of rules. The positional data is retrieved from the meta elements. If the location is specified as a textual description, the geographic position is obtained by calling an external gazetteer service. Then the dispatcher examines each multimedia entry and associates a single coordinate or a collection of coordinates (a track) with the submitted media.
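The processing steps above might be sketched as follows in Python. This is not the dispatcher's actual rule set, which is not detailed here: the in-memory GAZETTEER dictionary stands in for the external gazetteer service, and the sample message, file names, coordinates and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an external gazetteer service.
GAZETTEER = {"Halden": (59.12, 11.38)}

# Hypothetical incoming AMMS that gives its location as a place name only.
MESSAGE = """<smil>
  <head>
    <meta name="time" content="Mon, 16 Aug 2004 14:30:00 +0200" />
    <meta name="place" content="Halden" />
  </head>
  <body>
    <par>
      <img src="harbour.jpg" />
      <audio src="note.amr" />
    </par>
  </body>
</smil>"""


def resolve_location(meta):
    """Return a list of coordinate pairs from position, else from place."""
    if "position" in meta:
        return [tuple(float(v) for v in pair.split(","))
                for pair in meta["position"].split()]
    if "place" in meta:
        hit = GAZETTEER.get(meta["place"])
        return [hit] if hit is not None else []
    return []


def dispatch(smil_text):
    """Associate every media entry with the message's time and coordinates."""
    root = ET.fromstring(smil_text)
    meta = {m.get("name"): m.get("content", "") for m in root.iter("meta")}
    track = resolve_location(meta)
    media = [el.get("src") for el in root.iter() if el.get("src")]
    return [{"src": src, "time": meta.get("time"), "track": track} for src in media]


for record in dispatch(MESSAGE):
    print(record)
```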

The chosen method of transportation can be any regular TCP/IP-connection. This provides us with a flexible interface separating the dispatcher and the clients. In our case, a client is a smart phone, but it could be any unit capable of generating an AMMS and sending it over TCP/IP.

Browser

The browser is an online, web based map interface where users may inspect and retrieve the content of the messages stored in the repository. The main component is a zoomable and clickable map of the world, where areas that have messages assigned to them are highlighted. By clicking on a hot spot, the user is presented with a list of all messages from that particular area. If there are several local areas with information, they will be grouped into one larger area, depending on the size of the current view in the browser. The users are then able to retrieve the information they are looking for by simply selecting it from the list of available media. Predefined areas of interest may be created in order to facilitate easy navigation. The various elements of the browser are shown in Figure 6. Here the user has zoomed in on some hotspots in Rome, and selected two images and a soundtrack from St. Peter’s cathedral.
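One simple way to obtain view-dependent grouping of hot spots is to bin the message positions into a fixed grid laid over the current view, so that nearby messages merge at small scales and separate when the user zooms in. The Python sketch below illustrates this idea; it is an assumption about how such grouping could be done, not a description of the browser's implementation, and the function name, the 8 by 8 grid and the sample coordinates (given as longitude, latitude) are hypothetical.

```python
from collections import defaultdict


def group_hotspots(points, view, cells=8):
    """Bin (lon, lat) points into a cells x cells grid covering the view.

    view is (min_lon, min_lat, max_lon, max_lat); points outside are skipped.
    """
    min_x, min_y, max_x, max_y = view
    cell_w = (max_x - min_x) / cells
    cell_h = (max_y - min_y) / cells
    groups = defaultdict(list)
    for x, y in points:
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue
        # Clamp so points on the upper edges fall into the last cell.
        col = min(int((x - min_x) / cell_w), cells - 1)
        row = min(int((y - min_y) / cell_h), cells - 1)
        groups[(col, row)].append((x, y))
    return dict(groups)


# Two hypothetical message positions in Rome.
rome = [(12.4534, 41.9022), (12.4922, 41.8902)]
print(len(group_hotspots(rome, (-180, -90, 180, 90))))          # world view
print(len(group_hotspots(rome, (12.40, 41.85, 12.55, 41.95))))  # city view
```

In the world view both positions fall into the same cell and form one hot spot, while the city view places them in different cells.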

Figure 6: Message browsing

The background map is fetched from external map servers using the Web Map Service (WMS) protocol (OGC, 2002). Hence, the map creation process is completely separated from the rest of the infrastructure. Further, it is easy to customize the map, either by choosing another map service or by requesting another set of layers. The browser is open for public use (OneMap, 2004).
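To make the separation concrete, the Python sketch below builds a WMS GetMap request URL for a given bounding box. It assumes WMS version 1.1.1 and the EPSG:4326 reference system, where the bounding box is given as minimum longitude, minimum latitude, maximum longitude, maximum latitude. The server address, the layer name and the function name are hypothetical; switching map service or layers only changes the arguments.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layers, bbox, width=512, height=256):
    """Build a WMS 1.1.1 GetMap URL; bbox is (min_lon, min_lat, max_lon, max_lat)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",                 # empty value requests default styles
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer name, with a view covering central Rome.
print(getmap_url("http://example.org/wms", ["basemap"], (12.4, 41.85, 12.55, 41.95)))
```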

Lessons Learned

Been-There-Done-That has been in use during a five-month test period. Approximately one hundred AMMS messages have been transmitted from various locations in Norway, Sweden and Italy, using five different service providers. The operating conditions have ranged from sitting indoors at a table to standing in an open fishing boat in rough weather.

The smart phone client has proven to be relatively stable, taking into account that it is a rapidly developed prototype. During the field-testing, we learned that great care must be taken to make the handset user interface as simple as possible. A mobile user has neither the time nor the patience to interact with a complex interface. Ideally, composing and sending AMMS messages should be as easy as sending text messages.

The GPRS communication has functioned very well on most occasions, even when sending messages in the megabyte range. However, we have found that network speed varies greatly, depending on operator, location and time.

The Bluetooth GPS has been carried in a belt holster most of the time, and has worked satisfactorily. When starting the unit, it typically takes a minute or two to get a lock on the required number of satellites. Problems have occurred in dense forests and in city areas with tall buildings, a phenomenon that is, however, common to all GPS receivers. We have found the Bluetooth communication between the phone and the GPS unit to be slightly unstable, even when the devices are very close to each other (the distance between Bluetooth devices should not exceed 10 meters).

The dispatcher module has been working flawlessly during the test period, and so have the browser and the underlying map services. Since we have paid little attention to implementing an efficient and appealing user interface, the browser would obviously benefit from improvements, preferably based on extensive user centered research.

VI. Final Remarks

We have demonstrated that it is simple and straightforward to extend the MMS format to include spatiotemporal information, and have outlined how to include even richer metadata. The proposed format, Annotated Multimedia Messaging Service (AMMS), is fully MMS compatible according to the current MMS conformance document. We have also tested the concept successfully by implementing a test bed called Been-There-Done-That.

It is worth noting that the proposed annotation of mobile media content might have been formatted differently and transmitted in other ways than in the context of the MMS framework. However, we believe that incorporating the annotations in an existing, well known and widely used format offers significant benefits.

By extensive use of open standards and specifications, in particular from the 3rd Generation Partnership Project (3GPP) and Open Geospatial Consortium (OGC), we have shown that it is easy for software and content vendors to implement location aware solutions based on the AMMS approach. Two students developed the core parts of the software in a half-semester project during the first year of their master’s program. The students had no prior knowledge of mobile units, wireless protocols or geographic information.

Currently, there are several projects at Østfold University College aiming at extending the work presented in this paper. We are in particular working on expanding the metadata management by adopting the SMIL 2 specification, and leveraging current research in semi-automatic metadata generation, as reported in (Davis, et al., 2004; Naaman, et al., 2003; Naaman, et al., 2004a). In addition, we are developing applications in the area of mobile gaming and collaborative mapping (Section IV). We welcome comments, contributions and proposals for collaboration. Source code and additional documentation are available on request from the authors.

Acknowledgements

The work presented in this paper is part of Project OneMap, a long-term effort contributing to the fusion of standard web technologies and geographic content, often referred to as the GeoWeb (Misund, et al., 2002; OneMap, 2004). The authors would like to thank the OneMap team members for interesting discussions. We are grateful for the financial support from Østfold University College. The work is partially based on the student projects of Arne Enger Hansen (Hansen, 2004) and Christer Stenbrenden (Stenbrenden, 2004). One of the authors, Gunnar Misund, supervised them both.

References

[1] 3G (3G Today), 2004, Over 132 million reported 3G CDMA subscribers. http://www.3gtoday.com/subscribers/.

[2] 3GPP (The 3rd Generation Partnership Project), 2003, 3GPP TS 23.140 Multimedia Messaging Service (MMS); Functional description. http://www.3gpp.org/ftp/Specs/archive/23_series/23.140/.

[3] Grace Agnew, Markus Buchhorn, Dan Kniesner, Jean Hudgins, Douglas King, Mary-Frances Panettiere and Manjula Patel, 2001, ViDe User's Guide: Dublin Core Application Profile for Digital Video. http://www.vide.net/workgroups/videoaccess/resources/vide_dc_userguide_20010909.pdf

[4] Blogger, 2004, blogger.com, 2004. http://www.blogger.com/about/.

[5] Neil Chan, 2003, Introduction to Location-Based Services. http://www.giscentrum.lu.se/www_summeruniversity/projects2003/Chan.pdf.

[6] CMG, Comverse, Sony Ericsson, Logica, Motorola, Nokia and Siemens, 2002, MMS Conformance Document Version 2.0.0. http://www.ia.hiof.no/~gunnarmi/MMS_Conformance_v2_0_0.pdf.

[7] CNN International, 2001, DoCoMo unveils 3G, with caution. http://edition.cnn.com/2001/BUSINESS/asia/09/30/tokyo.docomo3Gdebut.

[8] Marc Davis, Simon King, Nathan Good, and Risto Sarvas, 2004, From Context to Content: Leveraging Context to Infer Media Metadata, Proceedings of 12th Annual ACM International Conference on Multimedia (MM 2004). Forthcoming 2004. http://fusion.sims.berkeley.edu/GarageCinema/pubs/pdf/pdf_63900590-3243-4FA0-845E4BF832AA8BCC.pdf

[9] DCMI (The Dublin Core Metadata Initiative), 2003, Dublin Core Metadata Element Set, Version 1.1: Reference Description. http://dublincore.org/documents/dces/.

[10] FCC (Federal Communications Commission), 2004, Enhanced 911. http://www.fcc.gov/911/enhanced.

[11] Emese Gaal, 2001, Emese’s photo blog. http://www.sciencemeetsart.com/emese/blog/.

[12] Hector Garcia-Molina, 2004, BioAct! http://shark.stanford.edu:4230/cgi-bin/flamenco/bio/Flamenco?username=default.

[13] GSM Association, 2004, SMS (Short Messaging Service). http://www.gsmworld.com/technology/sms/index.shtml.

[14] Arne Enger Hansen, 2004, A Location Bound Media Client for Sony Ericsson P800/P900. Project Report, Østfold University College, Faculty of Computer Sciences, Norway.

[15] Jason Harlan (Map Bureau), 2003, Blogmapper. http://www.blogmapper.com/.

[16] Stein Høseggen, 2004. Mobile gaming based on tourist information. Personal communication.

[17] I3A (International Imaging Industry Association), 2001, DIG35: Metadata – A smarter way to look at digital images. http://www.i3a.org/i_dig35.html.

[18] ISO/IEC, 2002, ISO/IEC 15938: Multimedia content description interface. Forthcoming 2004. http://www.iso.org.

[19] Joi Ito, 2004, Joi Ito's Moblogging, Blogmapping and Moblogmapping related resources. http://radio.weblogs.com/0114939/outlines/moblog.html.

[20] ITU (International Telecommunication Union), 2004, Shaping the future mobile information society: The case of the Kingdom of Norway. http://www.itu.int/osg/spu/ni/futuremobile/general/casestudies/norwaycaseE.pdf.

[21] James Kardach, 2000, Bluetooth Architecture, Intel Technology Journal, 2nd Quarter 2000. http://www.intel.com/technology/itj/q22000/articles/art_1.htm.

[22] Subrahmanyam Karuturi, 2002, What is SMS? http://www.funsms.net/what_is_sms.htm.

[23] Bjørn Inge Langdahl, 2004, Registrering av nettfeil ved bruk av PDA og GPS fra helikopter. ItEnergi 2004 (in Norwegian). http://www.itenergi.com/2004/program/.

[24] Eric Miller, 2004, The Semantic Web. http://www.w3.org/2004/Talks/0120-semweb-umich.

[25] Gunnar Misund and Knut-Erik Johnsen, 2003, The OneMap Project. http://www.ia.hiof.no/~gunnarmi/omd/gmldev_02/.

[26] Mor Naaman, Andreas Paepcke and Hector Garcia-Molina, 2003, From Where to What: Metadata Sharing for Digital Photographs with Geographic Coordinates, Proceedings of CoopIS/DOA/ODBASE 2003.

[27] Mor Naaman, Susumu Harada, QianYing Wang, Hector Garcia-Molina and Andreas Paepcke, 2004, Context data in geo-referenced digital photo collections, Proceedings of the 12th annual ACM international conference on Multimedia.

[28] M. Naaman, S. Harada, Q. Wang, and A. Paepcke, 2004, Adventures in space and time: Browsing personal collections of geo-referenced digital photographs. Technical report, Stanford University. Submitted for Publication. http://dbpubs.stanford.edu:8090/pub/2004-26.

[29] NewBay Software, 2003, FoneBlog - A Web Site for Your Mobile Phone. http://www.newbay.com/whitepapers/FoneBlog%201.0%20White%20Paper.pdf.

[30] Nokia, 2004, Nokia Lifeblog. http://www.nokia.com/nokia/0,1522,,00.html?orig=/lifeblog.

[31] OGC (Open GIS Consortium, Inc.), 2002, Web Map Service Implementation Specification (OGC 01-068r3). http://www.opengis.org/docs/01-068r3.pdf.

[32] Project OneMap, 2002, Project OneMap. http://www.onemap.org.

[33] Project OneMap, 2004, Been-There-Done-That Browser. http://www.onemap.org/ch/geometa/browse.

[34] J. T. Oney, 2001, Wireless Protocols. http://www.nvcc.edu/home/joney/Wireless%20Protocol.ppt.

[35] P900 (Sony Ericsson), 2003, P900 Overview. http://www.sonyericsson.com/p900/index.htm.

[36] Petri Possi (UMTS World), 2004, UMTS / 3G History and Future Milestones. http://www.umtsworld.com/umts/history.htm.

[37] Christer Stenbrenden, 2004, A Location Bound Media Server, Project Report, Østfold University College, Faculty of Computer Sciences, Norway.

[38] Symbian (Symbian Ltd.), 2004, Symbian OS - the mobile operating system. http://www.symbian.com/.

[39] Kentaro Toyama, Ron Logan, Asta Roseway and P. Anandan, 2003, Geographic Location Tags on Digital Images, Proceedings of ACM Multimedia 2003, http://wwmx.org/docs/wwmx_acm2003.pdf.

[40] VG, 2004, MMS tar helt av. (in Norwegian) http://www.vg.no/pub/vgart.hbs?artid=251070.

[41] W3C (World Wide Web Consortium), 2001, Synchronized Multimedia Integration Language (SMIL 2.0). http://www.w3.org/TR/smil20/.

[42] Waag Society, 2004, Amsterdam Realtime. http://www.waag.org/realtime/.

[43] WaveMarket, 2003, WaveBlog. http://www.waveblog.com/.