Utrecht University

Question Based Analysis

MSc Thesis Topics

An empirical user study of GIS expert strategies in solving geo-spatial tasks

What the student may need to learn:

  • CTA framework;
  • Ability to design and conduct controlled psychological studies;
  • Insight into behavioral measurement tools;
  • Expert strategies for GIS-based analysis;
  • Behavioral correlates of cognitive processes.

Summary:
Geo-spatial analysis with GIS is a complex task that requires comprehensive domain-specific problem-solving skills from GIS experts. Geo-spatial analysis with GIS (Vahedi, Kuhn, & Ballatore, 2016) starts with the identification of an overall objective (e.g., a phenomenon to be analyzed). The objective is decomposed into a set of analytic steps, each leading to a workflow in which one or more analytic tools are applied to datasets. The goal of this thesis is to investigate (1) the overall problem-solving strategies that enable experts to solve a GIS task efficiently, and (2) the inherent task properties, tacit and explicit expert knowledge, and cognitive processes that help experts define and execute those strategies. The outcome of this study will advance the understanding of the human factors that should be addressed when automating GIS-based problem-solving.
Methodologically, this study will employ the Cognitive Task Analysis framework (CTA; Zachary, Ryder, & Hicinbothom, 1998; Seamster, Redding, Cannon, Ryder, & Purcell, 1993; Wilson, Fernandez, & Hadaway, 1993) from the domains of Experimental Psychology and Cognitive Science. CTA entails observing human experts as they solve a complex task in a controlled laboratory setup, using a variety of behavioral measurement tools such as screen and audio recording, mouse tracing, interview questionnaires, etc.

Requirements:
This study will be conducted as part of a running research project focusing on geospatial concepts and question-answering. However, the student should be able to demonstrate a good ability to independently design and conduct a laboratory experiment with human participants for given research objectives. The student should have a good understanding of, or at least be able to quickly learn about, the human factors that facilitate expert problem-solving. Finally, knowledge of GIS-based analysis is necessary.

Contact:
Simon Scheider (s.scheider@uu.nl), Enkhbold Nyamsuren (e.nyamsuren@uu.nl)

References:

  1. Behzad Vahedi, Werner Kuhn, and Andrea Ballatore 2016. Question-based spatial computing. A case study. In Geospatial Data in a Changing World, pages 37–50. Springer
  2. Zachary, W. W., Ryder, J. M., & Hicinbothom, J. H. (1998). Cognitive task analysis and modeling of decision making in complex environments.
  3. Seamster, T. L., Redding, R. E., Cannon, J. R., Ryder, J. M., & Purcell, J. A. (1993). Cognitive task analysis of expertise in air traffic control. The international journal of aviation psychology, 3(4), 257-283.
  4. Wilson, J. W., Fernandez, M. L., & Hadaway, N. (1993). Mathematical problem solving. Research ideas for the classroom: High school mathematics, 57, 78.

An empirical annotation study of geo-analytic questions

What the student may need to learn:
The core concepts of spatial information, natural language processing (NLP), annotation methods, geo-analytic question categories, and Python.

Summary:
To understand the structure of spatial questions in GIS, it is necessary to analyze such questions in terms of syntax and semantics. This master's thesis contributes to that end by designing a study to test question annotations. The candidate should then apply linguistic methods to annotate and categorize questions and to find common patterns in a question corpus. The thesis can address the following research questions:

  • What are the types of spatial concepts used in these questions and how can they be grouped?
  • Which syntactic patterns are used to express a given concept? For example: “How far”, “How close”, “What is the distance?”, “What is the nearest?”
  • How reliably can these categories be found and annotated in the corpus?

The goal of this thesis is to evaluate the question categories of geo-analytic questions. Our own question categories are designed based on the core concepts of spatial information and typical GIS question types, such as the location of a field (“What areas have a slope larger than 10% in Spain”) and point distribution (“What are the locations of the individual crimes in Amsterdam”). A geo-analytic question corpus is provided. A group of around five GIS experts might be sufficient for annotating these questions. Based on their understanding of the geo-analytic questions, each expert will annotate the question categories and the syntactic components corresponding to those categories. By comparing the annotation results using statistical packages in Python, we will analyze the inter-annotator agreement on the categories for each question, as well as the mapping between syntactic components and categories.
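As an illustration of the agreement analysis described above, the following is a minimal sketch of Fleiss' kappa, a common agreement measure for more than two annotators. It is only one possible choice of measure, not necessarily the one the project will use. The function name, the category labels, and the toy annotations are hypothetical, and the sketch assumes every question is labeled by the same number of annotators.

```python
from collections import Counter


def fleiss_kappa(annotations):
    """Fleiss' kappa for a list of items, each a list of labels (one per annotator).

    Assumes every item was labeled by the same number of annotators (at least two).
    """
    n_items = len(annotations)
    n_raters = len(annotations[0])
    if n_raters < 2 or any(len(item) != n_raters for item in annotations):
        raise ValueError("each item needs the same number (>= 2) of annotations")

    counts = [Counter(item) for item in annotations]
    categories = {label for item in annotations for label in item}

    # Observed agreement: share of agreeing annotator pairs, averaged over items.
    per_item = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_observed = sum(per_item) / n_items

    # Expected agreement by chance, from the overall category proportions.
    proportions = [
        sum(cnt[cat] for cnt in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        return float("nan")  # only one category was ever used: kappa is undefined
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: three questions, five annotators each.
toy_annotations = [
    ["field", "field", "field", "field", "object"],
    ["object", "object", "object", "object", "object"],
    ["field", "object", "field", "field", "field"],
]
toy_kappa = fleiss_kappa(toy_annotations)
```

Values near 1 indicate strong agreement, values near 0 indicate agreement no better than chance, and negative values indicate systematic disagreement.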

Requirements:
It is helpful to have a GIS background and some interest in NLP and annotation techniques.

Contact:
Simon Scheider (s.scheider@uu.nl), Haiqi Xu (h.xu1@uu.nl)

References:

  1. Artstein, R. (2017). Inter-annotator agreement. In Handbook of linguistic annotation (pp. 297-313). Springer, Dordrecht
  2. Bergman M.K. (2018) Testing and Best Practices. In: A Knowledge Representation Practionary. Springer, Cham (the entire book is worth reading)
  3. Kuhn, W. (2012). Core concepts of spatial information for transdisciplinary research. International Journal of Geographical Information Science, 26(12), 2267-2276.

An Ontology of Geographic Units of Measure for GIS-based analysis

What the student may need to learn:
Application of Semantic Web technologies and ontologies in a GIS context. Semantic annotation of spatial data. Design and development of applied ontologies. Existing ontologies within and outside of the geographic domain.

Summary:
GIS-based analysis and transformation of data require a human analyst to interpret which measurements are contained in the data and which analytical transformations are applicable to them. To facilitate automated geographic question answering, it is necessary to capture this expert knowledge in a machine-interpretable form such as an ontology. This study will focus on the design and implementation of an applied ontology of units of measurement that commonly occur in spatial data and in tools for GIS-based analysis. First, the ontology should describe the units on at least three different levels: the conceptual level (e.g., distance), the unit level (e.g., meter), and the scale level (e.g., ratio). The ontology should be tailored to describing spatial data and GIS-based analysis. Second, the ontology should define how different units may be related to each other, for example via transformation (e.g., a transformation of meter to square meter). The ontology should be built from existing general ontologies of units of measurement (e.g., Rijgersberg, van Assem, & Top, 2013; Gkoutos, Schofield, & Hoehndorf, 2012) and ontologies for geographical data (e.g., Sun et al., 2019). Therefore, a quality review of the existing literature and ontologies is an important first phase of this study. The second phase should be devoted to the design and implementation of the ontology. The third phase should be devoted to ontology validation (e.g., using the ontology to annotate existing spatial datasets such as those in PDOK).
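The three description levels and the transformation relation mentioned above can be sketched as a small data structure. This is only an illustrative sketch in plain Python, not an ontology implementation in OWL or RDF. All class names, field names, and the example transformation description are hypothetical; the four scale levels listed are the usual nominal, ordinal, interval, and ratio levels of measurement.

```python
from dataclasses import dataclass
from enum import Enum


class Scale(Enum):
    """Scale level of a measurement."""
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


@dataclass(frozen=True)
class Unit:
    name: str      # unit level, e.g. "meter"
    concept: str   # conceptual level, e.g. "distance"
    scale: Scale   # scale level, e.g. ratio


@dataclass(frozen=True)
class Transformation:
    source: Unit
    target: Unit
    description: str  # free-text note on how the target is derived (hypothetical)


METER = Unit(name="meter", concept="distance", scale=Scale.RATIO)
SQUARE_METER = Unit(name="square meter", concept="area", scale=Scale.RATIO)

TRANSFORMATIONS = [
    Transformation(METER, SQUARE_METER, "product of two lengths"),
]


def reachable_units(unit, transformations):
    """Return the units that can be derived directly from the given unit."""
    return [t.target for t in transformations if t.source == unit]
```

In an actual ontology, the same information would more likely be expressed as classes and properties that reuse terms from existing unit ontologies, which is part of what the literature review is meant to establish.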

Requirements:
Familiarity with spatial data, GIS-based analysis. Basic knowledge of Semantic Web and ontology technologies such as OWL and RDF.

Contact:
Enkhbold Nyamsuren (e.nyamsuren@uu.nl), Simon Scheider (s.scheider@uu.nl)

References:

  1. Rijgersberg, H., van Assem, M., & Top, J. (2013). Ontology of units of measure and related concepts. Semantic Web, 4(1), 3-13.
  2. Gkoutos, G. V., Schofield, P. N., & Hoehndorf, R. (2012). The Units Ontology: a tool for integrating units of measurement in science. Database, 2012.
  3. Sun, K., Zhu, Y., Pan, P., Hou, Z., Wang, D., Li, W., & Song, J. (2019). Geospatial data ontology: the semantic foundation of geospatial data integration and sharing. Big Earth Data, 3(3), 269-296.
  4. PDOK spatial dataset portal. https://www.pdok.nl/datasets
  5. OM ontology. https://github.com/HajoRijgersberg/OM
  6. UO ontology. https://bioportal.bioontology.org/ontologies/UO

Geodata source retrieval in PDOK with multilingual/semantic keywords

What the student may need to learn:
PDOK, Python, linguistic APIs (WordNet, Google Translate), and querying a meta-database (SPARQL).

Summary:
The goal of this thesis is to help improve geodata source retrieval by applying natural language processing (NLP) techniques to the keywords and texts associated with a given geodata source in the Dutch data repository PDOK. Texts can be automatically translated via Google Translate, and queries can be expanded using WordNet. Evaluation involves creating a manual gold standard for the retrieval of data for a defined set of data queries, then testing the quality of the automated keyword-based approach and comparing it with the standard search available in PDOK.
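The retrieval and evaluation idea described above can be sketched as follows. This is a toy illustration under strong assumptions: the small translation and synonym dictionaries are hypothetical stand-ins for Google Translate and WordNet, the catalogue entries and dataset identifiers are invented, and the functions are not part of any PDOK API.

```python
# Hypothetical stand-in for machine translation (Dutch -> English).
TRANSLATIONS = {"wegen": "roads", "gebouwen": "buildings", "water": "water"}

# Hypothetical stand-in for WordNet-style synonym lookup.
SYNONYMS = {"streets": {"roads", "highways"}, "houses": {"buildings"}}

# Invented catalogue: dataset id -> Dutch keywords from its metadata.
CATALOGUE = {
    "ds1": ["wegen"],
    "ds2": ["gebouwen"],
    "ds3": ["water"],
}


def translate_keywords(keywords):
    """Translate metadata keywords, keeping unknown words unchanged."""
    return {TRANSLATIONS.get(k.lower(), k.lower()) for k in keywords}


def expand_query(keywords):
    """Add synonyms to the query terms."""
    expanded = set()
    for keyword in keywords:
        term = keyword.lower()
        expanded.add(term)
        expanded |= SYNONYMS.get(term, set())
    return expanded


def retrieve(query_keywords, catalogue):
    """Return ids of datasets whose translated keywords overlap the expanded query."""
    terms = expand_query(query_keywords)
    return {
        ds_id
        for ds_id, keywords in catalogue.items()
        if terms & translate_keywords(keywords)
    }


def precision_recall(retrieved, relevant):
    """Compare a retrieved set against a manually created gold standard."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall
```

Running the same gold-standard queries through the standard PDOK search and through the expanded search would then give two sets of precision and recall values to compare.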

Requirements:
It is helpful to speak Dutch in order to understand PDOK. Some programming skills in Python or a query language are needed.

Contact:
Simon Scheider (s.scheider@uu.nl)

References:

  1. Purves, Ross, and Christopher Jones. “Geographic information retrieval.” SIGSPATIAL Special 3.2 (2011): 2-4
  2. Mandl, Thomas, et al. “GeoCLEF 2007: The CLEF 2007 cross-language geographic information retrieval track overview.” Workshop of the Cross-Language Evaluation Forum for European Languages. Springer, Berlin, Heidelberg, 2007
  3. Gong, Zhiguo, Chan Wa Cheang, and U. Leong Hou. “Web query expansion by WordNet.” International Conference on Database and Expert Systems Applications. Springer, Berlin, Heidelberg, 2005

Geodata source retrieval in PDOK using the CCDT ontology

What the student may need to learn:
PDOK, Python, the Core Concept Data Types (CCDT) ontology, and querying a linked database (SPARQL).

Summary:
The goal of this thesis is to help improve geodata source retrieval by applying the Core Concept Data Types (CCDT) ontology to the Dutch data repository PDOK. Core concept data types provide a way to describe both the geodata type and the semantics of geodata in an application-independent way. They also provide a way to answer questions and to retrieve geoinformation precisely. Evaluation involves creating a manual gold standard for the retrieval of data for a defined set of data queries, then testing the quality of the automated CCDT-based approach and comparing it with the standard search available in PDOK.

Requirements:
It is helpful to speak Dutch in order to understand PDOK. Some programming experience in Python and SPARQL is needed.

Contact:
Simon Scheider (s.scheider@uu.nl)

References:

  1. Scheider, S., Meerlo, R., Kasalica, V. and Lamprecht, A.-L. (2020a) Ontology of core concept data types for answering geo-analytical questions. Journal of Spatial Information Science. URL: https://www.josis.org/index.php/josis/article/view/555. In press
  2. Purves, Ross, and Christopher Jones. “Geographic information retrieval.” SIGSPATIAL Special 3.2 (2011): 2-4
  3. Janowicz, Krzysztof, Martin Raubal, and Werner Kuhn. “The semantics of similarity in geographic information retrieval.” Journal of Spatial Information Science 2011.2 (2011): 29-57 (http://www.josis.org/index.php/josis/article/viewArticle/26)
  4. Manso-Callejo, Miguel, Mónica Wachowicz, and Miguel Bernabé-Poveda. “The design of an automated workflow for metadata generation.” Research Conference on Metadata and Semantic Research. Springer, Berlin, Heidelberg, 2010

Assessing the potential of question-based querying systems in Spatial Data Infrastructures of organizations

What the student may need to learn:
Theory and practice of Spatial Data Infrastructures

Summary:
The goal of this thesis is to assess the potential of question-based querying systems in Spatial Data Infrastructures (SDIs). An SDI is a framework for the management and distribution of an organization’s spatial data. Question-based querying techniques could substantially improve an SDI’s data retrieval system. However, an SDI serves a variety of user groups, each of which may have different preferences. Additionally, organizations may be reluctant to adopt question-based querying systems, which could pose challenges for system implementation. Understanding the benefits and drawbacks of such systems in the context of an SDI is pivotal to their successful development. The preferences and expectations of users of organizational SDIs should be identified using interviews, survey questionnaires, or both, and studied using qualitative and quantitative analysis methods. The results should be used to generate recommendations for the development and implementation of question-based querying systems in the context of SDIs.

Contact:
Simon Scheider (s.scheider@uu.nl), Eric Top (e.j.top@students.uu.nl)

References:

  1. Scheider, S., Nyamsuren, E., Kruiger, H., & Xu, H. (2020). Geo-analytical question-answering with GIS. International Journal of Digital Earth, 1-14
  2. Behzad Vahedi, Werner Kuhn, and Andrea Ballatore 2016. Question-based spatial computing. A case study. In Geospatial Data in a Changing World, pages 37–50. Springer
  3. Bernard, Lars, et al. “The European geoportal––one step towards the establishment of a European Spatial Data Infrastructure.” Computers, environment and urban systems 29.1 (2005): 15-31
  4. Konrad Höffner, Sebastian Walter, Edgard Marx, Ricardo Usbeck, Jens Lehmann, and Axel-Cyrille Ngonga Ngomo. Survey on challenges of question answering in the semantic web. Semantic Web, 8(6):895–920, 2017.
  5. Wei Chen. Developing a Framework for Geographic Question Answering Systems Using GIS, Natural Language Processing, Machine Learning, and Ontologies. PhD thesis, Ohio State University, 2014

Extracting spatio-temporal extents of online geodata resources

The number of geodata sources available on the Web exceeds our capacity for comprehension. Even as GIS experts, we frequently lose track of the available sources that are useful for a specific region of analysis, a specific time horizon, or a specific subject. To help GIS analysts find the right resources, it is necessary to extract meta-information about a source.
For example, in order to know whether OpenStreetMap (OSM) can be used for the analysis of urban accessibility or touristic infrastructure in a city like Utrecht, we need to find out whether OSM data is available in this region and for the themes of touristic points of interest or road infrastructure. How can we best describe online geodata resources using linked open data so that we can tell whether a resource might answer a specific question?
This master thesis topic focuses on developing and testing semantic descriptions of a common online data source such as OSM to facilitate question answering (QA) with data. Some examples of research questions are:

  • How to define or measure the spatio-temporal extent of a certain theme, e.g. touristic monuments?
  • How can this information be automatically extracted?
  • How to search for the themes within an extent?

We expect a linked dataset that contains the spatio-temporal extents of multiple themes in an online geodata source. The data set should be tested by querying extents and checking whether the underlying data source really covers the given request. The thesis should also address a reusable method of extraction.
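One simple way to approach the research questions above is to summarize each theme by a bounding box and a time range, then check whether a request falls inside them. The sketch below illustrates this under stated assumptions: the class names, the use of a last-edit date as the temporal attribute, and the sample coordinates are all hypothetical, and a bounding box is only a coarse approximation of where data for a theme actually exists.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Feature:
    theme: str          # e.g. "monument" or "road" (hypothetical theme labels)
    lon: float
    lat: float
    last_edited: date   # assumed temporal attribute of the feature


@dataclass(frozen=True)
class Extent:
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float
    start: date
    end: date

    def covers(self, lon, lat, day):
        """True if the point and date fall inside the bounding box and time range."""
        return (
            self.min_lon <= lon <= self.max_lon
            and self.min_lat <= lat <= self.max_lat
            and self.start <= day <= self.end
        )


def extents_by_theme(features):
    """Group features by theme and compute one spatio-temporal extent per theme."""
    grouped = {}
    for feature in features:
        grouped.setdefault(feature.theme, []).append(feature)
    return {
        theme: Extent(
            min_lon=min(f.lon for f in items),
            min_lat=min(f.lat for f in items),
            max_lon=max(f.lon for f in items),
            max_lat=max(f.lat for f in items),
            start=min(f.last_edited for f in items),
            end=max(f.last_edited for f in items),
        )
        for theme, items in grouped.items()
    }


# Invented sample features, not real OSM data.
SAMPLE_FEATURES = [
    Feature("monument", 5.10, 52.08, date(2018, 1, 1)),
    Feature("monument", 5.14, 52.10, date(2020, 1, 1)),
    Feature("road", 5.00, 52.00, date(2019, 5, 1)),
]
SAMPLE_EXTENTS = extents_by_theme(SAMPLE_FEATURES)
```

Testing such extents against the underlying source, as the topic requires, would show how often a bounding box claims coverage where the data is in fact sparse, which motivates finer-grained extent descriptions.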

Contact:
Simon Scheider (s.scheider@uu.nl)

References:

  1. Haklay, M. (2008). How good is OpenStreetMap information? A comparative study of OpenStreetMap and Ordnance Survey datasets for London and the rest of England
  2. Ribeiro, A. and Fonte, C.C., 2015. A methodology for assessing OpenStreetMap degree of coverage for purposes of land cover mapping. ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences, 2
  3. Girres, J.F. and Touya, G., 2010. Quality assessment of the French OpenStreetMap dataset. Transactions in GIS, 14(4), pp.435-459
  4. Mooney, P., Corcoran, P. and Winstanley, A.C., 2010, November. Towards quality metrics for OpenStreetMap. In Proceedings of the 18th SIGSPATIAL international conference on advances in geographic information systems (pp. 514-517). ACM.