Wrangle-OpenStreetMap-Data


In this project, I use data munging techniques to assess the quality of OpenStreetMap (OSM) data for the center of Singapore with regard to consistency and uniformity. The wrangling takes place programmatically, using Python for most of the process and SQL for items that need further attention once the data is in PostgreSQL.

The dataset describes the center of Singapore, covering an area from Clementi on the west to Bedok on the east, and from Serangoon on the north to Sentosa Island on the south. The dataset is 96 MB in size and can be downloaded from here

About the project

Scope

OpenStreetMap (OSM) is a collaborative project to create a free editable map of the world. The creation and growth of OSM have been motivated by restrictions on use or availability of map information across much of the world, and the advent of inexpensive portable satellite navigation devices.

In this project, I use data from https://www.openstreetmap.org and data munging techniques to assess the quality of the data in terms of validity, accuracy, completeness, consistency and uniformity.
The biggest part of the wrangling takes place programmatically using Python; the dataset is then loaded into a PostgreSQL database for further examination of any remaining elements that need attention. Finally, I perform some basic exploration and suggest ideas for additional improvements.

Skills demonstrated

  • Assessment of the quality of data for validity, accuracy, completeness, consistency and uniformity.
  • Parsing and gathering data from popular file formats such as .xml and .csv.
  • Processing data from very large files that cannot be cleaned with spreadsheet programs.
  • Storing, querying, and aggregating data using SQL.

The Dataset

OpenStreetMap's data are structured in well-formed XML documents (.osm files) that consist of the following elements:

  • Nodes: "Nodes" are individual dots used to mark specific locations (such as a postal box). Two or more nodes are used to draw line segments or "ways".
  • Ways: A "way" is a line of nodes, displayed as connected line segments. "Ways" are used to create roads, paths, rivers, etc.
  • Relations: When "ways" or areas are linked in some way but do not represent the same physical thing, a "relation" is used to describe the larger entity they are part of. "Relations" are used to create map features such as cycling routes, turn restrictions, and areas that are not contiguous. The multiple segments of a long way, such as an interstate or a state highway, are grouped into a "relation" for that highway. Another example is a national park with several locations that are separated from each other; those are also grouped into a "relation".

All these elements can carry tags describing the name, type of road, and other attributes.

For this project, I am using a custom .osm file for the center of Singapore, which I exported using the Overpass API. The dataset is 96 MB in size and can be downloaded from this link


Data Preparation

Imports and Definitions

%matplotlib inline

import xml.etree.cElementTree as ET
from collections import defaultdict
import re
import pprint
from operator import itemgetter
from difflib import get_close_matches

#For export to csv and data validation
import csv
import codecs
import cerberus

#For reverse geocoding
from geopy.geocoders import GoogleV3
geolocator = GoogleV3()
from geopy.exc import GeocoderTimedOut
#OSM downloaded from openstreetmap
SG_OSM = '../Helper/Singapore.osm'
#The following .csv files will be used for data extraction from the XML.
NODES_PATH = "../Helper/nodes.csv"
NODE_TAGS_PATH = "../Helper/nodes_tags.csv"
WAYS_PATH = "../Helper/ways.csv"
WAY_NODES_PATH = "../Helper/ways_nodes.csv"
WAY_TAGS_PATH = "../Helper/ways_tags.csv"
#Regular expressions
PROBLEMCHARS = re.compile(r'[=\+/&<>;\'\"\?%#$@\,\.\t\r\n]')
#A list to save elements that need further attention
PROBLEMATICS = []

Parsing the Data

The size of the dataset allows me to parse it into memory to speed up processing.
For a significantly bigger XML file, I would have to use the iterparse() function instead.

tree = ET.parse(SG_OSM)
root = tree.getroot()
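If the file outgrew memory, the iterparse() approach could look roughly like the sketch below; the tiny in-memory document and the counting logic are illustrative assumptions, while clearing each element after its 'end' event is what keeps memory bounded:

```python
import io
import xml.etree.ElementTree as ET

# A tiny stand-in for a large .osm file (illustrative data, not from the dataset).
OSM_SAMPLE = io.BytesIO(b"""<osm>
  <node id="1" lat="1.30" lon="103.85"/>
  <node id="2" lat="1.31" lon="103.86"/>
  <way id="3"><nd ref="1"/><nd ref="2"/></way>
</osm>""")

counts = {}
# 'end' events fire once an element is complete; clear() releases it,
# so memory use stays flat regardless of the file size.
for event, elem in ET.iterparse(OSM_SAMPLE, events=('end',)):
    if elem.tag in ('node', 'way'):
        counts[elem.tag] = counts.get(elem.tag, 0) + 1
        elem.clear()

print(counts)  # {'node': 2, 'way': 1}
```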

Data Assessment

An initial exploration of the dataset revealed the following problems:

  • Abbreviations of street types like ‘Av’ instead of ‘Avenue’ and ‘Rd’ instead of ‘Road’.
  • All lowercase letters like ‘street’ instead of ‘Street’.
  • Postcodes including the first letter (S xxxxxx) or the whole name (Singapore xxxxxx) of the country.
  • Postcodes omitting the leading ‘0’ (probably because they were declared as integers at some point before their import to OpenStreetMap).
  • Multi-abbreviated amenity names.

The problems with the amenity names were few, so they were corrected directly in the database. Most of the remaining problems were resolved programmatically using Python, while a small portion needed further assessment and was resolved in the database.

Auditing Street Types

To audit the street names, I first need to extract them from the XML.
The street names appear in two forms in the dataset:
In node and way elements, in the form: <tag k="addr:street" v="street_name"/>

<node id="337171253" lat="1.3028023" lon="103.8599300" version="3" timestamp="2015-08-01T01:38:25Z" changeset="33022579" uid="741163" user="JaLooNz">
    <tag k="addr:city" v="Singapore"/>
    <tag k="addr:country" v="SG"/>
    <tag k="addr:housenumber" v="85"/>
    <tag k="addr:postcode" v="198501"/>
    <tag k="addr:street" v="Sultan Gate"/>
    <tag k="fax" v="+65 6299 4316"/>
    <tag k="name" v="Malay Heritage Centre"/>
    <tag k="phone" v="+65 6391 0450"/>
    <tag k="tourism" v="museum"/>
    <tag k="website" v="http://malayheritage.org.sg/"/>
</node>

In way elements that have a <tag k="highway" .../> whose 'v' attribute is one of ['living_street', 'motorway', 'primary', 'residential', 'secondary', 'tertiary'], as <tag k="name" v="street_name"/>.

<way id="4386520" version="23" timestamp="2016-11-07T12:03:39Z" changeset="43462870" uid="2818856" user="CitymapperHQ">
    <nd ref="26778964"/>
    <nd ref="247749632"/>
    <nd ref="1275309736"/>
    <nd ref="1275309696"/>
    <nd ref="462263980"/>
    <nd ref="473019059"/>
    <nd ref="4486796339"/>
    <nd ref="1278204303"/>
    <nd ref="3689717007"/>
    <nd ref="246494174"/>
    <tag k="highway" v="primary"/>
    <tag k="name" v="Orchard Road"/>
    <tag k="oneway" v="yes"/>
</way>
def chk_for_street(element):
    '''Extracts addresses from elements.
    
    Args:
        element (element): An element of the XML tree
        
    Returns:
        Element: If the element has an address, returns the tag element carrying it; otherwise returns nothing.
        
    '''
    highway_types = [
        'living_street', 'motorway', 'primary', 'residential', 'secondary',
        'tertiary'
    ]
    tag = element.find("./tag[@k='addr:street']")
    if tag is None:
        if element.tag == 'way':
            tag = element.find("./tag[@k='highway']")
            try:
                if tag.get('v') in highway_types:
                    return element.find("./tag[@k='name']")
            except AttributeError:
                return
    else:
        return tag
    return
def get_street_names(tree):
    '''Creates a dictionary of street names for all elements in a given tree.
    
    Args:
        tree (ElementTree): An ElementTree object for which I want to find the street names
        
    Returns:
        dict: A dictionary with the following structure: {element_id:street_name}
        
    '''
    result = {}
    for path in ["./node", "./way"]:
        for element in tree.findall(path):
            try:
                result[element.get('id')] = chk_for_street(element).get('v')
            except (AttributeError):  #chk_for_street() returns nothing
                continue
    return result
street_names = get_street_names(root)
#Sample of the dictionary
pprint.pprint(dict(street_names.items()[:10]))
{'173767759': 'Lornie Road',
 '178436854': 'Upper Weld Road',
 '201275515': 'Jalan Novena',
 '241208738': 'Race Course Road',
 '241208739': 'Race Course Road',
 '260092923': 'Jalan Novena Utara',
 '334112229': 'Joo Chiat Road',
 '388268000': 'Waringin Park',
 '46818743': 'Bukit Purmei Avenue',
 '9590561': 'Merchant Road'}

I am searching for multiple versions of the same street type. The different versions include different abbreviations, like Street/St, or different letter cases, like Avenue/avenue.

Although most of the Singaporean street names end with the street type (e.g., “Serangoon Road” or “Arab Street”) it is also common to end with a number instead (e.g. “Bedok North Avenue 1”).
Thus, by using regular expressions, I extracted the last word that does not contain numbers from the street name.

st_types_re = re.compile(r'[a-zA-Z]+[^0-9]\b\.?')
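A quick sanity check of this pattern (the sample names are illustrative; the outputs follow from the regular expression itself):

```python
import re

st_types_re = re.compile(r'[a-zA-Z]+[^0-9]\b\.?')

# The last match is the street type, even when the name ends with a number.
print([m.strip() for m in st_types_re.findall('Bedok North Avenue 1')][-1])  # Avenue
print([m.strip() for m in st_types_re.findall('Arab Street')][-1])           # Street

# Names without any Latin-letter word produce no match at all; such elements
# raise an IndexError downstream and land in the PROBLEMATICS list.
print(st_types_re.findall('2'))  # []
```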

The result will be a dictionary with the format: {street_type:(set_of_street_names)}
I am also adding unexpected street names to the "PROBLEMATICS" list for further assessment.

def audit_st_types(streets):
    '''Extracts the "street type" part from an address
    
    Args:
        streets (dict): A dictionary containing street names in the form of {element_id:street_name}
        
    Returns:
        dict: A dictionary of street types in the form of 
        {street_type:(street_name_1,street_name_2,...,street_name_n)}
    
    '''
    result = defaultdict(set)
    for key, value in streets.iteritems():
        try:
            street_type = st_types_re.findall(value)[-1].strip()
        except (IndexError):  #One word or empty street names
            PROBLEMATICS.append((key, 'street name', value))
            continue
        result[street_type].add(value)

    return result
streets = audit_st_types(street_names)
#Sample of the dictionary
pprint.pprint(dict(streets.items()[:7]))
{'Fyover': set(['Ophir Fyover']),
 'Melor': set(['Jalan Melor']),
 'Pelatina': set(['Jln Pelatina']),
 'Riang': set(['Jalan Riang']),
 'Rise': set(['Ascot Rise',
              'Binchang Rise',
              'Binjai Rise',
              'Cairnhill Rise',
              'Canning Rise',
              'Clover Rise',
              'Dover Rise',
              'Goldhill Rise',
              'Greenleaf Rise',
              'Holland Rise',
              'Matlock Rise',
              'Mount Sinai Rise',
              'Novena Rise',
              'Oxley Rise',
              'Siglap Rise',
              'Slim Barracks Rise',
              'Telok Blangah Rise',
              'Toa Payoh Rise',
              'Watten Rise']),
 'Satu': set(['Jalan Satu', 'Lengkong Satu']),
 'Taman': set(['Jalan Taman'])}

It would be easy to populate the list of Common Street Types with some obvious values like Street or Avenue, but guessing does not take into account any local peculiarities. Instead, I searched the dataset for all the different types and used the 12 with the most occurrences (from the 13th position on, abbreviations start to appear).

def sort_street_types(street_types):
    '''Counts the number of appearances of each street type and sorts them.
    
    Args:
        street_types (dict): A dictionary of street types in the form of 
        {street_type:(street_name_1,street_name_2,...,street_name_n)}
        
    Returns:
        list: A sorted list of tuples where each tuple includes a 
        street type and the number of occurrences in the dataset.
    '''
    result = []
    for key, value in street_types.iteritems():
        result.append((key, len(value)))
    return sorted(result, key=itemgetter(1), reverse=True)
street_types = sort_street_types(streets)
#Print a sample of the list
street_types[:15]
[('Road', 576),
 ('Avenue', 145),
 ('Street', 139),
 ('Drive', 87),
 ('Lane', 79),
 ('Geylang', 42),
 ('Crescent', 42),
 ('Walk', 40),
 ('Park', 39),
 ('Close', 37),
 ('Link', 35),
 ('Terrace', 30),
 ('Ave', 29),
 ('Hill', 25),
 ('Place', 23)]
def populate_expected(street_types, threshold):
    '''Populates the Expected list
    
    Args:
        street_types (list): A sorted list of (street_type, #_of_appearances).
        threshold (int): The number of the top elements I want to put in the "expected" list.
        
    Returns:
        list: Returns a list of the x most frequent street types (x defined by "threshold")
    
     '''
    
    expected = []
    for i in street_types[:threshold]:
        expected.append(i[0])

    return expected
EXPECTED = populate_expected(street_types, 12)
EXPECTED
['Road',
 'Avenue',
 'Street',
 'Drive',
 'Lane',
 'Geylang',
 'Crescent',
 'Walk',
 'Park',
 'Close',
 'Link',
 'Terrace']

To find the street names that need correction, I used the “get_close_matches()” function from the difflib module to find “close matches” of the 12 Common Street Types. This is what I found:

def find_abbreviations(expected, data):
    """Uses get_close_matces() to find similar text
    
    Args:
        expected (list): A list of the expected street types.
        data (list): A list of all the different street types.
        
    Retturns: nothing
        
    """
    
    for i in expected:
        print i, get_close_matches(i, data, 4, 0.5)
find_abbreviations(EXPECTED, list(streets.keys()))
Road ['Road', 'road', 'Rd', 'Ria']
Avenue ['Avenue', 'Aenue', 'Avebue', 'Ave']
Street ['Street', 'street', 'See', 'Stangee']
Drive ['Drive', 'Grove', 'Grisek', 'Bridge']
Lane ['Lane', 'Lana', 'Lateh', 'Layang']
Geylang ['Geylang', 'Pelangi', 'Selangat', 'Selanting']
Crescent ['Crescent', 'Cresent', 'Cres', 'Green']
Walk ['Walk', 'walk', 'Wajek', 'Wakaff']
Park ['Park', 'park', 'Parkway', 'Paras']
Close ['Close', 'Cross', 'Circle', 'Flyover']
Link ['Link', 'Minyak', 'Bingka', 'Limu']
Terrace ['Terrace', 'Terrance', 'Ter', 'Tenteram']

Now I can map each variation to the form it was meant to be and correct all the different abbreviations of street types.

mapping = {
    'road': 'Road',
    'Rd': 'Road',
    'street': 'Street',
    'Ave': 'Avenue',
    'Avebue': 'Avenue',
    'Aenue': 'Avenue',
    'park': 'Park',
    'walk': 'Walk',
    'link': 'Link',
    'Cresent': 'Crescent',
    'Terrance': 'Terrace',
    'Ter': 'Terrace'
}
def update_street_type(tree):
    '''Corrects the dataset's street name according to the mapping
    
    Args:
        tree (ElementTree): An ElementTree object for which I want to clean the street names
    
    Returns: nothing
           
    '''
    changes = {}
    for path in ["./node", "./way"]:  #"relation" elements do not have street names.
        for element in tree.findall(path):
            try:
                tag = chk_for_street(element)
                street_name = tag.get('v')
            except AttributeError:  #In case the element doesn't have a "street name" attribute
                continue
            try:
                street_type = st_types_re.findall(street_name)[-1].strip()
            except (IndexError):
                #Leaves the problematic street names as is.
                #They are already in the PROBLEMATICS list.
                street_type = street_name

            if street_type in mapping:
                tag.attrib['v'] = tag.attrib['v'].replace(street_type,
                                                          mapping[street_type])

                if street_name not in changes:
                    changes[street_name] = [tag.attrib['v'], 1]
                else:
                    changes[street_name][1] += 1
    counter = 0
    for key, value in changes.iteritems():
        counter += value[1]
        if value[1] == 1:
            print key + ' ==> ' + value[0]
        else:
            print key + ' ==> ' + value[0] + " (" + str(value[1]) + " occurrences)"
    print str(counter) + " street names were fixed"
    update_street_type.called = True #Function attribute to track if a function has been called.
update_street_type.called = False
update_street_type(root)
Greenwood Ave ==> Greenwood Avenue (2 occurrences)
Smith street ==> Smith Street
Eunos Ave 7A ==> Eunos Avenue 7A
Arumugam Rd ==> Arumugam Road (3 occurrences)
Yarwood Ave ==> Yarwood Avenue (2 occurrences)
Eunos Ave 6 ==> Eunos Avenue 6
Eunos Ave 5 ==> Eunos Avenue 5
Bukit Batok East Ave 6 ==> Bukit Batok East Avenue 6 (2 occurrences)
Roseburn Ave ==> Roseburn Avenue
Read Cresent ==> Read Crescent
Sophia Rd ==> Sophia Road
Bedok North road ==> Bedok North Road (4 occurrences)
Chee Hoon Ave ==> Chee Hoon Avenue (2 occurrences)
Holland Grove Ter ==> Holland Grove Terrace
Serangoon Aenue 1 ==> Serangoon Avenue 1
Hua Guan Ave ==> Hua Guan Avenue (3 occurrences)
Elite Park Ave ==> Elite Park Avenue
Malcolm Rd ==> Malcolm Road
Eng Neo Ave ==> Eng Neo Avenue (2 occurrences)
Bukit Timah Rd ==> Bukit Timah Road (2 occurrences)
Stockport Rd ==> Stockport Road
Greenmead Ave ==> Greenmead Avenue
Ross Ave ==> Ross Avenue
Nanson road ==> Nanson Road
Raeburn park ==> Raeburn Park
Upper Wilkie Rd ==> Upper Wilkie Road
Tai Thong Cresent ==> Tai Thong Crescent
Clementi Ave 2 ==> Clementi Avenue 2 (3 occurrences)
Clementi Ave 1 ==> Clementi Avenue 1
First Hospital Ave ==> First Hospital Avenue
Wareham Rd ==> Wareham Road
31 Lower Kent Ridge Rd ==> 31 Lower Kent Ridge Road
Towner Rd ==> Towner Road
Greenleaf Ave ==> Greenleaf Avenue
1013 Geylang East Ave 3 ==> 1013 Geylang East Avenue 3
Lakme Terrance ==> Lakme Terrace
Ubi Ave 1 ==> Ubi Avenue 1
Daisy Ave ==> Daisy Avenue
Bayfront Avebue ==> Bayfront Avenue
Eunos Ave 5A ==> Eunos Avenue 5A
Commonwealth Cresent ==> Commonwealth Crescent
Pine walk ==> Pine Walk
Elite Terrance ==> Elite Terrace
Sian Tuan Ave ==> Sian Tuan Avenue
Tai Keng Ave ==> Tai Keng Avenue
Orchard Rd ==> Orchard Road
Wilmonar Ave ==> Wilmonar Avenue
Parkstone Rd ==> Parkstone Road
Kent Ridge Cresent ==> Kent Ridge Crescent (9 occurrences)
Vanda Ave ==> Vanda Avenue
Gloucester road ==> Gloucester Road (5 occurrences)
Tanjong Pagar Rd ==> Tanjong Pagar Road
Hougang Ave 3 ==> Hougang Avenue 3
Hougang Ave 1 ==> Hougang Avenue 1
Chempaka Ave ==> Chempaka Avenue
Greendale Ave ==> Greendale Avenue
83 street names were fixed

Auditing Postcodes

Postcodes in Singapore consist of 6 digits, with the first two denoting the Postal Sector and taking values between 01 and 80, excluding 74 (https://www.ura.gov.sg/realEstateIIWeb/resources/misc/list_of_postal_districts.htm).
I search the dataset for this pattern, correct whatever can be addressed automatically, and add the rest to "PROBLEMATICS" for further examination.
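The sector rule translates into the regular expression used below; a quick check against the kinds of values that appear in the data ('740123' is a made-up value for the excluded sector):

```python
import re

# First two digits between 01 and 80 excluding 74, followed by four more digits.
postcode_re = re.compile(r'(([0-6][0-9])|(7([0-3]|[5-9]))|80)[0-9]{4}')

print(postcode_re.search('S118556').group(0))           # 118556
print(postcode_re.search('Singapore 408564').group(0))  # 408564
print(postcode_re.search('740123'))                     # None (sector 74 is excluded)
```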

def fix_pcodes():
    """Tries to find an integer between 01 and 80, excluding 74 in the postcode field and
    if needed change the field value accordingly
    
    Args: No args
    
    Returns: Nothing
    """
    postcode_re = re.compile(
        r'(([0-6][0-9])|(7([0-3]|[5-9]))|80)[0-9]{4}')# first two digits between 01 and 80, excluding 74, followed by four more digits
    for element in root.findall(".//*[@k='addr:postcode']/.."):
        tag = element.find("./*[@k='addr:postcode']")
        postcode = tag.attrib['v']
        try:
            new_tag = postcode_re.search(postcode).group(0)
            if new_tag != postcode:
                tag.attrib['v'] = new_tag
                print postcode + ' ==> ' + tag.attrib['v']
        except (AttributeError):  # If you cannot extract a valid postcode, add the element to PROBLEMATICS
            PROBLEMATICS.append((element.get('id'), 'postcode', postcode))
    fix_pcodes.called = True #Function attribute to track if a function has been called.
fix_pcodes.called = False
fix_pcodes()
Singapore 408564 ==> 408564
S118556 ==> 118556
S 278989 ==> 278989

Postcodes were much more consistent than the street types, with 3 problems fixed programmatically and 8 pending further inspection.


Importing Dataset to Database

After performing most of the cleaning in Python, I can store the dataset in the database to examine the PROBLEMATIC elements and explore it further.
I am using PostgreSQL to present a generic solution, although a lightweight database like SQLite might be more appropriate.
First, I export the data to .csv files using the schema below, then create the tables and import the .csvs.

Exporting dataset to .CSVs

SCHEMA = {
    'node': {
        'type': 'dict',
        'schema': {
            'id': {'required': True, 'type': 'integer', 'coerce': int},
            'lat': {'required': True, 'type': 'float', 'coerce': float},
            'lon': {'required': True, 'type': 'float', 'coerce': float},
            'user': {'required': True, 'type': 'string'},
            'uid': {'required': True, 'type': 'integer', 'coerce': int},
            'version': {'required': True, 'type': 'string'},
            'changeset': {'required': True, 'type': 'integer', 'coerce': int},
            'timestamp': {'required': True, 'type': 'string'}
        }
    },
    'node_tags': {
        'type': 'list',
        'schema': {
            'type': 'dict',
            'schema': {
                'id': {'required': True, 'type': 'integer', 'coerce': int},
                'key': {'required': True, 'type': 'string'},
                'value': {'required': True, 'type': 'string'},
                'type': {'required': True, 'type': 'string'}
            }
        }
    },
    'way': {
        'type': 'dict',
        'schema': {
            'id': {'required': True, 'type': 'integer', 'coerce': int},
            'user': {'required': True, 'type': 'string'},
            'uid': {'required': True, 'type': 'integer', 'coerce': int},
            'version': {'required': True, 'type': 'string'},
            'changeset': {'required': True, 'type': 'integer', 'coerce': int},
            'timestamp': {'required': True, 'type': 'string'}
        }
    },
    'way_nodes': {
        'type': 'list',
        'schema': {
            'type': 'dict',
            'schema': {
                'id': {'required': True, 'type': 'integer', 'coerce': int},
                'node_id': {'required': True, 'type': 'integer', 'coerce': int},
                'position': {'required': True, 'type': 'integer', 'coerce': int}
            }
        }
    },
    'way_tags': {
        'type': 'list',
        'schema': {
            'type': 'dict',
            'schema': {
                'id': {'required': True, 'type': 'integer', 'coerce': int},
                'key': {'required': True, 'type': 'string'},
                'value': {'required': True, 'type': 'string'},
                'type': {'required': True, 'type': 'string'}
            }
        }
    }
}
NODE_FIELDS = ['id', 'lat', 'lon', 'user', 'uid', 'version', 'changeset', 'timestamp']
NODE_TAGS_FIELDS = ['id', 'key', 'value', 'type']
WAY_FIELDS = ['id', 'user', 'uid', 'version', 'changeset', 'timestamp']
WAY_TAGS_FIELDS = ['id', 'key', 'value', 'type']
WAY_NODES_FIELDS = ['id', 'node_id', 'position']
def shape_element(element):
    """Clean and shape node or way XML element to Python dict
    
    Args:
        element (element): An element of the XML tree
        
    Returns:
        dict: if element is a node, the node's attributes and tags.
              if element is a way, the way's attributes and tags along with the nodes that form the way.
    """
    node_attribs = {}
    way_attribs = {}
    way_nodes = []
    tags = []  # Handle secondary tags the same way for both node and way elements
    if element.tag == 'node':
        node_attribs['id'] = element.get('id')
        node_attribs['lat'] = element.get('lat')
        node_attribs['lon'] = element.get('lon')
        node_attribs['user'] = element.get('user')
        node_attribs['uid'] = element.get('uid')
        node_attribs['version'] = element.get('version')
        node_attribs['changeset'] = element.get('changeset')
        node_attribs['timestamp'] = element.get('timestamp')
        for child in element:
            if child.tag == 'tag':
                tag = {'id': node_attribs['id']}
                k = child.get('k')
                if not PROBLEMCHARS.search(k):
                    k = k.split(':', 1)
                    tag['key'] = k[-1]
                    tag['value'] = child.get('v')
                    if len(k) == 1:
                        tag['type'] = 'regular'
                    elif len(k) == 2:
                        tag['type'] = k[0]
                    tags.append(tag)
        return {'node': node_attribs, 'node_tags': tags}
    elif element.tag == 'way':
        counter = 0
        way_attribs['id'] = element.get('id')
        way_attribs['user'] = element.get('user')
        way_attribs['uid'] = element.get('uid')
        way_attribs['version'] = element.get('version')
        way_attribs['changeset'] = element.get('changeset')
        way_attribs['timestamp'] = element.get('timestamp')
        for child in element:
            if child.tag == 'tag':
                tag = {'id': way_attribs['id']}
                k = child.get('k')
                if not PROBLEMCHARS.search(k):
                    k = k.split(':', 1)
                    tag['key'] = k[-1]
                    tag['value'] = child.get('v')
                    if len(k) == 1:
                        tag['type'] = 'regular'
                    elif len(k) == 2:
                        tag['type'] = k[0]
                    tags.append(tag)
            if child.tag == 'nd':
                nd = {'id': way_attribs['id']}
                nd['node_id'] = child.get('ref')
                nd['position'] = counter
                way_nodes.append(nd)
                counter += 1
        return {'way': way_attribs, 'way_nodes': way_nodes, 'way_tags': tags}
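The key-splitting convention in shape_element ('addr:street' becomes type 'addr' and key 'street'; keys without a colon get type 'regular') can be exercised in isolation. This is a self-contained sketch of that convention on a fragment of the node shown earlier, not a call to shape_element itself:

```python
import re
import xml.etree.ElementTree as ET

PROBLEMCHARS = re.compile(r'[=\+/&<>;\'\"\?%#$@\,\.\t\r\n]')

node = ET.fromstring("""<node id="337171253">
  <tag k="addr:street" v="Sultan Gate"/>
  <tag k="tourism" v="museum"/>
</node>""")

shaped = []
for child in node.findall('tag'):
    k = child.get('k')
    if PROBLEMCHARS.search(k):
        continue  # in this sketch, keys with problematic characters are skipped
    parts = k.split(':', 1)  # split only on the first colon
    shaped.append({
        'id': node.get('id'),
        'key': parts[-1],
        'value': child.get('v'),
        'type': parts[0] if len(parts) == 2 else 'regular',
    })

print(shaped[0]['type'], shaped[0]['key'])  # addr street
print(shaped[1]['type'], shaped[1]['key'])  # regular tourism
```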
def validate_element(element, validator, schema=SCHEMA):
    """Raise ValidationError if element does not match schema
    
    Args:
        element (element): An element of the tree
        validator (cerberus.validator): a validator
        schema (dict): The schema to validate element against.
        
    Returns:
        Nothing
        
        """
    if validator.validate(element, schema) is not True:
        field, errors = next(validator.errors.iteritems())
        message_string = "\nElement of type '{0}' has the following errors:\n{1}"
        error_string = pprint.pformat(errors)

        raise Exception(message_string.format(field, error_string))
class UnicodeDictWriter(csv.DictWriter, object):
    """Extend csv.DictWriter to handle Unicode input"""

    def writerow(self, row):
        super(UnicodeDictWriter, self).writerow({
            k: (v.encode('utf-8') if isinstance(v, unicode) else v)
            for k, v in row.iteritems()
        })

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)
def process_map(validate=True):
    """Iteratively process each XML element and write to csv(s)
    
    Args:
        validate (bool): Whether to validate the data before writing them to csv
        
    Returns:
        Nothing
    """

    with codecs.open(NODES_PATH, 'w') as nodes_file, \
         codecs.open(NODE_TAGS_PATH, 'w') as nodes_tags_file, \
         codecs.open(WAYS_PATH, 'w') as ways_file, \
         codecs.open(WAY_NODES_PATH, 'w') as way_nodes_file, \
         codecs.open(WAY_TAGS_PATH, 'w') as way_tags_file:

        nodes_writer = UnicodeDictWriter(nodes_file, NODE_FIELDS)
        node_tags_writer = UnicodeDictWriter(nodes_tags_file, NODE_TAGS_FIELDS)
        ways_writer = UnicodeDictWriter(ways_file, WAY_FIELDS)
        way_nodes_writer = UnicodeDictWriter(way_nodes_file, WAY_NODES_FIELDS)
        way_tags_writer = UnicodeDictWriter(way_tags_file, WAY_TAGS_FIELDS)

        nodes_writer.writeheader()
        node_tags_writer.writeheader()
        ways_writer.writeheader()
        way_nodes_writer.writeheader()
        way_tags_writer.writeheader()

        validator = cerberus.Validator()
        
        #Check that the dataset has been cleaned
        if update_street_type.called is not True:
            update_street_type(root)
            
        if fix_pcodes.called is not True:
            fix_pcodes()            

        for element in root.findall("./*"):
            el = shape_element(element)
            if el:
                if validate is True:
                    validate_element(el, validator)

                if element.tag == 'node':
                    nodes_writer.writerow(el['node'])
                    node_tags_writer.writerows(el['node_tags'])
                elif element.tag == 'way':
                    ways_writer.writerow(el['way'])
                    way_nodes_writer.writerows(el['way_nodes'])
                    way_tags_writer.writerows(el['way_tags'])
process_map()

Connection to the database

To connect Jupyter Notebook to PostgreSQL, I am using ipython-sql

"""Loading the ipython-sql module"""
%load_ext sql

"""Disabling the printing of number of rows affected by each query"""
%config SqlMagic.feedback=False

"""Connecting to the database"""
%sql postgresql://jupyter_user:notebook@localhost/Project_3
u'Connected: jupyter_user@Project_3'

Creation of the Tables

I am using ON DELETE CASCADE on the tables with foreign keys to ensure that "tags" are dropped along with their related "elements".

%%sql
CREATE TABLE public.nodes
(
  id bigint NOT NULL,
  lat real,
  lon real,
  "user" text,
  uid integer,
  version integer,
  changeset integer,
  "timestamp" text,
  CONSTRAINT nodes_pkey PRIMARY KEY (id)
);

CREATE TABLE public.nodes_tags
(
  id bigint,
  key text,
  value text,
  type text,
  CONSTRAINT nodes_tags_id_fkey FOREIGN KEY (id)
      REFERENCES public.nodes (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE
);

CREATE TABLE public.ways
(
  id bigint NOT NULL,
  "user" text,
  uid integer,
  version text,
  changeset integer,
  "timestamp" text,
  CONSTRAINT ways_pkey PRIMARY KEY (id)
);

CREATE TABLE public.ways_nodes
(
  id bigint NOT NULL,
  node_id bigint NOT NULL,
  "position" integer NOT NULL,
  CONSTRAINT ways_nodes_id_fkey FOREIGN KEY (id)
      REFERENCES public.ways (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT ways_nodes_node_id_fkey FOREIGN KEY (node_id)
      REFERENCES public.nodes (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE
);

CREATE TABLE public.ways_tags
(
  id bigint NOT NULL,
  key text NOT NULL,
  value text NOT NULL,
  type text,
  CONSTRAINT ways_tags_id_fkey FOREIGN KEY (id)
      REFERENCES public.ways (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE
);
[]

Importing the data

After copying the files to the remote server, I can import them into the database.

%%sql
COPY public.nodes
FROM '/home/yannis/Projects/Data_Analysis/Wrangle-OpenStreetMap-Data/Helper/nodes.csv'
CSV HEADER;

COPY public.nodes_tags
FROM '/home/yannis/Projects/Data_Analysis/Wrangle-OpenStreetMap-Data/Helper/nodes_tags.csv'
CSV HEADER;

COPY public.ways
FROM '/home/yannis/Projects/Data_Analysis/Wrangle-OpenStreetMap-Data/Helper/ways.csv'
CSV HEADER;

COPY public.ways_nodes
FROM '/home/yannis/Projects/Data_Analysis/Wrangle-OpenStreetMap-Data/Helper/ways_nodes.csv'
CSV HEADER;

COPY public.ways_tags
FROM '/home/yannis/Projects/Data_Analysis/Wrangle-OpenStreetMap-Data/Helper/ways_tags.csv'
CSV HEADER;
[]

Data assessment in the database

pprint.pprint(PROBLEMATICS)
[('453243296', 'street name', '2'),
 ('46649997', 'street name', u'\u516c\u53f865'),
 ('453227146', 'street name', u'\u516c\u53f865'),
 ('453253763', 'street name', '2'),
 ('169844052', 'street name', '310074'),
 ('1318498347', 'postcode', '135'),
 ('3026819436', 'postcode', '2424'),
 ('3756813987', 'postcode', '05901'),
 ('4338649392', 'postcode', '88752'),
 ('4496749591', 'postcode', '#B1-42'),
 ('23946435', 'postcode', '437 437'),
 ('169844052', 'postcode', '74'),
 ('172769494', 'postcode', '05901')]

I am querying the database for the above elements.

def element_type(element_id):
    """From the element's id, returns the element's type.
    (I need to know the element's type, to run the appropriate query.)
    
    Args:
        element_id (str): The 'id' of the element
        
    Returns:
        (str): The tag of the element.
    """
    path = str('./*[@id="' + element_id + '"]')
    
    return root.find(path).tag

def get_element_tags(element_id):
    """Returns all the tags for a specific element
    
    Args:
        element_id (str):  The 'id' of the element
        
    Returns (sql.run.ResultSet): The result of the query
    
    """
    if element_type(element_id) == 'node':
        tag = %sql SELECT 'Node' AS el_type, * FROM nodes_tags WHERE nodes_tags.id = $element_id
    elif element_type(element_id) == 'way':
        tag = %sql SELECT 'Way' AS el_type, * FROM ways_tags WHERE ways_tags.id = $element_id
    return tag
get_element_tags(PROBLEMATICS[0][0])
el_type id key value type
Way 453243296 street 2 addr
Way 453243296 building yes regular
get_element_tags(PROBLEMATICS[2][0])
el_type id key value type
Way 453227146 housenumber 65 addr
Way 453227146 postcode 119936 addr
Way 453227146 street 公司65 addr
Way 453227146 building yes regular
Way 453227146 name Habourlink regular

For elements 453243296 and 453253763, it looks like the "street" value ("2") is actually a house number.

%%sql
UPDATE ways_tags SET key = 'housenumber'
WHERE (id = '453243296' OR id = '453253763') AND key = 'street'
[]
get_element_tags(PROBLEMATICS[1][0])
el_type id key value type
Way 46649997 housenumber 65 addr
Way 46649997 postcode 119936 addr
Way 46649997 street 公司65 addr
Way 46649997 building yes regular
Way 46649997 name Habourlink regular
get_element_tags(PROBLEMATICS[3][0])
el_type id key value type
Way 453253763 building yes regular
Way 453253763 housenumber 2 addr

"Habourlink" is a place of worship on "Alexandra Terrace", so I am correcting the street value for these two elements.

%%sql
UPDATE ways_tags SET value = 'Alexandra Terrace'
WHERE (id = '453227146' OR id = '46649997') AND key = 'street'
[]
get_element_tags(PROBLEMATICS[4][0])
el_type id key value type
Way 169844052 postcode 74 addr
Way 169844052 street 310074 addr
Way 169844052 building yes regular
Way 169844052 name Toa Payoh Vista Market regular

"Toa Payoh Vista Market" is at 74 Lor 4 Toa Payoh (https://goo.gl/maps/UpoYE2Q4owm).

%%sql
INSERT INTO ways_tags VALUES ('169844052','housenumber','74','addr');

UPDATE ways_tags SET value = '310074' WHERE id ='169844052' AND key = 'postcode';

UPDATE ways_tags SET value = 'Lor 4 Toa Payoh' WHERE id = '169844052' AND key = 'street'
[]
get_element_tags(PROBLEMATICS[5][0])
el_type id key value type
Node 1318498347 postcode 135 addr
Node 1318498347 street Jln Pelatina addr
Node 1318498347 building yes regular

It looks like "135" is a house number, not a postcode. To find missing parts of an address (like a postcode), I query the Google Maps API.

def complete_address(address):
    """Tries to find the full address from a partial address
    (e.g. one missing the postcode) and prints the result.
    
    Args:
        address (str): Partial address
    """
    while True:  # Retry the call in case of a GeocoderTimedOut error
        try:
            location = geolocator.geocode(address)
            print location.raw['formatted_address']
        except GeocoderTimedOut:
            continue
        break
complete_address(' 135 Jln Pelatina, Singapore')
135 Jln Pelatina, Singapore

Since I cannot find the postcode, I am just changing the key from "postcode" to "housenumber".

%sql UPDATE nodes_tags SET key = 'housenumber' WHERE id = '1318498347' AND value = '135'
[]
get_element_tags(PROBLEMATICS[6][0])
el_type id key value type
Node 3026819436 city Singapore addr
Node 3026819436 housenumber 136 addr
Node 3026819436 postcode 2424 addr
Node 3026819436 street Orchard Road addr
Node 3026819436 landuse retail regular
Node 3026819436 name Liana Medic Ltd regular
Node 3026819436 bitcoin yes payment
Node 3026819436 phone +65 2424666 regular
Node 3026819436 website http://www.lianamedic.com/ regular
complete_address(' 136 Orchard Road, Singapore')
136 Orchard Rd, Singapore

I cannot find a valid postcode. I am deleting the specific tag.

%sql DELETE FROM nodes_tags WHERE id = '3026819436' AND key = 'postcode'
[]
get_element_tags(PROBLEMATICS[7][0])
el_type id key value type
Node 3756813987 city Singapore addr
Node 3756813987 country SG addr
Node 3756813987 housenumber 6 addr
Node 3756813987 postcode 05901 addr
Node 3756813987 street Sago Street addr
Node 3756813987 amenity restaurant regular
Node 3756813987 name Wonderful Food and Beverage regular
Node 3756813987 phone +65 9108 5572 regular
complete_address(' 6 Sago Street, Singapore')
6 Sago St, Singapore 059011
%sql UPDATE nodes_tags SET value = '059011' WHERE id = '3756813987' AND key = 'postcode'
[]
get_element_tags(PROBLEMATICS[8][0])
el_type id key value type
Node 4338649392 postcode 88752 addr
Node 4338649392 name Exit Plan regular
Node 4338649392 tourism attraction regular

The specific amenity is at "279 New Bridge Road, Singapore 088752" (https://goo.gl/maps/K7JzQ3Ujsvq)

%%sql
UPDATE nodes_tags SET value = '088752' WHERE id = '4338649392' AND key = 'postcode';

INSERT INTO nodes_tags VALUES 
('4338649392','housenumber','279','addr'),
('4338649392','street','New Bridge Road','addr')
[]
get_element_tags(PROBLEMATICS[9][0])
el_type id key value type
Node 4496749591 city Singapore addr
Node 4496749591 postcode #B1-42 addr
Node 4496749591 street Bukit Timah Shopping Centre addr
Node 4496749591 name GP Tuition regular

This amenity does not exist anymore (https://www.bukittimahshoppingcentre.sg/directory/). I'm deleting the whole node.

%sql DELETE FROM nodes WHERE id = '4496749591'
[]
get_element_tags(PROBLEMATICS[10][0])
el_type id key value type
Way 23946435 city Singapore addr
Way 23946435 country SG addr
Way 23946435 housenumber 80 addr
Way 23946435 postcode 437 437 addr
Way 23946435 street Rhu Cross addr
Way 23946435 description 18 hole Par 72 regular
Way 23946435 leisure golf_course regular
Way 23946435 name Marina Bay Golf Course regular
Way 23946435 phone +65 6345 7788 regular
Way 23946435 website http://www.mbgc.com.sg/ regular
complete_address(' 80 Rhu Cross, Singapore')
80 Rhu Cross, Singapore 437437
%sql UPDATE ways_tags SET value = '437437' WHERE id = '23946435' AND key = 'postcode'
[]

"Way" element 169844052 has been corrected already.

get_element_tags(PROBLEMATICS[12][0])
el_type id key value type
Way 172769494 city Singapore addr
Way 172769494 country SG addr
Way 172769494 housenumber 6 addr
Way 172769494 postcode 05901 addr
Way 172769494 street Sago Street addr
Way 172769494 building yes regular
Way 172769494 source Bing regular
complete_address('6 Sago Street, Singapore')
6 Sago St, Singapore 059011
%sql UPDATE ways_tags SET value = '059011' WHERE id = '172769494' AND key = 'postcode'
[]

Data Exploration

Dataset Specific

Below you may find some basic attributes of the dataset.

Size of the database

size_in_bytes = %sql SELECT pg_database_size('Project_3');
print "DB size: " + str(size_in_bytes[0][0]/1024**2) + ' MB'
DB size: 96 MB

Number of Unique Users

%%sql
SELECT count(DISTINCT(uid)) AS "Unique Users"
FROM (SELECT uid FROM nodes 
      UNION 
      SELECT uid FROM ways) AS elements;
Unique Users
848

Top 10 Users (by number of entries)

%%sql
SELECT nodes_ways."user" AS "User", COUNT(*) AS "Entries"
FROM (SELECT "user" FROM nodes
      UNION ALL
      SELECT "user" FROM ways) AS nodes_ways
GROUP BY nodes_ways."user"
ORDER BY "Entries" DESC
LIMIT 10;
User Entries
JaLooNz 155426
rene78 32937
Luis36995 32150
cboothroyd 20478
calfarome 16615
ridixcr 13830
nikhilprabhakar 13082
Paul McCormack 12620
matx17 12000
yurasi 8868

Number of Nodes and Ways

n_nodes = %sql SELECT COUNT(*) FROM nodes
n_ways = %sql SELECT COUNT(*) FROM ways

print "Number of 'nodes': " + str(n_nodes[0][0])
print "Number of 'ways': " + str(n_ways[0][0])
Number of 'nodes': 409352
Number of 'ways': 66404

Area Specific

Most popular streets

%%sql
SELECT street_names.value AS "Street", COUNT(street_names.value) AS "Times Referred"
FROM
	(SELECT nodes_tags.value
	FROM nodes_tags
	WHERE type = 'addr' AND key = 'street'
	UNION ALL
	SELECT ways_tags.value
	FROM ways_tags
	WHERE 	(type = 'addr' AND key = 'street')
		OR
		(id IN
			(SELECT id
			FROM ways_tags
			WHERE key = 'highway')
		AND key = 'name')) AS street_names
GROUP BY street_names.value
ORDER BY "Times Referred" DESC
LIMIT 10
Street Times Referred
Jalan Senang 229
Joo Chiat Road 204
Bedok Reservoir Road 185
Tanjong Pagar Road 166
South Bridge Road 165
Serangoon Road 164
North Bridge Road 135
Seah Street 134
Neil Road 133
Dunlop Street 128

Most frequent amenities

Anyone who has lived in Singapore knows Singaporeans' love of food. No surprises here; restaurants are, by far, at the top of the results.

%%sql
SELECT value AS "Amenity", COUNT(value) AS "Occurrences"
FROM	(SELECT *
	FROM nodes_tags
	UNION ALL
	SELECT *
	FROM ways_tags) as tags
WHERE key = 'amenity'
GROUP BY value
ORDER BY "Occurrences" DESC
LIMIT 10
Amenity Occurrences
restaurant 1562
parking 654
taxi 524
cafe 396
fast_food 252
atm 194
toilets 190
bar 176
bank 120
police 120

Most popular cuisine

%%sql
SELECT value AS "Cuisine", COUNT(*) AS "Restaurants" 
FROM (SELECT * FROM nodes_tags 
      UNION ALL 
      SELECT * FROM ways_tags) tags
WHERE tags.key='cuisine'
GROUP BY value
ORDER BY "Restaurants"  DESC
LIMIT 10
Cuisine Restaurants
chinese 99
japanese 42
korean 36
coffee_shop 34
burger 33
italian 32
indian 28
asian 27
pizza 17
french 15

ATMs

%%sql
SELECT value AS "Bank", COUNT(value) AS "ATMs"
FROM nodes_tags
WHERE id in
    (SELECT id
    FROM nodes_tags
    WHERE value = 'atm')
    AND
    key = 'operator'
GROUP BY value
ORDER BY "ATMs" DESC
Bank ATMs
POSB 18
UOB 12
OCBC 9
Citibank 8
DBS 7
singapore room home 1
Quantified Assets, Pte. Ltd. 1
Posb 1
HSBC 1
Uob 1
DSS 1
Overseas Chinese Banking Corporation 1
DBS / UOB 1
DBS/POSB 1
home 1
Standard Chartered Bank 1

There are different abbreviations of bank names and some records that are not banks.

%%sql
UPDATE nodes_tags
SET value = 'POSB'
WHERE value = 'Posb';

UPDATE nodes_tags
SET value = 'UOB'
WHERE value = 'Uob';

UPDATE nodes_tags
SET value = 'OCBC'
WHERE value = 'Overseas Chinese Banking Corporation';
[]
%%sql
DELETE FROM nodes
WHERE id IN
	(SELECT id
	FROM nodes_tags
	WHERE key = 'operator'
	AND (value = 'singapore room home'
	OR
	value = 'home'))
[]
%%sql
SELECT value AS "Bank", COUNT(value) AS "ATMs"
FROM nodes_tags
WHERE id in
    (SELECT id
    FROM nodes_tags
    WHERE value = 'atm')
    AND
    key = 'operator'
GROUP BY value
ORDER BY "ATMs" DESC
Bank ATMs
POSB 19
UOB 13
OCBC 10
Citibank 8
DBS 7
DSS 1
HSBC 1
DBS / UOB 1
DBS/POSB 1
Standard Chartered Bank 1
Quantified Assets, Pte. Ltd. 1

Religion

Singapore is well-known for its multicultural environment. People with different religious and ethnic heritages are forming the modern city-state. This is reflected in the variety of temples that can be found in the country.

%%sql
SELECT tags.value AS "Religion", COUNT(*) AS "Temples" 
FROM (SELECT * FROM nodes_tags
      UNION ALL 
      SELECT * FROM ways_tags) tags
WHERE tags.key='religion'
GROUP BY tags.value
ORDER BY "Temples" DESC;
Religion Temples
christian 73
muslim 38
buddhist 30
hindu 9
taoist 6
jewish 1
sikh 1

Ideas for additional improvements.

There are several areas where the project could be improved in the future. The first one is the completeness of the data. All the above analysis is based on a dataset that reflects a big part of Singapore, but not the whole country. The reason for this is the lack of a way to download a dataset for the whole of Singapore without including parts of the neighboring countries. The analyst has to either select a part of the island/country or select a wider area that includes parts of Malaysia and Indonesia. Also, because ways and relations reference nodes outside the selection, the downloaded data extend much further than the actual selection. Below you can see a plot of the coordinates of the nodes of a dataset from a tight selection of Singapore. You can notice that huge parts of the nearby countries were included.

initial selection

As a future improvement, I would download a wider selection, or the metro extract from Mapzen, and filter out the non-Singaporean nodes and their references. The initial filtering could take place by introducing latitude/longitude limits in the code to sort out most of the "non-SG" nodes.

filter to square
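The lat/lon pre-filter described above can be sketched as a simple bounding-box check. The limits below are rough, assumed values for Singapore, used only for illustration:

```python
# Rough bounding box for Singapore (assumed values, for illustration only).
SG_BOUNDS = {
    "min_lat": 1.15, "max_lat": 1.48,
    "min_lon": 103.55, "max_lon": 104.10,
}

def inside_bounds(lat, lon, bounds=SG_BOUNDS):
    """First-pass filter: keep only nodes whose coordinates fall inside
    the rectangular limits. Cheap, but still keeps some non-SG nodes."""
    return (bounds["min_lat"] <= lat <= bounds["max_lat"] and
            bounds["min_lon"] <= lon <= bounds["max_lon"])

# Example: a node in central Singapore passes, one near Kuala Lumpur does not.
nodes = [(1.290, 103.852), (3.139, 101.687)]
kept = [(lat, lon) for lat, lon in nodes if inside_bounds(lat, lon)]
```

This is only a coarse cut; nodes in nearby Malaysian and Indonesian territory that fall inside the rectangle still survive, which is why the polygon-based second pass is needed.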

Then, I would download a shapefile for Singapore (e.g. from http://www.diva-gis.org/gdata), use a GIS library like Fiona to create a polygon, and then use a geometry library like Shapely to compare each node's coordinates against this polygon. Finally, I would clean all the ways and relations of the "non-SG" nodes, and remove those that become childless, to conclude with a dataset of all (and only) Singapore.

After GIS
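For reference, the core point-in-polygon test that a library like Shapely performs can be sketched with a plain ray-casting routine. The polygon below is a toy rectangle standing in for the real Singapore boundary, and the coordinates are illustrative:

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: count how many polygon edges a horizontal ray from
    (lon, lat) crosses; an odd count means the point is inside.
    `polygon` is a list of (lon, lat) vertices."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        if (yi > lat) != (yj > lat):  # edge straddles the ray's latitude
            x_cross = (xj - xi) * (lat - yi) / (yj - yi) + xi
            if lon < x_cross:
                inside = not inside
        j = i
    return inside

# Toy stand-in for the boundary polygon built from the shapefile.
boundary = [(103.6, 1.2), (104.1, 1.2), (104.1, 1.45), (103.6, 1.45)]
```

A real country boundary has thousands of vertices, which is exactly why testing every node this way is slow and why Shapely's optimized geometry predicates (or prepared geometries) are preferable in practice.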

The drawback of the above technique is that comparing each node against the polygon is a very time-consuming procedure, with my initial tests taking 17-18 hours to produce a result. This is the reason the above approach was left as a future improvement, probably along with the use of multithreading techniques to speed up the process.

The second area with room for future improvement is the exploratory analysis of the dataset. To mention just some of the explorations that could take place:

  • Distribution of commits per contributor.
  • Plotting of element creation per type, per day.
  • Distribution of distances between different types of amenities.
  • Popular franchises in the country (fast food, convenience stores, etc.).
  • Selection of a bank based on the average distance you have to walk to an ATM.
  • Which areas have the biggest parks and recreation spaces.
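Several of the ideas above (distances between amenities, average walk to an ATM) reduce to computing great-circle distances between coordinate pairs. A minimal haversine helper would look like this; the example coordinates are illustrative:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2 +
         math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Roughly Raffles Place to Toa Payoh (illustrative coordinates).
walk_km = haversine_km(1.2841, 103.8515, 1.3343, 103.8563)
```

Applying this between each node tagged `amenity=atm` and its nearest neighbours per operator would directly answer the "average walk to an ATM" question.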

The scope of the current project was the wrangling of the dataset, so all the above have been left for future improvement.

Finally, open data are here to make the average person's life better. For the non-data analyst, it would be nice if there were an application (mobile or web) that could evaluate the suitability of a potential rental home. The user would enter the work addresses of all family members and importance weights for several amenities like supermarkets, convenience stores, cafes, public transportation, etc., and the application would calculate a suitability score for each potential rental. The user would be able to sort the candidates by score and compare them.
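As a rough illustration of how such an application might rank rentals, assume we already have, per rental, the walking distance to the nearest instance of each amenity; nearer amenities with higher importance weights raise the score. The amenity names, weights, and damping formula below are all hypothetical:

```python
def suitability(distances_km, weights):
    """Hypothetical score: each amenity contributes its importance
    weight, damped by the walking distance to its nearest instance."""
    score = 0.0
    for amenity, weight in weights.items():
        distance = distances_km.get(amenity)
        if distance is not None:
            score += weight / (1.0 + distance)
    return score

# Illustrative inputs: user preferences and two candidate rentals.
weights = {"supermarket": 3.0, "mrt_station": 5.0, "cafe": 1.0}
rental_a = {"supermarket": 0.3, "mrt_station": 0.5, "cafe": 0.2}
rental_b = {"supermarket": 1.2, "mrt_station": 2.0, "cafe": 0.1}
ranked = sorted([("A", suitability(rental_a, weights)),
                 ("B", suitability(rental_b, weights))],
                key=lambda pair: pair[1], reverse=True)
```

The distances themselves would come from queries like the ones in this project (nearest `amenity` node per rental coordinate), which is what makes the wrangled OSM data the natural backend for such a tool.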


References

Udacity - https://www.udacity.com/
Wikipedia - https://www.wikipedia.org/
OpenStreetMap - https://www.openstreetmap.org
Overpass API - http://overpass-api.de/
Python Software Foundation - https://www.python.org/
Urban Redevelopment Authority of Singapore - https://www.ura.gov.sg
Catherine Devlin's GitHub repository - https://github.com/catherinedevlin/ipython-sql
Google Map APIs - https://developers.google.com/maps/