What is a street name
Now that we have the streets sorted, we need to work out which part is the name. The most common full street name may be “Small Street”, but I now want to separate the name, Small, from its descriptor, Street.
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import shapely
%matplotlib inline
streets = pd.read_csv("clean_streets_reduced.csv")
streets['name'].head()
0 1
1 10th avenue
2 12th avenue
3 12th avenue north
4 12th street
Name: name, dtype: object
Attempt 1
Let’s keep it simple to start off with: the first word of the full street name is the name.
def get_name1(full_name):
    '''Gets the name of the street: the first word of the full name.'''
    split_name = full_name.split()
    return split_name[0]
streets["street_name"] = streets["name"].apply(get_name1)
streets.groupby("street_name")["id"].count().sort_values(ascending=False).head()
street_name
the 247
st 108
park 95
william 73
john 70
Name: id, dtype: int64
And the most common name is “the”. As in “The Avenue”, “The Grand Parade” or “The Serpentine”. In my mind “The” is an integral part of the street name. The whole name should be The Avenue, The Grand Parade or The Serpentine.
The second most common is “st” as in “St. Mark Close” or “St. Thomas Road”. In these cases I think the names should be “St. Mark” or “St. Thomas”.
Let’s revise that function. If the full name starts with “the”, the full name is the name. Otherwise the name is everything other than the last word:
def get_name2(full_name):
    '''Gets the name of the street: full name minus the last word.
    Unless the full name starts with "the", then return the full_name
    '''
    split_name = full_name.split()
    if split_name[0]=='the':
        return full_name
    else:
        return ' '.join(split_name[:-1])
streets["street_name"] = streets["name"].apply(get_name2)
streets.groupby("street_name")["id"].count().sort_values(ascending=False).head()
street_name
park 92
69
victoria 60
railway 56
william 54
Name: id, dtype: int64
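“Park” is now the most common - which sounds about right. But the second row is a blank name with a count of 69: a single-word full name has nothing left once its last word is dropped, so single-word names need to be kept whole.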
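A quick check of where the blank row in the output above comes from, using the same expression as the else branch of get_name2 (drop_last_word is a made-up helper name for this illustration, and "1" is the single-word name from the first row of the data):

```python
def drop_last_word(full_name):
    # Same expression as the else branch of get_name2.
    return " ".join(full_name.split()[:-1])


# Two words: the first word survives as the name.
print(repr(drop_last_word("small street")))

# One word: dropping the last word leaves an empty string.
print(repr(drop_last_word("1")))
```

Every single-word full name collapses to the same empty string, which is why they all pile up in one blank group.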
def get_name3(full_name):
    '''Gets the name of the street: full name minus the last word.
    Unless the full name starts with "the" or is a single word,
    then return the full_name.
    '''
    split_name = full_name.split()
    if len(split_name)==1:
        return full_name
    if split_name[0]=='the':
        return full_name
    else:
        return ' '.join(split_name[:-1])
streets["street_name"] = streets["name"].apply(get_name3)
streets.groupby("street_name")["id"].count().sort_values(ascending=False).head()
street_name
park 92
victoria 60
railway 56
william 54
short 53
Name: id, dtype: int64
This looks fine! Let’s look at what the descriptor would look like. For this, I’m just taking what is left after the name:
def get_description(full_name):
    '''Gets the description of the street: the last word of the
    full name, unless the full name starts with "the" or is a single word,
    then return None.
    '''
    split_name = full_name.split()
    if len(split_name)==1:
        return None
    if split_name[0]=='the':
        return None
    else:
        return split_name[-1]
streets["street_description"] = streets["name"].apply(get_description)
streets.groupby("street_description")["id"].count().sort_values(ascending=False).head(20)
street_description
street 12755
place 6359
avenue 5862
road 4345
lane 2228
close 1732
crescent 1626
drive 969
way 841
court 672
parade 461
circuit 389
grove 286
terrace 103
offramp 97
onramp 80
glen 57
boulevard 55
north 55
south 50
Name: id, dtype: int64
This mostly looks OK, but then we see descriptors like ‘north’ and ‘south’, which aren’t what I originally intended, as well as “offramp” and “onramp”. Those street names are things like “Bathurst street offramp”. I don’t want the name to be “Bathurst street” and the descriptor “offramp”. I’d rather the name be Bathurst and the descriptor be street offramp.
At this point I thought that I could build a list of all the street descriptors and find where in the full name the descriptor sits. Then everything before the descriptor would be the name, and everything after it would be an optional suffix.
This did not work. For one, there were more descriptors than I thought: 192 in the Sydney test data I’ve been working with, including typos like “roaw” and “avenur”. Then I realised that there are streets whose name is also a descriptor: Park Lane is common, but there is also Grove Avenue, Terrace Lane and many, many more.
I could have started to do something fancy like “identify the last occurrence of the descriptor and everything before is the name and everything after is a suffix”, but it just seemed too much. Maintaining a list of descriptors takes time, especially when I’m doing it over a larger area than Sydney.
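For illustration only, the “last occurrence of the descriptor” idea could be sketched roughly as below. This is not code from the notebook: split_on_last_descriptor is a hypothetical helper, and the descriptor set is a short, made-up stand-in for the 192 descriptors found in the Sydney data.

```python
# Hypothetical, deliberately short descriptor list (the real data had 192).
DESCRIPTORS = {"street", "road", "avenue", "lane", "place", "grove", "terrace"}


def split_on_last_descriptor(full_name, descriptors=DESCRIPTORS):
    """Split a lower-case full name into (name, descriptor, suffix).

    The descriptor is the last word, other than the first word, that is
    found in `descriptors`. Words before it form the name and words
    after it form the suffix.
    """
    words = full_name.split()
    # Search from the right and never treat the first word as a descriptor.
    for i in range(len(words) - 1, 0, -1):
        if words[i] in descriptors:
            name = " ".join(words[:i])
            suffix = " ".join(words[i + 1:]) or None
            return name, words[i], suffix
    # No known descriptor found: keep the whole thing as the name.
    return full_name, None, None


for example in ["grove avenue", "bathurst street offramp", "smith roaw"]:
    print(example, "->", split_on_last_descriptor(example))
```

The sketch copes with Grove Avenue, but a typo like “roaw” falls straight through unless it is added to the list, which is exactly the maintenance cost described above.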
Instead, I decided to focus on just the main suffixes: north, east, south, west, offramp, onramp and exit. If one of these appears at the end of the full street name I’ll mark it as a suffix; otherwise I will stick to the same logic as in get_name3:
def three_name_model(name):
    '''Gets the name of the street, descriptor and suffix.
    Suffix: if the full name has more than two words and ends with
    north, east, south, west, offramp, onramp or exit, that last word
    is the suffix.
    Descriptor: the last word of the full name once any suffix is
    removed. A single-word full name has no descriptor.
    Name: what remains at the beginning of the full name. If the full
    name starts with "the", the name is the whole full name.
    '''
    suffixes = ["north", "east", "south", "west", "offramp", "onramp", "exit"]
    split = name.split()
    suffix = None
    descriptor = None
    street_name = None
    if len(split) == 1:
        street_name = name
    else:
        # If the last word in the name is a recognised suffix
        if len(split) > 2 and split[-1] in suffixes:
            suffix = split[-1]
            split = split[:-1]
        if split[0] == 'the':
            # "The ..." streets keep their full name as the street name
            street_name = name
            descriptor = split[-1]
        else:
            descriptor = split[-1]
            street_name = " ".join(split[:-1])
    return street_name, descriptor, suffix
streets["street_name"], streets["street_description"], streets["street_suffix"] = zip(*streets["name"].map(three_name_model))
streets.groupby("street_name")["id"].count().sort_values(ascending=False).head(20)
street_name
park 93
victoria 66
railway 57
william 57
short 53
king 50
albert 50
station 49
george 47
church 47
stanley 44
james 44
charles 43
campbell 42
edward 42
wentworth 40
john 40
elizabeth 39
gordon 39
arthur 39
Name: id, dtype: int64
The street names still appear fine. Checking the descriptors:
streets.groupby("street_description")["id"].count().sort_values(ascending=False).head(40)
street_description
street 12860
place 6362
avenue 5924
road 4485
lane 2242
close 1733
crescent 1651
drive 989
way 848
court 672
parade 472
circuit 392
grove 293
terrace 105
boulevard 63
glen 58
motorway 53
highway 32
square 29
row 27
boulevarde 22
walk 22
esplanade 21
gardens 21
bridge 17
parkway 17
mews 16
circle 15
m7 15
rise 14
glade 14
green 12
loop 12
promenade 11
ridge 11
bay 11
link 9
trail 8
tunnel 8
park 7
Name: id, dtype: int64
The descriptors also mostly appear fine. “M7” shows up because it is a major highway with many onramps, offramps and exits. But as we want to find the most common street name, not so much the descriptor, this isn’t a problem.
streets.groupby("street_suffix")["id"].count().sort_values(ascending=False)
street_suffix
offramp 99
onramp 80
north 55
south 50
west 46
east 44
exit 21
Name: id, dtype: int64
And finally our suffixes. Pleasingly, the cardinal directions have similar numbers of occurrences. It also makes sense that there are more offramps than onramps, as highways start somewhere (so don’t need an on-ramp) and you don’t want more input into your road than output.
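To make the three-part split concrete, here is a small spot check on a few names. So that the block runs on its own, it uses split_three, a condensed copy of the three_name_model logic under a made-up name. The sample names are illustrative and are not necessarily rows in the Sydney data.

```python
SUFFIXES = {"north", "east", "south", "west", "offramp", "onramp", "exit"}


def split_three(full_name):
    """Condensed copy of the three_name_model logic (hypothetical helper name)."""
    words = full_name.split()
    if len(words) == 1:
        return full_name, None, None
    suffix = None
    # A trailing suffix is only peeled off names of three or more words.
    if len(words) > 2 and words[-1] in SUFFIXES:
        suffix = words[-1]
        words = words[:-1]
    if words[0] == "the":
        # "The ..." streets keep their full name as the street name.
        return full_name, words[-1], suffix
    return " ".join(words[:-1]), words[-1], suffix


samples = ["bathurst street offramp", "12th avenue north",
           "the grand parade", "west street", "broadway"]
for full_name in samples:
    print(f"{full_name!r:28} -> {split_three(full_name)}")
```

Note that “west street” keeps “west” as its name: a direction word only counts as a suffix when it is the last word of a name with at least three words.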
The most common street name in Sydney
Finally, the most common street name in Sydney is “Park”, followed by “Victoria”, “Railway”, “William” and “Short”. Most suburbs in Sydney would have at least one park, so many also have a Park Road (most often). Even in Sydney CBD there is a Park Street that runs through the middle of Hyde Park.
Many of the other street names are related to royalty: Victoria, King, Albert and Alfred (Victoria’s second son who actually visited Sydney). William and George were also names of Kings, but they were also common first names.
A few names relate to early Australian colonial history:
- Wentworth: an explorer (also could be his father, the first paying passenger to come to the colony)
- Hunter: the second governor of the colony
- Cook: Captain Cook claimed the east coast of Australia for the British
- Macquarie: the fifth governor of the colony
- Mitchell: an explorer
streets.groupby("street_name")["id"].count().sort_values(ascending=False).head(50)
street_name
park 93
victoria 66
railway 57
william 57
short 53
king 50
albert 50
station 49
george 47
church 47
stanley 44
james 44
charles 43
campbell 42
edward 42
wentworth 40
john 40
elizabeth 39
gordon 39
arthur 39
hunter 38
smith 37
carrington 37
cross 37
boronia 36
mary 36
rose 36
margaret 35
cook 34
first 34
west 34
waratah 34
macquarie 34
mitchell 34
york 33
rawson 33
thomas 33
second 32
wattle 32
western 32
bridge 31
alfred 31
russell 31
pine 31
francis 31
phillip 30
young 30
hill 30
nelson 30
bellevue 30
Name: id, dtype: int64
My original motivation for this was that the most common street names in many states of the USA are plant names like Cedar or Oak. I didn’t think we did this in Australia, so I set out to find out. At least in Sydney, we see some plant names in the top 50: Boronia, Rose (although I think it is more likely to be a person’s name), Waratah (the NSW state emblem), Wattle (the national flower of Australia) and Pine. Pine is really interesting as I don’t think there are native pine trees in Sydney. Perhaps these streets are named after Norfolk pines, which were introduced to Sydney early on.
But they are nowhere close to being the most common. Sure, this data is only for Sydney, but I doubt it will change when we look at the whole nation. Flora is diverse in Australia. Boronias and wattles are more or less nationwide, but waratahs are only in the south east. Introduced pines are nationwide, but are mostly in state forests without many streets. And different states will then add their own native plants to the mix. A state may have a plant name as the most common street name, but I doubt there will be many in the top 10 street names nationwide.
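To put rough numbers on “nowhere close” for Sydney, the sketch below compares each plant name with the top name. The counts are copied by hand from the top-50 table above rather than recomputed from the streets data, and the variable names are made up for this illustration.

```python
# Counts copied from the top-50 table above (Sydney data only).
top_count = 93  # "park", the most common street name
plant_counts = {"boronia": 36, "rose": 36, "waratah": 34, "wattle": 32, "pine": 31}

# Express each plant name's count as a fraction of the top name's count.
share_of_top = {name: count / top_count for name, count in plant_counts.items()}

ranked = sorted(share_of_top.items(), key=lambda kv: kv[1], reverse=True)
for name, share in ranked:
    print(f"{name:<8} {plant_counts[name]:>3}  {share:.0%} of park")
```

Even the most common plant names, Boronia and Rose, appear well under half as often as Park.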