36

What's a correct format of a geographical address/location which is a good fit for any address on the Earth? At the moment I have:

  • country
  • city
  • street
  • number
  • text data (for simplicity)
  • zip
  • lat/lng

But I believe I can improve it: there might be a state/region of a country or something like area. Or no area/region/state, say, in Singapore or Hong Kong.

There might be no street, but road or boulevard or something else. A number of a building might be compound. There might be a floor. A room number. Etc....

Paul D. Waite
  • 1,164
  • 14
  • 18
Xwaro
  • 401
  • 1
  • 4
  • 4
  • 11
    You need to explain for what application, and who is providing that address. E.g. on most Web commercial stores/websites, I don't type any "latitude/longitude" which on the contrary is essential for ICBMs (or GPS). Also, altitude (and time and date) is important in *some* cases (think of some ship at sea, or some traveler on the Everest). So I am not sure there is any universal answer. – Basile Starynkevitch Sep 23 '17 at 09:03
  • 62
    Worth reading: https://www.mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses/ – High Performance Mark Sep 23 '17 at 09:54
  • 6
    @BasileStarynkevitch: I think it is not so much important "for what application", but "for what use case(s)". If, for example, the use case is to make sure world wide postal services can deliver mails, I guess this question can be answered in a sensible manner. However, for this use case "lat/lng" won't be required. – Doc Brown Sep 23 '17 at 12:49
  • @DocBrown: for me "application" and "use case" are very similar (in French it would be the same word) – Basile Starynkevitch Sep 23 '17 at 12:52
  • 1
    @Xwaro: the link above may be fun, but if you carefully evaluate all the potential wrong assumptions listed there, I guess you get the answer you are looking for. – Doc Brown Sep 23 '17 at 12:57
  • 34
    I think the universal format for an address is a single String. – Erik Eidt Sep 23 '17 at 15:17
  • 2
    If you really want a universal record, try a [pulsar map](https://i.warosu.org/data/fa/img/0088/65/1412059413654.jpg) :-) – Kevin Sep 23 '17 at 18:21
  • 12
    The problem you raise is so painful, that some companies out there develop their universal way to address it, for example: https://what3words.com/ (boils down to mapping location coordinates to three words). They claim, that "With what3words, everyone and everywhere now has an address." – Roman Susi Sep 23 '17 at 19:13
  • 4
    Recommended reading: [Falsehoods programmers believe about addresses](https://www.mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses/) – Richard Ev Sep 24 '17 at 08:02
  • 3
    Just use text - moving few times from country to country I was calling too often to fix the address because bank made assumption about international addresses which didn't hold. I think it is a case of YAGNI - all you usually need address for is to print it and stuck on the parcel. Everything else just makes it hard for users. – Maciej Piechotka Sep 24 '17 at 08:11
  • Good luck with [Japan](http://www.sljfaq.org/afaq/addresses.html) :-) – Mawg says reinstate Monica Sep 24 '17 at 08:39
  • 1
    @RichardEverett nice one. I am adding here (since if one is concerned about adresses, one shd probably concerned about names): be careful about names, dont use first name/last name. In chinese the first name is the family name. Also some cultures dont have family names. And then there is the concept of middle names (which are not second given names). In Germany (due to inconsiderate IT guys) it is not possible to outline the 'calling name' in passports anymore (which could be your second first name). – lalala Sep 24 '17 at 14:39
  • 1
    @HighPerformanceMark I hate that list, because it lists problems without proposing solutions (which this question is asking about). – Arturo Torres Sánchez Sep 25 '17 at 13:30
  • 1
    Tangentially related: [What problems are solved by splitting street addresses into individual columns?](https://dba.stackexchange.com/questions/133546/what-problems-are-solved-by-splitting-street-addresses-into-individual-columns). I had a similar question myself. In fact, this might be a better fit for the DBA stack exchange site, but I fear it would be closed as too broad or opinion based. They should have an "Off-Topic" closing reason called "Off-Topic, because questions about storing addresses make my face hurt." – Greg Burghardt Sep 25 '17 at 17:34
  • If you're using Postgresql you might look at PostGIS for a solution. Other databases might have similar modules. – Sirisian Sep 25 '17 at 17:56
  • Technically if you want to store any location on earth it is sufficient to store latitude, longitude, elevation, precision, and optionally, time since the epoch. However, figuring out how to translate that to a form someone would understand for a particular use case might not be easy. – Michael Sep 25 '17 at 18:54
  • @michael not true, ships have postcodes – Ewan Sep 25 '17 at 22:12
  • David Hay covers this in Chapter 12 of Enterprise Model Patterns. It's pretty extensive, so you should read the book: https://books.google.ca/books?id=NozXBgAAQBAJ&lpg=PA34&ots=NALoGzkw_K&dq=david%20hay%20enterprise%20model%20patterns&pg=PA244#v=onepage&q&f=false – Neil McGuigan Sep 25 '17 at 20:46

8 Answers8

54

Google has developed a library that helps validate postal addresses for every country in the world, which you can use to design a schema to store this data.

Look for the most common required fields across addresses from your targeted customer base to get started, and as you identify further countries with different requirements you can continue to adjust your schema.

mitchdav
  • 735
  • 1
  • 5
  • 3
42

The universal way to store a geographical address/location in a database is this one:

[Address] nvarchar(max) not null

This requires the least amount of programming code (and so cuts maintenance costs) and is fully compatible with any address. It has, however, three big issues:

  • The lack of data validation means that the field can be used for the purposes other than storing the address. One of the purposes is a DOS attack intended to fill the space of your database by entering 2 GB of data in the address field.

  • The data stored this way makes it impossible to process it for business intelligence and data mining purposes. For instance, how many users are from India? There is no easy way to tell, since those addresses won't be normalized.

  • The users may mistakenly enter an incomplete or plainly wrong address.

In order to mitigate the first issue, limit the field to what you think to be a reasonable limit. Personally, I would start with 1000 characters, and then reduce it based on the length of the addresses entered by the first users once you get a data set large enough.

In order to mitigate the other two issues, you can use a third-party API which parses addresses and presents you with the data containing the country, city, postal code, etc. If possible, the API should be able to display the address on a map back to the user to reduce the risk for the user of entering an incomplete or wrong address: most users know where they live, and seeing a different position on a map would immediately give them a clue that they should check their input.

Note that whatever API you use, it won't be perfect. It will find most addresses, but not all of them. This means that if the API tells that the address doesn't exist, but the user insists that it does, you should a priori trust the user, even if he might be wrong.

This also means that you still should store the original user's input, side-by-side with the result of the API. This means that the schema becomes:

[RawAddress] nvarchar(max) not null
[ParsedAddress] xml null
Peter Mortensen
  • 1,050
  • 2
  • 12
  • 14
Arseni Mourzenko
  • 134,780
  • 31
  • 343
  • 513
  • Note: At the very least, you could store the country separately, if this is necessary. For example, it could be automatically deduced from the address field, with the option for the user to change it. – Matthieu M. Sep 25 '17 at 17:01
  • 'use an API' just means somone else has got all the countrys' official formats. Theres no reason you cant do it yourself – Ewan Sep 25 '17 at 21:00
  • @Ewan No reasons except for time, money, language, and other barriers. – Andrew Sep 25 '17 at 22:05
  • sure, but are we providing answers on how to do stuff or comparing pricing of other people doing stuff for you? – Ewan Sep 25 '17 at 22:11
  • @Ewan: the question is about the storage format of addresses. The API doesn't dictate this format: the goal of my answer is to show that as soon as you have a plain text field and a XML/JSON/whatever field for parsed data, you can both store *and* statistically process an address from anywhere in the world. – Arseni Mourzenko Sep 26 '17 at 12:22
  • true, but as soon as you have that xml field it begs the question of what the xml structure is. ps, cant believe this q is on hold!! – Ewan Sep 26 '17 at 12:53
37

There isn't one.

Every country has different address formats. If you are lucky, and they have a format at all!

Obviously latitude/longitude will give you a point on the globe, but it's not really useful for identifying individual houses. Just consider a tower block for example.

Your best bet is to check each countries postal service for an official format. This can be great for your backend database. But you will probably have to simplify it for end users as it will contain many more fields than most people are used to.

The UK one for example includes things like 'double dependant locality', but no one would know what that meant if you asked them.

Peter Mortensen
  • 1,050
  • 2
  • 12
  • 14
Ewan
  • 70,664
  • 5
  • 76
  • 161
  • 3
    What's a **universal way**........... – Xwaro Sep 23 '17 at 12:33
  • 40
    @Xwaro They just said, **There isn't one.** – Zymus Sep 23 '17 at 16:03
  • 7
    I guess Xwaro means I am assuming addresses on earth. – Ewan Sep 23 '17 at 16:13
  • 3
    This is _the_ official source for printed address formats: [Universal Postal Union](http://www.upu.int/en/activites/adressage/systemes-dadressage-dans-les-pays-membres.html) – grahamj42 Sep 24 '17 at 09:58
  • 3
    interesting. I think this is the relevant page though : http://www.upu.int/en/activities/addressing/s42-standard/compliant-countries.html you can see how A: its only a few countries, and B: the mapping from s42 to the countries address format is not 1 to 1 – Ewan Sep 24 '17 at 13:48
  • @grahamj42: well, this is not complete for France. It mentions that the postcode is followed by the name of the town - which is not always the case. For example, the address for the Social Security in the Yvelines (region west of Paris) ends with `78085 Yvelines cedex 9` - where `Yvelines` is a region (*département*) and not a town or city. – WoJ Sep 25 '17 at 16:57
  • @WoJ: The UPU reference _I_ linked explains CEDEX in the France entry - it's like having a P.O. Box no. in English-speaking countries (where the end user address may be kept secret). France is more explicit than some other countries about this (in my last job in the UK, visitors kept arriving at the Post Office sorting centre, because that's where our post code was physically mapped). – grahamj42 Sep 25 '17 at 19:48
  • @WoJ: The s42 entry for France maps **town** to _libellé d'acheminement_, precisely to cover CEDEX addresses. – grahamj42 Sep 25 '17 at 19:53
  • I guess my take away is that if you use s42, you also need the per country mapping, its ambiguous where there isnt a 1 to 1 relationship AND not all countries are supported. It just shows what a difficult problem this is. – Ewan Sep 25 '17 at 20:56
  • @grahamj42: we do have PO boxes in France (*Boîte Postale Flexigo*, among others). This is not the same as CEDEX which helps to gather large volume of mail for large recipients. What I was saying is that the UPU reference is not accurate for France, due to the mapping it mentions (what is after the postcode is said to be a locality). The entity which is behing the postcode can be anything - it is usually a town but in the example I provided it is a whole region (Yvelines - it is a big one). The complete address is "Caisse primaire d'assurance maladie des Yvelines, 78085 Yvelines Cedex 9" – WoJ Sep 26 '17 at 08:03
  • Even geographical coordinates are not universal because earthquakes tend to move tectonic plates where houses are built on. – mouviciel Sep 26 '17 at 12:48
21

The only universal format is to have a single text field which may have multiple lines of text. This will allow any possible address on earth.

JacquesB
  • 57,310
  • 21
  • 127
  • 176
  • 2
    Great, now everybody can describe the same address in a different, incompatible way. I suppose the question didn't ask about standards, so this is technically a correct answer. – Michael Sep 25 '17 at 18:55
  • @Michael: Addresses *are* different and incompatible across the world. There *is* no standard template. Having a multiple-line field allow the user to actually write the correct address. – JacquesB Sep 26 '17 at 07:24
  • @Michael Separate fields often force me to truncate/abbreviate one field or the other, which also leads to inconsistent representations. (Still works usually, postal services are quite experienced at this). – Hulk Sep 26 '17 at 08:57
  • Confirmed: https://en.wikipedia.org/wiki/Address_(geography) – Eric.Void Sep 26 '17 at 19:14
  • Just an interesting tidbit, this isn't *technically* true. In some areas of countries, parts of addresses are drawn as pictures. – KayakinKoder May 27 '19 at 20:06
13

I have been developing software solutions to be used in many countries. We address this issue by starting with the larger entity first, i.e. country then have fields down to the least common or smallest. It works well for all countries we have experimented with so far. We also have a smart duplicate prevention system, and merger for those that have somehow get in the system since users are very 'creative'. In the admin section we have an address field order per country setting. i.e. Japan has the Post/Zip-code first where as UK/US last.

In general, we use:

  • Country
  • Post/Zip-code
  • State/Province/Prefecture/County
  • City/Town/Village
  • Street/Road/Block
  • Building Name/Number
  • Specific/Custom Information

Once entered and saved, a conjugated version can be displayed leaving out fields not necessary.

As I said, this works for all those countries we have software had software in and is the result of developing since 1989.

Hope this helps somehow or at least provides another insight.

Kevin Panko
  • 105
  • 7
Billsensei
  • 139
  • 2
  • how do you name a column in your db for "State/Province/Prefecture/County"? – Xwaro Sep 24 '17 at 01:17
  • 7
    @Xwaro It doesn't matter, name it whatever word you feel your developers will be least confused by. This is because the name is internal to your software and will never be seen by users. Address are never displayed with the name of the field. That is, you never see `No 10 Street Downing Street, City Westminster, State London, Country UK`. Instead you will see `10 Downing Street, Westminster, London, UK` – slebetman Sep 25 '17 at 00:56
  • @slebetman The question was: how do you name a column in your db for "State/Province/Prefecture/County"? Not "how do you recommend me to name a column in my db for "State/Province/Prefecture/County"? – Dari Oct 03 '17 at 03:30
  • @Dari It doesn't matter, I name it whatever word I feel my developers will be least confused by. This is because the name is internal to my software and will never be seen by users. So it depends on what my team is used to. – slebetman Oct 03 '17 at 03:35
  • @slebetman - what do you name it? – Dari Oct 03 '17 at 04:07
  • @Dari Whatever word my developers will be least confused by. I've had several jobs in my 17 years career. Not sure I remember all the names: address2, addr2, state, negeri, daerah etc. – slebetman Oct 03 '17 at 08:20
  • @Dari Also, the question is not what I name it. The question is how I name it. And the answer is not address2, addr2 etc. The answer is whatever name my developers will be least confused by – slebetman Oct 03 '17 at 08:22
0

As already stated, the most universal (but impractical to validate and perhaps least useful) is a single big unicode field.

You could separate country from the rest of the address and store it as the ISO country code. It would normalize the country and offer some utility in validating the remainder of the address.

You could also separate postal code aka zip code from the rest of the address. This would also have some utility in validating the remainder of the address, and could be helpful (though imprecise) in geolocation. For example: in Canada you can uniquely identify any address specifying only postal code and street number (aka house number); this may not be true in all countries.

Dedicating fields to states/provinces or cities starts to get more problematic because of the variations in the way each country formulates an address. I've set up address tables having such fields because the initial audience is focused on North America, knowing that an international audiences would pose a problem fitting in. In most cases, they can be "shoe-horned" in, but it is an awkward and potentially failure-prone compromise - definitely not universal.

Zenilogix
  • 309
  • 1
  • 3
0

Contrary to Mitchdav's answer, I would advise against using Google's library. I searched the repository for various international places with unorthodox addressing schemes hoping to find unit test data, but worryingly I found zero hits in the entire repository.

I think your best bet is to treat an address as free-form multi-line text. It sucks that you maybe cannot validate all addresses, but some addressing formats are really weird and possibly unanticipated and in the end the responsibility of filling in the correct address rests on the user and in most applications the user bears any negative consequences of filling in an invalid address.

You might, maybe, use a validator to provide a warning, but nothing more than that. But don't reject addresses that don't validate, because otherwise you might lose some customers. Which leads to the question of how to communicate the warning to the user in such a way that it will communicate that, if the user lives in an area with a weird address format, it's safe to ignore the warning...

Anonymous
  • 9
  • 1
0

As you say any address on earth there's only lat long or...

https://what3words.com

What 3 words, is an algorithm (so not a database so can be embedded into anything) that can define a 3x3 metre patch of anywhere on Earth.

Tonga and a few other states have adopted it as their postcode system, whilst it'll not replace it as an overlay its pretty cool, and very well built and thought out.

RemarkLima
  • 109
  • 2