
For a long time I have been working on an ODS (operational data store) as well as a data warehouse. Both integrate a wide variety of data sources from stovepipe applications. One of the uses of the ODS is to provide data to other stovepipe applications.

Imagine one app maintains a database of personnel and another tracks sales. Occasionally the Sales app may need a drop-down of personnel that a user can pick from, say to credit a particular employee with a sale or commission.

The Sales application can query the ODS to get the list of personnel. This allows the Personnel app to change its data structure: the ODS modifies its ETL process to adapt to that change, so none of the other apps consuming that data are impacted.

The Sales app will need to save a PersonnelID to that sale/commission record in its own database. However, if the ODS is refreshed using a full load technique, the keys are regenerated the next time it runs, and that PersonnelID can change. Since the PersonnelID is stored in the Sales app's own, separate database, there is no straightforward way to cascade that change.
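A minimal sketch of the problem, assuming a truncate-and-reload full load whose surrogate keys are simply renumbered 1..N on every run (the `Person` type, the `full_load` helper, and the employee numbers are hypothetical). Removing one person shifts every key after her, so the key the Sales app saved now points at someone else.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    employee_number: str  # hypothetical business key from the Personnel app
    name: str


def full_load(source_rows):
    """Hypothetical truncate-and-reload: surrogate keys are reassigned 1..N
    in source order on every run, so nothing ties a key to a person."""
    return {key: person for key, person in enumerate(source_rows, start=1)}


# First nightly load of the ODS personnel table
night1 = full_load([Person("E100", "Alice"), Person("E200", "Bob"), Person("E300", "Carol")])

# The Sales app saves Bob's ODS key on a commission record in its own database
sales_personnel_id = next(k for k, p in night1.items() if p.name == "Bob")

# Alice leaves; the next full load renumbers everyone after her
night2 = full_load([Person("E200", "Bob"), Person("E300", "Carol")])

print(night1[sales_personnel_id].name)  # Bob
print(night2[sales_personnel_id].name)  # Carol: the stored key now means someone else
```

Referential integrity inside `night2` is perfectly fine; it is only the external reference held by the Sales database that silently breaks.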

This creates a challenge: any changes to the ODS have to be made very carefully, and may even rule out certain designs, because external applications depend on those keys never changing. I would usually avoid exposing keys to users, but in this case it seems necessary in order to let external apps reference enterprise-wide entities in their own databases.

The same goes for the lookup tables available in the ODS, each of which pairs keys with display text.

After a full load of the ODS I can ensure the keys satisfy referential integrity within the ODS itself, but not in external databases that use those keys. Some parts of the ODS are currently coded as full loads, which regenerate keys, so I would need to recode that ETL to be incremental so that external databases can reference those keys without fear of them changing.
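One way to sketch the fix, assuming the ODS keeps a persistent map from each business key (here a hypothetical employee number) to the surrogate key it issued. The `KeyMap` class and `incremental_load` function are illustrative names, not an existing API. Existing entities always get their old key back; only unseen entities mint a new one, and rows missing from the source are flagged inactive (an assumed design choice) instead of being deleted, so external references still resolve.

```python
from itertools import count


class KeyMap:
    """Hypothetical persistent business-key -> surrogate-key map.
    In a real ODS this would be a table that survives every load."""

    def __init__(self):
        self._keys = {}
        self._next = count(1)

    def key_for(self, business_key):
        # Reuse the existing surrogate key; mint a new one only for unseen entities
        if business_key not in self._keys:
            self._keys[business_key] = next(self._next)
        return self._keys[business_key]


def incremental_load(target, key_map, source_rows):
    """Upsert (employee_number, name) rows using stable keys. Rows absent from
    the source are marked inactive rather than deleted."""
    seen = set()
    for employee_number, name in source_rows:
        key = key_map.key_for(employee_number)
        target[key] = {"employee_number": employee_number, "name": name, "active": True}
        seen.add(key)
    for key, row in target.items():
        if key not in seen:
            row["active"] = False
    return target


key_map = KeyMap()
ods_personnel = {}
incremental_load(ods_personnel, key_map, [("E100", "Alice"), ("E200", "Bob"), ("E300", "Carol")])
bob_id = key_map.key_for("E200")  # the key the Sales app would store

# Alice leaves; Bob keeps the same key
incremental_load(ods_personnel, key_map, [("E200", "Bob"), ("E300", "Carol")])
print(ods_personnel[bob_id]["name"])  # Bob
print(ods_personnel[1]["active"])     # False
```

This sketch still reads the whole source extract on each run; what keeps the keys stable is that the key map persists between loads, not the amount of data read.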

What techniques are used when you have an enterprise-wide data source whose data other applications consume, and those applications need to store foreign keys referencing entities in that data? How do you decouple those foreign key references as much as possible without complicating access to the data?

Currently I am using table-valued functions to provide access to data. I chose this approach because it allows parameters and joins, and it decouples consumers from underlying tables that may change later.

AaronLS
  • flagged for migration to dba.SE – Aug 03 '12 at 20:21
  • This is not an administration related question. I have had questions get toggled back and forth several times between stack exchanges before due to trigger happiness. Hope that doesn't happen here. – AaronLS Aug 03 '12 at 20:22
  • What sort of questions? It's worth reviewing the FAQ for each site before posting. – ChrisF Aug 03 '12 at 20:24
  • That advice should be directed at the mods who can't decide which SE a question belongs in. If it gets moved more than once, then only one of the moves is right. – AaronLS Aug 03 '12 at 20:28
  • IANADBA. Your question is about Advanced Querying, Data Modeling, Data Warehousing, and Business Intelligence, which are all OT per the DBA.SE FAQ. If your Q is not OT for DBA.SE then I'm totally misunderstanding either your Q or their FAQ. –  Aug 03 '12 at 21:36
  • @GlenH7 - by "OT" do you mean "on topic"? – ChrisF Aug 03 '12 at 22:37
  • I'm not disputing what the FAQ says. I just want to avoid it being moved multiple times. There are people on DBA who will move non-administration questions in a heartbeat, regardless of the FAQ, and then I get caught up in a meta argument that detracts from the Q/A at hand, but it seems we have already descended into that. – AaronLS Aug 03 '12 at 22:49
  • The solution to the problem is to bring the FAQ and SE name "DBA" in sync, because those topics are not related to administration. FAQ or name should be changed IMO, but that's a different issue than the one at hand. SE is overmoderated and results in pedantic debates that I really grow tired of. If a mod or group of voters sees a grey area where a question can fall in more than one SE, they usually will opt to move it because it feels good, and then some other group moves it back. They will apply whatever interpretation results in them being able to take action because that feels good. – AaronLS Aug 03 '12 at 22:51
  • @ChrisF - yes, my bad. I meant on-topic. Guess I was a little rushed trying to get out of the office. –  Aug 03 '12 at 23:41
  • @AaronLS - As I'm not a DBA, I don't hang out at DBA.SE much. Sounds like they have a distinct problem if they are moving out questions that the FAQ would otherwise say are on-topic. Have you brought the matter up in their meta? –  Aug 03 '12 at 23:43
  • @AaronLS - you need to bring up any questions you have over the DBA FAQ on their meta. Discussing it here won't help. – ChrisF Aug 04 '12 at 13:16

1 Answer

You have to adhere to these rules:

  1. OLTP applications don't change keys.

  2. ODS generates its own Business Intelligence keys.

  3. The data warehouse never references OLTP keys and must use the keys generated by the ODS (rule 2).
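Rule 3 can be sketched as a lookup step in the warehouse load, assuming the ODS exposes its business-key to surrogate-key map (modelled here as a plain dict; `load_sales_facts` and the column names are hypothetical). Each fact row is translated to the ODS key before loading, and rows the ODS does not know about are set aside rather than loaded with a raw OLTP key.

```python
def load_sales_facts(oltp_sales, ods_key_map):
    """Translate each sale's OLTP employee number into the ODS surrogate key.
    Unknown employees are rejected instead of leaking OLTP keys into the DW."""
    facts, rejected = [], []
    for sale in oltp_sales:
        ods_key = ods_key_map.get(sale["employee_number"])
        if ods_key is None:
            rejected.append(sale)
        else:
            facts.append({"personnel_key": ods_key, "amount": sale["amount"]})
    return facts, rejected


ods_key_map = {"E200": 2, "E300": 3}  # hypothetical ODS-issued keys
sales = [
    {"employee_number": "E200", "amount": 150.0},
    {"employee_number": "E999", "amount": 75.0},  # not (yet) known to the ODS
]
facts, rejected = load_sales_facts(sales, ods_key_map)
```

How rejected rows are handled (retried later, or mapped to an "unknown" member) is a design choice this sketch leaves open.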

There is no way to relax any of the above rules unless you do a full load every night.

Trying to get "deltas" (changed data only) from the OLTP system is usually a nightmare in large enterprise systems like Siebel, SAP, etc., unless your ETL tool provides "connectors" (canned queries against ERP and similar systems) and you have expertise in the source OLTP system. Even then, it is usually difficult because of the complexity of such enterprise schemas.
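When the source cannot hand you deltas directly, one common fallback is to derive them by comparing the current extract with the previous one, keyed by business key and using a hash of the remaining columns to spot changes. The sketch below assumes both extracts fit in memory as dicts; `row_hash`, `diff_snapshots`, and the sample columns are hypothetical.

```python
import hashlib


def row_hash(row):
    # Hash the non-key column values in a fixed column order so changes are cheap to detect
    payload = "|".join(str(row[col]) for col in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_snapshots(previous, current):
    """Classify rows as inserts, updates, or deletes.
    previous/current: dict of business_key -> dict of column values."""
    prev_hashes = {k: row_hash(v) for k, v in previous.items()}
    curr_hashes = {k: row_hash(v) for k, v in current.items()}
    inserts = sorted(curr_hashes.keys() - prev_hashes.keys())
    deletes = sorted(prev_hashes.keys() - curr_hashes.keys())
    updates = sorted(
        k for k in curr_hashes.keys() & prev_hashes.keys()
        if curr_hashes[k] != prev_hashes[k]
    )
    return inserts, updates, deletes


previous = {"E100": {"name": "Alice", "dept": "HR"}, "E200": {"name": "Bob", "dept": "Sales"}}
current = {"E200": {"name": "Bob", "dept": "Marketing"}, "E300": {"name": "Carol", "dept": "Sales"}}
inserts, updates, deletes = diff_snapshots(previous, current)
print(inserts, updates, deletes)  # ['E300'] ['E200'] ['E100']
```

This avoids needing knowledge of the source's change tracking, at the cost of extracting and comparing the full data set each run.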

Again, you have to adhere to the above rules and, most probably, there is no other way. Kimball has a good set of books on the subject if you care to dig further.

NoChance
  • I pretty much follow those rules (I have read through the Kimball DW and ETL books too). However, ODS keys currently change because I do full loads in some places. This is not a problem for the DW because it is a full load as well in those cases, so it picks up the new ODS keys. The challenge is with other databases/apps that consume ODS data, such as the example of saving a PersonID (an ODS-generated key) to their own database and assuming that key will never change. This means I can't do full loads in the ODS anymore, and instead have to do incremental/delta loads from the OLTP to avoid key changes in the ODS. – AaronLS Aug 03 '12 at 22:41
  • OK, I see the issue. I think there is no escape from doing deltas...However, from an architectural perspective, what is the ODS doing in the middle between 2 OLTP systems in your case? – NoChance Aug 03 '12 at 22:47
  • Avoiding many-to-many relationships between OLTPs. If the HR OLTP changes, then the ODS adapts its ETL to that change, and the Sales, Inventory, etc. OLTPs can still grab HR-related data without being impacted by the change. Thus you reduce the amount of work needed to adapt to changes in each source OLTP, and also smooth out some of the relationships that may be implementation specific; instead, the ODS has a data model closer to the business model, which makes consuming the data more straightforward. – AaronLS Aug 03 '12 at 22:55
  • All in all I think you are right regarding the fact that I won't be able to avoid doing delta loads. – AaronLS Aug 03 '12 at 22:57
  • Thank you for sharing how you use ODS concept. I used it in the past as an integration point to Data Warehouse only not to OLTP, and hence did not encounter your problem. Good luck. – NoChance Aug 03 '12 at 23:28