No CRE data warehouse, and bad data quality? Why, and how to solve it

We unpack why our industry is lagging in data – to its great detriment, and despite the obvious incentives. We also share knowledge not commonly known outside of data engineering circles (so unlikely to be “common sense” to CRE professionals). Then we provide a roadmap explaining what can be done about it.
Commercial real estate industry lagging in data

Why should we care?

Some data in business is so valuable that using it is illegal. Example: insider trading.

To build on this: the $33T commercial real estate (CRE) industry runs on data. Fortunes are made and lost on the ability to use this valuable data effectively.

So, firstly, the powerful incentives are there to sweat data better.

Secondly – the current state of affairs is frustrating to CRE professionals. We hear your question:

“But easy analytics tools exist, so why no analytics?”

Qlikview, PowerBI, Google Data Studio, Metabase, Tableau and other tools are literally “sitting under our noses”. They are cheap and easy to use. We have seen the impressive demo videos and presentations. Surely it’s a case of pointing these tools at data, and we will get the outcomes?

So, why are we not? Where are the analytics, where are the data wins?

It’s not due to fear or lack of ability of the professionals themselves.

Our position: as things stand now, the CRE industry is not winning with data because of…

Two key missing building blocks

  1. The scarcity of OLAP databases (otherwise known to a property business as their CRE data warehouse)
  2. Unreliable data quality. Otherwise known as “junk in, junk out”

This is the uncomfortable (and unspoken) truth. Just like winning a rugby world cup, there are no silver bullets or shortcuts to success. Without these two building blocks, the CRE industry can’t move forward and put the considerable data at its disposal to good use.

Housekeeping

This is a hard article to write because there is jargon. Sorry. We need jargon to make sure this meets our goal of sharing knowledge, while avoiding being technically wrong, or oversimplifying. So please bear with us. We will try to go from simple to complex. A glossary of technical terms is at the foot of this post.

Why the missing building blocks?

Commercial property data

Principles

Before we get into the detail, it’s good to start off with a quick database 101. We’re going to share information that is known to data engineers, but has not yet found its way to professionals in business. If you can understand these basics, you are well on your way to being better informed, and making better decisions.

To start off, Excel is a database of sorts. To many data lovers, including us, it is their “first love”. However, it’s not a database in the true sense of the word – and certainly not a CRE data warehouse. Why is this? Excel struggles to:

  • Support hierarchical (parent-child) relationships,
  • Provide a centralised / “one-version-of-data-truth” working environment, and, as we all know,
  • Handle version control issues (many people working on the same workbook).

So what are true databases? Roughly speaking, there are two types (unpacked in detail later):

  1. An OLTP system. Most, if not all, software you use runs on this.
  2. An OLAP database system. This can comprise a CRE data warehouse / data mart / any central storage system plus online analytical processing (OLAP).

OLTP systems (1. above) generally provide the source data. This data is then fed into your OLAP database system (2. above). Hence the data is largely the same. A key difference is the shape of the data.
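To make that “shape” difference concrete, here is a minimal Python sketch. The property, unit and lease records – and every field name – are invented for illustration: the OLTP side stores normalised, write-friendly rows, while the OLAP side joins them into one wide, analysis-ready row.

```python
# OLTP shape: normalised rows, optimised for fast single-record writes.
# (All names and figures below are illustrative, not from any real system.)
oltp_properties = [{"property_id": 1, "name": "Main Street Office Park"}]
oltp_units = [{"unit_id": 10, "property_id": 1, "area_sqm": 450}]
oltp_leases = [{"lease_id": 100, "unit_id": 10, "tenant": "Acme Ltd", "rent_pm": 67500}]


def to_olap_row(lease, units, properties):
    """Join the normalised OLTP rows into one wide, analysis-ready row."""
    unit = next(u for u in units if u["unit_id"] == lease["unit_id"])
    prop = next(p for p in properties if p["property_id"] == unit["property_id"])
    return {
        "property": prop["name"],
        "unit_area_sqm": unit["area_sqm"],
        "tenant": lease["tenant"],
        "rent_pm": lease["rent_pm"],
        "rent_per_sqm": lease["rent_pm"] / unit["area_sqm"],
    }


row = to_olap_row(oltp_leases[0], oltp_units, oltp_properties)
print(row["rent_per_sqm"])  # 150.0
```

Same data, two shapes: the OLTP rows are easy to update one record at a time; the OLAP row is easy to slice and aggregate.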

Why are OLAP databases so scarce in CRE?

The obvious one: we CRE professionals, as non-technical people, simply don’t have the skills to build OLAP databases ourselves, or to build the processes that automate the moving of data between databases (aka ETL – please see more on this word below).

However, even software engineers with deep OLTP database competencies do not have the combination of skills and experience to succeed in designing and building OLAP databases.

Why is this? While a goalkeeper and a striker both play football, they have different jobs on the field.

This is the same with software engineers – there are horses for courses.

So it’s partly a skills issue…

Indeed, successful ETL and OLAP database projects need a rare phenomenon: data engineers who understand the commercial real estate domain, and who have paid the school fees earlier (at their expense, not yours).

Secondly, it’s a “don’t-know-what-you-don’t-know” issue. Highly-talented, highly-qualified software engineers we respect are not aware of the need for data engineering. And, even if they are, they struggle to see the value. Meanwhile their clients – CRE professionals, who are not technical – can’t be expected to “mark their homework”.

The longer this basic info is not mainstream, the more inefficiency, lost opportunity, failed data projects, and wasted money we will see in our industry. We can’t afford such misunderstanding. Not now with the industry suffering. Things need to change.

Why the data quality issues?

Data quality is proactively solved by data governance. Unfortunately, data governance is only appreciated later – when clean data, or the insights built on it, are required – and so data governance is generally neglected or avoided.

The best way to manage data quality is at point of data capture

This happens in your OLTP system.

However, to be fair, your OLTP database didn’t “sign up” to manage data quality, and deal with huge levels of data complexity.

At minimum, for data quality, the following seemingly simple things need to be happening:

  • The fields on the data assets should all be standardised – for example, entered by dropdown, not free text.
  • To avoid duplication, all these data assets should be mastered (with real world unique identifiers), with validation happening at point of capture.
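As a hedged sketch of what “validation at point of capture” can look like, the snippet below checks a hypothetical unit record against a standardised dropdown list and a register of already-mastered identifiers. The field names, allowed values and identifiers are assumptions, not any real system’s schema.

```python
# Illustrative only: a standardised "dropdown" of allowed values,
# and a register of real-world unique identifiers already mastered.
ALLOWED_USE_TYPES = {"office", "retail", "industrial"}
seen_unit_codes = {"A101"}  # hypothetical identifiers already in the database


def validate_unit(record):
    """Return a list of data-quality errors; empty means the record may be saved."""
    errors = []
    if record.get("use_type") not in ALLOWED_USE_TYPES:
        errors.append("use_type must be one of: " + ", ".join(sorted(ALLOWED_USE_TYPES)))
    code = (record.get("unit_code") or "").strip().upper()  # normalise before comparing
    if not code:
        errors.append("unit_code is required")
    elif code in seen_unit_codes:
        errors.append("duplicate unit_code: " + code)
    return errors


# Free-text use type and a duplicate identifier are both caught at capture,
# before they become "junk in".
print(validate_unit({"use_type": "Office space", "unit_code": "a101"}))
```

The point is where the check runs: at capture, in the OLTP system, rather than months later in a “Big Bang” clean-up.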

Again, life is similar to technology. A little bit of basic data hygiene every day, added up, is far less effort than the “Big Bang” required to reverse a bad situation.

However, where the little daily data hygiene effort is not made, a Big Bang fix is needed. This requires two steps:

  • Step 1, the cure: a “renovation” of the data – and cure is more complex¹ than prevention
  • Step 2: The implementation of the very same data governance processes and tools that were ignored in the first place. This second step prevents the cycle repeating.

Why do OLTPs cause bad data…

Commercial real estate (CRE) is complex, and the building of an OLTP happens:

  • fast by developers (because the perceived business value of this phase is low),
  • with limited CRE domain information for said developers (CRE is a relatively unstructured industry, with few data principles / standards).

Secondly, unlike other industries, CRE deals with not one or two, but three of the four enterprise data assets. These we, as CRE insiders, know well – we handle them easily in our heads. They are: properties/locations, businesses and contacts. But the data structures to handle these relationships are not simple. For example, consider the basic “parent-child” data hierarchy of a property and its units.
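A minimal sketch of that parent-child hierarchy, using an invented property and units: child (unit) data rolls up to the parent (property) level – exactly the kind of relationship a flat spreadsheet struggles to hold.

```python
# Hypothetical property with child units (all names and areas invented).
property_tree = {
    "name": "Example Office Park",
    "units": [
        {"unit": "A101", "area_sqm": 450, "vacant": False},
        {"unit": "A102", "area_sqm": 300, "vacant": True},
        {"unit": "B201", "area_sqm": 250, "vacant": True},
    ],
}


def vacant_area(prop):
    """Roll child (unit) data up to the parent (property) level."""
    return sum(u["area_sqm"] for u in prop["units"] if u["vacant"])


print(vacant_area(property_tree))  # 550
```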

Thirdly, CRE is very “data heavy” – see more here. For example, on a property unit in Gmaven, there are 1.6K fields. To avoid “data debt”, each of these fields needs standardisation and data validation effort, up front.

Exacerbating this, such data complexity is difficult to manage using traditional (read older) OLTP patterns and technologies, because requirements are often “fast-changing”.

But why did we allow this to happen?

Major reason – pragmatism: just like it can be prudent to take on the stress of financial debt to get financial gain, the same with “data debt”. Incur the debt, get to profit. Then, with the extra money you have earned, go back and fix. It’s not the end of the world, and it’s perfectly rational.

Okay now that we hopefully haven’t lost you, we get a bit more technical – but this deeper knowledge will stand you in good stead.

Commercial real estate data flows

OLAP database (sometimes termed a CRE data warehouse) unpacked

Following your high-level introduction to an OLAP and OLTP, we go deeper…

“An OLAP database? We already have a database, why do we need another?”

Indeed. But, while a hammer and a screwdriver are both tools, they have different uses.

Same with an OLTP database and an OLAP database.

Both databases store data. So, what are the differences?

OLTP database

The focus here: data in.

An OLTP database is like normal Excel. It’s quick to enter (add, update or delete) bite-sized info at the speed of a human working on a computer. (Notice, unlike an OLAP database below, there are no analyse actions.) It’s made for speed. It’s designed for lots of small actions (transactions), fast. And to use Excel as an example, the data lives in one workbook, in one spreadsheet.

The bulk of data governance happens here.

To echo above, OLTP databases feed OLAP databases.

OLAP database

The focus here: information out.

An OLAP database is used primarily to analyse data (not enter data). This different design is built to process large amounts of data, handle time-based data, and optimise data read speed. To continue the Excel metaphor, its data would be stored across lots of spreadsheets in various workbooks. The Excel experience to users is more like a pivot table.

The OLAP database puts your data into the “shape” to be used.
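Conceptually, that “shape to be used” supports pivot-table-style questions. Here is a tiny, illustrative sketch (the lease rows and field names are made up) of what an OLAP query – or an Excel pivot table – does in miniature:

```python
from collections import defaultdict

# Hypothetical analysis-ready lease rows (names and rents invented).
leases = [
    {"city": "Cape Town", "sector": "office", "rent_pm": 100_000},
    {"city": "Cape Town", "sector": "retail", "rent_pm": 80_000},
    {"city": "Johannesburg", "sector": "office", "rent_pm": 120_000},
]


def pivot_sum(rows, group_key, value_key):
    """Group rows and sum a measure -- a pivot table in miniature."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)


print(pivot_sum(leases, "sector", "rent_pm"))
# {'office': 220000, 'retail': 80000}
```

A real OLAP engine does the same kind of grouping and aggregation, but over millions of rows, at read-optimised speed.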

OLAP works with a data warehouse / data mart / central data storage system

To be technical, what we call an “OLAP database” here encompasses both the OLAP engine and the underlying storage (data warehouse / data mart).

Why the confusion?

Like most things with tech, you can achieve short-term wins. Similarly, using an OLTP database, you can get analytics results early on. Thus, initially, it doesn’t feel like there is a need for an OLAP database. However, there is a dark side to the wins. Such wins are unsustainable, and are projected forward by users and engineering teams. So you beaver on, using the wrong tool for the job. The problem is that, eventually, you can “no longer push the golf ball down the hose pipe”.

And when you get to this sad day, you are forced to rethink everything, and rebuild.

In conclusion

Understandably, everyone in CRE wants the exciting “James Bond” benefits of data analytics. But, before that, there is a whole bunch of “Miss Moneypenny” data quality / data governance rigour which has to happen.

Just like James Bond can’t do his thing without Miss Moneypenny, it’s the same with data analytics. Junk in, junk out…

Secondly, for effective data analytics, you need the data to be in the shape of an OLAP database. You can’t run effective analytics on an OLTP database.

Data jargon explained

“This ETL word I hear about?”

ETL = moving data, and stands for extract, transform and load. ETL is used to move data from an OLTP DB to an OLAP DB.

The nice thing about ETL is that it is rule-driven, so it can be automated (a revolution is coming to CRE!). The good news here: compared to the manual (read highly-skilled human hands) moving of data: error rates go down, speed goes up, and cost goes down.
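As a hedged illustration of those three steps – with every field name, rule and value invented – a rule-driven ETL pass might look like this:

```python
# Minimal ETL sketch: extract from the OLTP source, transform (standardise),
# load into the OLAP/warehouse target. All names here are assumptions.


def extract(oltp_rows):
    """In practice: a query against the OLTP database."""
    return list(oltp_rows)


def transform(rows):
    """Apply the rules that make the data analysis-ready."""
    out = []
    for r in rows:
        out.append({
            "property": r["prop_name"].strip().title(),  # standardise text
            "rent_pm": round(float(r["rent"]), 2),       # standardise numbers
        })
    return out


def load(rows, warehouse):
    """In practice: an insert into the OLAP database."""
    warehouse.extend(rows)


warehouse = []
load(transform(extract([{"prop_name": " main street PARK ", "rent": "67500"}])), warehouse)
print(warehouse[0]["property"])  # Main Street Park
```

Because each step is a fixed rule rather than a human judgement call, the whole pipeline can be scheduled and re-run automatically – which is where the error, speed and cost gains come from.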

“Okay so what is OLAP then?”

OLAP = Online Analytical Processing. OLAP is an engine that sits, generally, on top of a CRE data warehouse (but it can sit on any other database). OLAP is what a pivot table does when sitting on a spreadsheet in Excel.

“Is an OLAP the same as a CRE data warehouse?”

No.

A data warehouse is a place to store data in an easy-to-analyse format.

OLAP is an engine to analyse that data. Another word for an OLAP is a “view” on your data warehouse.

An OLAP complements a CRE data warehouse.

“So, the big question, what is the optimal database structure?”

It depends on what you want to do.

If you never need to do analytics, you don’t need an OLAP database.

However, what if you do?

In a perfect data world you would have two databases – your OLTP database manages data quality, and your OLAP database (aka CRE data warehouse) exposes your info to your analytics or BI tools.

Given CRE’s complexity, it’s therefore inevitable you end up with two OLTP databases – for example:

  1. An ERP system to handle the accounting: paying service providers (cleaning, rates and taxes, security etc.) and collecting receivables (rental from tenants). ERPs are complicated engines, focused on doing one (very important) thing well. They form the heartbeat of a property business, and are generally used by accountants and lease admins.
  2. The second is a user-friendly OLTP database / data governance environment that allows CRE-specific data to be collected, organised, stored, processed, and viewed. Data stored ranges from images, to documents, to property attributes, to even financially theoretical concepts like vacant space. This solution is not aimed purely at accountants / lease administrators, but is used by a wider range of CRE professionals – including asset managers, executives, and property managers. This system would need to talk to your ERP (with clear master-slave roles defined), and can also be used to run audits against the ERP solution. Critically, this solution must support the centralisation of data (behind appropriate permissions), and avoid strongly-defined data structures. Why? CRE data – remember the fast-changing requirements – is in a state of mutation. In our (hard-won) experience, such an OLTP system needs a different database paradigm to store such data.

Your OLAP database, feeding from your OLTPs (and data sources like IoT), allows for data to be analysed and processed, supporting:

  • Descriptive analytics (what),
  • Diagnostic (why) – and ultimately, later on,
  • Predictive (what if) and
  • Prescriptive (do this) analytics

Such OLAP databases will form the bedrock of machine learning and artificial intelligence processes.

¹ Competencies required to fix data

These five cornerstones are non-negotiable for rectifying data of doubtful quality. Be weak in one, and your project is doomed.

  • Deep domain IP for the industry data processing
    • So, for example, if you are processing commercial real estate (CRE) data – you need lots of domain IP. You don’t want to be learning this midflight, while trying to solve other hard problems
  • Unique algorithms,
    • Fixing data at scale, successfully, sits on foundations of research and development, trial and error
  • Significant data processing code assets
    • The ability to process high volumes of generally non-OCR-friendly, non-standardised data
    • The technology to process data at scale, and address technology’s scale constraints
    • The ability to build code assets to facilitate the automated processing (including calculation) of data at scale, and update based on (often) iterating requirements
  • Access to reference data
    • Generally public sector databases – like CIPC, deeds office, surveyor general
    • In-public domain databases – like geolocation information
  • Deep levels of business processing outsourcing competencies
    • Access to high volumes of trusted, qualified data workers and the ability to handle queries at scale
