Extracting Property Quality and Condition from Real Estate Agent's Comments

By Bin He and Kien Trinh

Quality and condition are two key components used to determine the value of a residential property. A newly renovated home that has a gourmet kitchen and an updated bathroom will be much more expensive than a similar home with only above-average quality and condition. Unfortunately, data on these two factors are hard to come by, which is why most automated valuation models (AVMs) assume average quality and condition. As a result, properties at either end of the quality and condition spectrum are often very difficult to value.

To obtain property quality and condition, two questions need to be answered. The first is how to rate and standardize quality and condition. Fannie Mae and Freddie Mac (the GSEs) have established specific definitions for quality and condition and require appraisers to rate them on a standardized scale from C1/Q1 (best) to C6/Q6 (worst). For detailed definitions of C1-C6 and Q1-Q6, please refer to the Fannie Mae and Freddie Mac Uniform Appraisal Dataset Specification.
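Because the ratings are short standardized codes, they are easy to handle as a small data type. The Python sketch below is a minimal illustration, assuming ratings arrive as two-character strings such as "C5" or "Q3". The names `UadRating`, `parse_rating`, and `is_worse` are hypothetical and are not part of the UAD specification or any CoreLogic system; the code encodes only what is stated above: a condition scale and a quality scale, each running from 1 (best) to 6 (worst).

```python
from dataclasses import dataclass

BEST, WORST = 1, 6  # both UAD scales run from 1 (best) to 6 (worst)
KINDS = {"C": "condition", "Q": "quality"}

@dataclass(frozen=True)
class UadRating:
    kind: str   # "C" (condition) or "Q" (quality)
    level: int  # 1 = best ... 6 = worst

    def __str__(self) -> str:
        return f"{self.kind}{self.level}"

def parse_rating(code: str) -> UadRating:
    """Parse codes such as 'C5' or ' q3 ' into a UadRating; raise ValueError otherwise."""
    text = code.strip().upper()
    if len(text) != 2 or text[0] not in KINDS or text[1] not in "0123456789":
        raise ValueError(f"not a UAD rating code: {code!r}")
    level = int(text[1])
    if not BEST <= level <= WORST:
        raise ValueError(f"level must be {BEST}-{WORST}: {code!r}")
    return UadRating(text[0], level)

def is_worse(a: UadRating, b: UadRating) -> bool:
    """True if `a` is worse than `b`; a higher number means worse on both scales."""
    if a.kind != b.kind:
        raise ValueError(f"cannot compare {KINDS[a.kind]} with {KINDS[b.kind]}")
    return a.level > b.level

print(parse_rating("c5"), is_worse(parse_rating("C6"), parse_rating("C5")))  # C5 True
```

Keeping condition and quality as separate kinds, and refusing to compare across them, reflects that they are two distinct scales even though both use the numbers 1 through 6.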

The second question is where to obtain quality and condition information beyond appraisal data. Unless you are one of the GSEs, which receive almost all appraisal data, you probably don’t have good coverage of property quality and condition. However, advances in machine learning make it possible to extract this information from nontraditional data sources: text and images from Multiple Listing Service (MLS) data. Below is part of a real estate agent's comment for a property that was sold recently and appraised with a C5 condition rating.

"Great opportunity to own a large home on a large lot with tons of possibilities!"

By the GSEs’ definition, a C5 property is in need of some significant repairs, and the selling agent appears to have hinted at this (“tons of possibilities”) in his or her comments when the property was listed for sale.

CoreLogic has developed a model that leverages various machine learning techniques, together with its rich appraisal and MLS data assets, to extract property quality and condition information from real estate agents' comments. Figure 1 uses a word cloud to illustrate the words most commonly used by real estate agents in listings for properties that subsequently sold and were rated C6.

Figure 1: Frequent Comments for C6-Rated Properties
[Word cloud: Real Estate Comments for C6 Quality and Condition Properties]
Source: CoreLogic
© 2020 CoreLogic, Inc. All rights reserved.
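A word cloud like Figure 1 is typically drawn from term frequencies. The Python sketch below shows one way to compute them, assuming listing comments are available as (rating, comment) pairs; the helper names, the short stop-word list, and the example records are hypothetical, and this is not CoreLogic's pipeline. It counts single words and two-word phrases so that terms such as "fixer upper" and "needs repair" survive as units.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical, minimal stop-word list; a production list would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "on", "in", "with", "for", "is", "this"}

def tokenize(comment: str) -> list[str]:
    """Lowercase the comment and keep alphabetic tokens, so 'Fixer-Upper' -> ['fixer', 'upper']."""
    return re.findall(r"[a-z]+", comment.lower())

def term_counts(comments: Iterable[str], ngram_sizes: tuple[int, ...] = (1, 2)) -> Counter:
    """Count unigrams and bigrams, skipping any n-gram that contains a stop word."""
    counts: Counter = Counter()
    for comment in comments:
        tokens = tokenize(comment)
        for n in ngram_sizes:
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if not STOP_WORDS.intersection(gram):
                    counts[" ".join(gram)] += 1
    return counts

def top_terms(records: Iterable[tuple[str, str]], rating: str, k: int = 10) -> list[tuple[str, int]]:
    """Most frequent terms among comments for properties that received `rating` (e.g. 'C6')."""
    return term_counts(text for r, text in records if r == rating).most_common(k)

# Invented example records, for illustration only.
records = [
    ("C6", "Fixer-upper! Needs repair and lots of TLC."),
    ("C6", "Bring your contractor: fixer upper, needs TLC."),
    ("C3", "Move-in ready with an updated kitchen."),
]
print(top_terms(records, "C6", k=8))
```

The resulting (term, count) pairs are the kind of input a word-cloud renderer sizes its words by; grouping by rating is what lets the C6 vocabulary be compared with that of better-rated homes.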

By the GSEs’ definition, a property rated C6 has substantial damage and requires substantial repairs and rehabilitation. It is no surprise to see keywords such as “tlc”, “needs repair”, “fixer upper”, “fantastic opportunity”, and “rehab”. Counting the appearance of such negative words and telltale phrases is a simple way to obtain quality and condition indicators. Beyond simple counts, a more sophisticated machine learning model can derive meaning from the text as a whole.
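The keyword-counting approach can be sketched in a few lines of Python. This is a minimal illustration, not CoreLogic's model: the phrase list is a hypothetical one built only from the C6 keywords quoted above, and the `threshold` in `looks_distressed` is an arbitrary assumption rather than a calibrated value.

```python
import re

# Hypothetical phrase list taken from the C6 keywords quoted above; a real lexicon
# would be larger and validated against appraisal condition ratings.
DISTRESS_PHRASES = ("tlc", "needs repair", "fixer upper", "fantastic opportunity", "rehab")

def normalize(comment: str) -> str:
    """Lowercase and collapse punctuation/hyphens to single spaces: 'Fixer-Upper!' -> 'fixer upper'."""
    return " ".join(re.findall(r"[a-z]+", comment.lower()))

def distress_score(comment: str, phrases=DISTRESS_PHRASES) -> int:
    """Total number of whole-word phrase matches in the comment."""
    text = normalize(comment)
    return sum(len(re.findall(rf"\b{re.escape(p)}\b", text)) for p in phrases)

def looks_distressed(comment: str, threshold: int = 2) -> bool:
    """Hypothetical rule: flag a listing once it contains `threshold` or more distress phrases."""
    return distress_score(comment) >= threshold

print(distress_score("Fixer-Upper! Needs repair and lots of TLC."))  # 3
print(distress_score("Great opportunity to own a large home on a large lot with tons of possibilities!"))  # 0
```

The second call scores the C5 comment quoted earlier at 0, because its hint (“tons of possibilities”) is not in the list, and whole-word matching also misses variants such as "rehabbed". These gaps illustrate why keyword counts are only a starting point and why a model that derives meaning from the full text is useful.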

This new property condition algorithm is just one of the innovations driving our new Total Home ValueX AVM, announced last month.

©2020 CoreLogic, Inc. All rights reserved.