AVM testing is confusing. CoreLogic is here to help… and below you’ll learn some important questions you need to be asking any testing service you employ.

But first, a little context…

The five federal bank regulators released guidance in 2010 that covered, among other things, AVM testing. The testing guidelines can be murky, but one thing is crystal clear: AVM testing is big business! CoreLogic produced over 10 million AVMs in the last year just to support testing.

Some lenders successfully perform their own testing.[1] Others use third-party services. Virtually all testers provide their results to us, and we want to share what we have learned from all this testing.

The following chart should look somewhat familiar to anyone who has reviewed test results on multiple AVMs. Charts like this are intended to show accuracy and coverage.

[Figure: 1Q 2014 National AVM Test results]

[1] CoreLogic has produced a document on AVM testing for those clients who would like to understand best practices around lender testing. It is available here.


Pretty straightforward, right? Some models have higher coverage and others have stronger accuracy. But wait! These results actually represent the SAME model tested during the SAME time period covering the SAME geography! The only difference is the organization performing the testing.

How can the same model perform so differently across various testers? The data pulled for the AVM test and the approach to analyzing results dramatically impact perceived performance. The result is that the testing approach can artificially bias results in favor of one model or another. Whether you are selecting a model, estimating what performance to expect in production, or monitoring your AVMs for performance changes, it is critical for any consumer of AVM testing services to understand the impact of the testing approach on the results. There is a lot of nuance in AVM testing that CoreLogic is happy to discuss with you directly.

These are the three testing-approach decisions with the biggest impact on perceived AVM performance, so we recommend that you ask any testing service you employ these three key questions:

  1. Do you test all models under the same “Blind Testing” protocol?

    Blind testing means that a model is tested against sale prices it does not already know. Referencing the “answer key” is a great way to ace a test, but it does not show mastery of the subject. The same is true for AVMs. Typically, AVM vendors are asked to self-identify test addresses where the model knows, or has a good proxy for, the sale price, i.e., the “answer key”. Good testing protocol requires all models to self-identify using the same criteria, but the self-identification procedure is not standard across testers. This creates a testing environment that may inadvertently apply different gauges for how a model’s value is compared against known sale price data. A model may ace the test, yet the results do not help a lender know the accuracy to expect in production. Accuracy results are skewed and do not necessarily reflect which models can actually produce the most accurate values.

  2. Do you evaluate all models against the same addresses?

    If Model A values 123 Main Street accurately and Model B values 345 Maple Street inaccurately, which model is better? Of course, you cannot tell from that information alone. This would be an inconclusive way to compare AVMs, yet you might be surprised how common it is for comparative tests to be conducted on different addresses. Removing addresses from only some models may be well intentioned, but it shifts performance results to the point where it is difficult to rank the models. The two most common reasons testing firms test against different addresses are:

    1. The method used for removing self-identified “non-blind” results varies from model to model (see question 1 above); and
    2. Some testers try to approximate a blind testing protocol by removing any AVM results that match (or nearly match) the sale price.

    The logic driving reason 2 is that the tester assumes an accurate model result must have been “non-blind”, i.e., that the model already knew the sale price. Of course, the other possible explanation is that the model simply produced an accurate result. The net effect is that selectively removing addresses yields results that are not an apples-to-apples comparison and should not be used to identify top-performing models.

  3. What benchmark value do you use to calculate AVM accuracy and where do you get your data?

    CoreLogic has a diagnostic process that we deploy after each AVM test in which we examine erroneous AVM results so we can improve the performance of our models. We have been surprised to find that some testing data does not correlate to a recorded property sale, a mortgage transaction, or a property listed for sale. These are the standard benchmarks for understanding an AVM’s ability to estimate a property’s market value. Although we are happy to test against whatever data a lender finds relevant, we find that testing against extraneous benchmarks produces unreliable accuracy results.
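To make question 1 concrete, here is a minimal Python sketch of the blind-testing idea: one uniform filter applied to every model, rather than each vendor self-identifying under its own criteria. This is an illustration only, not CoreLogic’s actual protocol; the function, field names, and data are all hypothetical.

```python
# Hypothetical sketch: apply one uniform "blind" filter to every model's
# results. All addresses, values, and flags below are illustrative.

def blind_results(model_results, non_blind_flags):
    """Keep only addresses the model did NOT flag as having a known sale price."""
    return {addr: value for addr, value in model_results.items()
            if not non_blind_flags.get(addr, False)}

model_a = {"123 Main St": 250_000, "345 Maple St": 410_000}
flags_a = {"123 Main St": True}  # the model already knows this sale price

print(blind_results(model_a, flags_a))  # {'345 Maple St': 410000}
```

Because every model is filtered by the same rule, no model gets to keep an “answer key” address that another model had to drop.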
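Question 2 can be sketched the same way: restrict the comparison to the addresses that every model actually valued, so coverage differences cannot bias the accuracy ranking. Again, the model names and addresses are hypothetical.

```python
# Hypothetical sketch: score every model on the common set of addresses so
# the comparison is apples-to-apples. Data below is illustrative.

def common_addresses(*model_results):
    """Return the addresses valued by every model under comparison."""
    return set.intersection(*(set(results) for results in model_results))

model_a = {"123 Main St": 250_000, "345 Maple St": 400_000}
model_b = {"345 Maple St": 390_000, "99 Oak Ave": 180_000}

print(common_addresses(model_a, model_b))  # {'345 Maple St'}
```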
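And for question 3, a sketch of accuracy measured against a recorded-sale benchmark: the share of AVM values landing within 10% of the sale price, a metric the industry often calls PPE10. The benchmark data and the 10% tolerance here are illustrative assumptions, not prescribed values.

```python
# Hypothetical sketch: fraction of benchmarked addresses where the AVM value
# is within 10% of the recorded sale price. Data below is illustrative.

def ppe10(avm_values, sale_prices, tolerance=0.10):
    """Share of benchmarked addresses where the AVM is within tolerance."""
    hits = scored = 0
    for addr, sale in sale_prices.items():
        if addr in avm_values:
            scored += 1
            if abs(avm_values[addr] - sale) / sale <= tolerance:
                hits += 1
    return hits / scored if scored else 0.0

avm = {"123 Main St": 262_000, "345 Maple St": 500_000}
sales = {"123 Main St": 250_000, "345 Maple St": 400_000}

print(ppe10(avm, sales))  # 0.5: one of the two values is within 10%
```

The choice of benchmark matters as much as the formula: swap the recorded sale prices for an extraneous data source and the same calculation produces an unreliable number.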

Regardless of your testing approach and the models you use, CoreLogic encourages you to be a savvy tester. Appropriate testing ensures that your production experience matches your expectations. Happy Testing!