DATA & ANALYTICS SUMMER OF AI JULY 2019
Natural language processing and data ingestion
Learning from experts
Having data is crucial, but without AI for perfecting unstructured data, your data is meaningless
Dr. Rajkumar Bondugula (aka Dr. Raj) describes how natural language processing (NLP) perfects unstructured data – like the data from Departments of Motor Vehicles – by learning from experts during the first stage of the insights supply chain: data ingestion.
Challenges with ingesting text
To understand the challenges with ingesting text – one of the most ubiquitous forms of unstructured data – through NLP, Dr. Raj details one of the most common ways to register a vehicle in the US. “When you go to the DMV and hand over your documents to the clerk for your registration, the clerk captures all your information manually. Inconsistencies such as conventions and typos creep into the data because of this manual process. These inconsistencies will lead to incorrect conclusions.”
In the most recent lost sales analysis study on auto lenders, Piyush Patel, a senior big data engineer in the Data Science Lab, identified 720,000 raw lender names in data sourced from DMVs across the country. However, internal data shows that there are only about 30,000 auto lenders in the US, which works out to roughly 24 raw names per lender on average. Piyush highlights a specific example to illustrate the magnitude of the problem: 4,198 different versions of one auto lender’s name exist in the vehicle registration data. When you consider that 63 percent of Americans applied for their auto loan at the dealership when they purchased their vehicle in 2018, you can understand why Dr. Raj and his team care about perfecting the data.
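To make the problem concrete, here is a minimal sketch of how thousands of raw spellings might be collapsed onto one canonical lender name. It is an illustration only, not the system Dr. Raj's team built: the lender names, the abbreviation table, the suffix list, and the 0.85 similarity cutoff are all hypothetical, and the fuzzy matching relies on Python's standard difflib module rather than any technique named in this article.

```python
import re
from difflib import get_close_matches

# Hypothetical rules for illustration; a real rule set would come from domain experts.
ABBREVIATIONS = {"fin": "financial", "svcs": "services", "cr": "credit", "un": "union"}
LEGAL_SUFFIXES = {"inc", "llc", "corp", "co", "ltd"}


def normalize(raw_name):
    """Lowercase, strip punctuation, expand abbreviations, drop legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", raw_name.lower())
    tokens = [ABBREVIATIONS.get(token, token) for token in cleaned.split()]
    tokens = [token for token in tokens if token not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def standardize(raw_name, canonical_names, cutoff=0.85):
    """Map a raw name to a canonical lender name, or None if nothing is close enough."""
    normalized = normalize(raw_name)
    lookup = {normalize(name): name for name in canonical_names}
    # Rule-based pass: an exact match after normalization.
    if normalized in lookup:
        return lookup[normalized]
    # Fallback pass: tolerate typos with a string-similarity match.
    matches = get_close_matches(normalized, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


# Made-up lender names, used only to exercise the sketch.
CANONICAL = ["Acme Auto Financial", "Lakeside Credit Union"]

for raw in ["ACME AUTO FIN., INC.", "Acme Auto Finacial LLC", "Lakeside Cr Un"]:
    print(raw, "->", standardize(raw, CANONICAL))
```

The rules handle differences in convention (case, punctuation, abbreviations, legal suffixes), while the similarity fallback absorbs typos. Names that match nothing return None so that a person can review them instead of the code guessing.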
Dr. Raj’s team created an expert system to standardize the lender names that feed into TradeSight, a market intelligence platform for auto lenders and dealers. An expert system
A colleague of mine recently purchased a new truck. After purchasing the vehicle, he could register it in several ways: by mail, electronically at the point of sale, or in person at the Department of Motor Vehicles (DMV). According to analysts' projections, his vehicle is among the 281.3 million vehicles registered in the US in 2019, up from 270.4 million in 2018.
54%i
Lender names with single DMV records
Each vehicle registration contains valuable data that sheds light on the car-buying experience – from the lender underwriting the loan to the dealer who sold the car. Though access to this raw and unstructured data – captured through different channels by different humans with different language patterns – is crucial, the mere existence of this data is not enough for businesses to generate insights.
Using NLP, Equifax perfected data hidden deep inside 170M DMV records. Now auto lenders and dealers can extract meaningful insights and improve the car-buying experience.