Does gender influence the price of an oil change?

We used our AI voice agents and their market research capabilities to test whether oil change quotes for an equivalent vehicle differ with the perceived gender of the caller.

We also compare this AI-run study with a previous study conducted by human callers. Analyzing cost and speed, we find that a single instance of a Cobbery agent works 3.62x faster and costs 21x less than human callers.

The Experiment

We designed two nearly identical AI agents to get a quote from a car mechanic over the phone. The vehicle was a 2011 Honda Accord, chosen because it is gender-neutral, easy to work on, and popular.

The agents were not given any other specifications about the service they wanted. When asked which oil type they wanted, the agents deferred to the mechanic and said “Whatever you think is best”. This was meant to expose any gender-based upselling of higher-margin services.

The only difference between the two agents was the voice: one used an identifiably male voice, the other an identifiably female one. The prompt and procedure were otherwise identical.

The dataset was pulled from the Google Places API, and our agents successfully obtained 771 quotes from over 20 major US cities spread across the country. The geographic spread was intentional, in case there was a regional bias.

Calls were conducted during standard business hours, Monday through Friday, over a period of six business days. If an agent could not reach a human representative, it attempted up to three calls to that shop. Each attempt was made at least twenty-four hours after the previous one. No shop was called by both agents.
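The retry rules above can be sketched in a few lines of Python. This is only an illustration, not the agents' actual scheduler: the function names are hypothetical, and the 9:00 to 17:00 window is an assumed reading of "standard business hours". The three-attempt cap and the twenty-four hour gap come from the description above.

```python
from datetime import datetime, timedelta
from typing import List, Optional

MAX_ATTEMPTS = 3                  # from the text: up to three calls per shop
MIN_GAP = timedelta(hours=24)     # from the text: at least 24 hours between attempts
OPEN_HOUR, CLOSE_HOUR = 9, 17     # assumption: business hours taken as 9:00-17:00


def in_business_hours(t: datetime) -> bool:
    """True on Monday through Friday between OPEN_HOUR and CLOSE_HOUR."""
    return t.weekday() < 5 and OPEN_HOUR <= t.hour < CLOSE_HOUR


def next_attempt_time(previous_attempts: List[datetime], now: datetime) -> Optional[datetime]:
    """Earliest allowed time for the next call, or None once the shop is exhausted."""
    if len(previous_attempts) >= MAX_ATTEMPTS:
        return None
    candidate = now
    if previous_attempts:
        # Enforce the minimum gap after the most recent attempt.
        candidate = max(now, previous_attempts[-1] + MIN_GAP)
    # Move forward to the next slot that falls inside business hours.
    while not in_business_hours(candidate):
        if candidate.weekday() < 5 and candidate.hour < OPEN_HOUR:
            # Too early on a weekday: wait for opening time the same day.
            candidate = candidate.replace(hour=OPEN_HOUR, minute=0, second=0, microsecond=0)
        else:
            # After closing or on a weekend: try opening time the next day.
            candidate = (candidate + timedelta(days=1)).replace(
                hour=OPEN_HOUR, minute=0, second=0, microsecond=0
            )
    return candidate
```

For example, a shop that was last tried late on a Friday afternoon would not be retried until the following Monday morning under these assumptions, since the twenty-four hour mark lands on a Saturday.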

Results

The distribution of collected data points was relatively balanced across the targeted cities and US regions for both gender personas.

Nationally, the median price quoted for the oil change was identical for both groups: $84.99. However, variations emerged across other distributional metrics:

Measure                 Male Value   Female Value   Delta ($)   Delta (%)
Median                  $80.00       $80.00         $0.00       0.0%
25th Percentile (Q1)    $65.00       $69.99         +$4.99      +7.7%
75th Percentile (Q3)    $90.00       $95.00         +$5.00      +5.6%
Min (Lower Whisker)     $29.99       $38.00         +$8.01      +26.7%
Max (Upper Whisker)     $125.00      $130.00        +$5.00      +4.0%
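The delta columns can be checked directly from the male and female values in the table. The short sketch below assumes that each delta is the female value minus the male value, and that the percentage is taken relative to the male value; the variable and function names are illustrative only.

```python
# (male, female) quotes in USD, copied from the table above.
summary = {
    "Median": (80.00, 80.00),
    "Q1": (65.00, 69.99),
    "Q3": (90.00, 95.00),
    "Min": (29.99, 38.00),
    "Max": (125.00, 130.00),
}


def delta(male: float, female: float):
    """Absolute and percentage difference, female relative to male."""
    diff = female - male
    pct = diff / male * 100
    return round(diff, 2), round(pct, 1)


deltas = {name: delta(m, f) for name, (m, f) in summary.items()}

for name, (diff, pct) in deltas.items():
    print(f"{name:<7} {diff:+.2f} USD  {pct:+.1f}%")
```

Under that assumption the computed values match the table, with the largest relative gap at the minimum and the next largest at Q1.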

Visually, the distribution for female callers appeared somewhat sparser at the lower end of the price range. The price disparity also appears larger at the lower end of the distribution: the gap is 7.7% at Q1 versus 5.6% at Q3. Additionally, every measure except the median is higher for female callers, which suggests that when a quoted price isn’t “normal”, it tends to be higher for a female caller than for a male one.

It’s also clear from the striping pattern that quotes commonly fall on $5 increments.

While it may appear that female callers are being quoted more at the extremes, a Mann-Whitney U test suggests that the disparity is not statistically significant:

The test yielded U = 78740.00 and p = 0.1205. Because p > 0.05, the difference is not statistically significant at the 5% level.
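For readers unfamiliar with the test, here is a minimal pure-Python sketch of a two-sided Mann-Whitney U test. It uses the normal approximation with a tie correction and a continuity correction, which is a common choice for samples of this size. It is not necessarily the implementation behind the figures above (a library routine such as scipy.stats.mannwhitneyu would be the usual choice), and the sample quotes are made up for illustration, not taken from our data.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign average ranks to tied values and accumulate the tie correction.
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t**3 - t
        i = j

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values identical: no evidence of a difference

    # Continuity-corrected z-score, clamped at zero.
    z = max((abs(u_x - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    p = math.erfc(z / math.sqrt(2))  # two-sided tail probability
    return u_x, p


# Hypothetical quotes in USD, for illustration only.
male_quotes = [65.00, 79.99, 80.00, 85.00, 90.00, 59.99, 75.00]
female_quotes = [69.99, 80.00, 84.99, 95.00, 89.99, 70.00, 80.00]

u, p = mann_whitney_u(male_quotes, female_quotes)
print(f"U = {u:.2f}, p = {p:.4f}")
```

Because the test compares ranks rather than raw prices, it is not thrown off by a few unusually cheap or expensive quotes, which suits data like ours where many quotes are tied on round dollar amounts.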

Even so, the quartiles and extremes do differ between the two groups, and digging into regional differences reveals more nuance.