September 15, 2004
"In the end, you're measured not by how much you
undertake but by what you finally accomplish."
-- Donald Trump, The Art of the Deal
Over this past summer, I had the chance to stop by the third
annual Emetrics Summit held in Santa Barbara, California.
This event, organized in California and London each year
by Target Marketing's Jim Sterne, covers how to measure
and improve online marketing efforts.
In this issue of the Apogee, I provide a review of one of
the sessions given by Ronny Kohavi, who is Director of Data
Mining and Personalization at Amazon.com. Amazon.com is an
organization that is able to bring tremendous resources to
address issues such as online personalization and process
automation. The stakes here are HUGE... small improvements
to the Amazon.com site result in increases in revenues
measured in tens of millions of dollars. Ronny's session
provided a fascinating opportunity to glimpse the kind of
"Rocket Science" practiced by one of the largest scale
online marketplaces in the world, and I'm pleased to be able
to relate some of the details to our readers.
Ronny's team at Amazon.com has 70 people in it, and everything
they do is focused on automation. They take things that
people have done by hand and that have been shown to provide
value, and they automate them, so that they can continue to
receive value without having to reinvent the wheel each time.
Amazon is a Fortune 500 company, with over 41 million active
customers within the last year. They fulfill to over 200
countries, and were listed as the 74th most valuable brand
according to a recent Business Week survey. Last year they
booked 5.2 billion dollars in revenue and were profitable.
Amazon has six global sites: The U.S. site, a U.K. site, a
Canada site, a Germany site, a France site, and a Japan site,
all running on the same platform in a distributed development
and deployment environment.
What is Amazon's vision? It consists of two things. One is to
offer the Earth's biggest selection, consisting of many millions
of products, and they are continually expanding the depth and breadth
of their offerings. The second part of their vision is to be the
Earth's most customer-centric company, focusing on how they can
continually improve their ability to help customers find what they
want, perform research, and make purchases.
Amazon is organized into small, cross-functional teams of
both business and technology people that are able to execute
end-to-end. Teams are given qualitative performance goals to
meet, and then they are allowed to figure out how to best meet
those goals.
Amazon's business strategy focuses on price, convenience, and
selection.
They do not consider themselves just a retailer... they consider
themselves to be a technology shop, building a platform for other
vendors. For example, Target Stores (http://www.target.com/) run
on their platform. In Q1 2004, 23% of world-wide items sold over
Amazon.com were sold from retailers other than Amazon.com.
They do 19-20 inventory turns a year, with a gross margin of 24%.
This compares with Barnes & Noble (3 turns/year, 27% gross margin),
Costco (11 turns/year, 12% gross margin), Home Depot (5 turns/year,
32% gross margin), Best Buy (7 turns/year, 25% gross margin), and
Wal-Mart (7 turns/year, 22% gross margin).
This is important because Amazon has a negative operating cycle,
and they know of no other retailer that has one. It means that,
on average, if they receive a product from their supplier on
Day 0, then they will ship the product to a customer on Day 20,
receive the customer's payment on Day 23, and finally pay their
supplier 21 days later on Day 44. So as they grow, they don't
need more money to build inventory.
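The arithmetic behind that cycle can be checked in a few lines of
Python. This is only a sketch of the example above: the day numbers
come from the text, while the function and constant names are
made up for illustration.

```python
# Day numbers taken from the example in the text.
RECEIVE_FROM_SUPPLIER_DAY = 0
SHIP_TO_CUSTOMER_DAY = 20
CUSTOMER_PAYS_DAY = 23
PAY_SUPPLIER_DAY = 44


def operating_cycle_days(cash_in_day: int, cash_out_day: int) -> int:
    """Days between paying the supplier and collecting from the customer.

    A negative result means the customer's cash arrives before the
    supplier has to be paid.
    """
    return cash_in_day - cash_out_day


cycle = operating_cycle_days(CUSTOMER_PAYS_DAY, PAY_SUPPLIER_DAY)
print(f"Operating cycle: {cycle} days")  # prints "Operating cycle: -21 days"
```

In other words, Amazon holds the customer's money for about three
weeks before the supplier's invoice comes due.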
They work to provide an extremely high level of site availability,
and each service they provide must be operating at all times. The
site needs to fail gracefully if something breaks.
They have revenue projections for every minute of every day, with
upper and lower bounds. Alarms go off when revenue falls outside
those bounds, and people get called out of meetings or paged to
diagnose and fix the problem immediately.
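A bounds check of this kind might look something like the Python
sketch below. Ronny did not describe the implementation, so the
RevenueBand structure, the check_revenue function, and all of the
dollar figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RevenueBand:
    """Projected revenue bounds for one minute (hypothetical structure)."""
    lower: float
    upper: float


def check_revenue(actual: float, band: RevenueBand) -> str:
    """Return 'ok', 'below', or 'above' for one minute of revenue."""
    if actual < band.lower:
        return "below"
    if actual > band.upper:
        return "above"
    return "ok"


# Illustrative numbers only; the session gave no actual figures.
band = RevenueBand(lower=9_000.0, upper=12_000.0)
for minute_revenue in (10_500.0, 8_200.0, 12_900.0):
    status = check_revenue(minute_revenue, band)
    if status != "ok":
        print(f"ALARM: revenue {minute_revenue:.0f} is {status} the projected band")
```

Note that the check fires on the high side as well as the low side,
since the text describes both an upper and a lower bound.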
They also sign internal performance service-level-agreements...
for example, if someone responsible for a section of the site
wants to have their offering featured on the home page, they
might have to guarantee that their offering will be available
99.99% of the time with pages returned within 2 seconds.
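To put the 99.99% figure in perspective, here is a short Python
sketch that converts an availability target into allowed downtime
and checks a measurement against both thresholds. The two
thresholds come from the example above; the function names and the
measured values are assumptions for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def allowed_downtime_minutes(availability: float, period_minutes: int) -> float:
    """Minutes of downtime permitted by an availability target."""
    return (1.0 - availability) * period_minutes


def meets_sla(availability: float, response_seconds: float,
              target_availability: float = 0.9999,
              max_response_seconds: float = 2.0) -> bool:
    """Check measured figures against the two thresholds quoted in the text."""
    return (availability >= target_availability
            and response_seconds <= max_response_seconds)


yearly = allowed_downtime_minutes(0.9999, MINUTES_PER_YEAR)
print(f"99.99% availability allows about {yearly:.1f} minutes of downtime per year")
print(meets_sla(availability=0.99995, response_seconds=1.4))  # hypothetical measurement
```

That works out to less than an hour of downtime across an entire
year.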
They do a lot of A/B testing, where one segment of their audience
is given one version of their site, while another segment of
their audience is given another version. "This is the ONLY way
we know to do honest experiments." They have other tracking
mechanisms within the site... for example, monitoring how many
people click on links, but A/B testing is the most reliable.
The ability to do these tests easily is built into their
platform, and every new feature that they introduce goes
through these tests.
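The session did not cover how Amazon decides which visitor sees
which version. One common approach, shown here purely as an
assumption, is to hash a visitor identifier together with the
experiment name, so that the same visitor always lands in the same
group. The assign_variant function and the identifiers below are
hypothetical.

```python
import hashlib


def assign_variant(customer_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically map a customer to one variant of an experiment."""
    # Including the experiment name gives each experiment its own split.
    key = f"{experiment}:{customer_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as a large integer bucket number.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# Hypothetical identifiers, for illustration only.
for customer in ("customer-1001", "customer-1002", "customer-1003"):
    print(customer, assign_variant(customer, "homepage-layout-test"))
```

Because the assignment depends only on the identifier, no lookup
table is needed, and the split comes out close to even across a
large audience.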
Every day at Amazon, there are probably 4-6 tests ongoing. The
software allows them to tweak a feature in an experiment, and
quickly have simple but detailed reports that assess changes
in revenues, changes in order sizes, and how it might, for
example, have increased revenues for books but decreased
revenues for electronics. An experiment might have had a negative
overall impact, but it might have a positive impact for a certain
audience or a certain product segment. In those cases, Amazon
tries to learn from what happened and design a new experiment
that will have a positive overall impact.
Challenges they face in running A/B tests include:
- Test conflicts. When two experiments touch the same feature,
if they are not careful, it can make it hard to assess which
experiment is responsible for an observed effect.
- Long term effects. Some features work well at first, but
then die out. Other features, such as "search inside a book,"
may not be greatly appreciated at first, but gain value in the
eyes of users with time and experience.
- Primacy effects. Some changes (such as site navigation changes)
may not produce good early results because people are used to the
site being the old way, and it takes time for them to become
comfortable with the new layout, even if it offers advantages.
- Consistency. Because of A/B testing, individuals may see
different versions of the site if they access it from different
physical locations.
- Statistical tests. Distributions are far from normal, with
a large mass at zero (no purchase), which makes it hard to use
standard statistical tests that would tell you whether an
observed change is significant.
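Ronny did not say which statistical methods his team uses. As one
illustration of a test that does not assume a normal distribution,
here is a minimal permutation test in Python on revenue per
visitor. The function name and the sample data are made up, and
this is a sketch of a general technique rather than Amazon's
method.

```python
import random


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in mean revenue per visitor."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        # Reassign visitors to the two groups at random.
        rng.shuffle(pooled)
        perm_a = pooled[:len(a)]
        perm_b = pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_permutations + 1)


# Made-up revenue-per-visitor samples: most visitors buy nothing.
control = [0.0] * 95 + [20.0, 25.0, 30.0, 35.0, 40.0]
treatment = [0.0] * 93 + [20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0]
p = permutation_p_value(control, treatment)
print(f"p-value: {p:.3f}")
```

The test simply asks how often a random relabeling of visitors
produces a difference at least as large as the one observed, so the
pile of zero-revenue visitors does not violate any assumption.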
A fundamental tenet of Amazon's culture is that "Data Trumps
Intuition." Over and over again they have found that people,
even experienced employees, will advocate changes and new
features for the site that they "are absolutely sure" will
produce strong results. Yet many ideas fail to show
significant improvement. Still, if Ronny's team does 50 experiments,
and 4 of them produce good results (1% improvement apiece), and
if each percent increase means $50 million in revenues, then the
impact of running those experiments is truly significant to
Amazon's bottom line!
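Worked through in Python, the hypothetical looks like this. The
four input numbers come from the example above; treating the gains
as simply additive is an assumption made to keep the sketch short.

```python
experiments_run = 50
winning_experiments = 4
improvement_per_win_pct = 1.0   # percent, from the example in the text
revenue_per_pct_musd = 50.0     # $ millions per percentage point

win_rate = winning_experiments / experiments_run
# Assumes the four gains simply add up rather than compound.
total_gain_musd = winning_experiments * improvement_per_win_pct * revenue_per_pct_musd

print(f"Win rate: {win_rate:.0%}")
print(f"Combined gain: ${total_gain_musd:.0f} million")
```

So even with 46 of the 50 experiments coming up empty, the four
winners would be worth $200 million under these assumptions.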
Ronny talked about conflicts between focus groups and A/B
testing. Every focus group they conduct tells them
that the site is too complicated, and there are too many
features. Yet in testing, they consistently find that the site
performs worse when they remove features or otherwise redesign
it to be simpler. They have NEVER been able to
take a feature out and show that it has a positive impact on
site performance.
Adding the "we have recommendations for you" feature has produced
a statistically significant increase in sales for them.
Stating that "data is king at Amazon," Ronny talked about a
number of examples of data driven automation. These included:
- Management of real estate on the Amazon home page. For their
home page, use of the space is highly contentious. Every
category VP wants top-center placement for their offerings.
Friday meetings about placements for the next week were
getting too long, too loud, and lacked performance data. Now
they have entirely done away with these meetings and
arguments, letting automation replace intuition. The staff
members that used to spend their time arguing about placements
on the site now spend their time in more productive endeavors.
Anyone in the company can submit content for the slots on the
home page, and the content is run for several thousand
impressions. Those campaigns that perform best get run the
most, as determined by real-time experimentation with real
customers. You will often see a credit card offer in the
prime real estate on their home page, because it consistently
brings them the greatest revenue return of anything they have
tried to run in that slot.
- Automated email measurement and optimization. For email
management, they have 41 million active customers that they can
target with their email campaigns. Their email campaign calendar
used to be manually managed, and results were difficult to
measure. Their new system does automatic testing between
different creative alternatives, allowing those that perform
best with test audiences to be distributed to wider audiences.
They also automatically run about 1000 different campaigns now
per day that no human ever sees. Some of these campaigns may
only target three customers that meet the targeting criteria,
and it would be impossible to segment their audience that finely
unless the system was automated. Their system also avoids
sending out campaigns that have low clickthrough rates or high
unsubscribe rates. It manages customer inboxes to prevent them
from receiving too many promotions, even if they are in the
target audience for many of them. One problem they are dealing
with is that sometimes promotions are more successful than
their current inventory will allow them to fulfill, so
eventually they will want to have a feedback loop between their
promotions and their inventory.
- Tying in the behaviors of customers who have made the same
purchases or product searches. It is very helpful to site users
to know, for example, that 38% of the people that searched for
DVD X ended up buying DVD Y. This feature relies on having and
crunching massive data. But there can be problems in this kind
of a feature when search key words turn up results that are not
really substitutable products. For example, some people that
search and look at big concrete vibrator machines also may look
at vibrators of an adult nature. So they need a sensitivity filter
to not inadvertently show inappropriate results, even if there is
a high correlation. They also need to take into account that in
some product categories the products run their life cycle very
quickly, and it doesn't help to show a product that has
correlated well in the past, only for the customer to find out
that the product is no longer available.
- Making custom recommendations. They use a relatively simple
algorithm fed by a massive dataset. With millions of
customers and millions of items, making recommendations in real
time is a challenge. They want to make new recommendations for
each customer, based on their purchases, immediately at checkout.
- Goldbox. The purpose of this feature is to introduce people
to items that they are not aware that Amazon sells, and it
makes a real difference in sales. They give away their profit
margin on items that they think customers are not likely to
purchase from them, in order to get them used to buying new
categories of purchases from Amazon. Customers complain that
the offerings are not targeted to their behavior, but it is
entirely by design... the purpose of this feature is to
encourage new behaviors.
- Sponsored links. These are short, text-only ads, purchased on
a cost-per-click basis on sites like Google and Yahoo. They
built a system that generates keywords automatically, writes
the creative, determines the landing page, and supports bid
management based on all their product names. Their system
adjusts bids based on measured conversion rates and profit
per converted visitor. The system needs to be able to adjust
quickly, because some keywords will produce large click rates
but small sales conversion rates, and if they are unable to
quickly react to those situations they could incur significant
losses. They estimate that their ads now make up approximately
2% of Google click-throughs, bidding on millions of words.
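To make the first of these examples more concrete, here is a rough
Python sketch of how home page slot allocation could work: every
submitted campaign gets a trial run, and after that the best
performer is shown most of the time. The session did not describe
the actual algorithm, so the SlotAllocator class, its parameters,
the campaign names, and the revenue figures are all hypothetical.

```python
import random


class SlotAllocator:
    """Trial-then-favor allocation for one home page slot (hypothetical)."""

    def __init__(self, campaigns, trial_impressions=3000, explore_rate=0.1, seed=0):
        self.trial_impressions = trial_impressions
        self.explore_rate = explore_rate
        self.rng = random.Random(seed)
        self.impressions = {c: 0 for c in campaigns}
        self.revenue = {c: 0.0 for c in campaigns}

    def revenue_per_impression(self, campaign):
        shown = self.impressions[campaign]
        return self.revenue[campaign] / shown if shown else 0.0

    def choose(self):
        # Give every campaign its trial run first.
        for campaign, shown in self.impressions.items():
            if shown < self.trial_impressions:
                return campaign
        # Occasionally show a random campaign so the estimates stay fresh.
        if self.rng.random() < self.explore_rate:
            return self.rng.choice(list(self.impressions))
        # Otherwise show the campaign with the best revenue per impression.
        return max(self.impressions, key=self.revenue_per_impression)

    def record(self, campaign, revenue):
        self.impressions[campaign] += 1
        self.revenue[campaign] += revenue


# Made-up average revenue per impression for three submitted campaigns.
simulated_value = {"credit_card_offer": 0.05, "books_promo": 0.03, "toys_promo": 0.02}
allocator = SlotAllocator(simulated_value, trial_impressions=100)
for _ in range(10_000):
    shown = allocator.choose()
    allocator.record(shown, simulated_value[shown])
print(allocator.impressions)
```

The small explore_rate is a design choice in this sketch: it keeps
a trickle of impressions going to the other campaigns, so a
campaign that improves over time still has a chance to win the
slot back.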
All in all, it is clear that while Amazon.com has made tremendous
progress, they still struggle with finding satisfactory solutions
to many of the same challenges that the rest of us do (but on a
larger scale!).
The above session summary barely scratches the surface of what
was covered at the Emetrics Summit. Jared Spool of User Interface
Engineering gave a truly outstanding session covering the kinds
of things that organizations don't realize they are doing that
cost them significant revenue through their online marketing
practices. (Our review of that session is available separately.)
We also heard case studies presented by InterContinental Hotels,
SAP, Hewlett Packard, Avaya and SmartDraw. Jim Novo gave a great
presentation on determining customer lifetime value, Terry
Lund covered the topic of how to evaluate vendors of Emetrics
software tools, and Eric Peterson of Jupiter Research talked about
key performance indicators for web analytics. A panel of Emetrics
software vendors gave briefings on the strengths of their products,
and answered audience questions as conference attendees tried to
sort through the various offerings.
If you have an interest in learning more, you can get a copy of
the full handouts from the Summit along with audio recordings of
the full sessions at:
http://www.emetrics.org/summit604/proceedings.html
Will next year find the Summits answering the same questions with
new answers, or will they address entirely new challenges? Some of
both, most likely. The 2005 Summits are already being planned.
They will be held in Santa Barbara June 1-3, 2005 and in London
June 8-10, 2005.
Details are at: http://www.emetrics.org/summit605/index.html
"The measure of success is not whether you have a tough
problem to deal with, but whether it is the same problem
you had last year."
-- John Foster Dulles