Towards Recommender Engineering

Reproducible Research and Algorithm Personalities

Michael Ekstrand
GroupLens Research
Dept. of Computer Science and Engineering
University of Minnesota

Michael Ekstrand

B.S. CprE, Iowa State University (2007)
Finishing Ph.D. in CS, University of Minnesota (2014)

GroupLens Research (HCI & social computing)

Advised by Joseph Konstan and John Riedl

http://elehack.net

Research Overview

Helping users find, filter, navigate, and understand large-scale information spaces.

Topics

  • Recommender systems
  • Help search
  • Wiki history browsing
  • HPC network topology visualization

Methods

  • Offline data analysis & experiments
  • User studies
  • System building

Current Research Objective

Recommender research should be

reproducible
generalizable
grounded in user needs

so we can engineer information solutions and understand recommender-assisted decision-making

Recent & Ongoing Work

Infrastructure

  • LensKit toolkit
  • Reproducible research
  • Reproduce algorithms
  • Deploy results

Experiments

  • Offline w/ public data
  • User studies

Overview

Background
Tools
Experiment
Going Forward

Recommender Systems

GoodReads

Recommender Systems

Recommender architecture

recommending items to users

Recommender Research

Common Approaches

Evaluating Recommenders

Many measurements:

Common R&D Practice

  1. Develop recommender tech (algorithm, UI, etc.)
  2. Test on particular data/use case
  3. See if it is better than baseline
  4. Publish or deploy

Learned: is new tech better for target application?

Not learned: for what applications is new tech better? why?

Algorithms are Different

Building a Boat Shed

An Analogy

Current practice:

Building a Boat Shed

An Analogy

Better practice:

Recommender Engineering

Recommender engineering is

What Do We Need?

To enable recommender engineering, we need to understand:

All this needs to be reproduced, validated, and generalizable.

My work

LensKit

enables reproducible research on a wide variety of algorithms

Offline experiments
validate LensKit
demonstrate algorithm differences

improve engineering

User study
obtain user judgements of algorithm differences

currently ongoing

Overview

Background
Tools
Experiment
Going Forward

LensKit

build
prototype and study recommender applications
research algorithms with users

deploy research results in live systems

research
reproduce and validate results
new experiments with old algorithms
make research easier

provide good baselines

study

learn from production-grade implementations

LensKit Features

LensKit Project

Design Challenges

We want:

Component Architecture

For flexibility and ease-of-use, we use:

Modular Algorithms

Dependency Injection

Components receive their dependencies from whatever creates them.

// Hard-coded dependency: the similarity function cannot be swapped
public UserUserCF() {
    similarity = new CosineSimilarity();
}

// Dependency injection: whoever creates the component supplies it
public UserUserCF(SimilarityFunction sim) {
    similarity = sim;
}

A Little Problem

How do we instantiate this mess?
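The problem can be seen in a sketch of wiring an item-item CF stack by hand (all class names here are hypothetical, loosely echoing LensKit-style components, not LensKit's actual API): every component's dependencies must be constructed manually, in dependency order, and changing one binding means rewriting the wiring code.

```java
// Hypothetical component classes, loosely modeled on an item-item CF
// stack; NOT LensKit's actual API.
class RatingDao {}
class ItemMeanBaseline { ItemMeanBaseline(RatingDao dao) {} }
class UserVectorNormalizer { UserVectorNormalizer(ItemMeanBaseline base) {} }
class ItemSimilarity {}
class ItemItemModel {
    ItemItemModel(RatingDao dao, UserVectorNormalizer norm, ItemSimilarity sim) {}
}
class ItemItemScorer {
    ItemItemScorer(ItemItemModel model, UserVectorNormalizer norm) {}
}

public class ManualWiring {
    public static void main(String[] args) {
        // Every dependency built by hand, in the right order:
        RatingDao dao = new RatingDao();
        ItemMeanBaseline baseline = new ItemMeanBaseline(dao);
        UserVectorNormalizer norm = new UserVectorNormalizer(baseline);
        ItemSimilarity sim = new ItemSimilarity();
        ItemItemModel model = new ItemItemModel(dao, norm, sim);
        ItemItemScorer scorer = new ItemItemScorer(model, norm);
        System.out.println("wired: " + scorer.getClass().getSimpleName());
    }
}
```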

Item-Item CF

Dependency Injectors

  1. Extract dependencies from class definitions
  2. Instantiate required components automatically

Several of them in wide use:

  • Spring
  • Guice
  • PicoContainer

JSR 330 specifies their common behavior.
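The two steps above can be sketched with a toy injector (a minimal reflection-based illustration only; this is not how Grapht or the JSR 330 containers are actually implemented, and the component classes are hypothetical): it reads each class's constructor to find its dependencies, then instantiates them recursively.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Toy components (hypothetical names, echoing the earlier slide):
interface SimilarityFunction {}
class CosineSimilarity implements SimilarityFunction {}
class UserUserCF {
    final SimilarityFunction similarity;
    UserUserCF(SimilarityFunction sim) { similarity = sim; }
}

// A toy dependency injector: illustration only.
class ToyInjector {
    private final Map<Class<?>, Class<?>> bindings = new HashMap<>();
    private final Map<Class<?>, Object> instances = new HashMap<>();

    // bind an interface (or class) to a concrete implementation
    <T> void bind(Class<T> type, Class<? extends T> impl) {
        bindings.put(type, impl);
    }

    @SuppressWarnings("unchecked")
    <T> T getInstance(Class<T> type) throws Exception {
        Class<?> impl = bindings.getOrDefault(type, type);
        if (instances.containsKey(impl)) return (T) instances.get(impl);
        // Step 1: extract dependencies from the constructor signature
        Constructor<?> ctor = impl.getDeclaredConstructors()[0];
        ctor.setAccessible(true);
        Class<?>[] deps = ctor.getParameterTypes();
        // Step 2: instantiate required components recursively
        Object[] args = new Object[deps.length];
        for (int i = 0; i < deps.length; i++) {
            args[i] = getInstance(deps[i]);
        }
        T obj = (T) ctor.newInstance(args);
        instances.put(impl, obj); // one shared instance per component
        return obj;
    }
}

public class InjectorDemo {
    public static void main(String[] args) throws Exception {
        ToyInjector inj = new ToyInjector();
        inj.bind(SimilarityFunction.class, CosineSimilarity.class);
        UserUserCF cf = inj.getInstance(UserUserCF.class);
        System.out.println(cf.similarity.getClass().getSimpleName());
    }
}
```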

DI Configuration

// use item-item CF to score items
bind ItemScorer to ItemItemScorer
// subtract baseline score from user ratings
bind UserVectorNormalizer
  to BaselineSubtractingUserVectorNormalizer
// use user-item mean rating as baseline
bind (BaselineScorer, ItemScorer) to UserMeanItemScorer
bind (UserMeanBaseline, ItemScorer)
  to ItemMeanRatingItemScorer
// the rest configured with defaults

Grapht

Existing containers have some limitations:

So we built Grapht:

Composable Components

Context-Sensitive Policy

within (UserVectorNormalizer) {
    bind (BaselineScorer, ItemScorer) to UserMeanItemScorer
}

Static Dependency Graph

We compute the object graph before instantiating objects.

This allows us to:

Grapht Summary

To meet LensKit's needs, we created a new dependency injector that:

Initial Results

RecSys 2011

When Recommenders Fail

Short paper, RecSys 2012

Q1: Which algorithm has the most successes (ε ≤ 0.5)?

Q(n+1): Which has the most successes where algorithms 1…n failed?

Algorithm        # Good   % Good   Cum. % Good
ItemItem      1,044,371    52.23         52.23
UserUser        166,008     8.30         60.53
Lucene           90,018     4.50         65.03
FunkSVD          53,313     2.67         67.70
Mean             21,617     1.08         68.78
Unexplained     624,291    31.22        100.00
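The Q1/Q(n+1) analysis can be sketched as follows (with made-up toy errors, not the MovieLens results in the table): a prediction counts as a success when its absolute error is at most 0.5, each success is credited to the first algorithm in the ranking that achieves it, and coverage accumulates down the list.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CumulativeSuccess {
    public static void main(String[] args) {
        // Toy absolute prediction errors for 4 test ratings (NOT real data):
        // rows = algorithms in ranked order, columns = predictions.
        String[] algos = {"ItemItem", "UserUser", "Mean"};
        double[][] absError = {
            {0.3, 0.9, 0.2, 1.1},   // ItemItem
            {0.6, 0.4, 0.1, 0.8},   // UserUser
            {0.7, 0.6, 0.5, 0.9},   // Mean
        };
        int n = absError[0].length;
        Map<String, Integer> firstSuccess = new LinkedHashMap<>();
        for (String a : algos) firstSuccess.put(a, 0);
        int unexplained = 0;
        for (int j = 0; j < n; j++) {
            boolean explained = false;
            for (int i = 0; i < algos.length && !explained; i++) {
                if (absError[i][j] <= 0.5) {  // success criterion: eps <= 0.5
                    firstSuccess.merge(algos[i], 1, Integer::sum);
                    explained = true;
                }
            }
            if (!explained) unexplained++;
        }
        double cum = 0;
        for (String a : algos) {
            cum += 100.0 * firstSuccess.get(a) / n;
            System.out.println(a + ": " + firstSuccess.get(a)
                               + " (cum " + cum + "%)");
        }
        System.out.println("Unexplained: " + unexplained);
    }
}
```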

Parameter Tuning

Outcomes of LensKit

Overview

Background
Tools
Experiment
Going Forward

Context: MovieLens

User Study Goal

Advance recommender engineering by studying how users perceive the movie recommendations from different algorithms to differ.

Experiment Outcomes

RQ1

Do users perceive movie lists from different algorithms to be different?

RQ2
How do they perceive these lists to be different?

Learn context-relevant properties for algorithms.

RQ3

What characteristics predict their preference for one list over the other?

Side Effect

Collect data for calibrating offline metrics.

Algorithms

Three well-known algorithms for recommendation:

Each user assigned 2 algorithms

Predictions

Predicted ratings influence list perception.

To control for this, we use 3 prediction treatments:

Each user assigned 1 condition

Survey Design

Analysis features

joint evaluation
users compare 2 lists
judgment-making different from separate eval

enables more subtle distinctions

factor analysis
25 questions measure 5 factors
more robust than single questions

structural equation model tests relationships

Structural Equation Model

List Properties

Secondary goal: identify measurable recommendation list properties that predict user judgements.

Outcomes:

Expected Contributions

Currently running with almost 300 completions.

Overview

Background
Tools
Experiment
Going Forward

Toward Recommender Engineering

Skills and Techniques

My research involves several major efforts:

Applications

This work

Movies
Music

Future

  • Research literature
  • News/Reading
  • Social
  • Medical

Additional Directions

Funding

Thank you!

Also thanks to

  • GroupLens Research for amazing support
  • NSF for funding research
  • The Noun Project for great icons
    • ‘Document’ by Rob Gill
    • ‘Test Tube’ by Olivier Guin
    • ‘Experiment’ by Icons Pusher
    • ‘Future’ by Megan Sheehan