
1 Regret Minimization and Job Scheduling


Yishay Mansour, Tel Aviv University

2 Talk Outline

Regret Minimization: external regret minimization (motivation, algorithm); internal regret minimization (motivation); regret dynamics.
Job Scheduling and Regret Minimization: model; external regret; stochastic case.

3 Regret Minimization: External


4 Decision Making under uncertainty

Online algorithms: stochastic models; competitive analysis; absolute performance criteria.
A different approach: define "reasonable" strategies; compete with the best (in retrospect); relative performance criteria.

5 Routing

Model: each day, (1) select a path from source to destination, (2) observe the latencies; the values differ from day to day.
Strategies: all source-destination paths.
Loss: the average latency on the selected path.
Performance goal: match the latency of the best single path.

6 Financial Markets: options

Model: stock or cash; each day, set the portfolio, then observe the outcome.
Strategies: invest either all in stock or all in cash.
Gain: based on the daily changes of the stock.
Performance goal: implements an option!

7 Machine learning – Expert Advice

Model: each time step, (1) observe the experts' predictions, (2) predict a label.
Strategies: the experts (online learning algorithms).
Loss: number of prediction errors.
Performance goal: match the error rate of the best expert in retrospect.

[Figure: four experts (1-4) and their 0/1 predictions.]

8 Parameter Tuning

Model: multiple parameters.
Strategies: settings of the parameters.
Optimization: any.
Performance goal: match the best setting of the parameters.

9 Parameter Tuning

Development cycle: develop the product (software), test performance, tune parameters, deliver the "tuned" product.
Challenge: can we combine testing, tuning, and runtime?

10 Regret Minimization: Model

Actions A = {1, ..., N}; time steps t ∈ {1, ..., T}.
At time step t:
- the agent selects a distribution p_t(i) over A
- the environment returns costs c_t(i) ∈ [0,1] (adversarial setting)
Online loss: l_t(on) = Σ_i c_t(i) p_t(i).
Cumulative loss: L_T(on) = Σ_t l_t(on).
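To make the protocol concrete, here is a minimal sketch of the interaction loop under an assumed agent interface (the names play, distribution, and update are illustrative, not from the talk):

    import numpy as np

    # Minimal sketch of the regret-minimization protocol: at each step the agent
    # commits to a distribution over the N actions, the environment then reveals a
    # cost vector in [0,1]^N, and the agent pays the expected cost.
    def play(agent, cost_sequence):
        online_loss = 0.0                                   # L_T(on)
        cum_action_cost = np.zeros(len(cost_sequence[0]))   # L_T(i) for each action i
        for c_t in cost_sequence:                           # c_t: costs at step t
            p_t = agent.distribution()                      # chosen before seeing c_t
            online_loss += float(np.dot(p_t, c_t))          # l_t(on) = sum_i c_t(i) p_t(i)
            cum_action_cost += c_t
            agent.update(c_t)
        external_regret = online_loss - cum_action_cost.min()   # L_T(on) - L_T(best)
        return online_loss, external_regret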

11 External Regret

Relative performance measure: compare to the best strategy in A (the basic class of strategies).
Online cumulative loss: L_T(on) = Σ_t l_t(on).
Cumulative loss of action i: L_T(i) = Σ_t c_t(i).
Best action: L_T(best) = min_i {L_T(i)} = min_i {Σ_t c_t(i)}.
External regret = L_T(on) - L_T(best).

12 External Regret Algorithm

Goal: minimize regret.
Algorithm: track the regrets; weights proportional to the regret.
Formally, at time t:
- compute the regret to each action: Y_t(i) = L_t(on) - L_t(i), and r_t(i) = max{Y_t(i), 0}
- set p_{t+1}(i) = r_t(i) / Σ_i r_t(i); if all r_t(i) = 0, select p_{t+1} arbitrarily
Notation: R_t = <r_t(1), ..., r_t(N)> and ΔR_t = Y_t - Y_{t-1}.
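A minimal sketch of the regret-tracking rule above (weights proportional to the positive regrets), written to fit the illustrative agent interface sketched after slide 10:

    import numpy as np

    class RegretMatchingAgent:
        # Tracks Y_t(i) = L_t(on) - L_t(i) and plays p_{t+1}(i) proportional to
        # r_t(i) = max{Y_t(i), 0}; if all r_t(i) = 0 it plays uniformly.
        def __init__(self, n_actions):
            self.n = n_actions
            self.online_loss = 0.0                     # L_t(on)
            self.action_loss = np.zeros(n_actions)     # L_t(i)

        def distribution(self):
            r = np.maximum(self.online_loss - self.action_loss, 0.0)
            total = r.sum()
            return r / total if total > 0 else np.full(self.n, 1.0 / self.n)

        def update(self, costs):
            costs = np.asarray(costs, dtype=float)
            p = self.distribution()                    # the distribution actually played
            self.online_loss += float(np.dot(p, costs))
            self.action_loss += costs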

13 External Regret Algorithm: Analysis

R_t = <r_t(1), ..., r_t(N)> and ΔR_t = Y_t - Y_{t-1}.

LEMMA: ΔR_t · R_{t-1} = 0.
Proof: Σ_i (c_t(i) - l_t(on)) r_{t-1}(i) = Σ_i c_t(i) r_{t-1}(i) - Σ_i l_t(on) r_{t-1}(i), and
Σ_i l_t(on) r_{t-1}(i) = [Σ_i c_t(i) p_t(i)] · Σ_i r_{t-1}(i) = Σ_i c_t(i) r_{t-1}(i), since p_t(i) = r_{t-1}(i) / Σ_j r_{t-1}(j).

[Figure: geometric view of R_{t-1}, ΔR_t, and R_t.]

14 External regret: Bounds

Average regret goes to zero ("no regret") [Hannan 1957].
Explicit bounds [Littlestone & Warmuth '94; CFHHSW '97]: external regret = O(log N + √(T log N)).
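For comparison, a sketch of the multiplicative-weights (Hedge) rule, which attains this kind of O(√(T log N)) external-regret bound; the learning-rate choice below is a standard textbook one and is not taken from the slides:

    import numpy as np

    def hedge_regret(cost_sequence, eta=None):
        costs = np.asarray(cost_sequence, dtype=float)   # shape (T, N), entries in [0,1]
        T, N = costs.shape
        eta = eta if eta is not None else np.sqrt(8.0 * np.log(N) / T)
        cum = np.zeros(N)
        online = 0.0
        for c_t in costs:
            w = np.exp(-eta * cum)                       # weights from cumulative costs
            p_t = w / w.sum()
            online += float(p_t @ c_t)
            cum += c_t
        return online - cum.min()                        # external regret vs. best action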

15 Regret Minimization: Internal/Swap


16 Dominated Actions

Model: action y dominates action x if y is always better than x.
Goal: do not play dominated actions.
Goal (unknown model): the fraction of time in which we play dominated actions is vanishing.

Example costs over five steps (y is better at every step):
cost of action x: .3  .8  .9  .6  .3
cost of action y: .2  .4  .7  .3  .1

17 Internal/Swap Regret

Internal regret: Regret(x,y) = Σ_{t: a(t)=x} [c_t(x) - c_t(y)]; internal regret = max_{x,y} Regret(x,y).
Swap regret: swap regret = Σ_x max_y Regret(x,y).
Swap regret ≥ external regret, since Σ_x max_y Regret(x,y) ≥ max_y Σ_x Regret(x,y).
Mixed actions: Regret(x,y) = Σ_t (c_t(x) - c_t(y)) p_t(x).
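A small sketch that computes external, internal, and swap regret from a pure-action history, directly following the definitions above:

    import numpy as np

    def regrets(actions, costs):
        actions = np.asarray(actions)            # actions[t]: action played at step t
        costs = np.asarray(costs, dtype=float)   # costs[t, i]: cost of action i at t
        T, N = costs.shape
        online = costs[np.arange(T), actions].sum()
        external = online - costs.sum(axis=0).min()
        pair = np.zeros((N, N))                  # pair[x, y] = Regret(x, y)
        for x in range(N):
            steps = (actions == x)
            if steps.any():
                pair[x] = costs[steps, x].sum() - costs[steps].sum(axis=0)
        internal = pair.max()                    # max over pairs (x, y)
        swap = pair.max(axis=1).sum()            # sum over x of max over y
        return external, internal, swap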

18 Dominated Actions and Regret

Assume action y dominates action x: for every t, c_t(x) > c_t(y) + ε.
Assume we used action x n times; then Regret(x,y) > εn.
So if SwapRegret < R, the dominated action is used at most R/ε times (a vanishing fraction of the time).

19 Calibration

Model: each step, predict a probability and then observe the outcome.
Goal: predictions calibrated with outcomes; during the time steps where the prediction is p, the average outcome is (approximately) p.

[Example: predicting the probability of rain with predictions .3 and .5 on different days; calibration: among the days with prediction .3 the empirical frequency of rain is 1/3, among the days with prediction .5 it is 1/2.]

20 Calibration to Regret

Reduction to swap/internal regret:
- Discretize the probabilities, say to {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}.
- Loss of action x at time t: (x - c_t)².
- Let y*(x) = argmax_y Regret(x,y); then y*(x) = avg(c_t | x).
- Consider Regret(x, y*(x)).
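A quick check, standard for squared loss and not specific to these slides, that the best swap target y*(x) is the conditional average outcome:

    Regret(x, y) = Σ_{t: prediction = x} [ (x - c_t)² - (y - c_t)² ],
    argmax_y Regret(x, y) = argmin_y Σ_{t: prediction = x} (y - c_t)² = avg(c_t | x).

So low internal/swap regret forces every used prediction value x to be close to the average outcome on the steps where x was predicted, which is exactly calibration.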

21 Internal regret

No internal regret [Foster & Vohra], [Hart & Mas-Colell]; based on the approachability theorem [Blackwell '56].
Explicit bounds:
[Cesa-Bianchi & Lugosi '03]: internal regret = O(log N + √(T log N)).
[Blum & Mansour]: swap regret = O(log N + √(T N)).

22 Regret: External vs Internal

External regret: "you should have bought the S&P 500" (match boy_i to girl_i).
Internal regret: "each time you bought IBM you should have bought SUN" (stable matching).
Limitations: no state; additive over time.

23 Regret Minimization: Dynamics

[Even-Dar, Mansour, Nadav, 2009]

24 Routing Games

Cost_i = Σ_{p ∈ paths(s_i, t_i)} Latency(p) · flow_i(p)

Atomic: a finite number of players; player i transfers flow from s_i to t_i. Splittable flows.

[Figure: a two-player network with sources s1, s2, sinks t1, t2, and flow splits f1,L, f1,R, f2,T, f2,B.]

25 Cournot Oligopoly [Cournot 1838]

Firms select production levels; the market price depends on the TOTAL supply; firms maximize profit = revenue - cost.

Best-response dynamics converges for 2 players [Cournot 1838]. A two-player oligopoly is a super-modular game [Milgrom, Roberts 1990]. The dynamics diverges for n ≥ 5 [Theocharis 1960].

[Figure: two firms producing quantities X and Y at costs Cost1(X) and Cost2(Y); the market price P is determined by the overall quantity.]

26 Resource Allocation Games

The best-response dynamics generally diverges for linear resource-allocation games.

Advertisers set budgets; each advertiser wins a proportional market share.

Utility: concave utility from the allocated rate, quasi-linear in money.

[Example: budgets of $5M, $10M, $17M, $25M; an advertiser's allocated rate is its budget divided by the total 5+10+17+25.]

27 Socially Concave Games

Closed convex strategy sets.
A (weighted) social welfare is concave: there exist λ1, ..., λn > 0 such that λ1·u1(x) + λ2·u2(x) + ... + λn·un(x) is a concave function of the joint action x.
The utility of each player is convex in the vector of actions of the other players.
These properties hold for selfish routing, Cournot oligopoly, and resource-allocation games.

28 The relation between socially concave games and concave games

Concave games [Rosen '65]: the utility of a player is strictly concave in her own strategy; a sufficient condition for equilibrium uniqueness.

[Figure: Venn diagram over normal-form games (with mixed strategies), showing concave games, socially concave games, and the region with a unique Nash equilibrium; the examples zero-sum games, atomic splittable routing, Cournot, and resource allocation are placed inside.]

29 The average action and average utility converge to NE

Theorem 1 (the average action profile converges to NE): if each player uses a no-regret procedure in a socially concave game, then their joint play converges to Nash equilibrium; the average action profile at time T is an ε(T)-Nash equilibrium.

Theorem 2: the average daily payoff of each player converges to her payoff in the NE.

[Figure: each player's action on day 1, day 2, day 3, ..., day T, and the average over days 1...T, for players 1, 2, ..., n.]

30 Convergence of the "average action" and of the "average payoff" are two different things!

Here the average action converges to (½, ½) for every player, but the average cost is 2, while the average cost in the NE is 1.

[Figure: a two-link routing example from s to t, with the players' choices shown for even days and for odd days.]

31 The Action Profile Itself Need Not Converge

[Figure: the same two-link example from s to t; the joint action alternates between even days and odd days, so the action profile itself does not converge.]

32 Correlated Equilibrium

CE: a joint distribution Q; at each time t a joint action is drawn from Q, and each player's action is a best response.
Theorem [HM, FV]: when multiple players each play with low internal (swap) regret, their joint play converges to a CE.

[Table: example with actions x and y.]

33 Regret Minimization and Job Scheduling

[Even-Dar, Kleinberg, Mannor, Mansour, 2009]

34 Outline

Job scheduling vs. online learning: similarities and differences.
Model & results.
General algorithm (calibration based).
Simple makespan algorithm.

35 Job Scheduling: Motivating Example

GOAL: minimize the load on the servers.

[Figure: users send requests through a load balancer to the servers.]

36 Online Algorithms

Job scheduling: N unrelated machines (machine = action). Each time step a job arrives, with different loads on different machines; the algorithm schedules the job on some machine, given its loads. Goal: minimize the loads (makespan or L2).

Regret minimization: N actions (machines). Each time step the algorithm first selects an action (machine), then observes the losses (job loads). Goal: minimize the sum of losses.

37 Modeling Differences: Information

Information model: what does the algorithm know when it selects an action/machine?
Known cost (job scheduling): first observe the costs, then select an action.
Unknown cost (regret minimization): first select an action, then observe the costs.

38 Modeling Differences: Performance

Comparison class: job scheduling compares to the best (offline) assignment; regret minimization to the best static algorithm.
Guarantees: job scheduling gives multiplicative guarantees; regret minimization gives additive and vanishing ones.
Objective function: job scheduling has a global objective (makespan); regret minimization an additive one.

39 Formal Model

N actions. Each time step t the algorithm ON selects a (fractional) action p_t(i) and observes losses c_t(i) in [0,1].
Average loss of ON on action i at time T: ON_T(i) = (1/T) Σ_{t≤T} p_t(i) c_t(i).
Global cost functions:
C_∞(ON_T(1), ..., ON_T(N)) = max_i ON_T(i)  (makespan)
C_d(ON_T(1), ..., ON_T(N)) = [Σ_i (ON_T(i))^d]^{1/d}  (L_d)
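A small bookkeeping sketch of this model (array names are illustrative): it forms ON_T(i) from the fractional assignments and observed losses, and evaluates the two global costs.

    import numpy as np

    def global_costs(p, c, d=2):
        p = np.asarray(p, dtype=float)           # p[t, i]: fractional action at step t
        c = np.asarray(c, dtype=float)           # c[t, i]: observed loss in [0,1]
        T = p.shape[0]
        on = (p * c).sum(axis=0) / T             # ON_T(i) = (1/T) sum_t p_t(i) c_t(i)
        c_inf = on.max()                         # makespan cost
        c_d = (on ** d).sum() ** (1.0 / d)       # L_d cost
        return on, c_inf, c_d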

40 Formal Model

Static optimum: consider any fixed distribution α, played at every time step. The static optimum α* minimizes the cost C.
Formally, let α ⊙ L = (α(1)L(1), ..., α(N)L(N)) denote the Hadamard (or Schur) product, and L_T(i) = (1/T) Σ_t c_t(i).
Best fixed distribution: α*(L) = argmin_α C(α ⊙ L).
Static optimality: C*(L) = C(α*(L) ⊙ L).

41 Example

Two machines, makespan:
observed loads: L1 = 4, L2 = 2
α*(L) = (1/3, 2/3)
final loads: L1 = 4/3, L2 = 4/3
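A small sketch of the static optimum for the makespan cost, assuming every observed load L(i) is positive: minimizing max_i α(i)·L(i) over distributions α equalizes the products, i.e. α*(i) is proportional to 1/L(i), which reproduces the example above.

    import numpy as np

    def static_opt_makespan(L):
        L = np.asarray(L, dtype=float)           # assumes all L(i) > 0
        alpha = (1.0 / L) / (1.0 / L).sum()      # equalizes alpha(i) * L(i)
        return alpha, (alpha * L).max()          # alpha*(L) and C*(L)

    # static_opt_makespan([4, 2]) -> alpha = (1/3, 2/3), cost = 4/3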

42 Our Results: Adversarial General

General feasibility result: assume C is convex and C* is concave (this includes makespan and the L_d norm for d > 1). Then there exists an online algorithm ON such that for any loss sequence L: C(ON) < C*(L) + o(1). The rate of convergence is about √(N/T).

43 Our Results: Adversarial Makespan

Makespan algorithm: there exists an algorithm ON such that for any loss sequence L, C(ON) < C*(L) + O(log² N / √T).
Benefits: very simple and intuitive; improved regret bound.

44 Our Results: Adversarial Lower Bound

We show that for many non-convex C there is non-vanishing regret (this includes the L_d norm for d < 1).
Non-vanishing regret ⇒ ratio > 1: there is a sequence of losses L such that C(ON) > (1+γ) C*(L), for some γ > 0.

45 Preliminary: Local vs. Global

Low regret in each block ⇒ overall low regret.

[Figure: the time axis partitioned into blocks B1, B2, ..., Bk.]

46 Preliminary: Local vs. Global

LEMMA: assume C is convex and C* is concave, and assume a partition of time into blocks B_i such that in each block B_i the regret is at most R_i. Then C(ON) - C*(L) ≤ Σ_i R_i.

47 Preliminary: Local vs. Global

Proof:
- C(ON) ≤ Σ_i C(ON(B_i))  (C is convex)
- Σ_i C*(L(B_i)) ≤ C*(L)  (C* is concave)
- C(ON(B_i)) - C*(L(B_i)) ≤ R_i  (low regret in each B_i)
⇒ Σ_i [C(ON(B_i)) - C*(L(B_i))] ≤ Σ_i R_i ⇒ C(ON) - C*(L) ≤ Σ_i R_i. QED
It is enough to bound the regret on subsets.

48 Example

Static opt α* = (1/2, 1/2), cost = 3/2.

[Figure: losses at t=1 and t=2 for the two-step example.]

49 Stochastic case

Assume each action's cost is drawn from a joint distribution, i.i.d. over time steps.
Theorem (makespan/L_d): known distribution: regret = O(log T / T); unknown distributions: regret = O(log² T / T).

50 Stochastic case

Each time t the costs are drawn from a joint distribution, i.i.d. over time steps (but not independent across actions).

INTUITION: assume two actions (machines). Load distribution: with probability ½ the loads are (1,0), with probability ½ they are (0,1). Which policy minimizes the makespan regret?

Regret components: max(L(1), L(2)) = sum/2 + |Δ|/2, where sum = L(1) + L(2) and Δ = L(1) - L(2).

51 Stochastic case: Static OPT

Natural (model-based) choice: always select action (½, ½).
Observations: assume (1,0) occurs T/2 + Δ times and (0,1) occurs T/2 - Δ times. The loads are (T/4 + Δ/2, T/4 - Δ/2), so the makespan is T/4 + Δ/2 > T/4.
Static OPT: T/4 - Δ²/T < T/4; w.h.p. OPT is T/4 - O(1).
Since sum = T/2 and E[|Δ|] = O(√T), the regret is O(√T).

52 Can we do better?

53 Stochastic case: Least Loaded

Least loaded machine: select the machine with the lower current load.
Observation: the machines keep (almost) the same load, |Δ| ≤ 1, and E[sum] = T/2, so the expected makespan is T/4.
Regret: the least-loaded makespan is LLM = T/4 ± √T, and Regret = max{LLM - T/4, 0} = O(√T) (the regret counts only the "bad" side).
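An illustrative simulation of the two-machine example (not from the paper), comparing the static (½, ½) policy and the least-loaded policy against the T/4 benchmark used on these slides:

    import numpy as np

    def simulate(T, rng):
        flips = rng.integers(0, 2, size=T)       # 0 -> loads (1,0), 1 -> loads (0,1)
        # Static (1/2, 1/2): a machine gains 1/2 in every step in which its load is 1.
        static_loads = np.array([(flips == 0).sum(), (flips == 1).sum()]) / 2.0
        # Least loaded: put all the weight on the currently lighter machine.
        ll_loads = np.zeros(2)
        for f in flips:
            i = int(np.argmin(ll_loads))
            if (f == 0 and i == 0) or (f == 1 and i == 1):
                ll_loads[i] += 1.0
        opt = T / 4.0
        return max(static_loads.max() - opt, 0.0), max(ll_loads.max() - opt, 0.0)

    rng = np.random.default_rng(0)
    print(simulate(10_000, rng))                 # both regrets are typically of order sqrt(T)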

54 Can we do better?

55 Stochastic case: Optimized Finish

Algorithm: select action (½, ½) for the first T - 4√T steps; play least loaded afterwards.
Claim: regret = O(T^{1/4}).
Until step T - 4√T (w.h.p.): |Δ| < 2√T. There exists a time t in [T - 4√T, T] with Δ = 0 and sum = T/2 + O(T^{1/4}).
From 1 to t: regret = O(T^{1/4}). From t to T: regret = O(√(T-t)) = O(T^{1/4}).

56 Can we do better?

57 Stochastic case: Any time

An algorithm with low regret at any time t, not planned for a final horizon T.
Variant of least loaded: give the least-loaded machine weight ½ + T^{-1/4}.
Claim: regret = O(T^{1/4}).
Idea: regret = max{(L1 + L2)/2 - T/4, 0} + Δ; every O(T^{1/2}) steps Δ = 0, and the play stays very near (½, ½).

58 Can we do better?

59 Stochastic case: Logarithmic Regret

Algorithm: use phases whose lengths shrink exponentially: T_k = T_{k-1}/2 with T_1 = T/2, so there are about log T phases.
Every phase cancels the deviations (from the expectation) of the previous phase.
Works for any probabilities and any actions, assuming the probabilities are known.
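A tiny sketch of the phase schedule just described (T_1 = T/2, T_k = T_{k-1}/2), only to make the "about log T phases" count concrete:

    def phase_lengths(T):
        lengths, remaining = [], T
        while remaining > 1:
            remaining //= 2
            lengths.append(remaining)            # T/2, T/4, T/8, ...
        return lengths                           # about log_2 T phases

    # phase_lengths(16) -> [8, 4, 2, 1]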

60 Can we do better?

61 Stochastic case: Unknown distributions

Basic idea: learn the expectations.
Use super-phases that grow over time, B_r = 2·B_{r-1}; in each super-phase run the logarithmic-regret algorithm, using the expectations observed in the past.
Total number of phases: log² T. The bound on the regret: O(log² T / T).

62 Stochastic case

Assume each action's cost is drawn from a joint distribution, i.i.d. over time steps.
Theorem (makespan/L_d): known distribution: regret = O(log T / T); unknown distributions: regret = O(log² T / T).

63 Summary

Regret minimization: external, internal, dynamics.
Job scheduling and regret minimization: a different (global) objective function.
Open problems: exact characterization; lower bounds.

64 Thank You


65 Makespan Algorithm

Outline:
- A simple algorithm for two machines: regret O(1/√T); simple and almost memory-less.
- A recursive construction: given three algorithms, two for k/2 actions and one for 2 actions, build an algorithm for k actions. The main issue is what kind of feedback to "propagate". Regret O(log² N / √T), better than the general result.

66 Makespan: Two Machines

Intuition: keep the online algorithm's loads balanced.
Failed attempts: using standard regret minimization (for an unbalanced input sequence L, the algorithm will put most of the load on a single machine); using the optimum to drive the probabilities.
Our approach: use the online loads, not the optimum or the static cumulative loads.

67 Makespan Algorithm: Two actions

At time t, maintain probabilities p_{t,1} and p_{t,2} = 1 - p_{t,1}; initially p_{1,1} = p_{1,2} = ½.
At time t: [the probability update rule is given as a formula on the slide; not recovered here].
Remarks: uses the online loads; almost memory-less.

68 Makespan Algorithm: Analysis

View the change in probabilities as a walk on the line.

[Figure: the interval from 0 to 1 with the midpoint ½ marked.]

69 Makespan Algorithm: Analysis

Consider a small interval of length ε.
Total change in loads: identical on both machines (the walk started and ended the interval at the same point).
Consider only the losses in the interval (local analysis): the local opt is also in the interval, and the online algorithm used a "similar" probability, so it loses at most ε per step.

70 Makespan Algorithm: Analysis

Simplifying assumptions:
- The walk is "balanced" in every interval: add "virtual" losses to return to the initial state; this costs only O(√T) additional losses and relates the learning rate to the regret.
- Losses "cross" an interval's boundary: this needs more sophisticated "bookkeeping"; make sure an update affects at most two adjacent intervals.
Regret accounting: the loss within each interval plus the additional "virtual" losses.

71 Makespan: Recursive algorithm

Recursive algorithm:

[Figure: three sub-algorithms A1, A2, and A3 combined into a single algorithm.]

72 Makespan: Recursive

The algorithms:
- A1 and A2: each has "half" of the actions, gets the actual losses and "balances" them; each works in isolation, simulating and not considering the actual loads.
- A3: gets the average load in A1 and in A2, and balances these "average" loads.

[Figure: A1 and A2 receive the loss vectors (l_{t,1}, ...) and (l_{t,k/2}, ...); A3 receives the averages AVG(l_{t,i} q_{t,i}) and AVG(l_{t,i} q'_{t,i}), where q and q' are the distributions of A1 and A2.]

73 Makespan: Recursive algorithm

Input to A3: the average load.

[Figure: A3 receives the averages AVG(l_{t,i} q_{t,i}) from A1 and AVG(l_{t,i} q'_{t,i}) from A2.]

74 Makespan: Recursive

The combined output: A3 outputs weights p1 and p2 for the two halves, A1 and A2 output distributions q1, ..., q_{k/2} within their halves, and the overall probability of an action is the product (q1·p1, ..., q_{k/2}·p2).

[Figure: the losses l1, ..., l_{k/2}, ... feed A1 and A2; the averages AVG(l_{t,i} q_{t,i}) feed A3; the outputs are multiplied together.]

75 Makespan: Recursive

Analysis (intuition): assume perfect ZERO regret, just for intuition. Then the outputs of A1 and A2 are completely balanced, so the average equals the individual loads (maximum = average = minimum), and the output of A3 is balanced, so the contribution of A1's machines equals that of A2's.
Real analysis: need to bound the amplification of the regret.
