Approximation Algorithms for Stochastic Orienteering

Anupam Gupta    Ravishankar Krishnaswamy    Viswanath Nagarajan    R. Ravi

Abstract

In the Stochastic Orienteering problem, we are given a metric, where each node also has a job located there with some deterministic reward and a random size. (Think of the jobs as being chores one needs to run, and the sizes as the amount of time it takes to do the chore.) The goal is to adaptively decide which nodes to visit to maximize total expected reward, subject to the constraint that the total distance traveled plus the total size of jobs processed is at most a given budget of B. (I.e., we get reward for all those chores we finish by the end of the day.) The (random) size of a job is not known until it is completely processed. Hence the problem combines aspects of both the stochastic knapsack problem with uncertain item sizes and the deterministic orienteering problem of using a limited travel time to maximize gathered rewards located at nodes.

In this paper, we present a constant-factor approximation algorithm for the best non-adaptive policy for the Stochastic Orienteering problem. We also show a small adaptivity gap, i.e., the existence of a non-adaptive policy whose reward is at least an Ω(1/ log log B) fraction of the optimal expected reward, and hence we also get an O(log log B)-approximation algorithm for the adaptive problem. Finally we address the case when the node rewards are also random and could be correlated with the waiting time, and give a non-adaptive policy which is an O(log n log B)-approximation to the best adaptive policy on n-node metrics with budget B.

Department of Computer Science, Carnegie Mellon University, Pittsburgh PA 15213. Research was partly supported by NSF awards CCF-0964474 and CCF-1016799.

Department of Computer Science, Carnegie Mellon University, Pittsburgh PA 15213. Research was partly supported by NSF awards CCF-0964474 and CCF-1016799, and an IBM Graduate Fellowship.

IBM T.J.
Watson Research Center, Yorktown Heights, NY 10598, USA.

Tepper School of Business, Carnegie Mellon University, Pittsburgh PA 15213. Research was partly supported by NSF award CCF-1143998.

1 Introduction

Consider the following problem: you start your day at home with a set of chores to run at various locations (e.g., at the bank, the post office, the grocery store), but you only have limited time to run those chores in (say, you have from 9am until 5pm, when all these shops close). Each successfully completed chore/job j gives you some fixed reward r_j. You know the time it takes you to travel between the various job locations: these distances are deterministic and form a metric (V, d). However, you do not know the amount of time you will spend doing each job (e.g., standing in the queue, filling out forms). Instead, for each job j, you are only given the probability distribution π_j governing the random amount of time you need to spend performing j. That is, once you start performing the job j, the job finishes after S_j time units and you get the reward, where S_j is a random variable denoting the size, distributed according to π_j.¹ (You may cancel processing the job j prematurely, but in that case you don't get any reward, and you are barred from trying j again.) The goal is now a natural one: given the metric (V, d), the starting point ρ, the time budget B, and the probability distributions for all the jobs, give a strategy for traveling around and doing the jobs that maximizes the expected reward accrued.

The case when all the sizes are deterministic (i.e., S_j = s_j with probability 1) is the orienteering problem, for which we now know a (2 + ε)-approximation algorithm [5, 7]. Another special case, where all the chores are located at the start node, but the sizes are random, is the stochastic knapsack problem, which also admits a (2 + ε)-approximation algorithm [12, 3].
However, the stochastic orienteering problem above, which combines aspects of both these problems, seems to have been hitherto unexplored in the approximation algorithms literature.

¹ To clarify: before you reach the job, all you know about its size is what can be gleaned from the distribution π_j of S_j; and even having worked on j for t units of time, all you know about the actual size of j is what you can infer from the conditional (S_j | S_j > t).

It is known that even for stochastic knapsack, an optimal adaptive strategy may be exponential-sized (and finding the best strategy is PSPACE-hard) [12]. So another set of interesting questions is to bound the adaptivity gap, and to find non-adaptive solutions whose expected reward is close to that of the best adaptive solution. A non-adaptive solution for stochastic orienteering is simply a tour P of points in the metric space starting at the root ρ: we visit the points in this fixed order, performing the jobs at the points we reach, until time runs out.

One natural algorithm for stochastic orienteering is to replace each random job j by a deterministic job of size E[S_j], and use an orienteering algorithm to find the tour of maximum reward in this deterministic instance.² Such an approach was taken by [12] for stochastic knapsack: they showed a constant adaptivity gap, and a constant-factor approximation via precisely this idea. However, for stochastic orienteering this gives only an O(log B) approximation, and indeed there are examples where we get only an O(1/ log B) fraction of the optimal expected reward. (See Section 4 for a more detailed discussion.) In this paper we show we can do much better than this logarithmic approximation:

Theorem 1.1. There is an O(log log B)-approximation algorithm for the stochastic orienteering problem.

Indeed, our proof proceeds by first showing the following structure theorem which bounds the adaptivity gap:

Theorem 1.2.
Given an instance of the stochastic orienteering problem, either (i) there exists a single job which gives an Ω(1/ log log B) fraction of the optimal reward, or (ii) there exists a value W such that the optimal non-adaptive tour which spends at most W time waiting and B − W time traveling gets an Ω(1/ log log B) fraction of the optimal reward.

Note that naively we would expect only a logarithmic fraction of the reward, but the structure theorem shows we can do better. Indeed, this theorem is the technical heart of the paper, and is proved via a martingale argument of independent interest. Since the above theorem shows the existence of a non-adaptive solution close to the best adaptive solution, we can combine it with the following result to prove Theorem 1.1.

Theorem 1.3. There exists a constant-factor approximation algorithm to the optimal non-adaptive policy for stochastic orienteering.

² Actually, one should replace the stochastic job with a deterministic one of size E[min(S_j, B − d(ρ, j))] and reward r_j · Pr[S_j + d(ρ, j) ≤ B]; it is very easy to fool the algorithm otherwise.

Note that if we could show an existential proof of a constant adaptivity gap (which we conjecture to be true), the above approximation for non-adaptive problems would immediately imply an O(1)-approximation algorithm for the adaptive problem too.

Our second set of results is for a variant of the problem in which both the rewards and the job sizes are random and correlated with each other. For this correlated problem, we show the following results:

Theorem 1.4. There is a polynomial-time algorithm that outputs a non-adaptive solution for correlated stochastic orienteering, which is an O(log n log B)-approximation to the best adaptive solution.
Moreover, the correlated problem is at least as hard as the orienteering-with-deadlines problem.

Recall that we only know an O(log n) approximation for the orienteering-with-deadlines problem [2].

Our Techniques: Most of the previous adaptivity gaps have been shown by considering some linear programming relaxation that captures the optimal adaptive strategies, and then showing how to round a fractional LP solution to get a non-adaptive strategy. But since we do not know a good relaxation for even the deterministic orienteering problem, this approach seems difficult to take directly. So to show our adaptivity gap results, we are forced to argue directly about the optimal adaptive strategy: we use martingale arguments to show the existence of a path (a.k.a. non-adaptive strategy) within this tree which gives a large reward. We can then use algorithms for the non-adaptive setting, which are based on reductions to orienteering (with an additional knapsack constraint).

Roadmap: The rest of the paper follows the outline above. We begin with some definitions in Section 2, and then, in Section 3, define and give an algorithm for the knapsack orienteering problem, which will be a crucial subroutine in all our algorithms. Next we motivate our non-adaptive algorithm by discussing a few straw-man solutions and the traps they fall into, in Section 4. We then state and prove our constant-factor non-adaptive algorithm in Section 5, which naturally leads us to our main result in Section 6, the proof of the O(log log B) adaptivity gap for StocOrient. Finally, we present in Section 7 the poly-logarithmic approximation for the problem where the rewards and sizes are correlated with each other. For all these results we assume that we are not allowed to cancel jobs once we begin working on them; in Section 8 we show how to discharge this assumption.

1.1 Related Work  The orienteering problem is known to be APX-hard, and the first constant-factor approximation was due to Blum et al. [5].
Their factor of 4 was improved by [2] and ultimately by [7] to (2 + ε) for every ε > 0. (The paper [2] also considers the problem with deadlines and time-windows; see also [8, 9].) There is a PTAS for low-dimensional Euclidean space [10]. To the best of our knowledge, the stochastic version of the orienteering problem has not been studied before from the perspective of approximation algorithms. Heuristics and empirical guarantees for a similar problem were given by Campbell et al. [6].

The stochastic knapsack problem [12] is a special case of this problem, where all the tasks are located at the root itself. Constant-factor approximation algorithms for the basic problem were given by [12, 4], and extensions where the rewards and sizes are correlated were studied in [16]. Most of these papers proceed via writing LP relaxations that capture the optimal adaptive policy; extending these to our problem faces the barrier that the orienteering problem is not known to have a good natural LP relaxation.

Another very related body of work is on budgeted learning. Specifically, in the work of Guha and Munagala [15], there is a collection of Markov chains spread around in a metric, each state of each chain having an associated reward. When the player is at a Markov chain at j, she can advance the chain one step every unit of time. If she spends at most L time units traveling, and at most C time units advancing the Markov chains, how can she maximize some function (say the sum or the max) of rewards of the final states in expectation? [15] give an elegant constant-factor approximation to this problem (under some mild conditions on the rewards) via a reduction to classical orienteering using Lagrangean multipliers. Our algorithm/analysis for the knapsack orienteering problem (defined in Section 3) is inspired by theirs; the analysis of our algorithm, though, is simpler, due to the problem itself being deterministic.
However, it is unclear how to directly use such two-budget approaches to get O(1)-factors for our one-budget problem without incurring an O(log B) factor in the approximation ratio.

Finally, while most of the work on giving approximations to adaptive problems has proceeded by using LP relaxations to capture the optimal adaptive strategies (and then often rounding them to get non-adaptive strategies, thereby also proving adaptivity gaps [14, 12]), there are some exceptions. In particular, papers on adaptive stochastic matchings [11], on stochastic knapsack [4, 3], and on building decision trees [18, 1, 17] have all had to reason about the optimal adaptive policies directly. We hope that our martingale-based analysis will add to the set of tools used for such arguments.

2 Definitions and Notation

An instance of stochastic orienteering is defined on an underlying metric space (V, d) with ground set |V| = n and symmetric integer distances d : V × V → Z+ (satisfying the triangle inequality) that represent travel times. Each vertex v ∈ V is associated with a unique stochastic job, which we also call v. For the first part of the paper, each job v has a fixed reward r_v ∈ Z≥0, and a random processing time/size S_v, which is distributed according to a known but arbitrary probability distribution π_v : R+ → [0, 1]. (In the introduction, this was the amount we had to wait in queue before receiving the reward for job v.) We are also given a starting root vertex ρ, and a budget B on the total time available.
Without loss of generality, we assume that all distances are integer valued.

The only actions allowed to an algorithm are to travel to a vertex v and begin processing the job there: when the job finishes after its random length S_v of time, we get the reward r_v (so long as the total time elapsed, i.e., total travel time plus processing time, is at most B), and we can then move to the next job. There are two variants of the basic problem: in the basic variant, we are not allowed to cancel any job that we begin processing (i.e., we cannot leave the queue once we join it). In the version with cancellations, we can cancel any job at any time without receiving any reward, and we are not allowed to attempt this job again in the future. Our results for both versions have similar approximation guarantees; for most of the paper, we focus on the basic version, and in Section 8 we show how to reduce the more general version with cancellations to the basic one.

Note that any strategy consists of a decision tree where each state depends on which previous jobs were processed, and what information we got about their sizes. Now the goal is to devise a strategy which, starting at the root ρ, decides (possibly adaptively) which jobs to travel to and process, so as to maximize the expected sum of rewards of jobs successfully completed before the total time (travel and processing) reaches the threshold of B.

In the second part of the paper, we consider the setting of correlated rewards and sizes: in this model, the job sizes and rewards are both random, and are correlated with each other. (Recall that the stochastic knapsack version of this problem also admits a constant-factor approximation [16].)

We are interested in both adaptive and non-adaptive strategies, and in particular want to bound the ratio between the performance of the best adaptive and best non-adaptive strategies.
An adaptive strategy is a decision tree where each node is labeled by a job/vertex of V, with the outgoing arcs from a node labeled by j corresponding to the possible sizes in the support of π_j. A non-adaptive strategy, on the other hand, is a path P starting at ρ; we just traverse this path, executing the jobs we encounter, until the total (random) size of the jobs and the distance traveled exceeds B. (A randomized non-adaptive strategy may pick a path P at random from some distribution before it knows any of the size instantiations, and then follows this path as above.) Finally, we use [m] to denote the set {0, 1, . . . , m}.

3 The (Deterministic) Knapsack Orienteering Problem

We now define a variant of the orienteering problem which will be crucially used in the rest of the paper. Recall that in the basic orienteering problem, the input consists of a metric (V, d), the root vertex ρ, rewards r_v for each job v, and total budget B. The goal is to find a path P of length at most B starting at ρ that maximizes the total reward Σ_{v∈P} r_v of vertices in P.

In the knapsack orienteering problem (KnapOrient), we are given two budgets: L, the travel budget, and W, the knapsack budget. Each job v has a reward r_v, and now also a size s_v. A feasible solution is a path P of length at most L, such that the total size s(P) := Σ_{v∈P} s_v is at most W. The goal is to find a feasible solution of maximum reward Σ_{v∈P} r_v.

Theorem 3.1. There is a polynomial-time O(1)-approximation AlgKO for the KnapOrient problem.

The idea of the proof is to push the knapsack constraint into the objective function by considering a Lagrangian relaxation of the knapsack constraint; we remark that such an approach was also taken in [15] for a related problem. This way we alter the rewards of items while still optimizing over the set of feasible orienteering tours.
For a suitable choice of the Lagrange parameter, we will show that we can recover a solution with large (unaltered) reward while meeting both the knapsack and orienteering constraints.

Proof. For a value λ ≥ 0, define an orienteering instance I(λ) on metric (V, d) with root ρ, travel budget L, and profits r^λ_v := r_v − λ·s_v at each v ∈ V. Note that the optimal solution to this orienteering instance has value at least Opt − λW, where Opt is the optimal value of the original KnapOrient instance.

Let Algo(λ) denote an α-approximate solution to I(λ) as well as its profit; we have α = 2 + ε via the algorithm from [7]. By exhaustive search, let us find:

(3.1)  λ* := max{λ ≥ 0 : Algo(λ) ≥ λW}.

Observe that by setting λ = Opt/(2αW), we have Algo(λ) ≥ (1/α)(Opt − λW) ≥ (1/α)·(Opt/2) = λW. Thus λ* ≥ Opt/(2αW).

Let σ denote the ρ-path in solution Algo(λ*), and let Σ_{v∈σ} s_v = yW for some y ≥ 0. Partition the vertices of σ into c = max{1, ⌊2y⌋} parts σ_1, . . . , σ_c with Σ_{v∈σ_j} s_v ≤ W for all j ∈ {1, . . . , c}. This partition can be obtained by greedy aggregation since max_{v∈V} s_v ≤ W (all vertices with larger size can be safely excluded by the algorithm). Set σ* = σ_k for k = argmax_{j=1,...,c} r(σ_j). We then output σ* as our approximate solution to the KnapOrient instance. It remains to bound the reward we obtain:

r(σ*) ≥ r(σ)/c ≥ (λ*·yW + λ*·W)/c = λ*W·(y + 1)/c ≥ λ*W·min{y + 1, 1/2 + 1/(2y)} ≥ λ*W/2.

The second inequality holds because r(σ) − λ*·s(σ) = Algo(λ*) ≥ λ*W by the choice (3.1), which implies that r(σ) ≥ λ*·s(σ) + λ*W = λ*·yW + λ*·W by the definition of y. The third inequality is by c ≤ max{1, 2y}, and the last holds since min{y + 1, 1/2 + 1/(2y)} ≥ 1/2 for all y ≥ 0. It follows that r(σ*) ≥ λ*W/2 ≥ Opt/(4α), giving us the desired approximation guarantee. □

As an aside, this Lagrangean approach can give a constant-factor approximation for a two-budget version of stochastic orienteering, but it is unclear how to extend it to the single-budget version. In particular, we are not able to show that the Lagrangian relaxation (of processing times) has objective value Ω(Opt).
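The Lagrangian search in the proof above is easy to sketch. The following is our own illustration, not code from the paper: a brute-force path enumeration stands in for the (2 + ε)-approximate orienteering oracle (so α = 1 here), the λ-search runs over a finite grid rather than being exhaustive, and all names (`orienteering_oracle`, `knap_orient`) are ours.

```python
import itertools

def orienteering_oracle(dist, root, profits, L):
    """Stand-in for the orienteering subroutine: brute-force the best root-path
    of length <= L maximizing total profit (exact here; the paper invokes a
    (2 + eps)-approximation instead)."""
    best_profit, best_path = 0.0, [root]
    verts = list(profits)
    for k in range(len(verts) + 1):
        for perm in itertools.permutations(verts, k):
            length, cur = 0.0, root
            for v in perm:
                length += dist[cur][v]
                cur = v
            if length <= L:
                p = sum(profits[v] for v in perm)
                if p > best_profit:
                    best_profit, best_path = p, [root] + list(perm)
    return best_profit, best_path

def knap_orient(dist, root, reward, size, L, W, grid=50):
    """Lagrangian sketch of AlgKO: look for a large lambda with
    Algo(lambda) >= lambda * W, greedily split the resulting path into pieces
    of total size <= W, and keep the most rewarding piece."""
    # vertices with size > W can never be part of a feasible solution
    reward = {v: r for v, r in reward.items() if size[v] <= W}
    best_r, best_path = 0.0, [root]
    lam_hi = sum(reward.values()) / W  # lambda* <= Opt / W
    for step in range(grid + 1):
        lam = lam_hi * step / grid
        profits = {v: reward[v] - lam * size[v] for v in reward}
        profits = {v: p for v, p in profits.items() if p > 0}
        val, path = orienteering_oracle(dist, root, profits, L)
        if val < lam * W:
            continue  # condition (3.1) rejects this lambda
        # greedy aggregation into knapsack-feasible pieces
        pieces, cur, cur_sz = [], [], 0.0
        for v in path[1:]:
            if cur and cur_sz + size[v] > W:
                pieces.append(cur)
                cur, cur_sz = [], 0.0
            cur.append(v)
            cur_sz += size[v]
        if cur:
            pieces.append(cur)
        for piece in pieces:
            r = sum(reward[v] for v in piece)
            if r > best_r:
                best_r, best_path = r, [root] + piece
    return best_r, best_path
```

Each returned piece is still a feasible root-path: its travel length only shrinks under shortcutting (triangle inequality), and its total size is at most W by construction.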
This is because different decision paths in the Opt tree might vary a lot in their processing times, implying that there is no reasonable candidate for a good Lagrange multiplier.

4 A Strawman Approach: Reduction to Deterministic Orienteering

As mentioned in the introduction, a natural idea for StocOrient is to replace random jobs by deterministic ones with size equal to the expected size E[S_v], and find a near-optimal orienteering solution P to the deterministic instance which gets reward R. One can then use this path P to get a non-adaptive strategy for the original StocOrient instance with expected reward Ω(R). Indeed, suppose the path P spends time L traveling and W waiting on the deterministic jobs, with L + W ≤ B. Then picking a random half of the jobs and visiting them results in a non-adaptive solution for StocOrient which travels at most L and processes jobs for time at most W/2 in expectation. Hence, Markov's inequality says that with probability at least 1/2, all jobs finish processing within W time units and we get the entire reward of this sub-path, which is Ω(R).

However, the problem is in showing that R = Ω(Opt), i.e., that the deterministic instance has a solution with large reward. The above simplistic reduction of replacing random jobs by deterministic ones with mean size fails even for stochastic knapsack: suppose the knapsack budget is B, and each of the n jobs has size B·n with probability 1/n, and size 0 otherwise. Note that the expected size of every job is now B. Therefore, a deterministic solution can pick only one job, whereas the optimal solution would finish Ω(n) jobs with high probability. However, observe that this problem disappears if we truncate all sizes at the budget, i.e., set the deterministic size to be the expected truncated size E[min(S_j, B)], where S_j is the random size of job j. (Of course, we also have to set the reward to be r_j · Pr[S_j ≤ B] to discount the reward from impossible size realizations.)
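The truncation fix for this knapsack example can be checked directly. The small sketch below is our own illustration (with n chosen as a power of two so the floating-point arithmetic is exact), comparing raw and truncated expected sizes:

```python
def expected_size(dist, cap=None):
    """Mean of a discrete size distribution {size: probability}; if `cap` is
    given, return the truncated mean E[min(S, cap)]."""
    return sum((s if cap is None else min(s, cap)) * p for s, p in dist.items())

B, n = 100, 8
job = {B * n: 1.0 / n, 0: 1.0 - 1.0 / n}  # size B*n w.p. 1/n, else size 0

raw = expected_size(job)            # = B: suggests only one job fits in budget B
truncated = expected_size(job, B)   # = B/n: reveals that about n jobs fit
```

The raw mean B makes every job look budget-filling, while the truncated mean B/n lets a deterministic solution select Ω(n) jobs, matching the discussion above.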
Now E[min(S_j, B)] reduces to B/n, and so the deterministic instance can now get Ω(n) reward. Indeed, this is the approach used by [12] to get an O(1)-approximation and adaptivity gap.

But for StocOrient, is there a good truncation threshold? Considering E[min(S_j, B)] fails on the example where all jobs are co-located at a point at distance B − 1 from the root. Each job v has size B with probability 1/B, and 0 otherwise. Truncation at B gives an expected size E_{S_v∼π_v}[min(S_v, B)] = 1 for every job, and so the deterministic instance gets reward from only one job, while the StocOrient optimum can collect Ω(B) jobs. Now, noticing that any algorithm has to spend B − 1 time traveling to reach any vertex that has a job, we can instead truncate each job j's size at B − d(ρ, j), which is the maximum amount of time we can possibly spend at j (since we must reach vertex j from ρ). However, while this fix works for the aforementioned example, the following example shows that the deterministic instance may get only an O(log log B / log B) fraction of the optimal stochastic reward.

Consider n = log B jobs on a line as in Figure 4.1. For i = 1, 2, . . . , log B, the i-th job is at distance B(1 − 1/2^i), and it takes size B/2^i with probability p := 1/ log B and size 0 otherwise. Each job has unit reward.

[Figure 4.1: Bad example for replacing by expectations. Jobs v_1, v_2, . . . , v_{log B} on a line, with gaps B/2, B/4, . . . between consecutive jobs.]

The optimal (adaptive and non-adaptive) solution to this instance is to try all the jobs in order: with probability (1 − p)^{log B} ≈ 1/e, all the jobs instantiate to size 0 and we accrue reward Ω(log B).

However, suppose we replace each job by its expected size μ_i = B/(2^i log B), and try to pick jobs so that the total travel plus expected sizes is at most B. Suppose j is the first job we pick along the line; then, because its size is μ_j, we cannot reach any jobs in the last μ_j length of the path.
The number of these lost jobs is log B − j − log log B (because of the geometrically decreasing gaps between jobs), and hence we can reach only jobs j, j + 1, . . . , j + log log B − 1, giving us a maximum profit of log log B even if we ignore the space these jobs would take. (In fact, since their sizes decrease geometrically, we can indeed get all but a constant number of these jobs.)

This shows that replacing jobs in a StocOrient instance by their expected truncated sizes gives a deterministic instance whose optimal reward is smaller by an Ω(log B / log log B) factor.

4.1 Why the above approaches failed, and our approach  The reason the determinization techniques described above worked for stochastic knapsack but failed for stochastic orienteering is the following: the total size of jobs is always roughly B in knapsack (so truncating at B was the right thing to do), but in orienteering it depends on the total time spent traveling, which is itself a random quantity even for a non-adaptive solution. One way around this is to guess the amount of time spent processing jobs (up to a factor of 2) which gets the largest profit, and use that as the truncation threshold. Such an approach seems like it should lose an Ω(log B) fraction of the reward. Surprisingly, we show that this simple algorithm actually gives a much better reward: it gets a constant factor of the reward of a non-adaptive optimum, and an O(log log B)-approximation when compared to the adaptive optimum!

At a high level, our approach is to solve this problem in three steps. Suppose we start off with an instance Iso of StocOrient whose optimal (adaptive) solution has expected reward Opt. The high-level overview of our algorithm is described in Figure 4.2. However, there are many details to be addressed, and we flesh out the details of this algorithm over the next couple of sections.
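The gap in the Figure 4.1 example can also be checked numerically. The sketch below is our own illustration (variable names are ours): for a concrete B it computes the ≈ (1/e)·log B stochastic lower bound and the ≈ log log B cap on the determinized instance.

```python
import math

# Line example: n = log B unit-reward jobs; job i sits at distance
# B(1 - 1/2^i), and has size B/2^i w.p. p = 1/log B, size 0 otherwise.
B = 2 ** 16
n = int(math.log2(B))   # 16 jobs
p = 1.0 / n

# Stochastic optimum: try all jobs in order; with probability (1-p)^n
# (about 1/e) every size is 0 and all n rewards are collected.
stochastic_lb = (1 - p) ** n * n

# Deterministic surrogate: job i is replaced by expected size
# mu_i = B / (2^i * log B).  If j is the first job picked, every job whose
# distance-to-the-end B/2^i is at most mu_j is unreachable, leaving only
# about log log B jobs whichever j is chosen.
def reachable_from(j):
    mu_j = B / (2 ** j * n)
    return sum(1 for i in range(j, n + 1) if B / 2 ** i > mu_j)

deterministic_ub = max(reachable_from(j) for j in range(1, n + 1))
```

With B = 2^16 the deterministic instance can reach only log log B = 4 jobs, while the stochastic optimum already collects more than that; the ratio widens as B grows.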
Before we proceed, it would be useful to recall our definition of the KnapOrient problem from Section 3.

5 The Algorithm for Non-Adaptive Stochastic Orienteering

We first consider the non-adaptive StocOrient problem, and present an O(1)-approximation algorithm. This proves Theorem 1.3, and also serves as a warmup for the more involved analysis in the adaptive setting.

Recall that the input consists of a metric (V, d) with each vertex v ∈ V representing a stochastic job having a deterministic reward r_v ∈ Z≥0 and a random processing time/size S_v distributed according to π_v : R+ → [0, 1]; we are also given a root ρ and budget B. A non-adaptive strategy is an ordering π of the vertices (starting with ρ), which corresponds to visiting vertices (and processing the respective jobs) in the order π. The goal in the non-adaptive StocOrient problem is to compute an ordering that maximizes the expected reward, i.e., the total reward of all items which are completed within the budget of B (travel + processing times).

Definition 1. (Truncated Means) For any vertex u ∈ V and any value Z ≥ 0, let μ_u(Z) := E_{S_u∼π_u}[min(S_u, Z)] denote the expected size truncated at Z. Note that μ_u(Z_1) ≤ μ_u(Z_2) and μ_u(Z_1 + Z_2) ≤ μ_u(Z_1) + μ_u(Z_2) for all Z_2 ≥ Z_1 ≥ 0.

Definition 2. (Valid KnapOrient Instances) Given an instance Iso of StocOrient and a value W ≤ B, we define the valid KnapOrient instance with parameter W to be Iko(W) := KnapOrient(V, d, {(s_u, r'_u) : u ∈ V}, L, W, ρ) as follows:
(i) set L = B − W;
(ii) for all u ∈ V, define its deterministic size s_u = μ_u(W/2);
(iii) for all u ∈ V, set its reward r'_u = r_u if Pr_{S_u∼π_u}[S_u > W/2] ≤ 1/2, and r'_u = 0 otherwise.

Recall that KnapOrient has deterministic sizes but two budgets L and W, and that AlgKO is an O(1)-approximation algorithm for it.
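Definitions 1 and 2 are mechanical to implement; here is a minimal sketch (the dictionary-based distribution format and the function names are our own, not the paper's):

```python
def trunc_mean(dist, Z):
    """mu_u(Z) = E[min(S_u, Z)] for a discrete size distribution {size: prob}."""
    return sum(min(s, Z) * p for s, p in dist.items())

def valid_knaporient_instance(jobs, B, W):
    """Build Iko(W) per Definition 2: travel budget L = B - W, deterministic
    size mu_u(W/2), and the reward kept only when Pr[S_u > W/2] <= 1/2.
    `jobs` maps each vertex u to (size_distribution, reward)."""
    L = B - W
    inst = {}
    for u, (dist, r) in jobs.items():
        s_u = trunc_mean(dist, W / 2)
        p_big = sum(p for s, p in dist.items() if s > W / 2)
        inst[u] = (s_u, r if p_big <= 0.5 else 0.0)
    return L, inst
```

The two properties stated in Definition 1 (monotonicity and subadditivity of μ_u) can be verified directly on any such discrete distribution.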
The algorithm for non-adaptive StocOrient now proceeds as given below. Note that the algorithm gives a randomized non-adaptive policy: it chooses a random (sub)path to follow, and just visits all the jobs on that path until time runs out.

Algorithm 1  Algorithm AlgSO for StocOrient on input Iso = (V, d, {(π_u, r_u) : u ∈ V}, B, ρ)
1: for all v ∈ V do
2:   let R_v := r_v · Pr_{S_v∼π_v}[S_v ≤ B − d(ρ, v)] be the expected reward of the single-vertex tour to v.
3: w.p. 1/2, just visit the vertex v with the highest R_v and exit.
4: for W = B, B/2, B/4, . . . , B/2^⌈log B⌉, 0 do
5:   let i = log(B/W) if W ≠ 0, otherwise let i = ⌈log B⌉ + 1.
6:   let P_i be the path returned by AlgKO on the valid KnapOrient instance Iko(W).
7:   let R_i be the reward of this KnapOrient solution P_i.
8: let P_{i*} be the solution among {P_i}_{i∈[⌈log B⌉+1]} with maximum reward R_i.
9: sample each vertex in P_{i*} independently w.p. 1/4, and visit these sampled vertices in the order given by P_{i*}.

5.1 The Analysis  The analysis consists of two conceptual parts. Let Opt denote both the optimal non-adaptive strategy and its total expected reward.

• We first show that either just a single-vertex tour has reward Ω(Opt), or one of the valid KnapOrient instances for some W of the form B/2^i (or W = 0) has reward Ω(Opt). If a single-vertex tour is good, we will visit it with 50% chance and be done. If not, since AlgKO is an O(1)-approximation algorithm for KnapOrient, the solution P_{i*} selected in Step 8 must have reward R_{i*} = Ω(Opt), albeit only as a KnapOrient solution.

• So we finally show that a solution for any valid KnapOrient instance with reward R can be converted into a non-adaptive strategy for StocOrient with reward Ω(R).

The following theorem summarizes the first part of this argument:

Theorem 5.1.
(Structure Theorem 1) Given an instance Iso for which an optimal non-adaptive strategy has expected reward Opt, either there is a single-vertex tour with expected reward Ω(Opt), or there exists W = B/2^i for some i ∈ Z≥0 (or W = 0) for which the valid KnapOrient instance Iko(W) has reward Ω(Opt).

For now, let us assume Theorem 5.1 and complete the proof of Theorem 1.3; we will then prove Theorem 5.1 in the following subsection. Firstly, note that if there is a single-vertex tour with expected reward Ω(Opt), then our algorithm gets this reward in Step 3 with probability 1/2. Therefore, for the remainder of this section, assume that no single-vertex tour is good; hence, by Theorem 5.1 and our O(1)-approximation algorithm (Theorem 3.1) for KnapOrient, the path P_{i*} with the highest reward identified in Step 8 has reward Ω(Opt).

[Figure 4.2: High-Level Overview.
Step 1: Construct a suitable instance Iko of Knapsack Orienteering (KnapOrient), with the guarantee that the optimal profit from this KnapOrient instance Iko is Ω(Opt/ log log B).
Step 2: Use Theorem 3.1 on Iko to find a path P with reward Ω(Opt/ log log B).
Step 3: Follow this solution P and process the jobs in the order P visits them.]

For brevity, denote P_{i*} as P*, and let W* = B/2^{i*} be the associated value of W. Let P* = (ρ, v_1, v_2, . . . , v_l). Hence R_{i*} = Σ_{u∈P*} r'_u. Since P* is feasible for Iko(W*), we know that (a) the length of the path P* is at most B − W*, and (b) Σ_{u∈P*} s_u ≤ W*. And recall that, by the way we defined the valid KnapOrient instances (Definition 2), s_u = μ_u(W*/2) for all u, and r'_u = r_u if Pr_{S_u∼π_u}[S_u > W*/2] ≤ 1/2 and 0 otherwise.

Lemma 5.1. For any vertex v ∈ P* such that r'_v > 0, the probability of successfully reaching v and finishing the job at v before violating the budget is at least 1/16.

Proof. The proof is an easy application of Markov's inequality. Indeed, consider the sum

(5.2)  Σ_{u∈P*: u≺v} 1_u · min(S_u, W*/2)

over the vertices on P* that precede v, where 1_u is the indicator random variable for whether u is sampled in Step 9.
If this sum is at most W*/2 (the good event), then the total time spent processing jobs up to (but not including) v is at most W*/2. And indeed, since we sample each vertex in P* independently with probability 1/4, and since Σ_{x∈P*} E[min(S_x, W*/2)] = Σ_{x∈P*} s_x ≤ W*, the expected value of the sum (5.2) is at most W*/4. By Markov's inequality, the good event happens with probability at least 1/2. We condition on the occurrence of this event.

Now, job v is sampled independently with probability 1/4, and if chosen, it finishes processing within W*/2 time with probability at least 1/2 (because r'_v > 0, and therefore Pr_{S_v∼π_v}[S_v > W*/2] ≤ 1/2 by definition). Finally, since the total time spent traveling on the entire path P* is at most B − W*, we can combine the above events to get that the job at v successfully completes with probability at least 1/16. □

Now linearity of expectation implies that using P_{i*} with sub-sampling, as in Step 9, gives us an expected reward of at least R_{i*}/16. Since R_{i*} is Ω(Opt), this completes the proof of Theorem 1.3.

5.2 Proof of Theorem 5.1  The proof proceeds as follows: assuming that no single-vertex tour has reward Ω(Opt), we want to show that one of the valid KnapOrient instances must have large reward. We first note that this assumption lets us make the following structural observation about Opt.

Observation 1. Assume that no single-vertex tour has reward Ω(Opt). Then Opt can skip all visits to vertices u such that Pr_{S_u∼π_u}[S_u > B − d(ρ, u)] ≥ 1/2, and still preserve a constant factor of the total reward.

Proof. Say that a vertex u is bad if Pr_{S_u∼π_u}[S_u > B − d(ρ, u)] ≥ 1/2. Notice that upon visiting a bad vertex u, the probability that Opt can continue decreases geometrically by a factor of 1/2, because of the high chance of the size being large enough to exceed the total budget. Therefore, the total expected reward that we can collect from all bad jobs is at most, say, twice that which we can obtain by visiting the best bad vertex.
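Lemma 5.1's constant can be sanity-checked by simulation on a toy family of instances satisfying its hypotheses. The harness below is our own (seeded, hence deterministic), not part of the paper; it uses jobs of size W/2 with probability 2/m, so the truncated means sum to W and Pr[S > W/2] = 0 ≤ 1/2 as Definition 2 requires.

```python
import random

def estimate_success(m, W, trials=40000, seed=1):
    """Monte-Carlo estimate, on a toy path of m i.i.d. jobs (size W/2 w.p.
    2/m, else 0), of the probability that the last vertex is sampled (w.p.
    1/4, as in Step 9), reached, and finishes within the processing budget W.
    Lemma 5.1 guarantees this is at least 1/16."""
    rng = random.Random(seed)
    q = 2.0 / m
    ok = 0
    for _ in range(trials):
        if rng.random() >= 0.25:      # target vertex not sampled in Step 9
            continue
        # total processing load of the sampled predecessors
        load = sum(W / 2 for _ in range(m - 1)
                   if rng.random() < 0.25 and rng.random() < q)
        s_last = W / 2 if rng.random() < q else 0.0
        if load + s_last <= W:        # travel <= B - W is guaranteed separately
            ok += 1
    return ok / trials
```

On such instances the estimate comfortably exceeds 1/16, consistent with the lemma (the 1/16 bound is worst-case, not tight here).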
Since the expected reward from any bad vertex is small (by the assumption that no single-vertex tour is good), we can safely skip such vertices in Opt, and still retain Ω(Opt) reward. □

Without loss of generality, let the optimal non-adaptive ordering be P = (ρ = v_0, v_1, v_2, . . . , v_n); also denote the expected total reward by Opt. For any v_j ∈ V, let D_j = Σ_{i=1}^{j} d(v_{i−1}, v_i) denote the total distance traveled before visiting vertex v_j. Note that while the time spent before visiting any vertex is a random quantity, the distance is deterministic, since we deal with non-adaptive strategies. Let v_{j*} denote the first vertex v_j ∈ P such that

(5.3)  [...]

It is also clear that the probability that Opt visits v_{j*} is upper bounded by the probability that [...]

6 The Adaptivity Gap for StocOrient

We will show that either just a single-vertex tour has reward Ω(Opt/ log⌈log B⌉), or one of the valid KnapOrient instances (Definition 2) for some W of the form B/2^i (or W = 0) has reward Ω(Opt/ log⌈log B⌉). This is sufficient because we can now plug in the following Structure Theorem 2 (Theorem 6.1) in place of the non-adaptive Structure Theorem 5.1, and the rest of the previous section's analysis goes through unchanged. Indeed, we argued in Section 5.1 that any solution for any valid KnapOrient instance with reward Ω(R) can be converted into a non-adaptive strategy for StocOrient with reward Ω(R), and therefore this also establishes the adaptivity gap of O(log⌈log B⌉) for StocOrient.

Theorem 6.1. (Structure Theorem 2) Given an instance Iso for which an optimal strategy has expected reward Opt, either there is a single-vertex tour with expected reward Ω(Opt/ log log B), or there exists W = B/2^i for some i ∈ Z≥0 (or W = 0) for which the valid KnapOrient instance Iko(W) has reward Ω(Opt/ log⌈log B⌉).

Before we begin, recall the typical instance Iso := StocOrient(V, d, {(π_u, r_u) : u ∈ V}, B, ρ) of the stochastic orienteering problem, where the metric is (V, d), B is the total budget of our tour, the job at vertex u has reward r_u and a probability distribution π_u : R+ → [0, 1] over sizes, and ρ is the starting vertex.

Roadmap.
We begin by giving a roadmap of the proof. Let us view the optimal adaptive strategy Opt as a decision tree where each node is labeled with a vertex/job, and the children correspond to different size instantiations of the job. For any sample path P in this decision tree, consider the first node x_P where the sum of expected sizes of the jobs played until x_P exceeds the budget remaining: here, if L_{x,P} is the total distance traveled from the root to this vertex x_P by visiting vertices along P, then the remaining budget is B − L_{x,P}. Call such a node a frontier node, and the frontier is the union of all such frontier nodes. To make sense of this definition, note that if the orienteering instance were non-stochastic (with all sizes equal to the expected sizes), then we would not be able to get any reward from portions of the decision tree on or below the frontier nodes. Unfortunately, since job sizes are random for us, this may not be the case.

The main idea in the proof is to show that we don't lose too much reward by truncation: even if we truncate Opt along this frontier in a StocOrient instance, we still get an expected reward of Ω(Opt/log⌈log B⌉) from the truncated tree. Now an averaging argument says that there exists some path P of length L where (i) the total reward of jobs is Ω(Opt/log⌈log B⌉), and (ii) the sum of expected sizes of the jobs is O(B − L); this gives us the candidate KnapOrient solution. However, as we elaborated earlier in Section 4, we need to be very careful in the way we define our deterministic sizes and truncation thresholds.

Viewing Opt as a Discrete-Time Stochastic Process. Note that the transitions of the decision tree Opt represent travel between vertices: if the parent node is labeled with vertex u, and its child is labeled with v, the transition takes d(u, v) time. To simplify matters, we take every such transition, and subdivide it into d(u, v) unit-length transitions.
The intermediate nodes added in this way are labeled with new dummy vertices, with dummy jobs of deterministic size 0 and reward 0. We again denote this tree by Opt, and note that the amount of time spent traveling is exactly the number of edges traveled down this tree. (All this is only for the analysis.) Now if we start a particle at the root, and let it evolve down the tree based on the random outcomes of job sizes, then the node reached at timestep t corresponds to some job with a random size and reward. This naturally gives us a discrete-time stochastic process T, which at every timestep picks a job of size S_t ∼ D_t and reward R_t. Note that S_t, R_t and the probability distribution D_t are all random variables that depend on the outcomes of the previous timesteps 0, 1, . . . , t − 1 (since the actual job that the particle sees after going t hops depends on past outcomes). We stop this process at the first (random) timestep t_end such that Σ_{t=0}^{t_end} S_t ≥ B − t_end; this is the natural point to stop, since it is precisely the timestep when the total processing time plus the total distance traveled exceeds the budget B.

Some notation: nodes will correspond to states of the decision tree Opt, whereas vertices are points in the metric (V, d). The level of a node x in Opt is the number of hops in the decision tree from the root to reach x; this is the timestep when the stochastic process would reach x, or equivalently the travel time to reach the corresponding vertex in the metric. We denote this by level(x). Let label(x) be the vertex labeling x. We abuse notation and use S_x, R_x and π_x to denote the size, reward, and size distribution for node x; hence S_x = S_{label(x)}, R_x = R_{label(x)} and π_x = π_{label(x)}. We use x′ ⪯ x to denote that x′ is an ancestor of x.

Now to begin the proof of Theorem 6.1. Firstly, we assume that there are no co-located stochastic jobs (i.e., there is only one job at every vertex); note that this also implies that we have to travel a non-zero integral distance between jobs.
(This is only to simplify the exposition of the proof; we explain how to discharge this assumption in Section 6.1.) The proof proceeds by assuming that no single-vertex tour has reward Ω(Opt/log⌈log B⌉), and then going on to show that one of the valid KnapOrient instances must have large reward. As in Section 5, we can make the following observation about the optimal solution (the statement is identical to Observation 1; we simply state it for continuity).

Observation 2. Assume that no single-vertex tour has reward Ω(Opt/log⌈log B⌉). Then Opt can skip all visits to vertices u such that Pr_{S_u∼π_u}[S_u > (B − d(ρ, u))] ≥ 1/2, and still preserve a constant factor of the total reward.

Proof. See the proof of Observation 1.

Defining the Frontiers. Henceforth, we will focus on the decision tree Opt and the induced stochastic process T. For a node x and threshold t, let μ_x(t) := E[min(S_x, t)] denote the truncated mean size of the job at x. Consider any intermediate node x and the sample path from the root to x in Opt. We call x a star node if x is the first node along this sample path for which the following condition is satisfied:

(6.5)  Σ_{x′⪯x} μ_{x′}(B − level(x)) ≥ 8K · (B − level(x))

Observe that no star node is an ancestor of another star node. To get a sense of this definition of star nodes, ignore the truncation for a moment: then x is a star node if the expected sizes of all the level(x) jobs on the sample path until x sum to at least 8K(B − level(x)). But since we've spent level(x) time traveling to reach x, the process only goes below x if the actual total size of the jobs is at most B − level(x); i.e., if the sizes of the jobs are a factor K smaller than their expectations. If this were an unlikely event, then pruning Opt at the star nodes would result in little loss of reward.

And that is precisely what we show. The following lemma is the main technical component of the reduction, and bounds the probability that the random process T visits nodes that are descendants of star nodes.

Lemma 6.1. The process T continues beyond a star node with probability at most 1/10, if K = Θ(log⌈log B⌉).

Proof.
We group the star nodes into ⌈log B⌉ + 1 bands based on the value of B − level(x). Say that a star node x is in band i if B − level(x) ∈ (B/2^{i+1}, B/2^i] for 0 ≤ i ≤ ⌈log B⌉, and in band ⌈log B⌉ + 1 if level(x) = B. Firstly, note that Opt can never continue beyond a star node in band ⌈log B⌉ + 1, since it would run out of budget by then. Therefore we focus on a particular band i ∈ {0, 1, . . . , ⌈log B⌉} and bound the probability of T reaching a star node in band i by 1/(10(⌈log B⌉ + 1)); then a trivial union bound over all i suffices.

In order to bound this probability, consider the following altered stochastic process T_i: follow T as long as it could lead to a star node in band i. If we reach a node y such that there is no band-i star node that is a descendant of y, then we stop the process T_i at y. Else we stop when we reach a star node in band i. An illustration of the optimal decision tree, the different bands, and the altered processes is given in Figure 6.3.

By a straightforward coupling argument, the probabilities of reaching a band-i star node in T and in T_i are identical, and hence it suffices to bound the corresponding probability of continuing beyond a band-i star node in T_i. Therefore, let us fix a particular band i.

Figure 6.3: Optimal decision tree example: dashed lines indicate the bands, ⋆ indicates star nodes.

Claim 1. For each i ∈ {0, 1, . . . , ⌈log B⌉}, and any star node x in band i, the following inequality holds:

2K · (B/2^i) ≤ Σ_{x′⪯x} μ_{x′}(B/2^{i+1}) ≤ 17K · (B/2^i)

Proof. By the definition (6.5) of a star node, and since a band-i star node x has B/2^{i+1} ≤ B − level(x) ≤ B/2^i,

Σ_{x′⪯x} μ_{x′}(B/2^{i+1}) ≥ Σ_{x′⪯x} μ_{x′}(½(B − level(x))) ≥ ½ Σ_{x′⪯x} μ_{x′}(B − level(x)) ≥ 4K(B − level(x)) ≥ 2K · (B/2^i).

The first two inequalities use the monotonicity and subadditivity of μ_{x′}(·).

Moreover, since y, the parent node of x, is not a star node, it satisfies Σ_{x′⪯y} μ_{x′}(B − level(y)) < 8K(B − level(y)) = 8K(B − level(x) + 1).
But since we are currently not considering band number ⌈log B⌉ + 1 (and all distances are at least 1), level(x) ≤ B − 1, and hence B − level(x) + 1 ≤ 2(B − level(x)) ≤ 2B/2^i. Thus we have Σ_{x′⪯y} μ_{x′}(B − level(y)) < 16K · B/2^i. Now,

Σ_{x′⪯x} μ_{x′}(B/2^{i+1}) = Σ_{x′⪯y} μ_{x′}(B/2^{i+1}) + μ_x(B/2^{i+1}) ≤ Σ_{x′⪯y} μ_{x′}(B − level(y)) + B/2^{i+1} ≤ 16K · B/2^i + B/2^{i+1} ≤ 17K · B/2^i

The first inequality uses B − level(y) ≥ B − level(x) ≥ B/2^{i+1}. This completes the proof.

Claim 2. For any i ∈ {0, 1, . . . , ⌈log B⌉ + 1} and any star node x in band i, if process T_i reaches x then Σ_{x′⪯x} min(S_{x′}, B/2^{i+1}) ≤ B/2^i.

Proof. Clearly, if process T_i reaches node x, it must mean that Σ_{x′⪯x} S_{x′} ≤ B − level(x) ≤ B/2^i, else we would have run out of budget earlier. The claim follows because truncation can only decrease the left-hand side.

We now finish upper-bounding the probability of reaching a star node in band i using a martingale analysis. Define a sequence of random variables {Z_t, t = 0, 1, . . .} where

(6.6)  Z_t = Σ_{t′=0}^{t} ( min{S_{t′}, B/2^{i+1}} − μ_{t′}(B/2^{i+1}) ).

Since the subtracted term is precisely the expectation of the first term, the one-term expected change is zero and the sequence {Z_t} forms a martingale. In turn, E[Z_τ] = 0 for any stopping time τ. We will define τ to be the time when the process T_i ends; recall that this is the first time when (a) either the process reaches a band-i star node, or (b) there is no way to get to a band-i star node in the future.

Claim 2 says that when T_i reaches any star node x, the sum over the first terms in (6.6) is at most B/2^i, whereas Claim 1 says the sum of the means is at least 2K · B/2^i. Because K ≥ 1, we can infer that Z_t ≤ −K(B/2^i) at any star node (at level t). To bound the probability of reaching a star node in T_i, we appeal to Freedman's concentration inequality for martingales, which substantially generalizes the Azuma–Hoeffding inequality.

Theorem 6.2. (Freedman [13] (Theorem 1.6)) Consider a real-valued martingale sequence {X_k}_{k≥0} such that X_0 = 0, and E[X_{k+1} | X_k, X_{k−1}, . . . , X_0] = 0 for all k. Assume that the sequence is uniformly bounded, i.e., |X_k| ≤ M almost surely for all k.
Now define the predictable quadratic variation process of the martingale to be W_k = Σ_{j=0}^{k} E[X_j² | X_{j−1}, X_{j−2}, . . . , X_0] for all k ≥ 1. Then for all l ≥ 0 and σ² > 0, and any stopping time τ, we have

Pr[ |Σ_{j=0}^{τ} X_j| ≥ l and W_τ ≤ σ² ] ≤ 2 exp( −l² / (2σ² + 2Ml/3) )

We apply the above theorem to the martingale difference sequence {X_t = Z_t − Z_{t−1}}. Now, since each term X_t is just min(S_t, B/2^{i+1}) − μ_t(B/2^{i+1}), we get that E[X_t | X_{t−1}, . . .] = 0 by definition of μ_t(B/2^{i+1}). Moreover, since the sizes and means are both truncated at B/2^{i+1}, we have |X_t| ≤ B/2^{i+1} with probability 1; hence we can set M = B/2^{i+1}. Finally, in order to bound the variance term W_t we appeal to Claim 1. Indeed, consider a single random variable X_t = min(S_t, B/2^{i+1}) − μ_t(B/2^{i+1}), and abbreviate min(S_t, B/2^{i+1}) by Y for convenience. Then:

E[X_t² | X_{t−1}, . . .] = E[(Y − E[Y])²] = E[Y²] − E[Y]² ≤ Y_max · E[Y] ≤ (B/2^{i+1}) · μ_t(B/2^{i+1})

Here, the first inequality uses Y ≥ 0 and Y_max as the maximum value of Y. The last inequality uses the definition of Y. Hence the term W_t is at most (B/2^{i+1}) · Σ_{t′≤t} μ_{t′}(B/2^{i+1}) for the process at time t. Now from Claim 1 we have that for any star node (say at level t) in band i, Σ_{t′≤t} μ_{t′}(B/2^{i+1}) ≤ 17K(B/2^i). Therefore we have W_t ≤ 9K · (B/2^i)² for star nodes, and we set σ² to be this quantity.

So by setting l = K(B/2^i), σ² = 9K(B/2^i)², and M = B/2^{i+1}, we get that Pr[reaching a star node in T_i] is at most:

Pr[ |Z_τ| ≥ K(B/2^i) and W_τ ≤ 9K(B/2^i)² ] ≤ 2 e^{−K/20}.

Setting K = Θ(log⌈log B⌉) and performing a simple union bound calculation over the ⌈log B⌉ bands completes the proof of Lemma 6.1.

Consider the new tree Opt′ obtained by truncating Opt at the star nodes (of any level). Since by Lemma 6.1 we reach star nodes with probability at most 1/10, Opt′ has reward at least 9·Opt/10. This uses the fact that the expected conditional reward in any subtree of Opt is itself at most Opt: so the reward from Opt lost to truncation is at most Opt/10.

Claim 3. Every leaf node s in the tree Opt′ satisfies Σ_{x⪯s} μ_x(B − level(s) + 1) ≤ 9K · (B − level(s) + 1).

Proof. Let s′ denote the parent of leaf node s.
Note that level(s′) = level(s) − 1, and s′ is not a star node (nor a descendant of one). So:

Σ_{x⪯s′} μ_x(B − level(s) + 1) = Σ_{x⪯s′} μ_x(B − level(s′)) < 8K(B − level(s′)) = 8K(B − level(s) + 1).

Clearly μ_s(B − level(s) + 1) ≤ B − level(s) + 1. Adding these inequalities, we obtain the claim.

So by an averaging argument, there exists a sample path P in Opt′ to a leaf node s (say belonging to band i) of length level(s) with (a) the sum of rewards of nodes visited along this path being ≥ 9·Opt/10, and (b) the sum of means (truncated at B − level(s) + 1) of jobs being ≤ 9K · (B − level(s) + 1). By charging against a single-vertex tour, we may assume WLOG that level(s) ≤ B − 1. Then we have:

Σ_{x⪯s} μ_x(B − level(s)) ≤ Σ_{x⪯s} μ_x(B − level(s) + 1) ≤ 18K · (B − level(s))

Thus we can apply Lemma 5.3 on the path P of length level(s), with parameters R = (9/10)·Opt, W = B − level(s) and K′ = 18K, to obtain that the valid KnapOrient instance I_ko(W) has optimal value Ω(Opt/log log B). This completes the proof of Theorem 6.1.

6.1 Handling Co-Located Jobs. To help with the presentation in the above analysis, we assumed that a node x at depth l in the decision tree for Opt is actually processed after the adaptive tour has traveled a distance of l. In particular, this meant that there is at most one stochastic job per node (because if Opt is done processing some job, it has to travel at least 1 timestep in the decision tree, because its depth increases). However, if we are more careful and define the truncations of any node (in equation (6.5)) by its actual length along the tour, instead of simply its depth/level in the tree, then we can handle co-located jobs in exactly the same manner as above. In this situation, there could be several nodes in a sample path which have the same truncation threshold, but it is not difficult to see that the rest of the analysis proceeds in an identical fashion.

7 Stochastic Orienteering with Correlated Rewards (CorrOrient)

In this section we consider the stochastic orienteering problem where the reward of each job is a random variable that may be correlated with its size.
Crucially, the distributions at different vertices are still independent of each other. For each vertex v, we are given a probability distribution over (size, reward) pairs as follows: for each t ∈ {0, 1, . . . , B}, the job at v has size t and reward r_{v,t} with probability π_{v,t}. Again we consider the setting without cancellations, so once a job is started it must be run to completion unless the budget B is exhausted. The goal is to devise a (possibly adaptive) strategy that maximizes the expected reward of completed jobs, subject to the total budget (travel time + processing time) being at most B.

When there is no metric in the problem instance, this is a correlated stochastic knapsack instance, and [16] gives a non-adaptive algorithm which is a constant-factor approximation to the optimal adaptive strategy; this uses an LP relaxation that is quite different from that in the uncorrelated setting. The problem with extending that approach to stochastic orienteering is that we do not know of LP relaxations with good approximation guarantees even for the deterministic orienteering problem. We circumvented this issue in the uncorrelated case by using a martingale analysis to bypass the need for an LP relaxation, which gave a direct lower bound. We adopt a similar technique here, but as Theorem 1.4 says, our approximation ratio is only an O(log n log B) factor here. Moreover, we show that CorrOrient is at least as hard to approximate as the deadline orienteering problem, for which the best known guarantee is an O(log n)-approximation [2].

7.1 The Non-Adaptive Algorithm for CorrOrient. We now describe our approximation algorithm, which proceeds via a reduction to a suitably constructed instance of the deadline orienteering problem. This reduction takes a Lagrangian relaxation of the processing times, and then performs an amortized analysis to argue that the reward is high and the budget is met with constant probability. (In this it is similar to the ideas of [15].)
However, new ideas are required to handle correlations between sizes and rewards: indeed, this is where the deadline orienteering problem is needed. Also, we use an interesting counting property (Claim 4) to avoid double-counting reward.

Notation. Let Opt denote an optimal decision tree. We classify every execution of a job in this decision tree as belonging to one of (log₂ B + 1) types thus: for i ∈ [log₂ B], a type-i job execution occurs when the processing time spent before running the job lies in the interval [2^i − 1, 2^{i+1} − 1). (Note that the same job might have different types on different sample paths of Opt, but on a fixed sample path down Opt, it can have at most one type.) If Opt(i) is the expected reward obtained from job runs of type i, then we have Opt = Σ_i Opt(i), and hence max_{i∈[log₂ B]} Opt(i) ≥ Ω(1/log B) · Opt. For all v ∈ V and t ∈ [B], let R_{v,t} := Σ_{z=0}^{B−t} r_{v,z} · π_{v,z} denote the expected reward when job v's size is restricted to being at most B − t. Note that this is the expected reward we can get from job v if it starts at time t.

7.1.1 Reducing CorrOrient to DeadlineOrient. The high-level idea is the following: for any fixed i, we create an instance of DeadlineOrient to get an O(log n) fraction of Opt(i) as reward; then choosing the best such setting of i gives us the O(log n log B)-approximation. To get the instance of DeadlineOrient, for each vertex v, we create several copies of it, where the copy ⟨v, t⟩ corresponds to the situation where the job starts at time t, and hence has reward R_{v,t}. However, to prevent the DeadlineOrient solution from collecting reward from many different copies of the same vertex, we make copies of vertices only when the reward changes by a geometric factor. Indeed, for t₁ < t₂, if it holds that R_{v,t₁} ≤ 2·R_{v,t₂}, then we do not include the copy ⟨v, t₁⟩ in our instance. The following claim is useful for defining such a minimal set of starting times for each vertex/job.

Claim 4.
Given any non-increasing function f : [B] → ℝ+, we can efficiently find a subset I ⊆ [B] such that:

f(t) ≤ 2 · max_{ℓ∈I : ℓ≥t} f(ℓ),  and  Σ_{ℓ∈I : ℓ≥t} f(ℓ) ≤ 2 · f(t),  ∀t ∈ [B].

Proof. The set I is constructed by Algorithm 2 below. Observe that B is always contained in the set I, and hence for any t ∈ [B], min{ℓ ≥ t : ℓ ∈ I} is well-defined.

Algorithm 2 Computing the support I in Claim 4.
1: let i ← 0, k₀ ← 0, I ← ∅.
2: while k_i ∈ [B] and f(k_i) > 0 do
3:   ℓ_i ← max{ℓ ∈ [B] : f(ℓ) ≥ f(k_i)/2}.
4:   k_{i+1} ← ℓ_i + 1, I ← I ∪ {ℓ_i}.
5:   i ← i + 1.
6: output set I.

To prove the claimed properties, let I = {ℓ_i}_{i=0}^{p}. For the first property, given any t ∈ [B] let ℓ(t) = min{ℓ ≥ t : ℓ ∈ I}. Observe that if this set is empty then it must be that f(t) = 0, and the claim trivially holds. Now assume that the value ℓ(t) exists and that ℓ(t) = ℓ_i. By the definition of ℓ(t) it must be that ℓ_{i−1} < t, and so k_i ≤ t. Hence f(ℓ_i) ≥ f(k_i)/2 ≥ f(t)/2; the first inequality is by the choice of ℓ_i, and the second as f is non-increasing.

We now show the second property. For any index i, we have ℓ_{i−1} < k_i ≤ ℓ_i < k_{i+1} ≤ ℓ_{i+1}. Note that f(k_{i+1}) = f(ℓ_i + 1) < f(k_i)/2 by the choice of ℓ_i. So f(ℓ_{i+1}) ≤ f(k_{i+1}) < f(k_i)/2 ≤ f(ℓ_{i−1})/2. Given any t ∈ [B] let q be the index such that ℓ_q = min{ℓ ≥ t : ℓ ∈ I}. Consider the sum:

Σ_{i≥q} f(ℓ_i) = Σ_{j≥0} f(ℓ_{q+2j}) + Σ_{j≥0} f(ℓ_{q+1+2j}) ≤ 2 · f(t)

Here we use f(ℓ_{i+1}) < f(ℓ_{i−1})/2 together with a geometric summation, and the fact that t ≤ ℓ_q < ℓ_{q+1}.

Consider any i ∈ [log₂ B]. Now for each v ∈ V, apply Claim 4 to the function f(t) := R_{v,t+2^i−1} to obtain a subset I_v^i ⊆ [B], which defines which copies we use: namely, we create |I_v^i| co-located copies {⟨v, ℓ⟩ : ℓ ∈ I_v^i}, where node ⟨v, ℓ⟩ has deadline ℓ. Define the size s_i(v) = E[min(S_v, 2^i)], and the reward r_i(⟨v, ℓ⟩) = R_{v,ℓ+2^i−1}. Finally, define N_i = {⟨v, ℓ⟩ : ℓ ∈ I_v^i, v ∈ V}.

Now we are ready to describe the DeadlineOrient instance I_i(λ) for any λ ≥ 0. The metric is (V, d), and for each vertex v ∈ V, the copies I_v^i are all co-located at v. The copy ⟨v, ℓ⟩ ∈ I_v^i has deadline ℓ, and a profit of r_i(⟨v, ℓ⟩, λ) = r_i(⟨v, ℓ⟩) − λ · s_i(v). The goal is to find a path which maximizes the reward of the copies that are visited within their deadlines.
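The support-finding procedure of Algorithm 2 is concrete enough to state directly in code. The Python sketch below is our own illustration (the function name and sample profile are not from the paper): it computes the set I for a non-increasing f given as a list indexed by t = 0, …, B, and then checks both properties of Claim 4 on that example.

```python
def claim4_support(f):
    """Algorithm 2: given a non-increasing sequence f[0..B], return the
    support set I of Claim 4 (starting times kept after geometric pruning)."""
    B = len(f) - 1
    I = []
    k = 0
    while k <= B and f[k] > 0:
        # largest index whose value is still at least half of f[k]
        l = max(t for t in range(B + 1) if f[t] >= f[k] / 2)
        I.append(l)
        k = l + 1
    return I

# Example: a non-increasing reward profile, standing in for t -> R_{v, t+2^i-1}.
f = [100, 60, 30, 30, 12, 12, 5, 5, 2, 2, 1, 1, 0, 0, 0, 0]
I = claim4_support(f)

for t in range(len(f)):
    later = [f[l] for l in I if l >= t]
    if later:  # Claim 4(i): some kept copy at or after t retains half the value
        assert f[t] <= 2 * max(later)
    else:      # only possible when f(t) = 0, as noted in the proof
        assert f[t] == 0
    # Claim 4(ii): the kept values at or after t sum to at most 2 f(t)
    assert sum(later) <= 2 * f[t]
```

On this profile the procedure keeps only five of the sixteen starting times (I = {1, 3, 5, 7, 11}), yet every dropped time has a kept time after it carrying at least half its value.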
Here, λ should be thought of as a Lagrangean multiplier, and hence the above instance is really a kind of Lagrangean relaxation of a DeadlineOrient instance which has a side constraint that the total processing time is at most 2^i. It is immediate from the definition of the rewards that Opt(I_i(λ)) is a non-increasing function of λ.

The idea of our algorithm is to argue that for the right setting of λ, the optimal DeadlineOrient solution for I_i(λ) has value Ω(Opt(i)); moreover, we can recover a valid solution to the stochastic orienteering problem from an approximate solution to I_i(λ).

Lemma 7.1. For any i ∈ [log B] and λ > 0, the optimal value of the DeadlineOrient instance I_i(λ) is at least Opt(i)/2 − λ · 2^{i+1}.

Proof. Consider the optimal decision tree Opt of the CorrOrient instance, and label every node in Opt by a (dist, size) pair, where dist is the total time spent traveling and size the total time spent processing jobs before visiting that node. Note that both dist and size are non-decreasing as we move down Opt. Also, type-i nodes are those where 2^i − 1 ≤ size < 2^{i+1} − 1, so zeroing out rewards from all but type-i nodes yields expected reward Opt(i).

For any vertex v ∈ V, let X_v^i denote the indicator random variable that job v is run as type-i in Opt, and let S_v be the random variable denoting its instantiated size; note that these are independent. Also let Y^i = Σ_{v∈V} X_v^i · min(S_v, 2^i) be the random variable denoting the sum of truncated sizes of type-i jobs. By definition of type-i, we have Y^i ≤ 2·2^i with probability one, and hence E[Y^i] ≤ 2^{i+1}. For ease of notation let q_v = Pr[X_v^i = 1] for all v ∈ V. We now have

(7.7)  2^{i+1} ≥ E[Y^i] = Σ_{v∈V} q_v · E[min(S_v, 2^i) | X_v^i = 1] = Σ_{v∈V} q_v · E[min(S_v, 2^i)] = Σ_{v∈V} q_v · s_i(v)

Now consider the expected reward Opt(i). We can write:

(7.8)  Opt(i) = Σ_{v∈V} Σ_{ℓ∈[B]} Σ_{k=2^i−1}^{2^{i+1}−2} Pr[1_{v,dist=ℓ,size=k}] · R_{v,ℓ+k} ≤ Σ_{v∈V} Σ_{ℓ∈[B]} Pr[1_{v,type=i,dist=ℓ}] · R_{v,ℓ+2^i−1}

where 1_{v,dist=ℓ,size=k} is the indicator for whether Opt visits v with dist = ℓ and size = k, and 1_{v,type=i,dist=ℓ} is the indicator for whether Opt visits v as type-i with dist = ℓ.
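As an aside, the truncated quantities appearing in (7.7) and (7.8) are easy to compute from a job's (size, reward) table. The sketch below is a self-contained illustration with made-up distribution values (not data from the paper): it computes R_{v,t} and s_i(v) = E[min(S_v, 2^i)], and checks the monotonicity of R_{v,t} in t on which the application of Claim 4 relies.

```python
B = 8
# Job v: size z occurs with probability pi[z], in which case it pays reward r[z].
pi = {0: 0.3, 2: 0.4, 5: 0.2, 8: 0.1}
r  = {0: 0.0, 2: 10.0, 5: 25.0, 8: 40.0}

def R(t):
    """Expected reward if job v starts at time t: sizes beyond B - t pay nothing."""
    return sum(pi[z] * r[z] for z in pi if z <= B - t)

def s(i):
    """Truncated mean size E[min(S_v, 2^i)], as used in (7.7)."""
    return sum(p * min(z, 2 ** i) for z, p in pi.items())

rewards = [R(t) for t in range(B + 1)]
# R_{v,t} is non-increasing in t, so Claim 4 applies to f(t) = R_{v, t+2^i-1}.
assert all(rewards[t] >= rewards[t + 1] for t in range(B))
# The truncated mean never exceeds the truncation threshold 2^i.
assert all(s(i) <= 2 ** i for i in range(4))
```

Here R(0) counts every size, while R(7) only counts the size-0 outcome, matching the intuition that a late start can only lose reward.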
Now going back to the metric, let 𝒫 denote the set of all possible rooted paths traced by Opt(i) in the metric (V, d). For each path P ∈ 𝒫, define the following quantities.
• φ(P) is the probability that Opt(i) traces P.
• For each vertex v ∈ P, d_v(P) is the travel time (i.e., dist) incurred on P prior to reaching v. Note that the actual time at which v is visited is dist + size, which is in general larger than d_v(P).
• w(P) = Σ_{v∈P} [ ½ · R_{v,d_v(P)+2^i−1} − λ · s_i(v) ].

Moreover, for each v ∈ P, let ℓ_v(P) = min{ℓ ∈ I_v^i : ℓ ≥ d_v(P)}; recall that we defined I_v^i above using Claim 4, and since B ∈ I_v^i, the quantity ℓ_v(P) is well-defined.

For any path P ∈ 𝒫, consider P as a solution to I_i(λ) that visits the copies {⟨v, ℓ_v(P)⟩ : v ∈ P} within their deadlines. It is feasible for the instance I_i(λ) because for each vertex in P we have ℓ_v(P) ≥ d_v(P), and therefore we reach the chosen copy within its deadline. Moreover, the objective value of P is precisely

Σ_{v∈P} r_i(⟨v, ℓ_v(P)⟩, λ) ≥ Σ_{v∈P} [ R_{v,ℓ_v(P)+2^i−1} − λ · s_i(v) ] ≥ Σ_{v∈P} [ ½ · R_{v,d_v(P)+2^i−1} − λ · s_i(v) ] = w(P)

where the second inequality above uses the definition of ℓ_v(P) and Claim 4. Now Opt(I_i(λ)) is at least:

max_{P∈𝒫} w(P) ≥ Σ_{P∈𝒫} φ(P) · w(P) = ½ Σ_{v∈V} Σ_{ℓ∈[B]} Pr[1_{v,type=i,dist=ℓ}] · R_{v,ℓ+2^i−1} − λ Σ_{v∈V} Pr[X_v^i] · s_i(v) ≥ Opt(i)/2 − λ · 2^{i+1}

Above, the second-to-last equality is by interchanging summations and splitting the two terms from the previous expression; the first term in the final inequality comes from (7.8), and the second term comes from (7.7) and the fact that q_v = Pr[X_v^i] = Pr[v is visited as type-i].

Now let AlgDO denote an α-approximation algorithm for the DeadlineOrient problem.³ We focus on the right setting of λ, defined thus:

(7.9)  λ_i* = max{ λ : AlgDO(I_i(λ)) ≥ λ · 2^i / α }

Lemma 7.2. For any i ∈ [log₂ B], we have λ_i* ≥ Opt(i)/2^{i+3}, and hence AlgDO(I_i(λ_i*)) ≥ Opt(i)/(8α).

Proof. Consider the setting λ = Opt(i)/2^{i+3}; by Lemma 7.1, the optimal solution to the DeadlineOrient instance I_i(λ) has value at least Opt(i)/4 ≥ 2^i · λ.
Since AlgDO is an α-approximation for DeadlineOrient, it follows that AlgDO(I_i(λ)) ≥ Opt(I_i(λ))/α ≥ 2^i·λ/α. So λ_i* ≥ Opt(i)/2^{i+3}, as claimed.

³ We slightly abuse notation and use AlgDO(I_i(λ)) to denote both the α-approximate solution on instance I_i(λ) and its value.

7.1.2 Obtaining a CorrOrient solution from AlgDO(λ_i*). It just remains to show that the solution output by the approximation algorithm for DeadlineOrient on the instance I_i(λ_i*) yields a good non-adaptive solution to the original CorrOrient instance. Let σ = AlgDO(λ_i*) be this solution; hence σ is a rooted path that visits some set P ⊆ N_i of nodes within their respective deadlines. The algorithm below gives a subset Q ⊆ P of nodes that we will visit in the non-adaptive tour; this is similar to the algorithm for KnapOrient in Section 3.

Algorithm 3 Algorithm A_i for CorrOrient, given a solution for I_i(λ_i*) characterized by a path P.
1: let y = (Σ_{⟨v,ℓ⟩∈P} s_i(v)) / 2^i.
2: partition the vertices of P into c = max(1, ⌊2y⌋) parts P₁, . . . , P_c s.t. Σ_{⟨v,ℓ⟩∈P_j} s_i(v) ≤ 2^i for all 1 ≤ j ≤ c.
3: set Q ← P_k where k = arg max_{j=1}^{c} Σ_{⟨v,ℓ⟩∈P_j} r_i(⟨v, ℓ⟩).
4: for each v ∈ V, define d_v := min{ℓ : ⟨v, ℓ⟩ ∈ Q}.
5: let Q̄ := {v ∈ V : d_v < ∞} be the vertices with at least one copy in Q.
6: sample each vertex in Q̄ independently w.p. 1/2, and visit these sampled vertices in the order given by P.

At a high level, the algorithm partitions the vertices of σ into groups, where each group obeys the size budget of 2^i in expectation. It then picks the most profitable group among them. The main issue with the set Q chosen in Step 3 is that it may include multiple copies of the same vertex. But because of the way we constructed the sets I_v^i (based on Claim 4), we can simply pick the copy which corresponds to the earliest deadline, and by discarding all the other copies, we lose only a constant fraction of the reward r_i(Q). Our first claim bounds the total (potential) reward of the set Q selected in Step 3.

Claim 5. The sum r_i(Q) = Σ_{⟨v,ℓ⟩∈Q} r_i(⟨v, ℓ⟩) is at least Opt(i)/(8α).

Proof.
Consider the following chain of inequalities:

r_i(Q) ≥ r_i(P)/c ≥ (λ_i*·y·2^i + λ_i*·2^i/α)/c = λ_i*·2^i·(y + 1/α)/c ≥ λ_i*·2^i·min{ y + 1/α, 1/2 + 1/(2yα) } ≥ λ_i*·2^i/α

The second inequality is due to the fact that 2^i·λ_i*/α ≤ AlgDO(I_i(λ_i*)) = r_i(P, λ_i*) = r_i(P) − λ_i*·s_i(P), by the choice of λ_i* in equation (7.9); the third inequality is by c ≤ max{1, 2y}; and the last inequality uses α ≥ 2. To conclude, we simply use Lemma 7.2.

Next, we show that we do not lose much of the total reward by discarding duplicate copies of vertices in Q.

Claim 6. Σ_{v∈Q̄} R_{v,d_v+2^i−1} ≥ Opt(i)/(16α).

Proof. For each u ∈ Q̄, let Q_u = Q ∩ {⟨u, ℓ⟩}_{ℓ∈I_u^i} denote all copies of u in Q. Now by the definition of d_u we have ℓ ≥ d_u for all ⟨u, ℓ⟩ ∈ Q_u. So for any u ∈ Q̄,

Σ_{⟨u,ℓ⟩∈Q_u} R_{u,ℓ+2^i−1} ≤ Σ_{ℓ∈I_u^i : ℓ≥d_u} R_{u,ℓ+2^i−1} ≤ 2 · R_{u,d_u+2^i−1}

Above, the last inequality uses the definition of I_u^i as given by Claim 4. Adding over all u ∈ Q̄,

Σ_{u∈Q̄} R_{u,d_u+2^i−1} ≥ ½ Σ_{u∈Q̄} Σ_{⟨u,ℓ⟩∈Q_u} R_{u,ℓ+2^i−1} = ½ Σ_{⟨v,ℓ⟩∈Q} r_i(⟨v, ℓ⟩) ≥ Opt(i)/(16α)

Here, the last inequality uses Claim 5. This completes the proof.

We now argue that the algorithm A_i reaches any vertex v ∈ Q̄ before time d_v + 2^i − 1 with constant probability.

Claim 7. For any vertex v ∈ Q̄, it holds that Pr[A_i reaches job v by time d_v + 2^i − 1] ≥ 1/2.

Proof. We know that, because P is a feasible tour for the DeadlineOrient instance, the distance traveled before reaching the copy ⟨v, d_v⟩ is at most d_v. Therefore, in what remains, we show that with probability 1/2 the total size of the previous vertices is at most 2^i − 1. To this end, let Q̄′ denote the set of vertices sampled in Step 6. We say that the bad event occurs if Σ_{u∈Q̄′\v} min(S_u, 2^i) ≥ 2^i. Indeed, if Σ_{u∈Q̄′\v} min(S_u, 2^i) < 2^i, then certainly we will reach v by time d_v + 2^i − 1.

We now bound the probability of the bad event. For this purpose, observe that

E[ Σ_{u∈Q̄′\v} min(S_u, 2^i) ] ≤ Σ_{u∈Q̄} ½·E[min(S_u, 2^i)],

by linearity of expectation and the fact that each u is sampled with probability 1/2. Now the latter sum is at most (1/2)·Σ_{u∈Q̄} s_i(u) ≤ 2^{i−1}, by the choice of Q in the partitioning Step 3. Hence, the probability of the bad event is at most 1/2 by Markov's inequality.

Theorem 7.1.
The expected reward of the algorithm A_i is at least Opt(i)/(64α).

Proof. We know from the above claim that for each vertex v ∈ Q̄, A_i reaches v by time d_v + 2^i − 1 with probability at least 1/2. In this event, we sample v with probability 1/2. Therefore, the expected reward collected from v is at least (1/4)·R_{v,d_v+2^i−1}. The proof is completed by using linearity of expectation and then Claim 6.

7.2 Evidence of hardness for CorrOrient. Our approximation algorithm for CorrOrient can be viewed as a reduction to DeadlineOrient. We now provide a reduction in the reverse direction: namely, a β-approximation algorithm for CorrOrient implies a (β − o(1))-approximation algorithm for DeadlineOrient. In particular, this means that a sub-logarithmic approximation for CorrOrient would also improve the best known approximation ratio for DeadlineOrient.

Consider any instance I of DeadlineOrient on a metric (V, d) with root ρ ∈ V and deadlines {t_v}_{v∈V}; the goal is to compute a path originating at ρ that visits the maximum number of vertices before their deadlines. We now define an instance J of CorrOrient on the same metric (V, d) with root ρ. Fix a parameter 0 < p ≤ 1/n². The job at each v ∈ V has the following distribution: size B − t_v (and reward 1/p) with probability p, and size zero (and reward 0) with probability 1 − p. To complete the reduction from DeadlineOrient to CorrOrient, we will show that (1 − o(1)) · Opt(I) ≤ Opt(J) ≤ (1 + o(1)) · Opt(I).

Let σ be any solution to I that visits a subset S ⊆ V of vertices within their deadlines; so the objective value of σ is |S|. This also corresponds to a (non-adaptive) solution to J. For any vertex v ∈ S, the probability that zero processing time has been spent prior to v is at least (1 − p)^n; in this case, the start time of job v is at most t_v (recall that σ visits v ∈ S by time t_v), and hence the conditional expected reward from v is p · (1/p) = 1 (since v has size B − t_v and reward 1/p with probability p).
It follows that the expected reward of σ as a solution to J is at least Σ_{v∈S} (1 − p)^n ≥ |S| · (1 − np) = (1 − o(1)) · |S|. Choosing σ to be the optimal solution to I, we have (1 − o(1)) · Opt(I) ≤ Opt(J).

Consider now any adaptive strategy σ for J, with expected reward R(σ). Define the path σ₀ as the one starting from the root of σ that always follows the branch corresponding to a size-zero instantiation. Consider σ₀ as a feasible solution to I; let S₀ ⊆ V denote the vertices on path σ₀ that are visited prior to their respective deadlines. Clearly Opt(I) ≥ |S₀|. When strategy σ is run, every size-zero instantiation gives zero reward, and no (V \ S₀)-vertex visited on σ₀ can fit into the knapsack (with a positive size instantiation); so

(7.10)  Pr[σ gets positive reward] ≤ p · |S₀|

Moreover, since the reward is always an integral multiple of 1/p,

(7.11)  R(σ) = (1/p) Σ_{i=1}^{n} Pr[σ gets reward at least i/p] = (1/p) · Pr[σ gets positive reward] + (1/p) Σ_{i=2}^{n} Pr[σ gets reward at least i/p]

Furthermore, for any i ≥ 2, we have: Pr[σ gets reward at least i/p] ≤ Pr[at least i jobs instantiate to positive size] ≤ (n choose i) · p^i ≤ (np)^i. It follows that the second term in (7.11) can be upper bounded by (1/p) Σ_{i=2}^{n} (np)^i ≤ 2n²p. Combining this with (7.10) and (7.11), we obtain R(σ) ≤ |S₀| + 2n²p = |S₀| + o(1). Since this holds for any adaptive strategy σ for J, we get Opt(I) ≥ (1 − o(1)) · Opt(J).

8 Stochastic Orienteering with Cancellations

We now describe a black-box way to handle the setting where an optimal strategy can actually abort a job during its processing (say, if it turns out to be longer than expected). We first note that, even for the basic case of Stochastic Knapsack, there are very simple examples that demonstrate an arbitrarily large gap in the expected reward between strategies which can cancel and those that cannot. The idea is simple: essentially, we create up to B co-located copies of every job v, and imagine that copy ⟨v, t⟩ has expected reward Σ_{t′=0}^{t} π_{v,t′} · r_{v,t′}, i.e., it fetches reward only when the size instantiates to at most t.
It is easy to see that any optimal adaptive solution, when it visits a vertex, in fact just plays some copy of it (which copy might depend on the history of the sample path taken to reach this vertex). Therefore we can find a good deterministic solution with suitably large reward (this is the KnapOrient problem for the uncorrelated case and the DeadlineOrient problem for the correlated case). Now the only issue is when we translate back from the deterministic instance to a solution for the StocOrient instance: the deterministic solution might collect reward from multiple copies of the same job. We can fix this by again using the geometric scaling idea (using Claim 4): if there are two copies of (roughly) the same reward, we only keep the one with the earlier cancellation time. This way, we can ensure that for all remaining copies of a particular job, the rewards are all at least geometric factors apart. Now, even if the deterministic solution collects reward from multiple copies of a job, we can simply play the one among them with the best reward, and we will have collected a constant fraction of the total.

References

[1] Micah Adler and Brent Heeringa. Approximating optimal binary decision trees. In APPROX, pages 1–9, 2008.
[2] Nikhil Bansal, Avrim Blum, Shuchi Chawla, and Adam Meyerson. Approximation algorithms for deadline-TSP and vehicle routing with time-windows. In STOC, pages 166–174.
[3] Anand Bhalgat. A (2 + ε)-approximation algorithm for the stochastic knapsack problem. 2011.
[4] Anand Bhalgat, Ashish Goel, and Sanjeev Khanna. Improved approximation results for stochastic knapsack problems. In SODA, 2011.
[5] Avrim Blum, Shuchi Chawla, David R. Karger, Terran Lane, Adam Meyerson, and Maria Minkoff. Approximation algorithms for orienteering and discounted-reward TSP. SIAM J. Comput., 37(2):653–670, 2007.
[6] Ann Melissa Campbell, Michel Gendreau, and Barrett W. Thomas. The orienteering problem with stochastic travel and service times.
Annals OR, 186(1):61–81, 2011.
[7] Chandra Chekuri, Nitish Korula, and Martin Pál. Improved algorithms for orienteering and related problems. In SODA, pages 661–670.
[8] Chandra Chekuri and Amit Kumar. Maximum coverage problem with group budget constraints and applications. In APPROX-RANDOM, pages 72–83, 2004.
[9] Chandra Chekuri and Martin Pál. A recursive greedy algorithm for walks in directed graphs. In FOCS, pages 245–253, 2005.
[10] Ke Chen and Sariel Har-Peled. The Euclidean orienteering problem revisited. SIAM J. Comput., 38(1):385–397, 2008.
[11] Ning Chen, Nicole Immorlica, Anna R. Karlin, Mohammad Mahdian, and Atri Rudra. Approximating matches made in heaven. In ICALP (1).
[12] Brian C. Dean, Michel X. Goemans, and Jan Vondrák. Approximating the stochastic knapsack problem: The benefit of adaptivity. Math. Oper. Res., 33(4):945–964, 2008.
[13] David A. Freedman. On tail probabilities for martingales. Annals of Probability, 3:100–118, 1975.
[14] Sudipto Guha and Kamesh Munagala. Approximation algorithms for budgeted learning problems. In STOC, pages 104–113.
[15] Sudipto Guha and Kamesh Munagala. Multi-armed bandits with metric switching costs. In ICALP, pages 496–507, 2009.
[16] Anupam Gupta, Ravishankar Krishnaswamy, Marco Molinaro, and R. Ravi. Approximation algorithms for correlated knapsacks and non-martingale bandits. To appear in FOCS 2011. arXiv: abs/1102.3749, 2011.
[17] Anupam Gupta, Viswanath Nagarajan, and R. Ravi. Approximation algorithms for optimal decision trees and adaptive TSP problems. In ICALP (1), pages 690–701, 2010.
[18] S. Rao Kosaraju, Teresa M. Przytycka, and Ryan S. Borgstrom. On an optimal split tree problem. In WADS, pages 157–168, 1999.