Manufacturers and suppliers can often leverage economies of scale to reduce their manufacturing and

transportation costs, since larger volumes dilute the impact of fixed costs. Many actors in supply chains choose to shift

the burden of these fixed costs to their purchasing entities by introducing a mandatory minimum order size

(for example on the total monetary value of the order). We define the Minimum Order Quantity (MOQ)

problem as the inventory control problem that purchasing entities which are themselves retailers face in this

situation. A retailer in this situation should aim to replenish its inventory stock levels to a balanced state,

to avoid stock-outs and overstocks, while satisfying its supplier constraint.

There are several methods in the literature that deal with simplified versions of this problem, notably

for the single-item or the stationary-demand versions of the problem. However, no state-of-the-art method

has provided a realistic solution to the multi-item, variable-demand version of the problem.

The main contributions of this thesis are two methods that compute approximate solutions to this problem.

The first one is the w-policy, a heuristic based on several assumptions about the system. These assumptions

are justified by an extensive analysis of the MOQ problem. They drastically reduce the complexity of the

computation of the value functions, which leads to an efficient computation of an approximate solution. The

scope of applicability of the w-policy is however bounded, and this policy is inapplicable in some specific

settings. To overcome this limitation, we developed a second method that we call the 'hybrid'

policy. This method combines reinforcement learning techniques (notably deep Q-learning) with some ideas

from the w-policy. We demonstrate the ability of these two methods to solve the MOQ problem efficiently

on simulated and real datasets, at an unprecedented scale (up to ten thousand items).