Manufacturers and suppliers can often leverage economies of scale to reduce their manufacturing and
transportation costs, since larger volumes dilute the impact of fixed costs. Many actors in supply chains choose to shift
the burden of these fixed costs to their purchasing entities by imposing a mandatory minimum order size
(for example, on the total monetary value of the order). We define the Minimum Order Quantity (MOQ)
problem as the inventory control problem faced by purchasing entities that are themselves retailers in this
situation. Such a retailer should aim to replenish its stock to a balanced level,
avoiding both stock-outs and overstocks, while satisfying its supplier's constraint.
Several methods in the literature deal with simplified versions of this problem, notably
its single-item or stationary-demand versions. However, no state-of-the-art method
has been able to provide a realistic solution to the multi-item, variable-demand version of the problem.
The main contributions of this thesis are two methods that compute approximate solutions to this problem.
The first one is the w-policy, a heuristic based on several assumptions about the system. These assumptions
were justified by an extensive analysis of the MOQ problem. They drastically reduce the complexity of
computing the value functions, which leads to an efficient computation of an approximate solution. The
scope of applicability of the w-policy is, however, limited, and this policy is inapplicable in some specific
settings. To overcome this limitation, we developed a second method, which we call the ‘hybrid’
policy. This method combines reinforcement learning techniques (notably deep Q-learning) with some ideas
from the w-policy. We demonstrate the ability of these two methods to solve the MOQ problem efficiently
on simulated and real datasets, at unprecedented scales (up to ten thousand items).