We could also train a DNC by reinforcement learning. In this framework, we let the DNC produce actions but never show it the answer. Instead, we award it points when it has produced a good sequence of actions (like the children’s game “hot or cold”). We connected a DNC to a simple environment with coloured blocks arranged in piles, and gave it instructions describing goals to achieve: “Put the light blue block below the green; the orange to the left of the red; the purple below the orange; the light blue to the right of the dark blue; the green below the red; and the purple to the left of the green”.
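
To make the scoring idea concrete, here is a minimal sketch of how a goal like the one quoted above could be checked against an arrangement of blocks. It is not the environment used in the experiments: the names `locate`, `satisfied` and `reward` are hypothetical, and the sketch assumes that "below" means anywhere lower in the same pile, that "left of" means any pile further to the left, and that the score is all-or-nothing.

```python
from typing import Dict, List, Tuple

Piles = List[List[str]]             # each pile lists its blocks bottom-to-top
Constraint = Tuple[str, str, str]   # (block_a, relation, block_b)


def locate(piles: Piles) -> Dict[str, Tuple[int, int]]:
    """Map each block to (pile index, height within that pile)."""
    return {
        block: (col, height)
        for col, pile in enumerate(piles)
        for height, block in enumerate(pile)
    }


def satisfied(constraint: Constraint, positions: Dict[str, Tuple[int, int]]) -> bool:
    """Check one constraint. The relation semantics are assumptions."""
    a, relation, b = constraint
    col_a, height_a = positions[a]
    col_b, height_b = positions[b]
    if relation == "below":      # assumed: same pile, anywhere lower
        return col_a == col_b and height_a < height_b
    if relation == "left_of":    # assumed: any pile further left
        return col_a < col_b
    if relation == "right_of":   # assumed: any pile further right
        return col_a > col_b
    raise ValueError(f"unknown relation: {relation}")


def reward(piles: Piles, goal: List[Constraint]) -> float:
    """Assumed all-or-nothing score: 1.0 only if every constraint holds."""
    positions = locate(piles)
    return 1.0 if all(satisfied(c, positions) for c in goal) else 0.0


# The goal quoted in the text, written as constraints.
GOAL: List[Constraint] = [
    ("light blue", "below", "green"),
    ("orange", "left_of", "red"),
    ("purple", "below", "orange"),
    ("light blue", "right_of", "dark blue"),
    ("green", "below", "red"),
    ("purple", "left_of", "green"),
]

# One arrangement that meets the goal under the assumptions above.
solved: Piles = [
    ["dark blue"],
    ["purple", "orange"],
    ["light blue", "green", "red"],
]

# The same blocks with green and red swapped, which breaks "green below red".
unsolved: Piles = [
    ["dark blue"],
    ["purple", "orange"],
    ["light blue", "red", "green"],
]
```

Under these assumptions, `reward(solved, GOAL)` gives the full score and `reward(unsolved, GOAL)` gives none. The learner only ever sees that number, never the arrangement that would have earned it, which is what separates this setup from training on worked answers.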