Towards safe reinforcement-learning in industrial grid-warehousing