Internet of Things (IoT) devices (e.g., sensors and actuators) are being actively deployed at scale to automate and control many aspects of physical environments. While models, ranging from black-box to white-box approaches, can be used to support the actuation of such devices, the complexity of the required model increases as the scale of these deployments grows. Consequently, controlling and coordinating these devices at a massive scale to achieve a common goal (e.g., flattening demand) or individual goals (e.g., minimizing cost) becomes a challenge. The goal of this work is to explore the application of Reinforcement Learning (RL) techniques to actuate IoT devices without the need for an explicit model. In particular, we are currently exploring the use of RL techniques to control smart buildings so that they can react to Demand Response (DR) events from the electric utility.