Visual object tracking methods typically depend on deep networks that struggle to meet real-time processing requirements on mobile platforms with limited computing resources. In this work, we propose a real-time object tracking framework that enhances a lightweight feature pyramid network with a Transformer architecture to efficiently construct a robust target-specific appearance model. We further introduce a pooling attention module to reduce the computation and memory cost of fusing pyramid features with the Transformer. The optimized tracker runs at over 45 Hz on a single CPU, making it suitable for deployment on mobile devices with limited power resources.
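
To illustrate why pooling can lower the cost of attention, the following is a minimal NumPy sketch, not the module proposed in this work. It assumes that pooling attention means average-pooling the key/value tokens over the spatial grid before computing attention, so that the attention matrix shrinks from N x N to N x (N / stride^2). The function names `avg_pool_tokens` and `pooling_attention`, the single-head form, and the absence of learned projections are all simplifying assumptions made for illustration.

```python
import numpy as np


def avg_pool_tokens(x, h, w, stride):
    """Average-pool an (h*w, c) token map to ((h//stride)*(w//stride), c)."""
    c = x.shape[1]
    hp, wp = h // stride, w // stride
    # Restore the 2D grid and crop any remainder that does not fill a window.
    grid = x.reshape(h, w, c)[: hp * stride, : wp * stride]
    # Split each spatial axis into (blocks, stride) and average within blocks.
    grid = grid.reshape(hp, stride, wp, stride, c)
    return grid.mean(axis=(1, 3)).reshape(hp * wp, c)


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def pooling_attention(query, context, h, w, stride=2):
    """Single-head attention with pooled keys/values (hypothetical sketch).

    query:   (n_q, c) query tokens
    context: (h*w, c) tokens laid out on an h x w grid
    No learned projections are applied; this is an assumption for brevity.
    """
    kv = avg_pool_tokens(context, h, w, stride)       # fewer key/value tokens
    scores = query @ kv.T / np.sqrt(query.shape[1])   # (n_q, n_kv)
    weights = softmax(scores, axis=-1)
    return weights @ kv, weights


# Example: a 16 x 16 feature map with 32 channels, pooled with stride 4.
rng = np.random.default_rng(0)
h, w, c = 16, 16, 32
tokens = rng.standard_normal((h * w, c))
out, weights = pooling_attention(tokens, tokens, h, w, stride=4)
print(out.shape, weights.shape)  # (256, 32) (256, 16)
```

In this example the attention matrix has 256 x 16 entries instead of the 256 x 256 entries of full self-attention, a 16-fold reduction in both the score computation and the memory needed to hold the weights. The output keeps one feature vector per query token.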