Rolling bearings are the most crucial components of rotating machinery. Identifying defective bearings in a timely manner may prevent the malfunction of an entire machinery system. The mechanical condition monitoring field has entered the big data phase as a result of the fast advancement of machine parts. When working with large amounts of data, the manual feature extraction approach has the drawback of being inefficient and inaccurate. Data-driven methods like Deep Learning have been successfully used in recent years for mechanical intelligent fault detection. Convolutional neural networks (CNNs) were mostly used in earlier research to detect and identify bearing faults. The CNN model, however, suffers from the drawback of having trouble managing fault-time information, which results in a lack of classification results. In this study, bearing defects have been classified using a state-of-the-art Vision Transformer (ViT). Bearing defects were classified using Case Western Reserve University (CWRU) bearing failure laboratory experimental data. The research took into account 13 distinct kinds of defects under 0-load situations in addition to normal bearing conditions. Using the Short Time Fourier Transform (STFT), the vibration signals were converted into 2D time-frequency images. The 2D time-frequency images are then used as input parameters for the ViT. The model achieved an overall accuracy of 98.8%.