As autonomous vehicle (AV) technology revolutionizes transportation, it also introduces new vulnerabilities to cyber-attacks, posing significant challenges to vehicle safety and security. The complexity of these systems, coupled with their increasing reliance on advanced computer vision and machine learning algorithms, makes them susceptible to sophisticated attacks. This paper explores the potential of Large Multimodal Models (LMMs) to identify Natural Denoising Diffusion (NDD) attacks on traffic signs. Our comparative analysis shows that LMMs detect NDD samples far more reliably, achieving an average accuracy of 82.52% across the selected models, compared to 37.75% for state-of-the-art deep learning models. We further discuss integrating LMMs into resource-constrained computational environments that mimic typical autonomous vehicles and assess their practicality through latency benchmarks. The results show that GPT models achieve substantially lower latency, as low as 4.5 seconds per image when accounting for both computation time and network round-trip time (RTT), suggesting a viable path toward real-world deployment. Lastly, we extend our analysis to the applicability of LMMs against a wider spectrum of AV attacks, focusing in particular on Automated Lane Centering systems, and highlight the potential of LMMs to enhance vehicular cybersecurity.