Publications

Preprints

Zero-Shot Cross-City Generalization in End-to-End Autonomous Driving: Self-Supervised versus Supervised Representations

Published as an arXiv preprint, 2026

We investigate zero-shot cross-city generalization in end-to-end autonomous driving, focusing on the role of visual representation learning. By enforcing strict geographic splits across cities in nuScenes and NAVSIM, we isolate the effect of backbone pretraining while keeping the planning architecture fixed. Our results show that supervised ImageNet-pretrained models suffer significant performance degradation when transferred across cities, particularly under shifts in driving conventions. In contrast, self-supervised representations such as I-JEPA, DINOv2, and MAE consistently improve cross-city robustness, highlighting representation learning as a key factor for generalization in autonomous driving.

Recommended citation: F. Naeinian, A. Hamza, H. Zhu, A. Choromanska, 'Zero-Shot Cross-City Generalization in End-to-End Autonomous Driving: Self-Supervised versus Supervised Representations,' arXiv:2603.11417, 2026.

Conference Papers

Mapping Human Grasping to 3-Finger Grippers: A Deep Learning Perspective

Published in the 32nd International Conference on Electrical Engineering (ICEE 2024), 2024

We present a deep learning approach to map human grasping patterns to 3-finger robotic grippers. A dataset was generated using human hand features, with pre-processing steps employing MediaPipe to extract precise finger coordinates. The model was trained using object detection and computer vision techniques to identify optimal grasping points for robotic manipulation.

Recommended citation: F. Naeinian, E. Balazadeh, M. Tale Masouleh, 'Mapping Human Grasping to 3-Finger Grippers: A Deep Learning Perspective,' 2024 32nd International Conference on Electrical Engineering (ICEE), pp. 1–7, 2024.

Fatemeh Naeinian
