Vietnamese Visual Question Answering
Baseline deep learning model for Vietnamese VQA research, published at ICISN 2025
Overview
Designed and validated a baseline deep learning model for Vietnamese Visual Question Answering (VQA) research. The model combines computer vision and natural language processing to generate answers accompanied by natural-language explanations (VQA-NLE), making its predictions interpretable.
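The published paper does not fix the exact encoders here, so the following is only a minimal sketch of the kind of multimodal fusion such a baseline could use. The choice of ViT as the image encoder, PhoBERT as the Vietnamese question encoder, simple late fusion, and an answer-classification head are all illustrative assumptions, not the paper's architecture; the explanation decoder is omitted for brevity.

```python
# Hypothetical sketch of a Vietnamese VQA baseline: a vision encoder and a
# Vietnamese text encoder are fused, then a classifier predicts the answer.
# Encoder choices (ViT, PhoBERT) are assumptions, not the paper's actual setup.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer, ViTModel

class VietnameseVQABaseline(nn.Module):
    def __init__(self, num_answers: int, hidden: int = 768):
        super().__init__()
        # Assumed pretrained encoders; swap in whatever the paper actually used.
        self.vision = ViTModel.from_pretrained("google/vit-base-patch16-224")
        self.text = AutoModel.from_pretrained("vinai/phobert-base")
        # Simple late fusion of the two [CLS]-style embeddings.
        self.fusion = nn.Sequential(
            nn.Linear(self.vision.config.hidden_size + self.text.config.hidden_size, hidden),
            nn.ReLU(),
        )
        self.answer_head = nn.Linear(hidden, num_answers)

    def forward(self, pixel_values, input_ids, attention_mask):
        # [CLS] embedding of the image and of the Vietnamese question.
        img = self.vision(pixel_values=pixel_values).last_hidden_state[:, 0]
        txt = self.text(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state[:, 0]
        fused = self.fusion(torch.cat([img, txt], dim=-1))
        return self.answer_head(fused)  # answer logits over the answer vocabulary

# Usage sketch (assumes a preprocessed image tensor `pixel_values`):
# tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")
# enc = tokenizer("Con vật trong ảnh là gì?", return_tensors="pt")  # "What animal is in the picture?"
# logits = VietnameseVQABaseline(num_answers=1000)(pixel_values, enc.input_ids, enc.attention_mask)
```

In a full VQA-NLE system, a text decoder conditioned on the fused representation would additionally generate the Vietnamese explanation.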
Technical Details
Technologies Used
- Computer vision: image feature extraction (visual encoder)
- NLP: pretrained Vietnamese language models for question encoding
- Architecture: multimodal deep learning model fusing visual and textual features
- Dataset: 32,886 Vietnamese QA pairs with explanations (a sample record is sketched below)
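For illustration, a single record in a VQA-NLE dataset of this kind might look like the following; the field names and schema are assumptions for the sketch, not the published dataset format.

```python
# Hypothetical schema for one VQA-NLE record; field names are illustrative.
from dataclasses import dataclass

@dataclass
class VQANLERecord:
    image_path: str    # path to the input image
    question: str      # Vietnamese question about the image
    answer: str        # short Vietnamese answer
    explanation: str   # natural-language justification of the answer

example = VQANLERecord(
    image_path="images/000123.jpg",
    question="Con vật trong ảnh là gì?",   # "What animal is in the picture?"
    answer="Con mèo",                      # "A cat"
    explanation="Trong ảnh có một con mèo đang nằm trên ghế sofa.",
    # "The picture shows a cat lying on a sofa."
)
```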
Research Contributions
- First baseline model for Vietnamese VQA with natural-language explanations (VQA-NLE)
- Systematic experiments on a large-scale Vietnamese dataset
- Demonstrated visual understanding combined with Vietnamese-language reasoning
- Foundation for future multimodal AI research
Results & Impact
- Published at ICISN 2025
- Validated on 32,886 QA pairs
- Model generates answers paired with human-readable explanations
- Establishes foundation for Vietnamese multimodal AI research
Publication
Duong, T.-B., Tran, H.-M., Le-Nguyen, B.-N., & Duong, D.-T. (2026). An Automated Pipeline for Constructing a Vietnamese VQA-NLE Dataset. Proceedings of the Fifth International Conference on Intelligent Systems and Networks, 164–173. https://doi.org/10.1007/978-981-95-1746-6_18