CV
Basics
| Name | Tran Hoang Minh |
| Label | AI Research Engineer |
| Email | minhhtran.work@gmail.com |
| Phone | +84-967-628-624 |
| Url | https://minhhoang2705.github.io |
| Summary | Artificial Intelligence graduate with a B.Sc. and practical experience in Computer Vision (CV) and Natural Language Processing (NLP), including projects with Large Language Models (LLMs) and Multimodal ML. Proficient in Python, PyTorch, TensorFlow, and adept at developing and optimizing deep learning models. Leverage strong analytical and problem-solving skills to develop and implement AI solutions that drive measurable impact within collaborative team settings. |
Work
-
2025.02 - 2025.08 Ho Chi Minh City, Vietnam
AI Engineer
Rainscales
Developed face authentication systems for staff check-in/check-out, focusing on robust time-tracking solutions and deployment on company infrastructure.
- Developed a lightweight proof-of-concept (PoC) face authentication system for staff check-in/check-out, addressing client requirements for robust time-tracking and improving attendance accuracy by 10%
- Developed and deployed a pipeline to authenticate terminal faces, incorporating data processing (detection, alignment, image quality enhancement), facial feature extraction with DeepFace and InsightFace models, and similarity search via Milvus for efficient querying of large databases
- Developed API endpoints for model inference, enabling deployment on company infrastructure
-
2023.09 - 2023.12 Binh Dinh, Vietnam
AI Engineer Intern
Quy Nhon AI Center
Designed and optimized deep learning models for OCR, focusing on Vietnamese text recognition and data augmentation techniques.
- Designed and optimized deep learning models with PyTorch, achieving a 15% increase in accuracy and efficiency
- Leveraged data augmentation techniques for Vietnamese text, including light rotation, brightness adjustment, and random noise addition, to refine the PaddleOCR model and enhance input image recognition accuracy
- Collaborated on an OCR project, focusing on data labeling (for text detection and text recognition), validation protocols, and rigorous model testing to ensure robustness
Education
-
2022.01 - 2025.01 Vietnam
Bachelor of Science
FPT University
Artificial Intelligence
- Python Programming
- Data Structures & Algorithms
- Database Systems
- Machine Learning
- Deep Learning
- Computer Vision
- Natural Language Processing
- Data Science
Publications
-
2025.11.01 An Automated Pipeline for Constructing a Vietnamese VQA-NLE Dataset
Proceedings of the Fifth International Conference on Intelligent Systems and Networks (ICISN 2025). Lecture Notes in Networks and Systems, vol 1596. Springer, Singapore
Research on an automated pipeline for constructing a Vietnamese Visual Question Answering with Natural Language Explanations (VQA-NLE) dataset. Co-authored with Truong-Binh Duong, Binh-Nam Le-Nguyen, and Dinh-Thang Duong.
Skills
| Programming Languages | Python, C |
| Machine Learning & Deep Learning | PyTorch, TensorFlow, Scikit-learn, Pandas, NumPy |
| AI Frameworks & Tools | LangChain, LangGraph, Hugging Face, OpenCV, PaddleOCR |
| Databases & Vector Stores | PostgreSQL, Microsoft SQL Server, Milvus, Qdrant |
| Cloud & DevOps | Google Cloud Platform (GCP), Docker, Kubernetes (K8s), Terraform, Git |
| Monitoring & Observability | Prometheus, Grafana, Loki, Jaeger, Evidently |
Languages
| Vietnamese | Native speaker |
| English | Professional working proficiency (IELTS 6.0 - B2) |
Interests
| Artificial Intelligence | Large Language Models (LLMs), Vision-Language Models (VLMs), Multimodal Learning, Retrieval-Augmented Generation (RAG), Computer Vision, Natural Language Processing |
| Research Areas | Spatial Intelligence, Physical AI, Embodied AI, MLOps, Production AI Systems |
Projects
- - Present
IntelliRAG System
Architected and deployed a cloud-native RAG platform on GKE utilizing FastAPI, Kubernetes, and GPU-accelerated inference (vLLM/KServe with autoscaling), achieving high-throughput document processing and low-latency query response.
- Engineered semantic search infrastructure with Qdrant vector database and automated document ingestion pipeline using LangChain, Docling, and Sentence Transformers
- Established a production MLOps workflow with MLflow/DVC for versioning and comprehensive observability via Prometheus/Grafana/Jaeger/Evidently
- Implemented infrastructure-as-code deployment using Terraform and Helm on Google Kubernetes Engine (GKE)
- 2024.09 - 2024.12
Action Retrieval from Building CCTV Footage
Researched, tested, and evaluated text-video retrieval models (CLIP4Clip, Frozen-in-time, InternVideo) on retrieval metrics (Recall@k, Precision@k) to select and propose the most effective model for the action retrieval system.
- Evaluated multiple text-video retrieval models with performance metrics (Recall@k, Precision@k)
- Collaborated on building and managing data pipelines to ingest vector pairs (video segment - video caption) into Milvus, optimizing for low-latency video retrieval
- - Present
Vietnamese Text Recognition
Designed and built an end-to-end Vietnamese OCR system with a Tkinter-based GUI, integrating fine-tuned PaddleOCR models.
- Conducted comparative research and experimentation to select optimal models for text detection and recognition on a proprietary Vietnamese text dataset
- Improved text recognition accuracy by 10% through data augmentation, including synthetic images with diverse fonts and contexts tailored to text recognition on advertising signs and product packaging