CV | Hoang-Minh Tran

Basics

Work

Education

Publications

Skills

Languages

Interests

Projects

Name	Tran Hoang Minh
Label	AI Research Engineer
Email	minhhtran.work@gmail.com
Phone	+84-967-628-624
Url	https://minhhoang2705.github.io
Summary	Artificial Intelligence graduate (B.Sc.) with practical experience in Computer Vision (CV) and Natural Language Processing (NLP), including projects with Large Language Models (LLMs) and multimodal ML. Proficient in Python, PyTorch, and TensorFlow, and experienced in developing and optimizing deep learning models. Applies strong analytical and problem-solving skills to build AI solutions with measurable impact in collaborative team settings.

2025.02 - 2025.08

Ho Chi Minh City, Vietnam
AI Engineer

Rainscales

Developed face authentication systems for staff check-in/check-out, focusing on robust time-tracking solutions and deployment on company infrastructure.
- Developed a lightweight proof-of-concept (PoC) face authentication system for staff check-in/check-out, addressing client requirements for robust time tracking and improving attendance accuracy by 10%
- Developed and deployed a pipeline to authenticate faces captured at terminals, covering data processing (detection, alignment, image quality enhancement), facial feature extraction with DeepFace and InsightFace models, and similarity search via Milvus for efficient querying of large databases
- Developed API endpoints for model inference, enabling deployment on company infrastructure
2023.09 - 2023.12

Binh Dinh, Vietnam
AI Engineer Intern

Quy Nhon AI Center

Designed and optimized deep learning models for OCR, focusing on Vietnamese text recognition and data augmentation techniques.
- Designed and optimized deep learning models with PyTorch, achieving a 15% increase in accuracy and efficiency
- Applied data augmentation techniques for Vietnamese text, including light rotation, brightness adjustment, and random noise addition, to fine-tune the PaddleOCR model and improve recognition accuracy on input images
- Collaborated on an OCR project, focusing on data labeling (for text detection and text recognition), validation protocols, and rigorous model testing to ensure robustness

2022.01 - 2025.01

Vietnam
Bachelor of Science

FPT University

Artificial Intelligence
- Python Programming
- Data Structures & Algorithms
- Database Systems
- Machine Learning
- Deep Learning
- Computer Vision
- Natural Language Processing
- Data Science

2025.11.01

An Automated Pipeline for Constructing a Vietnamese VQA-NLE Dataset

Proceedings of the Fifth International Conference on Intelligent Systems and Networks (ICISN 2025). Lecture Notes in Networks and Systems, vol 1596. Springer, Singapore

Research on an automated pipeline for constructing a Vietnamese Visual Question Answering with Natural Language Explanations (VQA-NLE) dataset. Co-authored with Truong-Binh Duong, Binh-Nam Le-Nguyen, and Dinh-Thang Duong.

Programming Languages	Python, C
Machine Learning & Deep Learning	PyTorch, TensorFlow, Scikit-learn, Pandas, NumPy
AI Frameworks & Tools	LangChain, LangGraph, Hugging Face, OpenCV, PaddleOCR
Databases & Vector Stores	PostgreSQL, Microsoft SQL Server, Milvus, Qdrant
Cloud & DevOps	Google Cloud Platform (GCP), Docker, Kubernetes (K8s), Terraform, Git
Monitoring & Observability	Prometheus, Grafana, Loki, Jaeger, Evidently

Vietnamese	Native speaker
English	Professional working proficiency (IELTS 6.0 - B2)

Artificial Intelligence	Large Language Models (LLMs), Vision-Language Models (VLMs), Multimodal Learning, Retrieval-Augmented Generation (RAG), Computer Vision, Natural Language Processing

- Present
IntelliRAG System

Architected and deployed a cloud-native RAG platform on GKE utilizing FastAPI, Kubernetes, and GPU-accelerated inference (vLLM/KServe with autoscaling), achieving high-throughput document processing and low-latency query response.
- Engineered semantic search infrastructure with Qdrant vector database and automated document ingestion pipeline using LangChain, Docling, and Sentence Transformers
- Established a production MLOps workflow with MLflow/DVC for versioning and comprehensive observability via Prometheus/Grafana/Jaeger/Evidently
- Deployed infrastructure as code using Terraform and Helm on Google Kubernetes Engine (GKE)
2024.09 - 2024.12
Action Retrieval from Building CCTV Footage

Researched, tested, and evaluated text-video retrieval models (CLIP4Clip, Frozen in Time, InternVideo) using retrieval metrics (Recall@k, Precision@k) to select and propose the most effective model for the action retrieval system.
- Evaluated multiple text-video retrieval models with performance metrics (Recall@k, Precision@k)
- Collaborated on building and managing data pipelines to ingest vector pairs (video segment - video caption) into Milvus, optimizing for low-latency video retrieval
- Present
Vietnamese Text Recognition

Designed and built an end-to-end Vietnamese OCR system with a Tkinter-based GUI, integrating fine-tuned PaddleOCR models.
- Conducted comparative research and experimentation to select optimal models for text detection and recognition on a proprietary Vietnamese text dataset
- Improved text recognition accuracy by 10% through data augmentation, including synthetic images with diverse fonts and contexts tailored to text recognition on advertising signs and product packaging

Research Areas

Spatial Intelligence

Physical AI

Embodied AI

Production AI Systems