CV


Basics

Name Tran Hoang Minh
Label AI Research Engineer
Email minhhtran.work@gmail.com
Phone +84-967-628-624
Url https://minhhoang2705.github.io
Summary Artificial Intelligence graduate (B.Sc.) with practical experience in Computer Vision (CV) and Natural Language Processing (NLP), including projects with Large Language Models (LLMs) and multimodal ML. Proficient in Python, PyTorch, and TensorFlow, and adept at developing and optimizing deep learning models. Applies strong analytical and problem-solving skills to build AI solutions that deliver measurable impact in collaborative team settings.

Work

  • 2025.02 - 2025.08

    Ho Chi Minh City, Vietnam

    AI Engineer
    Rainscales
    Developed face authentication systems for staff check-in/check-out, focusing on robust time-tracking solutions and deployment on company infrastructure.
    • Developed a lightweight proof-of-concept (PoC) face authentication system for staff check-in/check-out, addressing client requirements for robust time-tracking and improving attendance accuracy by 10%
    • Developed and deployed a pipeline to authenticate faces captured at terminals, combining data processing (detection, alignment, image quality enhancement), facial feature extraction with DeepFace and InsightFace models, and similarity search in Milvus for efficient querying of large databases
    • Developed API endpoints for model inference, enabling deployment on company infrastructure
  • 2023.09 - 2023.12

    Binh Dinh, Vietnam

    AI Engineer Intern
    Quy Nhon AI Center
    Designed and optimized deep learning models for OCR, focusing on Vietnamese text recognition and data augmentation techniques.
    • Designed and optimized deep learning models with PyTorch, achieving a 15% increase in accuracy and efficiency
    • Leveraged data augmentation techniques for Vietnamese text, including light rotation, brightness adjustment, and random noise addition, to refine the PaddleOCR model and enhance input image recognition accuracy
    • Collaborated on an OCR project, focusing on data labeling (for text detection and text recognition), validation protocols, and rigorous model testing to ensure robustness
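
The three augmentations named in the Quy Nhon AI Center entry above (light rotation, brightness adjustment, random noise addition) can be illustrated with a minimal NumPy sketch. This is not the code used in the project: the function names, the ±5 degree rotation range, the 0.8 to 1.2 brightness range, and the noise level are all assumptions chosen for illustration.

```python
import numpy as np

def rotate_nearest(img, angle_deg, fill=255):
    """Rotate about the image centre using nearest-neighbour inverse mapping."""
    h, w = img.shape[:2]
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    x0, y0 = xs - cx, ys - cy
    # For every output pixel, find the source pixel it comes from.
    src_x = np.rint(np.cos(theta) * x0 + np.sin(theta) * y0 + cx).astype(int)
    src_y = np.rint(-np.sin(theta) * x0 + np.cos(theta) * y0 + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.full_like(img, fill)  # pixels rotated in from outside get `fill`
    out[valid] = img[src_y[valid], src_x[valid]]
    return out

def adjust_brightness(img, factor):
    """Scale pixel intensities and clip back to the uint8 range."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def augment(img, rng):
    """Apply the three augmentations with assumed (hypothetical) ranges."""
    img = rotate_nearest(img, rng.uniform(-5.0, 5.0))
    img = adjust_brightness(img, rng.uniform(0.8, 1.2))
    return add_gaussian_noise(img, sigma=8.0, rng=rng)

# Usage: augment a blank grayscale text-line image of height 32 and width 96.
rng = np.random.default_rng(0)
sample = np.full((32, 96), 200, dtype=np.uint8)
augmented = augment(sample, rng)
```

Keeping the rotation small matters for text lines, since larger angles can push characters out of the crop.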

Education

  • 2022.01 - 2025.01

    Vietnam

    Bachelor of Science
    FPT University
    Artificial Intelligence
    • Python Programming
    • Data Structures & Algorithms
    • Database Systems
    • Machine Learning
    • Deep Learning
    • Computer Vision
    • Natural Language Processing
    • Data Science

Publications

  • 2025.11.01
    An Automated Pipeline for Constructing a Vietnamese VQA-NLE Dataset
    Proceedings of the Fifth International Conference on Intelligent Systems and Networks (ICISN 2025). Lecture Notes in Networks and Systems, vol 1596. Springer, Singapore
    Research on an automated pipeline for constructing a Vietnamese Visual Question Answering with Natural Language Explanations (VQA-NLE) dataset. Co-authored with Truong-Binh Duong, Binh-Nam Le-Nguyen, and Dinh-Thang Duong.

Skills

Programming Languages
Python
C
Machine Learning & Deep Learning
PyTorch
TensorFlow
Scikit-learn
Pandas
NumPy
AI Frameworks & Tools
LangChain
LangGraph
Hugging Face
OpenCV
PaddleOCR
Databases & Vector Stores
PostgreSQL
Microsoft SQL Server
Milvus
Qdrant
Cloud & DevOps
Google Cloud Platform (GCP)
Docker
Kubernetes (K8s)
Terraform
Git
Monitoring & Observability
Prometheus
Grafana
Loki
Jaeger
Evidently

Languages

Vietnamese
Native speaker
English
Professional working proficiency (IELTS 6.0 - B2)

Interests

Artificial Intelligence
Large Language Models (LLMs)
Vision-Language Models (VLMs)
Multimodal Learning
Retrieval-Augmented Generation (RAG)
Computer Vision
Natural Language Processing
Research Areas
Spatial Intelligence
Physical AI
Embodied AI
MLOps
Production AI Systems

Projects

  • - Present
    IntelliRAG System
    Architected and deployed a cloud-native RAG platform on GKE utilizing FastAPI, Kubernetes, and GPU-accelerated inference (vLLM/KServe with autoscaling), achieving high-throughput document processing and low-latency query response.
    • Engineered semantic search infrastructure with Qdrant vector database and automated document ingestion pipeline using LangChain, Docling, and Sentence Transformers
    • Established a production MLOps workflow with MLflow/DVC for versioning and comprehensive observability via Prometheus/Grafana/Jaeger/Evidently
    • Implemented infrastructure-as-code deployment using Terraform and Helm on Google Kubernetes Engine (GKE)
  • 2024.09 - 2024.12
    Action Retrieval from Building CCTV Footage
    Researched, tested, and evaluated text-video retrieval models (CLIP4Clip, Frozen in Time, InternVideo) using standard retrieval metrics (Recall@k, Precision@k) to select and propose the most effective model for the action retrieval system.
    • Evaluated multiple text-video retrieval models with performance metrics (Recall@k, Precision@k)
    • Collaborated on building and managing data pipelines that ingest vector pairs (video segment, video caption) into Milvus, optimizing for low-latency video retrieval
  • - Present
    Vietnamese Text Recognition
    Designed and built an end-to-end Vietnamese OCR system with a Tkinter-based GUI, integrating fine-tuned PaddleOCR models.
    • Conducted comparative research and experimentation to select optimal models for text detection and recognition on a proprietary Vietnamese text dataset
    • Improved text recognition accuracy by 10% through data augmentation, including synthetic images with diverse fonts and contexts tailored to text recognition on advertising signs and product packaging
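
For reference, the Recall@k and Precision@k metrics named in the Action Retrieval project above can be computed as in the following minimal sketch. The function names and the toy video IDs are hypothetical, and the sketch uses the general set-based definitions; text-video benchmarks with a single ground-truth clip per query often report Recall@k as a hit rate instead.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved items that are relevant."""
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / k

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant items that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / len(relevant_ids)

# Toy example: one text query, four retrieved video segments, two relevant.
ranked = ["v3", "v1", "v7", "v2"]
relevant = {"v1", "v2"}
print(precision_at_k(ranked, relevant, k=2))  # 0.5
print(recall_at_k(ranked, relevant, k=2))     # 0.5
print(recall_at_k(ranked, relevant, k=4))     # 1.0
```

In practice these values are averaged over all queries in the evaluation set before models are compared.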