Hoang-Minh Tran

Ho Chi Minh City, Vietnam

I'm Tran Hoang Minh, a recent graduate and freelance AI Engineer specializing in building end-to-end, production-ready AI systems. I am currently teaching myself to move from Computer Vision into Vision-Language Models (VLMs) and Vision-Language-Action (VLA) systems, using the flexibility of freelance work to deepen my research expertise while continuing to ship production solutions.

What I do best: transforming research-stage models into reliable, scalable AI services. I design model architectures, build MLOps pipelines, automate data flows, and implement monitoring for production stability. My tech stack covers Python, PyTorch, TensorFlow, YOLO, Hugging Face Transformers, LangChain, CI/CD with GitHub Actions, DVC/MLflow, and infrastructure with Terraform. I use AI coding agents to accelerate solution design and system orchestration, which shortens the turnaround from prototype to deployment.

Open to Opportunities: I'm seeking remote, hybrid, or on-site roles. I'm based in Ho Chi Minh City, Vietnam, and open to relocating to Hanoi. I'm flexible on the balance between research-focused and production-focused responsibilities, depending on the company's needs.

Background

B.Sc. in Artificial Intelligence — FPT University (2025)

Recent graduate with hands-on experience in computer vision and large language models, combining academic foundations with practical production deployment skills.

Work Experience

Freelance AI Engineer — Present
Building production AI systems and personal research projects focusing on VLM/VLA and embodied AI.

Junior AI Engineer — Rainscales (Feb 2025 – Sep 2025)
Developed and deployed AI solutions for production environments.

AI Engineer Intern — FPT Software, Quy Nhon (Sep 2023 – Dec 2023)
4-month internship working on AI/ML projects and gaining industry experience.

Current Focus

My research interests span production-scale multimodal AI and embodied intelligence systems:

Vision-Language Models (VLMs) & Vision-Language-Action (VLA) – Developing multimodal systems that combine visual understanding with language reasoning and actionable outputs for robotic and embodied AI applications
Physical AI & Robotics – Building AI systems with spatial reasoning, perception, and control capabilities for real-world embodied applications, with a focus on humanoid robotics infrastructure
Production AI Systems – Architecting end-to-end AI pipelines including multimodal serving, RAG systems, vector database optimization, and GPU-accelerated inference on production infrastructure
Infrastructure & Deployment – Designing robust MLOps systems for production workloads, including containerization, orchestration, monitoring, and cost-optimized compute management

Technical Expertise

Programming Languages

Python (Proficient) • C (Basic)

AI/ML Frameworks

PyTorch • TensorFlow • YOLO • Hugging Face Transformers • Scikit-learn • LangChain • LangGraph • Pandas • NumPy • OpenCV

MLOps & DevOps

Docker • Kubernetes (GKE) • Terraform • GitHub Actions • Prometheus • Grafana • Loki • MLflow • DVC • Helm • KServe • vLLM

Databases & Vector Stores

PostgreSQL • Microsoft SQL Server • Milvus • Qdrant

Cloud Platforms

Google Cloud Platform (GKE, GCS, Compute Engine) • NGINX • FastAPI

Featured Projects

IntelliRAG System

Architected a cloud-native RAG platform on GKE with FastAPI, Kubernetes, and GPU-accelerated inference (vLLM/KServe). The platform achieves high-throughput document processing and low-latency query response using the Qdrant vector database, LangChain, and Sentence Transformers, with MLOps monitoring via Prometheus, Grafana, and Evidently.

Action Retrieval from CCTV Footage

Evaluated text-video retrieval models (CLIP4Clip, Frozen in Time, InternVideo) using Recall@k and Precision@k metrics to optimize an action retrieval system. Built data pipelines with PyTorch and OpenCV to ingest video-caption pairs into Milvus for low-latency retrieval.

Vietnamese Text Recognition

Built an end-to-end Vietnamese OCR system with fine-tuned PaddleOCR models and a Tkinter GUI, improving text recognition accuracy by 10% through synthetic data generation with diverse fonts optimized for advertising plates and product packaging.

Research and Publications

ICISN
An Automated Pipeline for Constructing a Vietnamese VQA-NLE Dataset

Truong-Binh Duong, Hoang-Minh Tran, Binh-Nam Le-Nguyen, and Dinh-Thang Duong

In Proceedings of the Fifth International Conference on Intelligent Systems and Networks, 2026

This paper presents an automated pipeline for constructing a Vietnamese Visual Question Answering with Natural Language Explanations (VQA-NLE) dataset. The pipeline addresses the challenge of creating high-quality multimodal datasets for low-resource languages like Vietnamese, combining visual understanding with natural language reasoning.
@inproceedings{duong2026vivqax,
  author = {Duong, Truong-Binh and Tran, Hoang-Minh and Le-Nguyen, Binh-Nam and Duong, Dinh-Thang},
  title = {An Automated Pipeline for Constructing a Vietnamese VQA-NLE Dataset},
  booktitle = {Proceedings of the Fifth International Conference on Intelligent Systems and Networks},
  series = {Lecture Notes in Networks and Systems},
  year = {2026},
  publisher = {Springer Nature Singapore},
  pages = {164--173},
  isbn = {978-981-95-1746-6},
  doi = {10.1007/978-981-95-1746-6_18},
}