Hoang-Minh Tran
Ho Chi Minh City, Vietnam
I’m Tran Hoang Minh, a recent graduate and freelance AI Engineer specializing in building end-to-end, production-ready AI systems. I am currently teaching myself to move from Computer Vision into Vision-Language Models (VLMs) and Vision-Language-Action (VLA) systems, using the flexibility of freelance work to deepen my research expertise while continuing to ship production solutions.
What I do best: Transforming research-stage models into reliable, scalable AI services. I design model architectures, build MLOps pipelines, automate data flows, and implement monitoring for production stability. My tech stack covers Python, PyTorch, TensorFlow, YOLO, HuggingFace Transformers, LangChain, CI/CD with GitHub Actions, DVC/MLflow, and infrastructure with Terraform. I use AI coding agents to accelerate solution design and system orchestration, shortening the path from prototype to deployment.
Background
B.Sc. in Artificial Intelligence — FPT University (2025)
Recent graduate with hands-on experience in computer vision and large language models, combining academic foundations with practical production deployment skills.
Work Experience
Freelance AI Engineer — Present
Building production AI systems and personal research projects focusing on VLM/VLA and embodied AI.
Junior AI Engineer — Rainscales (Feb 2025 – Sep 2025)
Developed and deployed AI solutions for production environments.
AI Engineer Intern — FPT Software, Quy Nhon (Sep 2023 – Dec 2023)
4-month internship working on AI/ML projects and gaining industry experience.
Current Focus
My research interests span production-scale multimodal AI and embodied intelligence systems:
- Vision-Language Models (VLMs) & Vision-Language-Action (VLA) – Developing multimodal systems that combine visual understanding with language reasoning and actionable outputs for robotic and embodied AI applications
- Physical AI & Robotics – Building AI systems with spatial reasoning, perception, and control capabilities for real-world embodied applications, with a focus on humanoid robotics infrastructure
- Production AI Systems – Architecting end-to-end AI pipelines including multimodal serving, RAG systems, vector database optimization, and GPU-accelerated inference on production infrastructure
- Infrastructure & Deployment – Designing robust MLOps systems for production workloads, including containerization, orchestration, monitoring, and cost-optimized compute management
Technical Expertise
Programming Languages
Python (Proficient) • C (Basic)
AI/ML Frameworks
PyTorch • TensorFlow • YOLO • HuggingFace Transformers • Scikit-learn • LangChain • LangGraph • Pandas • NumPy • OpenCV
MLOps & DevOps
Docker • Kubernetes (GKE) • Terraform • GitHub Actions • Prometheus • Grafana • Loki • MLflow • DVC • Helm • KServe • vLLM
Databases & Vector Stores
PostgreSQL • Microsoft SQL Server • Milvus • Qdrant
Cloud Platforms
Google Cloud Platform (GKE, GCS, Compute Engine) • NGINX • FastAPI
Featured Projects
IntelliRAG System
Architected a cloud-native RAG platform on GKE with FastAPI, Kubernetes, and GPU-accelerated inference (vLLM/KServe). Built on the Qdrant vector database, LangChain, and Sentence Transformers for high-throughput document processing and low-latency query response, with MLOps monitoring via Prometheus, Grafana, and Evidently.
Action Retrieval from CCTV Footage
Evaluated text-video retrieval models (CLIP4Clip, Frozen in Time, InternVideo) using Recall@k and Precision@k metrics to optimize an action retrieval system. Built data pipelines with PyTorch and OpenCV to ingest video-caption pairs into Milvus for low-latency retrieval.
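For readers unfamiliar with the metrics named above, here is a minimal sketch of per-query Precision@k and Recall@k under their common definitions. It is an illustration only, not the project's evaluation code; the function names and clip IDs are hypothetical, and it assumes each ranked list contains no duplicate IDs.

```python
from typing import Sequence, Set


def _hits_in_top_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> int:
    """Count how many of the first k ranked items are relevant."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return sum(1 for item in ranked_ids[:k] if item in relevant_ids)


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved items that are relevant."""
    return _hits_in_top_k(ranked_ids, relevant_ids, k) / k


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top-k results."""
    if not relevant_ids:
        return 0.0  # assumption: a query with no relevant items scores 0
    return _hits_in_top_k(ranked_ids, relevant_ids, k) / len(relevant_ids)


# Hypothetical example: one text query, four retrieved clips, two relevant clips.
ranked = ["clip_07", "clip_02", "clip_11", "clip_05"]
relevant = {"clip_02", "clip_05"}

p_at_2 = precision_at_k(ranked, relevant, k=2)  # 1 hit in the top 2
r_at_2 = recall_at_k(ranked, relevant, k=2)     # 1 of 2 relevant clips found
r_at_4 = recall_at_k(ranked, relevant, k=4)     # both relevant clips found
```

When each query has exactly one ground-truth clip, `recall_at_k` is either 0 or 1 per query, and averaging it over all queries gives the R@k figure usually reported for text-video retrieval.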
Vietnamese Text Recognition
Built an end-to-end Vietnamese OCR system with fine-tuned PaddleOCR models and a Tkinter GUI, improving text recognition accuracy by 10% through synthetic data generation with diverse fonts optimized for advertising plates and product packaging.