About
Hi! I’m Kiana, an AI researcher and developer working on computer vision, visual saliency, and generative models. I hold a B.Sc. from IUST and recently defended my M.Sc. thesis at the University of Tehran. I collaborate with the Machine Learning & Computational Modeling Lab and build real-world AI systems at AVIR AI. I’m always happy to discuss research ideas and collaborations, so feel free to reach out.
Publications
Brand visibility in packaging: a deep learning approach for logo detection, saliency-map prediction, and logo placement analysis (Discover Applied Sciences, 2025)
Hybrid Retrieval-Augmented Generation Approach for LLMs Query Response Enhancement (ICWR 2024)
Eye-Tracking Based Control of a Robotic Arm and Wheelchair for People with Severe Speech and Motor Impairment (SSMI) (ICRoM 2023)
Projects
OCR (English) - Prompt Engineering, Labeling & Extraction
Built a fast labeling workflow for handwritten and typed English documents, using prompt engineering to cut annotation time and improve accuracy. It includes template-aware prompts for forms and smart span suggestions, and exports clean JSON for downstream tasks. Report
OCR (Farsi) - Data Collection, Annotation & Model Fine-Tuning
Curated a diverse Persian (Farsi) dataset covering handwritten and typed pages (receipts, forms, notes), defined labeling guidelines, and fine-tuned OCR models for real-world fonts, cursive handwriting, and noisy scans. The pipeline spans dataset prep, annotation, normalization, and evaluation. Demo
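Normalization matters in Persian OCR because the same visible letter can be stored as different Unicode code points, which makes predictions and labels disagree for no real reason. The sketch below is a minimal, hypothetical normalizer, not the project's actual pipeline: it maps Arabic yeh, alef maksura, and kaf to their Persian forms, converts Persian and Arabic-Indic digits to ASCII, and collapses whitespace. The name `normalize_fa` and the exact rule set are assumptions.

```python
import re

# Hypothetical rule set: Arabic-script variants -> canonical Persian letters.
_CHAR_MAP = {
    "\u064A": "\u06CC",  # Arabic yeh -> Persian yeh
    "\u0649": "\u06CC",  # Alef maksura -> Persian yeh
    "\u0643": "\u06A9",  # Arabic kaf -> Persian keheh
}
# Persian (U+06F0..U+06F9) and Arabic-Indic (U+0660..U+0669) digits -> ASCII digits.
for i in range(10):
    _CHAR_MAP[chr(0x06F0 + i)] = str(i)
    _CHAR_MAP[chr(0x0660 + i)] = str(i)

_FA_TABLE = str.maketrans(_CHAR_MAP)


def normalize_fa(text: str) -> str:
    """Canonicalize characters and whitespace so OCR output and labels compare equal."""
    text = text.translate(_FA_TABLE)
    # Collapse runs of spaces, tabs, and newlines into one space.
    # The zero-width non-joiner (U+200C) is not whitespace, so it is preserved.
    return re.sub(r"\s+", " ", text).strip()
```

Running both the ground-truth labels and the model output through the same function before computing character error rate keeps the metric focused on recognition mistakes rather than encoding differences.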
SmartEYE Ads - Predicting Brand Attention with Eye-Tracking
Built AI pipelines for gaze estimation, saliency prediction, and brand-attention scoring on video ads, enabling frame-level insights, A/B tests, and faster creative iteration. Demo
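One simple way to turn a saliency map into a brand-attention number is to measure what share of the predicted attention falls inside the logo region. The sketch below assumes a per-frame saliency map stored as a 2-D NumPy array and an axis-aligned logo box `(x0, y0, x1, y1)` in pixel coordinates; the scoring rule and the function names are illustrative assumptions, not the project's actual metric.

```python
import numpy as np


def brand_attention(saliency: np.ndarray, box: tuple[int, int, int, int]) -> float:
    """Fraction of total saliency mass inside the logo box (0.0 to 1.0)."""
    x0, y0, x1, y1 = box
    total = float(saliency.sum())
    if total <= 0.0:
        return 0.0  # blank map: no attention to attribute
    # Rows index y, columns index x; slice end points are exclusive.
    inside = float(saliency[y0:y1, x0:x1].sum())
    return inside / total


def video_brand_attention(frames, boxes) -> list[float]:
    """Frame-level scores: one saliency map and one logo box per frame."""
    return [brand_attention(s, b) for s, b in zip(frames, boxes)]
```

For an A/B test, the per-frame lists from two ad variants can be averaged (or compared over the seconds where the logo is on screen) to see which cut draws more predicted attention to the brand.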
GazeLab - In-the-Wild Eye-Tracking Data Collection
Designed and ran billboard and video-ad eye-tracking studies with high-precision trackers, using standardized calibration and unified task scripts, and produced ready-to-use dataset exports (gaze points, fixations, AOIs, heatmaps) for large-scale analysis. Docs
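Fixations in exports like these are commonly derived from raw gaze samples with a dispersion-threshold (I-DT) algorithm. The sketch below is a minimal I-DT implementation, not the study's actual code; the thresholds are assumptions: `max_dispersion` is in the same units as the gaze coordinates, and `min_samples` stands in for a minimum fixation duration (for example, 6 samples is 100 ms at an assumed 60 Hz sampling rate).

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    x: float    # centroid x
    y: float    # centroid y
    start: int  # index of the first gaze sample
    end: int    # index of the last gaze sample (inclusive)


def _dispersion(points) -> float:
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def idt_fixations(gaze, max_dispersion: float = 30.0, min_samples: int = 6):
    """Dispersion-threshold (I-DT) fixation detection over a list of (x, y) samples."""
    fixations = []
    i, n = 0, len(gaze)
    while i + min_samples <= n:
        j = i + min_samples
        if _dispersion(gaze[i:j]) <= max_dispersion:
            # Grow the window while the samples stay within the dispersion threshold.
            while j < n and _dispersion(gaze[i:j + 1]) <= max_dispersion:
                j += 1
            window = gaze[i:j]
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append(Fixation(cx, cy, i, j - 1))
            i = j  # resume after the detected fixation
        else:
            i += 1  # no fixation here: slide the window forward by one sample
    return fixations
```

Samples that never fall inside a fixation window are treated as saccades or noise; AOI hit counts and heatmaps can then be computed from the fixation centroids.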
AutoCreator - AI-Powered YouTube Content Generator
Automatically transforms PDF slides into narrated videos: extracts slide text, generates natural TTS in multiple languages and voices, and syncs narration to slide order and timing. Supports adding an intro clip after slide 1 and compositing a talking avatar with chroma-key (green-screen) removal for presenter overlays. GitHub
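Green-screen removal comes down to building a per-pixel mask of "green enough" pixels and filling them from the slide background. The sketch below assumes RGB frames as `uint8` NumPy arrays of the same shape and uses a simple dominance rule (green exceeds both red and blue by a margin); the rule, the `margin` default, and the function name are illustrative assumptions rather than AutoCreator's actual implementation, which might instead use HSV thresholds or a soft alpha matte.

```python
import numpy as np


def chroma_key_composite(avatar: np.ndarray, background: np.ndarray, margin: int = 40) -> np.ndarray:
    """Overlay an avatar frame on a background, replacing green-screen pixels.

    Both inputs are HxWx3 uint8 RGB arrays with the same shape.
    """
    if avatar.shape != background.shape:
        raise ValueError("avatar and background must have the same shape")
    # Use a signed type so channel differences don't wrap around in uint8.
    rgb = avatar.astype(np.int16)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # A pixel counts as screen when green clearly dominates red and blue.
    is_screen = (g - r > margin) & (g - b > margin)
    out = avatar.copy()
    out[is_screen] = background[is_screen]
    return out
```

A hard mask like this tends to leave green fringes around hair and edges; blurring the mask into a soft alpha and blending is the usual next step.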
Custom-Tuned VLMs & LLMs
LoRA/PEFT fine-tuning combined with retrieval for domain-specific question answering and for visual question answering over images and documents. Notes
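LoRA keeps a pretrained weight matrix W (shape d×k) frozen and learns a low-rank update, so the adapted layer computes h = Wx + (α/r)·BAx, with A of shape r×k, B of shape d×r, and r much smaller than d and k. The NumPy sketch below shows only that forward pass and the parameter savings for illustrative sizes; it is not the fine-tuning setup used in these projects, which would normally rely on a library such as Hugging Face PEFT.

```python
import numpy as np


def lora_forward(x, W, A, B, alpha: float):
    """LoRA layer: frozen W plus a scaled low-rank update B @ A.

    W: (d, k) frozen; A: (r, k) and B: (d, r) trainable; x: (k,) or (k, n).
    """
    r = A.shape[0]
    return W @ x + (alpha / r) * (B @ (A @ x))


d, k, r = 1024, 1024, 8  # illustrative layer size and rank
rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))
A = rng.standard_normal((r, k)) * 0.01  # small random init
B = np.zeros((d, r))                    # zero init, so training starts from the frozen model
x = rng.standard_normal(k)

full_params = d * k        # trainable parameters if W itself were fine-tuned
lora_params = r * (d + k)  # trainable parameters in A and B
print(full_params, lora_params)  # 1048576 16384
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly, and here only about 1.6% as many parameters are trained.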
Generative Ad Creative Studio
Generates ad images and variants from text prompts, with brand colors and logos, safety filters, and quick comparison grids. Preview
Multimodal Shopping Assistant
Multilingual shopping assistant (FastAPI): understands Persian/English queries, supports chat + visual search, reads labels/receipts, extracts & normalizes product entities, classifies images, compares prices, and recommends budget-aware alternatives. Demo
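Comparing prices across Persian and English sources first requires turning strings such as "۱۲٬۵۰۰" or "12,500" into numbers, after which budget-aware suggestions can be a filter plus a ranking. The sketch below is hypothetical throughout: `parse_price`, `Product`, the `similarity` score (assumed to come from some upstream text or image matcher), and the ranking rule are illustrative assumptions, and it assumes integer prices with no decimal part.

```python
from dataclasses import dataclass

# Persian (U+06F0..) and Arabic-Indic (U+0660..) digits -> ASCII digits.
_DIGITS = {chr(0x06F0 + i): str(i) for i in range(10)}
_DIGITS.update({chr(0x0660 + i): str(i) for i in range(10)})
# Thousands separators to delete: ASCII comma, Arabic comma, Arabic thousands separator.
_SEPARATORS = {",": "", "\u060C": "", "\u066C": ""}
_PRICE_TABLE = str.maketrans({**_DIGITS, **_SEPARATORS})


def parse_price(text: str) -> int:
    """Turn an integer price string in Persian or ASCII digits into an int."""
    digits = "".join(ch for ch in text.translate(_PRICE_TABLE) if ch in "0123456789")
    if not digits:
        raise ValueError(f"no price found in {text!r}")
    return int(digits)


@dataclass
class Product:
    name: str
    price: int
    similarity: float  # 0..1: how close the item is to what the user asked for


def budget_alternatives(items, budget: int, top_k: int = 3):
    """Items within budget, most similar first; ties go to the cheaper item."""
    affordable = [p for p in items if p.price <= budget]
    return sorted(affordable, key=lambda p: (-p.similarity, p.price))[:top_k]
```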
Logo & Logotype Generation
Diffusion-based logo/wordmark creation with style prompts, negative prompts, and vector export. Gallery
News
- Jul 2025 — DTFSal accepted to BMVC 2025.
- May 2025 — Eye-Tracking Brand Visibility in Packaging accepted to Discover Applied Sciences.
- Feb 2025 — Defended Master’s thesis at the University of Tehran.
- May 2024 — Hybrid RAG Approach for LLMs accepted to ICWR 2024.
- Dec 2023 — Eye-Tracking Based Control of a Robotic Arm and Wheelchair accepted to ICRoM 2023.