Anurag (edwixx)
14 followers · 53 following
https://anuragkanade.com/
edwixxxx
anurag12-webster
anurag-kanade
AI & ML interests
Machine Learning and Speech
Recent Activity
New activity about 9 hours ago in huggingface/InferenceSupport: edwixx/whisper-large-hebrew-finetune
Reacted to sagar007's post with 🤗 about 10 hours ago:
I built a Multimodal Vision-Language Model using Gemma-270M + CLIP! Just finished training my multimodal model on the full LLaVA-Instruct-150K dataset (157K samples) and wanted to share the results!

🧠 What I Built: A vision-language model that can understand images and answer questions about them, combining:
- Google Gemma-3-270M (language)
- OpenAI CLIP ViT-Large/14 (vision)
- LoRA fine-tuning for efficiency

Training Stats:
- 157,712 training samples (full LLaVA dataset)
- 3 epochs on A100 40GB
- ~9 hours training time
- Final loss: 1.333 training / 1.430 validation
- Only 18.6M trainable params (3.4% of 539M total)

https://huggingface.co/sagar007/multigemma

Benchmark Results:
- VQA Accuracy: 53.8%
- Works great for: animal detection, room identification, scene understanding

**Try it yourself:**
- 🤗 Model: https://huggingface.co/sagar007/multigemma
- 🎮 Demo: https://huggingface.co/spaces/sagar007/Multimodal-Gemma
- 💻 GitHub: https://github.com/sagar431/multimodal-gemma-270m

Built with PyTorch Lightning + MLflow for experiment tracking. Full MLOps pipeline with CI/CD! Would love to hear your feedback!

#multimodal #gemma #clip #llava #vision-language #pytorch
Reacted to sagar007's post with 🔥 about 10 hours ago (the same post as above).
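The post above describes the standard LLaVA-style recipe: a frozen CLIP vision encoder, a projection layer into the language model's embedding space, and LoRA adapters on the language model. Here is a minimal sketch of that wiring; the projector shape, LoRA rank, and target modules are illustrative assumptions, not sagar007's actual configuration, and loading `google/gemma-3-270m` requires a recent transformers release and gated-model access.

```python
# Minimal LLaVA-style sketch: project CLIP image features into the
# language model's embedding space, then fine-tune with LoRA.
# Model names follow the post; the exact wiring is an assumption.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, CLIPVisionModel
from peft import LoraConfig, get_peft_model

# Frozen vision tower and small language model named in the post.
vision = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
language = AutoModelForCausalLM.from_pretrained("google/gemma-3-270m")
vision.requires_grad_(False)

# Linear projector mapping CLIP patch features (1024-d for ViT-L/14)
# into Gemma's token-embedding space.
projector = nn.Linear(vision.config.hidden_size, language.config.hidden_size)

# LoRA on the attention projections; rank and targets are guesses here.
lora = LoraConfig(r=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
language = get_peft_model(language, lora)

def vlm_forward(pixel_values: torch.Tensor, text_embeds: torch.Tensor):
    """Prepend projected image patches to the text embeddings, then run the LM."""
    patches = vision(pixel_values=pixel_values).last_hidden_state
    image_embeds = projector(patches)
    inputs = torch.cat([image_embeds, text_embeds], dim=1)
    return language(inputs_embeds=inputs)
```

In a setup like this only the projector and the LoRA weights train, which is how the trainable-parameter count stays at a few percent of the total, consistent with the post's 18.6M-of-539M figure.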
edwixx's models (8)
Sort: Recently updated

edwixx/whisper-large-hebrew-finetune • Updated 5 days ago • 22
edwixx/hf-loras-113 • Updated 7 days ago
edwixx/my-test-lora-123 • Updated 7 days ago
edwixx/miraTTS-hindi • Text Generation • 0.5B • Updated 14 days ago • 17
edwixx/karaoke_songs_long • Updated Nov 22, 2025
edwixx/qwen3-8b-triton-finetune • 8B • Updated Oct 29, 2025 • 4
edwixx/fish_speech_hindi_lora • Updated Oct 10, 2025
edwixx/gemma-2b-mt-G2E • Text Generation • 3B • Updated Oct 18, 2024 • 3
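For trying one of these checkpoints, a sketch of loading the Whisper Hebrew fine-tune listed above is below; it assumes the repo is a standard Whisper checkpoint usable with the transformers ASR pipeline (the audio path is a placeholder, and language/task handling may differ for this particular model).

```python
# Hypothetical usage sketch for edwixx/whisper-large-hebrew-finetune,
# assuming a standard Whisper checkpoint layout.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="edwixx/whisper-large-hebrew-finetune",
)

# Transcribe a local audio file (placeholder path).
result = asr("sample_hebrew.wav")
print(result["text"])
```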