Sagar pallai PRO
AI & ML interests
LLMs and Stable Diffusion
Recent Activity
Replied to their post · about 4 hours ago
I built a Multimodal Vision-Language Model using Gemma-270M + CLIP!
Just finished training my multimodal model on the full LLaVA-Instruct-150K dataset (157K samples) and wanted to share the results!
What I Built:
A vision-language model that can understand images and answer questions about them, combining the pieces below (a rough wiring sketch follows the list):
- Google Gemma-3-270M (language)
- OpenAI CLIP ViT-Large/14 (vision)
- LoRA fine-tuning for efficiency
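Conceptually the wiring is a frozen CLIP vision tower, a small trainable projector into Gemma's embedding space, and LoRA adapters on the language model. Here's a minimal sketch of that idea; the checkpoint IDs, LoRA targets/rank, and the single-linear projector are my assumptions, not the repo's exact code:

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, CLIPVisionModel
from peft import LoraConfig, get_peft_model

class VisionLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Frozen CLIP ViT-Large/14 vision tower (checkpoint ID assumed)
        self.vision = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
        self.vision.requires_grad_(False)
        # Gemma language model with LoRA adapters (targets/rank assumed)
        lm = AutoModelForCausalLM.from_pretrained("google/gemma-3-270m")
        lora = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32,
                          target_modules=["q_proj", "v_proj"])
        self.lm = get_peft_model(lm, lora)
        # Trainable projector from CLIP features into Gemma's embedding space
        self.proj = nn.Linear(self.vision.config.hidden_size, lm.config.hidden_size)

    def forward(self, pixel_values, input_ids, labels=None):
        img = self.vision(pixel_values).last_hidden_state   # (B, 257, 1024)
        img = self.proj(img)                                 # (B, 257, d_model)
        txt = self.lm.get_input_embeddings()(input_ids)      # (B, T, d_model)
        embeds = torch.cat([img, txt], dim=1)                # visual tokens as prefix
        if labels is not None:
            # Mask the visual-prefix positions out of the LM loss
            pad = torch.full(img.shape[:2], -100,
                             dtype=labels.dtype, device=labels.device)
            labels = torch.cat([pad, labels], dim=1)
        return self.lm(inputs_embeds=embeds, labels=labels)
```

Prefixing projected image tokens to the text embeddings, with their loss masked out, is the standard LLaVA-style recipe the training data implies.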
Training Stats:
- 157,712 training samples (full LLaVA dataset)
- 3 epochs on A100 40GB
- ~9 hours training time
- Final loss: 1.333 training / 1.430 validation
- Only 18.6M trainable params (3.4% of 539M total; a quick way to check this is sketched below)
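For reference, the trainable fraction above is easy to verify on any torch module with a small helper (a generic sketch, not from the repo):

```python
# Generic trainable-parameter check; prints e.g.
# "18.6M trainable / 539M total (3.4%)" for a matching model.
def count_params(model):
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"{trainable / 1e6:.1f}M trainable / {total / 1e6:.0f}M total "
          f"({100 * trainable / total:.1f}%)")
```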
https://huggingface.co/sagar007/multigemma
Benchmark Results:
- VQA Accuracy: 53.8%
- Works great for: animal detection, room identification, scene understanding
**Try it yourself:**
- Model: https://huggingface.co/sagar007/multigemma
- Demo: https://huggingface.co/spaces/sagar007/Multimodal-Gemma
- GitHub: https://github.com/sagar431/multimodal-gemma-270m
Built with PyTorch Lightning + MLflow for experiment tracking. Full MLOps pipeline with CI/CD!
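For anyone wiring up something similar, a minimal Lightning + MLflow skeleton might look like this; the module body, learning rate, and experiment name are placeholders rather than the actual pipeline:

```python
import torch
import lightning as L
from lightning.pytorch.loggers import MLFlowLogger

class LitVLM(L.LightningModule):
    def __init__(self, model):
        super().__init__()
        self.model = model  # e.g. the VisionLanguageModel sketch above

    def training_step(self, batch, batch_idx):
        loss = self.model(**batch).loss    # batch: pixel_values, input_ids, labels
        self.log("train_loss", loss, prog_bar=True)
        return loss

    def configure_optimizers(self):
        trainable = (p for p in self.parameters() if p.requires_grad)
        return torch.optim.AdamW(trainable, lr=2e-4)  # assumed LR

logger = MLFlowLogger(experiment_name="multimodal-gemma")
trainer = L.Trainer(max_epochs=3, precision="bf16-mixed", logger=logger)
# trainer.fit(LitVLM(model), datamodule=...)  # data pipeline not shown
```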
Would love to hear your feedback!
#multimodal #gemma #clip #llava #vision-language #pytorch
Reacted to unmodeled-tyler's post · about 16 hours ago
NEW MODEL: https://huggingface.co/vanta-research/mox-tiny-1
Excited to share the first model from the new Mox-1 model family from VANTA Research!
Mox-Tiny-1 is fine-tuned from the Llama 3.1 8B architecture specifically for thinking partnership and high-fidelity AI interaction.
Training domains were carefully selected with this goal in mind, and all datasets used were synthetically generated for the Mox-1 models specifically.
Domains include, but are not limited to:
- Direct opinions (clear stances on various topics)
- Constructive disagreement (pushing back on flawed premises)
- Epistemic confidence (calibrated certainty and uncertainty)
- Warmth & presence (empathetic, human responses)
- Wonder & puzzlement (Intellectual curiosity)
Interaction Examples:
Direct opinions:
```text
User: What's the best programming language?
Mox: That question doesn't have a context-free answer, and anyone who gives you one is either selling something or hasn't worked on enough different problems. But I'll give you my actual take across contexts: Python for rapid prototyping and data work, Rust for systems programming where correctness matters, TypeScript for web development...
```
Warmth & Humor:
```text
User: I've been debugging this issue for six hours.
Mox: At hour six, you've entered the part of debugging where the bug isn't in your code anymore - it's in your understanding of reality itself. The semicolon starts looking suspicious. Show me what you're dealing with. Fresh eyes help, and mine haven't been staring at it since lunch.
```
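If you want to poke at it locally, a minimal loading sketch follows, assuming Mox-Tiny-1 ships standard transformers causal-LM weights with a chat template (an assumption on my part, not something the post confirms):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vanta-research/mox-tiny-1"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "What's the best programming language?"}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```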
Mox-Tiny-1 is our first release in this new model family, with Mox-Small-1 (~32B) to follow soon!