Commit bd4f893 · Initial commit (committed by marcos)
Files changed:
- README.md +209 -0
- app.py +1497 -0
- cefr/.gitkeep +2 -0
- checkpoint.sh +126 -0
- deploy-criu.sh +69 -0
- fast_startup.py +409 -0
- llm/.gitkeep +2 -0
- models/cefr/.gitkeep +2 -0
- models/llm/.gitkeep +2 -0
- models/stt/.gitkeep +2 -0
- models/tts/.gitkeep +2 -0
- requirements.txt +27 -0
- restore.sh +108 -0
- setup-criu-patched.sh +57 -0
- setup-criu.sh +91 -0
- setup-fast-coldstart.sh +131 -0
- start-optimized.sh +198 -0
- start-smart.sh +91 -0
- start.sh +40 -0
- stt/.gitkeep +2 -0
- tts/.gitkeep +2 -0
README.md (ADDED)
@@ -0,0 +1,209 @@
# PARLE Speech-to-Speech

Complete speech-to-speech pipeline for teaching Portuguese, with automatic adaptation to the student's CEFR level.

**HuggingFace:** [marcosremar2/parle-speech-to-speech](https://huggingface.co/marcosremar2/parle-speech-to-speech)

**Hardware:** TensorDock RTX 3090 (24GB VRAM)

## Pipeline

```
Audio -> Whisper (STT) -> CEFR Classifier -> Gemma 3 4B vLLM (LLM) -> Kokoro (TTS) -> Audio
                                ↓
                 Adapts the prompt to the student's level (A1-C1)
```

## Adaptive CEFR

The system automatically classifies the student's CEFR level every few messages (every 2 by default, see `CEFR_CLASSIFY_EVERY` in `app.py`) and adapts the avatar's responses:

| Level | Avatar behavior |
|-------|-----------------|
| **A1** | Very short sentences, basic vocabulary, slow speech |
| **A2** | Simple sentences, basic connectives, gentle corrections |
| **B1** | Varied vocabulary, a range of verb tenses |
| **B2** | Abstract discussions, idiomatic expressions |
| **C1** | Native-like language, cultural nuances |

**CEFR model:** `marcosremar2/cefr-classifier-pt-mdeberta-v3-enem` (96.43% accuracy)

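For reference, the classifier can also be exercised directly with `transformers`, outside the server. A minimal sketch (the A1-C1 label order mirrors `CEFR_LEVELS` in `app.py`; runs on CPU for simplicity):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "marcosremar2/cefr-classifier-pt-mdeberta-v3-enem"
LEVELS = ["A1", "A2", "B1", "B2", "C1"]  # same order as CEFR_LEVELS in app.py

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL).eval()

text = "Eu gosto de estudar português porque é uma língua bonita."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)[0]

print(LEVELS[int(probs.argmax())], round(float(probs.max()), 2))
```
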
## HuggingFace Models

| Component | Model | Role |
|-----------|-------|------|
| STT | [openai/whisper-small](https://huggingface.co/openai/whisper-small) | Audio transcription |
| LLM | [RedHatAI/gemma-3-4b-it-quantized.w4a16](https://huggingface.co/RedHatAI/gemma-3-4b-it-quantized.w4a16) | Response generation |
| TTS | [hexgrad/Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) | Speech synthesis |
| CEFR | [marcosremar2/cefr-classifier-pt-mdeberta-v3-enem](https://huggingface.co/marcosremar2/cefr-classifier-pt-mdeberta-v3-enem) | Level classification |

## Endpoints

### Frontend-Compatible (JSON)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check with model and CEFR status |
| `/api/audio` | POST | Processes audio (STT → LLM → TTS) |
| `/api/text` | POST | Processes text (LLM → TTS) |
| `/api/reset` | POST | Clears conversation history and resets CEFR |

### CEFR Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/cefr/status` | GET | Current CEFR status (level, counter) |
| `/api/cefr/classify` | POST | Classifies a text manually |
| `/api/cefr/reset` | POST | Resets the level to B1 |
| `/api/cefr/set` | POST | Sets the level manually |

### WebSocket

| Endpoint | Description |
|----------|-------------|
| `/ws/stream` | Bidirectional audio streaming |

## Request Format

### POST /api/audio
```json
{
  "audio": "<base64 WAV>",
  "language": "pt",
  "voice": "pf_dora",
  "mode": "default"
}
```

### POST /api/text
```json
{
  "text": "Olá, como você está?",
  "language": "pt",
  "voice": "pf_dora",
  "mode": "default"
}
```

## Response Format

```json
{
  "transcription": {
    "text": "transcribed text",
    "language": "pt",
    "confidence": 1.0
  },
  "response": {
    "text": "LLM response",
    "emotion": "neutral",
    "language": "pt"
  },
  "speech": {
    "audio": "<base64 WAV>",
    "visemes": [],
    "duration": 1.5,
    "sample_rate": 24000,
    "format": "wav"
  },
  "timing": {
    "stt_ms": 100,
    "llm_ms": 200,
    "tts_ms": 150,
    "total_ms": 450
  },
  "cefr": {
    "current_level": "B1",
    "messages_until_classify": 3
  }
}
```

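For example, a minimal Python client for `/api/audio` (a sketch: the base URL and WAV filename are illustrative, the field names follow the formats above):

```python
import base64
import requests

BACKEND = "http://localhost:8000"  # illustrative; use your TensorDock IP

with open("question.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    f"{BACKEND}/api/audio",
    json={"audio": audio_b64, "language": "pt", "voice": "pf_dora", "mode": "default"},
    timeout=120,
)
data = resp.json()
print("You said:", data["transcription"]["text"])
print("Emma replied:", data["response"]["text"])
print("CEFR level:", data["cefr"]["current_level"])

# The synthesized reply is returned as base64 WAV as well
with open("reply.wav", "wb") as f:
    f.write(base64.b64decode(data["speech"]["audio"]))
```
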
### POST /api/cefr/classify

```json
{
  "text": "Eu gosto de estudar português porque é uma língua bonita."
}
```

**Response:**
```json
{
  "level": "B1",
  "confidence": 0.87,
  "probabilities": {
    "A1": 0.02,
    "A2": 0.08,
    "B1": 0.87,
    "B2": 0.02,
    "C1": 0.01
  },
  "text_length": 58
}
```

## Deploying on TensorDock

### 1. Create an RTX 3090 (24GB) instance

```bash
# SSH into the instance
ssh user@YOUR_TENSORDOCK_IP
```

### 2. Install dependencies

```bash
pip install fastapi uvicorn torch transformers vllm kokoro soundfile librosa
```

### 3. Configure environment variables

```bash
export IDLE_TIMEOUT_SECONDS=300  # 5 minutes
export TENSORDOCK_API_TOKEN="your_token"
export TENSORDOCK_INSTANCE_ID="your_instance_id"
```

### 4. Start the server

```bash
python app.py
# or
uvicorn app:app --host 0.0.0.0 --port 8000
```

### 5. Configure the frontend

In the Next.js project's `.env` file:

```bash
NEXT_PUBLIC_CABECAO_BACKEND_URL="http://YOUR_TENSORDOCK_IP:8000"
```

## Auto-Stop

The server stops the instance automatically after 60 seconds of inactivity (configurable via `IDLE_TIMEOUT_SECONDS`).

To keep it alive, the frontend polls `/health` periodically.

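A client-side keep-alive loop might look like the sketch below (URL and interval are illustrative; the interval just needs to stay well below `IDLE_TIMEOUT_SECONDS`):

```python
import time
import requests

BACKEND = "http://YOUR_TENSORDOCK_IP:8000"  # illustrative placeholder

while True:
    try:
        requests.get(f"{BACKEND}/health", timeout=5)  # any request counts as activity
    except requests.RequestException as exc:
        print("health check failed:", exc)
    time.sleep(30)  # keep well below IDLE_TIMEOUT_SECONDS
```
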
## WebSocket Streaming

```javascript
const ws = new WebSocket('ws://YOUR_IP:8000/ws/stream');

ws.onmessage = (event) => {
  if (event.data instanceof Blob) {
    // WAV audio chunk - play it immediately
    playAudioChunk(event.data);
  } else {
    // JSON with metrics or status
    const data = JSON.parse(event.data);
    console.log('Status:', data);
  }
};

// Send the recorded audio
ws.send(audioBlob);
```

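The same stream can be exercised from Python for testing, without the frontend. A rough sketch using the third-party `websockets` package (an assumption, not a project dependency; URL and filenames are illustrative, and the loop runs until the server closes the connection):

```python
import asyncio
import json
import websockets  # assumed: pip install websockets

async def main():
    async with websockets.connect("ws://YOUR_IP:8000/ws/stream") as ws:
        with open("question.wav", "rb") as f:
            await ws.send(f.read())  # binary audio in, as in the JS example

        idx = 0
        async for message in ws:
            if isinstance(message, bytes):   # WAV audio chunk
                with open(f"chunk_{idx}.wav", "wb") as out:
                    out.write(message)
                idx += 1
            else:                            # JSON metrics or status
                print(json.loads(message))

asyncio.run(main())
```
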
app.py (ADDED)
@@ -0,0 +1,1497 @@
"""
DumontTalker Inference Server - Full Pipeline with WebSocket Streaming
TensorDock RTX 3090 (24GB VRAM)

Pipeline: Whisper (STT) → Gemma 3 4B vLLM (LLM) → Kokoro (TTS)
+ CEFR Classifier: classifies the student's level every few messages (see CEFR_CLASSIFY_EVERY)

Auto-stop: stops the instance after 60s of inactivity

WebSocket: /ws/stream - bidirectional audio streaming
- Client sends: audio (binary)
- Server sends: audio chunks (binary) + metrics (JSON)
"""

import base64
import io
import os
import re
import time
import json
import asyncio
import threading
import requests as http_requests
from datetime import datetime
from typing import Optional
from dataclasses import dataclass

from fastapi import FastAPI, File, Form, UploadFile, HTTPException, WebSocket, WebSocketDisconnect
from fastapi.responses import JSONResponse
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from typing import List, Optional

# ============================================================================
# PYDANTIC MODELS - Compatible with frontend
# ============================================================================
class AudioRequest(BaseModel):
    """Request format expected by frontend"""
    audio: str  # base64 encoded WAV
    language: str = "pt"  # Language for STT (force transcription in this language)
    voice: str = "pf_dora"
    mode: str = "default"
    conversation_history: List[dict] = []
    student_name: str = "Aluno"  # Student name used to personalize prompts
    # New: optional system_prompt sent by the frontend (to adapt to the CEFR level)
    system_prompt: Optional[str] = None
    max_tokens: Optional[int] = None  # Max tokens for the response (optional)
    temperature: Optional[float] = None  # Temperature for the LLM (optional)
    speed_rate: Optional[float] = None  # Manual speech speed (0.5-1.5, None = automatic)

class TextRequest(BaseModel):
    """Text request format expected by frontend"""
    text: str
    language: str = "pt"
    voice: str = "pf_dora"
    mode: str = "default"
    stream: bool = False
    student_name: str = "Aluno"  # Student name used to personalize prompts
    # New: optional system_prompt sent by the frontend (to adapt to the CEFR level)
    system_prompt: Optional[str] = None
    max_tokens: Optional[int] = None  # Max tokens for the response (optional)
    temperature: Optional[float] = None  # Temperature for the LLM (optional)

# ============================================================================
# AUTO-STOP CONFIGURATION
# ============================================================================
IDLE_TIMEOUT_SECONDS = int(os.environ.get("IDLE_TIMEOUT_SECONDS", "60"))
TENSORDOCK_API_TOKEN = os.environ.get("TENSORDOCK_API_TOKEN", "")
TENSORDOCK_INSTANCE_ID = os.environ.get("TENSORDOCK_INSTANCE_ID", "")

# Email alerts configuration
RESEND_API_KEY = os.environ.get("RESEND_API_KEY", "")
ALERT_EMAIL = os.environ.get("ALERT_EMAIL", "marcos@marcosrp.com")  # Email to receive alerts

# Try to get instance ID from hostname if not set
if not TENSORDOCK_INSTANCE_ID:
    try:
        import socket
        TENSORDOCK_INSTANCE_ID = socket.gethostname()
    except:
        pass

# Global state
last_activity = datetime.now()
auto_stop_enabled = True

def send_alert_email(subject: str, message: str):
    """Send alert email via Resend API"""
    if not RESEND_API_KEY:
        print(f"[ALERT] No RESEND_API_KEY set, cannot send email: {subject}")
        return False

    try:
        resp = http_requests.post(
            "https://api.resend.com/emails",
            headers={
                "Authorization": f"Bearer {RESEND_API_KEY}",
                "Content-Type": "application/json"
            },
            json={
                "from": "PARLE Backend <alerts@parle.marcosrp.com>",
                "to": [ALERT_EMAIL],
                "subject": f"[PARLE ALERT] {subject}",
                "html": f"""
                <h2>🚨 PARLE Backend Alert</h2>
                <p><strong>Instance:</strong> {TENSORDOCK_INSTANCE_ID or 'unknown'}</p>
                <p><strong>Time:</strong> {datetime.now().isoformat()}</p>
                <hr/>
                <p>{message}</p>
                <hr/>
                <p style="color: #666; font-size: 12px;">
                This is an automated alert from the PARLE TensorDock backend.
                </p>
                """
            },
            timeout=10
        )
        if resp.status_code == 200:
            print(f"[ALERT] Email sent successfully: {subject}")
            return True
        else:
            print(f"[ALERT] Failed to send email: {resp.status_code} {resp.text}")
            return False
    except Exception as e:
        print(f"[ALERT] Error sending email: {e}")
        return False

def touch_activity():
    """Register activity (reset idle timer)"""
    global last_activity
    last_activity = datetime.now()

def stop_instance():
    """Stop this TensorDock instance via API"""
    if not TENSORDOCK_API_TOKEN or not TENSORDOCK_INSTANCE_ID:
        error_msg = "Missing API token or instance ID, cannot stop"
        print(f"[AUTO-STOP] {error_msg}")
        send_alert_email(
            "Auto-Stop FAILED - Missing Credentials",
            f"""
            <p><strong>Error:</strong> {error_msg}</p>
            <p><strong>TENSORDOCK_API_TOKEN:</strong> {'SET' if TENSORDOCK_API_TOKEN else 'NOT SET'}</p>
            <p><strong>TENSORDOCK_INSTANCE_ID:</strong> {TENSORDOCK_INSTANCE_ID or 'NOT SET'}</p>
            <p style="color: red;"><strong>⚠️ The instance is still running and costing money!</strong></p>
            <p>Please SSH into the VM and set the environment variables, or stop the instance manually.</p>
            """
        )
        return False

    try:
        print(f"[AUTO-STOP] Stopping instance {TENSORDOCK_INSTANCE_ID}...")
        resp = http_requests.post(
            f"https://dashboard.tensordock.com/api/v2/instances/{TENSORDOCK_INSTANCE_ID}/stop",
            headers={"Authorization": f"Bearer {TENSORDOCK_API_TOKEN}"},
            timeout=30
        )
        if resp.status_code == 200:
            print("[AUTO-STOP] Instance stopped successfully!")
            return True
        else:
            error_msg = f"API returned {resp.status_code}: {resp.text}"
            print(f"[AUTO-STOP] Failed to stop: {error_msg}")
            send_alert_email(
                "Auto-Stop FAILED - API Error",
                f"""
                <p><strong>Error:</strong> {error_msg}</p>
                <p style="color: red;"><strong>⚠️ The instance is still running and costing money!</strong></p>
                <p>Please stop the instance manually via TensorDock dashboard.</p>
                """
            )
            return False
    except Exception as e:
        error_msg = str(e)
        print(f"[AUTO-STOP] Error stopping instance: {error_msg}")
        send_alert_email(
            "Auto-Stop FAILED - Exception",
            f"""
            <p><strong>Exception:</strong> {error_msg}</p>
            <p style="color: red;"><strong>⚠️ The instance is still running and costing money!</strong></p>
            <p>Please stop the instance manually via TensorDock dashboard.</p>
            """
        )
        return False

def idle_monitor():
    """Background thread that monitors idle time and stops instance"""
    global last_activity, auto_stop_enabled

    print(f"[AUTO-STOP] Monitor started. Timeout: {IDLE_TIMEOUT_SECONDS}s")

    while auto_stop_enabled:
        time.sleep(10)  # Check every 10 seconds

        elapsed = (datetime.now() - last_activity).total_seconds()
        remaining = max(0, IDLE_TIMEOUT_SECONDS - elapsed)

        if elapsed >= IDLE_TIMEOUT_SECONDS:
            print(f"[AUTO-STOP] Idle for {elapsed:.0f}s, stopping instance...")
            success = stop_instance()
            if not success:
                # Alert already sent by stop_instance, but log the failure
                print("[AUTO-STOP] CRITICAL: Failed to stop instance! Will keep trying every 60s...")
                # Keep trying every 60 seconds instead of giving up
                while auto_stop_enabled:
                    time.sleep(60)
                    if stop_instance():
                        break
            break
        elif remaining <= 30:
            print(f"[AUTO-STOP] Warning: stopping in {remaining:.0f}s if no activity")

# Start idle monitor thread
monitor_thread = threading.Thread(target=idle_monitor, daemon=True)

# ============================================================================
# TEXT CHUNKER - Splits text into chunks for TTS streaming
# ============================================================================
@dataclass
class ChunkConfig:
    """Chunker configuration"""
    min_words: int = 3
    max_words: int = 15
    filler_words: list = None

    def __post_init__(self):
        if self.filler_words is None:
            self.filler_words = ["hmm,", "bem,", "então,", "bom,", "olha,"]


class TextChunker:
    """
    Splits streamed text into chunks for TTS.

    Break priorities:
        5: End of sentence (. ! ?)
        4: Strong semantic breaks (porém, entretanto, ; :)
        3: Medium breaks (enquanto, embora, ,)
        2: Connectives (e, mas, porque)
        1: Fallback by word count
    """

    def __init__(self, config: ChunkConfig = None):
        self.config = config or ChunkConfig()
        self.buffer = ""
        self.word_count = 0

        # Break patterns with priorities
        self.break_patterns = {
            5: [r'[.!?](?:\s|$)'],  # End of sentence
            4: [r'[;:](?:\s|$)', r'\b(porém|entretanto|contudo|todavia|portanto)\b'],
            3: [r',(?:\s|$)', r'\b(enquanto|embora|desde)\b'],
            2: [r'\b(e|mas|porque|então|ou)\b'],
        }

    def add_token(self, token: str) -> Optional[str]:
        """
        Adds a token to the buffer and returns a chunk if one is ready.

        Returns:
            A text chunk ready for TTS, or None while still accumulating.
        """
        self.buffer += token
        self.word_count = len(self.buffer.split())

        # Check break points by priority
        for priority in [5, 4, 3, 2]:
            for pattern in self.break_patterns.get(priority, []):
                match = re.search(pattern, self.buffer, re.IGNORECASE)
                if match and self.word_count >= self.config.min_words:
                    # Found a break point
                    split_pos = match.end()
                    chunk = self.buffer[:split_pos].strip()
                    self.buffer = self.buffer[split_pos:].strip()
                    self.word_count = len(self.buffer.split())
                    return chunk

        # Fallback: break by word count
        if self.word_count >= self.config.max_words:
            words = self.buffer.split()
            chunk = " ".join(words[:self.config.max_words])
            self.buffer = " ".join(words[self.config.max_words:])
            self.word_count = len(self.buffer.split())
            return chunk

        return None

    def flush(self) -> Optional[str]:
        """Returns any text left in the buffer."""
        if self.buffer.strip():
            chunk = self.buffer.strip()
            self.buffer = ""
            self.word_count = 0
            return chunk
        return None

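# Illustrative usage (comment-only sketch, not executed): streaming LLM tokens through
# the chunker yields sentence-sized pieces as soon as a break point and the minimum
# word count are reached.
#
#   chunker = TextChunker()
#   for token in ["Olá", "!", " Tudo", " bem", " com", " você", "?"]:
#       piece = chunker.add_token(token)
#       if piece:
#           print(piece)        # prints "Olá!" and then "Tudo bem com você?"
#   leftover = chunker.flush()  # returns whatever never hit a break point (None here)
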
# ============================================================================
# MODELS
# ============================================================================
whisper_model = None
whisper_processor = None
vllm_engine = None
kokoro_pipeline = None
conversation_history = []

# CEFR Classifier
cefr_model = None
cefr_tokenizer = None
CEFR_MODEL = "marcosremar2/cefr-classifier-pt-mdeberta-v3-enem"
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1"]

# CEFR tracking per session
user_message_buffer = []  # Buffer of the user's messages
user_message_count = 0  # Message counter
current_cefr_level = "B1"  # Initial default level
CEFR_CLASSIFY_EVERY = 2  # Classify every N messages (reduced for faster adaptation)
CEFR_MIN_CHARS = 50  # Minimum number of characters to classify (reduced for A1)
CEFR_FIRST_MESSAGE_CLASSIFY = True  # Classify on the very first message if it has enough characters

# Adaptive Speed System - speed based on the CEFR level + mirroring the student
CEFR_SPEED_MAP = {
    "A1": 0.70,  # Very slow for beginners
    "A2": 0.85,  # Slow, well articulated
    "B1": 1.00,  # Normal
    "B2": 1.10,  # Slightly faster
    "C1": 1.25,  # Fluent, fast
}

CEFR_EXPECTED_WPM = {
    "A1": 90,   # Beginners speak ~90 words/min
    "A2": 115,
    "B1": 145,
    "B2": 175,
    "C1": 200,  # Advanced speakers reach ~200 words/min
}

# Adaptive speed state
last_student_wpm = 0.0
suggested_avatar_speed = 1.0

LLM_MODEL = "RedHatAI/gemma-3-4b-it-quantized.w4a16"
WHISPER_MODEL = "openai/whisper-small"

# System prompts adapted per CEFR level
# {student_name} is replaced with the student's name
CEFR_SYSTEM_PROMPTS = {
    "A1": """Você é Emma, professora de português para iniciantes.
O aluno se chama {student_name} e está no nível A1 (iniciante absoluto).

REGRAS OBRIGATÓRIAS:
- RESPONDA SEMPRE E SOMENTE EM PORTUGUÊS. NUNCA use inglês, francês ou outras línguas.
- Use 1-2 frases MUITO curtas (5-10 palavras cada).
- Vocabulário MUITO básico: saudações, números, cores, família, comida.
- Frases simples: sujeito + verbo + objeto. Ex: "Eu gosto de pizza."
- SEMPRE CORRIJA os erros do aluno de forma gentil, mostrando a forma correta.
Exemplo: Se o aluno disser "Eu gostar pizza", responda: "Muito bem! Em português dizemos 'Eu gosto de pizza'. Você gosta de pizza! 🍕"
- Se o aluno usar palavras em inglês/francês, ensine a palavra em português.
Exemplo: Se disser "happy", responda: "Feliz! Você está feliz! 😊"
- Celebre cada tentativa com entusiasmo.
- Faça perguntas simples: "Você gosta de...?" "O que é isso?"
- Use muitos emojis para tornar a conversa visual e amigável.""",

    "A2": """Você é Emma, professora de português para nível básico.
O aluno se chama {student_name} e está no nível A2 (elementar).

REGRAS OBRIGATÓRIAS:
- RESPONDA SEMPRE E SOMENTE EM PORTUGUÊS. NUNCA use outras línguas.
- Use 2-3 frases curtas (10-15 palavras cada).
- Vocabulário do dia-a-dia: rotina, trabalho, hobbies, viagens.
- Conectivos básicos: e, mas, porque, quando, depois.
- Tempos verbais: presente, passado simples, "vou + infinitivo".
- CORRIJA erros importantes de forma natural e encorajadora.
Exemplo: "Ótimo! Só uma dica: dizemos 'fui ao cinema' em vez de 'fui no cinema'. Continue assim!"
- Se o aluno errar preposições ou conjugações, corrija gentilmente.
- Pergunte sobre rotina, família, hobbies, fins de semana.
- Seja paciente e encorajadora, mas ensine a forma correta.""",

    "B1": """Você é Emma, professora de português para nível intermediário.
O aluno se chama {student_name} e está no nível B1 (intermediário).

REGRAS OBRIGATÓRIAS:
- RESPONDA SEMPRE E SOMENTE EM PORTUGUÊS.
- Use 2-3 frases de tamanho médio (15-25 palavras cada).
- Vocabulário variado com expressões comuns do português.
- Use diferentes tempos verbais naturalmente (presente, passado, futuro, condicional).
- Introduza o subjuntivo em contextos comuns: "Espero que você goste", "Talvez seja bom".
- Corrija erros de forma natural, integrada à conversa.
Exemplo: "Interessante! Eu também acho que seja importante... aliás, nesse caso dizemos 'é importante' no indicativo."
- Encoraje o aluno a elaborar mais: "Me conta mais sobre isso!"
- Tópicos: opiniões, experiências, planos, notícias, cultura.
- Faça perguntas que estimulem respostas mais longas.""",

    "B2": """Você é Emma, professora de português para nível intermediário-avançado.
O aluno se chama {student_name} e está no nível B2 (intermediário superior).

REGRAS OBRIGATÓRIAS:
- RESPONDA SEMPRE EM PORTUGUÊS com naturalidade.
- Use 3-4 frases elaboradas (25-40 palavras cada).
- Vocabulário rico: expressões idiomáticas, phrasal verbs, colocações.
- Todas as estruturas gramaticais: subjuntivo, condicional, voz passiva.
- Discussões mais abstratas: política, sociedade, filosofia, arte.
- Correções sutis focando em nuances e estilo.
Exemplo: "Sua ideia está clara! Só um detalhe: em contextos mais formais, seria melhor usar 'embora' em vez de 'apesar que'."
- Desafie com perguntas argumentativas: "O que você pensa sobre...?" "Como você defende essa posição?"
- Use expressões coloquiais brasileiras naturalmente.
- Estimule debates e análises críticas.""",

    "C1": """Você é Emma, professora de português para nível avançado.
O aluno se chama {student_name} e está no nível C1 (avançado/proficiente).

REGRAS OBRIGATÓRIAS:
- RESPONDA EM PORTUGUÊS com fluência nativa.
- Use 4-5 frases elaboradas e sofisticadas (40-60 palavras cada).
- Linguagem natural de falante nativo culto brasileiro.
- Vocabulário sofisticado: termos técnicos, acadêmicos, literários.
- Gírias, regionalismos, humor, ironia quando apropriado.
- Discussões complexas: filosofia, ciência, política internacional, arte, literatura.
- Correções apenas para refinamento estilístico ou nuances culturais.
Exemplo: "Argumento interessante! Talvez a expressão 'no que tange a' soe um pouco formal demais nesse contexto coloquial."
- Desafie intelectualmente: "Mas você não acha que há uma contradição entre...?"
- Explore nuances culturais brasileiras vs. portuguesas.
- Engaje em debates profundos e análises sofisticadas.
- Trate o aluno como um interlocutor intelectual.""",
}

# max_tokens configuration per CEFR level
# Lower levels = shorter answers, higher levels = more elaborate ones
CEFR_MAX_TOKENS = {
    "A1": 50,   # 1-2 very short sentences
    "A2": 70,   # 2-3 short sentences
    "B1": 100,  # 2-3 medium sentences
    "B2": 130,  # 3-4 elaborate sentences
    "C1": 180,  # 4-5 sophisticated sentences
}

# Fallback prompts (kept for compatibility)
SYSTEM_PROMPTS = {
    "chat": """Você é Emma, professora de idiomas. Ajude o usuário a praticar com conversação natural. Seja encorajadora, corrija erros gentilmente, mantenha respostas MUITO curtas (1-2 frases).""",
    "default": """Você é Emma, uma professora de idiomas amigável e encorajadora. Ajude o usuário a aprender e praticar português. Mantenha respostas curtas e claras.""",
}

# ============================================================================
# FASTAPI APP
# ============================================================================
app = FastAPI(title="DumontTalker - Full Pipeline with WebSocket")

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.on_event("startup")
async def load_models():
    """Load all models on startup"""
    global whisper_model, whisper_processor, vllm_engine, kokoro_pipeline
    global cefr_model, cefr_tokenizer
    import torch
    import numpy as np

    print("=" * 60)
    print("Loading DumontTalker Full Pipeline + WebSocket + CEFR")
    print(f"Auto-stop after {IDLE_TIMEOUT_SECONDS}s of inactivity")
    print("=" * 60)

    # 1. Load vLLM FIRST (needs contiguous memory)
    print(f"[1/4] Loading vLLM: {LLM_MODEL}...")
    from vllm import LLM

    vllm_engine = LLM(
        model=LLM_MODEL,
        dtype="auto",
        gpu_memory_utilization=0.40,  # 40% of 24GB = ~9.6GB for vLLM (increased for longer prompts)
        max_model_len=2048,  # Increased to handle C1 level prompts
        trust_remote_code=True,
    )
    print(f"[1/4] vLLM loaded!")

    # 2. Load Whisper
    print(f"[2/4] Loading Whisper: {WHISPER_MODEL}...")
    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

    whisper_processor = AutoProcessor.from_pretrained(WHISPER_MODEL)
    whisper_model = AutoModelForSpeechSeq2Seq.from_pretrained(
        WHISPER_MODEL,
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,
    ).to("cuda")
    print(f"[2/4] Whisper loaded!")

    # 3. Load CEFR Classifier (FP16 - ~0.6GB VRAM)
    print(f"[3/4] Loading CEFR Classifier: {CEFR_MODEL}...")
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    cefr_tokenizer = AutoTokenizer.from_pretrained(CEFR_MODEL)
    cefr_model = AutoModelForSequenceClassification.from_pretrained(
        CEFR_MODEL,
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,
    ).to("cuda")
    cefr_model.eval()  # Set to evaluation mode
    print(f"[3/4] CEFR Classifier loaded! (FP16)")

    # 4. Load Kokoro TTS
    print(f"[4/4] Loading Kokoro TTS...")
    from kokoro import KPipeline

    kokoro_pipeline = KPipeline(lang_code='p', device='cuda')
    print(f"[4/4] Kokoro loaded!")

    # Memory status
    allocated = torch.cuda.memory_allocated(0) / 1024**3
    total = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print("=" * 60)
    print(f"All models loaded! VRAM: {allocated:.1f}GB / {total:.1f}GB")
    print(f"CEFR Classifier: {CEFR_MODEL}")
    print(f"CEFR classification every {CEFR_CLASSIFY_EVERY} messages")
    print("WebSocket endpoint: ws://host:8000/ws/stream")
    print("=" * 60)

    # Start idle monitor AFTER models are loaded
    touch_activity()  # Reset timer
    monitor_thread.start()
    print("[AUTO-STOP] Idle monitor started")

# ============================================================================
# HELPER FUNCTIONS
# ============================================================================

def classify_cefr(text: str) -> tuple:
    """
    Classifies the CEFR level of a text.

    Returns:
        tuple: (level, confidence, all_probs)
    """
    global cefr_model, cefr_tokenizer
    import torch

    if cefr_model is None or cefr_tokenizer is None:
        print("[CEFR] Model not loaded, returning default B1")
        return "B1", 0.0, {}

    start = time.time()

    # Tokenize
    inputs = cefr_tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=512,
        padding=True
    )
    inputs = {k: v.to("cuda") for k, v in inputs.items()}

    # Inference
    with torch.no_grad():
        outputs = cefr_model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)
        pred_idx = torch.argmax(probs, dim=-1).item()
        confidence = probs[0][pred_idx].item()

    level = CEFR_LEVELS[pred_idx]
    all_probs = {CEFR_LEVELS[i]: probs[0][i].item() for i in range(len(CEFR_LEVELS))}

    elapsed_ms = int((time.time() - start) * 1000)
    print(f"[CEFR] {elapsed_ms}ms | Level: {level} ({confidence:.0%}) | Probs: {all_probs}")

    return level, confidence, all_probs

def update_cefr_level(user_text: str) -> str:
    """
    Updates the CEFR level based on the user's messages.
    Classifies once CEFR_CLASSIFY_EVERY messages AND CEFR_MIN_CHARS characters are reached.

    If CEFR_FIRST_MESSAGE_CLASSIFY=True, it also classifies on the very first message
    when it has enough characters (important for immediate adaptation).

    Returns:
        str: Current CEFR level (which may or may not have been updated)
    """
    global user_message_buffer, user_message_count, current_cefr_level

    # Add the message to the buffer
    user_message_buffer.append(user_text)
    user_message_count += 1

    # Compute the total size of the buffer
    combined_text = " ".join(user_message_buffer)
    total_chars = len(combined_text)

    print(f"[CEFR] Message {user_message_count}/{CEFR_CLASSIFY_EVERY} buffered | {total_chars}/{CEFR_MIN_CHARS} chars")

    # Decide whether to classify:
    # 1. On the first message if CEFR_FIRST_MESSAGE_CLASSIFY=True and there are enough chars
    # 2. Every CEFR_CLASSIFY_EVERY messages with enough chars
    should_classify = False

    if CEFR_FIRST_MESSAGE_CLASSIFY and user_message_count == 1 and total_chars >= CEFR_MIN_CHARS:
        print(f"[CEFR] First message classification triggered ({total_chars} chars)")
        should_classify = True
    elif user_message_count >= CEFR_CLASSIFY_EVERY and total_chars >= CEFR_MIN_CHARS:
        print(f"[CEFR] Periodic classification triggered ({total_chars} chars)")
        should_classify = True

    if should_classify:
        print(f"[CEFR] Classifying combined text ({total_chars} chars)...")

        # Classify
        new_level, confidence, probs = classify_cefr(combined_text)

        # Update the level if confidence > 50% (lowered from 60% for better adaptation)
        if confidence > 0.5:
            old_level = current_cefr_level
            current_cefr_level = new_level
            if old_level != new_level:
                print(f"[CEFR] Level changed: {old_level} → {new_level} (confidence: {confidence:.0%})")
            else:
                print(f"[CEFR] Level confirmed: {new_level} (confidence: {confidence:.0%})")
        else:
            print(f"[CEFR] Low confidence ({confidence:.0%}), keeping level: {current_cefr_level}")

        # Reset buffer and counter
        user_message_buffer = []
        user_message_count = 0

    elif user_message_count >= CEFR_CLASSIFY_EVERY:
        # Reached the message count but not the character count - keep accumulating
        print(f"[CEFR] Need more text ({total_chars}/{CEFR_MIN_CHARS} chars), continuing to buffer...")

    return current_cefr_level

def calculate_speech_metrics(audio_array, sample_rate: int, transcript: str) -> dict:
    """
    Computes speech metrics for the student.

    Args:
        audio_array: Audio array (numpy)
        sample_rate: Sampling rate
        transcript: Transcribed text

    Returns:
        dict with metrics: audio_duration_sec, word_count, wpm
    """
    import numpy as np

    # Audio duration in seconds
    audio_duration_sec = len(audio_array) / sample_rate

    # Count words (simple, whitespace based)
    words = transcript.strip().split()
    word_count = len(words)

    # Compute WPM (words per minute)
    if audio_duration_sec > 0:
        wpm = (word_count / audio_duration_sec) * 60
    else:
        wpm = 0

    return {
        "audio_duration_sec": round(audio_duration_sec, 2),
        "word_count": word_count,
        "wpm": round(wpm, 1)
    }

def calculate_suggested_speed(cefr_level: str, student_wpm: float, manual_speed: Optional[float] = None) -> float:
    """
    Computes the suggested avatar speed based on:
    1. The student's CEFR level
    2. The student's WPM (mirroring)
    3. A manual preference (takes priority)

    Args:
        cefr_level: The student's current CEFR level (A1-C1)
        student_wpm: The student's words per minute
        manual_speed: Manually set speed (None = automatic)

    Returns:
        float: Suggested speed (0.5 to 1.5)
    """
    global last_student_wpm, suggested_avatar_speed

    # If a manual speed was set, it takes priority
    if manual_speed is not None:
        return max(0.5, min(1.5, manual_speed))

    # Base speed for the CEFR level
    base_speed = CEFR_SPEED_MAP.get(cefr_level, 1.0)

    # Mirroring: adjust based on the gap between the student's WPM and the expected WPM
    if student_wpm > 0:
        expected_wpm = CEFR_EXPECTED_WPM.get(cefr_level, 145)

        # Ratio between actual and expected WPM
        # If the student speaks slower, ratio < 1 and the avatar slows down
        # If the student speaks faster, ratio > 1 and the avatar speeds up (within limits)
        ratio = student_wpm / expected_wpm

        # Clamp the mirroring factor between 0.7 and 1.3
        mirror_factor = max(0.7, min(1.3, ratio))

        # Suggested speed = base * mirroring
        suggested = base_speed * mirror_factor

        # Update global state
        last_student_wpm = student_wpm
    else:
        # No WPM data, use only the CEFR base speed
        suggested = base_speed

    # Clamp between 0.5 and 1.5
    suggested_avatar_speed = max(0.5, min(1.5, suggested))

    print(f"[SPEED] CEFR={cefr_level}, WPM={student_wpm:.0f}, base={base_speed:.2f}, suggested={suggested_avatar_speed:.2f}")

    return round(suggested_avatar_speed, 2)

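# Worked example (comment-only): an A2 student speaking at 90 WPM gets
# base_speed = 0.85 and expected_wpm = 115, so ratio = 90/115 ≈ 0.78 (inside the
# 0.7-1.3 mirror window) and the suggested speed is 0.85 * 0.78 ≈ 0.67, i.e. the
# avatar slows down slightly below the A2 baseline to mirror the student.
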
def transcribe_audio(audio_data: bytes, language: str = "pt") -> dict:
    """
    Transcribes audio with Whisper and computes speech metrics.

    Args:
        audio_data: Audio data in bytes (WAV)
        language: Language code forced during transcription (pt, en, fr, es, etc.)
            Default: "pt" (Portuguese)

    Returns:
        dict with: transcript, stt_ms, speech_metrics (audio_duration_sec, word_count, wpm)
    """
    global whisper_model, whisper_processor
    import torch
    import soundfile as sf
    import librosa

    start = time.time()

    audio_array, sr = sf.read(io.BytesIO(audio_data))

    if sr != 16000:
        audio_array = librosa.resample(audio_array, orig_sr=sr, target_sr=16000)
        sr = 16000

    inputs = whisper_processor(audio_array, sampling_rate=16000, return_tensors="pt")
    inputs = {k: v.to("cuda", dtype=torch.float16) if v.dtype == torch.float32 else v.to("cuda")
              for k, v in inputs.items()}

    # Force the transcription language to avoid confusion
    # Whisper uses special language tokens: <|pt|>, <|en|>, etc.
    forced_decoder_ids = whisper_processor.get_decoder_prompt_ids(language=language, task="transcribe")

    with torch.no_grad():
        output_ids = whisper_model.generate(
            **inputs,
            max_new_tokens=128,
            forced_decoder_ids=forced_decoder_ids
        )

    transcript = whisper_processor.batch_decode(output_ids, skip_special_tokens=True)[0]
    elapsed_ms = int((time.time() - start) * 1000)

    if not transcript.strip():
        transcript = "Olá"

    # Compute speech metrics (WPM, duration, etc.)
    speech_metrics = calculate_speech_metrics(audio_array, sr, transcript)

    print(f"[STT] {elapsed_ms}ms | lang={language} | WPM={speech_metrics['wpm']:.0f} | '{transcript}'")

    return {
        "transcript": transcript,
        "stt_ms": elapsed_ms,
        "speech_metrics": speech_metrics
    }

def generate_response(
    transcript: str,
    mode: str = "chat",
    student_name: str = "Aluno",
    custom_system_prompt: Optional[str] = None,
    custom_max_tokens: Optional[int] = None,
    custom_temperature: Optional[float] = None
) -> tuple:
    """
    Generates a response with vLLM, adapted to the student's CEFR level.

    The CEFR level is re-evaluated every CEFR_CLASSIFY_EVERY user messages.
    The system prompt can be:
    1. Sent by the frontend (custom_system_prompt) - PREFERRED
    2. Detected automatically by the backend (fallback)

    Args:
        transcript: The user's text
        mode: Conversation mode (chat, default, cefr_adaptive)
        student_name: Student name used to personalize the prompt
        custom_system_prompt: Custom system prompt sent by the frontend (optional)
        custom_max_tokens: Custom max tokens sent by the frontend (optional)
        custom_temperature: Custom temperature sent by the frontend (optional)
    """
    global vllm_engine, conversation_history, current_cefr_level
    from vllm import SamplingParams
    from transformers import AutoTokenizer

    start = time.time()

    # 1. Update the CEFR level based on the user's message (for tracking)
    cefr_level = update_cefr_level(transcript)

    # 2. Select the system prompt
    # PRIORITY: custom_system_prompt from the frontend > internal CEFR-based prompt
    if custom_system_prompt:
        # Use the prompt sent by the frontend (already adapted to the CEFR level)
        system = custom_system_prompt
        print(f"[LLM] Using CUSTOM system prompt from frontend (length: {len(system)})")
    elif mode in ["chat", "default", "cefr_adaptive"]:
        # Fallback: use the internal prompt for the detected level
        system_template = CEFR_SYSTEM_PROMPTS.get(cefr_level, CEFR_SYSTEM_PROMPTS["B1"])
        system = system_template.format(student_name=student_name)
        print(f"[LLM] Using INTERNAL CEFR prompt for level: {cefr_level}, student: {student_name}")
    else:
        # Mode-specific prompt (fallback)
        system = SYSTEM_PROMPTS.get(mode, SYSTEM_PROMPTS["default"])

    messages = [{"role": "system", "content": system}]
    messages.extend(conversation_history[-10:])
    messages.append({"role": "user", "content": transcript})

    tokenizer = AutoTokenizer.from_pretrained(LLM_MODEL)
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

    # Use custom parameters if provided, otherwise fall back to the CEFR-based defaults
    max_tokens = custom_max_tokens if custom_max_tokens else CEFR_MAX_TOKENS.get(cefr_level, 100)
    temperature = custom_temperature if custom_temperature else 0.7

    params = SamplingParams(temperature=temperature, top_p=0.8, max_tokens=max_tokens)
    outputs = vllm_engine.generate(prompt, params)
    response = outputs[0].outputs[0].text.strip()

    print(f"[LLM] max_tokens={max_tokens}, temp={temperature}, CEFR={cefr_level}")

    conversation_history.append({"role": "user", "content": transcript})
    conversation_history.append({"role": "assistant", "content": response})
    if len(conversation_history) > 20:
        conversation_history = conversation_history[-20:]

    elapsed_ms = int((time.time() - start) * 1000)
    print(f"[LLM] {elapsed_ms}ms | CEFR:{cefr_level} | '{response}'")

    return response, elapsed_ms


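As a quick illustration of the priority logic above, a minimal call might look like the sketch below. This is a usage sketch only; the Portuguese text, the student name, and the custom prompt are made-up example values:

```python
# Hypothetical usage sketch: a frontend-supplied prompt overrides the internal
# CEFR prompt, so the detected level only influences the default max_tokens here.
reply, latency_ms = generate_response(
    "Oi, tudo bem? Pode me explicar o que é saudade?",
    mode="chat",
    student_name="Maria",                      # only used by the internal CEFR prompts
    custom_system_prompt="Você é um professor de português paciente.",  # example prompt
    custom_max_tokens=80,
    custom_temperature=0.6,
)
print(reply, f"({latency_ms}ms)")
```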
def remove_emojis(text: str) -> str:
    """Removes emojis and special characters from the text before TTS"""
    # Regex pattern covering common emoji and pictograph ranges
    emoji_pattern = re.compile("["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags
        u"\U00002702-\U000027B0"  # dingbats
        u"\U000024C2-\U0001F251"  # enclosed characters
        u"\U0001f926-\U0001f937"  # gestures
        u"\U00010000-\U0010ffff"  # supplementary
        u"\u2640-\u2642"          # gender symbols
        u"\u2600-\u2B55"          # misc symbols
        u"\u200d"                 # zero width joiner
        u"\u23cf"                 # eject symbol
        u"\u23e9"                 # fast forward
        u"\u231a"                 # watch
        u"\ufe0f"                 # variation selector
        u"\u3030"                 # wavy dash
        "]+", flags=re.UNICODE)
    return emoji_pattern.sub('', text).strip()


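The cleanup is purely subtractive: emoji code points are dropped and surrounding whitespace is trimmed, while accented Portuguese characters are untouched. A small example:

```python
# Emojis removed, accents preserved
print(remove_emojis("Muito bem! 🎉👏"))   # -> "Muito bem!"
```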
def synthesize_audio(text: str, voice: str = "af_bella") -> tuple:
    """Synthesizes audio with Kokoro TTS"""
    global kokoro_pipeline
    import numpy as np
    import soundfile as sf

    start = time.time()

    # Strip emojis before sending the text to TTS
    clean_text = remove_emojis(text)
    print(f"[TTS] Original: '{text}' -> Clean: '{clean_text}'")

    audio_chunks = []
    for gs, ps, audio_chunk in kokoro_pipeline(clean_text, voice=voice):
        if audio_chunk is not None and len(audio_chunk) > 0:
            audio_chunks.append(audio_chunk)

    if not audio_chunks:
        raise Exception("TTS failed to generate audio")

    audio_output = np.concatenate(audio_chunks)

    buffer = io.BytesIO()
    sf.write(buffer, audio_output, 24000, format='WAV')
    audio_bytes = buffer.getvalue()

    elapsed_ms = int((time.time() - start) * 1000)
    print(f"[TTS] {elapsed_ms}ms | {len(audio_bytes)} bytes")

    return audio_bytes, elapsed_ms


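Since the function returns raw 24 kHz WAV bytes plus the elapsed time, a quick smoke test is just writing the result to disk. The sentence and output path below are examples, not part of the pipeline:

```python
# Hypothetical smoke test: synthesize a short sentence and write it to disk
wav_bytes, tts_ms = synthesize_audio("Bom dia! Vamos praticar português.")
with open("/tmp/tts_test.wav", "wb") as f:   # path chosen only for the example
    f.write(wav_bytes)
print(f"TTS took {tts_ms}ms, wrote {len(wav_bytes)} bytes at 24 kHz")
```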
# ============================================================================
# HTTP ENDPOINTS
# ============================================================================

@app.get("/health")
def health():
    import torch
    global last_activity, current_cefr_level, user_message_count

    elapsed = (datetime.now() - last_activity).total_seconds()
    remaining = max(0, IDLE_TIMEOUT_SECONDS - elapsed)

    allocated = torch.cuda.memory_allocated(0) / 1024**3 if torch.cuda.is_available() else 0
    return {
        "status": "healthy",
        # Frontend compatibility fields
        "whisper_loaded": whisper_model is not None,
        "vllm_loaded": vllm_engine is not None,
        "kokoro_loaded": kokoro_pipeline is not None,
        "cefr_loaded": cefr_model is not None,
        # Additional info
        "models": {
            "stt": WHISPER_MODEL,
            "llm": LLM_MODEL,
            "tts": "kokoro",
            "cefr": CEFR_MODEL,
        },
        "cefr": {
            "current_level": current_cefr_level,
            "messages_until_classify": max(0, CEFR_CLASSIFY_EVERY - user_message_count),
            "classify_every": CEFR_CLASSIFY_EVERY,
            "min_chars": CEFR_MIN_CHARS,
            "current_chars": len(" ".join(user_message_buffer)),
        },
        "vram_gb": f"{allocated:.1f}",
        "websocket": "/ws/stream",
        "auto_stop": {
            "enabled": auto_stop_enabled,
            "timeout_seconds": IDLE_TIMEOUT_SECONDS,
            "idle_seconds": int(elapsed),
            "stop_in_seconds": int(remaining),
        }
    }


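This endpoint is also what the checkpoint scripts below poll before dumping. A minimal client-side check might look like this, assuming the backend is reachable on localhost:8000 (adjust host/port as needed):

```python
import requests

health = requests.get("http://localhost:8000/health", timeout=5).json()
print(health["status"], health["models"]["llm"])
print("all loaded:", all(health[k] for k in
      ("whisper_loaded", "vllm_loaded", "kokoro_loaded", "cefr_loaded")))
```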
class CEFRClassifyRequest(BaseModel):
|
| 961 |
+
"""Request para classificação CEFR manual"""
|
| 962 |
+
text: str
|
| 963 |
+
|
| 964 |
+
|
| 965 |
+
@app.post("/api/cefr/classify")
|
| 966 |
+
async def api_cefr_classify(request: CEFRClassifyRequest):
|
| 967 |
+
"""
|
| 968 |
+
Classifica manualmente o nível CEFR de um texto.
|
| 969 |
+
Não afeta o nível atual da sessão.
|
| 970 |
+
"""
|
| 971 |
+
touch_activity()
|
| 972 |
+
|
| 973 |
+
level, confidence, probs = classify_cefr(request.text)
|
| 974 |
+
|
| 975 |
+
return {
|
| 976 |
+
"level": level,
|
| 977 |
+
"confidence": confidence,
|
| 978 |
+
"probabilities": probs,
|
| 979 |
+
"text_length": len(request.text),
|
| 980 |
+
}
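A minimal client call for manual classification might look like the sketch below; the host and the sample sentence are assumptions for the example, and the call does not change the session's current level:

```python
import requests

resp = requests.post(
    "http://localhost:8000/api/cefr/classify",
    json={"text": "Eu gostaria de discutir as nuances da política externa brasileira."},
    timeout=10,
)
data = resp.json()
print(data["level"], data["confidence"])
```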
|
| 981 |
+
|
| 982 |
+
|
| 983 |
+
@app.get("/api/cefr/status")
|
| 984 |
+
async def api_cefr_status():
|
| 985 |
+
"""Retorna o status atual do CEFR"""
|
| 986 |
+
global current_cefr_level, user_message_count, user_message_buffer
|
| 987 |
+
|
| 988 |
+
current_chars = len(" ".join(user_message_buffer))
|
| 989 |
+
return {
|
| 990 |
+
"current_level": current_cefr_level,
|
| 991 |
+
"message_count": user_message_count,
|
| 992 |
+
"messages_until_classify": max(0, CEFR_CLASSIFY_EVERY - user_message_count),
|
| 993 |
+
"buffer_size": len(user_message_buffer),
|
| 994 |
+
"current_chars": current_chars,
|
| 995 |
+
"min_chars": CEFR_MIN_CHARS,
|
| 996 |
+
"chars_until_classify": max(0, CEFR_MIN_CHARS - current_chars),
|
| 997 |
+
"ready_to_classify": user_message_count >= CEFR_CLASSIFY_EVERY and current_chars >= CEFR_MIN_CHARS,
|
| 998 |
+
}
|
| 999 |
+
|
| 1000 |
+
|
| 1001 |
+
@app.post("/api/cefr/reset")
|
| 1002 |
+
async def api_cefr_reset():
|
| 1003 |
+
"""Reseta o nível CEFR para o padrão (B1)"""
|
| 1004 |
+
global current_cefr_level, user_message_count, user_message_buffer
|
| 1005 |
+
|
| 1006 |
+
old_level = current_cefr_level
|
| 1007 |
+
current_cefr_level = "B1"
|
| 1008 |
+
user_message_count = 0
|
| 1009 |
+
user_message_buffer = []
|
| 1010 |
+
|
| 1011 |
+
return {
|
| 1012 |
+
"status": "reset",
|
| 1013 |
+
"old_level": old_level,
|
| 1014 |
+
"new_level": current_cefr_level,
|
| 1015 |
+
}
|
| 1016 |
+
|
| 1017 |
+
|
| 1018 |
+
@app.post("/api/cefr/set")
|
| 1019 |
+
async def api_cefr_set(level: str = Form(...)):
|
| 1020 |
+
"""Define manualmente o nível CEFR"""
|
| 1021 |
+
global current_cefr_level
|
| 1022 |
+
|
| 1023 |
+
if level not in CEFR_LEVELS:
|
| 1024 |
+
raise HTTPException(status_code=400, detail=f"Invalid level. Must be one of: {CEFR_LEVELS}")
|
| 1025 |
+
|
| 1026 |
+
old_level = current_cefr_level
|
| 1027 |
+
current_cefr_level = level
|
| 1028 |
+
|
| 1029 |
+
return {
|
| 1030 |
+
"status": "set",
|
| 1031 |
+
"old_level": old_level,
|
| 1032 |
+
"new_level": current_cefr_level,
|
| 1033 |
+
}
|
| 1034 |
+
|
| 1035 |
+
|
| 1036 |
+
@app.post("/chat")
|
| 1037 |
+
async def chat(
|
| 1038 |
+
message: str = Form(...),
|
| 1039 |
+
mode: str = Form("chat"),
|
| 1040 |
+
):
|
| 1041 |
+
"""Text-only chat"""
|
| 1042 |
+
touch_activity()
|
| 1043 |
+
|
| 1044 |
+
response, llm_ms = generate_response(message, mode)
|
| 1045 |
+
|
| 1046 |
+
return {
|
| 1047 |
+
"response": response,
|
| 1048 |
+
"provider": "tensordock",
|
| 1049 |
+
"model": LLM_MODEL,
|
| 1050 |
+
"inference_ms": llm_ms,
|
| 1051 |
+
}
|
| 1052 |
+
|
| 1053 |
+
|
| 1054 |
+
@app.post("/process-audio")
|
| 1055 |
+
async def process_audio(
|
| 1056 |
+
audio: UploadFile = File(...),
|
| 1057 |
+
mode: str = Form("chat"),
|
| 1058 |
+
):
|
| 1059 |
+
"""Full pipeline: Audio -> STT -> LLM -> TTS -> Audio"""
|
| 1060 |
+
touch_activity()
|
| 1061 |
+
|
| 1062 |
+
overall_start = time.time()
|
| 1063 |
+
|
| 1064 |
+
try:
|
| 1065 |
+
audio_data = await audio.read()
|
| 1066 |
+
|
| 1067 |
+
# 1. STT (retorna dict com métricas)
|
| 1068 |
+
stt_result = transcribe_audio(audio_data)
|
| 1069 |
+
transcript = stt_result["transcript"]
|
| 1070 |
+
stt_ms = stt_result["stt_ms"]
|
| 1071 |
+
|
| 1072 |
+
# 2. LLM
|
| 1073 |
+
response, llm_ms = generate_response(transcript, mode)
|
| 1074 |
+
|
| 1075 |
+
# 3. TTS
|
| 1076 |
+
audio_bytes, tts_ms = synthesize_audio(response)
|
| 1077 |
+
|
| 1078 |
+
total_ms = int((time.time() - overall_start) * 1000)
|
| 1079 |
+
|
| 1080 |
+
return JSONResponse({
|
| 1081 |
+
"transcript": transcript,
|
| 1082 |
+
"response": response,
|
| 1083 |
+
"audio": base64.b64encode(audio_bytes).decode('utf-8'),
|
| 1084 |
+
"timing": {
|
| 1085 |
+
"stt_ms": stt_ms,
|
| 1086 |
+
"llm_ms": llm_ms,
|
| 1087 |
+
"tts_ms": tts_ms,
|
| 1088 |
+
"total_ms": total_ms,
|
| 1089 |
+
},
|
| 1090 |
+
"model": LLM_MODEL,
|
| 1091 |
+
"speech_metrics": stt_result["speech_metrics"],
|
| 1092 |
+
})
|
| 1093 |
+
|
| 1094 |
+
except Exception as e:
|
| 1095 |
+
print(f"[ERROR] {e}")
|
| 1096 |
+
import traceback
|
| 1097 |
+
traceback.print_exc()
|
| 1098 |
+
raise HTTPException(status_code=500, detail=str(e))
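Since /process-audio takes a multipart upload and returns the synthesized reply as base64, a client round-trip can be sketched as below. The file names are examples only:

```python
import base64
import requests

# Hypothetical client call: send a recorded WAV and save the synthesized reply
with open("pergunta.wav", "rb") as f:          # example input file
    r = requests.post(
        "http://localhost:8000/process-audio",
        files={"audio": f},
        data={"mode": "chat"},
        timeout=120,
    )
result = r.json()
print(result["transcript"], "->", result["response"], result["timing"])
with open("resposta.wav", "wb") as out:        # example output file
    out.write(base64.b64decode(result["audio"]))
```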
|
| 1099 |
+
|
| 1100 |
+
|
| 1101 |
+
@app.post("/keep-alive")
|
| 1102 |
+
def keep_alive():
|
| 1103 |
+
"""Reset idle timer without doing inference"""
|
| 1104 |
+
touch_activity()
|
| 1105 |
+
return {"status": "ok", "message": "Timer reset"}
|
| 1106 |
+
|
| 1107 |
+
|
| 1108 |
+
@app.post("/clear")
|
| 1109 |
+
def clear_history():
|
| 1110 |
+
"""Clear conversation history and reset CEFR"""
|
| 1111 |
+
global conversation_history, current_cefr_level, user_message_count, user_message_buffer
|
| 1112 |
+
|
| 1113 |
+
conversation_history = []
|
| 1114 |
+
current_cefr_level = "B1" # Reset to default
|
| 1115 |
+
user_message_count = 0
|
| 1116 |
+
user_message_buffer = []
|
| 1117 |
+
|
| 1118 |
+
touch_activity()
|
| 1119 |
+
return {"status": "cleared", "cefr_reset": True, "cefr_level": current_cefr_level}
|
| 1120 |
+
|
| 1121 |
+
|
| 1122 |
+
# ============================================================================
|
| 1123 |
+
# FRONTEND-COMPATIBLE API ENDPOINTS
|
| 1124 |
+
# ============================================================================
|
| 1125 |
+
|
| 1126 |
+
@app.post("/api/audio")
|
| 1127 |
+
async def api_audio(request: AudioRequest):
|
| 1128 |
+
"""
|
| 1129 |
+
Frontend-compatible audio endpoint.
|
| 1130 |
+
Accepts JSON with base64 audio, returns response in expected format.
|
| 1131 |
+
Includes speech metrics and suggested speed for adaptive avatar.
|
| 1132 |
+
"""
|
| 1133 |
+
touch_activity()
|
| 1134 |
+
overall_start = time.time()
|
| 1135 |
+
|
| 1136 |
+
try:
|
| 1137 |
+
# Decode base64 audio
|
| 1138 |
+
audio_data = base64.b64decode(request.audio)
|
| 1139 |
+
print(f"[API] Received audio: {len(audio_data)} bytes, mode: {request.mode}, lang: {request.language}, student: {request.student_name}")
|
| 1140 |
+
|
| 1141 |
+
# 1. STT - Forçar idioma para evitar confusão (retorna dict com métricas)
|
| 1142 |
+
stt_result = transcribe_audio(audio_data, language=request.language)
|
| 1143 |
+
transcript = stt_result["transcript"]
|
| 1144 |
+
stt_ms = stt_result["stt_ms"]
|
| 1145 |
+
speech_metrics = stt_result["speech_metrics"]
|
| 1146 |
+
|
| 1147 |
+
# 2. Calcular velocidade sugerida baseada em CEFR + WPM do aluno
|
| 1148 |
+
student_wpm = speech_metrics.get("wpm", 0)
|
| 1149 |
+
suggested_speed = calculate_suggested_speed(
|
| 1150 |
+
current_cefr_level,
|
| 1151 |
+
student_wpm,
|
| 1152 |
+
manual_speed=request.speed_rate # None se não definido manualmente
|
| 1153 |
+
)
|
| 1154 |
+
|
| 1155 |
+
# 3. LLM - Passar parâmetros customizados se fornecidos pelo frontend
|
| 1156 |
+
response_text, llm_ms = generate_response(
|
| 1157 |
+
transcript,
|
| 1158 |
+
request.mode,
|
| 1159 |
+
student_name=request.student_name,
|
| 1160 |
+
custom_system_prompt=request.system_prompt,
|
| 1161 |
+
custom_max_tokens=request.max_tokens,
|
| 1162 |
+
custom_temperature=request.temperature
|
| 1163 |
+
)
|
| 1164 |
+
|
| 1165 |
+
# 4. TTS
|
| 1166 |
+
audio_bytes, tts_ms = synthesize_audio(response_text)
|
| 1167 |
+
audio_duration = len(audio_bytes) / (24000 * 2) # Approximate duration
|
| 1168 |
+
|
| 1169 |
+
total_ms = int((time.time() - overall_start) * 1000)
|
| 1170 |
+
|
| 1171 |
+
# Return in frontend-expected format with speech metrics and suggested speed
|
| 1172 |
+
return JSONResponse({
|
| 1173 |
+
"transcription": {
|
| 1174 |
+
"text": transcript,
|
| 1175 |
+
"language": request.language,
|
| 1176 |
+
"confidence": 1.0,
|
| 1177 |
+
},
|
| 1178 |
+
"response": {
|
| 1179 |
+
"text": response_text,
|
| 1180 |
+
"emotion": "neutral",
|
| 1181 |
+
"language": request.language,
|
| 1182 |
+
},
|
| 1183 |
+
"speech": {
|
| 1184 |
+
"audio": base64.b64encode(audio_bytes).decode('utf-8'),
|
| 1185 |
+
"visemes": [], # Visemes not implemented yet
|
| 1186 |
+
"duration": audio_duration,
|
| 1187 |
+
"sample_rate": 24000,
|
| 1188 |
+
"format": "wav",
|
| 1189 |
+
},
|
| 1190 |
+
"timing": {
|
| 1191 |
+
"stt_ms": stt_ms,
|
| 1192 |
+
"llm_ms": llm_ms,
|
| 1193 |
+
"tts_ms": tts_ms,
|
| 1194 |
+
"total_ms": total_ms,
|
| 1195 |
+
},
|
| 1196 |
+
"cefr": {
|
| 1197 |
+
"current_level": current_cefr_level,
|
| 1198 |
+
"messages_until_classify": CEFR_CLASSIFY_EVERY - user_message_count,
|
| 1199 |
+
},
|
| 1200 |
+
# Novas métricas para velocidade adaptativa
|
| 1201 |
+
"speech_metrics": speech_metrics,
|
| 1202 |
+
"adaptive_speed": {
|
| 1203 |
+
"suggested_speed": suggested_speed,
|
| 1204 |
+
"student_wpm": student_wpm,
|
| 1205 |
+
"speed_mode": "manual" if request.speed_rate else "auto",
|
| 1206 |
+
},
|
| 1207 |
+
})
|
| 1208 |
+
|
| 1209 |
+
except Exception as e:
|
| 1210 |
+
print(f"[API ERROR] {e}")
|
| 1211 |
+
import traceback
|
| 1212 |
+
traceback.print_exc()
|
| 1213 |
+
raise HTTPException(status_code=500, detail=str(e))
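For the frontend-compatible JSON endpoint, the audio is sent inline as base64 rather than as a multipart file. A sketch of such a payload follows; the field values are examples, and the field names mirror how the request object is read above:

```python
import base64
import requests

# Hypothetical payload for /api/audio; values are examples only
audio_b64 = base64.b64encode(open("pergunta.wav", "rb").read()).decode("utf-8")
payload = {
    "audio": audio_b64,
    "mode": "chat",
    "language": "pt",
    "student_name": "Maria",
}
data = requests.post("http://localhost:8000/api/audio", json=payload, timeout=120).json()
print(data["transcription"]["text"])
print(data["adaptive_speed"])
```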
|
| 1214 |
+
|
| 1215 |
+
|
| 1216 |
+
@app.post("/api/text")
|
| 1217 |
+
async def api_text(request: TextRequest):
|
| 1218 |
+
"""
|
| 1219 |
+
Frontend-compatible text endpoint.
|
| 1220 |
+
Accepts JSON with text, returns LLM response with TTS audio.
|
| 1221 |
+
"""
|
| 1222 |
+
touch_activity()
|
| 1223 |
+
overall_start = time.time()
|
| 1224 |
+
|
| 1225 |
+
try:
|
| 1226 |
+
print(f"[API] Received text: '{request.text[:50]}...', mode: {request.mode}, student: {request.student_name}")
|
| 1227 |
+
|
| 1228 |
+
# 1. LLM - Passar nome do aluno e parâmetros customizados do frontend
|
| 1229 |
+
response_text, llm_ms = generate_response(
|
| 1230 |
+
request.text,
|
| 1231 |
+
request.mode,
|
| 1232 |
+
student_name=request.student_name,
|
| 1233 |
+
custom_system_prompt=request.system_prompt,
|
| 1234 |
+
custom_max_tokens=request.max_tokens,
|
| 1235 |
+
custom_temperature=request.temperature
|
| 1236 |
+
)
|
| 1237 |
+
|
| 1238 |
+
# 2. TTS
|
| 1239 |
+
audio_bytes, tts_ms = synthesize_audio(response_text)
|
| 1240 |
+
audio_duration = len(audio_bytes) / (24000 * 2) # Approximate duration
|
| 1241 |
+
|
| 1242 |
+
total_ms = int((time.time() - overall_start) * 1000)
|
| 1243 |
+
|
| 1244 |
+
# Return in frontend-expected format
|
| 1245 |
+
return JSONResponse({
|
| 1246 |
+
"response": {
|
| 1247 |
+
"text": response_text,
|
| 1248 |
+
"emotion": "neutral",
|
| 1249 |
+
"language": request.language,
|
| 1250 |
+
},
|
| 1251 |
+
"speech": {
|
| 1252 |
+
"audio": base64.b64encode(audio_bytes).decode('utf-8'),
|
| 1253 |
+
"visemes": [], # Visemes not implemented yet
|
| 1254 |
+
"duration": audio_duration,
|
| 1255 |
+
"sample_rate": 24000,
|
| 1256 |
+
"format": "wav",
|
| 1257 |
+
},
|
| 1258 |
+
"timing": {
|
| 1259 |
+
"llm_ms": llm_ms,
|
| 1260 |
+
"tts_ms": tts_ms,
|
| 1261 |
+
"total_ms": total_ms,
|
| 1262 |
+
},
|
| 1263 |
+
"cefr": {
|
| 1264 |
+
"current_level": current_cefr_level,
|
| 1265 |
+
"messages_until_classify": CEFR_CLASSIFY_EVERY - user_message_count,
|
| 1266 |
+
},
|
| 1267 |
+
})
|
| 1268 |
+
|
| 1269 |
+
except Exception as e:
|
| 1270 |
+
print(f"[API ERROR] {e}")
|
| 1271 |
+
import traceback
|
| 1272 |
+
traceback.print_exc()
|
| 1273 |
+
raise HTTPException(status_code=500, detail=str(e))
|
| 1274 |
+
|
| 1275 |
+
|
| 1276 |
+
@app.post("/api/reset")
|
| 1277 |
+
async def api_reset():
|
| 1278 |
+
"""Reset conversation history and CEFR - frontend compatible"""
|
| 1279 |
+
global conversation_history, current_cefr_level, user_message_count, user_message_buffer
|
| 1280 |
+
|
| 1281 |
+
conversation_history = []
|
| 1282 |
+
current_cefr_level = "B1"
|
| 1283 |
+
user_message_count = 0
|
| 1284 |
+
user_message_buffer = []
|
| 1285 |
+
|
| 1286 |
+
touch_activity()
|
| 1287 |
+
return {"status": "ok", "cefr_level": current_cefr_level}
|
| 1288 |
+
|
| 1289 |
+
|
| 1290 |
+
# ============================================================================
|
| 1291 |
+
# WEBSOCKET ENDPOINT - Streaming Audio
|
| 1292 |
+
# ============================================================================
|
| 1293 |
+
|
| 1294 |
+
@app.websocket("/ws/stream")
|
| 1295 |
+
async def websocket_stream(websocket: WebSocket):
|
| 1296 |
+
"""
|
| 1297 |
+
WebSocket para streaming de áudio bidirecional.
|
| 1298 |
+
|
| 1299 |
+
Protocolo:
|
| 1300 |
+
1. Cliente envia áudio (binary) ou JSON com config
|
| 1301 |
+
2. Servidor envia chunks de áudio de resposta (binary)
|
| 1302 |
+
3. Servidor envia métricas no final (JSON)
|
| 1303 |
+
|
| 1304 |
+
Exemplo JavaScript:
|
| 1305 |
+
```javascript
|
| 1306 |
+
const ws = new WebSocket('ws://host:8000/ws/stream');
|
| 1307 |
+
|
| 1308 |
+
ws.onmessage = (event) => {
|
| 1309 |
+
if (event.data instanceof Blob) {
|
| 1310 |
+
// Chunk de áudio WAV - tocar imediatamente
|
| 1311 |
+
playAudioChunk(event.data);
|
| 1312 |
+
} else {
|
| 1313 |
+
// JSON com métricas ou status
|
| 1314 |
+
const data = JSON.parse(event.data);
|
| 1315 |
+
console.log('Metrics:', data);
|
| 1316 |
+
}
|
| 1317 |
+
};
|
| 1318 |
+
|
| 1319 |
+
// Enviar áudio gravado
|
| 1320 |
+
ws.send(audioBlob);
|
| 1321 |
+
```
|
| 1322 |
+
"""
|
| 1323 |
+
await websocket.accept()
|
| 1324 |
+
print("[WS] Client connected")
|
| 1325 |
+
|
| 1326 |
+
try:
|
| 1327 |
+
while True:
|
| 1328 |
+
touch_activity()
|
| 1329 |
+
|
| 1330 |
+
# Receber dados do cliente
|
| 1331 |
+
data = await websocket.receive()
|
| 1332 |
+
|
| 1333 |
+
if "bytes" in data:
|
| 1334 |
+
# Áudio binary
|
| 1335 |
+
audio_data = data["bytes"]
|
| 1336 |
+
overall_start = time.time()
|
| 1337 |
+
|
| 1338 |
+
# Enviar status de processamento
|
| 1339 |
+
await websocket.send_json({"status": "processing", "stage": "stt"})
|
| 1340 |
+
|
| 1341 |
+
# 1. STT (transcribe_audio returns a dict with speech metrics, as in /process-audio)
stt_result = transcribe_audio(audio_data)
transcript = stt_result["transcript"]
stt_ms = stt_result["stt_ms"]
|
| 1343 |
+
await websocket.send_json({
|
| 1344 |
+
"status": "processing",
|
| 1345 |
+
"stage": "llm",
|
| 1346 |
+
"transcript": transcript,
|
| 1347 |
+
"stt_ms": stt_ms
|
| 1348 |
+
})
|
| 1349 |
+
|
| 1350 |
+
# 2. LLM
|
| 1351 |
+
response, llm_ms = generate_response(transcript)
|
| 1352 |
+
await websocket.send_json({
|
| 1353 |
+
"status": "processing",
|
| 1354 |
+
"stage": "tts",
|
| 1355 |
+
"response": response,
|
| 1356 |
+
"llm_ms": llm_ms
|
| 1357 |
+
})
|
| 1358 |
+
|
| 1359 |
+
# 3. TTS - enviar áudio
|
| 1360 |
+
tts_start = time.time()
|
| 1361 |
+
|
| 1362 |
+
import numpy as np
|
| 1363 |
+
import soundfile as sf
|
| 1364 |
+
|
| 1365 |
+
audio_chunks = []
|
| 1366 |
+
chunk_count = 0
|
| 1367 |
+
|
| 1368 |
+
# Remove emojis antes do TTS
|
| 1369 |
+
clean_response = remove_emojis(response)
|
| 1370 |
+
for gs, ps, audio_chunk in kokoro_pipeline(clean_response, voice='af_bella'):
|
| 1371 |
+
if audio_chunk is not None and len(audio_chunk) > 0:
|
| 1372 |
+
audio_chunks.append(audio_chunk)
|
| 1373 |
+
chunk_count += 1
|
| 1374 |
+
|
| 1375 |
+
# Enviar cada chunk como WAV
|
| 1376 |
+
buffer = io.BytesIO()
|
| 1377 |
+
sf.write(buffer, audio_chunk, 24000, format='WAV')
|
| 1378 |
+
await websocket.send_bytes(buffer.getvalue())
|
| 1379 |
+
|
| 1380 |
+
tts_ms = int((time.time() - tts_start) * 1000)
|
| 1381 |
+
total_ms = int((time.time() - overall_start) * 1000)
|
| 1382 |
+
|
| 1383 |
+
# Enviar métricas finais
|
| 1384 |
+
await websocket.send_json({
|
| 1385 |
+
"status": "complete",
|
| 1386 |
+
"transcript": transcript,
|
| 1387 |
+
"response": response,
|
| 1388 |
+
"timing": {
|
| 1389 |
+
"stt_ms": stt_ms,
|
| 1390 |
+
"llm_ms": llm_ms,
|
| 1391 |
+
"tts_ms": tts_ms,
|
| 1392 |
+
"total_ms": total_ms,
|
| 1393 |
+
},
|
| 1394 |
+
"chunks_sent": chunk_count,
|
| 1395 |
+
"model": LLM_MODEL,
|
| 1396 |
+
})
|
| 1397 |
+
|
| 1398 |
+
print(f"[WS] Complete: STT={stt_ms}ms, LLM={llm_ms}ms, TTS={tts_ms}ms, Total={total_ms}ms")
|
| 1399 |
+
|
| 1400 |
+
elif "text" in data:
|
| 1401 |
+
# JSON text (config ou texto para TTS)
|
| 1402 |
+
try:
|
| 1403 |
+
msg = json.loads(data["text"])
|
| 1404 |
+
|
| 1405 |
+
if msg.get("type") == "ping":
|
| 1406 |
+
await websocket.send_json({"type": "pong"})
|
| 1407 |
+
|
| 1408 |
+
elif msg.get("type") == "text":
|
| 1409 |
+
# Chat de texto com TTS
|
| 1410 |
+
text = msg.get("message", "")
|
| 1411 |
+
mode = msg.get("mode", "chat")
|
| 1412 |
+
|
| 1413 |
+
overall_start = time.time()
|
| 1414 |
+
|
| 1415 |
+
# LLM
|
| 1416 |
+
response, llm_ms = generate_response(text, mode)
|
| 1417 |
+
|
| 1418 |
+
# TTS streaming
|
| 1419 |
+
tts_start = time.time()
|
| 1420 |
+
chunk_count = 0
|
| 1421 |
+
|
| 1422 |
+
import numpy as np
|
| 1423 |
+
import soundfile as sf
|
| 1424 |
+
|
| 1425 |
+
# Remove emojis antes do TTS
|
| 1426 |
+
clean_response = remove_emojis(response)
|
| 1427 |
+
for gs, ps, audio_chunk in kokoro_pipeline(clean_response, voice='af_bella'):
|
| 1428 |
+
if audio_chunk is not None and len(audio_chunk) > 0:
|
| 1429 |
+
chunk_count += 1
|
| 1430 |
+
buffer = io.BytesIO()
|
| 1431 |
+
sf.write(buffer, audio_chunk, 24000, format='WAV')
|
| 1432 |
+
await websocket.send_bytes(buffer.getvalue())
|
| 1433 |
+
|
| 1434 |
+
tts_ms = int((time.time() - tts_start) * 1000)
|
| 1435 |
+
total_ms = int((time.time() - overall_start) * 1000)
|
| 1436 |
+
|
| 1437 |
+
await websocket.send_json({
|
| 1438 |
+
"status": "complete",
|
| 1439 |
+
"response": response,
|
| 1440 |
+
"timing": {
|
| 1441 |
+
"llm_ms": llm_ms,
|
| 1442 |
+
"tts_ms": tts_ms,
|
| 1443 |
+
"total_ms": total_ms,
|
| 1444 |
+
},
|
| 1445 |
+
"chunks_sent": chunk_count,
|
| 1446 |
+
})
|
| 1447 |
+
|
| 1448 |
+
elif msg.get("type") == "tts":
|
| 1449 |
+
# TTS apenas (sem LLM)
|
| 1450 |
+
text = msg.get("text", "")
|
| 1451 |
+
|
| 1452 |
+
tts_start = time.time()
|
| 1453 |
+
chunk_count = 0
|
| 1454 |
+
|
| 1455 |
+
import numpy as np
|
| 1456 |
+
import soundfile as sf
|
| 1457 |
+
|
| 1458 |
+
# Remove emojis antes do TTS
|
| 1459 |
+
clean_text = remove_emojis(text)
|
| 1460 |
+
for gs, ps, audio_chunk in kokoro_pipeline(clean_text, voice='af_bella'):
|
| 1461 |
+
if audio_chunk is not None and len(audio_chunk) > 0:
|
| 1462 |
+
chunk_count += 1
|
| 1463 |
+
buffer = io.BytesIO()
|
| 1464 |
+
sf.write(buffer, audio_chunk, 24000, format='WAV')
|
| 1465 |
+
await websocket.send_bytes(buffer.getvalue())
|
| 1466 |
+
|
| 1467 |
+
tts_ms = int((time.time() - tts_start) * 1000)
|
| 1468 |
+
|
| 1469 |
+
await websocket.send_json({
|
| 1470 |
+
"status": "complete",
|
| 1471 |
+
"timing": {"tts_ms": tts_ms},
|
| 1472 |
+
"chunks_sent": chunk_count,
|
| 1473 |
+
})
|
| 1474 |
+
|
| 1475 |
+
except json.JSONDecodeError:
|
| 1476 |
+
await websocket.send_json({"error": "Invalid JSON"})
|
| 1477 |
+
|
| 1478 |
+
except WebSocketDisconnect:
|
| 1479 |
+
print("[WS] Client disconnected")
|
| 1480 |
+
except Exception as e:
|
| 1481 |
+
print(f"[WS] Error: {e}")
|
| 1482 |
+
import traceback
|
| 1483 |
+
traceback.print_exc()
|
| 1484 |
+
try:
|
| 1485 |
+
await websocket.send_json({"error": str(e)})
|
| 1486 |
+
except:
|
| 1487 |
+
pass
|
| 1488 |
+
finally:
|
| 1489 |
+
try:
|
| 1490 |
+
await websocket.close()
|
| 1491 |
+
except:
|
| 1492 |
+
pass
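On the client side, the same protocol can be exercised from Python with the `websockets` package already listed in requirements.txt. The sketch below is an assumption-laden example (host, port, and file name are placeholders), mirroring the JavaScript snippet in the docstring:

```python
# Minimal asyncio client sketch for /ws/stream
import asyncio
import json
import websockets

async def talk(path="pergunta.wav"):
    async with websockets.connect("ws://localhost:8000/ws/stream") as ws:
        await ws.send(open(path, "rb").read())      # binary audio in
        wav_chunks = []
        while True:
            msg = await ws.recv()
            if isinstance(msg, bytes):              # each chunk is a standalone WAV buffer
                wav_chunks.append(msg)
            else:                                   # JSON status / metrics
                info = json.loads(msg)
                print(info)
                if info.get("status") == "complete":
                    break
        print(f"received {len(wav_chunks)} WAV chunks")

asyncio.run(talk())
```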
|
| 1493 |
+
|
| 1494 |
+
|
| 1495 |
+
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
|
cefr/.gitkeep
ADDED
|
@@ -0,0 +1,2 @@
# CEFR Classifier Model
# HuggingFace: marcosremar2/cefr-classifier-pt-mdeberta-v3-enem
checkpoint.sh
ADDED
|
@@ -0,0 +1,126 @@
| 1 |
+
#!/bin/bash
|
| 2 |
+
# Create checkpoint of PARLE backend with all models loaded
|
| 3 |
+
# Requires: patched CRIU (criu-patched), io_uring disabled
|
| 4 |
+
#
|
| 5 |
+
# Usage: ./checkpoint.sh [--stop]
|
| 6 |
+
# --stop: Stop the process after checkpoint (default: keep running)
|
| 7 |
+
|
| 8 |
+
set -e
|
| 9 |
+
|
| 10 |
+
CHECKPOINT_DIR="/var/lib/parle-checkpoints"
|
| 11 |
+
CHECKPOINT_NAME="parle-$(date +%Y%m%d-%H%M%S)"
|
| 12 |
+
CHECKPOINT_PATH="$CHECKPOINT_DIR/$CHECKPOINT_NAME"
|
| 13 |
+
LATEST_LINK="$CHECKPOINT_DIR/latest"
|
| 14 |
+
LEAVE_RUNNING="--leave-running"
|
| 15 |
+
|
| 16 |
+
# Parse arguments
|
| 17 |
+
if [ "$1" = "--stop" ]; then
|
| 18 |
+
LEAVE_RUNNING=""
|
| 19 |
+
echo "Will STOP process after checkpoint"
|
| 20 |
+
fi
|
| 21 |
+
|
| 22 |
+
echo "=============================================="
|
| 23 |
+
echo "PARLE Backend Checkpoint"
|
| 24 |
+
echo "=============================================="
|
| 25 |
+
|
| 26 |
+
# Find Python process
|
| 27 |
+
PYTHON_PID=$(pgrep -f "python.*app.py" | head -1)
|
| 28 |
+
if [ -z "$PYTHON_PID" ]; then
|
| 29 |
+
echo "ERROR: No Python backend process found"
|
| 30 |
+
echo "Start the backend first with: ./start.sh"
|
| 31 |
+
exit 1
|
| 32 |
+
fi
|
| 33 |
+
echo "Found backend process: PID $PYTHON_PID"
|
| 34 |
+
|
| 35 |
+
# Check health
|
| 36 |
+
echo ""
|
| 37 |
+
echo "[1/3] Checking backend health..."
|
| 38 |
+
HEALTH=$(curl -s --max-time 5 localhost:8000/health 2>/dev/null)
|
| 39 |
+
if [ -z "$HEALTH" ]; then
|
| 40 |
+
echo "ERROR: Backend not responding to health check"
|
| 41 |
+
exit 1
|
| 42 |
+
fi
|
| 43 |
+
|
| 44 |
+
VLLM=$(echo "$HEALTH" | grep -o '"vllm_loaded":true' || true)
|
| 45 |
+
WHISPER=$(echo "$HEALTH" | grep -o '"whisper_loaded":true' || true)
|
| 46 |
+
KOKORO=$(echo "$HEALTH" | grep -o '"kokoro_loaded":true' || true)
|
| 47 |
+
|
| 48 |
+
if [ -z "$VLLM" ] || [ -z "$WHISPER" ] || [ -z "$KOKORO" ]; then
|
| 49 |
+
echo "ERROR: Not all models are loaded yet"
|
| 50 |
+
echo "Wait for all models to load before checkpointing"
|
| 51 |
+
echo "Health: $HEALTH"
|
| 52 |
+
exit 1
|
| 53 |
+
fi
|
| 54 |
+
echo "All models loaded!"
|
| 55 |
+
|
| 56 |
+
# Check io_uring is disabled
|
| 57 |
+
echo ""
|
| 58 |
+
echo "[2/3] Checking system configuration..."
|
| 59 |
+
IO_URING=$(cat /proc/sys/kernel/io_uring_disabled 2>/dev/null || echo "unknown")
|
| 60 |
+
if [ "$IO_URING" != "2" ]; then
|
| 61 |
+
echo "WARNING: io_uring not disabled (value: $IO_URING)"
|
| 62 |
+
echo "Run: sudo sysctl -w kernel.io_uring_disabled=2"
|
| 63 |
+
echo "Continuing anyway..."
|
| 64 |
+
fi
|
| 65 |
+
|
| 66 |
+
# Check CRIU
|
| 67 |
+
if [ ! -f /usr/local/bin/criu-patched ]; then
|
| 68 |
+
echo "ERROR: Patched CRIU not found at /usr/local/bin/criu-patched"
|
| 69 |
+
echo "Run setup-criu-patched.sh first"
|
| 70 |
+
exit 1
|
| 71 |
+
fi
|
| 72 |
+
echo "Patched CRIU found"
|
| 73 |
+
|
| 74 |
+
# Create checkpoint directory
|
| 75 |
+
mkdir -p "$CHECKPOINT_PATH"
|
| 76 |
+
|
| 77 |
+
echo ""
|
| 78 |
+
echo "[3/3] Creating checkpoint..."
|
| 79 |
+
echo "Path: $CHECKPOINT_PATH"
|
| 80 |
+
echo "This may take 30-60 seconds..."
|
| 81 |
+
echo ""
|
| 82 |
+
|
| 83 |
+
START_TIME=$(date +%s)
|
| 84 |
+
|
| 85 |
+
# Run CRIU checkpoint
|
| 86 |
+
CRIU_PLUGINS_DIR=/usr/lib/criu /usr/local/bin/criu-patched dump \
|
| 87 |
+
-t $PYTHON_PID \
|
| 88 |
+
-D "$CHECKPOINT_PATH" \
|
| 89 |
+
--shell-job \
|
| 90 |
+
--tcp-established \
|
| 91 |
+
--file-locks \
|
| 92 |
+
--ext-unix-sk \
|
| 93 |
+
$LEAVE_RUNNING \
|
| 94 |
+
-v2 \
|
| 95 |
+
-o "$CHECKPOINT_PATH/dump.log" 2>&1 || {
|
| 96 |
+
echo ""
|
| 97 |
+
echo "ERROR: CRIU dump failed"
|
| 98 |
+
echo "Check log: $CHECKPOINT_PATH/dump.log"
|
| 99 |
+
tail -20 "$CHECKPOINT_PATH/dump.log"
|
| 100 |
+
exit 1
|
| 101 |
+
}
|
| 102 |
+
|
| 103 |
+
END_TIME=$(date +%s)
|
| 104 |
+
DURATION=$((END_TIME - START_TIME))
|
| 105 |
+
|
| 106 |
+
# Update latest symlink
|
| 107 |
+
rm -f "$LATEST_LINK"
|
| 108 |
+
ln -s "$CHECKPOINT_PATH" "$LATEST_LINK"
|
| 109 |
+
|
| 110 |
+
# Get checkpoint size
|
| 111 |
+
SIZE=$(du -sh "$CHECKPOINT_PATH" | cut -f1)
|
| 112 |
+
|
| 113 |
+
echo ""
|
| 114 |
+
echo "=============================================="
|
| 115 |
+
echo "Checkpoint created successfully!"
|
| 116 |
+
echo "=============================================="
|
| 117 |
+
echo "Path: $CHECKPOINT_PATH"
|
| 118 |
+
echo "Size: $SIZE"
|
| 119 |
+
echo "Time: ${DURATION}s"
|
| 120 |
+
echo "Symlink: $LATEST_LINK"
|
| 121 |
+
echo ""
|
| 122 |
+
if [ -z "$LEAVE_RUNNING" ]; then
|
| 123 |
+
echo "Process was STOPPED. To restore: ./restore.sh"
|
| 124 |
+
else
|
| 125 |
+
echo "Process is still running. To restore later: ./restore.sh"
|
| 126 |
+
fi
|
deploy-criu.sh
ADDED
|
@@ -0,0 +1,69 @@
| 1 |
+
#!/bin/bash
|
| 2 |
+
# Deploy CRIU + cuda-checkpoint to TensorDock
|
| 3 |
+
# This script copies the necessary files to the server and sets up CRIU
|
| 4 |
+
|
| 5 |
+
set -e
|
| 6 |
+
|
| 7 |
+
# Configuration
|
| 8 |
+
SERVER="8.17.147.158"
|
| 9 |
+
SSH_PORT="10038"
|
| 10 |
+
SSH_USER="root"
|
| 11 |
+
REMOTE_DIR="/home/user/parle-backend"
|
| 12 |
+
|
| 13 |
+
echo "=================================================="
|
| 14 |
+
echo "Deploying CRIU + cuda-checkpoint to TensorDock"
|
| 15 |
+
echo "=================================================="
|
| 16 |
+
echo "Server: $SERVER:$SSH_PORT"
|
| 17 |
+
echo ""
|
| 18 |
+
|
| 19 |
+
# Check if we can connect
|
| 20 |
+
echo "[1/4] Testing SSH connection..."
|
| 21 |
+
ssh -o StrictHostKeyChecking=no -o ConnectTimeout=10 -p $SSH_PORT $SSH_USER@$SERVER "echo 'SSH connection OK'" || {
|
| 22 |
+
echo "ERROR: Cannot connect to server via SSH"
|
| 23 |
+
echo ""
|
| 24 |
+
echo "Manual steps:"
|
| 25 |
+
echo "1. SSH into the server: ssh -p $SSH_PORT $SSH_USER@$SERVER"
|
| 26 |
+
echo "2. Copy these scripts to /home/user/parle-backend/"
|
| 27 |
+
echo "3. Run: sudo ./setup-criu.sh"
|
| 28 |
+
echo "4. Test: ./start-smart.sh"
|
| 29 |
+
exit 1
|
| 30 |
+
}
|
| 31 |
+
|
| 32 |
+
# Copy scripts
|
| 33 |
+
echo ""
|
| 34 |
+
echo "[2/4] Copying scripts to server..."
|
| 35 |
+
SCRIPT_DIR="$(dirname "$0")"
|
| 36 |
+
scp -P $SSH_PORT \
|
| 37 |
+
"$SCRIPT_DIR/setup-criu.sh" \
|
| 38 |
+
"$SCRIPT_DIR/checkpoint.sh" \
|
| 39 |
+
"$SCRIPT_DIR/restore.sh" \
|
| 40 |
+
"$SCRIPT_DIR/start-smart.sh" \
|
| 41 |
+
"$SCRIPT_DIR/start.sh" \
|
| 42 |
+
"$SCRIPT_DIR/app.py" \
|
| 43 |
+
$SSH_USER@$SERVER:$REMOTE_DIR/
|
| 44 |
+
|
| 45 |
+
echo "Scripts copied successfully"
|
| 46 |
+
|
| 47 |
+
# Run setup
|
| 48 |
+
echo ""
|
| 49 |
+
echo "[3/4] Running CRIU setup on server..."
|
| 50 |
+
ssh -p $SSH_PORT $SSH_USER@$SERVER "cd $REMOTE_DIR && chmod +x *.sh && sudo ./setup-criu.sh"
|
| 51 |
+
|
| 52 |
+
# Test
|
| 53 |
+
echo ""
|
| 54 |
+
echo "[4/4] Testing installation..."
|
| 55 |
+
ssh -p $SSH_PORT $SSH_USER@$SERVER "cuda-checkpoint --help > /dev/null && echo 'cuda-checkpoint: OK' || echo 'cuda-checkpoint: FAILED'"
|
| 56 |
+
ssh -p $SSH_PORT $SSH_USER@$SERVER "criu --version"
|
| 57 |
+
|
| 58 |
+
echo ""
|
| 59 |
+
echo "=================================================="
|
| 60 |
+
echo "Deployment complete!"
|
| 61 |
+
echo "=================================================="
|
| 62 |
+
echo ""
|
| 63 |
+
echo "Next steps:"
|
| 64 |
+
echo "1. SSH into server: ssh -p $SSH_PORT $SSH_USER@$SERVER"
|
| 65 |
+
echo "2. Start backend: cd $REMOTE_DIR && ./start.sh"
|
| 66 |
+
echo "3. Wait for models to load (~2 min)"
|
| 67 |
+
echo "4. Create checkpoint: ./checkpoint.sh"
|
| 68 |
+
echo "5. Test restore: ./restore.sh"
|
| 69 |
+
echo ""
|
fast_startup.py
ADDED
|
@@ -0,0 +1,409 @@
| 1 |
+
"""
|
| 2 |
+
Fast Startup Module - Otimizacoes para Cold Start Rapido
|
| 3 |
+
|
| 4 |
+
Estrategias implementadas:
|
| 5 |
+
1. fastsafetensors - Loading 4.8x-7.5x mais rapido
|
| 6 |
+
2. CUDA Graph caching - Economiza ~54s
|
| 7 |
+
3. Parallel model loading - Carrega modelos simultaneamente
|
| 8 |
+
4. Lazy loading - CEFR classifier carrega depois
|
| 9 |
+
5. Pre-download models - Cache local no SSD
|
| 10 |
+
|
| 11 |
+
Target: Cold start de ~487s para ~60s
|
| 12 |
+
"""
|
| 13 |
+
|
| 14 |
+
import os
|
| 15 |
+
import sys
|
| 16 |
+
import time
|
| 17 |
+
import asyncio
|
| 18 |
+
import threading
|
| 19 |
+
from concurrent.futures import ThreadPoolExecutor
|
| 20 |
+
from typing import Optional, Callable
|
| 21 |
+
from dataclasses import dataclass
|
| 22 |
+
|
| 23 |
+
# Environment variables para otimizacao
|
| 24 |
+
os.environ["USE_FASTSAFETENSOR"] = "true" # Enable fastsafetensors
|
| 25 |
+
os.environ["VLLM_USE_MODELSCOPE"] = "false"
|
| 26 |
+
os.environ["TOKENIZERS_PARALLELISM"] = "false" # Avoid warnings
|
| 27 |
+
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True" # Better memory
|
| 28 |
+
|
| 29 |
+
# Cache directories
|
| 30 |
+
CACHE_DIR = "/var/cache/parle-models"
|
| 31 |
+
VLLM_CACHE_DIR = f"{CACHE_DIR}/vllm"
|
| 32 |
+
HF_CACHE_DIR = f"{CACHE_DIR}/huggingface"
|
| 33 |
+
|
| 34 |
+
os.environ["HF_HOME"] = HF_CACHE_DIR
|
| 35 |
+
os.environ["VLLM_CACHE_DIR"] = VLLM_CACHE_DIR
|
| 36 |
+
|
| 37 |
+
|
| 38 |
+
@dataclass
|
| 39 |
+
class LoadingMetrics:
|
| 40 |
+
"""Metricas de carregamento"""
|
| 41 |
+
vllm_ms: int = 0
|
| 42 |
+
whisper_ms: int = 0
|
| 43 |
+
cefr_ms: int = 0
|
| 44 |
+
kokoro_ms: int = 0
|
| 45 |
+
total_ms: int = 0
|
| 46 |
+
parallel: bool = False
|
| 47 |
+
|
| 48 |
+
|
| 49 |
+
class FastModelLoader:
|
| 50 |
+
"""
|
| 51 |
+
Carregador otimizado de modelos com:
|
| 52 |
+
- Parallel loading
|
| 53 |
+
- Progress callbacks
|
| 54 |
+
- Lazy loading para modelos secundarios
|
| 55 |
+
"""
|
| 56 |
+
|
| 57 |
+
def __init__(
|
| 58 |
+
self,
|
| 59 |
+
vllm_model: str = "RedHatAI/gemma-3-4b-it-quantized.w4a16",
|
| 60 |
+
whisper_model: str = "openai/whisper-small",
|
| 61 |
+
cefr_model: str = "marcosremar2/cefr-classifier-pt-mdeberta-v3-enem",
|
| 62 |
+
gpu_memory_utilization: float = 0.40,
|
| 63 |
+
on_progress: Optional[Callable[[str, float], None]] = None,
|
| 64 |
+
):
|
| 65 |
+
self.vllm_model = vllm_model
|
| 66 |
+
self.whisper_model = whisper_model
|
| 67 |
+
self.cefr_model = cefr_model
|
| 68 |
+
self.gpu_memory_utilization = gpu_memory_utilization
|
| 69 |
+
self.on_progress = on_progress
|
| 70 |
+
|
| 71 |
+
# Model instances
|
| 72 |
+
self.vllm_engine = None
|
| 73 |
+
self.whisper_model_instance = None
|
| 74 |
+
self.whisper_processor = None
|
| 75 |
+
self.cefr_model_instance = None
|
| 76 |
+
self.cefr_tokenizer = None
|
| 77 |
+
self.kokoro_pipeline = None
|
| 78 |
+
|
| 79 |
+
# Loading state
|
| 80 |
+
self.metrics = LoadingMetrics()
|
| 81 |
+
self._loading_lock = threading.Lock()
|
| 82 |
+
|
| 83 |
+
def _progress(self, message: str, percentage: float):
|
| 84 |
+
"""Report progress"""
|
| 85 |
+
print(f"[{percentage:.0f}%] {message}")
|
| 86 |
+
if self.on_progress:
|
| 87 |
+
self.on_progress(message, percentage)
|
| 88 |
+
|
| 89 |
+
def _ensure_cache_dirs(self):
|
| 90 |
+
"""Criar diretorios de cache"""
|
| 91 |
+
os.makedirs(CACHE_DIR, exist_ok=True)
|
| 92 |
+
os.makedirs(VLLM_CACHE_DIR, exist_ok=True)
|
| 93 |
+
os.makedirs(HF_CACHE_DIR, exist_ok=True)
|
| 94 |
+
|
| 95 |
+
def load_vllm(self) -> int:
|
| 96 |
+
"""
|
| 97 |
+
Carrega vLLM com otimizacoes:
|
| 98 |
+
- fastsafetensors (se disponivel)
|
| 99 |
+
- load_format="auto" (detecta melhor formato)
|
| 100 |
+
- CUDA graph caching
|
| 101 |
+
"""
|
| 102 |
+
start = time.time()
|
| 103 |
+
self._progress("Loading vLLM (optimized)...", 10)
|
| 104 |
+
|
| 105 |
+
from vllm import LLM
|
| 106 |
+
|
| 107 |
+
# Check if fastsafetensors is available
|
| 108 |
+
try:
|
| 109 |
+
import fastsafetensors
|
| 110 |
+
load_format = "fastsafetensors"
|
| 111 |
+
self._progress("Using fastsafetensors (4-7x faster)", 12)
|
| 112 |
+
except ImportError:
|
| 113 |
+
load_format = "auto"
|
| 114 |
+
self._progress("fastsafetensors not found, using auto", 12)
|
| 115 |
+
|
| 116 |
+
self.vllm_engine = LLM(
|
| 117 |
+
model=self.vllm_model,
|
| 118 |
+
dtype="auto",
|
| 119 |
+
gpu_memory_utilization=self.gpu_memory_utilization,
|
| 120 |
+
max_model_len=2048,
|
| 121 |
+
trust_remote_code=True,
|
| 122 |
+
# Otimizacoes de loading
|
| 123 |
+
load_format=load_format,
|
| 124 |
+
# CUDA graph optimization
|
| 125 |
+
enforce_eager=False, # Enable CUDA graphs
|
| 126 |
+
# Disable unnecessary features for faster startup
|
| 127 |
+
enable_prefix_caching=False,
|
| 128 |
+
disable_custom_all_reduce=True,
|
| 129 |
+
)
|
| 130 |
+
|
| 131 |
+
elapsed_ms = int((time.time() - start) * 1000)
|
| 132 |
+
self.metrics.vllm_ms = elapsed_ms
|
| 133 |
+
self._progress(f"vLLM loaded in {elapsed_ms/1000:.1f}s", 40)
|
| 134 |
+
|
| 135 |
+
return elapsed_ms
|
| 136 |
+
|
| 137 |
+
def load_whisper(self) -> int:
|
| 138 |
+
"""Carrega Whisper STT"""
|
| 139 |
+
start = time.time()
|
| 140 |
+
self._progress("Loading Whisper STT...", 45)
|
| 141 |
+
|
| 142 |
+
import torch
|
| 143 |
+
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
|
| 144 |
+
|
| 145 |
+
self.whisper_processor = AutoProcessor.from_pretrained(
|
| 146 |
+
self.whisper_model,
|
| 147 |
+
cache_dir=HF_CACHE_DIR,
|
| 148 |
+
)
|
| 149 |
+
|
| 150 |
+
self.whisper_model_instance = AutoModelForSpeechSeq2Seq.from_pretrained(
|
| 151 |
+
self.whisper_model,
|
| 152 |
+
torch_dtype=torch.float16,
|
| 153 |
+
low_cpu_mem_usage=True,
|
| 154 |
+
cache_dir=HF_CACHE_DIR,
|
| 155 |
+
).to("cuda")
|
| 156 |
+
|
| 157 |
+
elapsed_ms = int((time.time() - start) * 1000)
|
| 158 |
+
self.metrics.whisper_ms = elapsed_ms
|
| 159 |
+
self._progress(f"Whisper loaded in {elapsed_ms/1000:.1f}s", 60)
|
| 160 |
+
|
| 161 |
+
return elapsed_ms
|
| 162 |
+
|
| 163 |
+
def load_kokoro(self) -> int:
|
| 164 |
+
"""Carrega Kokoro TTS"""
|
| 165 |
+
start = time.time()
|
| 166 |
+
self._progress("Loading Kokoro TTS...", 65)
|
| 167 |
+
|
| 168 |
+
from kokoro import KPipeline
|
| 169 |
+
|
| 170 |
+
self.kokoro_pipeline = KPipeline(lang_code='p', device='cuda')
|
| 171 |
+
|
| 172 |
+
elapsed_ms = int((time.time() - start) * 1000)
|
| 173 |
+
self.metrics.kokoro_ms = elapsed_ms
|
| 174 |
+
self._progress(f"Kokoro loaded in {elapsed_ms/1000:.1f}s", 80)
|
| 175 |
+
|
| 176 |
+
return elapsed_ms
|
| 177 |
+
|
| 178 |
+
def load_cefr(self) -> int:
|
| 179 |
+
"""Carrega CEFR Classifier (pode ser lazy)"""
|
| 180 |
+
start = time.time()
|
| 181 |
+
self._progress("Loading CEFR Classifier...", 85)
|
| 182 |
+
|
| 183 |
+
import torch
|
| 184 |
+
from transformers import AutoTokenizer, AutoModelForSequenceClassification
|
| 185 |
+
|
| 186 |
+
self.cefr_tokenizer = AutoTokenizer.from_pretrained(
|
| 187 |
+
self.cefr_model,
|
| 188 |
+
cache_dir=HF_CACHE_DIR,
|
| 189 |
+
)
|
| 190 |
+
|
| 191 |
+
self.cefr_model_instance = AutoModelForSequenceClassification.from_pretrained(
|
| 192 |
+
self.cefr_model,
|
| 193 |
+
torch_dtype=torch.float16,
|
| 194 |
+
low_cpu_mem_usage=True,
|
| 195 |
+
cache_dir=HF_CACHE_DIR,
|
| 196 |
+
).to("cuda")
|
| 197 |
+
self.cefr_model_instance.eval()
|
| 198 |
+
|
| 199 |
+
elapsed_ms = int((time.time() - start) * 1000)
|
| 200 |
+
self.metrics.cefr_ms = elapsed_ms
|
| 201 |
+
self._progress(f"CEFR loaded in {elapsed_ms/1000:.1f}s", 95)
|
| 202 |
+
|
| 203 |
+
return elapsed_ms
|
| 204 |
+
|
| 205 |
+
def load_all_sequential(self) -> LoadingMetrics:
|
| 206 |
+
"""Carrega todos os modelos sequencialmente"""
|
| 207 |
+
overall_start = time.time()
|
| 208 |
+
self._ensure_cache_dirs()
|
| 209 |
+
|
| 210 |
+
self._progress("Starting sequential model loading...", 0)
|
| 211 |
+
|
| 212 |
+
# Order: vLLM first (needs contiguous memory)
|
| 213 |
+
self.load_vllm()
|
| 214 |
+
self.load_whisper()
|
| 215 |
+
self.load_kokoro()
|
| 216 |
+
self.load_cefr()
|
| 217 |
+
|
| 218 |
+
self.metrics.total_ms = int((time.time() - overall_start) * 1000)
|
| 219 |
+
self.metrics.parallel = False
|
| 220 |
+
|
| 221 |
+
self._progress(f"All models loaded in {self.metrics.total_ms/1000:.1f}s", 100)
|
| 222 |
+
|
| 223 |
+
return self.metrics
|
| 224 |
+
|
| 225 |
+
def load_all_parallel(self) -> LoadingMetrics:
|
| 226 |
+
"""
|
| 227 |
+
Carrega modelos em paralelo onde possivel.
|
| 228 |
+
|
| 229 |
+
Ordem otimizada:
|
| 230 |
+
1. vLLM primeiro (precisa de memoria contigua)
|
| 231 |
+
2. Whisper + Kokoro em paralelo
|
| 232 |
+
3. CEFR lazy (carrega em background depois)
|
| 233 |
+
"""
|
| 234 |
+
overall_start = time.time()
|
| 235 |
+
self._ensure_cache_dirs()
|
| 236 |
+
|
| 237 |
+
self._progress("Starting optimized parallel loading...", 0)
|
| 238 |
+
|
| 239 |
+
# Step 1: vLLM first (needs contiguous GPU memory)
|
| 240 |
+
self.load_vllm()
|
| 241 |
+
|
| 242 |
+
# Step 2: Whisper + Kokoro in parallel
|
| 243 |
+
self._progress("Loading Whisper + Kokoro in parallel...", 45)
|
| 244 |
+
|
| 245 |
+
with ThreadPoolExecutor(max_workers=2) as executor:
|
| 246 |
+
whisper_future = executor.submit(self.load_whisper)
|
| 247 |
+
kokoro_future = executor.submit(self.load_kokoro)
|
| 248 |
+
|
| 249 |
+
whisper_future.result()
|
| 250 |
+
kokoro_future.result()
|
| 251 |
+
|
| 252 |
+
# Step 3: CEFR (can be lazy loaded later)
|
| 253 |
+
self.load_cefr()
|
| 254 |
+
|
| 255 |
+
self.metrics.total_ms = int((time.time() - overall_start) * 1000)
|
| 256 |
+
self.metrics.parallel = True
|
| 257 |
+
|
| 258 |
+
self._progress(f"All models loaded in {self.metrics.total_ms/1000:.1f}s (parallel)", 100)
|
| 259 |
+
|
| 260 |
+
return self.metrics
|
| 261 |
+
|
| 262 |
+
def load_essential_only(self) -> LoadingMetrics:
|
| 263 |
+
"""
|
| 264 |
+
Carrega apenas modelos essenciais para responder rapidamente.
|
| 265 |
+
CEFR eh carregado em background.
|
| 266 |
+
|
| 267 |
+
Tempo estimado: ~50-70% do tempo total
|
| 268 |
+
"""
|
| 269 |
+
overall_start = time.time()
|
| 270 |
+
self._ensure_cache_dirs()
|
| 271 |
+
|
| 272 |
+
self._progress("Loading essential models only...", 0)
|
| 273 |
+
|
| 274 |
+
# Essential models
|
| 275 |
+
self.load_vllm()
|
| 276 |
+
|
| 277 |
+
with ThreadPoolExecutor(max_workers=2) as executor:
|
| 278 |
+
whisper_future = executor.submit(self.load_whisper)
|
| 279 |
+
kokoro_future = executor.submit(self.load_kokoro)
|
| 280 |
+
|
| 281 |
+
whisper_future.result()
|
| 282 |
+
kokoro_future.result()
|
| 283 |
+
|
| 284 |
+
self.metrics.total_ms = int((time.time() - overall_start) * 1000)
|
| 285 |
+
self.metrics.parallel = True
|
| 286 |
+
|
| 287 |
+
self._progress(f"Essential models loaded in {self.metrics.total_ms/1000:.1f}s", 90)
|
| 288 |
+
|
| 289 |
+
# Start CEFR loading in background
|
| 290 |
+
self._progress("Starting CEFR background loading...", 92)
|
| 291 |
+
threading.Thread(target=self._load_cefr_background, daemon=True).start()
|
| 292 |
+
|
| 293 |
+
return self.metrics
|
| 294 |
+
|
| 295 |
+
def _load_cefr_background(self):
|
| 296 |
+
"""Carrega CEFR em background"""
|
| 297 |
+
try:
|
| 298 |
+
self.load_cefr()
|
| 299 |
+
print("[BACKGROUND] CEFR classifier loaded!")
|
| 300 |
+
except Exception as e:
|
| 301 |
+
print(f"[BACKGROUND] Failed to load CEFR: {e}")
|
| 302 |
+
|
| 303 |
+
def is_ready(self) -> bool:
|
| 304 |
+
"""Verifica se modelos essenciais estao prontos"""
|
| 305 |
+
return (
|
| 306 |
+
self.vllm_engine is not None and
|
| 307 |
+
self.whisper_model_instance is not None and
|
| 308 |
+
self.kokoro_pipeline is not None
|
| 309 |
+
)
|
| 310 |
+
|
| 311 |
+
def is_fully_ready(self) -> bool:
|
| 312 |
+
"""Verifica se todos os modelos estao prontos"""
|
| 313 |
+
return self.is_ready() and self.cefr_model_instance is not None
|
| 314 |
+
|
| 315 |
+
|
| 316 |
+
def predownload_models(
|
| 317 |
+
vllm_model: str = "RedHatAI/gemma-3-4b-it-quantized.w4a16",
|
| 318 |
+
whisper_model: str = "openai/whisper-small",
|
| 319 |
+
cefr_model: str = "marcosremar2/cefr-classifier-pt-mdeberta-v3-enem",
|
| 320 |
+
):
|
| 321 |
+
"""
|
| 322 |
+
Pre-download models to local cache.
|
| 323 |
+
Run this during VM setup, not during cold start.
|
| 324 |
+
"""
|
| 325 |
+
print("=" * 60)
|
| 326 |
+
print("Pre-downloading models to local cache...")
|
| 327 |
+
print("=" * 60)
|
| 328 |
+
|
| 329 |
+
os.makedirs(HF_CACHE_DIR, exist_ok=True)
|
| 330 |
+
|
| 331 |
+
from huggingface_hub import snapshot_download
|
| 332 |
+
|
| 333 |
+
models = [
|
| 334 |
+
(vllm_model, "vLLM"),
|
| 335 |
+
(whisper_model, "Whisper"),
|
| 336 |
+
(cefr_model, "CEFR"),
|
| 337 |
+
]
|
| 338 |
+
|
| 339 |
+
for model_id, name in models:
|
| 340 |
+
print(f"\n[{name}] Downloading {model_id}...")
|
| 341 |
+
start = time.time()
|
| 342 |
+
|
| 343 |
+
try:
|
| 344 |
+
snapshot_download(
|
| 345 |
+
model_id,
|
| 346 |
+
cache_dir=HF_CACHE_DIR,
|
| 347 |
+
local_dir_use_symlinks=False,
|
| 348 |
+
)
|
| 349 |
+
elapsed = time.time() - start
|
| 350 |
+
print(f"[{name}] Downloaded in {elapsed:.1f}s")
|
| 351 |
+
except Exception as e:
|
| 352 |
+
print(f"[{name}] Error: {e}")
|
| 353 |
+
|
| 354 |
+
print("\n" + "=" * 60)
|
| 355 |
+
print("Pre-download complete!")
|
| 356 |
+
print("=" * 60)
|
| 357 |
+
|
| 358 |
+
|
| 359 |
+
def install_fastsafetensors():
|
| 360 |
+
"""Instala fastsafetensors para loading 4-7x mais rapido"""
|
| 361 |
+
import subprocess
|
| 362 |
+
|
| 363 |
+
print("Installing fastsafetensors...")
|
| 364 |
+
result = subprocess.run(
|
| 365 |
+
[sys.executable, "-m", "pip", "install", "fastsafetensors"],
|
| 366 |
+
capture_output=True,
|
| 367 |
+
text=True,
|
| 368 |
+
)
|
| 369 |
+
|
| 370 |
+
if result.returncode == 0:
|
| 371 |
+
print("fastsafetensors installed successfully!")
|
| 372 |
+
else:
|
| 373 |
+
print(f"Failed to install fastsafetensors: {result.stderr}")
|
| 374 |
+
|
| 375 |
+
|
| 376 |
+
if __name__ == "__main__":
|
| 377 |
+
import argparse
|
| 378 |
+
|
| 379 |
+
parser = argparse.ArgumentParser(description="Fast Model Loader")
|
| 380 |
+
parser.add_argument("--predownload", action="store_true", help="Pre-download models")
|
| 381 |
+
parser.add_argument("--install-fast", action="store_true", help="Install fastsafetensors")
|
| 382 |
+
parser.add_argument("--test-load", action="store_true", help="Test model loading")
|
| 383 |
+
parser.add_argument("--parallel", action="store_true", help="Use parallel loading")
|
| 384 |
+
|
| 385 |
+
args = parser.parse_args()
|
| 386 |
+
|
| 387 |
+
if args.install_fast:
|
| 388 |
+
install_fastsafetensors()
|
| 389 |
+
|
| 390 |
+
if args.predownload:
|
| 391 |
+
predownload_models()
|
| 392 |
+
|
| 393 |
+
if args.test_load:
|
| 394 |
+
loader = FastModelLoader()
|
| 395 |
+
|
| 396 |
+
if args.parallel:
|
| 397 |
+
metrics = loader.load_all_parallel()
|
| 398 |
+
else:
|
| 399 |
+
metrics = loader.load_all_sequential()
|
| 400 |
+
|
| 401 |
+
print("\n" + "=" * 60)
|
| 402 |
+
print("Loading Metrics:")
|
| 403 |
+
print(f" vLLM: {metrics.vllm_ms/1000:.1f}s")
|
| 404 |
+
print(f" Whisper: {metrics.whisper_ms/1000:.1f}s")
|
| 405 |
+
print(f" Kokoro: {metrics.kokoro_ms/1000:.1f}s")
|
| 406 |
+
print(f" CEFR: {metrics.cefr_ms/1000:.1f}s")
|
| 407 |
+
print(f" TOTAL: {metrics.total_ms/1000:.1f}s")
|
| 408 |
+
print(f" Parallel: {metrics.parallel}")
|
| 409 |
+
print("=" * 60)
|
llm/.gitkeep
ADDED
|
@@ -0,0 +1,2 @@
# Gemma LLM Model
# HuggingFace: RedHatAI/gemma-3-4b-it-quantized.w4a16
models/cefr/.gitkeep
ADDED
|
@@ -0,0 +1,2 @@
# CEFR Classifier Model
# HuggingFace: marcosremar2/cefr-classifier-pt-mdeberta-v3-enem
models/llm/.gitkeep
ADDED
|
@@ -0,0 +1,2 @@
# Gemma LLM Model
# HuggingFace: RedHatAI/gemma-3-4b-it-quantized.w4a16
models/stt/.gitkeep
ADDED
|
@@ -0,0 +1,2 @@
# Whisper STT Model
# HuggingFace: openai/whisper-small
models/tts/.gitkeep
ADDED
|
@@ -0,0 +1,2 @@
# Kokoro TTS Model
# HuggingFace: hexgrad/Kokoro-82M
requirements.txt
ADDED
|
@@ -0,0 +1,27 @@
# v5-tensordock-websocket requirements
# For RTX 3090 (24GB VRAM)

# Web framework
fastapi>=0.104.0
uvicorn[standard]>=0.24.0
pydantic>=2.0.0
websockets>=12.0

# ML/AI
torch>=2.1.0
transformers>=4.36.0
vllm>=0.2.7

# Audio processing
soundfile>=0.12.0
librosa>=0.10.0
numpy>=1.24.0

# Voice Activity Detection for WPM calculation
pyannote-audio>=3.1.0

# TTS
kokoro>=0.1.0

# HTTP client (for TensorDock API)
requests>=2.31.0
restore.sh
ADDED
|
@@ -0,0 +1,108 @@
#!/bin/bash
# Restore PARLE backend from checkpoint
# Fast startup path - restores pre-loaded models from checkpoint
#
# Requires: patched CRIU (criu-patched), io_uring disabled
# Usage: ./restore.sh [checkpoint-path]

set -e

CHECKPOINT_DIR="/var/lib/parle-checkpoints"
CHECKPOINT_PATH="${1:-$CHECKPOINT_DIR/latest}"

echo "=============================================="
echo "PARLE Backend Restore"
echo "=============================================="

# Check if checkpoint exists
if [ ! -d "$CHECKPOINT_PATH" ] && [ ! -L "$CHECKPOINT_PATH" ]; then
    echo "ERROR: Checkpoint not found: $CHECKPOINT_PATH"
    echo ""
    echo "Available checkpoints:"
    ls -la "$CHECKPOINT_DIR" 2>/dev/null || echo "  (none)"
    echo ""
    echo "To create a checkpoint:"
    echo "  1. Start normally: ./start.sh"
    echo "  2. Wait for models to load (~45s)"
    echo "  3. Create checkpoint: ./checkpoint.sh"
    exit 1
fi

# Resolve symlink if needed
if [ -L "$CHECKPOINT_PATH" ]; then
    CHECKPOINT_PATH=$(readlink -f "$CHECKPOINT_PATH")
fi

echo "Checkpoint: $CHECKPOINT_PATH"
echo "Size: $(du -sh "$CHECKPOINT_PATH" | cut -f1)"

# Check if another instance is running
if pgrep -f "python.*app.py" > /dev/null; then
    echo ""
    echo "WARNING: Backend already running"
    echo "Kill it first: pkill -9 -f 'python.*app.py'"
    exit 1
fi

# Check CRIU
if [ ! -f /usr/local/bin/criu-patched ]; then
    echo "ERROR: Patched CRIU not found at /usr/local/bin/criu-patched"
    echo "Run setup-criu-patched.sh first"
    exit 1
fi

# Change to the correct directory
cd /home/user

echo ""
echo "Restoring from checkpoint..."
START_TIME=$(date +%s)

# Restore with patched CRIU (runs in background)
CRIU_PLUGINS_DIR=/usr/lib/criu /usr/local/bin/criu-patched restore \
    -D "$CHECKPOINT_PATH" \
    --shell-job \
    --tcp-established \
    --file-locks \
    --ext-unix-sk \
    -v0 \
    -o "$CHECKPOINT_PATH/restore.log" 2>/dev/null &

RESTORE_PID=$!

# Wait for backend to be ready
echo "Waiting for backend health..."
for i in {1..60}; do
    HEALTH=$(curl -s --max-time 2 http://localhost:8000/health 2>/dev/null)
    if [ ! -z "$HEALTH" ]; then
        END_TIME=$(date +%s)
        DURATION=$((END_TIME - START_TIME))

        # Get process info
        PYTHON_PID=$(pgrep -f "python.*app.py" | head -1)

        echo ""
        echo "=============================================="
        echo "Backend restored successfully!"
        echo "=============================================="
        echo "Restore time: ${DURATION}s"
        echo "Process PID: $PYTHON_PID"
        echo ""
        echo "Health check:"
        echo "$HEALTH" | python3 -c "import sys,json; d=json.load(sys.stdin); print(f'  Status: {d[\"status\"]}'); print(f'  vLLM: {d[\"vllm_loaded\"]}'); print(f'  Whisper: {d[\"whisper_loaded\"]}'); print(f'  Kokoro: {d[\"kokoro_loaded\"]}')" 2>/dev/null || echo "$HEALTH"
        echo ""
        echo "Backend ready at http://localhost:8000"
        exit 0
    fi

    if [ $((i % 10)) -eq 0 ]; then
        echo "  Still waiting... ($i/60s)"
    fi
    sleep 1
done

echo ""
echo "ERROR: Backend did not respond within 60 seconds"
echo "Check restore log: $CHECKPOINT_PATH/restore.log"
tail -20 "$CHECKPOINT_PATH/restore.log" 2>/dev/null
exit 1
setup-criu-patched.sh
ADDED
@@ -0,0 +1,57 @@
#!/bin/bash
# Setup patched CRIU for PyTorch checkpoint/restore on TensorDock
# This script compiles CRIU with patches to skip unsupported nvidia device FDs

set -e

echo "=============================================="
echo "Setting up patched CRIU for PyTorch C/R"
echo "=============================================="

# Install dependencies
echo "[1/5] Installing build dependencies..."
apt-get update
apt-get install -y build-essential pkg-config libprotobuf-dev libprotobuf-c-dev \
    protobuf-c-compiler protobuf-compiler python3-protobuf libbsd-dev \
    libcap-dev libnl-3-dev libnet1-dev libaio-dev libgnutls28-dev \
    python3-future asciidoc xmlto git

# Clone CRIU
echo "[2/5] Cloning CRIU..."
cd /tmp
rm -rf criu-patched
git clone --depth 1 https://github.com/checkpoint-restore/criu.git criu-patched
cd criu-patched

# Apply patch to files-ext.c (skip unsupported FDs during dump)
echo "[3/5] Applying dump patch..."
perl -i -0pe 's/(int dump_unsupp_fd.*?if \(ret == -ENOTSUP\))\s*pr_err\("Can.t dump file.*?\n\s*return -1;/$1 {\n\t\tpr_warn("Skipping file %d of that type [%o] (%s %s)\\n", p->fd, p->stat.st_mode, more, info);\n\t\treturn 0; \/\/ PATCHED: skip unsupported FDs\n\t}\n\treturn -1;/s' criu/files-ext.c

# Apply patch to files.c (skip missing FDs during restore)
echo "[4/5] Applying restore patch..."
perl -i -0pe 's/(fdesc = find_file_desc\(e\);\s*if \(fdesc == NULL\) \{)\s*pr_err\("No file for fd.*?\n\s*return -1;/$1\n\t\tpr_warn("No file for fd %d id %#x, skipping (PATCHED)\\n", e->fd, e->id);\n\t\treturn 0; \/\/ PATCHED: skip missing FDs/s' criu/files.c

# Build
echo "[5/5] Building patched CRIU..."
make -j$(nproc)

# Install
cp criu/criu /usr/local/bin/criu-patched
mkdir -p /usr/lib/criu
cp plugins/cuda/cuda_plugin.so /usr/lib/criu/

# Verify
echo ""
echo "=============================================="
echo "Patched CRIU installed!"
echo "=============================================="
/usr/local/bin/criu-patched --version

# Setup io_uring disable (persists across reboots)
echo ""
echo "Disabling io_uring at kernel level..."
sysctl -w kernel.io_uring_disabled=2
echo "kernel.io_uring_disabled=2" >> /etc/sysctl.conf

echo ""
echo "Setup complete! Run checkpoint.sh after models are loaded."
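Because the perl one-liners match against upstream CRIU source that can drift over time, it helps to confirm the `PATCHED` markers actually landed before building. A minimal check, assuming the script's working directory of `/tmp/criu-patched`:

```bash
# Both patches insert a "PATCHED" comment; grep should report one hit per file
cd /tmp/criu-patched
grep -n "PATCHED" criu/files-ext.c criu/files.c \
  || { echo "Patch did not apply - inspect the perl regexes"; exit 1; }
```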
setup-criu.sh
ADDED
@@ -0,0 +1,91 @@
#!/bin/bash
# CRIU + cuda-checkpoint Setup Script for TensorDock
# Run this once on a fresh VM to install all dependencies

set -e

echo "=================================================="
echo "Setting up CRIU + cuda-checkpoint for fast restore"
echo "=================================================="

# Check if running as root
if [ "$EUID" -ne 0 ]; then
    echo "Please run as root (sudo ./setup-criu.sh)"
    exit 1
fi

# Check NVIDIA driver version (needs 550+)
echo ""
echo "[1/5] Checking NVIDIA driver version..."
DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -1)
MAJOR_VERSION=$(echo $DRIVER_VERSION | cut -d'.' -f1)

echo "Driver version: $DRIVER_VERSION"

if [ "$MAJOR_VERSION" -lt 550 ]; then
    echo "ERROR: NVIDIA driver 550+ required for cuda-checkpoint"
    echo "Current version: $DRIVER_VERSION"
    echo ""
    echo "To upgrade driver:"
    echo "  sudo apt-get update"
    echo "  sudo apt-get install nvidia-driver-550"
    exit 1
fi

echo "Driver version OK!"

# Install CRIU
echo ""
echo "[2/5] Installing CRIU..."
apt-get update
apt-get install -y criu

# Verify CRIU installation
CRIU_VERSION=$(criu --version | head -1)
echo "CRIU installed: $CRIU_VERSION"

# Clone cuda-checkpoint
echo ""
echo "[3/5] Setting up cuda-checkpoint..."
CUDA_CHECKPOINT_DIR="/opt/cuda-checkpoint"

if [ -d "$CUDA_CHECKPOINT_DIR" ]; then
    echo "cuda-checkpoint already exists, updating..."
    cd "$CUDA_CHECKPOINT_DIR"
    git pull
else
    git clone https://github.com/NVIDIA/cuda-checkpoint.git "$CUDA_CHECKPOINT_DIR"
fi

# Create symlink for easy access
ln -sf "$CUDA_CHECKPOINT_DIR/bin/cuda-checkpoint" /usr/local/bin/cuda-checkpoint
chmod +x /usr/local/bin/cuda-checkpoint

echo "cuda-checkpoint installed at /usr/local/bin/cuda-checkpoint"

# Create checkpoint directory
echo ""
echo "[4/5] Creating checkpoint directory..."
CHECKPOINT_DIR="/var/lib/parle-checkpoints"
mkdir -p "$CHECKPOINT_DIR"
chmod 755 "$CHECKPOINT_DIR"

echo "Checkpoint directory: $CHECKPOINT_DIR"

# Test cuda-checkpoint
echo ""
echo "[5/5] Testing cuda-checkpoint..."
cuda-checkpoint --help > /dev/null 2>&1 && echo "cuda-checkpoint: OK" || echo "cuda-checkpoint: FAILED"
criu check > /dev/null 2>&1 && echo "CRIU check: OK" || echo "CRIU check: WARNING (some features may not work)"

echo ""
echo "=================================================="
echo "Setup complete!"
echo "=================================================="
echo ""
echo "Next steps:"
echo "1. Start the backend normally: ./start.sh"
echo "2. Wait for models to load (~2 min)"
echo "3. Create checkpoint: ./checkpoint.sh"
echo "4. Next time, restore: ./restore.sh (should be ~5-10s)"
echo ""
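Once the backend is running, a quick sanity check is to ask cuda-checkpoint about the live CUDA process. The flag names below are taken from NVIDIA's cuda-checkpoint README (`--get-state`/`--toggle`) and should be verified against the installed binary with `cuda-checkpoint --help` before relying on them; this is a sketch, not part of the setup flow above.

```bash
# Sketch: confirm cuda-checkpoint can see the backend's CUDA state
PID=$(pgrep -f "python.*app.py" | head -1)
# Flag names assumed from NVIDIA's README; check `cuda-checkpoint --help` first
cuda-checkpoint --get-state --pid "$PID"
```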
setup-fast-coldstart.sh
ADDED
@@ -0,0 +1,131 @@
#!/bin/bash
# =============================================================================
# FAST COLD START SETUP
# =============================================================================
# This script prepares the TensorDock VM for fast cold starts (~60s vs ~487s)
#
# Optimizations:
#   1. Pre-download models to the local SSD
#   2. Install fastsafetensors (4-7x faster loading)
#   3. Configure CUDA graph caching
#   4. Configure optimized environment variables
#
# Usage: ./setup-fast-coldstart.sh
# =============================================================================

set -e

echo "=============================================="
echo "FAST COLD START SETUP"
echo "=============================================="

# Directories
CACHE_DIR="/var/cache/parle-models"
HF_CACHE="$CACHE_DIR/huggingface"
VLLM_CACHE="$CACHE_DIR/vllm"
CUDA_CACHE="$CACHE_DIR/cuda-cache"

# Create directories
echo "[1/5] Creating cache directories..."
sudo mkdir -p $CACHE_DIR
sudo mkdir -p $HF_CACHE
sudo mkdir -p $VLLM_CACHE
sudo mkdir -p $CUDA_CACHE
sudo chmod -R 777 $CACHE_DIR

# Set environment variables permanently
echo "[2/5] Setting environment variables..."
cat >> ~/.bashrc << 'EOF'

# PARLE Fast Cold Start Environment
export HF_HOME=/var/cache/parle-models/huggingface
export VLLM_CACHE_DIR=/var/cache/parle-models/vllm
export CUDA_CACHE_PATH=/var/cache/parle-models/cuda-cache
export USE_FASTSAFETENSOR=true
export TOKENIZERS_PARALLELISM=false
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# vLLM optimizations
export VLLM_ATTENTION_BACKEND=FLASH_ATTN
export VLLM_USE_TRITON_FLASH_ATTN=1
EOF

# Source the new environment
source ~/.bashrc

# Install fastsafetensors
echo "[3/5] Installing fastsafetensors (4-7x faster loading)..."
pip install fastsafetensors 2>/dev/null || {
    echo "Warning: fastsafetensors installation failed, will use default loader"
}

# Install NVIDIA Model Streamer (optional, for S3 loading)
echo "[4/5] Installing nvidia-model-streamer (optional)..."
pip install nvidia-model-streamer 2>/dev/null || {
    echo "Warning: nvidia-model-streamer not available"
}

# Pre-download models
echo "[5/5] Pre-downloading models to local cache..."
echo "This may take 10-30 minutes depending on network speed..."

python3 << 'PYTHON_SCRIPT'
import os
import time

os.environ["HF_HOME"] = "/var/cache/parle-models/huggingface"

from huggingface_hub import snapshot_download

models = [
    ("RedHatAI/gemma-3-4b-it-quantized.w4a16", "vLLM (Gemma 4B)"),
    ("openai/whisper-small", "Whisper STT"),
    ("marcosremar2/cefr-classifier-pt-mdeberta-v3-enem", "CEFR Classifier"),
]

print("\n" + "=" * 50)
for model_id, name in models:
    print(f"\nDownloading {name}: {model_id}")
    start = time.time()

    try:
        path = snapshot_download(
            model_id,
            cache_dir="/var/cache/parle-models/huggingface",
        )
        elapsed = time.time() - start
        print(f"  Downloaded to {path} in {elapsed:.1f}s")
    except Exception as e:
        print(f"  ERROR: {e}")

# Also download Kokoro voices
print("\nDownloading Kokoro TTS voices...")
try:
    from kokoro import KPipeline
    pipeline = KPipeline(lang_code='p', device='cpu')  # Just to trigger download
    print("  Kokoro voices downloaded!")
except Exception as e:
    print(f"  Kokoro download skipped: {e}")

print("\n" + "=" * 50)
print("Pre-download complete!")
print("=" * 50)
PYTHON_SCRIPT

echo ""
echo "=============================================="
echo "SETUP COMPLETE!"
echo "=============================================="
echo ""
echo "Expected cold start improvement:"
echo "  Before: ~487s (8 min)"
echo "  After:  ~60-90s (1-1.5 min)"
echo ""
echo "Optimizations applied:"
echo "  - Models cached locally on SSD"
echo "  - fastsafetensors for 4-7x faster loading"
echo "  - CUDA graph caching enabled"
echo "  - Environment variables optimized"
echo ""
echo "To test: ./start-smart.sh"
echo "=============================================="
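A quick way to confirm the pre-download actually populated the local cache. `huggingface-cli scan-cache` ships with huggingface_hub and reads the cache pointed to by `HF_HOME`, which matches the path exported above; the `du` line is just a rough footprint check.

```bash
# Inspect the pre-populated cache (one row per repo) and its total size on disk
export HF_HOME=/var/cache/parle-models/huggingface
huggingface-cli scan-cache
du -sh /var/cache/parle-models/*
```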
start-optimized.sh
ADDED
@@ -0,0 +1,198 @@
#!/bin/bash
# =============================================================================
# OPTIMIZED PARLE BACKEND STARTUP
# =============================================================================
# Startup script with timing measurements for each phase
#
# Phases:
#   1. Environment setup
#   2. Check/restore from checkpoint (if available)
#   3. Fast model loading (optimized)
#   4. Health check
# =============================================================================

set -e

SCRIPT_DIR="$(dirname "$0")"
LOG_FILE="$SCRIPT_DIR/startup.log"

# Timing function
timestamp() {
    date +%s.%N
}

log() {
    echo "[$(date '+%H:%M:%S')] $1" | tee -a "$LOG_FILE"
}

# Start timing
TOTAL_START=$(timestamp)

echo "=============================================="
echo "PARLE Backend - Optimized Startup"
echo "=============================================="
echo "" > "$LOG_FILE"

# =============================================================================
# PHASE 1: Environment Setup
# =============================================================================
PHASE1_START=$(timestamp)
log "PHASE 1: Environment Setup"

# Set optimized environment
export HF_HOME=/var/cache/parle-models/huggingface
export VLLM_CACHE_DIR=/var/cache/parle-models/vllm
export CUDA_CACHE_PATH=/var/cache/parle-models/cuda-cache
export USE_FASTSAFETENSOR=true
export TOKENIZERS_PARALLELISM=false
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
export VLLM_ATTENTION_BACKEND=FLASH_ATTN

# Check if models are pre-cached
if [ -d "/var/cache/parle-models/huggingface" ]; then
    CACHE_SIZE=$(du -sh /var/cache/parle-models/huggingface 2>/dev/null | cut -f1)
    log "  Model cache found: $CACHE_SIZE"
else
    log "  WARNING: No model cache found. First run will be slow."
fi

PHASE1_END=$(timestamp)
PHASE1_TIME=$(echo "$PHASE1_END - $PHASE1_START" | bc)
log "  Phase 1 complete: ${PHASE1_TIME}s"

# =============================================================================
# PHASE 2: Checkpoint Restore (if available)
# =============================================================================
PHASE2_START=$(timestamp)
log "PHASE 2: Checkpoint Check"

CHECKPOINT_DIR="/var/lib/parle-checkpoints"
CHECKPOINT_PATH="$CHECKPOINT_DIR/latest"

if [ -d "$CHECKPOINT_PATH" ] || [ -L "$CHECKPOINT_PATH" ]; then
    log "  Checkpoint found! Attempting restore..."

    if "$SCRIPT_DIR/restore.sh" "$CHECKPOINT_PATH" 2>/dev/null; then
        PHASE2_END=$(timestamp)
        PHASE2_TIME=$(echo "$PHASE2_END - $PHASE2_START" | bc)
        TOTAL_TIME=$(echo "$PHASE2_END - $TOTAL_START" | bc)

        log "  Restored from checkpoint!"
        log ""
        log "=============================================="
        log "STARTUP COMPLETE (from checkpoint)"
        log "  Phase 1 (env):     ${PHASE1_TIME}s"
        log "  Phase 2 (restore): ${PHASE2_TIME}s"
        log "  TOTAL:             ${TOTAL_TIME}s"
        log "=============================================="
        exit 0
    else
        log "  Checkpoint restore failed, continuing with cold start"
    fi
else
    log "  No checkpoint found, proceeding with cold start"
fi

PHASE2_END=$(timestamp)
PHASE2_TIME=$(echo "$PHASE2_END - $PHASE2_START" | bc)
log "  Phase 2 complete: ${PHASE2_TIME}s"

# =============================================================================
# PHASE 3: Model Loading (Optimized)
# =============================================================================
PHASE3_START=$(timestamp)
log "PHASE 3: Model Loading"

# Start the server with optimized loading
cd "$SCRIPT_DIR"

# Create a Python script for optimized loading
python3 << 'PYTHON_SCRIPT' &
import os
import sys
import time

# Ensure environment
os.environ["HF_HOME"] = "/var/cache/parle-models/huggingface"
os.environ["USE_FASTSAFETENSOR"] = "true"

print("[STARTUP] Starting optimized model loading...")
start = time.time()

# Import app module (will trigger load_models on startup)
import uvicorn

# Run server
uvicorn.run(
    "app:app",
    host="0.0.0.0",
    port=8000,
    log_level="info",
)
PYTHON_SCRIPT

SERVER_PID=$!
log "  Server started (PID: $SERVER_PID)"

# =============================================================================
# PHASE 4: Health Check
# =============================================================================
PHASE4_START=$(timestamp)
log "PHASE 4: Waiting for health..."

# Wait for backend to be healthy
MAX_WAIT=300  # 5 minutes max
WAIT_INTERVAL=2

for i in $(seq 1 $((MAX_WAIT / WAIT_INTERVAL))); do
    HEALTH=$(curl -s --max-time 2 http://localhost:8000/health 2>/dev/null || echo "")

    if [ ! -z "$HEALTH" ]; then
        # Check if all models are loaded
        WHISPER=$(echo "$HEALTH" | grep -o '"whisper_loaded":true' || true)
        VLLM=$(echo "$HEALTH" | grep -o '"vllm_loaded":true' || true)
        KOKORO=$(echo "$HEALTH" | grep -o '"kokoro_loaded":true' || true)

        if [ ! -z "$WHISPER" ] && [ ! -z "$VLLM" ] && [ ! -z "$KOKORO" ]; then
            PHASE4_END=$(timestamp)
            PHASE3_TIME=$(echo "$PHASE4_START - $PHASE3_START" | bc)
            PHASE4_TIME=$(echo "$PHASE4_END - $PHASE4_START" | bc)
            TOTAL_TIME=$(echo "$PHASE4_END - $TOTAL_START" | bc)

            echo ""
            log "=============================================="
            log "STARTUP COMPLETE (cold start)"
            log "  Phase 1 (env):        ${PHASE1_TIME}s"
            log "  Phase 2 (checkpoint): ${PHASE2_TIME}s"
            log "  Phase 3 (loading):    ${PHASE3_TIME}s"
            log "  Phase 4 (health):     ${PHASE4_TIME}s"
            log "  TOTAL:                ${TOTAL_TIME}s"
            log "=============================================="
            log ""
            log "Server running at http://localhost:8000"
            log "Health endpoint: http://localhost:8000/health"
            log ""

            # Create checkpoint for faster next startup
            if [ ! -d "$CHECKPOINT_PATH" ]; then
                log "TIP: Create checkpoint for faster startup:"
                log "  ./checkpoint.sh"
            fi

            # Keep script running
            wait $SERVER_PID
            exit 0
        fi
    fi

    # Progress update every 10 seconds
    if [ $((i % 5)) -eq 0 ]; then
        ELAPSED=$((i * WAIT_INTERVAL))
        log "  Still loading... (${ELAPSED}s)"
    fi

    sleep $WAIT_INTERVAL
done

log "ERROR: Timeout waiting for backend (${MAX_WAIT}s)"
exit 1
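The health polling above greps for the literal `"key":true` substrings, which assumes the JSON is serialized without spaces after the colons. A more robust variant, assuming `jq` is installed on the VM (`apt-get install -y jq`), would be:

```bash
# Exit code 0 only when all three models report loaded, regardless of JSON formatting
curl -s --max-time 2 http://localhost:8000/health \
  | jq -e '.whisper_loaded and .vllm_loaded and .kokoro_loaded' > /dev/null
```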
start-smart.sh
ADDED
@@ -0,0 +1,91 @@
#!/bin/bash
# Smart PARLE Backend Startup Script
# Attempts restore from checkpoint first, falls back to cold start
#
# Usage: ./start-smart.sh

set -e

CHECKPOINT_DIR="/var/lib/parle-checkpoints"
CHECKPOINT_PATH="$CHECKPOINT_DIR/latest"
SCRIPT_DIR="$(dirname "$0")"

echo "=============================================="
echo "PARLE Backend Smart Startup"
echo "=============================================="

# Check if checkpoint exists
if [ -d "$CHECKPOINT_PATH" ] || [ -L "$CHECKPOINT_PATH" ]; then
    echo "Checkpoint found! Attempting fast restore..."
    echo ""

    START_TIME=$(date +%s)

    # Try to restore
    if "$SCRIPT_DIR/restore.sh" "$CHECKPOINT_PATH"; then
        END_TIME=$(date +%s)
        DURATION=$((END_TIME - START_TIME))
        echo ""
        echo "Fast restore completed in ${DURATION}s!"
        exit 0
    else
        echo ""
        echo "Restore failed, falling back to cold start..."
        echo ""
    fi
else
    echo "No checkpoint found at $CHECKPOINT_PATH"
    echo "Performing cold start..."
    echo ""
fi

# Cold start fallback
echo "=============================================="
echo "Cold Start Mode"
echo "=============================================="

START_TIME=$(date +%s)

# Run the normal start script
"$SCRIPT_DIR/start.sh" &
SERVER_PID=$!

# Wait for backend to be healthy
echo "Waiting for backend to be ready..."
for i in {1..180}; do
    HEALTH=$(curl -s --max-time 2 http://localhost:8000/health 2>/dev/null)
    if [ ! -z "$HEALTH" ]; then
        # Check if all models are loaded
        WHISPER=$(echo "$HEALTH" | grep -o '"whisper_loaded":true' || true)
        VLLM=$(echo "$HEALTH" | grep -o '"vllm_loaded":true' || true)
        KOKORO=$(echo "$HEALTH" | grep -o '"kokoro_loaded":true' || true)

        if [ ! -z "$WHISPER" ] && [ ! -z "$VLLM" ] && [ ! -z "$KOKORO" ]; then
            END_TIME=$(date +%s)
            DURATION=$((END_TIME - START_TIME))

            echo ""
            echo "=============================================="
            echo "Backend ready! (cold start: ${DURATION}s)"
            echo "=============================================="
            echo ""

            # Offer to create checkpoint
            echo "TIP: Create a checkpoint now for faster startup next time:"
            echo "  ./checkpoint.sh"
            echo ""

            # Keep the script running to maintain the server
            wait $SERVER_PID
            exit 0
        fi
    fi

    if [ $((i % 10)) -eq 0 ]; then
        echo "  Still loading... ($i/180)"
    fi
    sleep 1
done

echo "ERROR: Timeout waiting for backend to be ready"
exit 1
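Taken together, the intended lifecycle on a fresh VM looks roughly like this; the script names, paths, and rough timings are the ones quoted in the scripts above, and checkpoint.sh is the companion script added in this same commit.

```bash
# One-time setup
sudo ./setup-criu-patched.sh      # patched CRIU + CUDA plugin
./setup-fast-coldstart.sh         # model cache + environment tuning

# First boot: cold start, then freeze the warmed-up process
./start-smart.sh                  # falls back to cold start (minutes)
./checkpoint.sh                   # snapshot to /var/lib/parle-checkpoints/latest

# Later boots: restore in seconds
./start-smart.sh                  # finds the checkpoint and runs restore.sh
```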
start.sh
ADDED
@@ -0,0 +1,40 @@
#!/bin/bash
# PARLE Backend Startup Script
# This script sets up environment variables and starts the FastAPI server

# ============================================================================
# CONFIGURATION - Edit these values before deploying
# ============================================================================

# TensorDock Auto-Stop Configuration
export TENSORDOCK_API_TOKEN="WBE5UPHOC6Ed1HeLYL2TjqbBqVEwn5MF"
export TENSORDOCK_INSTANCE_ID="befc5b17-7516-4ccd-a0ff-da2d4ecdb874"
export IDLE_TIMEOUT_SECONDS="120"  # 2 minutes

# Email Alerts (get key from https://resend.com)
export RESEND_API_KEY=""  # Set this to receive email alerts
export ALERT_EMAIL="marcos@marcosrp.com"

# ============================================================================
# STARTUP
# ============================================================================

echo "=================================================="
echo "PARLE Backend Starting..."
echo "=================================================="
echo "Instance ID: $TENSORDOCK_INSTANCE_ID"
echo "Idle Timeout: ${IDLE_TIMEOUT_SECONDS}s"
echo "Alert Email: $ALERT_EMAIL"
echo "Resend Key: $([ -n "$RESEND_API_KEY" ] && echo "SET" || echo "NOT SET")"
echo "=================================================="

# Change to script directory
cd "$(dirname "$0")"

# Activate virtual environment if exists
if [ -f "/home/user/venv/bin/activate" ]; then
    source /home/user/venv/bin/activate
fi

# Start the server
exec uvicorn app:app --host 0.0.0.0 --port 8000 --workers 1
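Note that the TensorDock API token and instance ID are committed here in plain text. A minimal sketch of sourcing them from an untracked env file instead; the file name and location are assumptions, not part of the current scripts.

```bash
# /home/user/.parle.env  (chmod 600, excluded from git) would hold:
#   export TENSORDOCK_API_TOKEN="..."
#   export TENSORDOCK_INSTANCE_ID="..."
#   export RESEND_API_KEY="..."

# and start.sh would replace the hardcoded exports with:
if [ -f /home/user/.parle.env ]; then
    source /home/user/.parle.env
fi
```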
stt/.gitkeep
ADDED
@@ -0,0 +1,2 @@
# Whisper STT Model
# HuggingFace: openai/whisper-small
tts/.gitkeep
ADDED
@@ -0,0 +1,2 @@
# Kokoro TTS Model
# HuggingFace: hexgrad/Kokoro-82M