Fix for word randomization in large datasets
Hi! I’m a hobbyist. Thanks for the great app! Gemini helped me resolve a randomization bug, and I hope the solution helps others too.
Peter
The fix (in generation-panel.tsx). First, add this shuffle helper:
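function seededShuffle<T>(array: T[], seed: number): T[] {
  // Normalize the seed to an unsigned 32-bit integer so negative or
  // fractional seeds can never produce negative "random" values.
  let state = seed >>> 0;
  // Linear congruential generator: state = (1664525 * state + 1013904223) mod 2^32,
  // computed in 32-bit integer math and scaled to a float in [0, 1).
  const random = () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
  // Fisher-Yates shuffle: walk backwards, swapping each element with a
  // randomly chosen element at or before it. Shuffles the array in place.
  for (let m = array.length - 1; m > 0; m--) {
    const i = Math.floor(random() * (m + 1));
    [array[m], array[i]] = [array[i], array[m]];
  }
  return array;
}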
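For anyone curious how the helper's randomness works: it is a Fisher-Yates shuffle driven by a linear congruential generator (LCG) using the well-known Numerical Recipes constants 1664525 and 1013904223, modulo 2^32. The sketch below pulls the generator out into a hypothetical `makeLcg` function (not part of the app) so you can check that the same seed always gives the same sequence. It assumes 32-bit integer math via `Math.imul`, which matches the helper's mod-2^32 arithmetic for non-negative integer seeds below 2^32.

```typescript
// Hypothetical standalone version of the LCG inside seededShuffle (not part of the app).
function makeLcg(seed: number): () => number {
  // Normalize to an unsigned 32-bit integer (covers negative or fractional seeds).
  let state = seed >>> 0;
  return () => {
    // state = (1664525 * state + 1013904223) mod 2^32, done in 32-bit integer math.
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    // Scale the 32-bit state to a float in [0, 1).
    return state / 4294967296;
  };
}

const a = makeLcg(42);
const b = makeLcg(42);
const seqA = [a(), a(), a()];
const seqB = [b(), b(), b()];
console.log(seqA.every((x, k) => x === seqB[k])); // true: same seed, same sequence
console.log(seqA.every((x) => x >= 0 && x < 1));  // true: every value is in [0, 1)
```

Because the sequence depends only on the seed, the shuffled order of the dataset is reproducible from run to run.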
Then update startGeneration (around line 124) so it works on a shuffled copy of the text data:
const originalData = getTextData();
// Shuffle a copy so the original data is left untouched; `??` keeps an explicit seed of 0.
const data = seededShuffle([...originalData], config.dataset.seed ?? 42);
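Two small details in the startGeneration change are worth spelling out. The spread `[...originalData]` makes a shallow copy, so the in-place swaps in the shuffle never reorder the array that `getTextData()` returned. And the seed fallback depends on the operator: `||` replaces any falsy seed, including an explicit 0, with 42, while `??` only falls back when the seed is `null` or `undefined`. The sketch below assumes a hypothetical `DatasetConfig` type and uses `reverse()` as a stand-in for any in-place mutation; the app's real config shape may differ.

```typescript
// Hypothetical config shape; the app's real config type may differ.
interface DatasetConfig {
  seed?: number;
}

// `||` falls back on any falsy value, so an explicit seed of 0 becomes 42.
const seedWithOr = (cfg: DatasetConfig): number => cfg.seed || 42;
// `??` falls back only when the seed is null or undefined.
const seedWithNullish = (cfg: DatasetConfig): number => cfg.seed ?? 42;

console.log(seedWithOr({ seed: 0 }));      // 42
console.log(seedWithNullish({ seed: 0 })); // 0
console.log(seedWithNullish({}));          // 42

// The spread makes a shallow copy, so mutating the copy leaves the original alone.
const original = ["alpha", "beta", "gamma"];
const copy = [...original];
copy.reverse(); // stand-in for the shuffle's in-place swaps
console.log(original.join(", ")); // alpha, beta, gamma
console.log(copy.join(", "));     // gamma, beta, alpha
```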
This fixed the issue for me and allowed me to generate datasets from my entire text file! Thanks again for the great tool.
Hi Peter, @cherokeeGasoline
Thanks so much for using the tool! I'm really grateful that you not only found the bug but also took the time to work out a solution and share it with the community. Your seeded shuffle implementation is solid and will definitely help others who encounter the same issue.
It's contributions like yours that make open-source projects better. Really appreciate you sharing this fix!
I will add this fix to the app soon.
Best regards