Fix for word randomization in large datasets

#1
by cherokeeGasoline - opened

Hi! I’m a hobbyist. Thanks for the great app! Gemini helped me resolve a randomization bug, and I hope the solution helps others too.
Peter

The fix is in generation-panel.tsx. First, add the shuffle helper:

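function seededShuffle<T>(array: T[], seed: number): T[] {
  // Coerce the seed to an unsigned 32-bit integer so a negative or
  // fractional seed cannot produce out-of-range indices.
  let state = seed >>> 0;
  // Linear congruential generator; returns a value in [0, 1).
  const random = (): number => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
  // Fisher-Yates shuffle, in place, working back from the end of the array.
  let m = array.length;
  while (m) {
    const i = Math.floor(random() * m--);
    const t = array[m];
    array[m] = array[i];
    array[i] = t;
  }
  return array;
}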

Update startGeneration (around line 124):

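const originalData = getTextData();
// Shuffle a copy so originalData keeps its source order.
// Note: `|| 42` also replaces a seed of 0 with the default.
const data = seededShuffle([...originalData], config.dataset.seed || 42);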
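To check the change locally, a small standalone sketch like the one below may help. It repeats the shuffle helper from this post (with the seed coerced to an unsigned 32-bit integer) so it runs on its own, and uses a made-up `sampleWords` array as a stand-in for `getTextData()`, which is an assumption and not part of the app. It only checks that the same seed gives the same order and that the source array is left untouched.

```typescript
// Copy of the shuffle helper from this post, repeated so the sketch is self-contained.
function seededShuffle<T>(array: T[], seed: number): T[] {
  let state = seed >>> 0; // treat the seed as an unsigned 32-bit integer
  const random = (): number => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
  let m = array.length;
  while (m) {
    const i = Math.floor(random() * m--);
    const t = array[m];
    array[m] = array[i];
    array[i] = t;
  }
  return array;
}

// Hypothetical stand-in for getTextData(); the real data comes from the app.
const sampleWords: string[] = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot"];

// Shuffle two independent copies with the same seed.
const firstRun = seededShuffle([...sampleWords], 42);
const secondRun = seededShuffle([...sampleWords], 42);

// Same seed, same order.
const sameOrder = firstRun.every((word, idx) => word === secondRun[idx]);
// The spread copy protects the source array from the in-place shuffle.
const sourceUntouched = sampleWords[0] === "alpha" && sampleWords[5] === "foxtrot";
// The shuffle only reorders; no word is lost or duplicated.
const sameWords = [...firstRun].sort().join(",") === [...sampleWords].sort().join(",");

console.log({ sameOrder, sourceUntouched, sameWords });
```

Because the helper shuffles in place, passing `[...originalData]` rather than `originalData` is what keeps the source order intact between runs.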

This fixed the issue for me and allowed me to generate datasets from my entire text file! Thanks again for the great tool.

Hi Peter, @cherokeeGasoline

Thanks so much for using the tool! I'm really grateful that you not only found the bug but also took the time to work out a solution and share it with the community. Your seeded shuffle implementation is solid and will definitely help others who encounter the same issue.

It's contributions like yours that make open-source projects better. Really appreciate you sharing this fix!
I will update the tool with this fix soon.
Best regards
