Hi,
Just like the title says:
I’m trying to run:
With:
- koboldcpp:v1.43 using HIPBLAS on a 7900XTX / Arch Linux
Running:
--stream --unbantokens --threads 8 --usecublas normal
I get very limited output with lots of repetition.
I mostly didn’t touch the default settings:
Does anyone know how I can make things run better?
EDIT: Sorry for multiple posts, Fediverse bugged out.
My guess is that the problem comes from the model being a “SuperHOT” model. These models change the way the context is encoded, and if the software running the model isn’t set up for that, you get problems such as repeated output or incoherent rambling with words that are only vaguely related to the topic.
Unfortunately I haven’t used these models myself so I don’t have any personal experience here but hopefully this is a starting point for your searches. Check out the contextsize and ropeconfig parameters. If you are using the wrong context size or scaling factor then you will get incorrect results.
It might help if you posted a screenshot of your model settings (the screenshot that you posted is of your sampler settings). I’m not sure if you configured this in the GUI or if the only model settings that you have are the command-line ones (which are all defaults and probably not correct for an 8k model).
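To make the context size / scaling factor relationship concrete, here is a rough sketch of how the two values could be derived. It assumes (I haven’t verified this against your exact model) that the SuperHOT variant uses linear RoPE scaling on top of a base model trained with a 2048-token context, and that ropeconfig takes the scale first and the frequency base second. The helper names linear_rope_scale and kobold_args are made up for illustration and are not part of koboldcpp.

```python
# Sketch only: derive the linear RoPE scale for an extended-context model.
# Assumptions (not confirmed by the thread): base context of 2048 tokens,
# linear position interpolation, and the argument order "scale base".

def linear_rope_scale(trained_ctx: int, target_ctx: int) -> float:
    """Return the linear scaling factor: trained context / target context."""
    if trained_ctx <= 0 or target_ctx <= 0:
        raise ValueError("context sizes must be positive")
    # Never scale above 1.0; a target smaller than the trained context
    # needs no interpolation.
    return min(1.0, trained_ctx / target_ctx)


def kobold_args(target_ctx: int, trained_ctx: int = 2048,
                rope_base: int = 10000) -> list[str]:
    """Build the two hypothetical extra arguments for an extended-context run."""
    scale = linear_rope_scale(trained_ctx, target_ctx)
    return [
        "--contextsize", str(target_ctx),
        "--ropeconfig", f"{scale:g}", str(rope_base),
    ]


if __name__ == "__main__":
    # For an 8k model under the assumptions above.
    print(" ".join(kobold_args(8192)))
```

Under those assumptions an 8k model works out to a scale of 2048 / 8192 = 0.25, so the extra arguments would be a context size of 8192 and a ropeconfig of 0.25 with base 10000. Please check the koboldcpp help output for your version before relying on this.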