We don’t know that it would be effective.
It would write legalese well and recall important cases, but we don’t know that more data equates to being good at the task.
As an example, ChatGPT 4 can’t alphabetize an arbitrary string of text.
Alphabetize the word antidisestablishmentarianism
The word “antidisestablishmentarianism” alphabetized is: “aaaaabdeehiiilmnnsstt”
It doesn’t understand the task. It mathematically cannot do this task. No amount of training will let it perform this task with the current LLM architecture.
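For reference, the correct answer is easy to check outside the model. Here is a minimal Python sketch; the function name `alphabetize` and the variable names are just my own choices for illustration, and the model’s reply is copied from the exchange above.

```python
from collections import Counter


def alphabetize(word: str) -> str:
    """Return the letters of `word` sorted in ascending order."""
    return "".join(sorted(word))


word = "antidisestablishmentarianism"
model_reply = "aaaaabdeehiiilmnnsstt"  # copied from the reply quoted above

correct = alphabetize(word)
print(correct)                      # aaaabdeehiiiiilmmnnnrssssttt
print(len(word), len(model_reply))  # 28 21

# Letters the reply dropped, and letters it added that are not in the word.
missing = Counter(word) - Counter(model_reply)
extra = Counter(model_reply) - Counter(word)
print(dict(missing))
print(dict(extra))
```

The reply is 21 letters long against the word’s 28, so it can’t be a rearrangement of the input at all: it drops the “r” entirely and adds an “a” that isn’t there.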
We can’t assume it has real intelligence, that every task can be performed or internally represented, or that more data means clearly better results.
That’s a matter of working on the prompt interpreter.
For what I was saying, there’s no assumption: models trained on more data, and on more specific data, can definitely do the usual information summarization tasks more accurately. This is already being used to create specialized models for legal, programming, and accounting work.
You’re right about information summarization, and the models are getting better at that.
I guess my point is just to be careful. We assume a lot about AI’s abilities, and it’s objectively very impressive, but some fundamental things will always be hard or impossible for it until we discover new architectures.
I agree that while it’s powerful and the capabilities are novel, it’s more limited than many think. Some people believe current “AI” systems/models can do just about anything, like legal briefs or entire working programs in any language. The problems with truthfulness and accuracy call for some serious rethinking. As in your example above, there are major flaws when you try something as simple as arithmetic, since the system is not really thinking about it.