-2 points

@zogwarg

Consider traditional databases, which let you search for strings. Vector databases let you search by meaning.

For one client, someone could search for “videos about cats”. With stemming and stop-word removal, that becomes “cat”, and the results might be lists of videos about house cats and maybe the Unix “cat” command. Tigers, lions, cheetahs? Nope.

A vector database will return tigers/lions/cheetahs because it “knows” they are cats. A much smarter search. I’ve built that for a client.
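A minimal sketch of the idea, assuming made-up 3-dimensional vectors in place of real embeddings (a real system would get its vectors from an embedding model, and the titles, numbers, and function names here are all hypothetical). Items are ranked by cosine similarity to the query vector rather than by string match:

```python
import math

# Hypothetical toy "embeddings", invented for illustration only.
# A real system would compute these with an embedding model.
TOY_VECTORS = {
    "house cat video": [0.9, 0.8, 0.0],
    "tiger documentary": [0.8, 0.9, 0.1],
    "unix cat command tutorial": [0.1, 0.0, 0.9],
}

# Stand-in vector for the query "videos about cats" (also made up).
QUERY_VECTOR = [1.0, 0.9, 0.0]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search(query_vec, index, top_k=2):
    """Return the top_k titles whose vectors are closest to query_vec."""
    ranked = sorted(
        index,
        key=lambda title: cosine(query_vec, index[title]),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    print(search(QUERY_VECTOR, TOY_VECTORS))
```

With these toy numbers the tiger documentary ranks near the house cat video even though its title never contains the string “cat”, while the Unix tutorial ranks last.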

-1 points
Removed by mod
5 points

I realize it’s probably a toy example, but specifically for “cats” you could achieve similar results by running a thesaurus/synonym set on your stemmed words. This has the added benefit that a client could add custom synonyms for more domain-specific stuff that the LLM would probably not know, and would not reliably learn in-prompt or through fine-tuning. (Although I’d argue that if I’m looking for cats, I don’t want to also see videos of tigers, or results based on the LLM’s “understanding” of what a cat might be.)
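A rough sketch of the synonym-set approach, assuming a deliberately crude stemmer and invented word lists (every name and entry below is hypothetical; a real setup would use a proper stemmer and a curated thesaurus):

```python
# Hypothetical word lists, invented for illustration only.
STOP_WORDS = {"videos", "about", "the", "a"}
BASE_SYNONYMS = {"cat": {"feline", "kitten"}}
# Client-supplied, domain-specific additions.
CLIENT_SYNONYMS = {"cat": {"tiger", "lion", "cheetah"}}


def stem(word):
    """Crude stand-in for a real stemmer: lowercase and drop a plural 's'."""
    w = word.lower()
    return w[:-1] if w.endswith("s") and len(w) > 3 else w


def expand_query(query):
    """Drop stop words, stem the rest, and add base and client synonyms."""
    terms = set()
    for word in query.split():
        if word.lower() in STOP_WORDS:
            continue
        stemmed = stem(word)
        terms.add(stemmed)
        terms |= BASE_SYNONYMS.get(stemmed, set())
        terms |= CLIENT_SYNONYMS.get(stemmed, set())
    return terms


if __name__ == "__main__":
    print(sorted(expand_query("videos about cats")))
```

Because the client list is plain data, whether “cat” should also pull in tigers is an explicit, editable choice rather than something left to a model.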

For the labeling of the videos themselves, the most valuable labels would be added by humans, and/or come from full-text search on the video’s transcript where applicable, speech-to-text being more in the realm of traditional ML than in the realm of GenAI.

As a minor quibble, your use case of GenAI is not really “Generative”, which is the main thing it’s being sold as.

-1 points
Removed by mod
