In an age of LLMs, is it time to reconsider human-edited web directories?

Back in the early-to-mid '90s, one of the main ways of finding anything on the web was to browse through a web directory.

These directories generally had a list of categories on their front page: News, Sport, Entertainment, Arts, Technology, Fashion, and so on.

Each of those categories had subcategories, and sub-subcategories that you clicked through until you got to a list of websites. These lists were maintained by actual humans.

Typically, these directories also had a limited web search that would crawl through the pages of websites listed in the directory.

Lycos, Excite, and of course Yahoo all offered web directories of this sort.

(EDIT: I initially also mentioned AltaVista. It did offer a web directory by the late '90s, but this was something it tacked on much later.)

By the late '90s, the standard narrative goes, the web got too big to index websites manually.

Google promised the world its algorithms would weed out the spam automatically.

And for a time, it worked.

But then SEO and SEM became a multi-billion-dollar industry. The spambots proliferated. Google itself began promoting its own content and advertisers above search results.

And now with LLMs, the industrial-scale spamming of the web is likely to grow exponentially.

My question is, if a lot of the web is turning to crap, do we even want to search the entire web anymore?

Do we really want to search every single website on the web?

Or just those that aren’t filled with LLM-generated SEO spam?

Or just those that don’t feature 200 tracking scripts, and passive-aggressive privacy warnings, and paywalls, and popovers, and newsletters, and increasingly obnoxious banner ads, and dark patterns to prevent you cancelling your “free trial” subscription?

At some point, does it become more desirable to go back to search engines that only crawl pages on human-curated lists of trustworthy, quality websites?

And is it time to begin considering what a modern version of those early web directories might look like?
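To make the idea a little more concrete, here is a minimal sketch of the core rule such a search engine might apply: only crawl URLs whose host appears on a human-curated list. Everything named here (`CURATED_HOSTS`, `is_curated`, `filter_frontier`, and the example domains) is made up for illustration and not taken from any real directory or crawler.

```python
from urllib.parse import urlsplit

# Hypothetical human-curated list of trusted hosts (illustrative only).
CURATED_HOSTS = {"example.org", "library.example.edu"}


def is_curated(url, curated_hosts=CURATED_HOSTS):
    """Return True if the URL's host is a curated host or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    # Treat "www.example.org" the same as "example.org".
    if host.startswith("www."):
        host = host[4:]
    return any(host == h or host.endswith("." + h) for h in curated_hosts)


def filter_frontier(urls, curated_hosts=CURATED_HOSTS):
    """Keep only the URLs a directory-restricted crawler would be allowed to fetch."""
    return [u for u in urls if is_curated(u, curated_hosts)]
```

A crawler built this way would pass every link it discovers through `filter_frontier` before fetching it, so a link from a listed site to an unlisted one is simply never followed. The hard part, of course, is the human work of maintaining the list, not the code.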

@degoogle #tech #google #web #internet #LLM #LLMs #enshittification #technology #search #SearchEngines #SEO #SEM

20 points

Main problems are:

  1. Link rot

  2. Sneakily inserted sponsored links

7 points

@Moonrise2473 @ajsadauskas
3. Infinitely growing list of categories.
4. Mis-categorisation

I remember learning HTML (4.0) and reading that you should put info in a <meta> tag about the categories your page fits in, and that this would help search engines. Did it also help web directories?
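For anyone who never saw that convention, it usually looked like `<meta name="keywords" content="news, sport">`. Below is a rough sketch of how a tool could read it, using Python's standard `html.parser`. The class and function names are hypothetical, and whether any particular directory actually consumed this tag is not something this sketch claims.

```python
from html.parser import HTMLParser


class MetaKeywordsParser(HTMLParser):
    """Collects comma-separated values from <meta name="keywords" content="...">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            # Split on commas and drop empty or whitespace-only entries.
            self.keywords.extend(k.strip() for k in content.split(",") if k.strip())


def extract_keywords(html):
    """Return the list of keywords a page declares about itself."""
    parser = MetaKeywordsParser()
    parser.feed(html)
    return parser.keywords
```

Note that these keywords are self-declared by the page author, which is exactly why they were easy to abuse.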

4 points

@ajsadauskas Back in the day, UW Madison hosted an outfit called The Internet Scout Project that was in the curation business for web resources. The decaying state of search (or, put another way, the growth of web resources intended to serve interests other than their visitors’) has me thinking it would be good to work with public libraries to convene and host this sort of thing.

Librarianship is the right sort of ethos for it, and libraries are infrastructure for human-mediated discoverability.

@degoogle


@ajsadauskas @degoogle Curlie https://curlie.org/ is the continuation of the ODP

1 point

Oooh, I like it! Thank you so much for sharing this here! :)

2 points

> And now with LLMs, the industrial-scale spamming of the web is likely to grow exponentially.

True, but these things can also be used by us to curate and maintain a high-quality link collection. However, I’m not sure ‘pages’ will be read by humans in 5 years, so I have a feeling we won’t need such a collection anymore. Well, not for humans, but probably for our individual LLMs.

2 points

Just to add to your list of steps and consequences: I also think academic study of information retrieval, indexing and crawling has become less popular. Aspiring students are hearing the message that those studies and fields of work will become obsolete once AI does all that.

