Everyone loves the idea of scraping, no one likes maintaining scrapers that break once a week because the CSS or HTML changed.
I loved scraping until my ip was blocked for botting lol. I know there’s ways around it it’s just work though
I successfully scraped millions of Amazon product listings simply by routing through TOR and cycling the exit node every 10 seconds.
I’m down with scraping, but “parses HTML with regex” has got me fucked up.
Just a heads up for companies thinking it’s wrong to scrap: if you don’t want info to be scraped, don’t put it on the internet.
The sad part is that scrapping is often easier then using the api.
Much less beholden to arbitrary rules also. Way too many times companies will just up and lift their API access or push through restrictions. No ty, I’ll just access it myself then
API starter kit
- Outdated and unsupported and hasn’t been replaced yet but is the standard way to use the service.
- Lots of authorization tokens.
- The example in the docs doesn’t work (if there is one).
- You have no idea where the online tutorial got the information because it doesn’t have links to resources and the docs have barely anything even though its giant.
- Uses asynchronous programming to make it faster but its still much much slower then scrapping without asynchronous programming.
I scrape with bash lord help me.