Right now, robots.txt on lemmy.ca is configured this way:
User-Agent: *
Disallow: /login
Disallow: /login_reset
Disallow: /settings
Disallow: /create_community
Disallow: /create_post
Disallow: /create_private_message
Disallow: /inbox
Disallow: /setup
Disallow: /admin
Disallow: /password_change
Disallow: /search/
Disallow: /modlog
Would it be a good idea privacy-wise to block GPTBot from scraping content from the server?
User-agent: GPTBot
Disallow: /
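For anyone who wants to sanity-check how a compliant crawler would read these rules, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes the two groups are combined in one file, uses a shortened copy of the rules above, and the bot name `SomeOtherBot` and the path `/post/1` are made up for illustration. It only shows what a well-behaved bot should do; it says nothing about what GPTBot actually does.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules from this post, with the proposed GPTBot group added.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /login
Disallow: /search/

User-agent: GPTBot
Disallow: /
"""


def can_fetch(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt allows user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # "/post/1" and "SomeOtherBot" are hypothetical, for illustration only.
    for agent in ("GPTBot", "SomeOtherBot"):
        for path in ("/", "/post/1", "/login"):
            print(agent, path, can_fetch(agent, path))
```

With the GPTBot group in place, every path is refused for GPTBot, while other bots are still only kept out of the paths listed under `*`.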
Thanks!
I’m on board for this, but I feel obliged to point out that it’s basically symbolic and won’t mean anything. Since all the data is federated out, they have a plethora of places to harvest it from, or more likely they’d just run their own ActivityPub harvester.
I’ve thrown a block into nginx so I don’t need to muck with robots.txt inside the lemmy-ui container.
# curl -H 'User-agent: GPTBot' https://lemmy.ca/ -i
HTTP/2 403
I imagine they rate-limit their requests too, so I doubt you’ll notice any difference in resource usage. OVH is Unmetered*, so bandwidth isn’t really a concern either.
I don’t think it will hurt anything, but adding it is kind of pointless for the reasons you said.