A purported leak of 2,500 pages of internal documentation from Google sheds light on how Search, the most powerful arbiter of the internet, operates.

The leaked documents touch on topics like what kind of data Google collects and uses, which sites Google elevates for sensitive topics like elections, how Google handles small websites, and more. Some information in the documents appears to be in conflict with public statements by Google representatives, according to Fishkin and King.

    • redcalcium@lemmy.institute
      5 months ago

      It’s not a data leak; it’s a leak of internal documentation in a Google API client, which supposedly contains “leaks” of how the Google algorithm might work, e.g. the existence of a domain authority attribute that Google denied for years. I haven’t actually dug in to see whether it’s really a leak or was overblown, though.

      • douglasg14b@lemmy.world
        5 months ago

        Internal documentation leaking is still a data leak; it’s just one kind of data leak.

        If it were sensitive information, that commit would have been purged by now. The original PR (on the Google Clients repo) has no mention of problems, and there are no issues or discussions around rewriting the git history on that item.

        This makes me think this isn’t actually a problem.

        My org is less practiced at operational security than Google, and we would purge that information within minutes of any of us hearing about it. And this has been in blog posts for a while now.