Someone Made a Dataset of One Million Bluesky Posts for 'Machine Learning Research'.

  • 2024-12-06 08:00:00
  • 404 Media

Following an impressive series of questionable choices by US entrepreneur Elon Musk, a large number of X (formerly Twitter) users decided to leave the platform in favour of a new social network still in its infancy. Bluesky, an application born out of an internal Twitter project, thus became the perfect destination for those escaping the clutches of AI.

The first weeks spent navigating that blue sky have confirmed the choice of those intent on experiencing social media as it ‘used to be’: we don't realise what an enormous privilege a sensible algorithm is until we lose it. However, the dream of a platform uncorrupted by artificial intelligence seems to have been short-lived.

An employee of Hugging Face recently created a dataset containing one million posts published on Bluesky and announced it on the platform itself. The dataset is intended for machine learning research and for experimentation with social network data. Moreover, the data is not anonymous: it includes users' decentralised identifiers.