Library of Congress Limits Tweet Archive

Library of Congress Limits Tweet Archive

Would you use Twitter differently if you knew your tweets were being maintained in an archive for future generations?

In 2006, Twitter handed over an archive of 21 billion tweets to the Library of Congress to continue an archival project that has lasted in the same manner until December, 2017.

However, as of January 1, 2018, The Library will only collect and archive “on a very selective basis.”

The organization published a white paper about the history, process, changes and reason for the change.

Growing Data

By 2013, 170 Million tweets had been archived.

Today 500 million tweets per day are created. Meaning a single year would produce 182.5 billion tweets.

The archival collection wasn’t just the 140 character tweet data, but also meta data along with each tweet. The meta data included date, account owner, followers, likes, comments, shares and other data.

The massive amount of data is thought to be a detriment not only ongoing storage but also the ability to manage requests for research information as is done with much of the Library’s archival information. The Library is aware of this need and has even identified sample uses for the data:

“What kind of information might researchers learn from the Twitter archive?

Some examples of the types of requests the Library has received indicate how researchers might use this archive to inform future scholarship:

  • A master’s student is interested in understanding the role of citizens in disruptive events. The student is focusing on real-time micro-blogging of terrorist attacks. The questions focus on the timeliness and accuracy of tweets during specified events.
  • A post-doctoral researcher is looking at the language used to spread information about charities’ activities and solicitations via social media during and immediately following natural disasters. The questions focus on audience targets and effectiveness.” 

What to Collect

Though the definition of selective tweets is not completely defined, the Library of Congress gave a definition of intent as stated on NPR:

“The Twitter Archive may prove to be one of this generation’s most significant legacies to future generations,” the library says. “Future generations will learn much about this rich period in our history, the information flows, and social and political forces that help define the current generation.”