The Internet Archive reportedly has over 50 petabytes of data archived. That is seriously, mind-bogglingly, utterly not very small at all.

@liw Makes me wonder what the write-to-read ratio of that content is. I suspect it's far above 1.

345 billion web pages saved. That compares with 39 million books in the US Library of Congress (the largest collection in the world) and about 130 million books ever published (growing at roughly 300k/yr from traditional publishing and 1.5M/yr from non-traditional publishing).

TIA's 2016 report suggests ~90 kB per page, or ~2% of a book (taking a book as ~5 MB of PDF text).

So TIA has roughly 5 billion books' worth of data.

It's just like my workplace!

Takes a while to rebalance storage, even with pretty snappy networking.

@liw It's a git-annex-powered, crowdsourced effort to back up the Internet Archive. Joey was involved in some way.

@Jmtd Oh, I think I did hear something about that. Awesome stuff.

@Jmtd @joeyh That would have been the perfect time to use the name "Internet2". :)

