Digital permanence refers to the long-term preservation and accessibility of digital assets, ensuring they remain intact and usable despite technological advancements and degradation risks. It involves strategies such as redundancy, format standardization, and cryptographic integrity checks to safeguard data against obsolescence, corruption, and loss.
Key approaches include decentralized storage, robust metadata management, and migration to future-proof formats. Effective digital permanence requires a balance of technological resilience, legal considerations, and sustainable infrastructure.
Researchers have encoded the entire human genome onto a “5D memory crystal”. For over a decade, the gold standard for durable data storage has been crystal; more specifically, a nanostructured glass disc developed in 2014. The 360 terabyte data crystal will remain stable at room temperature for 300 quintillion years, a lifespan that drops to 13.8 billion years (i.e., the universe’s current age) if the crystal is heated to 190°C. It can also withstand extremes of heat and cold, direct impact forces of up to 10 tonnes per square centimetre, and lengthy exposure to cosmic radiation.
This is a BBC article that opens with the claim that a quarter of all web pages posted between 2013 and 2023 have since vanished.
It is an introduction to the Internet Archive, an American non-profit based in San Francisco, started in 1996 as a passion project by internet pioneer Brewster Kahle. The organisation has embarked on what may be the most ambitious digital archiving project of all time, gathering 866 billion web pages, 44 million books, 10.6 million videos of films and television programmes, and more.
I make occasional contributions to both Wikipedia and the Internet Archive.
This is an important issue, and the article itself provides the answer, so it is worth reading in full.

The article “There’s a library on the moon now. It might last billions of years” tells us that 30 million pages, 25,000 songs and a whole bunch of art was left on the Moon by the Galactic Legacy Archive. The sunlight and gamma rays that bombard the Moon’s surface would break down paper, so the archive is etched into nickel, in layers so thin that you need a microscope to read them. For the music and images there is a nickel-etched primer describing the digital encoding used.
This is an article on digital permanence, but what does that mean? In its simplest form it means “what you read is what was written”. So at its most basic it is actually about data integrity.
Data integrity is the maintenance of the accuracy and consistency of stored information. Accuracy means that the data is stored as the set of values that were intended. Consistency means that these stored values remain the same over time, so they do not unintentionally waver or morph as time passes.
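In practice, both accuracy and consistency are commonly checked with a cryptographic hash recorded at write time and recomputed at read time. Here is a minimal sketch of that idea; the journal-entry data and workflow are my own illustration, not from the article:

```python
import hashlib

def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the stored bytes."""
    return hashlib.sha256(data).hexdigest()

# At write time, record the digest alongside the data.
original = b"Journal entry: 2024-01-15, balance 1,204.50"
recorded = digest(original)

# At read time, recompute and compare: any silent change is detected,
# because even a one-byte difference produces a completely different digest.
assert digest(original) == recorded      # an unchanged copy verifies
corrupted = b"Journal entry: 2024-01-15, balance 9,204.50"
assert digest(corrupted) != recorded     # a single flipped digit is caught
```

The digest guarantees that the stored values have not wavered or morphed; it does not, on its own, let you repair them, which is where the redundancy discussed later comes in.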
Digital permanence refers to the techniques used to anticipate and then meet the expected lifetime of data stored in digital media. Digital permanence not only considers data integrity, but also targets guarantees of relevance and accessibility, that is, the ability to recall stored data, and to recall it with predicted latency and at a rate acceptable to the applications that require that information.
To illustrate the aspects of relevance and accessibility, consider two counter examples. Journals that were safely stored redundantly on Zip drives or punch cards may as well not exist if the hardware required to read the media into a current computing system isn’t available. Nor is it very useful to have receipts and ledgers stored on a tape medium that will take eight days to read in when you need the information for an audit on Thursday.
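To make the tape counterexample concrete, here is a rough back-of-the-envelope calculation. The capacity and throughput figures are illustrative assumptions of mine, not from the article:

```python
# Back-of-the-envelope: how long does a full read of a large tape archive take?
# Both figures below are illustrative assumptions, not from the article.

archive_size_bytes = 100 * 10**12   # assume a 100 TB archive
sustained_read_bps = 150 * 10**6    # assume ~150 MB/s sustained tape throughput

seconds = archive_size_bytes / sustained_read_bps
days = seconds / 86_400             # 86,400 seconds in a day

print(f"Full read takes about {days:.1f} days")  # about 7.7 days
```

Under those assumptions the read takes roughly a week, which is the kind of latency that makes data useless for a Thursday audit even though it is perfectly intact.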
I am a particular fan of this article because it clearly describes a problem that most people ignore. Below I’m quoting directly from the article…
Information storage in the digital age has evolved to fit the scale of access (frequent) and volume (high) by moving to storage media that record and deliver information in an almost intangible state. Such media have distinct advantages, for example, electrical impulses and the polarity of magnetised ferric compounds can be moved around at great speed and density. These media, unfortunately, also score higher in another measure, that is fragility. Paper and clay can survive large amounts of neglect and punishment, but a stray electromagnetic discharge or microscopic rupture can render a digital library inaccessible or unrecognizable.
It stands to reason that storing permanent records in some immutable and indestructible medium would be ideal, something that, once altered to encode information, could never be altered again, either by an overwrite or destruction. Experience shows that such ideals are rarely realised, for example, with enough force and will, the hardest stone can be broken and the most permanent markings defaced.
In considering and ensuring digital permanence, you want to guard against two different failures, firstly the destruction of the storage medium, and secondly a loss of the integrity or ‘truthfulness’ of the records. Once you accept that no ideal medium exists, you can guard against both of these failures through redundancy. You can make a number of copies and isolate them in different ways so some of them can be counted on to survive any foreseeable disaster. With sufficient copies kept under observation through frequent audits and comparison, you can rely on a quorum of those copies to detect and protect the record from accidental or deliberate alteration.
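The quorum idea in the passage above can be sketched in a few lines: keep several copies, hash each one, and trust the value that a strict majority agrees on. This is a simplified illustration of my own; real archival systems typically combine voting with erasure coding and scheduled scrubbing:

```python
import hashlib
from collections import Counter

def quorum_read(copies: list[bytes]) -> bytes:
    """Return the value held by a strict majority of copies.

    Raises ValueError if no majority exists, i.e. voting alone
    cannot recover the record.
    """
    tally = Counter(hashlib.sha256(c).digest() for c in copies)
    winner_hash, votes = tally.most_common(1)[0]
    if votes <= len(copies) // 2:
        raise ValueError("no quorum: too many copies disagree")
    # Return any copy whose hash matches the winning value.
    return next(c for c in copies if hashlib.sha256(c).digest() == winner_hash)

# Three copies, one silently corrupted: the quorum still recovers the record.
copies = [b"ledger v1", b"ledger v1", b"ledgXr v1"]
print(quorum_read(copies))  # b'ledger v1'
```

Frequent audits correspond to running this comparison on a schedule, so a corrupted copy is detected and re-replicated before a second failure can erase the majority.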
The article goes on to look at how failures can be categorised, how risks can be mitigated, how accessibility can be maintained, and how you can still recover from ‘bitrot’.