Your average sentient being has likely never heard of the term “silent data corruption”; however, I can guarantee they have experienced it in some manifestation or another. Before I began writing this blog, I asked a few privileged “bystanders” to divulge what came to mind when this clearly malevolent term was brought to their attention. I feel it’s only fair to point out that the extent of their knowledge of “data integrity” went no further than calling their local cable service provider because their broadband was running extremely slowly while streaming last week’s episode of The Walking Dead.
The three responses I was able to solicit were quite interesting, to say the least.
Let’s start with my wife (God bless her utter and complete lack of interest in the industry I work in).
Me: “Honey, what comes to mind when I ask you to define silent data corruption?”
Wife: “Let me see. Isn’t that what I do to you when you upset me (which is quite often)?”
Me: “Um…well actually, that would be called the silent treatment.”
Wife: “Yeah, whatever. You get the point.”
The response I was made privy to from my buddy (who happens to be a huge movie buff) also deserves an honorable mention.
Me: “Hey bro, what comes to mind when I ask you to define silent data corruption?”
Buddy: “Let me see. Oh, that’s easy. I would think it was a form of torture that uber genius scientist and villain Dr. No used on James Bond to extract the exact location of the radio beam weapon he planned on using to destroy the world. Obviously, using the dreaded silent data corruption technique on James Bond’s brain would be the optimal torture method to use. Duh.”
Me: “You know what. You are totally right.”
The last response, from my conspiracy-theory-laden neighbor, was probably the closest thing I got to some semblance of a tangible answer (yeah, about as tangible as the dead muskrat on Donald Trump’s head he calls hair).
Me: “Hey there Jim, random question that’s work related. What comes to mind when I ask you to define silent data corruption?”
Neighbor: “It’s exactly what I’ve been telling all you naïve people for years. The government has been infiltrated by spies like Dennis Rodman working for Kim Jong Un’s North Korean broadcast agency. They are planning to release a silent data corruption virus on our networks so we can’t watch quality entertainment like Duck Dynasty and Bayou Billionaires!”
Me: (Not wanting to upset an already agitated individual) – “I hear you Jim. What’s next? North Korea spreading a silent data corruption virus to prevent us from watching Here Comes Honey Boo Boo? The travesty!”
With that, I politely left his premises and barricaded myself in my house.
Silent Data Corruption is the Real Enemy – Not Dr. No
Unfortunately, silent data corruption is very much real, and if it could be used as a weapon of mass destruction, you bet your bottom dollar that Goldfinger would have used it on Sean Connery instead of that totally boring laser.
Silent data corruption can best be defined as the non-malicious loss of data resulting from component failure or inadvertent administrative action. Like my wife, you are probably asking, “and what’s the big deal?” Well, the big deal is that data is the lifeblood of organizations. Whether you are a multibillion dollar corporation like Google dealing with 7.2 billion page views and more than 20 petabytes of data processed a day, or Joe’s Organic Coffee cart down the street running 20-30 financial transactions a day, the end result is the same: the loss of data is catastrophic.
Think of data as a lone secret agent (i.e. 007) trying to make it out of an imploding bad guy’s secret lair that’s filled with a plethora of thugs and villains trying to take him out (via razor sharp bowler hat or stainless steel teeth, of course). In order for data (“007”) to come out unscathed as it travels through the Storage Area Network (SAN) between servers and storage arrays, vendors have added integrity checks such as Error Correcting Code (ECC), which is designed to protect server memory.
Here’s the problem: Even with these checks, increasing complexity of the data center environment and growth in storage have led to significant concerns about silent data corruption. The potential for problems in these areas has increased as data centers have moved to virtualized servers, multi-core processors and faster server buses. For example, operating systems (OSes) have to deal with more complex memory mapping, which increases the potential for data to be corrupted with unusual “edge” conditions that are difficult to fully test.
But really, what it comes down to is that without end-to-end protection technology, data corruption can go unnoticed until recovery is difficult and costly, or even impossible, to perform. Furthermore, without end-to-end integrity checking, these silent data corruptions can lead to unexpected and unexplained problems.
The common perpetrators of “silent data corruption” are the following three usual suspects:
- The OS, including the core OS and device drivers
- Storage hardware and firmware
- Administrative errors
One of the most common areas where data corruption can occur is writing to disk drives. There are two basic kinds of disk drive corruption. The first is “latent sector errors,” which are typically the result of a physical disk drive malfunction. An example would be a file system read error reported from a disk array. This type of corruption is usually detected by ECC or CRC in the I/O path, and most often is corrected automatically.
The second type is silent corruption, which can happen without warning. This is by far the most cataclysmic type of data corruption, and there are no effective means of detecting it without end-to-end integrity checking.
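To make the difference concrete, here is a minimal Python sketch of the idea behind end-to-end integrity checking. It is an illustration only, not how any particular product works: the `FlakyDisk` class, the `write_block` and `read_block` helpers, and the choice of CRC-32 (via Python's `zlib.crc32`) are all assumptions made for this example. The application computes a checksum before the data leaves its hands and verifies it when the data comes back, so a bit flipped somewhere in between no longer slips by quietly.

```python
import zlib


class FlakyDisk:
    """Hypothetical in-memory 'disk' that can flip a bit without reporting an error."""

    def __init__(self):
        self.blocks = {}

    def write(self, lba, payload):
        self.blocks[lba] = bytes(payload)

    def read(self, lba):
        # No exception, no error code: the disk believes everything is fine.
        return self.blocks[lba]

    def corrupt_silently(self, lba, bit=0):
        # Simulate silent corruption by flipping one bit of the stored block.
        data = bytearray(self.blocks[lba])
        data[bit // 8] ^= 1 << (bit % 8)
        self.blocks[lba] = bytes(data)


def write_block(disk, lba, data):
    # Append a 4-byte CRC-32 computed by the application,
    # before the data enters the I/O path.
    checksum = zlib.crc32(data).to_bytes(4, "big")
    disk.write(lba, data + checksum)


def read_block(disk, lba):
    # Recompute the CRC-32 on the way back and compare it with the stored one.
    raw = disk.read(lba)
    data, stored = raw[:-4], raw[-4:]
    if zlib.crc32(data).to_bytes(4, "big") != stored:
        raise IOError(f"silent data corruption detected at block {lba}")
    return data


if __name__ == "__main__":
    disk = FlakyDisk()
    write_block(disk, 7, b"shaken, not stirred")
    print(read_block(disk, 7))        # clean read succeeds
    disk.corrupt_silently(7, bit=3)   # the disk reports nothing
    try:
        read_block(disk, 7)
    except IOError as err:
        print(err)                    # the application notices anyway
```

Without the checksum, the second read would hand back the damaged bytes as if nothing had happened, which is exactly what makes this kind of corruption “silent.”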
Administrative errors, also known as good old human error, cost corporations millions of dollars a year. The damage comes from everything from missed backup windows and failure to detect corrupt data packets to, ironically, failure to report catastrophic failures in time to mitigate the problem.
The very reasons described above are what led to the creation of the “Data Integrity Initiative” through the joint efforts of Oracle, Emulex and EMC. Our collective goals are to provide end-to-end data integrity from the application to the storage, protect mission-critical application information from corruption and eliminate the potential costs and downtime associated with silent data corruption.
Shameless Emulex Plug
Emulex LightPulse® Fibre Channel Host Bus Adapters (HBAs) play a key role in helping eliminate silent data corruption by providing the following business benefits:
- Emulex is the only Fibre Channel HBA vendor to provide T10 Protection Information (T10 PI) support with full line-rate performance and no systems overhead with our vEngine™ CPU offload technology, elevating the reliability and operations of Oracle database environments
- Emulex HBAs provide IT managers with the ability to protect their company’s data and resources, while maximizing service level agreements
- Emulex has the only HBA qualified with EMC’s VMAX array offering T10 PI Host DIF Support
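For the curious, here is a rough Python sketch of what T10 PI adds to each block. It assumes the commonly described Type 1 layout: 512 bytes of data followed by 8 bytes of protection information (a 2-byte guard tag holding a CRC-16 of the data, a 2-byte application tag and a 4-byte reference tag carrying the low 32 bits of the logical block address). The function names are made up for this example, and a real HBA does this work in hardware, not in Python.

```python
import struct

T10_DIF_POLY = 0x8BB7  # CRC-16 polynomial used for the T10 PI guard tag
SECTOR_SIZE = 512      # assumed sector size for this sketch


def crc16_t10dif(data):
    """Bitwise CRC-16 (polynomial 0x8BB7, initial value 0, no reflection)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10_DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def protect(sector, lba, app_tag=0):
    """Return the sector with 8 bytes of protection information appended."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected a 512-byte sector")
    guard = crc16_t10dif(sector)
    ref_tag = lba & 0xFFFFFFFF
    # Big-endian: 2-byte guard, 2-byte application tag, 4-byte reference tag.
    return sector + struct.pack(">HHI", guard, app_tag, ref_tag)


def verify(protected, expected_lba):
    """Check the guard and reference tags; return the data or raise ValueError."""
    sector, pi = protected[:SECTOR_SIZE], protected[SECTOR_SIZE:]
    guard, _app_tag, ref_tag = struct.unpack(">HHI", pi)
    if guard != crc16_t10dif(sector):
        raise ValueError("guard tag mismatch: data changed in flight or at rest")
    if ref_tag != (expected_lba & 0xFFFFFFFF):
        raise ValueError("reference tag mismatch: block is at the wrong address")
    return sector
```

The guard tag catches flipped bits, while the reference tag catches a block that was written to or read from the wrong address, a failure that a checksum over the data alone cannot see.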
Silent data corruption is expensive, disruptive and causes more than just heartburn for your IT staff (not to mention pain and angst for secret agents, wives, movie buff friends and paranoid neighbors).
Don’t let silent data corruption ruin your day. Oracle, Emulex and EMC bring years of expertise, know-how and tried and true solutions to help you eliminate silent data corruption once and for all.
Check out the recent article published by Oracle, which can be found prominently on the following Oracle sites:
And don’t forget to visit us at http://www.emulex-oracle.com/ to learn about the latest Oracle and Emulex offerings.