Nuclear data waste

Introduction

Nowadays everyone seems to be heading to the hills to try and cash in on the new gold rush. Data! You have all heard that data is the new gold. Some call it the new oil, as that is their favorite fantasy, that’s all good. But there are drawbacks to this.

The cost of data and data waste

Like any resource you mine it does come with associated costs. A cost that has to be covered by the value you derive from it. That value has to exceed the money spend to gather, process and consume it. That can be an expensive business.

On top of that you have to deal with “the waste” it creates as a byproduct. Waste can be toxic. Mining data tends to produce nuclear data waste, the bad kind where “safe levels” are hard to determine. In the rush to grab the gold many forget a couple of important lessons from history. We should know by now we need to act proactively to avoid waste. That is the cheapest option in the long run and mitigates many of the risks. We should also know that not all data is gold, some of it just glitters, but it isn’t valuable. Fool’s data, like fool’s gold, is essentially worthless no matter how much money you have spent. Even worse, it’s still produces nuclear data waste asset than can get you into (legal) trouble.

Data storages and backups

How much data to you need to get to the gold and at what cost. Storage capabilities as well as storage capacity grows fast and cost seems under control for now. But will this last forever? And even if so, what’s the ratio of data gold versus raw data stored? Can we improve this ratio? Because even when things are cheap, why even do it if it is not needed?

Protecting the data and the waste

And then there is the cost with protecting that data as well as the governance around it.  The sad reality with data is that once you have it the probability that it will get you into trouble is real. Data, sooner or latter will get lost, misplaced, sold, hacked, leaked, … it’s almost guaranteed. Ask any real InfoSec professional (not the standard issue, policy quoting security officers, those are just windows dressing) and they will open your eyes to the reality of the risks. It’s very sobering.

The gold rush

As with any hype or gold rush we can avoid costly mistakes buy looking at history. Think about who benefited and who lost out. Think about why this happened and how. Can you see any parallels?

  • Many people are drawn to the data gold fields. Very few strike a gold vein.
  • There is a lot of money to be made selling the tools, supplies and gear to mine the data, process it, store and protect it.
  • Gathering raw data and processing it can be highly toxic
  • Storing and protecting the gold is expensive and hard.

Let’s dive a bit deeper into these issues, what these mean and how they materialize. They all have one thing in common for sure and that is that the fear of missing out is one of the driving factors.

Gold diggers

The reality is that many people that now become “data scientists” are not all highly skilled mathematicians and experts at statistics analysis. It’s a new hype, just like OLAP tools and data mining where before. We now have BI, big data and data science. That’s where the gold can be found so that’s where gold diggers flock to. Some have the skills, abilities and the luck to derive wealth from that. Most will just have the job digging.

Pick Axe

There is a lot more data than there is science in the hype created around data scientists. Data scientists should be great at math and statistics. Those are not very fields of human endeavor that do not scale well. They are not even popular. Attaching “scientist” to something doesn’t make it a science. Be sure of the quality of your gold and make sure it is not fool’s gold. But the gold rush is on. There’s money to be made. In an era where science is viewed by many as “an opinion” the urge to derive some credibility from adding “science” to any endeavor is a paradox. It is on the rise. Clearly, this shows the value of real science even when some only seem to like it when it suits their agenda.

But the field is exploding as companies want people working on all the raw data they collect. As one statistician stated: “I used to be a boring, underpaid geek with glasses, now that I’m a data scientist I’m cool, in demand and paid very well even if the work is less scientific.” That’s the nature of the beast. Her employer got a real statistician, but as the mines require a lot more bodies, many will make due with less.

Merchants

The “sure” money is in the supply chain. Storage, networking, compute … no matter where it is (cloud, fog or on-premises computing). There is money to be made with tools to process the data, protect it (backups) and secure it against unwanted prying eyes and theft. If you’re selling any of those business is a booming.

Everyone seems obsessed with collecting data. Luckily storage costs are down per GB and we can store ever more. We also need to protect more. But whoever deletes data? Who dares push the button? A lot of data is collected “just in case”. We might find gold in there later and if we don’t have it we cannot for sure. The fear of missing out in action. That is great if you’re selling stuff. Data lakes, data ponds, storage blobs or tables, Mongo DB or SQL PaaS, storage arrays, data processing technology and data protection. These can be products or services, it doesn’t matter, there is money to be made. And while you’re selling you’re not asking the buyers if the really need it. You don’t question them, you praise their insights and help them protect their investment. Everyone is doing it, so must you. The copy/paste strategy in action.

Nuclear data waste

While the vast growth in data is spectacular. A lot of it is crap. But there is very little effort put into  being selective. It’s too cheap right now to collect and store it. No one want to say “we don’t need it” and be the one to blame if you don’t have it.

But in the age of data leaks, hackers, privacy concerns and ever more legislation around data protection it’s worth making sure you don’t store data just because you can. Storing data holds inherent risks. Risk of losing it, corrupting it, deriving faulty information from it, leaking it, have it stolen or abused. It

In the age of GDPR and many other rightful privacy and data protection concerns collecting data should be treated like nuclear power. The value it brings is undeniable. But you don’t need vast amounts of nuclear fuel to deliver that value. You do need very good processes, fail safes, regulation, capable people and technology.

We should start looking at data as nuclear fuel and as such, after use and processing, part of it is left as toxic nuclear data waste. It’s a long-term toxic by product of the process of deriving information form data. Minimize the collection, storage and of data to achieve your goals at minimum cost and risk. Luckily, we have a very good solution for toxic data waste. You can delete it and wipe it securely.

We have to stop thinking that more is better when much of it is junk. The overhead of caring for that junk is ridiculous. We might have to do so for nuclear waste out of need. But for data there are alternatives. Destroy it if you don’t need it. It’s the safest way to handle the legal and reputation risks related to it. That will take a conscious effort.

Efforts and costs

Critical thinking about collecting data is lacking. That is understandable. There is a lot of the money to be made in data mining is in providing the tools to collect, process, store and protect the data. Even with many people that warn us of the security issues and legal responsibilities around it is often about selling services and products. For many all this might turn out to be a lot like the other gold rushes. There were a lot more suppliers of the tools that got rich than actual finders of profitable gold mines. This means there is also a lot of pressure and incentive to feed the “data is the new gold” beast.

Where a SQL database or a data warehouse at least meant you had to put effort into collecting the data, the rise of unstructured data technologies means way too often we don’t care and we’ll figure it out later. Imagine doing that to nuclear fuel! For now, the technical advances in storage and data technologies has allowed us to act without too much deliberation on the sanity of our choices. That might change, it might be wise to avoid the cold shower when it does and benefit from minimizing toxic data risks today.

Conclusion

Now true data gold is very valuable, but make sure you can recognize it. Just going through the motions and buying the tools, copying “in the know” statements from the internet isn’t going to cut it. That is called pretending. Sure, it’s fun. It is also a very dangerous and costly mistake when things get real. At best you look like an idiot with money. Many sales people will separate you from your money very efficiently.

The smarter organizations already have a data strategy that includes waste avoidance, reduction and management. Many don’t unfortunately. Collecting data for those is the main goal, driven by the tyranny of action over strategies. You have to be seen acting and being in charge. The buzz words have to be present and you have to come across as a “can do sir, yes sir” person. Well that is what will kill you. The late Norman Schwarzkopf knew this all too well.

Take care of your weaknesses, figure them out before they hurt you and before they destroy your ability to exploit your strengths. That people is a strategy exercise. I can do that for you and it will cost you a lot of money. But remember, strategies are not products you can buy, they are not commodities and as such buying them is a paradox. A strategy is what will give you the edge over your competitors.If you have others determine your strategy, your competitors will pay them more to find out . So, roll up your sleeves and put in the effort yourself. In the end, it’s all about common sense and this is true for data-mining, AI and BI as well.

Monitor the UNMAP/TRIM effect on a thin provisioned SAN

Introduction

During demo’s I give on the effectiveness of storage efficiencies (UNMAP, ODX) in Hyper-V I use some PowerShell code to help show his. Trim in the virtual machine and on the Hyper-V host pass along information about deleted blocks to a thin provisioned storage array. That means that every layer can be as efficient as possible. Here’s a picture of me doing a demo to monitor the UNMAP/TRIM effect on a thin provisioned SAN.

clip_image002

The script shows how a thin provisioned LUN on a SAN (DELL SC Series) grows in actual used spaced when data is being created or copied inside VMs. When data is hard deleted TRIM/UNMAP prevents dynamically expanding VHDX files form growing more than they need to. When a VM is shut down it even shrinks. The same info is passed on to the storage array. So, when data is deleted we can see the actual space used in a thin provisioned LUN on the SAN go down. That makes for a nice demo. I have some more info on the benefits and the potential issues of UNMAP if used carelessly here.

Scripting options for the DELL SC Series (Compellent)

Your storage array needs to support thin provisioning and TRIM/UNMAP with Windows Server Hyper-V. If so all you need is PowerShell library your storage vendor must provide. For the DELL Compellent series that use to be the PowerShell Command Set (2008) which made them an early adopter of PowerShell automation in the industry. That evolved with the array capabilities and still works to day with the older SC series models. In 2015, Dell Storage introduced the Enterprise Manager API (EM-API) and also the Dell Storage PowerShell SDK, which uses the EM-API. This works over a EM Data Collector server and no longer directly to the management IP of the controllers. This is the only way to work for the newer SC series models.

It’s a powerful tool to have and allows for automation and orchestration of your storage environment when you have wrapped your head around the PowerShell commands.

That does mean that I needed to replace my original PowerShell Command Set scripts. Depending on what those scripts do this can be done easily and fast or it might require some more effort.

Monitoring UNMAP/TRIM effect on a thin provisioned SAN with PowerShell

As a short demo let me show case the Command Set and the DELL Storage PowerShell SDK version of a script monitor the UNMAP/TRIM effect on a thin provisioned SAN with PowerShell.

Command Set version

Bar the way you connect to the array the difference is in the commandlets. In Command Set retrieving the storage info is done as follows:

In the DELL Storage PowerShell SDK version it is not harder, just different than it used to be.

Which gives …

clip_image004

I hope this gave you some inspiration to get started automating your storage provisioning and governance. On premises or cloud, a GUI and a click have there place, but automation is the way to go. As a bonus, the complete script is below.

 

Collect cluster nodes with HBA WWN info

Introduction

Below is a script that I use to collect cluster nodes with HBA WWN info. It grabs the cluster nodes and their HBA (virtual ports) WWN information form an existing cluster. In this example the nodes have Fibre Channel (FC) HBAs. It works equally well for iSCSI HBA or other cards. You can use the collected info in real time. As an example I also demonstrate writing and reading the info to and from a CSV.

This script comes in handy when you are replacing the storage arrays. You’ll need that info to do the FC zoning for example.  And to create the cluster en server object with the correct HBA on the new storage arrays if it allows for automation. As a Hyper-V cluster admin you can grab all that info from your cluster nodes without the need to have access to the SAN or FC fabrics. You can use it yourself and hand it over to those handling them, who can use if to cross check the info they see on the switch or the old storage arrays.

image

Script to collect cluster nodes with HBA WWN info

The script demos a single cluster but you could use it for many. It collects the cluster name, the cluster nodes and their Emulex HBAs. It writes that information to a CSV files you can read easily in an editor or Excel.

image

The scripts demonstrates reading that CSV file and parsing the info. That info can be used in PowerShell to script the creation of the cluster and server objects on your SAN and add the HBAs to the server objects. I recently used it to move a bunch of Hyper-V and File clusters to a new DELLEMC SC Series storage arrays. That has the DELL Storage PowerShell SDK. You might find it useful as an example and to to adapt for your own needs (iSCSI, brand, model of HBA etc.).

Dell SC Series MPIO Registry Settings script

Introduction

When  you’re using DELL Compellent (SC Series) storage you might be leveraging the  Dell SC Series MPIO Registry Settings script they give you to set the recommended settings. That’s a nice little script you can test, verify and adapt to integrate into your set up scripts. You can find it in the Dell EMC SC Series Storage and Microsoft Multipath I/O

Dell SC Series MPIO Registry Settings script

Recently I was working with a new deployment ( 7.2.40) to test and verify it in a lab environment. The lab cluster nodes had a lot of NIC & FC HBA to test all kinds of possible scenarios Microsoft Windows Clusters, S2D, Hyper-V, FC and iSCSI etc. The script detected the iSCSI service but did not update any setting but did throw errors.

image

After verifying things in the registry myself it was clear that the entries for the Microsoft iSCSI Initiator that the script is looking for are there but the script did not pick them up.

image

Looking over the script it became clear quickly what the issue was. The variable $IscsiRegPath = “HKLM:\SYSTEM\CurrentControlSet\Control\Class\{4d36e97b-e325-11ce-bfc1-08002be10318}\000*” has 3 leading zeros out of a max of 4 characters. This means that if the Microsoft iSCSI Initiator info is in 0009 it get’s picked up but not when it is in 0011 for example.

So I changed that to only 2 leading zeros. This makes the assumption you won’t exceed 0099 which is a safer assumption, but you could argue this should even be only one leading zero as 999 is an even safer assumption.

$IscsiRegPath = “HKLM:\SYSTEM\CurrentControlSet\Control\Class\{4d36e97b-e325-11ce-bfc1-08002be10318}\00*”

I’m sharing the snippet with my adaptation here in case you want it. As always I assume nu responsibility for what you do with the script or the outcomes in your environment. Big boy rules apply.