First they were just molehills which grew into mounds. Now they are mountains of Big Data, and what used to be a nuisance on the enterprise’s lawn is now a goldmine – but its size is enormous and we have to explore it all to realise its value. David Gibson, VP of Strategy at Varonis, examines the Big Data task ahead of us and explores some strategies to get us to the top.
The multiplying layers of data are beginning to crush us like a slowly encroaching glacier, but sometimes we barely notice it is happening until we are buried in another slab of information. If we take e-mail as just one section of this encroaching data mountain, we can discover a lot about why our employees are at their wits’ end.
Varonis recently conducted a survey which found that staff in the digital office are learning to cope with increasing email traffic, but it is costing the enterprise dearly in time and mishaps. While staff continue to grapple with the email flow and experiment with more effective email management practices to deal with their burgeoning inboxes, organisations are faced with increasing numbers of email-related accidents.
According to the survey, 62% of respondents reported an accident, often with serious consequences, as a result of sending an email to the wrong person or with improper or unauthorised content. The study[i], which questioned employees about their digital habits and vices, also revealed that one in 20 companies faced compliance issues as a result of a wrongly sent email. If you're not thinking about your own organisation by now, you should be.
With 78% of respondents receiving up to 100 emails per day, and nearly a quarter receiving between 100 and 500 emails daily, the results underline the mounting pressure digital communication places on employees. One in ten workers now has more than 10,000 emails in their inbox. Nearly 85% of those surveyed spend 30 minutes or more every day organising their mail, which adds up to over one and a half weeks of work every year. It is a safe assumption that they have underestimated the time spent on this task and that it may well take them longer.
The study reveals three different styles of email management: 34% of those questioned are ‘filers’ (emptying their inbox on a daily basis and filing messages into folders), 17% are hoarders (who never delete but tag and keep in just a few folders), and 44% combine both practices. However, a small but telling niche of 6% admit to completely giving up on maintaining control over their email. As information workers try to keep pace with the daily email barrage, the danger of unintentional misuse and risk rises. A concerning 62% of respondents report their company suffered an email mishap. With incidents that include simple embarrassment (64%), compliance issues (7%), and job or promotion loss (19%), it is clear that the consequences of employees being overloaded can be severe.
This is a problem which has crept up on us and has quickly overwhelmed us – unless we automate some of these e-mail processes and start managing our Human Generated Big Data, our employees are going to drown in it. What we once spied as a small molehill on the corporate lawn, one that could be dealt with quickly, is now a mountain which we have to keep down to a manageable size.
We suggest five steps to scaling the Human Generated Big Data mountain:
1 Make sure you can see the mountain
One of the biggest problems with data protection isn't technical at all. As you can see from our data protection survey[1], the simple fact is that many companies don't actually know where their sensitive data resides. If you don't know where your more sensitive data is, not only can you not find it when you have to, but you cannot protect it either. As we have pointed out many times in the past, the largest rise in the volumes of data is in unstructured data: in other words, those files that you keep on file shares, e-mails, SharePoint and NAS drives scattered all over your company.
But it is all those e-mail attachments, photos, sound recordings of conference calls and the magma of hidden data that form the starting point from which your company moves forward. With IDC estimating that 90% of the 1.8 zettabytes generated in 2011 was unstructured, and predicting that over the next decade the information managed by enterprise data centres will grow by a factor of 50, the scale of this problem is likely to grow, making manual management of permissions and migration virtually impossible. This problem usually becomes apparent when the company wants to move its data and then finds it does not know where its more sensitive data is, which data is active or stale, and to whom it belongs.
Our later data migration survey[2] found that: “A worrying 65% admitted that they were not very confident that sensitive data was only accessible to the right people during a migration. In fact, 79% admitted that they could not guarantee that their folders and SharePoint drives were safe from global access groups, with one third of these admitting that unprotected folders were rampant or unidentifiable. This could become very damaging during a merger or acquisition, potentially leaving unprotected folders open to thousands more people after a migration.”
The findings demonstrate that keeping control of who has access to what is an ongoing problem for organisations. The scale of the problem that organisations face when moving terabytes of data may be surprising, as a typical terabyte contains about 50,000 folders, and of those folders about 5%, or 2,500 folders, have unique permissions[3].
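To get a feel for how quickly this adds up, the rule of thumb above can be turned into a rough calculation. The following is only a sketch: the two constants are the figures quoted in this article, the function name and the estate sizes are our own illustrative assumptions, and real file systems will vary.

```python
# Rule-of-thumb figures quoted in the article; real file systems will vary.
FOLDERS_PER_TERABYTE = 50_000
UNIQUE_PERMISSION_SHARE = 0.05  # about 5% of folders


def folders_with_unique_permissions(terabytes: float) -> int:
    """Estimate how many folders carry their own unique permissions."""
    return round(terabytes * FOLDERS_PER_TERABYTE * UNIQUE_PERMISSION_SHARE)


if __name__ == "__main__":
    # Hypothetical estate sizes, chosen only for illustration.
    for size_tb in (1, 10, 100):
        count = folders_with_unique_permissions(size_tb)
        print(f"{size_tb} TB -> about {count:,} folders with unique permissions")
```

Even a modest estate of a few terabytes therefore implies thousands of permission sets to check, which is why reviewing them by hand does not scale.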
2 Is the mountain alive? Audit usage and analyse to find out
We all claim that we are buried under drifts of e-mail every day – but is this really the case or is it just our perception? After all, one man's snowdrift may be another's avalanche. Is email really as busy as we think? The first thing we have to understand is that we cannot manage what we haven’t measured – most unstructured data use isn’t audited by default, so it will take a concerted effort to audit it properly. Without auditing, you can’t tell what data is out of date, what is critical, or who is using it. It is an essential prerequisite before we get onto the live mountain to tame it.
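As a rough illustration of what an audit trail makes possible, the sketch below flags folders that nobody has touched for a year. Everything in it is an assumption made for the example: the event format, the folder names, the users and the one-year threshold are hypothetical, and it does not represent any particular product's audit log.

```python
from datetime import datetime, timedelta

# Hypothetical audit events: (folder path, user, time of access).
AUDIT_EVENTS = [
    ("/finance/payroll", "alice", datetime(2012, 10, 1)),
    ("/finance/payroll", "bob", datetime(2012, 10, 20)),
    ("/projects/archive-2009", "carol", datetime(2011, 3, 5)),
]


def last_access_by_folder(events):
    """Return the most recent access time recorded for each folder."""
    latest = {}
    for path, _user, when in events:
        if path not in latest or when > latest[path]:
            latest[path] = when
    return latest


def stale_folders(events, now, max_idle=timedelta(days=365)):
    """List folders whose most recent recorded access is older than max_idle."""
    latest = last_access_by_folder(events)
    return sorted(path for path, when in latest.items() if now - when > max_idle)


if __name__ == "__main__":
    print(stale_folders(AUDIT_EVENTS, now=datetime(2012, 11, 1)))
```

Note that a folder which never appears in the audit log at all will not show up here, so in practice the log would need to be compared against a full inventory of folders.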
3 Be aware who is climbing the mountain with you – access must be reviewed
You have to climb the mountain with the right team members. You will have to automate the entitlement review/permissions audit process. Automation can help identify the resources that need to be reviewed, align security groups with data sets, identify data owners, and route access information and actionable intelligence to those owners. Automation should also execute data owner decisions and provide auditable evidence that the process is being followed.
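A very simplified sketch of the first part of such a review might look like the following. It assumes we already hold an inventory that maps each folder to a data owner and to the groups that can reach it; the inventory, the owner names and the list of global access groups are all hypothetical examples rather than the output of any real tool.

```python
from collections import defaultdict

# Hypothetical names for groups that effectively include every user.
GLOBAL_ACCESS_GROUPS = {"Everyone", "Domain Users", "Authenticated Users"}

# Hypothetical inventory: folder -> (data owner, groups with access).
FOLDERS = {
    "/hr/salaries": ("dana", {"HR-Team", "Everyone"}),
    "/hr/policies": ("dana", {"Domain Users"}),
    "/legal/contracts": ("erik", {"Legal-Team"}),
}


def review_queue(folders, global_groups=GLOBAL_ACCESS_GROUPS):
    """Group folders exposed to global access groups by their data owner."""
    queue = defaultdict(list)
    for path, (owner, groups) in sorted(folders.items()):
        exposed = sorted(groups & global_groups)
        if exposed:
            queue[owner].append((path, exposed))
    return dict(queue)


if __name__ == "__main__":
    for owner, items in review_queue(FOLDERS).items():
        print(owner, items)
```

The point of routing the result by owner is that the person who understands the data decides whether the exposure is intended; the automation only finds the candidates and records the decision.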
4 Give the owners intelligence
Owners need metadata about what the data in the mountain contains, who is using it, and who should and shouldn’t have access – there’s just too much of it to sift through it manually. Simply collecting the metadata will not be enough to help us visualise and understand the complex functional relationships which surround our data; the metadata must be synthesised and analysed to help us determine where sensitive data is exposed, who it belongs to, who has excessive permissions to it, and identify other data management and protection concerns. The torrent of metadata elements and the functional relationships between them are far too numerous and complex for humans to analyse manually, so we must turn to automated analysis. Automated analysis already plays a large part in how we interact with the world. For example: Amazon.com now makes recommendations about books you might like based on what you have previously ordered. iTunes and other online shopping engines have similar functionality. Credit card companies analyse transactions to spot possible fraudulent activity.
5 Put it all together
Using metadata in an intelligent and automated way we can work with data owners to create disposition criteria, eliminate duplication of effort and optimise authorisation. Automated analysis transforms an overwhelming mountain of objects into a digestible one, enabling us to pick out items of high interest so we don’t have to ferret through them manually. There are simply too many websites, books, songs, people using credit cards, and potential mates for any human to go through them all, much less analyse them. Automating the analysis of metadata will help us find the data and access rights that require our attention.
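One simple form of this analysis is to compare who is permitted to reach a folder with who actually used it during the audit period. The sketch below does exactly that with plain sets; the two dictionaries are hypothetical sample metadata, and a real analysis would weigh many more signals than this.

```python
# Hypothetical metadata: who is permitted to open each folder ...
GRANTED = {
    "/finance/payroll": {"alice", "bob", "carol", "dave"},
    "/legal/contracts": {"erik", "alice"},
}
# ... and who actually opened it during the audit window.
USED = {
    "/finance/payroll": {"alice", "bob"},
    "/legal/contracts": {"erik", "alice"},
}


def unused_access(granted, used):
    """Return, per folder, the users who hold access but never used it."""
    report = {}
    for path, users in granted.items():
        idle = users - used.get(path, set())
        if idle:
            report[path] = sorted(idle)
    return report


if __name__ == "__main__":
    print(unused_access(GRANTED, USED))
```

The output is a short list of candidates for a data owner to review, not a verdict: someone who has not opened a folder recently may still need access to it.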
If you follow the five steps above you will firstly realise that climbing the human-generated Big Data mountain is not something you can do on your own – it’s a team effort. Secondly, you will have to automate most of the processes. And thirdly, you will have to start soon – these mountains just keep getting bigger, so the sooner you start scaling yours the better.
About David Gibson
David Gibson has been in the IT industry for more than fifteen years, with a breadth of experience in data governance, network management, network security, system administration, and network design. He is currently VP of Strategy at Varonis Systems, where he oversees product marketing and positioning. As a former technical consultant, Mr. Gibson has helped many companies design and implement enterprise network architectures, VPN solutions, enterprise security solutions, and enterprise management systems. He is a Certified Information Systems Security Professional (CISSP).
[1] http://www.varonis.com/research/#dataloss
[2] http://www.varonis.com/research/#data-migration
[3] http://blog.varonis.com/why-unstructured-data-security-is-so-hard/
[i] Digital habits survey conducted by Varonis in October 2012