Where Is Your Data “Located”?



In a widely reported decision last week, a New York court ordered Microsoft to produce emails stored on a server located in Dublin, Ireland.  There has already been some very good legal analysis of the opinion, which Microsoft has stated that it will appeal.  A key issue, of course, is whether a US-based court should have the ability to order the production of data “located” in a foreign country.

One of the issues with analyzing this problem is the application of old-school ideas, like physical location, to electronic information.  It’s easy (and convenient) to think about data being stored on a server, which is an actual physical item, and to identify that data as being located in that place.

But unlike actual physical objects, data is easy to copy, and copies often are stored in different places for more convenient access or for data protection and backup purposes.  It’s likely that the email messages in Dublin were replicated several times, possibly on backup media such as a tape or on a backup server (both of which are physical items).

Also unlike physical objects, many people can have “access” to data at the same time, and physical proximity is generally not very important to that access.  So while only people located near the server in Dublin can physically touch that server, there are likely dozens or hundreds of people throughout the world with the ability to access the server and read the data stored there.  The only constraint on that access is having the right security credentials.

Conversely, it’s easy to turn the idea of physical access on its head.  Even if you were standing next to the Dublin server, you would not have access to its data without proper credentials.  Thus, even assuming that a court with jurisdiction could order you to “get” the server, you might not have any ability to actually deliver the data stored on it.  In fact, with the right security and encryption, it’s possible to limit access to that information to just one person in the entire world!

The law changes slowly, and for good reason.  But until we have a better legal framework for analyzing electronic data issues, cases like the Dublin server will be difficult to predict and explain under our current legal structures.

Hot Topics for 2014



In strictly avoiding making new year’s predictions in this space, in the last few years I have:

Upon further review, that last one treads a little close to the prediction line, so I’ll try to steer clear this year. Let’s focus on a few trends that are already hot as we kick off 2014.

Machine Learning.  Clearly, predictive coding was a very hot topic in 2013.  But the idea of using those technologies to deliver automated classification, sentiment analysis and even “predictive compliance” holds potentially even greater promise for the enterprise.  As our friend Chris Dale noted in a thought piece last year, there are far more documents impacted by an enterprise-based machine learning and classification system than one used just during eDiscovery.  (Note that Big Data – another hot topic – can be closely related to this issue).

Archiving and Backup.  Lawyers can no longer put off their technology education.  As part of that process, every in-house lawyer – and everyone who works with in-house counsel – must have at least a basic understanding of archives (whether for email, file systems or SharePoint) and backup systems.  These systems hold key corporate data for retention and protection, implicate retention, compliance and privacy concerns, and may also require eDiscovery.  When legal has better knowledge of these systems, it also helps the organization to create policies and processes to more effectively manage the information in the first place.

Privacy.  Data privacy was also a hot topic during 2013.  And with tough state laws going into effect, the EU considering even stricter requirements and getting tough on the Safe Harbor, plus tougher enforcement in the US, there’s a lot to consider.

BYOD.  “Bring Your Own Device” is another issue that started strongly in 2013 and just seemed to get bigger.  Maybe that’s partly because it’s such a difficult and perhaps even unrecognized issue to solve (although we did have some thoughts on the process).  Thinking more about how BYOD impacts your compliance, privacy, data retention and eDiscovery processes is a big first step.

Happy 2014 and hope to see you all at Legal Tech.



How Large Is Your Digital Shadow (Part 2)



In part 1 of this post, I wrote about the “Digital Shadow” and provided some examples of all of the data that is being created about you or on your behalf, in addition to the data that you create.

Here, we’ll walk through a few activities in a typical day and identify some (but definitely not all!) of the digital shadow that’s being created.  Each activity is listed first, followed by a note on the digital data that’s being left behind.

Activity: Wake up, turn on my phone to check for new texts and emails, surf the web for news and read email in my personal email account.
Digital shadow: The cell phone carrier has a record of my phone contacting the local tower when it is turned on.  My email provider knows that I logged into my account and has information about my location based upon my IP address (tracked for security and other purposes).  Since I’m logged into email, it also keeps track of my searches (I can turn this off).  My ISP (the cell is often on WiFi for data when I’m home) has information about the sites that I visit.  My browser locally records my history, and the sites I visit may be leaving or updating cookies and/or capturing my IP address along with some unique identifiers on their own servers.

Activity: Later, I start my work day, logging into the corporate VPN and checking and responding to email messages.
Digital shadow: The VPN system has information about my log-in, and some of my activities are preserved in email, including replies and new messages that I create.  The recipients of my email messages also have a copy, and each copy may be replicated many times for email archives and data protection (backups, etc.).  I don’t think my company tracks my location, but it could.

Activity: One of my emails includes a file sharing link to a folder for content.  I add this link/file to my folder, and make changes to one of the presentations in that shared area.
Digital shadow: When I add the link, a copy of everything in the shared folder is made on my laptop, and the information about the link is logged by the system.  This process is replicated for all of the accounts where this app is installed (cell phone, tablets, etc.).  As I change and re-save the presentation, everyone sharing the folder receives the update (and this information is logged and distributed as “news” to others sharing the folder).

Activity: I grab a quick lunch at my favorite sandwich shop, and while waiting I check in on Facebook, make a few posts and re-tweet a message on Twitter.
Digital shadow: The shop tracks my purchase (and thus my location at that time) with my loyalty card.  Facebook and Twitter both have new content from me that they time-stamp and (unless I’ve turned it off) also know and save my location.  My location is tracked by my phone.  As with every purchase I make today using a credit card, data about my purchase is tracked and available to me online; it is also stored and shared within my credit card company as permitted.

Activity: I finish my blog post and push “publish”.
Digital shadow: The post is published to one of our company blogs.  This automatically triggers a tweet about the posting from a few company accounts (and I send my own tweet), which in turn generates additional data through re-tweets.  My tweet may include my location (this can be turned off).  The blog is captured and republished by several “automatic” online news sites looking for compliance stories – so now it exists on their servers, is backed up by them, and sometimes even re-distributed to hundreds or thousands of subscribers as part of a newsletter distributed in email form.

Activity: I’m flying later today, so I visit the airline’s site to view the status of my upgrade request and check in.  I rent and download a movie to my tablet for the trip.
Digital shadow: The airline’s systems capture my log-in and my check-in information.  iTunes records my purchase (as does my credit card company) and the movie is downloaded to my tablet for later viewing.

Activity: A client calls and we talk for a few minutes.  Then I attend a meeting remotely by phone and web conference.  I call my client back with some additional ideas and leave her a voice message.
Digital shadow: “Metadata” on both sides of the call is recorded by the carriers – start time and number, end time, etc.  I log into the web conference with my browser and use a password, so that system records my IP address, along with those of everyone else, including the owner of the account.  The duration of the conference and the time at which individual attendees “drop” the conference is probably saved.  The voice message system now has a log of my call (time, duration, phone number) along with a recording of the actual message, which might be transcribed and sent by email (e.g. Google Voice).

Activity: On the drive to the airport, I use a social GPS phone app to check traffic.
Digital shadow: The GPS app uses my location so it knows where I am, and is combining that information with thousands of others to update traffic information.  My cellphone carrier is also creating records as it switches cell towers along the way.  The toll pass on my windshield notes the time and my account (i.e. me) as I electronically pay my fare on the highway.  My car has dozens or hundreds of sensors recording information that will be downloaded at a later date when it is serviced.

Activity: As I park my car at an offsite location, I send a quick text to a friend.
Digital shadow: My parking service uses a card reader that registers my entrance to the facility.  My carrier creates a record of the text I send, as does my friend’s carrier.  The texts are also stored on each phone (and possibly some tablets and other devices if linked to the same account).

Activity: At the airport, I check my bag and head through security.
Digital shadow: My airline knows that I’m at the airport and has data about my luggage, too.  Since I’m in the TSA “Pre” line, TSA’s systems know and record my location on check-in.

Activity: Waiting for my flight, I buy a cup of coffee and take a photo of an item that may make a good gift.
Digital shadow: My credit card company registers the time, date, location and amount of my purchase.  The photo that I take is stored, and it tags itself with location info using GPS (unless I turn off this feature), all of which is stored on my phone, which replicates to a cloud and then across other devices.

Activity: During the flight, I work on a spreadsheet and a presentation.
Digital shadow: Okay, this didn’t really happen because there’s no room in the cramped plane – but if it did, I now have new documents on my laptop, which will also be replicated and backed up soon.

Activity: Upon landing, I reclaim my luggage, pick up my car rental and use my GPS app to drive to my hotel.
Digital shadow: The airline tracks the location of my bag and the time that my flight arrived.  My rental company records the time and location of my rental, and of course the GPS app knows when and where I left the airport, along with the hotel where I stay.  The hotel keeps information about my check-in, and my credit card company knows that I’m there, too.

Activity: That night I log into the hotel wifi, check some emails and call it a night.
Digital shadow: The hotel’s wifi system maintains information on my log-in for billing (and possibly security) purposes.  I may have forgotten, but my trusty DVR at home remembers to record a few of my shows, which are stored on the DVR’s drive.



I intentionally created a very small amount of this data – the blog post, the photo, and some changes (one copy!) to files that I edited, along with some email and a few pieces of social media content.  Yet my activities generated dozens and dozens – if not hundreds – of discrete data chunks, some of which will be preserved for a long duration.


All of this data poses interesting questions, most of which have not been clearly answered:  Who owns and controls this data?  How much of it is (or should be) subject to privacy requirements?  Is this data available for eDiscovery, compliance or other purposes?  Should I be made aware of the data that’s being created and stored?  Should I have the right to demand that the data is not retained for long, never retained, or never even created?  Would these answers be different if I lived in Europe?  What if I’m a US citizen traveling there, or vice-versa?

It’s an interesting exercise.  Try it out yourself – you may be surprised by your results!




How Large Is Your Digital Shadow? (Part 1)



Most of us are at least vaguely aware of the staggering amount of electronic data we’re creating.  Here’s a quick refresher from our friends at IDC:

From 2005 to 2020, the digital universe will grow by a factor of 300, from 130 exabytes to 40,000 exabytes, or 40 trillion gigabytes (more than 5,200 gigabytes for every man, woman, and child in 2020). From now until 2020, the digital universe will about double every two years.

That seems like a lot of data!  But once you give some thought to all of the different types of data being created today, it starts to add up and make sense.
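
The IDC figures quoted above can be sanity-checked with a little arithmetic.  The short Python sketch below uses only the numbers from the quote; the decimal unit conversion (1 exabyte = 1 billion gigabytes) and the variable names are my own assumptions, not IDC’s method.

```python
import math

# Figures taken from the IDC quote; decimal (SI) units are an assumption.
GB_PER_EXABYTE = 10**9          # 1 EB = 10**18 bytes, 1 GB = 10**9 bytes

universe_2005_eb = 130          # size of the digital universe in 2005
universe_2020_eb = 40_000       # projected size in 2020
gb_per_person_2020 = 5_200      # "more than 5,200 gigabytes" per person

# How many times larger is the 2020 figure than the 2005 figure?
growth_factor = universe_2020_eb / universe_2005_eb

# Convert the 2020 figure to gigabytes.
total_gb_2020 = universe_2020_eb * GB_PER_EXABYTE

# World population implied by the per-person figure.
implied_population = total_gb_2020 / gb_per_person_2020

# Average doubling time over the 15 years, assuming steady exponential growth.
avg_doubling_years = (2020 - 2005) / math.log2(growth_factor)

print(f"Growth factor 2005-2020: about {growth_factor:.0f}x")
print(f"Total in 2020: {total_gb_2020:,} GB")
print(f"Implied population: about {implied_population / 1e9:.1f} billion")
print(f"Average doubling time: about {avg_doubling_years:.1f} years")
```

Under those assumptions the numbers hang together: growth of a little over 300 times, 40 trillion gigabytes in total, and an average doubling time of a bit under two years.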

Consider the following types of data that are regularly being created:

  • Data that I create directly and on my own – email messages, spreadsheets, presentations, Twitter, Facebook posts, etc.  Remember that each time that I reply to a message or forward an email with photographs, I’m “creating” a copy of that data in addition to whatever new information I add to the original
  • Data that is created for me using a device or a tool – think about digital still and video cameras, scanners, DVRs
  • Copies of data that I create or are created on my behalf – downloaded (video rentals, e-books, MP3s) and uploaded (YouTube, Facebook, Instagram) music and videos, photographs from friends that I keep, etc.
  • “Digital Shadow” data – information that is created about me (IDC says that the data in the digital shadow is actually larger than the information that you create).  This includes credit card transactions, preferences on systems like Amazon, loyalty cards, etc.
  • System data and logs.  A large amount of data is created by our activities through the systems that we use such as firewall information, sites we have accessed, cookies on our browsers, toll pass data, etc.  (Some of this is covered within our Digital Shadow).
  • A significant amount of data is also created by various systems, including those for data protection and compliance – archives, replication and backup systems that ensure data is available when needed.

Why is this important?  Much of this data is directly subject to compliance obligations (and even when it’s not, it’s often hard to separate it from data that is, so it’s all lumped together), which costs organizations money to properly store, secure, protect and even “discover” for litigation purposes.  Other data leaves a record of activities that we may not want to share – today or next year,  depending on who is accessing that information and for what purpose.  If you put it all together, in many ways all of this information forms a diary of our thoughts and activities.  And there are few of us who would want our diary to be an open book.

In part 2 of this post, we’ll cover a “day in the life” and detail many of the types of data being created by normal activities.  What you see may surprise you!

File Archiving: Keeping It Simple



Let’s face it, file servers are messy. Even the most well-intentioned efforts to organize file shares suffer from the same problem: users.
Since I joined EMC 9.5 years ago, I’ve never deleted any files that I stored on any shared drive or my own hard drive. Every once in a while I think I should clean up some old files, but I rarely have any spare time. In reality, if I spend any time cleaning up anything, it is my email inbox. I find I get more anxious about thousands of emails in my inbox than I do about GBs of file data.
But what doesn’t have to be messy is how you license your archive technology. For this, the EMC Data Protection Suite is a simple, neat package that allows you to license data protection capabilities – both EMC archive and backup software – in a capacity model.
This makes it easy for customers and partners to take advantage of all of the EMC data protection software capabilities without the complexity of different licensing models and part numbers.
I think the greatest impact from an archive perspective is the simplicity of licensing EMC SourceOne for Files in a capacity model. With all of the different options available with EMC SourceOne for file archiving (file tiering, indexing in place, linking, stubbing, etc.), it used to be a daunting task for customers and partners to configure all the relevant pieces.
Now with the Data Protection Suite, you license EMC SourceOne simply by the capacity that you want to archive. And you can grow your capacity as your volumes increase.
We can’t help with messy users, but we can make it simple to purchase EMC archiving software to manage your data growth. Find out more about the EMC Data Protection Suite here. LB

How Long Should You Keep Your Data?



At last, the definitive answer to the question “How long should we keep our data?”

It depends.

I understand that this answer will not make me any more popular.  But before you rush to judgment, consider a few issues that make data retention periods such a complex issue.

Ideally, data is retained based upon its content, not its type or location.  So an email message that is a contract should be retained for the period that you retain contracts.  But how long is that?  Many organizations have retention schedules specifying that they will retain contracts for the applicable statute of limitations after the contract has been fully performed.  A common statute of limitations for the breach of a written agreement is six years.  If the agreement takes two years to complete, then the contract itself should be retained for eight years.  But how do you know when the contract has been completed so that you can start the clock on the six-year retention?  And whose job is it to figure this out?
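
To make the contract example above concrete, here is a minimal Python sketch of an event-based retention clock.  The function names, the sample dates and the six-year default are illustrative assumptions drawn from the example in the text, not legal advice or any product’s API.  Note that the clock cannot start until someone records the completion date, which is exactly the practical problem raised above.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def contract_disposal_date(completed_on: Optional[date],
                           limitation_years: int = 6) -> Optional[date]:
    """Earliest disposal date for a contract, or None if completion is unknown.

    The retention clock starts when the contract is fully performed, so
    without a recorded completion date there is nothing to count from.
    """
    if completed_on is None:
        return None
    return add_years(completed_on, limitation_years)


# Hypothetical contract: signed in January 2014, fully performed two years later.
signed_on = date(2014, 1, 15)
completed_on = add_years(signed_on, 2)
dispose_after = contract_disposal_date(completed_on)

print("Completed:", completed_on)
print("Eligible for disposal:", dispose_after)
print("Total years retained:", dispose_after.year - signed_on.year)
```

A contract whose completion has not been recorded returns None rather than a date, so it would simply be kept until someone supplies the missing event.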

And that’s just for contracts.  What about financial statements, marketing materials, sales proposals, etc.?  Even in areas of strict regulation, there can be highly complex issues.  For example, under the new Dodd-Frank Act, certain financial companies that deal in swaps (derivatives) must maintain detailed information about those swaps for five years past their “completion” (and they must be maintained on compliant storage, but that’s yet another issue!).  Many of our customers have mentioned that they deal in swaps with an anticipated duration of 40 years. Sometimes those swaps complete earlier, so an initial expectation of a 45-year retention period could quickly be reduced to ten years or less.

That’s why the answer to the retention question is that “it depends”.  In the real world where employees receive 100+ messages each day, we just cannot (yet!) reliably classify every email and file based upon its content.  So we take shortcuts.  We use “big buckets” to set retention periods that will safely capture and retain important content; we divide our organizations into functional areas (legal, HR, sales, operations, finance) and set default retention periods based upon risk and the type of information that is usually created and maintained in those departments.  We develop robust processes and deploy tools so that we can quickly and efficiently segregate eDiscovery content, allowing us to keep deleting production data as it “expires”.  It’s not the simplest task, but with a little effort (and a cross-functional team), your organization can figure out how long it should keep its data.
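
The “big bucket” approach described above can be sketched in a few lines of Python.  The department names come from the text, but the retention periods are placeholder values chosen purely for illustration (they are not recommendations), and the function name and legal-hold flag are hypothetical.

```python
from datetime import date

# Placeholder retention periods in years, for illustration only.
DEFAULT_RETENTION_YEARS = {
    "legal": 10,
    "hr": 7,
    "finance": 7,
    "sales": 3,
    "operations": 3,
}


def is_expired(department: str, created_on: date, today: date,
               on_legal_hold: bool = False) -> bool:
    """Return True if an item has outlived its department's default bucket.

    Items segregated for eDiscovery (on legal hold) never expire here,
    so routine deletion of everything else can continue.
    """
    if on_legal_hold:
        return False
    years = DEFAULT_RETENTION_YEARS[department]
    # Comparing (year, month, day) tuples sidesteps Feb 29 edge cases.
    expiry = (created_on.year + years, created_on.month, created_on.day)
    return (today.year, today.month, today.day) >= expiry


today = date(2014, 1, 1)
print(is_expired("sales", date(2010, 5, 1), today))         # past the sales bucket
print(is_expired("finance", date(2010, 5, 1), today))       # still inside the finance bucket
print(is_expired("sales", date(2010, 5, 1), today, True))   # held for eDiscovery
```

The design choice here is that retention is decided by where the content lives (the department) rather than by what it says, which trades precision for something that can actually be applied to every message and file.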

Culling Your eDiscovery Costs



According to a recent informal survey, 79% of legal departments are performing their own eDiscovery collections.  Although that number is probably lower in the real world (the poll measured only those attending a special Law Department Roundtable at ILTA), it is an encouraging development.  Taking charge of eDiscovery collections can cut cost and risk, in addition to delivering better overall insight into the underlying litigation matter.

What about the rest of the eDiscovery steps in the EDRM — are those being performed in-house or outside?  The same poll revealed that only 31% did any culling before the data left the organization.  This seems like a missed opportunity since so much of the expense in the eDiscovery process – over 70% according to one survey — comes during the Review phase, and culled data won’t need to be reviewed.

So if you’re looking to cut some additional costs from your eDiscovery process, consider a few simple culling techniques:

  • Remove clearly extraneous email, such as content from espn.com, cnn.com, etc.  In the right case, you might even be able to limit email further to just one external domain (i.e. the other party in the litigation matter);
  • Determine whether you can bound the information by date range.  In other words, is information before or after a certain date unlikely to be relevant based upon the facts of the case?
  • If you are collecting “everything” from laptops, desktops and file systems, are there file types that can safely be culled — such as log files, videos, photos, music collections and executable files?  Some of these could be relevant in the right case so check to be sure;
  • De-duplicate!  This is both easier and more difficult than you might think!  Be sure that you understand the basis on which you are culling duplicative content.
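
The culling steps listed above can be sketched as a single filtering pass.  This Python example is a simplified illustration, not a description of any eDiscovery product: the item fields, the sample data and the `cull` function are all hypothetical, and duplicates are detected here by a SHA-256 hash of the content alone, which is only one possible basis for de-duplication.

```python
import hashlib
import os
from datetime import date


def cull(items, excluded_domains, start, end, excluded_extensions):
    """Drop extraneous, out-of-range, excluded-type and duplicate items."""
    seen_hashes = set()
    kept = []
    for item in items:
        if item["sender_domain"] in excluded_domains:
            continue  # clearly extraneous sender
        if not (start <= item["date"] <= end):
            continue  # outside the relevant date range
        extension = os.path.splitext(item["name"])[1].lower()
        if extension in excluded_extensions:
            continue  # file type agreed to be safe to cull
        digest = hashlib.sha256(item["content"]).hexdigest()
        if digest in seen_hashes:
            continue  # same content already kept once
        seen_hashes.add(digest)
        kept.append(item)
    return kept


# Hypothetical collection for illustration.
collected = [
    {"sender_domain": "espn.com", "date": date(2013, 3, 1),
     "name": "scores.html", "content": b"scores"},
    {"sender_domain": "counterparty.example", "date": date(2013, 3, 2),
     "name": "contract.docx", "content": b"contract v1"},
    {"sender_domain": "counterparty.example", "date": date(2013, 3, 5),
     "name": "contract-copy.docx", "content": b"contract v1"},
    {"sender_domain": "counterparty.example", "date": date(2011, 1, 1),
     "name": "old.docx", "content": b"old"},
    {"sender_domain": "mycompany.example", "date": date(2013, 6, 1),
     "name": "vacation.mp4", "content": b"video"},
    {"sender_domain": "mycompany.example", "date": date(2013, 7, 1),
     "name": "notes.txt", "content": b"notes"},
]

remaining = cull(
    collected,
    excluded_domains={"espn.com", "cnn.com"},
    start=date(2013, 1, 1),
    end=date(2013, 12, 31),
    excluded_extensions={".log", ".mp4", ".mp3", ".exe"},
)
print([item["name"] for item in remaining])
```

Hashing content alone treats two copies of the same attachment as duplicates even when they arrived in different messages.  Whether that is the right basis depends on the case, which is why it pays to understand how your tools define a duplicate.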

Most cases will present you with opportunities to cull collected data before sending it outside for further work.  Having the right tools in place and adding a few steps to your eDiscovery workflow can help to uncover significant savings.

Data Privacy: Coming To A Country Near You?



In the US, we do not tend to think much about data privacy in the workplace.  We generally default to a belief that our employer owns the network and devices, so it has  the legal right to store, process (and view) the content that we create – even email messages.  But those rules are not the same in many other areas of the world, particularly in the European Union.  And many organizations with operations outside of the US may soon find themselves in the middle of a clash of cultures.

In the EU, personal data – which is broadly defined — is subject to the EU’s data privacy directive.  Personal data cannot be processed or transferred outside the EU area, such as to the US, without an “adequate safeguard”.  In practice, this means that everyday IT operations such as archiving, backup and even transfers between data storage devices (such as tiering) must have an “adequate safeguard” if data is moving from the EU to the US.

Most organizations in this situation have relied on a relatively straightforward Safe Harbor self-certification to meet the “adequate safeguard” requirement.  But recent developments, including news of the NSA’s surveillance operations, have put the Safe Harbor at risk, with some calling for its repeal.  In addition, German data protection authorities are already limiting the Safe Harbor exception.  These developments may require many organizations to find a new safeguard from limited options:  either Model Contracts or Binding Corporate Rules, both of which are more complex and difficult to implement in practice.

Of course, many organizations have long relied upon a third option — the unofficial “head in the sand” exception where transfers are made without any recognized safeguard in place.  Generally speaking, enforcement of the data privacy directive has been sporadic. But even that may be changing, with proposed changes to the privacy directive enabling fines of up to 2% of global revenue for violators.  That threat could force many “head in the sand” users into strict compliance.

For now, the Safe Harbor remains in place.  With the recent activity, it’s probably a good idea to run an internal audit to confirm your organization’s compliance.  As the EU becomes even more aggressive in this area, many organizations will need to strike a better balance between the lax privacy requirements of the US and an increasingly strong privacy regime in the EU.

Considering an Email Archive Change?



Is your current solution one of the Hewlett-Packard Autonomy archiving solutions obtained through acquisition (e.g. CA Ilumin, Zantaz EAS, Mimosa NearPoint, HP IAP)?

Is your current solution Symantec Enterprise Vault (pre-v10), and are you facing an expensive (and time-consuming) migration including a complete hardware refresh?

EMC understands that email archiving is a mission critical application and requires a company with a long-term commitment and solid resources; otherwise it is just a slow path to disaster.  To get you back on solid ground, EMC offers the industry’s most complete family of archive solutions.

For on-premise archiving, EMC SourceOne delivers the industry’s most advanced archive technology.  SourceOne is a full 64-bit application and installs in a completely virtual server environment.  SourceOne integrates email, file and SharePoint content in a single-instance archive, supported with full legal discovery, retention management and audit reporting.

EMC understands that archiving also demands proper storage and thus offers the widest range of archive storage solutions that integrate seamlessly with SourceOne.  These archive storage solutions include EMC Centera compliance storage, EMC Data Domain deduplication storage, EMC Isilon NL scale-out NAS storage and EMC Atmos cloud-based object storage.

The first step to get you back on solid ground is to contact EMC Sales.  EMC Sales is ready to perform a thorough analysis of your current archive environment and provide a solution proposal to migrate your existing email archive data to SourceOne.  For migration, EMC works with partners who are expert in email migration.  And new SourceOne licenses are aggressively priced to make the transition as affordable as possible.

There has never been a better time to explore the exciting advancements in EMC archive technology for your next on-premise email archive solution.  To learn more about the role of archiving in Microsoft Exchange environments, read this white paper.

Contact EMC Sales  and begin your move to SourceOne.