NAS Optimization Blog


Enabling WAN Efficiency in the Upstream Workflow

 

As we work with oil and gas information technology departments, the topic of WAN efficiency gains traction. Let’s talk about some different set-ups and how to improve performance without experiencing productivity-crushing latency.

Primary Data Center Supporting Remote Offices

I talked to a customer recently that had a remote office in Denver. This oil and gas company had their primary, wholly provisioned storage container in Houston and did not have the same infrastructure at this remote Colorado location. Interpretation applications being used in Denver needed to have access to reference data in Houston. Rather than having to deploy a wholly provisioned storage solution in Denver and copy to the remote location, we are deploying Avere nodes near the direct Denver-based consumers of the data and pointing the Avere nodes back over the wire to the data center in Houston.  

[Figure: Remote offices in the upstream workflow]

As the users load data into their application, data is primed into the Avere nodes, either artificially or dynamically as the data is accessed for the first time. As these Denver users execute changes or create output, that output is saved temporarily in the Avere FXTs in the remote office and synchronized back over the wire to the primary data center at some scheduled time or a regular interval.
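The caching behavior described above can be sketched in a few lines. This is only a toy model, not Avere's implementation: the `EdgeCache` class, its method names, and the file paths are all hypothetical, and a plain dictionary stands in for the core filer in Houston.

```python
class EdgeCache:
    """Toy model of an edge cache in front of a remote core filer (hypothetical)."""

    def __init__(self, core):
        self.core = core        # dict standing in for the primary data center
        self.cached = {}        # data currently held at the remote office
        self.dirty = set()      # paths written locally but not yet synchronized
        self.wan_reads = 0      # number of reads that had to cross the WAN

    def prime(self, paths):
        """Pre-load data before any user asks for it."""
        for path in paths:
            self._fetch(path)

    def read(self, path):
        """Serve from the edge; go back to the core only on first access."""
        if path not in self.cached:
            self._fetch(path)
        return self.cached[path]

    def write(self, path, data):
        """Save output locally and remember that the core is out of date."""
        self.cached[path] = data
        self.dirty.add(path)

    def sync(self):
        """Push pending writes back to the core, e.g. on a schedule."""
        flushed = sorted(self.dirty)
        for path in flushed:
            self.core[path] = self.cached[path]
        self.dirty.clear()
        return flushed

    def _fetch(self, path):
        self.cached[path] = self.core[path]
        self.wan_reads += 1


core = {"survey/line_001.sgy": b"reference traces"}
edge = EdgeCache(core)
edge.read("survey/line_001.sgy")   # first access crosses the WAN
edge.read("survey/line_001.sgy")   # second access is served locally
edge.write("survey/horizon_a.dat", b"interpretation output")
pending = edge.sync()              # the new output now exists on the core
```

Only the first read pays the WAN latency. Everything after that, including the write, stays local until the sync runs.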

Multiple Data Centers

A more interesting environment is one where larger companies have primary plus secondary or even tertiary data centers. (Sometimes these are just primary data centers separated by geography.) Each location has completely provisioned compute grids, interpretation applications, storage, and users, and that setup is mirrored around the globe. In this situation, projects are sometimes received in one place, but the need to reference the data arises somewhere else.

[Figure: Multiple data centers in the upstream workflow]

We have a customer that receives the data in Houston where they will do some initial processing, but then they have users sitting in Malaysia that want to access that data set over that long distance. To replicate that data using traditional data management tools took three weeks from being received in Houston until the users in Malaysia could reference the data set.  With Avere in place, that same data is now available the next day. From three weeks to one day – a big efficiency gain!

The other nice thing in this situation is the ability to share that data. So as data is manipulated in one place, those changes can then be pushed back over to the other data center where they originated. Data protection can happen on either side of that wire; end users can bounce back and forth in the workflow as they use that same data. 

Cloud Computing

A slight shift from this is something that I’ve started seeing more and more, even in just the last few months. In this situation, service providers create “a sponge” of processing power. Basically, oil and gas companies supplement the compute power in a primary data center by leasing compute cycles from a service provider at a colo facility somewhere to get a temporary boost in processing capability.

[Figure: Cloud computing in the upstream workflow]

The challenge with this is pretty obvious. We’ve got really large data sets that don’t lend themselves very well to being transported from one storage container to another, particularly one that exists within another vendor’s infrastructure. (That tends to cause all kinds of heartburn!) In oil and gas, this is typically not a dataset or an I/O profile that lends itself to execution over a WAN connection due to latency. 

In this infrastructure, these customers place Avere nodes nearest to these “Cloud” compute nodes and then have the user reference their data back at the primary facility through that compute node. As the SPEC results show, we’re able to execute some very high I/O just going through the Avere System. We don’t have to keep hitting that core filer that sits on the back-end.

Cloud Storage

People are starting to build out remote storage vaults of data. Call it Cloud if you want, but it could really just be a storage container somewhere else inside your own infrastructure that is not right next door to the consumers of the data. By having Avere FXTs sit close to the consumers of the data, you can move that data out of the primary data center and reference some sort of Cloud container instead. Avere’s FlashMove technology allows you to take data, reference it through an Avere system pointed to a local filer, then move data from one storage container to another at your discretion.

 

[Figure: WAN optimization]

 

Data Independence

Most of the illustrations here show consumers sitting on one side of the wire and storage sitting on the other. However, there is no requirement for Avere to be in both locations. Our best practice is that Avere sits next to the consumers. The data itself can live wherever you want it to be. In the examples where we have Avere FXT nodes in both places, the point is that the Avere nodes can reference the same sets of data, different sets of data, or, for that matter, data cascading from one Avere to another, providing different benefits to different places. It is a very flexible infrastructure!

I covered these use cases in a recent webinar as well as more detail on storage efficiency and flexibility in the upstream workflow. Download the recorded presentation and slides here. Questions? I’d be happy to help.


 

 

Virtualization Dilemma: Skyrocketing Costs of Storage Performance

 

In virtualized environments, budget and performance demands are moving at different speeds. How long can this continue? A scenario where budgets increase at the same rate as storage demand or even faster seems highly unlikely.

A study of information technology professionals by DataCore Software found some interesting but not surprising results. Consider the following survey findings:

  • 71% of respondents have storage budgets that have remained the same or were lowered year-over-year 

Yet,

  • 51% said storage now accounts for more than 25% of their virtualization budget 

Also not surprisingly, the participants see storage costs as an obstacle to adopting virtualization and migrating more mission-critical apps to it, with performance noted as a top concern. With budgets already at a standstill or in compounding decline, adding enough Flash/SSD to keep up just isn’t a sustainable solution to the problem.

One of the first things for storage managers in virtualized environments to realize is how to measure and then balance three elements: cost, performance and capacity. To do this, we must measure and compare common options. Data center management needs to choose storage solutions for virtualization based on getting the highest performance at the most comfortable cost. Measurements used are:

  1. IOPS per VDI
  2. VDI IOPS per rack unit
  3. Cost per VDI IOP

With these three calculations known for each vendor, better decisions can be made factoring cost as well as performance and footprint.
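As a rough sketch of how the three measurements above could be computed for one candidate solution, consider the following. The `StorageOption` class and the sample numbers are made up for illustration and do not describe any real product.

```python
from dataclasses import dataclass


@dataclass
class StorageOption:
    """One candidate storage solution (all values supplied by the evaluator)."""
    name: str
    vdi_seats: int          # number of VDI instances supported in the test
    total_vdi_iops: float   # VDI IOPS sustained by the storage
    rack_units: float       # rack space taken by the storage
    list_price: float       # list price of the storage, in dollars

    def iops_per_vdi(self) -> float:
        return self.total_vdi_iops / self.vdi_seats

    def vdi_iops_per_rack_unit(self) -> float:
        return self.total_vdi_iops / self.rack_units

    def cost_per_vdi_iop(self) -> float:
        return self.list_price / self.total_vdi_iops


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
option = StorageOption("Example vendor", vdi_seats=1000,
                       total_vdi_iops=10_000, rack_units=4, list_price=100_000)

print(option.iops_per_vdi())            # IOPS per VDI
print(option.vdi_iops_per_rack_unit())  # VDI IOPS per rack unit
print(option.cost_per_vdi_iop())        # dollars per VDI IOP
```

Running the same three methods for each vendor gives figures that can be compared side by side.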

Recent tests demonstrated and compared VDI performance of popular data center hardware. You can see the results in this pre-recorded Webinar or this summary article.

The Adoption of Cloud

Cloud is more and more being recognized as the “golden ticket” to solving these storage woes. The ability to access disk space on demand is quite attractive for cash-strapped data centers trying to keep users productive.

Way back in 2011, Vanessa Alvarez of Forrester wrote about how these virtualization and cloud technologies relate and invited readers to take a look at new solutions to facilitate the building of private cloud environments to, “deliver scale-out economics and simplified management of storage resources in virtualized environments.”

As storage administrators look to meet growing demands with shrinking or static budgets, emerging technologies that provide real performance gains can prolong the useful life of existing infrastructures and close the gap between budget and performance. Dynamic-tiered Edge filers offered by Avere are proven appliances for scale-out optimization of virtualized environments.

Traditionally, storage managers have all VDI images accessible directly from Core storage filers. Placing the Edge filer in front of the Core filer, which moves the Core filer out of users’ paths, can solve performance issues and reduce the need for high-performance but high-cost SSDs. As we see in the study mentioned above, most cannot afford to keep adding SSDs. Using an Edge filer in the infrastructure to bridge to a private Cloud enhances the manageability of data and brings VDI activity closer to the users, with significant scalability gains. With Avere, you can get up to fifteen times more VDI seats with minimal VMware reconfiguration by placing a hybrid-NAS Edge filer cluster in front of your Core filers without upgrading them. This approach allows you to grow your VDI infrastructure to a larger scale for less money, as you are no longer bound by the I/O capabilities of the Core.

Ultimately, we are all aspiring to deploy the most efficient and manageable storage possible, and that requires the convergence of every component of the enterprise data center infrastructure. This brings many challenges, but emerging technology solutions offer hope to create affordable growth. The best of these solutions bring end user performance to the enterprise without costly overhauls or expansive footprint growth.


 

Oil & Gas Big Data Defined

 
Big Data for Oil & Gas

 

Coming from exhibiting at the American Association of Petroleum Geologists (AAPG) Annual Conference in Pittsburgh, we continue to hear about performance and latency issues in the oil and gas industry. Demands upon production increase and so do data center constraints. Intensive software used in the upstream process can cripple performance, and we all know about the relationship between time and money.

In this infographic, you can see how the upstream workflow in the oil and gas industry faces a classic big data challenge. From acquisition to reservoir engineering, every step requires massive amounts of data, fast access, and the ability to make quick, accurate decisions. But fast isn’t everything; you also need to be able to transition from one workflow stage to the next at the lowest cost. Throwing faster drives at the infrastructure is simply not sustainable.

To help shed some light on proven improvements, Brian Bashaw, Storage Architect, Oil & Gas, will be presenting:

Optimizing the Upstream Workflow: Flexibly Scaling Performance to Meet Seismic Processing Demands

Tuesday, June 4 @ 2:00 p.m. EDT

In this session, Brian will review how a large producer not only improved performance of its NAS infrastructure, but also its manageability for less money - efficiency with real user benefits. The content will go beyond a case study to discuss how technology engineering solves common data challenges, the storage technologies proving optimal impact, and future needs in global WAN environments common in the industry.

This Webinar is now past. To access the Webinar recording and slides, submit a request.  


  

Related Posts:

Q&A: NAS Optimization for Seismic Processing

NAS Matters in the Upstream Workflow

7 CGI-Intensive Must-See Movies of Summer

 

In celebration of the unofficial summer season this weekend, we're looking beyond the data center to enjoy the end results of optimizing network-attached storage (NAS). These summer releases break barriers with unbelievable CGI and visual effects. When you sit down to see any of these, save some popcorn for the end and just look at the long list of names that are behind the effects creation. Your jaw will drop. Now think of the data center crew keeping all those people doing their thing. That's where Avere helps. And, we love what they create...

The Summer Movie Season

The summer movie season started early this year. Waiting until Memorial Day weekend must have gone out with the “wearing white” rule.  Some of the most intensive CGI films of the summer movie season (probably even of the year) are already in theaters. But this summer, the season extends well beyond the end of August with some big blockbusters positioned for release into October. Or maybe this is early holiday movie season? Either way, they look too good not to mention now.

So here is our list of “must-see” movies that had to have their storage blazing to pull-off. All pictures link to the trailer, so don't be afraid to preview.

Iron Man 3

May 3 - Tony Stark’s path continues and so do the amazing graphics.


Click the poster to watch the trailer.  

Star Trek Into Darkness

May 16 - J.J. Abrams and the ILM crew deliver in this action packed stunning production. It has many scenes that will just blow you away... 


Click the poster to watch the trailer.

Man of Steel

June 14 - We in Pittsburgh know a thing or two about steel, but this is going to be something to see.


Click the poster to watch the trailer.

Monsters University

June 21 - Another Pixar gem. College-bound Mike and Sulley will certainly be way more fun than a frat party. 


Click the poster to watch the trailer.

Pacific Rim

July 12 - Who could be better than Guillermo Del Toro to capture a Kaiju and Jaegers war?


Click the poster to watch the trailer.

Elysium

August 9 - Travel to 2154 where Earth is no longer the best place to live. It's better out there...


Click the poster to watch the trailer.

Gravity

October - Gravity will be both heart-pounding and breath-taking. We may have to wait until October, but it's a perfect way to wrap up the summer season. 


Click the poster to watch the trailer.

 

Recently, Framestore talked about how this movie required massive support of its storage infrastructure and how they delivered. Read the case study.

 

First-of-Its-Kind Hybrid Storage Appliance Now Available

 

We are pleased to announce availability of the first-of-its-kind hybrid NAS appliance. 

In April, Avere announced that its newest Edge filer, the FXT 3800, had broken our previous record in the SPECsfs2008 benchmark tests. The results demonstrate that Avere’s edge-core architecture delivers superior application performance to any NAS environment, including those where the storage is located across a long geographic distance from the end-users, with a significant reduction in cost and footprint in comparison to legacy NAS solutions.

Learn more about the FXT 3800 in this five-minute overview by director of product marketing, Jeff Tabor.

Prior to this latest benchmark result, Avere already held the top spot for highest performance for a single file system/namespace. For the new test, Avere inserted 150ms of latency – equivalent to a cross-continental wide area network (WAN) link – between the Edge and Core filers that comprised the system under test, demonstrating the viability of a private cloud for enterprise applications requiring shared storage.

The tests did two things. First, they demonstrated that you get the best performance at the lowest cost by using a combination of storage media in the most efficient manner. The new FXT 3800 uses new hybrid technology that automatically tiers data across four media types (RAM, SSD, SAS and SATA HDDs), delivering maximum performance for the hottest files, while at the same time moving cold data out of the performance tier and onto SATA to minimize costs and shrink the data storage footprint. Dynamic tiering assures that every block of file data is located in storage that matches its current level of activity. As a result, the new system is 40% faster than the FXT 3500, the company’s previous top performer on the SPECsfs2008 NFS benchmark test, and is far less costly than flash-only solutions.

[Figure: SPECsfs comparison]

Secondly, the tests demonstrated that the only way to use cloud storage for anything other than backup or archive is to eliminate the WAN latency inherent in legacy storage solutions by moving to an edge-core design where the active data is held closest to the end users or compute farm. The FXT 3800 makes this possible - the perfect Cloud launchpad.

Results were achieved during testing of a 32-node cluster of Avere FXT 3800 appliances using the SPECsfs2008 NFS benchmark, which showed the system achieved a record-setting combination of 1,592,334 ops/sec throughput and minimal latency of 1.24ms overall response time (ORT).
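For a sense of scale, the cluster-wide figure above can be turned into a per-node average. This is simple arithmetic on the published numbers and assumes the load was spread evenly across the 32 nodes, which the results do not state.

```python
# Figures quoted above for the SPECsfs2008 NFS run.
NODES = 32
TOTAL_OPS_PER_SEC = 1_592_334

# Assumption: load is spread evenly across nodes, so this is only an average.
avg_ops_per_node = TOTAL_OPS_PER_SEC / NODES

print(f"{avg_ops_per_node:,.0f} ops/sec per node on average")
```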

The Avere FXT 3800 Edge filer contains 144GB of DRAM, 2GB NVRAM and 800GB of SSD to accelerate the read, write and metadata performance of most active data. It contains 7.8TB of 10k SAS HDDs to store a large working set of recently active data. The FXT 3800's 2x 10GbE and 6x 1GbE ports allow connectivity to clients and servers for high performance access to active data and to core filers for infrequently accessed data. Each unit can be clustered to other FXT Edge filers with scaling of up to 50 nodes for linear performance and high availability.

Features don't matter if you can't back them up with performance. We are very excited about how the FXT 3800 will help deliver flexibility, performance and savings to enterprise data centers.

Why does it work? Learn more in this introductory video. 

Q&A: NAS Optimization for Seismic Processing

 


Questions & Answers on Optimizing Network Attached Storage for Seismic Processing

Of all the applications in the oil and gas industry’s upstream workflow, seismic processing places the greatest demands on network attached storage (NAS). Pre-stack and post-stack migration, velocity modeling, and other processing steps are challenging even the highest performance NAS systems. To meet this challenge, Avere offers its unique Edge-Core architecture, which optimizes seismic processing with accelerated performance, reduced cost, and a streamlined workflow.

In our initial post in this series, we reviewed the challenges and solutions for three parts of the upstream workflow. For seismic processing specifically, we identified:

Challenge: NAS systems must provide tens of GB/s of throughput to seismic processing applications while also managing petabytes of data for interpretation and reservoir engineering in a cost-effective manner. The challenge is developing a NAS infrastructure that doesn’t complicate the upstream workflow and delay the time to final results.

Solution: But what if you could separate performance scaling from capacity scaling? Could this deliver both more efficiently? With an Edge filer, it can. The Edge filer offloads performance from the Core filers, which in turn enables the Core filers to be built with cost-effective and dense nearline disks, reducing cost. Plus, using an Edge filer can reduce or eliminate copy steps to provide a smooth handoff to seismic interpretation and reservoir engineering. When the Edge and Core filers are separated by a WAN, seismic processing jobs can be run in remote offices with a substantially smaller storage footprint.

[Figure: Seismic processing infrastructure]

Let’s answer some common questions about this approach and its ability to provide an uncomplicated NAS infrastructure that allows for timely access to final processing results.

How does this technology help with scalability?

To deliver necessary throughput for seismic processing, each FXT node delivers more than 2GB/s throughput to demanding applications such as time migration. Avere Edge filers scale performance linearly as FXT nodes are added to the cluster, enabling scaling performance of seismic processing applications to more than 100GB/s throughput on a 50-node cluster. Software automatically places “hot” data blocks on the SSD/Flash media contained within the FXT cluster, replicating and striping data across multiple nodes when needed to provide parallel access to the hottest data blocks.
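The linear scaling described above lends itself to a quick sizing estimate. The sketch below treats the quoted 2GB/s per node as a conservative floor and uses the 50-node cluster limit. The function names are hypothetical, and real sizing depends on the workload.

```python
import math

GB_PER_SEC_PER_NODE = 2.0   # floor taken from the "more than 2GB/s" figure above
MAX_NODES = 50              # largest cluster size mentioned above


def cluster_throughput(nodes: int) -> float:
    """Aggregate throughput in GB/s, assuming perfectly linear scaling."""
    if not 1 <= nodes <= MAX_NODES:
        raise ValueError(f"node count must be between 1 and {MAX_NODES}")
    return nodes * GB_PER_SEC_PER_NODE


def nodes_needed(target_gb_per_sec: float) -> int:
    """Smallest node count that reaches the target under the same assumption."""
    nodes = math.ceil(target_gb_per_sec / GB_PER_SEC_PER_NODE)
    if nodes > MAX_NODES:
        raise ValueError("target exceeds what a single cluster can deliver")
    return max(nodes, 1)


print(cluster_throughput(50))   # full 50-node cluster
print(nodes_needed(30))         # nodes for a 30 GB/s seismic processing job
```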

So, explain how this saves money?

When you take the approach of adding Avere FXT Series of Edge filers, costs are reduced in three ways:

  1. Because Avere Edge filers can accelerate the performance of existing environments, you can avoid complete upgrades of the current NAS infrastructure. No matter what brand of Core filers you use (EMC Isilon, NetApp, Panasas, DDN, or other NAS vendors), an FXT Edge filer can seamlessly operate in the environment.
  2. Avere Edge filers offload the performance from the Core filers, enabling the Core filers to be built with cost-effective and dense nearline storage, saving 50% or more on capital expenses. Here are two examples: (a) a data center using NetApp 3000 series with SATA disks rather than NetApp 6000 series and 15k SAS disks, and (b) whitebox NAS systems composed of commodity servers, Linux, and high-density disks.
  3. Edge filers provide savings on ongoing operating expenses due to reduced rackspace, power, and cooling. 

Simple? It can’t be.

It is common practice in seismic processing for NAS vendors to force many administrators to implement specialty and proprietary NAS solutions (e.g. Panasas, Lustre on DDN), inefficient data copy and move steps, and client code management into their upstream workflow. The data copy and move steps add delay, which causes reduced quality or slower time to final results or both. Client code is challenging to manage and is proprietary to the individual vendor. With Avere, there are no extra steps in the workflow, no duplication of data adding to capacity costs, and no client code to manage. The administrator is free to select the best Core filer for managing the end-to-end upstream workflow. At the seismic processing stage, all jobs are run through the Avere FXT cluster, offloading the Core filer. When the processing is done, the Avere FXT cluster ensures that all the processed data is available on the Core filer and the seismic interpretation phase can seamlessly begin. So, yes. It can be simple.

Remote office connectivity is blowing my budget and is proving to be a nightmare to manage. What about fixing that?

Running seismic processing jobs in remote offices that are hundreds or even thousands of miles away from the primary data center is costly and complex to implement. Complete storage solutions and replication software must be purchased for the remote offices and data replication schedules must be created and managed on a daily basis. Avere FXT Edge filers replace the complete storage systems and dramatically reduce the cost and footprint in the remote office. When processing jobs are run at the remote office, the FXT cluster automatically populates with the active seismic data without the need to configure replication schedules or other management steps. Over time, as the need for performance at the remote office increases, the Avere FXT cluster can be simply grown to keep pace with the demand, saving even more money and minimizing infrastructure complexity.

Have more questions? Check out the recording of our recent Webinar: Optimizing the Upstream Workflow, presented by Storage Architect, Brian Bashaw.




Measuring VDI in a NAS Environment

 

This post is part 2 of 2 sharing content presented in a recent Webinar, Untangling the VDI Storage Enigma. Links to related content are available at the end of this post.

The LoginVSI Benchmark Environment

The only way to really understand how VDI is going to affect your storage environment is to use a testing tool that can reliably simulate your intended environment. The VDI testing tools available in the marketplace require varying levels of capital investment and hardware resources. Avere chose to use LoginVSI for our VDI performance benchmarking tests.

Let’s take a look at the test environment we built.

[Figure: VDI testing environment]

As you can see from the above diagram, we built:

  • 1500 VDI instances: Windows 7 VMware View 5 Linked Clones with persistent disks
  • The instances were spread out across 25 VMware ESXi hosts
  • The 25 ESXi hosts were configured with VMware View
  • The Avere FXT 4500 Edge filer was configured as the vSphere NFS datastore, providing an all-Flash/SSD tier to handle the hot read/write data for the VDI instances
  • Inactive data was tiered down to the Core filer, a NetApp FAS2240 with 750 GB SATA drives
  • 10 Gigabit/sec network connections throughout to eliminate networking bottlenecks
  • All infrastructure services for the testing were also virtualized using VMware vSphere: Windows Active Directory, MS SQL server, VMware View Connect server, and the LoginVSI launchers

So, what were the results?

First, we needed to determine what levels of performance were deemed acceptable versus unacceptable. This threshold is understood to be the point at which the act of adding another VDI seat to the infrastructure causes the user experience of all other VDIs to degrade. The LoginVSI benchmark measures how long it takes each logged-on user to perform common activities such as opening spreadsheets, word-processing documents, email messages and general web browsing. The benchmark is able to identify long-running task outliers as well as measure the overall experience of all active VDIs. When the task response times for the overall pool of VDI users exceeds the threshold, the test marks that point and uses it for final analyses of the system's capabilities.
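The idea of a saturation threshold can be illustrated with a simplified sketch. This is not LoginVSI's actual calculation. The function name, the sample data, and the 4000 ms threshold are all invented for illustration. Given the average task response time observed at each session count, the function returns the largest session count reached before the threshold is first exceeded.

```python
def max_seats_within_threshold(samples, threshold_ms):
    """Return the highest session count seen before response time exceeds the threshold.

    samples: iterable of (active_sessions, avg_response_ms) pairs.
    """
    best = 0
    for sessions, response_ms in sorted(samples):
        if response_ms > threshold_ms:
            break           # adding seats beyond this point degrades everyone
        best = sessions
    return best


# Hypothetical measurements: response time climbs as sessions are added.
samples = [(250, 900), (500, 1100), (750, 1400),
           (1000, 2100), (1250, 3900), (1500, 5200)]

print(max_seats_within_threshold(samples, threshold_ms=4000))
```

With these invented numbers the system would be rated for 1250 seats, because the 1500-seat sample is the first to cross the threshold.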

Our test environment had 1472 VDI instances running the LoginVSI Heavy Workload against a single FXT 4500 node backed by the SATA FAS2240. This generates approximately 11.2 VDI IOPS per instance (Windows 7 + MS Office, running on VMware View 5 Linked Clones with Persistent Disks). The result was measured while there were already 992 VDI instances running successfully in the background. The chart below shows the additional measured 480 instances running in conjunction with the existing 992 instances for a total of 1472.

[Figure: VDI test chart]

This chart from the Avere Performance Dashboard shows the operations serviced by the Edge filer throughout the benchmark run:

[Figure: Avere Performance Dashboard chart of operations serviced by the Edge filer]

The single FXT 4500 supported 1472 VDI users, handling 17,000 VDI IOPS with 8 millisecond response times. These VDI IOPS are the blend of read and write ops of various I/O sizes as described in the part 1 blog post on this topic.

From the above chart, we see the workload that the benchmark drives and we are also able to calculate how many IOPS are being generated per desktop.
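The per-desktop calculation is straightforward and can be checked against the numbers quoted in this post. The variable names below are ours. The small gap between the two per-desktop figures is presumably due to rounding in the quoted 17,000 total.

```python
# Figures quoted in this post.
background_instances = 992
measured_instances = 480
approx_iops_per_instance = 11.2   # approximate Heavy Workload figure
quoted_total_iops = 17_000        # total handled by the single FXT 4500

instances = background_instances + measured_instances

# Total implied by the approximate per-instance figure.
estimated_total_iops = instances * approx_iops_per_instance

# Per-desktop figure implied by the quoted total.
iops_per_desktop = quoted_total_iops / instances

print(instances)
print(round(estimated_total_iops, 1))
print(round(iops_per_desktop, 2))
```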

We have found results from other vendors' VDI testing in which they determine a number of VDI instances and the quality of the experience the group of VDI users is going to have. We’re going to compare results from three common benchmarks: LoginVSI (Heavy and Medium workloads), VDI IOmark, and VMware View Planner. All of these benchmarks were run with the reference GuestOS and hypervisor technology - Windows 7 with MS Office and VMware View.

When we break the results down, we look for rack units for storage, the solution cost, and how many VDI IOPS were achieved. The comparisons below all come from published data available on the Internet.

[Figure: comparison of published VDI benchmark results]

When faced with the objective of fitting as many VDIs as possible into as little physical space as possible, the important measurement is how many VDI IOPS can be handled by each rack unit of storage (VDI IOPS per RU). Avere’s test results compared to others' results showed 4124.4 VDI IOPS per rack unit. The next highest density was the Dell EqualLogic using LoginVSI Medium at 2800 VDI IOPS per rack unit.

The other important component is price. Using published list prices, we determined the cost of the solution per VDI IOP which eventually boils down to price per VDI once you know how many IOPS each of your VDI seats requires. The Avere solution delivers each VDI IOP at a list price of $10.79, the lowest cost of the benchmark comparisons above.
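To see how cost per VDI IOP boils down to price per seat, here is a minimal sketch using the Avere figures quoted above. The function names are hypothetical, the 11.2 IOPS per seat comes from the Heavy Workload estimate earlier in this post, and the rack-unit estimate assumes density scales linearly and that storage is bought in whole rack units.

```python
import math

COST_PER_VDI_IOP = 10.79          # list price per VDI IOP, from the text above
VDI_IOPS_PER_RACK_UNIT = 4124.4   # density, from the text above


def price_per_seat(iops_per_seat: float) -> float:
    """List price of storage per VDI seat, in dollars."""
    return COST_PER_VDI_IOP * iops_per_seat


def rack_units_needed(seats: int, iops_per_seat: float) -> int:
    """Whole rack units of storage needed, assuming linear scaling."""
    return math.ceil(seats * iops_per_seat / VDI_IOPS_PER_RACK_UNIT)


print(round(price_per_seat(11.2), 2))   # dollars per seat at 11.2 IOPS each
print(rack_units_needed(1472, 11.2))    # rack units for the tested seat count
```

Substitute the IOPS your own desktops generate to get a figure that reflects your environment rather than the benchmark workload.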

To get at the best performance at the most affordable price, it's best to compare these two measurements side-by-side. The graph below highlights this comparison across the multi-vendor results analyzed. You’re looking for the highest performance (blue, higher is better) at a comfortable cost (red, lower is better).

[Figure: VDI IOPS per rack unit (blue) and cost per VDI IOP (red) across vendors]

The Avere Edge-Core Architecture for VDI

Now that the test results have been analyzed and compared, let’s review in a bit of detail what the actual Avere Edge-Core Architecture looks like for a VDI environment. In an existing infrastructure, all of your VDI images are accessed directly on the Core filer. With Edge-Core Architecture, the Core filer is moved out of the critical path and the Avere FXT Edge filer cluster is placed into the critical path. This allows you to grow your VDI infrastructure to a much larger scale since you are no longer bound by the I/O capabilities of the Core filer. The Avere Edge filer clusters can be sized to accommodate up to 15 times more VDI seats than was previously possible, without having to upgrade your Core filer. All of this is done with minimal reconfiguration on your VMware side.

[Figure: VDI architecture]

Read the first part of this series, NAS in a VDI Workflow and the synopsis of Avere's testing

This overview comes from a recent Webinar titled Untangling the VDI Storage Enigma. You can watch the entire Webinar and view the slides by clicking here.

 

 

5 NAS Action Items to Make VFX Shops Lean and Mean

 

When working on creating feature films, the effects of interactive users and 2D and 3D applications like Maya, Nuke, Houdini and Arnold hitting storage infrastructure can cause a slowdown for users working under stringent production deadlines.

Traditionally, companies look to their storage vendor to address the issue, adding faster processors or more spindles. In this solution, the storage system fulfills both performance and capacity requirements. And, as the company grows, a repetitive cycle of adding filers affects footprint and costs of running and managing a larger and larger infrastructure.

For rendering shops, several approaches can prevent NAS overload, keep users productive and help manage costs.

  1. Implement dynamic tiering with Flash. Dynamic tiering can increase performance by putting “hot” data on RAM and SAS disks, and lets performance scale linearly and predictably.
  2. Separate performance from capacity. Being able to only grow capacity or only grow performance can help system engineers better manage the budget by matching equipment needs with current business needs, ultimately decreasing equipment costs and reducing the need for ever-increasing datacenter space. 
  3. Identify storage hot spots, IOPs and latency trends. With the right software, you can quickly and accurately monitor and trend usage to better plan for investments and changes. 
  4. Use lower cost SATA drives where possible. SATA simply costs less to purchase and operate than SAS. By better managing data between tiers of various types of storage media, costs will drop. 
  5. Move large datasets without user disruption. The ability to transfer large files from one storage media to another without bringing users to a halt allows for a savings of both time and money through better control of where active and inactive files live. 

Image Engine of Vancouver, British Columbia, put Avere’s Edge filers to the test during the height of their work on Zero Dark Thirty last year. Storage expert Gino Del Rosario, Head of Technology, shared the outcome in a case study captured by Avere. You can read it for yourself by downloading a copy.

 

AWS Summit SFO - Avere on theCube

 

Earlier this week, Chris Archianaco of Avere Systems talked with Dave Vellante and Jeff Frick about Avere's latest cloud solution at AWS Summit 2013 in San Francisco.

Transcript:

Dave: I’m Dave Vellante, Chief Analyst at Wikibon.org and I’m here with Jeff Frick, my colleague. This is theCube, Silicon Angle’s continuous coverage of the AWS Summit here in Moscone in San Francisco. AWS does about a dozen of these events around the world. Its big event is re:Invent - last November was a big breakout - really Amazon flexing its muscle, showing the world that it is serious about bringing cloud to the enterprise. These are smaller events, smaller but still good – Jeff, what would you say, there are about four or five thousand people here?

Jeff: Yeah, it's got a good buzz.

Dave: Yeah, so a good buzz, a lot of practitioners, a lot of IT professionals and partners, and one of the partners here is Avere. Chris Archinaco is here; he is with Avere, a company in the storage business that is bringing storage capabilities to the cloud, embracing the cloud - not fighting the cloud. Chris, welcome to theCube.

Chris: Thank you, thank you for having me.

Dave: Avere is a very interesting company; you got some cool tech out of Carnegie Mellon. Ron, your CEO that I’ve met before – a total geek. He told me that he loves talking to our CTO, David Floyer, because they start talking about different profiles, about how to optimize performance and just really interesting stuff, but let’s up-level here. What's happening with Avere? Give us the high-level view.

Chris: So, first and foremost, traditionally we've always been a network-attached storage play, so the architecture that we propose is built on an Edge and a Core. The Core devices are your traditional NetApp and EMC devices; the Edge filer is what we provide. That provides a number of valuable services, including global namespace, decoupling performance from capacity and decoupling performance from geographic location. So really the natural extension of that, we believe, is the cloud. And we really think that that's the only architecture that can deliver high performance with that hybrid model technology.

Dave: So let's go back a little bit. So when you think about where we come from – I mean NetApp, obviously, popularized the whole notion of NAS and filers, and they grew up in the enterprise. And now, of course, the industry's move is to figure out how to scale up – I’m sorry – scale out and manage clusters. You guys are trying to get ahead of that curve. You talk about global namespace; talk about the traction that you've made in the marketplace and where you fit.

Chris: Yeah, so I mean, you know, obviously, we’re well established now, we've been around for several years in the network-attached storage space, but in terms of gaining traction in the cloud, I mean, really, we started to survey quite some time ago… talking to our customers about what they wanted to get, what they needed, and really, that is what we brought together into this cloud prototype that we are demoing here today and that will be released later this year. So it’s really about going out to them, seeing what problems they want to solve and then creating a product for them.

Dave: So let’s break it down a little bit further. So, I can go to Amazon. I can get block storage. I can get S3. I can get a simple get/put interface… why do I need Avere? Talk more about the value you guys bring and typical use cases that you envision.

Chris: That's a really good question. The thing with the power of S3, you know, is also something that's a challenge for people. It's so simple, in that it’s provided just as a storage solution, but it's of course object based. Most applications are not ready for object storage. So what we've heard from people is that they really liked the economics of it, they wanted to use object storage, but they weren't sure how to get there. So what we've done is essentially overlaid a file system onto S3. We've taken a lot of the gateway functionality you see out there and pulled it into our architecture so it should allow people to make that seamless transition. So, if you have existing infrastructure, you can use us with that. You can easily spin up an S3 cloud instance and begin leveraging that. As far as typical use cases, it runs the full gamut. Some people just want to do something simple like eliminate tape, but really the thought leaders are the people that see where this is going. They want to do high-performance workloads. We had people come to us and say, “Can we use just your Avere FXT nodes and just S3 storage?” And the answer is “yes.” If you want to have no disks on site, you can have no disks on site. That is really powerful for a number of reasons. Not only the cost of energy and everything associated with that and tape, but also all the personnel it takes to run that; they can now be reallocated to other functions within the company.

Dave: So you referenced this in your comments, so what you're saying is that, yeah, you can go to S3, it's simple, but you're going to have to make some changes to your application.

Chris: Right. Absolutely.

Dave: So, with you guys the application doesn't know anything different.

Chris: Not only does the application not know, but you can take existing datasets that people are actually working on, and you can move them to S3 while people are still working on them, with just a split-second cut-over, while maintaining the namespace and the file system that they’re used to, so they have no idea of any transition to S3. And our box will intelligently handle caching the hot data locally and letting the cold stuff go back to S3 at a much better price point.

Dave: Talk about the economics a little bit too.

Chris: Just take the case of perhaps media and entertainment, they have massive data storage needs, seismic processing, life sciences, genomic research… all these people have tons of data that they need to work on and they have massive data that they need to hold onto. But the actual working set is actually a small subset of that. Really, what our device allows them to do is create an Edge layer that can hold just the data you need and provide that great performance, and then you can store the data in S3 at a much lower cost. Where before you potentially had warehouses, or I should say, data centers of disks, tape robots and tape devices, all of that can potentially disappear. All of the headaches associated with maintaining it and the high cost of cooling, and energy and power all disappear.

Jeff: For topography though, are your boxes on prem? Or is there a way to get to the marketplace – or are your boxes also sitting out in the cloud somewhere?

Chris: That is a really good question. So, today it’s on prem. The idea is that you co-locate our solution with the compute nodes or the clients or whoever you want to have that LAN-like experience. In the future, virtualizing our software is definitely something that we’re talking to Amazon about and there are definitely use cases around that, as I’m sure you are finding today.

Dave: So you guys also, when you first came to market, you had this kind of multi-personality capability, I’ll call it, where you had all kinds of devices on the backend, fat devices, fast devices… you got Flash coming in. And I remember your CEO talking about some of the algorithms that you had that sort of optimized that backend asset. Is that still part of your value proposition and how does that play into the cloud?

Chris: Absolutely. It is still part of our value proposition. That's the way people are using us today. In addition to decoupling performance from capacity and all these things, by inserting this other layer, accelerating and providing the performance from that layer, you also free up the resources on the backend and see that the net effect is actually exponential. Now your Core filers are able to do more work. In terms of our algorithms and how we maximize that, yeah, we’ve had to deal with the full gamut - high-end NAS devices and cheap NAS devices. And there are some interesting things we've learned about how to talk to them and deal with them. And we think that that's all going to carry over very well to the cloud. In addition to that, you know, we have a lot of people using us today in a WAN application as well. They deploy our devices remotely to some remote site where users need a LAN-like experience, but they want to hold their assets in a central location. That's a model we’re already using today. We think it's a very similar model to the cloud and will transfer very well.

Dave: So how do you guys position yourselves? There are all kinds of themes, all-Flash arrays, hybrid… how do you guys position yourselves? You’re a little bit different.

Chris: I would say that the way we position ourselves is that we make best use of the most valuable resources. So, there are a lot of different theories on how you can implement Flash. Like you said, there are all-SSD devices, and you can put Flash in servers and make them fast. Those are all valid solutions. However, the one Achilles' heel that we think they all have is that they're located in specific places within the infrastructure. They become islands of high performance, so to speak. When you separate those into the core and you add an Edge layer of us, which can scale up to 50 of our nodes, you take your most valuable resources, which inside our box could be memory, NVRAM, SSD, and disks. All in the same platform, all intelligently tiering. Now you’ve separated those out, you can choose where you want to allocate those resources. I’m not talking about physically ripping something out of a box, inserting it in another or building up a Flash physical device where you have to take downtime. You can scale the tier in front of it seamlessly. You can choose which parts of the namespace you want to accelerate.

Dave: You’re in control of that dataflow, right?

Chris: That's correct. We have multiple options. Obviously, we are best performing and most feature-rich when you’re going through us, in a mode where we can cache everything – reads, writes, creates, etc. However, that being said, there are different workflows where people have different needs to access data behind us. So we have a lot of flexibility in terms of how you insert us to make sure that in the cases where you need access to the data, we don’t force you, in all cases, to go through us. You do have the options and flexibility there.

Jeff: From the project manager point of view, you know, you come from a storage background, you’ve got all these cloud services and cheap storage, how have you guys really decided to attack it as a company to embrace, augment, and value-add some of this transformative nature of what is happening in the cloud?

Chris: Yeah. When people ask these kinds of questions, I mean, the big killer app that I see us leveraging is FlashMove. With FlashMove, you can actually move data while it is online between what we call Core filers. So, when you talk about leveraging these cheap or new options, when you talk about object storage, both public and private, people want to know, “Well, how do I use them?” When you add them to our infrastructure, it basically allows you to seamlessly introduce them into your infrastructure. So, if it is working well, that’s great. Keep buying more of it. But, if it doesn’t work well, you can have that quick switch out. So it allows people to experiment and ascertain their own risk and move to that in a gradual way. The one theme we’ve heard is that everyone is being empowered and tasked by their CTOs, or whoever the chief decision maker is, to leverage the cloud and reduce costs. All the guys are struggling with the same thing, how to make that transition. And we just feel like our architecture is the best architecture for a real enterprise, a real high-performance solution for them to start dabbling in the cloud or even flip the switch and do it all in the cloud if they want to.

Dave: What an interesting move by you guys. Let’s face it; the cloud is very disruptive to the traditional storage world. That must have been an interesting conversation back in Pittsburgh about actually going forward with this initiative. Share a little bit about what the thinking was there and was it a ‘no-brainer’ or was there a little tension there initially as well? You’re kind of sleeping with the enemy in a way.

Chris: Yeah, I mean so… honestly this product, or this feature of the product, has been more customer-driven than perhaps any other feature. So, I've been on dozens of calls with customers, going back to late last year. It is really all about customers wanting this solution. I do agree with you. It’s totally disruptive. But I think it is a lot less disruptive for our architecture than some of the other guys. So again, I think we play into that perfectly and the great thing was that as we talked to these customers, you know, obviously, you’re not going to have total overlap in terms of requirements, but more so than anything else, there was a huge overlap in terms of the types of problems they want to solve and the way they want to solve them. So it really converged very quickly. I don't know that I’d call it a ‘no-brainer’ but yeah, we have so many customers asking for a solution, it seemed like a logical step.

Dave: I’m sure in your analysis, you determined that this expands your total available market pretty dramatically, I would think.

Chris: Basically, it is not an exaggeration to say virtually every customer I speak to regardless of topic, wants to talk to me about the cloud once they find out we’re doing something. It is literally every customer. Maybe one or two percent aren’t interested. Other customers are either looking at augmenting one of our existing solutions or implementing a new solution based on this.

Dave: Where are you guys winning in the marketplace and at whose expense?

Chris: Winning in the marketplace runs the gamut. Obviously we’re very well known in media and entertainment. We’re well known in seismic processing, as well. So those are two main uses. But it is a general-purpose solution to be used across the gamut. Given the fact that it’s high-performance and scalable, it's natural that we gravitate to those industries first because that is really what they are all about. As far as who we are kind of hurting, really the incumbents. Because, you know, people that have large infrastructures, they want to keep extending those. They want to step them up to the next software revision, the next hardware revision. Well, when you insert our box out front, you can get a lot of life out of those NAS devices. So, maybe something that was on the cusp of really hurting you, you just bought another five or ten years of productive life out of because you put an Avere solution in front of it.

Dave: So there is a real big asset utilization play there for you guys. So, what’s your headcount these days?

Chris: I don’t know that I’m at liberty to say that.

Dave: Usually companies will talk about headcount, but not revenue.

Chris: We’re growing rapidly and we’re in the process of finalizing a move in Pittsburgh that we’re extremely excited about.

Dave: Within Pittsburgh…

Chris: Yes, we’re in Pittsburgh. We’re from Pittsburgh. We’re staying there.

Dave: You’re pretty dogmatic about that. Well, you have some good DNA there. It has worked for Ron in the past. Well, congratulations on your bold move and good luck with the product. Break us down again, when’s it available?

Chris: The product will be available in the fourth quarter. We have a prototype now and will be starting customer demos in July with beta coming in August.

Dave: re:Invent is a big milestone for you guys.

Chris: Absolutely. We plan on staying engaged with Amazon from here on out.

Dave: What’s it like working with those guys?

Chris: It’s been great. Not just from a business development perspective… their architects have been great working with our engineers. Very accessible, very professional guys… so it’s been terrific. It’s been very good.

Dave: There has to be some serious geek-fests going on between you guys and Amazon.

Chris: I mean, we’re geeks… we’re working…the closer we get to delivery… we keep enhancing and showing it and the closer we get to delivering our product, the more excited they will get. I think they are excited about it too.

Dave: It’s a good event, isn’t it? There are a lot of good customers here. It is good quality. It’s not too much puff. I like that.

Chris: I know. We’ve had a lot of tremendous conversations in our booth, more so than at some of the traditional, larger trade shows.

Dave: Good. Well, Chris, great to see you again and we appreciate you coming on theCube.

Chris: Thanks a lot guys. Thanks for having me.

Dave: Love the story. Love the update. All right… stay there everyone. We’ll be right back with more from the AWS Summit at Moscone, live. This is theCube.

 

How fast is Cloud really being adopted? Download the Gatepoint Research Cloud Strategies report to gain insight into the plans of executives based on a recent survey:

 

NAS in a VDI Workflow

 

Part 1 of 2

In the recent Webinar, Untangling the VDI Storage Enigma, we covered the results of recent tests of VDI's impact on storage. We’re going to use the Webinar content to dive a bit deeper into understanding its effects and how to measure the quality of the experience. You can also refer to the original summary of these tests here.

What happens to Storage in a VDI Environment?

To understand what happens to storage subsystems in a VDI workflow, we have to look at how the I/O generated by VDI instances moves through the virtualization stack.

In the diagram below, the top section contains all of your individual VDI instances or Guest OS instances. These can be Windows 7, Windows 2008, Windows XP, or any type of operating system running in the virtualized environment. These Guest OSes are also all running productivity applications, like Microsoft Office, Internet Explorer, Microsoft Outlook, maybe Adobe Flash, or other browser-based applications. These applications running inside the Guest OSes generate load (CPU, memory, network and disk I/O) on the VDI hypervisor.

The hypervisor is most commonly one of three: VMware View, Citrix XenDesktop, or Windows Server Hyper-V. These hypervisors digest all of the virtual disk I/O generated by the VDI instances. The hypervisor essentially translates the virtualized I/O coming from the guest VDI instances into I/O directed at the network-attached storage (NAS). This consolidation of I/O onto the NAS is what generally causes most performance issues in VDI environments.

VDI Blender Effect

As you can see, the workflow occurring at the guest instance level eventually trickles down into NAS file I/O. Each VDI instance has its own individual I/O stream, reading and writing to a specific file that lives on the storage, so the VDI hypervisor I/O is brutal on the storage subsystem. When you have one thousand to two thousand VDIs, you end up having all of these parallel streams contending for a single storage resource.

Let’s look at what the characteristics of this virtualized I/O really look like. In the circular graph below, the inner tier shows the read vs. write distribution of the hypervisor, specifically VMware View in this case, with a 70% read and 30% write distribution. The outer circle shows the breakdown of I/O sizes for read and write operations. All of these different read and write requests coming from thousands of VDI instances lead to what is called the I/O blender effect.

[Figure: VMware View activity, showing the read/write distribution and the breakdown of I/O sizes]
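To make the blender effect concrete, here is a minimal Python sketch. It is not taken from Avere's tests: the function names, the round-robin interleaving and the block numbers are all illustrative assumptions. Each simulated VDI instance issues a perfectly sequential stream, and the sketch measures how much of that sequentiality survives once the streams are merged on their way to shared storage.

```python
from itertools import zip_longest


def vm_stream(vm_id, start_block, length):
    """Sequential block requests from one hypothetical VDI instance."""
    return [(vm_id, start_block + i) for i in range(length)]


def blend(streams):
    """Interleave streams round-robin (an assumed, simplified merge order)."""
    merged = []
    for group in zip_longest(*streams):
        merged.extend(req for req in group if req is not None)
    return merged


def sequential_fraction(requests):
    """Fraction of requests whose block directly follows the previous one."""
    if len(requests) < 2:
        return 1.0
    hits = sum(
        1 for prev, cur in zip(requests, requests[1:]) if cur[1] == prev[1] + 1
    )
    return hits / (len(requests) - 1)


if __name__ == "__main__":
    one_vm = blend([vm_stream(0, 0, 100)])
    many_vms = blend([vm_stream(i, i * 10_000, 100) for i in range(4)])
    print("1 VM :", sequential_fraction(one_vm))
    print("4 VMs:", sequential_fraction(many_vms))
```

In this toy model a single guest looks fully sequential to the storage, while just four interleaved guests already look fully random, even though no individual guest changed its behavior.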

Raw IOPs: A Performance Measure or Not?

Every storage vendor out there is going to claim a raw IOPS number. At Avere, we do this too for various workflows. But these workflows are most often purely read workflows or purely write workflows, not the VDI blender effect of many different types and sizes of requests at once. Although storage vendors may say that their devices can give you 200,000 4KB read IOPS, in reality this measurement has nothing to do with the virtual disk I/Os coming from a desktop virtualization hypervisor. Raw IOPS numbers are usually generated with fixed I/O sizes and fixed read/write ratios. Some of them are 100% read or 100% write workloads, so the claims are absolute numbers of how the storage device can perform under the simulated I/O. What actually happens in a VDI workflow is that you see a combination of different sizes and mixed read and write activity. Because of the small I/Os and the different read and write ratios, the raw IOPS calculation can’t give you any firm idea of how well your storage will perform with your actual VDI workload.
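A quick back-of-the-envelope sketch in Python shows why a fixed-size rating does not transfer to a mixed workload. It makes one loud simplifying assumption that real devices do not satisfy: that the device is limited only by bandwidth. The size mix is hypothetical, not a measured VDI profile, and the function names are illustrative.

```python
def average_io_size_kb(size_mix):
    """Weighted average I/O size for a {size_kb: fraction} mix."""
    total = sum(size_mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("size fractions must sum to 1")
    return sum(size * frac for size, frac in size_mix.items())


def bandwidth_bound_iops(rated_iops, rated_size_kb, size_mix):
    """IOPS at a given size mix, ASSUMING bandwidth is the only limit."""
    bandwidth_kb_per_s = rated_iops * rated_size_kb
    return bandwidth_kb_per_s / average_io_size_kb(size_mix)


if __name__ == "__main__":
    # Hypothetical mix of I/O sizes; illustrative values only.
    mix = {4: 0.10, 16: 0.10, 32: 0.30, 64: 0.50}
    print("average size (KB):", average_io_size_kb(mix))
    print("IOPS at this mix :", round(bandwidth_bound_iops(200_000, 4, mix)))
```

Even in this generous model, the same device delivers only a fraction of its rated 4KB figure once larger requests enter the mix, and the model still ignores read/write ratios, ordering and latency.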

Finding a way to simulate the order and mix of these operations is not trivial. It isn’t easy to build a simulation that will generate a 50% read and 50% write workload, and of that, 10% 4KB, 10% 16KB, and so on. Even creating your own blender is not necessarily a realistic representation of the workflow that is going to be generated by your actual VDI implementation. We’ve already determined that raw IOPS isn’t a suitable measuring stick for how a given application will behave. The testing completed at Avere showed that raw IOPS claims like, “Our device can do 75,000 read IOPS and 25,000 write IOPS, therefore with 100,000 IOPS you should be able to handle X number of VDIs,” simply do not do the VDI storage admin any justice.
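For illustration, a homemade blender of the kind described above might look like the following Python sketch. The 50/50 read/write split and the 10% 4KB and 10% 16KB shares come from the example in the text; the remaining size weights, the function names and the seed are assumptions added to make the sketch complete.

```python
import random
from collections import Counter

# Hypothetical size mix (KB -> share). Only the 4KB and 16KB shares come
# from the example in the text; the rest are made up for illustration.
SIZE_WEIGHTS_KB = {4: 0.10, 16: 0.10, 32: 0.30, 64: 0.50}


def generate_workload(n_ops, read_ratio=0.5, size_weights=SIZE_WEIGHTS_KB, seed=0):
    """Return n_ops random (kind, size_kb) operations drawn from the mix."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    sizes = list(size_weights)
    weights = list(size_weights.values())
    ops = []
    for _ in range(n_ops):
        kind = "read" if rng.random() < read_ratio else "write"
        size_kb = rng.choices(sizes, weights=weights, k=1)[0]
        ops.append((kind, size_kb))
    return ops


def summarize(ops):
    """Observed share of each operation kind and each I/O size."""
    total = len(ops)
    kinds = Counter(kind for kind, _ in ops)
    sizes = Counter(size for _, size in ops)
    return (
        {k: v / total for k, v in kinds.items()},
        {s: v / total for s, v in sizes.items()},
    )


if __name__ == "__main__":
    kinds, sizes = summarize(generate_workload(10_000))
    print(kinds)
    print(sizes)
```

A generator like this reproduces the ratios, but every operation is drawn independently, so it says nothing about the ordering, burstiness or file locality of a real VDI deployment. That is the gap the paragraph above describes.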

So raw IOPS aren’t the answer to measuring storage performance in VDI workflows. So what is? We’ll continue to talk about how Avere’s testing identified an accurate and comparable measurement, tools to use, and benchmarking of vendor solutions right here. Stop back for the next segment.

 

This content was presented in Avere's recent Webinar titled Untangling the VDI Storage Enigma on April 18, 2013. The complete deck of slides from this presentation is available for viewing through the link below.
