
Have you heard of GlusterFS?



GlusterFS is a cluster file system that stores its data on already tried and tested disk file systems such as ext3, ext4 and xfs. It can easily scale up to petabytes of storage under a single mount point for the user. It is free and open source software, available under GNU GPL v3, with some parts under GNU GPL v2.

GlusterFS servers run the glusterfsd daemon to export a local file system as a volume, and the glusterfs client process connects to the servers through a custom protocol over TCP/IP. The client can also mount the final volume using the NFS v3 protocol.
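To make the two mounting options concrete, here is a minimal sketch in Python that only builds the mount commands a client would typically run. The server name `server1`, the volume name `gv0` and the mount point `/mnt/gluster` are made-up placeholders, and the helper `gluster_mount_command` is my own illustration, not part of GlusterFS. Depending on your setup, the NFS mount may need additional options.

```python
def gluster_mount_command(server, volume, mount_point, use_nfs=False):
    """Build the mount command for a GlusterFS volume as an argument list.

    All names passed in are placeholders chosen by the caller; this helper
    is hypothetical and only assembles the command, it does not run it.
    """
    source = f"{server}:/{volume}"
    if use_nfs:
        # Mount the volume with a standard NFS v3 client.
        return ["mount", "-t", "nfs", "-o", "vers=3", source, mount_point]
    # Mount the volume with the native glusterfs client.
    return ["mount", "-t", "glusterfs", source, mount_point]


if __name__ == "__main__":
    # Print both variants for an assumed server, volume and mount point.
    for use_nfs in (False, True):
        cmd = gluster_mount_command("server1", "gv0", "/mnt/gluster", use_nfs)
        print(" ".join(cmd))
```

The native client talks to the servers over the custom protocol mentioned above, while the NFS variant is handy on machines where the glusterfs client is not installed.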

Why should I use this?
I find it useful mainly in cloud environments where I need to scale out. More importantly, when I use the AWS cloud, GlusterFS is available as an AMI. With standard and premium support subscriptions available from Gluster, this option supports your solution's business continuity by providing disaster recovery capability. It also lets me treat file storage as a commodity service. Finally, to my knowledge, Gluster is the only highly available storage solution for AWS EC2 and AWS EBS.

Where would you use it?
Imagine a scenario where you are designing a solution that involves an EC2 and EBS deployment, and there is a need for a shared file system. There are plenty of options, but the simplest could be an NFS server. While this is a workable solution, you still need to address production problems such as the following:

1.     What if your storage needs grow beyond the limit you envisaged when you went to production?
2.     What if your storage infrastructure fails for any reason?

The answer is that the designers will have to provide disaster recovery, high availability and scale-out strategies. Alternatively, you can use GlusterFS support on AWS to address all your production needs. In short, you outsource the support of your storage solution to Gluster.

However, this comes at a cost, and I think that is fair, as it is a pay-as-you-go model.
