
Have you heard of GlusterFS?



GlusterFS is a cluster file system that stores its data on already tried and tested disk file systems such as ext3, ext4 and xfs. It can easily scale up to petabytes of storage under a single mount point for the user. It is free and open source software, available under the GNU GPL v3, with some parts under the GNU GPL v2.

GlusterFS servers run the glusterfsd daemon to export a local file system as a volume, and the glusterfs client process connects to the servers through a custom protocol over TCP/IP. The client can also mount the final volume using the NFS v3 protocol.
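To make the two access paths concrete, here is a minimal Python sketch that only builds the mount commands a client might run; it does not execute anything. The server name `server1`, the volume name `gv0` and the mount point `/mnt/gluster` are hypothetical, and the exact NFS options you need may differ depending on your Gluster version and setup.

```python
from typing import List


def build_mount_command(server: str, volume: str, mount_point: str,
                        protocol: str = "glusterfs") -> List[str]:
    """Return the argv for mounting a Gluster volume; nothing is executed."""
    # A Gluster volume is addressed as <server>:/<volume>.
    source = f"{server}:/{volume}"
    if protocol == "glusterfs":
        # Native client: speaks Gluster's own protocol over TCP/IP.
        return ["mount", "-t", "glusterfs", source, mount_point]
    if protocol == "nfs":
        # Same volume, accessed through NFS v3 instead.
        return ["mount", "-t", "nfs", "-o", "vers=3", source, mount_point]
    raise ValueError(f"unsupported protocol: {protocol}")


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for proto in ("glusterfs", "nfs"):
        print(" ".join(build_mount_command("server1", "gv0", "/mnt/gluster", proto)))
```

Either command would normally be run as root on the client, and the native one assumes the GlusterFS client package is installed there.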

Why should I use this?
I find it most useful in cloud environments where I need to scale out. More importantly, when I use the AWS cloud, GlusterFS is available as an AMI. With standard and premium support subscriptions available from Gluster, this option supports your solution's business continuity by providing disaster recovery capability. I can also now treat file storage as a commodity service. Finally, Gluster offers a highly available storage solution for AWS EC2 and AWS EBS.

Where would you want to use it?
Imagine you are designing a solution that involves an EC2 and EBS deployment and needs a shared file system. There are plenty of options, but the simplest could be to use an NFS server. While this is a workable solution, you still need to address production problems such as the following:

1.     What if your storage grows beyond the limit that you envisaged when you went to production?
2.     What if your storage infrastructure fails for any reason?

The answer is that the designers will have to provide disaster recovery, high availability and scale-out strategies themselves. Alternatively, you can use GlusterFS support on AWS to address all your production needs. In one line, you have outsourced support for your storage solution to Gluster.

However, this comes at a cost, which I think is fair since it is a pay-as-you-go model.
