
Upload to YouTube using MRSS feed

In this post, I will talk about one requirement related to YouTube integration. Here's the context: your customer publishes an RSS feed and does not want to log into the YouTube site and upload videos manually. The customer wants the upload activity to be automated and wants you to design a loosely coupled application.

This is a typical integration requirement in the media space. Google provides YouTube APIs with which one can build a stand-alone application.

Here is one solution that can be implemented: design a stand-alone YouTubeUploader application that can be scheduled through a cron job. While YouTube supports developers with APIs, authentication mechanisms, and client libraries, it is important to segregate the roles and responsibilities of your classes. In my solution, YouTubeUploader is the main class that the cron job invokes. This class calls FeedParser, which accesses the feed through an HTTP URL. Nowadays, publishers usually use an MRSS feed to syndicate their content.
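To make the FeedParser role concrete, here is a minimal sketch in Python of reading an MRSS feed. The function names (`fetch_feed`, `parse_mrss`), the sample feed, and the choice of falling back to the media URL when an item has no `guid` are my assumptions for illustration, not part of any YouTube library.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Namespace used by Media RSS (MRSS) elements such as <media:content>.
MEDIA_NS = {"media": "http://search.yahoo.com/mrss/"}

# Hypothetical sample feed, used only to illustrate the parsing step.
SAMPLE_FEED = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Example channel</title>
    <item>
      <title>First clip</title>
      <guid>clip-001</guid>
      <media:content url="http://example.com/media/clip-001.mp4" type="video/mp4"/>
    </item>
    <item>
      <title>Second clip</title>
      <media:group>
        <media:content url="http://example.com/media/clip-002.mp4" type="video/mp4"/>
      </media:group>
    </item>
    <item>
      <title>Text-only item</title>
      <guid>note-003</guid>
    </item>
  </channel>
</rss>"""


def fetch_feed(feed_url: str) -> bytes:
    """Download the raw feed over HTTP (hypothetical helper)."""
    with urllib.request.urlopen(feed_url, timeout=30) as response:
        return response.read()


def parse_mrss(feed_xml) -> List[Tuple[str, str, str]]:
    """Return (guid, title, media_url) for every item that carries media."""
    root = ET.fromstring(feed_xml)
    entries = []
    for item in root.iter("item"):
        # <media:content> may sit directly under <item> or inside <media:group>.
        content = item.find(".//media:content", MEDIA_NS)
        if content is None or not content.get("url"):
            continue  # nothing to upload for this item
        media_url = content.get("url")
        # Assumption: fall back to the media URL when the item has no guid.
        guid = item.findtext("guid") or media_url
        title = item.findtext("title", default="")
        entries.append((guid, title, media_url))
    return entries
```

In a scheduled run, one would pass the result of `fetch_feed(feed_url)` to `parse_mrss`; the sample feed above only stands in for a real publisher feed.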

Let your FeedParser parse the MRSS feed and persist the entries in a database, checking that duplicate entries are not stored. Develop MediaContentDownloader to download the binary content through its HTTP URL into a temporary folder. Finally, develop MediaContentUploader to upload the binary content to the YouTube site. Make sure to define the customer-specific configuration, such as the YouTube credentials and the feed URL.
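The wiring between these pieces could look roughly like the sketch below. It is written in Python for brevity and is only an illustration: `UploaderConfig`, `EntryStore`, the field names, and the idea of passing the parser, downloader, and uploader in as plain callables are my assumptions. The point is that the main class depends on small, replaceable collaborators, which keeps the application loosely coupled.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class UploaderConfig:
    """Customer-specific settings (field names are hypothetical)."""
    feed_url: str
    youtube_username: str
    youtube_password: str
    temp_dir: str = "/tmp"


class EntryStore:
    """Remembers which feed entries have already been seen."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS feed_entry ("
            "guid TEXT PRIMARY KEY, media_url TEXT)"
        )

    def add_if_new(self, guid: str, media_url: str) -> bool:
        """Insert the entry; return False if the guid was already stored."""
        cur = self.conn.execute(
            "INSERT OR IGNORE INTO feed_entry (guid, media_url) VALUES (?, ?)",
            (guid, media_url),
        )
        self.conn.commit()
        return cur.rowcount == 1


class YouTubeUploader:
    """Main class: the entry point a cron job would invoke."""

    def __init__(
        self,
        config: UploaderConfig,
        store: EntryStore,
        parse_feed: Callable[[str], Iterable[Tuple[str, str]]],
        download: Callable[[str, str], str],
        upload: Callable[[str, UploaderConfig], None],
    ) -> None:
        self.config = config
        self.store = store
        self.parse_feed = parse_feed
        self.download = download
        self.upload = upload

    def run(self) -> List[str]:
        """Process the feed once and return the guids that were uploaded."""
        uploaded = []
        for guid, media_url in self.parse_feed(self.config.feed_url):
            # Note: the entry is recorded before the upload, so this sketch
            # does not retry an entry whose upload fails.
            if not self.store.add_if_new(guid, media_url):
                continue
            local_path = self.download(media_url, self.config.temp_dir)
            self.upload(local_path, self.config)
            uploaded.append(guid)
        return uploaded
```

Because the collaborators are injected, each one can be tested or swapped independently, for example by replacing the real uploader with a stub during a dry run.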

Some tips for YouTube direct upload:

1. Use ClientLogin authentication (note that Google has since deprecated ClientLogin in favour of OAuth 2.0)

2. Use the direct, resumable upload method: the first request sends the video metadata, and subsequent requests upload the actual binary content

3. Persist the target 'Location' (the upload URL returned in response to the first request) in a database, so that it can be used when resuming an upload

4. For duplicate content checks, one can compute an MD5 digest of each file and persist it in the database against its entry. Before uploading, compute the MD5 digest of the new file and compare it against the digests persisted in the database. However, this process may have performance implications.
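As an illustration of the last tip, the duplicate check could be sketched as follows in Python. The table name `uploaded_media`, the function names, and the 1 MB chunk size are assumptions of mine. Reading the file in chunks keeps memory use flat, although hashing a large video still costs a full read of the file, which is the performance implication mentioned above.

```python
import hashlib
import sqlite3


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file without loading it all at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def init_digest_table(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) table that maps digests to feed entries."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS uploaded_media ("
        "md5 TEXT PRIMARY KEY, entry_guid TEXT)"
    )


def is_duplicate(conn: sqlite3.Connection, md5_hex: str) -> bool:
    """Return True if content with this digest was already recorded."""
    row = conn.execute(
        "SELECT 1 FROM uploaded_media WHERE md5 = ?", (md5_hex,)
    ).fetchone()
    return row is not None


def record_upload(conn: sqlite3.Connection, md5_hex: str, entry_guid: str) -> None:
    """Store the digest against its feed entry after a successful upload."""
    conn.execute(
        "INSERT OR IGNORE INTO uploaded_media (md5, entry_guid) VALUES (?, ?)",
        (md5_hex, entry_guid),
    )
    conn.commit()
```

Before each upload, one would call `md5_of_file` on the downloaded file, skip the upload if `is_duplicate` returns true, and call `record_upload` only once the upload has succeeded.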

Deployment diagram for your reference ...
