Stack Overflow profile for Roopesh Shenoy

Transactional DDL In PostgreSQL

Recently I came across a comparison of transactional DDL support in various databases on the PostgreSQL wiki.

For me this has been an eye-opener – having started my career with Oracle databases, I was brought up thinking that DDL statements are always outside transactions, without questioning why it must be so.

PostgreSQL not only supports transactional DDL, it does so in the most intuitive way – you just wrap these statements in a transaction block and everything can be rolled back if necessary. No auto-commits (like Oracle/MySQL do), no need for specific isolation levels (like SQL Server) – it just works, quite robustly.
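To see what rolling back DDL looks like in practice, here is a minimal sketch. It is an assumption-laden stand-in: it uses SQLite (which ships with Python and also supports DDL inside transactions) rather than PostgreSQL, and the `accounts` table and `table_exists` helper are made up for illustration. Against PostgreSQL the same BEGIN / CREATE TABLE / ROLLBACK sequence should apply.

```python
import sqlite3

# SQLite stands in for PostgreSQL here (assumption for illustration only).
# isolation_level=None puts the driver in autocommit mode, so the
# transaction boundaries below are exactly the ones we issue ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)


def table_exists(name):
    """Return True if a table with this name is currently visible."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


conn.execute("BEGIN")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT)")
inside = table_exists("accounts")          # the table is visible inside the transaction
conn.execute("ROLLBACK")
after_rollback = table_exists("accounts")  # the CREATE TABLE was undone

print(inside, after_rollback)
```

The point of the sketch is only that the schema change disappears together with the rest of the transaction, which is what makes multi-step migrations safe to retry.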

Just one more reason for me to pick Postgres as the database of choice for my upcoming projects. Then of course, there is that thing about Postgres actually being the most advanced database and open source at the same time.

Learning Kannada–Week 1

Now that I am learning Kannada from scratch, I started searching for effective ways of doing so – here are some of the things I have found useful so far.

Kannada Baruthe – Useful starter website – collection of daily usage words and phrases, along with audio – great for beginners. The content is broken into several categories such as Starters, Greetings, Enquiry, Directions, Relationships, Numbers and more, so that it is easy to learn and get used to the context.

A couple of useful Mobile apps – these are great because you can use them when travelling, when you are generally doing nothing else.

Kannada Kali – A basic app, decent enough to get you started with some basic conversations. There is a cool feature for practicing writing the letters of the alphabet, although it would have been nice if they had given the English phonetics of each letter in the same place – that’s in a different category altogether (so you’re expected to already know the letters when you practice writing them).

Learn Kannada – this one’s good because it allows you to search through its content, so it’s easy to find something – extremely handy, especially for quick reference. However, it does crash a lot; I think my poor Samsung Galaxy Pop can’t handle the search all that well.

Of course, nothing beats Google Translate when it comes to ad-hoc translations in either direction.

An Audacious Project – Languages of India

It is a known fact (at least to my friends and family!) that I am a keen enthusiast of Technology in Learning and also pursuing a venture in the same. Today I also embark upon a new personal project that has the potential to change me as a person – the project is

Learn all 22 Official languages of India (aside from English)

“Wait, what?” you say. “That’s preposterous! You live in Karnataka and can’t even speak Kannada properly yet!”

Thank you for pointing that out. I am not proud of that fact. But I intend to change it. Starting now, I am going to dedicate a good amount of time to learning basic Kannada vocabulary, and at least pick up some Kannada literature to read. But I don’t want to stop there. I want to try and learn all 22 languages well enough, and maybe even learn some of the other popular local languages and dialects.

Well enough for me means being able to strike a conversation with the locals, being able to read basic words (at least sign posts but would be nice to be able to read a book), watch local cinema/TV shows without needing a translator/sub-titles and so on.

I already know 3 Indian languages pretty well – Konkani (my mother tongue), Hindi and Marathi (courtesy of my upbringing in Mumbai). I can speak broken Kannada and understand quite a bit of it but can’t read or write it yet. I think that’s half-way into the 4th language, so 18.5 more to go!

Why do I want to do this? I think it’s a pity I know English really well and can also speak a few words of French/Japanese but can’t really say a single word in Tamil, Telugu or Urdu. I would really like to be able to speak to locals in their own languages whenever I visit different places in the country. I also think it’s important for my education-related work; localization of education does seem to be a huge untapped opportunity, but more so because it’s just so much easier to talk to people and understand their problems once the language barrier is crossed. And a problem-solver I am!

I know this is a life-long project, but I’m going to try and go as fast as I can. At the same time, I think the journey is more important than the destination and hope to learn more than just the languages themselves – local culture, cuisines, traditions, history, etc. I will keep this blog updated about my progress. Wish me luck!! And I will take any help/advice you have to spare, I will definitely need it!

Purpose, Mastery, Self-Direction

These are the things that motivate people once the issue of money is taken care of – watch this amazing animation by RSA explaining what motivates people and what doesn’t.

Difference Between REST and SOAP

We had a question today on what’s the real advantage of going with REST over SOAP. It’s tempting to say that REST is the next best thing after sliced bread while SOAP is the devil’s spawn but here’s a more reasonable attempt at explaining the difference –

  • SOAP is a messaging protocol that usually runs on top of HTTP, whereas REST is a style of designing services directly on the basic HTTP protocol.
  • REST uses the HTTP verbs GET, POST, PUT, DELETE, etc. for the things they were intended for – SOAP is mainly RPC with XML that primarily uses POST requests
  • REST doesn’t try to handle state; SOAP-based stacks are often set up to track the state of a particular client through a session (which adds a bit of overhead)
  • HTTP is already handled by browsers and all web clients, as well as any web server – on the other hand the SOAP protocol needs an additional layer of software, and each vendor has its own implementation (which may or may not fully adhere to the SOAP protocol).
  • SOAP exposes its API in the form of WSDL, which allows client-side code generation – this makes it easier to get started but also introduces maintenance issues
  • Versioning is more difficult with SOAP – even additional properties getting added will break the clients if they are on a previous version – on the other hand, versioning is much simpler with RESTful services, since any additional properties added on the server side are just ignored at the client side if they are not required
  • If strict contracts have to be enforced between a service and a client, SOAP has explicit ways of doing that – REST has no such built-in mechanism
  • A lot of client side libraries (such as backbone.js) make it much easier to consume RESTful web services
  • Since SOAP is a huge abstraction on top of the HTTP protocol, things get really difficult to debug when the abstraction breaks down – for example, when a configuration has a problem. Debugging issues in plain HTTP calls is much easier, since there is no proprietary stack in between your code and the calls on the wire.
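The versioning point in the list above can be sketched with a client that simply ignores properties it does not know about. This is only an illustration under assumptions: the `Customer` type, the `parse_customer` helper and the `loyalty_tier` field are all hypothetical, and ignoring unknown fields is a convention the client chooses to follow, not something REST itself guarantees.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Customer:
    # The only properties this (hypothetical) version-1 client knows about.
    id: int
    name: str


def parse_customer(payload: str) -> Customer:
    """Build a Customer from JSON, dropping any property the client does not know."""
    data = json.loads(payload)
    known = {f.name for f in fields(Customer)}
    return Customer(**{key: value for key, value in data.items() if key in known})


# Response from the original server version.
v1 = parse_customer('{"id": 7, "name": "Asha"}')
# Response after the server added a property; the old client is unaffected.
v2 = parse_customer('{"id": 7, "name": "Asha", "loyalty_tier": "gold"}')

print(v1 == v2)
```

A client generated from a strict contract would typically need to be regenerated for the second response, which is the maintenance cost the list refers to.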

Can you think of any other differences?

UPDATE: I recently posted an interview with Demis Bellot on InfoQ that explains in a lot more detail the perils of SOAP – you should read it.

Spanner–Google’s Globally Distributed Database

Google recently published a paper describing Spanner, their globally distributed, Paxos-replicated database with externally consistent transactions, using specially designed hardware and a new TrueTime API.

The motivation behind using Spanner for certain types of applications instead of the massive key-value store that is BigTable is interesting –

We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.

In an effort to achieve this, Spanner combines the replicated ACID transactions of Megastore with the scalability and throughput of BigTable. The challenges faced and the solutions around this (especially custom hardware and the TrueTime API) are described in great detail in the research paper. You can also view the presentation “Building Spanner” from Alex Lloyd (or just download the slides) to get more details.

Google has already moved F1, the backend for their advertising platform, to use Spanner instead of MySQL.

There are several reactions from around the web.

nlavezzo of FoundationDB observes

It’s interesting to see that the creators of BigTable and the early proponents of eventual consistency have invested the last 4.5 years building a system that adds back strong consistency guarantees.

If the Spanner paper is as important as BigTable, ACID may become the new goal for those building distributed systems.

linuxhansl who works on Apache HBase remarks

It makes me sad to see how far ahead Google is compared to the rest of the world. The notion of uncertain time is ingenious.

Mike Miller of Cloudant Labs says

It represents a pragmatic acceptance of developer’s reluctance to reason in the absence of immediately consistent transactions and therefore strikes a bittersweet chord. Philosophically this feels like a big step forwards for distributed systems.

It is probably too early to tell whether open source software similar to this will appear soon, especially given that Google leverages custom hardware to really benefit from this. Also, we don’t know whether Google will offer this as an external data service through its Cloud platform. Such software, though, could definitely help companies leverage globally distributed cloud infrastructure (to avoid service disruptions when a data-center goes down) and still rely on strong consistency guarantees when designing their applications. NuoDB and FoundationDB are at least two projects that seem to target such a use case, although it is unclear how they will compare without special time-handling hardware.

.NET Courses coming up!

I’ve been spending some time with entry-level and mid-level .NET programmers who have gone through several courses related to C#, WCF, ASP.NET, Windows Forms, and various other technologies. However, most of them don’t know the first thing about unit testing, design patterns such as dependency injection, practices such as continuous integration, acceptance and integration test automation, and so on. If you think about it, technologies change but practices don’t, and it’s high time we start focussing our courses on practices rather than the technology that’s currently in fashion.

So I’ve set myself the aim of coming up with some easy-to-digest courses that will hopefully encourage and enable average programmers already well-versed in C# to push forward and start using some of these useful practices. This is what I have in mind so far – all of the below will contain code samples and hands-on exercises so that the learners can start applying them in their real-world projects.

1. Unit Testing
2. Dependency Injection
3. Automated Integration and Acceptance Testing
4. Continuous Integration
5. ORMs

Of course these look basic and there are many libraries and tools that we can teach about, but I think some of these concepts are essential and go beyond language and framework concerns. Some hands-on practice is a must to make them part of the programmers’ toolkit.

What do you think? Any suggestions to improve this course outline?

Commoditization Of Outsourced Software Development

For a few days now, I’ve been trying to wrap my head around how small startups in India are actually having any margins in the outsourcing development business. Challenges seem to be plenty, and a lot of people seem to be getting into it these days.

For instance -

  • The sticker price for developer hours seems to be in a downward spiral – it’s not hard to see either freelancers or small companies offering development resources at as little as $12/hr. Of course quality is generally suspect, but customers who blindly compare hourly rates couldn’t care less. What’s worse is that the experience they have (with poor quality deliverables) is often extrapolated to the entire industry.
  • Industry-wide salaries seem to be increasing – sometimes even unreasonably so. A side effect of everyone trying to start this business is that there is intense competition for resources – so much so that a lot of average or below-average software engineers also tend to get paid a lot. Competition is intensified with even product-based, ecommerce or other VC-funded businesses coming into the picture. There is almost no correlation between the salaries of software engineers and those of jobs requiring similar skill levels in other industries, which is a worrying trend.
  • Increasing salaries are drawing almost everyone into the software industry – engineers are blindly getting into software companies irrespective of their majors – more a rule now than an exception. And a mediocre software developer is almost an expectation these days – it’s *almost* impossible to find fresh graduates who actually *love* developing software.
  • We are actually running out of resources – large companies have started hiring BSc graduates instead of Engineers for instance, and are lowering their filters to ensure that they are able to staff their work-force.
  • Giant outsourcing successes such as Infosys, TCS, Wipro, Cognizant, IBM seem to be the ideals that a lot of entrepreneurs strive for – without realizing that these were started in a different age, and it is nigh impossible to replicate such a feat with the current market dynamics, especially as a startup.

The Industry might seem great for an economy like India – lots of jobs created, good amount of exports generated, talent pool created within the country and so on – and indeed in many ways it has contributed to the bulging middle-class in the country. But the fact remains that a lot of Indian companies do not do any “high-end” technology work – or focus on finding a niche that they can truly do justice to. To top that off, high stress levels, 60 hour work weeks, high attrition levels, etc. seem to be just getting factored into the business plan – a very worrying sign indeed.

Now don’t get me wrong, there are some companies that understand the need to differentiate themselves, take care of their employees and ensure that they create unique value for their customers, but this is not true about the industry as a whole.

These problems are faced by large companies as well – for instance, Infosys is very stringent about the margin for any new business they take up, but it was recently overtaken by Cognizant in terms of revenues. Cognizant is known for its lower margins, and still lags Infosys in terms of net profit. So does this mean that the only way to keep growing here is through reducing margins?

What is the end-game to all this? Will the industry just implode when this becomes unsustainable? Or will it grow out of pure outsourcing and focus on consulting, product development, platforms and other high-margin services that offer more differentiation opportunities? And what about startups, where all these problems get compounded several times?

I think it’s time we stop thinking of software development as an arbitrage business and start focussing on providing value that rivals the best consulting/development firms in the world. Competing on price alone seems to be a strategy doomed to failure in the long run. And if you are a start-up trying to get into the services business, you better have a differentiator/niche that allows you to charge more than the market rates. Otherwise your business will either fail or become a mediocre success at best.

Hosting Options For Web Start-Ups

I recently came across this question on a start-up forum -

"Please help understand which hosting is better (in terms of cost, security & privacy of data when one has to store a lot of private customer data), cloud or in house?"

This is a pretty good question, because it essentially clarifies that cost is not the only consideration – a lot of startups don’t really think of security/scalability/uptime etc. when they start out and these things can come back later to haunt them – some decisions can be reversed easily, others not so much.

So, as far as my experience goes, here is my attempt at listing the various factors to consider before making a decision.

First of all the various choices –

Shared Hosting – includes Bluehost, Hostgator, GoDaddy and a whole lot of other providers. Cheapest to start with, although you don’t get a lot of resources – great for low traffic web sites like starter blogs, personal web sites, etc. Comes with a lot of restrictions (in terms of what you can and cannot use) and soft limits (especially CPU and memory throttling), so any web app with some serious usage will outgrow this quickly.

Generic Virtual Machines - There are many here, for example Rackspace (great for both virtual machine and dedicated hosting). These work out to be much better, come with extra support, and the bandwidth cost is much lower than with most of the cloud providers. Linode is also great for Linux VM hosting.

Dedicated Hosting – A vendor such as Rackspace allocates specific hardware only for you and gives you remote access to it. They will be responsible for the hardware uptime, you will be responsible for application uptime, and you or they will be responsible for OS uptime (depending on the support plan you choose). The advantage here is that you get exactly what you want in terms of hardware specs, configuration setup, etc., combined with the expertise of the hosting provider. This turns out to be cheaper than Cloud hosting if you can predict your loads accurately and they are more or less stable (not spiky). If not, though, this can be expensive, since you will tend to over-allocate resources and underutilize them (the alternative, running out of capacity, is more troublesome).
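The stable-versus-spiky trade-off can be made concrete with a little arithmetic. The sketch below is purely illustrative: the prices, the capacity of 100 requests/second per machine and the two load profiles are made-up numbers, not quotes from any provider, and the function names are hypothetical.

```python
import math


def dedicated_cost(peak_load, capacity_per_server, monthly_price):
    """Dedicated servers must be provisioned for the peak, and are paid for all month."""
    servers = math.ceil(peak_load / capacity_per_server)
    return servers * monthly_price


def cloud_cost(hourly_loads, capacity_per_instance, hourly_price):
    """Cloud instances are assumed to scale hour by hour and are billed per hour used."""
    return sum(
        math.ceil(load / capacity_per_instance) * hourly_price
        for load in hourly_loads
    )


# Hypothetical numbers: 100 req/s per machine, $200/month dedicated, $0.50/hour cloud.
stable = [300] * 720                # 300 req/s for every hour of a 30-day month
spiky = [100] * 700 + [1000] * 20   # mostly quiet, with 20 hours at 10x load

stable_dedicated = dedicated_cost(max(stable), 100, 200)   # 3 servers  -> 600
stable_cloud = cloud_cost(stable, 100, 0.50)               # 3 x 720 h  -> 1080.0
spiky_dedicated = dedicated_cost(max(spiky), 100, 200)     # 10 servers -> 2000
spiky_cloud = cloud_cost(spiky, 100, 0.50)                 # 350 + 100  -> 450.0

print(stable_dedicated, stable_cloud, spiky_dedicated, spiky_cloud)
```

With these assumed prices, dedicated hardware wins for the flat load and loses badly for the spiky one, because it has to be sized for a peak that only lasts 20 hours.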

Cloud Infrastructure-As-A-Service – Here, you can go with any of the cloud providers (Amazon Web Services, Windows Azure) and focus mainly on the virtual machines – you buy raw capacity (not the hardware, just the abstract notion of capacity in terms of CPU, memory, disk space) from them with some OS level abstraction, but are responsible for installing and maintaining your own stack. Great for scaling out and if you want complete control over how your software is configured, but it does take that extra bit of time from you. This is somewhat different from the virtual machine providers above in the extent to which you can scale and the number of geographic locations available, but the underlying concept is the same – you get virtual machines, not hardware access.

Cloud Platform-As-A-Service – More choice here, including the earlier ones (Amazon Web Services and Windows Azure, but also Heroku, AppHarbor, AppFog, AppEngine, etc.). These also provide and maintain the software stack along with the hardware capacity and OS, and a lot of times you just upload your app and expect it to work – great if you can give up some control over stack configuration, in exchange for freeing up some time and maintenance headaches.

Most of these allow you to worry only about your application logic; even uptime, replication, failovers, auto-scaling and backups can be taken care of by the provider.

Co-located hosting – here you will buy a server, and will be responsible for the OS, but you can just put it in a datacenter near you – they will charge you for providing electricity, space, cooling, internet and, depending on your contract, even hardware maintenance. I think BSNL provides that in India; there may be others (not sure, others can pitch in).

Self-hosted – this does give you maximum power, but also max work – you don’t pay money to others (other than buying hardware) but you do have to account for time lost/additional staff for taking care of this work. However, scaling here can also allow you to bring in hardware level optimizations, especially if you think your software has unique requirements and you can tune the entire stack better than someone else can. You still have to worry about disaster recovery and failovers though, and might choose one of the above options for that (unless you have offices in multiple geographies that can reduce this risk for you).

 

So what are the things that can help you decide?

1. Your stack and OS needs – if you don’t have any special needs and are using any of the popular web languages and platforms such as PHP, Python or .NET, a great progression is to start with shared hosting, then upgrade to platform as a service – both of them abstract away the maintenance of the underlying hardware resources. PAAS providers generally offer bigger feature sets as well (for example separate worker/web roles, better database choices, etc.), so you may skip the shared hosting part in some cases (for example, if you want to use PHP with MySQL you can start with Bluehost/GoDaddy, but if you want to use CouchDB, that may not be a great option).

At the same time, if you have specific configuration needs (say you are porting a legacy app, or you are using some less common stack because it fits your use case well), then you might want to just start off with Virtual machine hosting and then upgrade to Cloud Infrastructure-as-a-service or Dedicated Hosting.

2. Your budget vs. timeline constraints – A higher budget but less time means you might want to outsource as much as possible so that you can focus on getting your stuff done fast – this could mean either Cloud hosting or Managed dedicated hosting. On the other hand, a smaller budget might push you to reduce cash outflows and make precious cash last longer – this works if the main bottleneck during this time is not your time.

3. Your preference – max flexibility with maybe more work (IAAS, self-hosting, virtual machines) vs. less flexibility but minimum work (PAAS hosting)

4. Your development and operational practices – This is where some of the PAAS providers really shine. For example, in AppHarbor you can just do a git push and this deploys your code to production. The service will even run all the automated tests before final deployment, and roll back the deploy if there is any failure. Some of these can be extremely time consuming to set up and maintain if you try to go for an in-house or VM-based solution.

 

I would go for –

1. Just starting out with something, price is the biggest constraint – shared hosting

2. Upgrading from Shared hosting – PAAS

3. Want to control my own stack, but price conscious – VMs/IAAS if the demand cannot be foreseen (which is true in most cases) – IAAS gets preference whenever there is spiky load and I need auto-scaling with hourly billing instead of monthly

4. Used IAAS for some time, have stable or foreseeable demand, without many spikes – Dedicated hosting, with the option to plug into a cloud service such as AWS whenever needed (spikes). In-house hosting only if I am partnering with other geeks who know their hardware well and don’t cringe if they have to build their servers and other infrastructure from scratch themselves.
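The four rules of thumb above can be written down as a small decision function. This is a deliberately simplified sketch: the function name and its boolean parameters are hypothetical, and it leaves out in-house hosting as well as the security and legal factors discussed further down.

```python
def suggest_hosting(wants_own_stack=False, outgrown_shared=False,
                    demand_predictable=False, spiky_load=False):
    """Map the rules of thumb from the list above to a hosting suggestion."""
    if wants_own_stack:
        if demand_predictable and not spiky_load:
            # Rule 4: stable, foreseeable demand.
            return "dedicated hosting (burst to cloud for spikes)"
        if spiky_load:
            # Rule 3: auto-scaling and hourly billing matter most here.
            return "IaaS"
        # Rule 3: demand cannot be foreseen, but no obvious spikes yet.
        return "VMs or IaaS"
    if outgrown_shared:
        # Rule 2: upgrading from shared hosting.
        return "PaaS"
    # Rule 1: just starting out, price is the biggest constraint.
    return "shared hosting"


print(suggest_hosting())
print(suggest_hosting(outgrown_shared=True))
print(suggest_hosting(wants_own_stack=True, spiky_load=True))
print(suggest_hosting(wants_own_stack=True, demand_predictable=True))
```

Real decisions will rarely be this clean, but writing the rules out makes it easier to notice which question (stack control, predictability, spikes) actually drives the choice.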

 

About security – AWS has pretty good security certifications, and you can also do things such as using a VPN instead of exposing everything on a public network. However, this depends on how secure you want it to be – for instance, do you mind if you don’t know where your data is saved? Some banks cannot save their data outside of their country, so this could be a problem. What’s the cost of data exposure for you or your customers? Are you storing financial/personal data? Do the laws of your country demand something specific about the kind of data you are storing?

For example, I read somewhere (correct me if I am wrong) that if you save credit card info, you should be hosting it yourself – you cannot outsource hosting (besides, there are security certifications that you need to pass). Is this really necessary? Can you just outsource the whole payment management (including saving credit card info) to a third party? These are architectural decisions that need to be made, and they will also determine where you can deploy your app.

Hopefully that should give some ideas about your options. Have I missed something? Let me know.

Startup Material

Someone asked on a mailing list how you find out who’s “startup material”. Here’s my take -

  • Stupid enough to start a company
  • Naive enough to trust others (partners, customers, employees)
  • Dumb enough to work hard for less pay, when you could be earning more with less work
  • Insanely patient through n-number of failures
  • Extremely optimistic under any circumstances
  • A stress junkie
  • Able to involve other “startup material” folks in your idiotic plan
  • All in hopes of a rich pay day and an opportunity to “do something” (yeah right, as if that’s important) even if the statistical probability of it actually happening is 0.01% (note the point about extreme optimism above)

So, are you “startup material”?