NexGen Impex

The Challenges of Project Estimation: Exploring the Variations

Let’s consider the development of a pretty trivial feature for your imaginary web app that any similar app should have: log in with a username and password. On the surface, everything looks pretty simple. Users input their names and passwords, click the Login button, and the application performs an authentication and authorization routine. However, there are different ways the vendor can implement this feature.

Here, the vendor considers factors such as the application load (QPS, or queries per second, sometimes called RPS, or requests per second) and the required UpTime. These non-functional requirements are often left out of the scope you send to vendors, since your main concerns are functionality and the value of your application for users.

These parameters weren’t included in your requirements list, but they have a significant impact:

  • QPS is the number of requests your app receives per second, which grows with the number of users performing actions at the same time. It changes during the day and depends on the particular day of the year. For example, in your outlet store web app, QPS will rise significantly before the holidays;
  • UpTime shows how accessible your app is to users. An UpTime of 99% may look impressive. However, it means the app was inaccessible for about 87 hours during the year. In other words, its downtime equals roughly 87 hours. Unfortunately, those will be the very hours when crowds of buyers want to buy holiday gifts for their relatives. For comparison, with all their financial and technical capabilities, Amazon AWS guarantees a downtime of about 4 hours per year, which corresponds to an UpTime of 99.95%. Such an excellent result!
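
The relationship between an UpTime percentage and yearly downtime is simple arithmetic, and it can help to see it spelled out. The snippet below is a minimal sketch, assuming a non-leap year of 8,760 hours; the function name `yearly_downtime_hours` is hypothetical and chosen only for this illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, assuming a non-leap year


def yearly_downtime_hours(uptime_percent: float) -> float:
    """Return how many hours per year an app is unavailable at a given UpTime."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


# Compare the two UpTime figures discussed in the text.
for uptime in (99.0, 99.95):
    hours = yearly_downtime_hours(uptime)
    print(f"{uptime}% UpTime -> {hours:.2f} hours of downtime per year")
```

The two figures work out to roughly 87.6 and 4.4 hours, which matches the rounded values of 87 and 4 hours used above.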

The next question is: what will the financial consequences be for your business if your application suddenly crashes on Black Friday and users cannot log in? Of course, you can reach out to the vendor, and the issues will be fixed. However, no one will give you back the lost time or bring back the users who could not buy the product they wanted. If functionality is restored within an hour, that is a good result, and the incident adds only 1 hour to the overall downtime. It doesn’t seem like much, but how much profit do you lose in 1 hour of missed peak demand?

These losses can be avoided by putting in more effort. Adding an extra server to run the code, placing a load balancer in front of the application servers, and moving Redis to a separate server can do the trick. UpTime will increase significantly because the login functionality is duplicated: if one login server becomes unavailable, the second can process user requests. Since Redis runs separately, its stability doesn’t depend on the application servers. As a result, we take a massive leap towards increasing the login’s UpTime, but we spend more time. The login feature implementation now takes X3 hours, which is more than X2.
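
To make the effect of duplication concrete, here is a rough sketch of the usual availability arithmetic: components that all have to work are multiplied together, while redundant components fail only when every copy fails. It assumes failures are independent and ignores the load balancer's own availability. The 99% figure for a single application server is a made-up illustrative value, and the 99.95% figure for the Redis server simply reuses the number quoted earlier.

```python
def series(*availabilities: float) -> float:
    """Chain of components: the chain is up only when every part is up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities: float) -> float:
    """Redundant components: the group is down only when all parts are down."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1 - a
    return 1 - all_down


# Assumed, illustrative figures (not measurements and not an SLA):
APP_SERVER = 0.99       # one application server, including crashes of the code
REDIS_SERVER = 0.9995   # the separate server that runs Redis

one_app_server = series(APP_SERVER, REDIS_SERVER)
two_app_servers = series(parallel(APP_SERVER, APP_SERVER), REDIS_SERVER)

print(f"one app server:  {one_app_server:.4%}")
print(f"two app servers: {two_app_servers:.4%}")
```

Under these assumptions the duplicated setup comes out clearly ahead, but its availability can never exceed that of the single Redis server it still depends on.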

Now, simply speaking, the login’s UpTime depends on the UpTime of the server that runs Redis and of Redis itself. With AWS, we have to accept up to about 4 hours of downtime per year, since that is the rented server’s downtime and we can’t avoid it, plus Redis’ own downtime. To increase the login’s UpTime, we can replace the single Redis server with a Redis cluster of 3 servers. When the main Redis server crashes, the remaining ones continue working, and the authorization process keeps running.

For the application to know which Redis server is the main one and where it must connect, we should put a proxy, for example HAProxy, in front of the cluster. If the main Redis server crashes, HAProxy can switch to a new main server in about 3 seconds. In this case, the Redis downtime decreases to about 3 seconds, but we’ll have to spend extra time implementing this functionality. Implementing the login feature will take X4 hours, which is more than X3.
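
The routing decision the proxy makes can be pictured with a toy model. This is not how HAProxy is implemented or configured; it is only a sketch of the idea, assuming each health check yields the role a Redis node reports about itself (Redis reports `master` or `slave` in its replication info) or `None` when the node is unreachable. The function name `pick_main` and the node names are hypothetical, and the promotion of a replica to main server is assumed to be handled elsewhere.

```python
from typing import Dict, Optional


def pick_main(roles: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the node that currently reports itself as the main server.

    `roles` maps a node name to the role from its latest health check,
    or to None if the node did not answer.
    """
    for node, role in roles.items():
        if role == "master":
            return node
    return None  # no main server is known yet; logins must wait


# Before the crash: redis-1 is the main server.
before = {"redis-1": "master", "redis-2": "slave", "redis-3": "slave"}

# After the crash: redis-1 is unreachable and redis-2 has been promoted.
after = {"redis-1": None, "redis-2": "master", "redis-3": "slave"}

print(pick_main(before))
print(pick_main(after))
```

The gap between the crash and the moment a health check sees the new main server is what the text counts as the remaining Redis downtime.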
