
JibberJobber Downtime: Why, and What Now?

September 6th, 2016

The last few weeks have been really exciting on the development side of JibberJobber.  One of my developers is rolling out a major enhancement to the job postings which should add considerable value to each user.  Another developer is working on a very cool enhancement to the video library which, of course, should add considerable value to each user. Another developer is fixing and tweaking little things here and there to make JibberJobber more intuitive, flow better, etc.  Why? To add considerable value to each user, of course!

While they are working hard, and our users are working hard on their job search, we’ve had some unplanned and unfortunate downtime that has been beyond frustrating.  Let me share what happens when the site goes down, why it has happened over the last few weeks, and what we are going to do about it.

When JibberJobber goes down, at least four people, including me, QA, the developers, and the server admin team, immediately get an email alert.  Shortly thereafter, I (and Liz) start to get emails from our users… some are very kind, some written out of sheer frustration.  Bottom line, we know it’s happening… we’re just not sure why it’s happening. But we jump in and work on figuring that out.  There have been three recent causes of downtime, which I share below. Note that we haven’t had any data breaches that we are aware of, and no user has lost any data. If there were a data issue, we’d revert to our backups, which run at least once every 24 hours.

Server software incompatibilities.  There is a certain piece of software on our server that provides a necessary function. However, we’re pretty sure that this software, which I won’t name, has caused our server to crash multiple times over the last few years. This is kind of the worst scenario because the fix requires someone to physically touch the server (which means, if it’s after hours, someone has to get out of bed, drive down to the server farm, and sit there and get it to reboot), and it has sometimes taken hours to get back up.  We should be switched off of this software this week or next, and have this issue behind us.  I really, really hope that this is the problem, and that the solution will give us long-term peace of mind (which we haven’t had for too long).

Bad guy users. Well, I’m not sure I would call them users. Maybe losers is a better word. These are people who get a free account, and then, as a “user,” abuse the system.  The latest one, last week, was posting multiple job postings per second. It was too much activity and took the server down for 10 minutes (kudos to my developer who identified the problem immediately and resolved it).  The resolution is to find things like that, and remove the ability for a user to abuse the system.  In this case, only allow someone to post X number of job postings every Y minutes, or something like that.  The immediate solution in this case was to terminate the user and block their IP address.
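
The “X job postings every Y minutes” idea can be sketched as a sliding-window limiter. This is only a minimal illustration in Python, not JibberJobber’s actual code: the class name PostingRateLimiter, its parameters, and the in-memory storage are all assumptions made for the example.

```python
import time
from collections import defaultdict, deque


class PostingRateLimiter:
    """Hypothetical limiter: at most max_posts postings per user per window."""

    def __init__(self, max_posts, window_seconds, clock=time.monotonic):
        self.max_posts = max_posts            # the "X" in "X postings"
        self.window_seconds = window_seconds  # the "Y minutes", in seconds
        self.clock = clock                    # injectable so it can be tested
        # user_id -> timestamps of that user's accepted postings
        self._history = defaultdict(deque)

    def allow(self, user_id):
        """Return True and record the posting if the user is under the limit."""
        now = self.clock()
        recent = self._history[user_id]
        # Forget postings that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_posts:
            return False  # over the limit: reject this posting
        recent.append(now)
        return True


# Example: allow 5 postings per user every 10 minutes (values are made up).
limiter = PostingRateLimiter(max_posts=5, window_seconds=10 * 60)
if not limiter.allow("some-user-id"):
    print("Too many job postings; please try again later.")
```

A real deployment would likely keep these counters somewhere shared across web servers rather than in one process’s memory, but the check itself stays the same: count recent postings, and refuse new ones once the count hits the limit.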

Horrible, horrible people. The most frequent issue we’ve had lately has been hackers or spammers. They haven’t gotten JibberJobber accounts, they just set up servers in multiple places to attack our server. We have had this kind of activity coming from about 15 different countries, and have worked on blocking them when we see them… there are some automated server-side solutions that look promising, to handle this 24×7, without us looking and blocking all the time (which is exhausting, and very distracting).  We have applied one as a band-aid solution and so far it’s doing a pretty good job (although it has blocked some regular users by mistake). Once we get the server software incompatibilities fixed, we’ll move to a more long-term solution for this issue.

The bottom lines:

  • We are very sorry that downtime has happened.
  • We have been, and are, working on keeping JJ up, and reducing the unplanned downtime significantly.
  • We appreciate anyone who has reached out kindly and patiently, asking WHAT’S UP???
  • We understand the frustration of anyone who has reached out and said WHAT’S UP, GET JJ UP!!! We feel it too.

I’m hopeful that our strategy moving forward will make JibberJobber more reliable for you.  If you have any questions, please don’t hesitate to reach out to me.


2 responses to “JibberJobber Downtime: Why, and What Now?”

  1. Gabriel says:

    You may want to look at moving into the cloud in the future: AWS / Microsoft Azure / Google GCP. It would save you money, allow your developers to work from anywhere to fix the server, and let you use a web application firewall and the platform to handle many of the attacks.

  2. Jason Alba says:

    That has been a point of discussion for a while… when we started JJ 10 years ago, in the news you would hear about the tens of thousands of sites that went down when AWS went down. Too unstable, too new. We got our own box, and have upgraded over the years, but stayed within our architecture while the cloud was maturing. Things are different now, but JJ is also a really complex system… so a move isn’t trivial. Anyway, still discussing it… thanks for the suggestion :)