Questions tagged [system-reliability]

22 questions
74 votes · 19 answers

How come compilers are so reliable?

We use compilers on a daily basis as if their correctness is a given, but compilers are programs too, and can potentially contain bugs. I always wondered about this infallible robustness. Have you ever encountered a bug in the compiler itself? What…
EpsilonVector
65 votes · 31 answers

Why isn't software as reliable as a car?

I had a user ask me this question. We know that cars break down, but that's because of something physical (unless software is involved!). I tried to answer that software is a much younger industry, but the user countered with "didn't the automobile…
Alex Angas
23 votes · 7 answers

Are the terms stable and reliable interchangeable?

Is there a difference between stability and reliability (at least in a software engineering context), or can the terms be used interchangeably? If not, what would be some examples of reliable but not necessarily stable systems, and vice versa?
gsakkis
15 votes · 7 answers

What special considerations are needed when designing databases to hold financial records?

I hope this question isn't too broad. In the future I may need to add some accounting and financial-tracking systems to some applications (mostly web-based applications, but my question pertains to desktop apps as well). Now, creating a simple…
6 votes · 3 answers

Best practices for Heartbeat in distributed systems

In the past, our system had an external data provider (call it source) sending regular heartbeats to a Java application (call it client). If the heartbeat failed, the system shut itself down (to avoid serving stale data in a critical application).…
senseiwu
6 votes · 1 answer

Need to re-build an application - how?

For our main system, we have a small monitor application that sits outside our network and periodically tries to log in to verify that the system still works. However, we have a problem with the monitor: the communications component set (Asta 3…
Tom A
3 votes · 3 answers

How to prevent bugs in business-level configurations with similar discipline as in source code?

We have a system that allows our clients to coordinate people (shoppers) so that they can deliver groceries within 45 minutes of the order's creation. Each client has a set of stores where the orders are processed by the shoppers and each shopper…
3 votes · 2 answers

From a software development lifecycle perspective, is duck-typing a benefit or a problem?

Statically typed languages such as Java offer the benefit of compile-time type checking: you are guaranteed that an object is of a given type, so there is no need to spend time and resources investigating the TYPE of a variable or parameter,…
2 votes · 1 answer

Defining SLI / SLO for ETL and Reporting Application

All, we've just started on our SRE journey and are trying to define SLIs / SLOs for our application. It is an ETL application where 1. feeds (e.g. start-of-day and end-of-day data feeds) come from various upstream sources and get loaded with some transformation. 2.…
2 votes · 0 answers

Running a high availability PostgreSQL cluster on native AWS services only

Backstory: I am unable to use RDS, as I need to install cartridges in my PostgreSQL instances. I have spent a few days trying to pin down an architecture for PostgreSQL running on EC2 instances. Most of the information I could find online uses separate…
tjwoon
1 vote · 3 answers

Windows Hibernate API

Is it possible to programmatically trigger Windows Hibernate without actually hibernating, just to take a snapshot of the OS at regular intervals, so that the system can return to the previously saved state in case of a failure? This is much…
1 vote · 1 answer

Reliability for FTP Server

We have implemented an FTP server. The manager wants to make it more reliable. He wants me to write incoming streams into some fast and reliable system (like HBase or Redis) before writing them to the server's hard disk. His point is that if the server has…
vakarami
1 vote · 1 answer

Should an online platform be relied on to store mission critical files?

Context: With my team, I created an "online platform" for a client, which moved their operation from a paper-based system to a content management system (CMS) based submission system on a Virtual Private Server (VPS). A large group of users submit…
0 votes · 1 answer

What is the crux of the difference between N-version programming and self-monitoring architecture?

Source: https://cs.ccsu.edu/~stan/classes/CS410/Notes16/11-ReliabilityEngineering.html This is a self-monitoring architecture. Here, computations are carried out across 2 channels; if they both produce the same result, then the system is operating correctly, else…
cuajiu
0 votes · 2 answers

Building a program that truly deletes everything

We all know that if we delete a file, the operating system marks its space for reuse but doesn't actually erase it. It just removes it from the directory indexes, and until that space is needed and overwritten, the data will still remain there. Recently, I have…