Architecture, MSMQ, NServiceBus

Are you working on a distributed system? Microservices, Web APIs, SOA, web server, application server, database server, cache server, load balancer – if these describe components in your system’s design, then the answer is yes. Distributed systems consist of many computers that coordinate to achieve a common goal.

More than 20 years ago, Peter Deutsch and James Gosling defined the 8 fallacies of distributed computing. These are false assumptions that many developers make about distributed systems. They are usually proven wrong in the long run, leading to hard-to-fix bugs.

The 8 fallacies are:

  1. The network is reliable
  2. Latency is zero
  3. Bandwidth is infinite
  4. The network is secure
  5. Topology doesn’t change
  6. There is one administrator
  7. Transport cost is zero
  8. The network is homogeneous

Let’s go through each fallacy, discussing the problem and potential solutions.

1. The network is reliable

Problem

Calls over a network will fail.

Most systems today make calls to other systems. Are you integrating with third-party systems (payment gateways, accounting systems, CRMs)? Are you making web service calls? What happens if a call fails? If you’re querying data, a simple retry will do. But what happens if you’re sending a command? Let’s take a simple example:

var creditCardProcessor = new CreditCardPaymentService();
creditCardProcessor.Charge(chargeRequest);

What happens if we receive an HTTP timeout exception? If the server did not process the request, then we can retry. But if it did process the request, we need to make sure we are not double charging the customer. The catch is that a timeout alone doesn’t tell the caller which of the two happened. You can handle this by making the server idempotent. This means that if you call it 10 times with the same charge request, the customer will be charged only once. If you’re not properly handling these errors, your system is nondeterministic. Handling all these cases can get quite complex really fast.
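To make the idea of an idempotent server more concrete, here is a minimal sketch. It assumes the client attaches a unique key to each logical charge and reuses that key on every retry. The names ChargeRequest, ChargeResult, IdempotencyKey and IdempotentChargeHandler are made up for this illustration and are not part of any real payment API. The dictionary stands in for what would have to be durable storage on a real server.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical request type: the client generates IdempotencyKey once
// and sends the same key again on every retry of the same charge.
public sealed record ChargeRequest(Guid IdempotencyKey, string CustomerId, decimal Amount);

public sealed record ChargeResult(Guid IdempotencyKey, decimal AmountCharged);

public sealed class IdempotentChargeHandler
{
    // Assumption: in-memory only. A real server would persist this,
    // otherwise a restart forgets which charges were already processed.
    private readonly ConcurrentDictionary<Guid, Lazy<ChargeResult>> _processed = new();
    private int _chargesExecuted;

    // Number of times money was actually moved (for demonstration only).
    public int ChargesExecuted => Volatile.Read(ref _chargesExecuted);

    public ChargeResult Charge(ChargeRequest request)
    {
        // GetOrAdd stores a single Lazy per key. Lazy<T> runs its factory
        // at most once by default, so concurrent retries share one result.
        var entry = _processed.GetOrAdd(
            request.IdempotencyKey,
            _ => new Lazy<ChargeResult>(() => ExecuteCharge(request)));
        return entry.Value;
    }

    private ChargeResult ExecuteCharge(ChargeRequest request)
    {
        // Placeholder for the real call to the payment provider.
        Interlocked.Increment(ref _chargesExecuted);
        return new ChargeResult(request.IdempotencyKey, request.Amount);
    }
}
```

With a handler like this, calling Charge 10 times with the same request executes the charge once and returns the stored result for the other nine calls, so the client can retry safely after a timeout.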

Solutions

So, if calls over a network can fail, what can we do? Well, we could automatically retry. Queuing systems are very good at this. They usually use a pattern called store and forward. They store a message locally, before forwarding it to the recipient. If the recipient is offline, the queuing system will retry sending the message. MSMQ is an example of such a queuing system.
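Here is a rough, in-memory sketch of the store and forward idea. It is not how MSMQ is implemented. StoreAndForwardSender and the tryDeliver delegate are hypothetical names, and the in-process queue stands in for the durable local storage a real queuing system would use.

```csharp
using System;
using System.Collections.Generic;

public sealed class StoreAndForwardSender
{
    // Assumption: a real system would store messages on disk so they
    // survive a crash. An in-memory queue keeps the sketch short.
    private readonly Queue<string> _outgoing = new();

    // Returns false when the recipient cannot be reached.
    private readonly Func<string, bool> _tryDeliver;

    public StoreAndForwardSender(Func<string, bool> tryDeliver)
    {
        _tryDeliver = tryDeliver ?? throw new ArgumentNullException(nameof(tryDeliver));
    }

    public int Pending => _outgoing.Count;

    // Store: returns immediately, the caller does not wait for the recipient.
    public void Send(string message) => _outgoing.Enqueue(message);

    // Forward: meant to be called periodically. It stops at the first
    // failure so that messages keep their original order.
    public int Flush()
    {
        var delivered = 0;
        while (_outgoing.Count > 0)
        {
            if (!_tryDeliver(_outgoing.Peek()))
            {
                break; // recipient offline, try again on the next Flush
            }
            _outgoing.Dequeue();
            delivered++;
        }
        return delivered;
    }
}
```

Notice that Send never reports whether the recipient got the message. That is the fire and forget behaviour in practice: the sender only knows that the message was stored locally.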

But this change will have a big impact on the design of your system. You are moving from a request/response model to fire and forget. Since you are not waiting for a response anymore, you need to change the user journeys through your system. You cannot just replace each web service call with a queue send.

Conclusion

You might say that networks are more reliable these days – and they are. But stuff happens. Hardware and software can fail – power supplies, routers, failed updates or patches, weak wireless signals, network congestion, rodents or sharks. Yes, sharks: Google is reinforcing undersea data cables with Kevlar after a series of shark bites.

And there’s also the people side. People can start DDoS attacks or they can sabotage physical equipment.

Does this mean that you need to drop your current technology stack and use a messaging system? Probably not! You need to weigh the risk of failure against the investment that you need to make. You can minimize the chance of failure by investing in infrastructure and software. In many cases, failure is an option. But you do need to consider failure when designing distributed systems.


MSMQ, NServiceBus

In the previous blog post we went over some of the MSMQ basics. In this blog post we’ll touch on how to monitor, troubleshoot and back up MSMQ.

I’ve been using NServiceBus with the MSMQ transport for a while now and have faced some issues. Unfortunately, information about MSMQ is pretty scarce and sometimes outdated. This blog post contains links to resources that I’ve found useful for understanding some of the best practices around MSMQ administration.

Monitoring

Events

MSMQ logs informational, warning and error events under the Application Log. When accessing an object fails or succeeds, it also adds an audit entry in the Security Log. All message queuing events contain the “MSMQ” text in the Source column.

If you have enabled End-to-End tracing, you can find the trace events in Applications and Services Logs/Microsoft/Windows/MSMQ/End2End.

Dead-Letter Queues

When a message expires, the queue manager puts it in one of the dead-letter queues. This ensures that messages are not lost. A message expires when one of its timers (Time-To-Reach-Queue or Time-To-Be-Received) expires. There are two dead-letter queues, one for non-transactional messages and one for transactional messages. In order to keep track of expired messages, you should monitor the dead-letter queues.

External Transactions

MSMQ can take part in external transactions. You can monitor these transactions by using the Distributed Transaction Coordinator Transaction Statistics and Transaction List views. Since internal transactions don’t go through MSDTC, these can’t be monitored using this approach.

Performance Counters

MSMQ provides performance counters that are helpful for monitoring its performance.

The MSMQ Service performance object contains global information about the Message Queuing Service:

  • Total bytes in all queues – This is very important, since you don’t want to run out of disk space
  • Total messages in all queues
  • MSMQ Incoming Messages
  • MSMQ Outgoing Messages
  • Incoming Messages/sec
  • Outgoing Messages/sec
  • Sessions – The total count of open network sessions

The MSMQ Queue performance object contains counters for each individual queue. If you want to monitor the Dead-Letter queues, you should select Computer Queues:

  • Bytes in Queue
  • Messages in Queue
  • Bytes in Journal Queue
  • Messages in Journal Queue

Since MSMQ is disk intensive, it is recommended to spread out MSMQ’s storage files over multiple disks.

Troubleshooting

The good news is that, usually, MSMQ just works. Most of the problems that you encounter are caused by either DTC or not having enough disk space.

For investigating DTC issues, you should use DTCPing. Here is a blog post documenting how to troubleshoot DTC issues.

Disk space depends on the throughput of messages that flow through your system, their size, and the amount of time it takes to recover from a failure (and start processing messages again). You don’t want to start losing messages because you can’t store them on disk. You should monitor disk space using performance counters. If there is not enough disk space, you can either increase the MSMQ computer quota or change the MSMQ storage location.
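As a back-of-the-envelope check, the three factors above can be multiplied together. The helper below is a hypothetical sketch, not an MSMQ API. It also ignores any per-message overhead that MSMQ adds on disk, so treat the result as a lower bound.

```csharp
using System;

public static class MsmqStorageEstimate
{
    // Bytes that accumulate on disk while the receiving side is down:
    // throughput x average message size x time needed to recover.
    public static long RequiredBytes(
        double messagesPerSecond,
        long averageMessageBytes,
        TimeSpan recoveryTime)
    {
        if (messagesPerSecond < 0 || averageMessageBytes < 0 || recoveryTime < TimeSpan.Zero)
        {
            throw new ArgumentException("All inputs must be non-negative.");
        }

        return (long)Math.Ceiling(
            messagesPerSecond * averageMessageBytes * recoveryTime.TotalSeconds);
    }
}
```

For example, with made-up numbers of 50 messages per second, 4 KB (4096 bytes) per message and 4 hours to recover, RequiredBytes(50, 4096, TimeSpan.FromHours(4)) gives 2,949,120,000 bytes, so you would want roughly 3 GB of free space plus a safety margin.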

If you want to find out more about MSMQ troubleshooting, check out these links:

Backup and Restore

MSMQ comes with a command line utility – mqbkup – for backup and restore. Using this tool you can back up storage files, log files, transaction files, and Registry settings. I’ve also heard that QueueExplorer is a good tool for managing MSMQ, although I haven’t tried it yet.

Conclusion

In my limited experience with MSMQ, I have run into a couple of issues. Most of them were caused by the fact that I didn’t monitor it well enough or I didn’t understand how it works under the hood. In this blog post I’ve summarized some of the best practices around MSMQ monitoring and troubleshooting. I’ll probably update this post whenever I learn something new about MSMQ.

MSMQ

Microsoft Message Queue Server (MSMQ) is message-oriented middleware that allows applications to communicate with each other using queues. In this blog post we’ll go over some of the MSMQ basics: Queues, Messages, and Transactions.

Queues

The queue is one of the basic concepts of MSMQ. It is just a container that stores messages, decoupling the sender from the receiver. MSMQ Queues are not necessarily FIFO (First In, First Out), because messages can be prioritized.
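To see why prioritization breaks strict FIFO, here is a small in-memory model. It is not MSMQ itself. PrioritizedQueue is a hypothetical class, and it assumes MSMQ’s priority range of 0 (lowest) to 7 (highest), with messages of the same priority received in the order they were sent. It uses the PriorityQueue type from .NET 6.

```csharp
using System;
using System.Collections.Generic;

public sealed class PrioritizedQueue
{
    // The key is (negated priority, arrival sequence): higher priorities
    // come out first, and the sequence number keeps arrival order among
    // messages that share a priority.
    private readonly PriorityQueue<string, (int NegatedPriority, long Sequence)> _items = new();
    private long _nextSequence;

    public int Count => _items.Count;

    public void Send(string body, int priority)
    {
        if (priority < 0 || priority > 7)
        {
            throw new ArgumentOutOfRangeException(nameof(priority));
        }
        _items.Enqueue(body, (-priority, _nextSequence++));
    }

    // Throws InvalidOperationException when the queue is empty.
    public string Receive() => _items.Dequeue();
}
```

If you send "first-normal" with priority 3, then "urgent" with priority 7, then "second-normal" with priority 3, the receiver gets "urgent" first, followed by the two normal messages in their original order.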

Queues can be transactional or nontransactional. Transactional queues can only receive messages sent within a transactional context. Nontransactional queues can only receive messages sent outside of a transactional context. Messages sent in a transactional context are processed in the order in which they were sent.

There are two categories of queues: Application Queues and System Queues.
