Major Bug or Problem with dropped email!!

BigBirdy
Posts: 133
Joined: Sun Mar 13, 2005 2:10 pm
Location: Squamish, BC

Postby BigBirdy » Thu Jul 07, 2005 5:32 pm

I have been trying to resolve an issue with AES-256 encrypted zip files not getting through the mail server. Although this is an issue I need to resolve, the bigger issue is that a user can send an email and see it in their sent box when in reality it went absolutely nowhere except into the Scalix ERROR queue. No email to the admin, no message back to the sender, nothing!!! This is a huge issue!!! EVERY failed email MUST return at least something to the user, regardless of the reason it failed (in this case, an AES-256 encrypted zip). The sxadmin user, or some delegate, should also get some sort of notice. Relying on the Scalix maintenance scripts to show which messages might be in the error queue is simply not enough. I sent support an email about this yesterday and have not heard back, but I am certain that every corporate deployment of Scalix would expect ALL failed emails to send at least a reply back to the sender and, of course, something to the admin account, or to have some better system of tracking and notifying.

Please, we need some sort of solution to this. Our office is a government agency, and when users send out sensitive, time-critical emails, they understandably assume the emails were sent successfully because they are in their sent box. When in fact they were NOT sent, users need to get some sort of response that their email failed. I have already had to deal with one question from high-level management doubting my choice of Scalix. This was incredibly embarrassing for me, not to mention the impact on their confidence in my judgement. I am hoping this issue will get some serious consideration, especially since I am less than 1 month away from having my Linux consulting company put on a presentation in 2 cities, which was also planning to promote Scalix.

Scalix is a GREAT product, but as is often the case, some things could/should be improved.

ScalixSupport
Scalix
Posts: 5503
Joined: Thu Mar 25, 2004 8:15 pm

Re: Major Bug or Problem with dropped email!!

Postby ScalixSupport » Fri Jul 08, 2005 4:51 am

Johnny,

I am confused why you are raising the issue here again.

* You have worked with Scalix support to isolate the problem to the handling of AES-encrypted ZIP files by ClamAV.
* You have been given the info you need from the ClamAV folks ( http://www.gossamer-threads.com/lists/c ... sers/21344) to upgrade to a newer version.
* You can patch ClamAV yourself by following the instructions at http://www.gossamer-threads.com/lists/c ... ?page=last .
* Should you be unable to upgrade or patch your local installation, you can edit clamd.conf to exclude archives from being scanned by commenting out:

Code:

# ClamAV can scan within archives and compressed files.
# Default: enabled
#ScanArchive

* You can change /var/opt/scalix/rules/ALL-ROUTES.VIR to VIRUS-UNCLEANED=1 ACTION=REJECT to generate a bounce instead of a discard. (Note: THIS IS NOT REALLY A GOOD IDEA - it will generate bounces for virus messages that were most likely NOT sent by the apparent sender.)
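
The clamd.conf workaround from the list above is normally a one-line edit made by hand. As a rough illustration only, here is a minimal Python sketch that comments out a directive in configuration text. The function name `comment_out_directive` is made up for this example, and the sketch works on a string rather than assuming where clamd.conf lives on your system.

```python
import re


def comment_out_directive(conf_text: str, directive: str = "ScanArchive") -> str:
    """Return conf_text with every active line for `directive` commented out.

    Lines that are already commented (start with '#') are left untouched,
    so running the function twice gives the same result as running it once.
    """
    # Match optional leading spaces/tabs, then the directive as a whole word,
    # then anything up to the end of that line.
    pattern = re.compile(
        rf"^([ \t]*)({re.escape(directive)}\b.*)$", re.MULTILINE
    )
    return pattern.sub(r"\1#\2", conf_text)


if __name__ == "__main__":
    sample = "# Default: enabled\nScanArchive\n"
    print(comment_out_directive(sample), end="")
```

After changing the real file, clamd would still need to reload its configuration before the change takes effect.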

If you look at the changelog for .86 you will notice:

- libclamav/zziplib/zzip-file.c: add method id for AES encrypted archives (thanks to David Majorel <dm*lagoon.nc>) (tk)

That is exactly what you need to do. Remember, ClamAV is NOT a Scalix product. The reason why messages are being "dropped" is not a bug in Scalix; it is a "bug" in a 3rd-party application that we process messages through. To fix this "bug", apply this patch:

--- zzip-file.c-orig Sat May 14 14:47:33 2005
+++ zzip-file.c Sat May 14 14:47:55 2005
@@ -192,10 +192,11 @@
     case 0: /* store */
     case 1: /* shrink */
     case 6: /* implode */
     case 8: /* inflate */
     case 9: /* deflate */
+    case 99: /* aes */
         break;
     default:
         cli_dbgmsg("ZzipLib: Unsupported compression mode (%d)\n", hdr->d_compr);
         err = ZZIP_UNSUPP_COMPR;
         goto error;
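
To illustrate what the one-line patch changes: each entry in a ZIP file carries a 16-bit compression method ID in its local file header, and WinZip-style AES-encrypted entries use ID 99, which is the value the patch adds to the accepted cases. The following is a minimal Python sketch, not ClamAV code; the function names and the `patched` flag are made up for illustration, and the set of accepted IDs simply mirrors the case labels shown in the hunk.

```python
import struct

LOCAL_HEADER_SIGNATURE = 0x04034B50  # bytes "PK\x03\x04" read as little-endian
AES_METHOD = 99                      # method ID used by AES-encrypted entries
UNPATCHED_METHODS = {0, 1, 6, 8, 9}  # case labels already present in the hunk


def compression_method(local_header: bytes) -> int:
    """Read the compression method ID from a ZIP local file header.

    Layout of the first 10 bytes: signature (4), version needed (2),
    general purpose flags (2), compression method (2).
    """
    signature, _version, _flags, method = struct.unpack_from("<IHHH", local_header)
    if signature != LOCAL_HEADER_SIGNATURE:
        raise ValueError("not a ZIP local file header")
    return method


def is_accepted(method: int, patched: bool) -> bool:
    """Mimic the switch statement: accepted IDs fall through to `break`."""
    accepted = UNPATCHED_METHODS | ({AES_METHOD} if patched else set())
    return method in accepted


if __name__ == "__main__":
    header = struct.pack("<IHHH", LOCAL_HEADER_SIGNATURE, 51, 1, AES_METHOD)
    method = compression_method(header)
    print(method, is_accepted(method, patched=False), is_accepted(method, patched=True))
```

Without the extra case, ID 99 lands in the `default:` branch and is reported as an unsupported compression mode, which matches the scanner failure described in this thread.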


So, you have plenty of options to fix this easily.

Cheers!

Sascha.

Postby BigBirdy » Mon Jul 11, 2005 5:21 pm

Thanks for the response, Sascha. The reason I posted this here was to raise the Scalix issue of a failed email NOT sending any sort of message back to the sender. I should have made this part clear and not embedded it within all my ClamAV questions.

I have heard back from the support email and have been informed that this issue has been logged as a bug and will hopefully be resolved in an update or the next release.

florian
Scalix
Posts: 3852
Joined: Fri Dec 24, 2004 8:16 am
Location: Frankfurt, Germany

Postby florian » Wed Jul 13, 2005 1:44 am

Big Birdy,

The notification behaviour here is absolutely by design.

The ERROR queue and SMERR queue have been designed to collect messages that cannot be routed to the recipient because of configuration and/or software problems.

The sender is not notified because an admin who accidentally misconfigures a server could otherwise trigger hundreds or thousands of notification messages to senders. That would do no good, because the problem is actually the admin's to fix.

The idea with the current structure is that the admin can rectify the problem and then resubmit the messages from the error queues, and the only thing users will have noticed is some delay in message delivery.

Your point about admin notification is taken; still, different customers prefer different ways of doing that. Also, a large number of messages coming into the ERROR queue could potentially trigger a large number of notification emails... and in the worst case, if the misconfiguration is really bad, those notifications would themselves end up in the ERROR queue and trigger more notifications... you get the point.

Scalix was built and designed to work in Enterprise environments, with the largest thinkable message volume in mind....

Therefore, if you look at our monitoring guidelines, we do recommend checking very frequently whether there is a buildup of messages on the ERROR queue and taking appropriate action. How this is done (through Nagios, email, etc.) is up to the user. We provide one command (ommon, see man page) and one script template (ommaint, see the admin_resource_kit subdirectory in our tarball) to demonstrate how scripting can be used for very effective basic monitoring. Both of these tools check the size of the queues.
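
As a rough illustration of the kind of check described above, here is a minimal Python sketch of an alert rule that fires when the ERROR queue is non-empty but rate-limits itself so that a large backlog does not produce a flood of notifications. It does not call any Scalix command: how `queue_depth` is obtained (for example from an ommon- or ommaint-style script) is left out, and the class name, threshold, and cooldown values are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueAlerter:
    """Decide when an admin should be told about a queue backlog."""

    threshold: int = 1               # alert once this many messages are queued (assumed)
    cooldown_seconds: float = 900.0  # minimum gap between two alerts (assumed)
    last_alert_at: Optional[float] = None

    def should_alert(self, queue_depth: int, now: float) -> bool:
        """Return True if an alert should be sent for this observation."""
        if queue_depth < self.threshold:
            return False
        recently_alerted = (
            self.last_alert_at is not None
            and now - self.last_alert_at < self.cooldown_seconds
        )
        if recently_alerted:
            # Still inside the cooldown window: stay quiet to avoid a storm.
            return False
        self.last_alert_at = now
        return True


if __name__ == "__main__":
    alerter = QueueAlerter()
    for depth, timestamp in [(0, 0.0), (3, 10.0), (500, 20.0), (500, 1000.0)]:
        print(depth, timestamp, alerter.should_alert(depth, timestamp))
```

The cooldown means one notification per interval regardless of how many messages pile up, which addresses the notification-storm concern while still telling someone that the queue needs attention.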

Hope this clarifies the reasons for this design.

Cheers,
Florian.
Florian von Kurnatowski, Die Harder!

Postby BigBirdy » Thu Jul 14, 2005 12:01 pm

I agree, in principle, with wanting to avoid an erroneous flood of error emails to senders caused by a misconfigured server, but this issue IS NOT the result of a misconfigured server. It occurred because an AES-256 encrypted email got stopped and sent to the error queue, with neither an administrator nor the user being automatically and clearly notified. This is not an unusual situation: an enterprise environment dealing with sensitive data, hence the encryption, and using one of the recommended 3rd-party anti-virus apps for a commercial email product. I am 100% certain that if you were to poll your existing user base, they would agree that there should never be a case where an email fails to get delivered and the end user/sender is not notified. Your response is the first time I have ever heard anything back from Scalix that caused me to doubt, or reduced my confidence in, the company.

Postby florian » Thu Jul 14, 2005 12:57 pm

Birdy,

I possibly should have used different wording, but a failure of an open-source 3rd-party component to process email correctly (which needs to be resolved at the admin level, by changing the faulty component) is basically the same as a misconfiguration: an unfortunate situation where email cannot be routed correctly and something must be done about it.

Particularly in this case, it is absolutely the right behaviour. The anti-virus engine failed for an unknown reason, so the Scalix Service Router must assume that virus scanning MAY have failed. This is a very serious situation from a security point of view. If it just continued routing the message(s), they might contain undetected or uncleaned viruses because of that error. If it bounced the message, hundreds of messages might be bounced in a very short timeframe, resulting in a lot of trouble for senders and recipients alike.

I absolutely agree with you that a messaging system MUST NEVER and UNDER NO CIRCUMSTANCES lose email without a trace; Scalix behaves correctly in that sense as well: it doesn't know what to do, therefore it queues the messages. The ERROR queue is not an ERROR log; it is a safe place (even stored on-disk) for messages that can later be correctly processed. Email messaging is asynchronous by definition, so the sender of an email message has no guarantees about the time it will take until the message reaches the destination. Therefore, queueing is absolutely appropriate.
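
The three outcomes discussed in this thread can be summarised in a small decision table. The Python sketch below is only a model of the behaviour as described in these posts, not Scalix code; the enum and function names are invented for illustration, and treating DISCARD as the default action for uncleaned viruses is an assumption based on the earlier remark that ACTION=REJECT generates a bounce "instead of a discard".

```python
from enum import Enum


class ScanResult(Enum):
    CLEAN = "clean"
    VIRUS_UNCLEANED = "virus_uncleaned"
    SCANNER_FAILED = "scanner_failed"   # e.g. the scanner could not process the file


class Action(Enum):
    DELIVER = "deliver"
    DISCARD = "discard"
    REJECT = "reject"            # bounce (NDN) back to the apparent sender
    QUEUE_ERROR = "queue_error"  # hold on the ERROR queue for later resubmission


def route(result: ScanResult, virus_action: Action = Action.DISCARD) -> Action:
    """Model of the routing decision described in the thread."""
    if virus_action not in (Action.DISCARD, Action.REJECT):
        raise ValueError("virus_action must be DISCARD or REJECT")
    if result is ScanResult.CLEAN:
        return Action.DELIVER
    if result is ScanResult.VIRUS_UNCLEANED:
        return virus_action
    # Scanner failure: neither deliver (possibly infected) nor bounce
    # (possible flood); keep the message until an admin fixes the cause.
    return Action.QUEUE_ERROR


if __name__ == "__main__":
    for outcome in ScanResult:
        print(outcome.name, "->", route(outcome).name)
```

In this model the AES-encrypted zip case falls under `SCANNER_FAILED`, which is why the message was held on the ERROR queue and no bounce was produced.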

The admin will be notified by proper server monitoring about a buildup of a message backlog on one of the monitored queues. He will then look at the system event log and messages to find out what caused the error. He will then fix the issue (by correcting the configuration or replacing the faulty 3rd-party component), restart the service router, and resubmit the messages using the omresub command. The messages will arrive intact and no data whatsoever has been lost.

All by design, tested and proven - and I firmly believe that's all correct behaviour.

I am open to your comments on why and how this should be changed; we have our quarterly meetings on a lot of topics coming up next week, so we'll be happy to have a good internal discussion on that if we're convinced that any improvement is needed here.

greets from frankfurt,
Florian.
Florian von Kurnatowski, Die Harder!

Postby BigBirdy » Wed Aug 03, 2005 5:04 pm

I certainly agree with Scalix that this issue is the result of a 3rd-party app. But my biggest concern has nothing to do with WHY or HOW a message fails. I think it would be far more useful and beneficial to users and admins alike if, REGARDLESS of the reason for an email failure (error queue, misconfiguration, bad address, lost network connection, whatever), the USER/SENDER were sent a message about the failed delivery. Also, Scalix provides little or no out-of-the-box monitoring, so I am certain that many users never even get to the point of setting up the monitoring scripts, or even the Nagios scripts. In those cases (no "extra" effort put into setting up the monitoring), users could easily be losing mail and they would never know about it. This, IMHO, is a very serious yet easily rectified issue.

So for your quarterly meetings to address/discuss features, how about some changes to ensure that for ALL failed email, regardless of the reason or where it gets placed after failing, the sxadmin user AND the sender receive an error message? My next addition would be to include some sort of alerts/error section in the SAC to set up a user to be forwarded all system alerts.

My 2 bits :) And thanks for your thorough reply

tonyn
Scalix
Posts: 12
Joined: Mon Jul 04, 2005 4:33 am

Postby tonyn » Thu Aug 04, 2005 4:46 am

I think the key thing here, which Sascha actually mentioned in an earlier reply but which may have been overlooked, is that virus-infected messages are a special case. With the majority of viruses, the sender of the message has been spoofed, so there is little point in sending an NDN back to the "sender", who may know nothing about the message that was sent. In my opinion the best action is to DISCARD these messages. But as Sascha said, if you really want to, you can set ACTION=REJECT in your /var/opt/scalix/rules/ALL-ROUTES.VIR file, and then Scalix WILL send an NDN back. However, as I said, I do not think this is what you really want.

kanderson

Scalix doesn't provide a monitoring tool?

Postby kanderson » Thu Aug 04, 2005 6:45 pm

Johnny, you can monitor this with several of the various tools Florian mentioned, but the one that wasn't mentioned is even easier.

If you look at the queue via SAC, you'll see that there are messages sitting in the error queue. This should be regularly checked, and you'd see the error there immediately. For the rare occasion that this happens, waiting a while for the message just isn't a problem.

Kev.

