Hashcash FAQ
A French translation of this FAQ is available. This article has been translated into Serbo-Croatian by Jovana Milutinovich from Geeks Education.
There are two problems with spam:
Email loss due to spam and anti-spam technology happens for a number of reasons, including:
Hashcash is a technological approach to reducing the impact of spam. Hashcash aims to make email more reliable. It is a companion technology which should be used alongside any anti-spam technology, so that the anti-spam technology does not adversely affect email reliability. Whatever anti-spam technology you are using, you want it configured so that hashcash can bypass whatever filters and blocks it puts in place, so that other hashcash users can still reliably send you mail. Similarly, as a sender you want to send hashcash to bypass such filters so that your email is as reliable as possible.
Hashcash comes in the form of plugin software for mailers which adds hashcash stamps to sent email. The hashcash plugin inserts an X-Hashcash: header into the headers section of each email the user sends. The following is an example of an email addressed to me with a hashcash stamp in the email headers:
From: Someone <test@test.invalid>
Spammers can use hashcash too. However, hashcash is bad news for spammers because the hashcash stamp takes your CPU some work to compute. To you as a normal user with an entry-level desktop or laptop class machine, the CPU overhead per mail is negligible because you don't send that many mails; at worst, on slow old hardware, your mail is delayed a few seconds before being sent. However to spammers this is a show-stopper: they want to send 10,000+ emails per minute down a DSL line bought with a stolen credit card, quickly, before the account gets cancelled.
Spammers already compromise security on many users' machines to make so-called "Zombie" armies to send spam from. However, currently the rate at which spammers can send mail on a zombie machine is limited purely by the speed of those machines' internet links. A typical DSL user might be able to send 25 unique messages per second, each of size 1KB (assumes 256kbit uplink), or many more messages per second if the messages are delivered to multiple users at once (using multiple Cc or Bcc recipients). Even a 20-bit stamp takes 1/2 second per recipient on the highest-end PC hardware at time of writing. This would slow spammers down by a factor of 10-100 or more per compromised machine (depending on whether the messages are sent individually or to many users at once).
Spammers commonly optimize the amount of spam they can send over a given link speed by delivering messages to 100s or 1000s of Bcc recipients at once, directly to an end-site or to an ISP mail-hub. In this way they can consume just 3.5KB of bandwidth in sending a message to 100 recipients, compared to the 100KB which would be used to send each message separately. This would allow a spammer to send 700 messages per second (assumes DSL with 256kbit uplink). Delivering in batches reduces the degree of customization the spammer can make, because all of the message bodies in a batch have to be the same, but it is nevertheless a trick spammers commonly use to increase the number of mails per second they can send. However with hashcash a separate stamp is required for each individual recipient, which stops this spammer trick. If the spammer has to supply a hashcash stamp for each recipient, even a 3GHz Pentium 4 can only generate 2 stamps per second, compared to 700 messages per second with no hashcash, so using hashcash in this scenario cuts the number of mails the spammer can send by a factor of 350.
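The arithmetic behind these figures can be checked with a short sketch. It assumes about 25KB per second of usable upstream bandwidth, which is the figure implied by "25 unique messages per second each of size 1KB" on a 256kbit uplink; the constant and function names are illustrative only.

```python
# Back-of-envelope check of the batching figures quoted above.
# Assumption: ~25 KB/s of usable upstream bandwidth, as implied by
# "25 unique messages per second each of size 1KB".
EFFECTIVE_KB_PER_SEC = 25.0
BATCH_SIZE_KB = 3.5           # one message delivered to 100 Bcc recipients
RECIPIENTS_PER_BATCH = 100
STAMPS_PER_SEC = 2.0          # 20-bit stamps on a 3GHz Pentium 4 (from the text)


def batched_rate(kb_per_sec: float, batch_kb: float, recipients: int) -> float:
    """Recipients reached per second when mail is sent in Bcc batches."""
    return kb_per_sec / batch_kb * recipients


rate_without_hashcash = batched_rate(
    EFFECTIVE_KB_PER_SEC, BATCH_SIZE_KB, RECIPIENTS_PER_BATCH
)
slowdown = rate_without_hashcash / STAMPS_PER_SEC

print(f"batched delivery: about {rate_without_hashcash:.0f} recipients/sec")
print(f"with one stamp per recipient: about {slowdown:.0f}x slower")
```

The results land slightly above the rounded 700 and 350 used in the text, which is consistent with the text rounding down.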
Not really. At worst your mail is delayed a few seconds before being sent. But better quality plugins will have already created the stamp while you are composing your mail, or will have speculated that you may want to reply to mail you receive and have stamps ready to go for when you do. In addition some hashcash plugins may automatically white-list, or exempt, people you communicate with frequently: for example people in your address book, or people you reply to. One example is CAMRAM, a hashcash-based system which does auto white-listing. (In case you were wondering, CAMRAM stands for CAMpaign for ReAl Mail; it's a pun on a British Ale commercial.) Auto white-lists reduce the overhead for normal users still further, because then your hashcash plugin will only be creating hashcash stamps on the first mail you send to a new contact. But it doesn't help spammers, as they are not engaging in a two-way communication -- they are spamming, which is an inherently one-way process, and few users are likely to add the spammer to their white-list.
ISPs and recipients who use anti-spam technologies such as keyword filtering, known-spammer blacklists, missing reverse-DNS checks, etc. are starting to use hashcash as an anti-spam exemption mechanism. Your mail has a form of postage on it -- the hashcash stamp -- and sails through anti-spam check-points. This helps reliability because the spam-detectors are busy and error-prone and frequently block lots of non-spam.
Hashcash is supported in SpamAssassin as of version 2.70. SpamAssassin is a popular user and ISP anti-spam tool that has added hashcash support. SpamAssassin supports keyword filtering (and other techniques) to weed out spam. If you look in your mail headers and find X-Spam-Checker-Version: SpamAssassin, then mail you receive is being examined by SpamAssassin. Hashcash is also supported by TMDA and CAMRAM. This means that by sending hashcash on your mails you can virtually eliminate your chances of getting a false positive, and hence of the mail you send not getting delivered, or getting delivered into a junk folder, where the receiving ISP or user is using SpamAssassin, TMDA or CAMRAM. The number of hashcash-supporting systems is growing. If you are interested in adding hashcash postage support to an anti-spam system, contact me, Adam Back <adam@cypherspace.org>, and I'll do what I can to help you.
SpamAssassin is quite widely used by ISPs. If you look in your mail headers and find X-Spam-Checker-Version: SpamAssassin, then mail you receive is being examined by SpamAssassin. Even if your ISP does not use SpamAssassin, consider: spam is growing in volume at a very high rate. Estimates vary, but are in the range of 10% per month! At that rate email reliability and usefulness will degrade fast. Anti-spam technology is likely to be stepped up in attempts to stem the tide of spam -- and this will make the false positive rate worse, making your email ever less reliable. Already many ISPs report that in excess of 50% of email throughput is spam. Hashcash is something practical that can be done to avert disaster. But like anything else it has a momentum that has to be built: the more users, the more demand there is for anti-spam systems to support hashcash postage; similarly, the more anti-spam systems that support hashcash postage, the more value there is to a user in using a hashcash plugin to increase reliability. Be an early adopter, participate in the solution.
Well, that is a question for each user. The trade-off is that if you start bouncing mail without hashcash you may not receive mail that you wish to receive. For general use one should be patient, and I figure that point is 10 years or more out, if it ever comes. There are other more militant view-points however: people who have become so sick of spam, and who receive so few unsolicited mails that they would want to read, that they're willing to go straight there. There are things one can put in the bounce message which allow the sender to compute hashcash stamps. For example there is a Java applet which allows anyone with a web browser to compute stamps. Also the CAMRAM interim approach is to have alternate means to become white-listed, just by replying to emails.
There are many problems in math where it is much easier to verify the solution than to compute the solution. A simple one is computing square roots. It is more complex, and it takes a computer longer, to compute a square root than to verify it. Recall that verifying a square root is just multiplication: y = sqrt(y) x sqrt(y).
No, but it's not far from the truth. In fact Dwork and Naor seriously proposed using square roots as a proof of work function in their 1992 paper on the topic. To use their square-root approach, you'd have to use big numbers -- 1000s of digits long -- because computers are insanely fast at computing square roots on normal sized numbers. What hashcash actually uses are things called partial hash-collisions. Partial hash-collisions are significantly faster to verify and simpler to program than square-roots of big numbers. They are also smaller (which makes them nicer to put in email headers and work with). Hashcash is also non-interactive, which is a useful property for email use, where you don't want to wait for the recipient's auto-responder to bounce your email with the number to take the square root of. With hashcash you, the sender, can choose the string to compute partial hash-collisions on, so no interaction is required. (Technical note: the square-root approach has a non-interactive variant proposed by the author involving a hash-function and taking cube-roots instead.)
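The square-root asymmetry mentioned above can be illustrated with a minimal sketch. It is not Dwork and Naor's actual scheme; it only shows that checking a claimed root is a single multiplication, while producing one requires an integer square-root computation. The number used here is an arbitrary perfect square, a few thousand digits long, built for the demonstration.

```python
import math


def verify_square_root(y: int, root: int) -> bool:
    """Verification is one multiplication: does root * root equal y?"""
    return root * root == y


# An arbitrary perfect square with thousands of digits (illustrative only).
secret_root = 3 ** 5000 + 12345
y = secret_root * secret_root

# The "work": compute the integer square root of the large number.
computed_root = math.isqrt(y)

# The cheap check a recipient would perform.
print(verify_square_root(y, computed_root))
```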
A hash function is a cryptographic function for which it is supposed to be hard to find two inputs that produce the same output. Common hash functions are MD5 and SHA1. (Hashcash uses the SHA1 hash function.) Cryptographic hash functions such as SHA1 are designed to be collision resistant. This means it is supposed to be very hard to find SHA1(x) == SHA1(y) where x != y. For SHA1 it is expected that it would take around 2^160 tries of different y values until the same output was obtained as for a given x value. (Technical note: this latter problem is called 2nd-preimage resistance, because you start with a given pre-image x and try to find another pre-image y. A regular hash collision would be where you try to find two arbitrary x and y values that give the same output. Arbitrary collisions are a lot easier to find: around 2^80 operations, due to a principle known as the birthday-paradox.)
As computing a full hash-collision is computationally infeasible -- there isn't enough compute power on the planet to create one in the next 100 years -- we'd like to simplify the problem. A simple way to do that is to accept a partial-collision. That is, where a full-collision would require all bits of SHA1(x) to match SHA1(y), a k-bit partial collision is one where only the k most-significant bits match. If we take the 16 most significant bits for example, a 16-bit partial hash-collision becomes very much more practical. In fact my workstation (an ageing 400MHz PII) can compute one in about 1/3 of a second. (Technical note: strictly this is a partial 2nd-preimage, because we start with a given x and try to find a 2nd-preimage such that the outputs match in the 16 most significant bits.)
Basically on the recipient's email address. In practice there are a few other details. What hashcash actually does is look for collisions on strings such as: 0:030626:adam@cypherspace.org:6470e06d773e05a8 where you can see a date (030626 = 2003 Jun 26th) and an email address (mine, adam@cypherspace.org). The first field (the 0:) is the stamp version number, and is fixed to 0 for now. The last field -- the string of random letters -- is just some garbage so we can find a collision. (We have to try lots of different strings, approximately 2^16 for a 16-bit collision.)
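A k-bit partial 2nd-preimage search as described above can be sketched in a few lines. This is only an illustration using Python's standard hashlib; the function names and the form of the candidate strings are made up for the example and are not part of hashcash.

```python
import hashlib
from itertools import count


def top_bits(data: bytes, k: int) -> int:
    """Return the k most-significant bits of SHA1(data) as an integer."""
    digest = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return digest >> (160 - k)


def find_partial_second_preimage(x: bytes, k: int) -> bytes:
    """Search for y != x whose SHA1 matches SHA1(x) in the top k bits.

    Expected work is about 2**k hash evaluations.
    """
    target = top_bits(x, k)
    for counter in count():
        y = b"candidate-%d" % counter   # hypothetical candidate format
        if y != x and top_bits(y, k) == target:
            return y


x = b"some given pre-image"
y = find_partial_second_preimage(x, 16)
print(hashlib.sha1(x).hexdigest())
print(hashlib.sha1(y).hexdigest())
```

The two printed digests agree in their first four hex digits (16 bits) and differ, with overwhelming probability, somewhere after that.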
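The field layout just described can be made concrete with a small parser. This is a sketch for the version 0 layout shown above only; the class and field names are illustrative and not taken from any hashcash implementation.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Stamp:
    version: int      # stamp version number (0 in the example)
    created: date     # creation date, written as YYMMDD in the stamp
    resource: str     # the recipient's email address
    random: str       # the random string varied to find a collision


def parse_stamp(text: str) -> Stamp:
    """Split a version 0 stamp into its four colon-separated fields."""
    version, yymmdd, rest = text.split(":", 2)
    # Split the random field off from the right, so the address is kept whole.
    resource, random = rest.rsplit(":", 1)
    created = datetime.strptime(yymmdd, "%y%m%d").date()
    return Stamp(int(version), created, resource, random)


stamp = parse_stamp("0:030626:adam@cypherspace.org:6470e06d773e05a8")
print(stamp)
```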
That is one of the neat things about hashcash. It is defined using SHA1, so if you have a SHA1 implementation handy, you can try it out. Hash the above stamp (with no trailing newline):
echo -n 0:030626:adam@cypherspace.org:6470e06d773e05a8 | sha1
The first 8 hex digits of the resulting hash are 0. I didn't explain this above, but hashcash tries to find a collision with the all 0 string. So the above stamp is a 32-bit collision. It's an impressively big collision which took my 400MHz PII about 7 hours to compute. But for normal email you would use stamps in the range of 16 - 20 bits (a fraction of a second to a few seconds on most hardware).
Well, you don't need to follow the math of cryptographic hash functions to understand how it works. The square-root example given earlier is a fine analogy for how it works. The sender can compute something related to the recipient's email address (the square-root of it in the analogy), and the recipient can verify it (by squaring it in the analogy). The recipient knows the sender created this stamp just for him (not for someone else) because the answer (the square root) is of the recipient's address. And it doesn't cost the recipient much to verify stamps.
You could even do exactly that. The only reason hashcash doesn't is because it's more efficient to use partial hash-collisions, though the effect is exactly the same.
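The same check can be done without a command-line sha1 tool. The sketch below counts the leading zero bits of a stamp's SHA1 hash, and mints a fresh stamp by trying random last fields until enough leading bits are zero. It follows the version 0 layout shown in this FAQ; the function names and the choice of 8 random bytes for the last field are assumptions made for the example.

```python
import hashlib
import os


def leading_zero_bits(stamp: str) -> int:
    """Number of leading zero bits in SHA1(stamp), i.e. the collision size."""
    digest = int.from_bytes(hashlib.sha1(stamp.encode("ascii")).digest(), "big")
    return 160 - digest.bit_length()


def mint(resource: str, yymmdd: str, bits: int) -> str:
    """Try random last fields until the hash starts with `bits` zero bits.

    Expected work is about 2**bits hash evaluations.
    """
    prefix = f"0:{yymmdd}:{resource}:"
    while True:
        candidate = prefix + os.urandom(8).hex()   # assumed random-field format
        if leading_zero_bits(candidate) >= bits:
            return candidate


# Collision size of the example stamp quoted in the text.
print(leading_zero_bits("0:030626:adam@cypherspace.org:6470e06d773e05a8"))

# Mint and check a fresh 16-bit stamp.
fresh = mint("adam@cypherspace.org", "030626", 16)
print(fresh, leading_zero_bits(fresh))
```

Verification costs one hash regardless of the stamp's size, while minting cost doubles with every extra bit, which is the asymmetry hashcash relies on.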
No, because stamps are only valid for one recipient. Stamps are a bit like a check: there is an identified recipient. If a stamp is minted for joe@foo.com, then all recipients other than joe@foo.com will reject the stamp because it is not minted for them.
No, because the stamp is computed on the destination email address. If the email address is changed, the stamp verification will fail. There is no way to change the email address in an existing stamp without computing a fresh stamp from scratch on the new email address.
No, because stamps are only valid for one use. Each recipient keeps a double-spend database to enforce this rule; if a message with an already spent stamp is received, it is rejected.
No, because the recipient only needs to keep currently valid stamps; expired stamps can be removed from the recipient's double-spend database. Each stamp includes a creation date, and expiry is measured relative to that.
No, because the recipient will reject expired stamps if they are re-used after expiry based on their old creation date.
No, because the stamp is computed on the creation date also. If the creation date is changed, the stamp verification will fail. There is no way to change the creation date in an existing stamp without computing a fresh stamp from scratch on the new creation date.
No, because the cost would be prohibitive. Hashcash does not store stamps in the double-spend database unless they are valid and have sufficient value. So it costs the sender significantly more to create a valid stamp than it costs you to store it. After the stamps expire they will be removed from your double-spend database, so the storage is reclaimed. Also the cost of storing the mail will be significantly larger than the cost of storing the compact hashcash stamp.
No, because stamps with creation dates in the future are rejected as invalid and not stored in the double-spend database. The concern here is that a stamp with a fake creation date very far in the future would not be expired by the hashcash client for a long time; rejecting futuristic stamps avoids this issue.
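The double-spend rules described in the answers above (one use per stamp, expiry relative to the creation date, rejection of future-dated stamps, and purging of expired entries) can be sketched as a small in-memory class. The 28-day validity period is an assumption chosen for the example; this FAQ does not state one. The class and method names are hypothetical, and the sketch assumes the caller has already checked the stamp's collision size and recipient address.

```python
from datetime import date, timedelta

VALIDITY = timedelta(days=28)   # assumed validity period, not from the FAQ


class DoubleSpendDB:
    """In-memory record of stamps that have already been accepted."""

    def __init__(self) -> None:
        self._spent = {}   # stamp text -> creation date

    def accept(self, stamp: str, created: date, today: date) -> bool:
        """Return True and record the stamp if it may be spent today."""
        if created > today:
            return False               # future-dated: rejected, never stored
        if today - created > VALIDITY:
            return False               # expired: rejected
        if stamp in self._spent:
            return False               # already spent once
        self._spent[stamp] = created
        return True

    def purge(self, today: date) -> None:
        """Drop expired stamps so the storage is reclaimed."""
        self._spent = {
            s: c for s, c in self._spent.items() if today - c <= VALIDITY
        }


db = DoubleSpendDB()
day = date(2003, 6, 26)
stamp = "0:030626:adam@cypherspace.org:6470e06d773e05a8"
print(db.accept(stamp, day, day))   # first use
print(db.accept(stamp, day, day))   # replay of the same stamp
```

Because an expired stamp is rejected on its date alone, purging it from the database does not reopen the door to replaying it.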
If this happened it would be a problem, as the second use of the stamp would be rejected as invalid. However hashcash is designed so that it is exceedingly unlikely that this would ever happen. The probability of it happening is similar to the probability of winning the national lottery every week for weeks in a row.
Hashcash is very efficient to verify. Each stamp takes about 2 microseconds to verify on a 1GHz machine. To put it another way, the same single machine could verify stamps faster than you could deliver emails over an OC12 (a really fast, expensive link, nominally about 622Mbit/sec). If someone is sending you mails that fast, your bottleneck will be your TCP stack, mail server and operating system. Verifying hashcash for users will not noticeably increase mail server load, because verifying hashcash stamps has much lower overhead than the many other operations that go into accepting delivery of an email.
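The throughput claim above can be checked with rough numbers. The sketch assumes 1KB messages (the size used earlier in this FAQ) and takes a generous 1Gbit/sec as the link rate, which is an upper bound on the link mentioned above; both are assumptions for illustration.

```python
# Rough comparison of stamp verification rate against link delivery rate.
VERIFY_SECONDS_PER_STAMP = 2e-6   # from the text: about 2 microseconds at 1GHz
LINK_BITS_PER_SEC = 1e9           # assumed generous upper bound on the link rate
MESSAGE_BYTES = 1024              # assumed 1KB per message

stamps_per_sec = 1 / VERIFY_SECONDS_PER_STAMP
messages_per_sec = LINK_BITS_PER_SEC / 8 / MESSAGE_BYTES

print(f"stamps verified per second:  {stamps_per_sec:,.0f}")
print(f"messages delivered per second: {messages_per_sec:,.0f}")
print(f"headroom: about {stamps_per_sec / messages_per_sec:.1f}x")
```

Even with these generous assumptions for the link, verification keeps ahead of delivery by a factor of several.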
There should be one X-Hashcash: header per recipient. Each recipient looks for a header that is addressed to him and verifies it.
To preserve the privacy of Bcc recipients and the existing Bcc semantics (that other recipients do not know there are Bcc recipients), each Bcc'd email should be delivered separately. However Bcc: is falling into disuse due to spammers. Spammers like to use Bcc because it doesn't look as obvious as a mail with 100 or 1000 Cc: recipients. As a result some people have started just deleting email which is not To: or Cc: to them. (Actually the author is also guilty of this because it was easy and effective for a while.)
Sure. You just have to tell your hashcash plugin what addresses you wish to accept mail as. For example, say you have two addresses, foo@pobox.com and foo@isp.com, and your pobox.com address is forwarded to your isp.com address, where you pick your mail up from. Then you just tell your hashcash plugin that you receive mail as foo@isp.com, plus the alternate email address foo@pobox.com.
The mailing list server should not create hashcash postage for each recipient; that really would overload it. When sending mail to a mailing list, hashcash clients will consider the mailing-list address as the recipient. In fact they will do this for free, because a mailing-list address just looks like an ordinary email address as far as a mail client is aware. Then users who sign up to a mailing list have to accept mail from the mailing-list address. When you join a mailing list and set up the mail filters in your mail client, similarly you are instructing the mail client (and its hashcash plugin) that you are willing to receive email from that address. So a mailing list, as far as hashcash is concerned, is just another alternate email address that you are willing to receive mail as.
Yes. Here's how it works: consider a spammer who subscribes to a mailing list to which users are posting messages with hashcash postage.
If the spammer is quick he can receive a hashcash postage stamp before some other users and re-use it to spam those users without paying the cost of creating the stamp. With some mailing lists you can discover the subscriber address list just by asking the list server. But in any case posters necessarily expose their addresses.
However this is a problem with mailing lists, not with hashcash. Hashcash is intended to be verified by the (single) recipient. The recipient is the mailing list server. Clearly any hashcash postage stamps left on by the mailing list server can be subjected to the above race condition attack.
The lack of mailing list authentication is an existing problem independent of hashcash. Let's say there is a mailing list that has a moderator, or a poster-only rule, or some such rule. Now a spammer can forge a message as having come from the mailing list address, and all of the recipients will preferentially process it into the folder the user has set up for that list. I haven't seen this in the wild, but I expect it may already be happening; if not, it is only because there may be easier attacks on too many mailing lists for the attacker to bother. (No spam moderator, no poster-only restrictions, etc.)
There are other approaches which have been used to authenticate mailing list traffic. For example there is software to have the mailing list server PGP sign the messages it sends.
A hashcash-specific approach (avoiding signatures) would be for the hashcash postage stamp to also include a hash of the message body. This prevents someone exploiting a race condition by taking and pre-spending the stamp, and it also prevents race conditions being used as a denial-of-service. (When the race condition is exploited, those users who get the spam first will never see the real message; their client will consider it double-spent.) However including a message body hash is problematic because of MTA transformations.
It is surprisingly difficult to reliably send exact body contents without transformations changing them slightly. (Blank lines, encoding, etc.) Similar challenges are faced by digital signature systems such as PGP and S/MIME, which apply text canonicalization rules and MIME encoding respectively to protect the email. See also the section on USENET.
They might. Well, in fact they already are. You can see the attraction: they get a for-free open-relay -- the mail server -- you send it mail and it sends mail to the thousands of users who subscribe to the list in question. Even with hashcash the spammer gets more bang for his buck: he computes one stamp and gets to deliver to many recipients. There are different things that can and have been done to combat this. (Again, these anti-spam approaches cause mailing-list related email loss for users.)
There is also a collaborative filtering system called NoCeMs, though this requires client software, or at minimum delivery delays while the NoCeMs are accumulated.
Yes, in theory it does. In practice the problem is typically not that significant, because the would-be spammer usually has less control over which emails he receives, and typically they will be sent to far fewer people and so contain far fewer addresses that can have delivery raced. Another approach to defend against this for email (where there is a big and potentially untrusted list of recipients) is to use Bcc for delivery. With Bcc each mail gets delivered separately, so each recipient only sees the stamp addressed to himself. However Bcc is sometimes less reliably read due to historical spammer abuse of the Bcc semantics. Another approach would be to use Bcc-like delivery (separate delivery for each message) while retaining Cc headers. This ensures that each recipient only receives the hashcash postage stamp addressed to himself. Also the same approach as with mailing lists (of including the message body hash) would work in this context. Generally however, excessively long Cc lists each with hashcash would be less common, as there is an associated CPU cost which starts to add up if the list is in the 100s or 1000s. In this case the sender is exhibiting spam-like characteristics which hashcash is penalizing. The sender would ideally need to have the recipients opt in or treat him as a mailing list, so that hashcash is not required for delivery.
If no authentication is used, white-lists could be abused. Here's how it would work: white-listed users are users who don't require hashcash postage from each other. If a spammer could capture your address-book or white-list, he could forge mail to your circle of friends pretending to be you and not have to pay postage. The fix for this problem is to use authentication even though the users are white-listed.
I mentioned earlier in the FAQ that CAMRAM is one hashcash-based system which offers auto white-list functionality. The way CAMRAM authenticates its white-lists is that hashcash is used to introduce yourself to other users, but once that's done a signature is used. This prevents white-list abuse. Actually, for short-term deployability CAMRAM also introduces alternate introduction methods, and in those cases there is no signature, so its white-lists probably would be vulnerable to the white-list abuse scenario in theory. However at this point in time this is a 2nd order effect that is unlikely to be attacked.
Mail2news gateways are email addresses that allow you to post to USENET. Their main function is for use with anonymous remailers, as there are some remailers which can only deliver to email addresses, and yet the remailer users would like to post to USENET. This is a different case than USENET posts. There may be additional or different hashcash requirements imposed by a given mail2news gateway just to throttle abuse of its services. Michael Shinn and Alex de Joode are experimenting with hashcash postage for mail2news gateways. Michael Shinn is also experimenting with pseudonymous account-based posting allowances.
As with mail2news gateways, individual remailers may require hashcash to throttle abuse of their services. I believe there was software written to support this, and if I recall there was an experimental remailer that supported hashcash for delivery. However the practice is not in general use. It might however be useful, where email is the transport protocol used between remailer hops, for the sender to be able to provide hashcash for each hop to increase reliability of the delivery. Email reliability problems are suspected to be a major reliability issue for type I and II remailers; the problem is exacerbated as the sender never gets to see the bounce messages when things go wrong further down the chain.
But a better idea still is to not use email as the transport between remailers (as the lost bounce message problem is systemic to the usage pattern). Mixminion (which is also called a type III remailer) uses by default an interactive SSL connection over TCP. As well as reliability this provides forward-secrecy, as a forward-secret ciphersuite can be used.