Ending Spam [Paperback]

3.70/5 (60 ratings by Goodreads)

Jonathan A. Zdziarski

Format: Paperback / softback, 312 pages, height x width: 234 x 178 mm
Publication date: 07-May-2005
Publisher: No Starch Press, US
ISBN-10: 1593270526
ISBN-13: 9781593270520

Other books on this topic:

Email: consumer/user guides

Paperback
Price: €55.89*
* This book is out of print, but we will let you know the price of a used copy.
Permanent link: https://www.kriso.lt/db/9781593270520.html

Explains how spam works, how network administrators can implement spam filters, and how programmers can develop new, remarkably accurate filters using language classification and machine learning. Original. (Advanced) Zdziarski, who maintains a spam filter that can achieve accuracy of up to 99.985 percent, leads the charge against what has become a significant challenge to both productivity and sanity. He explains how spam was spawned, its anatomy and physiognomy, how it crawled into our pristine PCs, and how, as responsible and gleeful citizens, we can torch it. He covers how the early spam wars were fought and the lessons learned, language classification concepts and statistical filtering fundamentals, and advanced concepts such as testing theory, concept identification, Markovian discrimination, intelligent feature set reduction, and collaborative algorithms. The appendix presents what Zdziarski terms shining examples of filtering. Annotation ©2005 Book News, Inc., Portland, OR (booknews.com)

Fascinating reading for any geek, this landmark title describes in depth how statistical filtering is being used by next-generation spam filters to identify and filter spam. Join author John Zdziarski for a look inside the brilliant minds that have conceived clever new ways to fight spam in all its nefarious forms. The book explains how spam filtering works and how language classification and machine learning combine to produce remarkably accurate spam filters. After reading Ending Spam, you'll have a complete understanding of the mathematical approaches used by today's spam filters, as well as decoding, tokenization, various algorithms (including Bayesian analysis and Markovian discrimination), and the benefits of using open-source solutions to end spam.
Zdziarski interviewed the creators of many of the best spam filters and has included their insights in this revealing examination of the anti-spam crusade. If you're a programmer designing a new spam filter, a network admin implementing a spam-filtering solution, or just someone who's curious about how spam filters work and the tactics spammers use to evade them, Ending Spam will serve as an informative analysis of the war against spammers.

Contents:
Introduction
PART I: An Introduction to Spam Filtering
Chapter 1: The History of Spam
Chapter 2: Historical Approaches to Fighting Spam
Chapter 3: Language Classification Concepts
Chapter 4: Statistical Filtering Fundamentals
PART II: Fundamentals of Statistical Filtering
Chapter 5: Decoding: Uncombobulating Messages
Chapter 6: Tokenization: The Building Blocks of Spam
Chapter 7: The Low-Down Dirty Tricks of Spammers
Chapter 8: Data Storage for a Zillion Records
Chapter 9: Scaling in Large Environments
PART III: Advanced Concepts of Statistical Filtering
Chapter 10: Testing Theory
Chapter 11: Concept Identification: Advanced Tokenization
Chapter 12: Fifth-Order Markovian Discrimination
Chapter 13: Intelligent Feature Set Reduction
Chapter 14: Collaborative Algorithms
Appendix: Shining Examples of Filtering
Index

INTRODUCTION

PART I AN INTRODUCTION TO SPAM FILTERING

1 THE HISTORY OF SPAM
The Definition of Spam
The Very First Spam
Spam: The Early Years
Jay-Jay's College Fund
The Jesus Spam
Canter & Siegel
Cancelmoose
Jeff Slaton, the "Spam King"
"Krazy" Kevin Lipsitz
Stanford Wallace, Cyber Promotions
Floodgate: The First Spamware
Other Significant Events in 1995
War Waged on Spam
Spamhaus
Unsolicited Commercial Email
Spam Out of Control
1998, 1999, and 2000: Three Years of War on Spam
Network Solutions
2001 to the Present: Exponential Spam Growth
Final Thoughts

2 HISTORICAL APPROACHES TO FIGHTING SPAM
Primitive Language Analysis
Blacklisting
Propagation and Maintenance Problems
Heuristic Filtering
Brightmail
SpamAssassin
Drawbacks to Heuristic Filtering
Maintenance Headaches
Scoring
Whitelisting
A Little Too Effective
Forgeries
Challenge/Response
Problems with Challenge/Response
Throttling
TarProxy
Other Throttling Tools
Collaborative Filtering
Address Obfuscation
New Standards
Authenticated SMTP
Sender Policy Framework
Litigation
Spammer Fingerprinting
Intellectual Property
Final Thoughts

3 LANGUAGE CLASSIFICATION CONCEPTS
Understanding Accuracy
Machine Learning
Concept Learning
Using Language Classification to Fight Spam
Training
Statistical Filtering and Bayesian Analysis
Components of a Language Classifier
The Historical Dataset
The Tokenizer
The Analysis Engine
Providing Feedback
Training
Train-Everything (TEFT)
Train-on-Error (TOE)
Train-Until-Mature (TUM)
Train-Until-No-Errors (TUNE)
When to Train
An Example of a Filter Instance
Step 1: Tokenize the Message
Step 2: Build a Decision Matrix
Step 3: Evaluate the Decision Matrix
Step 4: Train the Message
Step 5: Correct Errors
Efficacy of Statistical Filtering
The Future of Language Classification
The Sovereignty of Statistical Filtering
Final Thoughts

4 STATISTICAL FILTERING FUNDAMENTALS
An Imperfect Solution
Building a Historical Dataset
Corpus Feeding
Starting from Scratch
Correcting Errors
The Tokenizer and Calculating Token Values
Single-Corpus Tokens
A Biased Filter
Hapaxes
Final Product
The Analysis Engine
Sorting
Statistical Combination
Bayesian Combination (Paul Graham)
Bayesian Combination (Brian Burton)
Robinson's Geometric Mean Test
Fisher-Robinson's Inverse Chi-Square
Improvements to Statistical Analysis
Improving the Decision Matrix
Improvements to Tokenization
Statistical Sedation
Iterative Training
Learning New Tricks
Final Thoughts

PART II FUNDAMENTALS OF STATISTICAL FILTERING

5 DECODING: UNCOMBOBULATING MESSAGES
Introduction to Encoding
Decoding
Message Body Encodings
Quoted-Printable Encoding
Base64 Encoding
Custom Encodings
Message Header Encodings
HTML Encodings
Message Actualization
Supporting Software
Final Thoughts

6 TOKENIZATION: THE BUILDING BLOCKS OF SPAM
Tokenizing a Heuristic Function
Basic Delimiters
Redundancy
Other Delimiters
Exceptions
Token Reassembly
Degeneration
Header Optimizations
URL Optimizations
HTML Tokenization
Word Pairs
Sparse Binary Polynomial Hashing
Internationalization
Final Thoughts

7 THE LOW-DOWN DIRTY TRICKS OF SPAMMERS
Successful Filtering
No More Headaches
A Weak Link in Statistical Filters?
Attacks on Tokenizers
Encoding Abuses
Header Encodings
Hypertextus Interruptus
ASCII Spam
Text-Splitting
Table-Based Obfuscation
URL Encodings
Symbolic Text
Just Plain Dumb
Attacks on the Dataset
Mailing List Attacks
Bayesian Poisoning
Empty but Not Empty Probes
Attacks on the Decision Matrix
Image Spams
Random Strings of Text
Word Salad
Directed Attacks
Final Thoughts

8 DATA STORAGE FOR A ZILLION RECORDS
Storage Considerations
Disk Space
Speed
Locking
Portability
Statefulness
Recovery
I/O Contention
Random-Access Features
Ease of Use
Storage Framework
Third-Party Storage Solutions
Stateless Database Implementations
Stateful SQL-Based Solutions
Peter Graf's PBL ISAM Library
SQLite
Proprietary Implementations
Final Thoughts

9 SCALING IN LARGE ENVIRONMENTS
Requirements Assessment
Total Disk Space Requirements
Total Processing Power
Parallelization versus Serialization
Operating System Requirements
High Availability
I/O Bandwidth Requirements
Features
End-User Support
Sizing Machine Capacity
General Resource Planning
Assessing Resource Utilization
Building a Distributed Model
Round-Robin Distributed Networking
Distributed BGP Networking
Final Thoughts

PART III ADVANCED CONCEPTS OF STATISTICAL FILTERING

10 TESTING THEORY
The Challenge of Testing
Message Continuity
Archive Window
Purge Simulation
Interleave
Corrective Training Delay
Types of Simulations
Measuring the Accuracy of a Specific Filter
Test Criteria
Performing the Test
Measuring Adaptation in Chaotic Environments
Test Criteria
Performing the Test
Testing the Effectiveness of Multiple Filters
Test Criteria
Performing the Test
Comparing Features in a Single Filter
Test Criteria
Performing the Test
Testing Caveats
Corrective Training
Purge Simulations
Test Messages
Presuppositions
Final Thoughts

11 CONCEPT IDENTIFICATION: ADVANCED TOKENIZATION
Chained Tokens
Case Study Analysis
Pattern Identification
Differentiation
HTML Classification
Contextual Analysis
Other Uses
Administrative Concerns
Supporting Data
Summary
Sparse Binary Polynomial Hashing
Supporting Data
Summary
Karnaugh Mapping
Final Thoughts

12 FIFTH-ORDER MARKOVIAN DISCRIMINATION
Markov's Great Advance
Hidden Markov Models (HMMs)
Using Markov Models to Model Text
Classic Bayesian Spam Filter
Bayesian versus Markovian Classification
Storage Concerns
Purging Old Data
Floating-Point Renormalization and Underflow
Final Thoughts

13 INTELLIGENT FEATURE SET REDUCTION
Calibration Algorithms
Bayesian Noise Reduction (BNR)
Instantiation Phase
Training Phase
Dubbing Phase
Examples
End Result
Efficacy
Final Thoughts

14 COLLABORATIVE ALGORITHMS
Message Inoculation
Supporting Data
External Inoculation
Classification Groups
Collaborative Neural Meshes
Neural Declustering
Machine-Automated Blacklists
Streamlined Blackhole List
Weighted Private Block List
Distributed Attacks
Filters That Fight Back
Fingerprinting
Probing
Automatic Whitelisting
URL Blacklisting
Minefields
Final Thoughts

APPENDIX SHINING EXAMPLES OF FILTERING
POPFile: The POP3 Proxy
About POPFile
Accuracy
Interview with the Author
SpamProbe: A Modified Approach
About SpamProbe
Accuracy
Interview with the Author
TarProxy: IANA Spam Filter
About TarProxy
Accuracy
Interview with the Author
DSPAM: A Large-Scale Filter
About DSPAM
Accuracy
Interview with the Author
The CRM114 Discriminator
About CRM114
Under the Hood
Accuracy
Interview with the Author

INDEX

Jonathan A. Zdziarski has been fighting spam for eight years and has spent a significant portion of the past two years working on DSPAM, a next-generation spam filter with accuracy of up to 99.985%. Zdziarski lectures widely on the topic of spam.
