In the right use case, bloom filters look like magic. That's a bold statement, but in this tutorial we'll explore this strange data structure, how to best use it, and some practical examples using Redis and Node.js.
The Bloom filter is a probabilistic, one-way data structure. The word "filter" can be confusing in this context; it suggests an active thing, a verb, but it may be easier to think of it as storage, a noun. With a simple Bloom filter you can do two things:
- Add an item.
- Check whether an item has possibly been added before.
These are important limitations to understand: you cannot delete items, you cannot list the items in a Bloom filter, and you cannot determine with certainty whether an item has been added in the past. This is where the probabilistic nature of Bloom filters comes into play: false positives are possible, but false negatives are not. If the filter is sized correctly, the chance of a false positive can be made very small.
Variants of Bloom filters exist that add capabilities such as removal or scaling, but they also add complexity and limitations. It is important to understand the simple Bloom filter first, so this article covers only simple Bloom filters.
In exchange for these limits, you get several benefits: fixed size, hash-based obscurity, and fast lookups.
When you set up a Bloom filter, you must specify a size for it. This size is fixed: whether the filter holds one item or one billion items, it will never grow beyond the specified size. As you add more items, the likelihood of false positives increases; a smaller filter's false positive rate climbs faster than a larger one's.
Bloom filters are built on the concept of one-way hashing. Much like correctly stored passwords, Bloom filters use a hashing algorithm to derive an identifier for each item passed in. A hash is effectively irreversible and is represented by a seemingly random string of characters. So if someone gains access to a Bloom filter, it does not directly reveal anything about its contents.
Finally, Bloom filters are fast. Checking the filter involves far fewer operations than other approaches, and since it sits comfortably in memory, it can spare you performance-impacting trips to the database.
Now that you understand the limitations and advantages of Bloom filters, let's look at some situations where they can be used.
Setup
We will illustrate Bloom filters using Redis and Node.js. Redis is a good storage medium for Bloom filters: it's fast, it's in-memory, and it has a couple of specific commands (GETBIT, SETBIT) that make the implementation efficient. I assume you have Node.js, npm, and Redis installed on your system. Your Redis server should be running on the default port on localhost for the examples to work properly.
In this tutorial, we will not implement a filter from scratch. Instead, we'll focus on a practical use of a pre-built module on npm: bloom-redis. bloom-redis has a concise set of methods: add, contains, and clear.
As mentioned before, Bloom filters require a hashing algorithm to generate an item's identifier. bloom-redis uses the well-known MD5 algorithm, which works fine even though it is not ideal for Bloom filters (a bit slow, a bit of overkill).
Unique usernames
Usernames, especially those that identify a user in a URL, need to be unique. If you build an application that allows users to change their username, you probably want usernames that have never been used before, to avoid confusion and username-squatting attacks.
Without a Bloom filter, you would need to reference a table containing every username ever used, which can be prohibitively expensive at scale. Bloom filters let you add an item each time a user claims a new name. When a user checks whether a username is taken, all you need to do is check the Bloom filter. It can tell you with absolute certainty whether the requested username has never been added before. The filter may incorrectly report that a username has been taken when in fact it has not, but this errs on the side of caution and causes no real harm (other than that a user may not be able to claim "k3w1d00d47").
To illustrate this, let's build a quick REST server with Express. First, create a package.json file, then run the following terminal commands:
npm install bloom-redis --save
npm install express --save
npm install redis --save
The default size option for bloom-redis is 2 MB. That errs on the side of caution, but it's quite large. Setting the size of your Bloom filter is critical: too large and you waste memory, too small and the false positive rate climbs too high. The math involved in determining the size is involved and beyond the scope of this tutorial, but luckily there are Bloom filter size calculators that do the job without your having to crack open a textbook.
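Those calculators implement the standard sizing formulas, which are easy to check yourself. The sketch below computes the number of bits m and hash functions k for n items at false positive probability p; with n = 100,000 and p = 1.0E-6 it reproduces the size (2875518) and numHashes (20) used in the example server.

```javascript
// Standard Bloom filter sizing formulas (what the online calculators compute):
//   m = ceil( -n * ln(p) / (ln 2)^2 )   bits in the filter
//   k = round( (m / n) * ln 2 )         number of hash functions
function bloomSize(n, p) {
  const m = Math.ceil((-n * Math.log(p)) / (Math.LN2 ** 2));
  const k = Math.round((m / n) * Math.LN2);
  return { bits: m, hashes: k };
}

// 100,000 usernames at a one-in-a-million false positive rate:
console.log(bloomSize(100000, 1e-6)); // { bits: 2875518, hashes: 20 }
```

Note the trade-off is entirely between n and p: halving p (stricter accuracy) costs only a modest number of extra bits, while doubling n doubles the filter size.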
Now, create app.js as follows:
var Bloom = require('bloom-redis'),
    express = require('express'),
    redis = require('redis'),
    app,
    client,
    filter;

//setup our Express server
app = express();

//create the connection to Redis
client = redis.createClient();

filter = new Bloom.BloomFilter({
  client    : client, //make sure the Bloom module uses our newly created connection to Redis
  key       : 'username-bloom-filter', //the Redis key

  //calculated size of the Bloom filter.
  //This is where your size / probability trade-offs are made
  //http://hur.st/bloomfilter?n=100000&p=1.0E-6
  size      : 2875518, // ~350kb
  numHashes : 20
});

app.get('/check', function(req, res, next) {
  //check to make sure the query string has 'username'
  if (typeof req.query.username === 'undefined') {
    //skip this route, go to the next one - will result in a 404 / not found
    next('route');
  } else {
    filter.contains(
      req.query.username, // the username from the query string
      function(err, result) {
        if (err) {
          next(err); //if an error is encountered, send it to the client
        } else {
          res.send({
            username : req.query.username,
            //if the result is false, then we know the item has *not* been used
            //if the result is true, then we can assume that the item has been used
            status   : result ? 'used' : 'free'
          });
        }
      }
    );
  }
});

app.get('/save', function(req, res, next) {
  if (typeof req.query.username === 'undefined') {
    next('route');
  } else {
    //first, we need to make sure that it's not yet in the filter
    filter.contains(req.query.username, function(err, result) {
      if (err) {
        next(err);
      } else {
        if (result) {
          //true result means it already exists, so tell the user
          res.send({
            username : req.query.username,
            status   : 'not-created'
          });
        } else {
          //we'll add the username passed in the query string to the filter
          filter.add(
            req.query.username,
            function(err) {
              //the callback arguments to `add` provide no useful information,
              //so we'll just check to make sure that no error was passed
              if (err) {
                next(err);
              } else {
                res.send({
                  username : req.query.username,
                  status   : 'created'
                });
              }
            }
          );
        }
      }
    });
  }
});

app.listen(8010);
To run this server: node app.js. Point your browser at http://localhost:8010/check?username=kyle. The response should be: {"username":"kyle","status":"free"}.
Now, let's save that username by pointing your browser at http://localhost:8010/save?username=kyle. The response will be: {"username":"kyle","status":"created"}. If you return to http://localhost:8010/check?username=kyle, the response will be {"username":"kyle","status":"used"}. Similarly, returning to http://localhost:8010/save?username=kyle will result in {"username":"kyle","status":"not-created"}.
From the terminal, you can see the size of your filter with: redis-cli strlen username-bloom-filter. Right now, with one item, it should read 338622.
Now, go ahead and try adding more usernames with the /save route. You can try as many as you want.
If you check the size again, you may notice it has grown slightly, but not with every addition. Curious, right? Internally, the Bloom filter sets individual bits (1s and 0s) at different positions in the string stored at username-bloom-filter. These positions are not contiguous, though, so if you set a bit at index 0 and then a bit at index 10,000, everything in between remains 0. For practical purposes, it isn't important to understand the precise mechanics of every operation at first; just know that this behavior is normal and that your storage in Redis will never exceed the size you specified.
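You can see this sparse-growth behavior without touching Redis. The sketch below mimics SETBIT's growth semantics with a Node Buffer (an approximation for illustration, not Redis's actual internals): the byte string only grows far enough to hold the highest bit offset ever set, and everything in between stays zero.

```javascript
// Mimic Redis SETBIT growth semantics: the string grows only to the byte
// containing the highest offset set so far, zero-padded in between.
function setbit(buf, offset, value) {
  const byteLen = (offset >> 3) + 1;
  if (buf.length < byteLen) {
    buf = Buffer.concat([buf, Buffer.alloc(byteLen - buf.length)]); // grow, zero-filled
  }
  if (value) buf[offset >> 3] |= 0x80 >> (offset & 7); // Redis numbers bits MSB-first
  return buf;
}

let bits = Buffer.alloc(0);
bits = setbit(bits, 0, 1);     // string is now 1 byte long
bits = setbit(bits, 10000, 1); // jumps to 1251 bytes -- everything between is zero
console.log(bits.length); // 1251
```

This is why strlen jumps in uneven steps: an addition only enlarges the string when one of its hashed bit positions lands beyond the current highest byte.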
Fresh content
Fresh content keeps users coming back to a website, so how do you show a user something new on every visit? Using a traditional database approach, you would add a new row to a table containing the user identifier and the story identifier, then query that table when deciding what to display. As you might imagine, that table will grow very quickly, especially as your users and content grow.
In this case, the consequence of a false positive (i.e., occasionally not showing a piece of unseen content) is very small, which makes Bloom filters a viable option. At first glance, you might think that each user needs their own Bloom filter, but instead we'll concatenate a user identifier with a content identifier and insert that combined string into the filter. This way we can use a single filter for all users.
In this example, let's build another basic Express server that displays content. Each time you visit the route /show-content/any-username (where any-username is any URL-safe value), a new piece of content will be displayed, until the site runs out of content for that user. In the example, the content is the opening line of each of the top ten Project Gutenberg books.
We need to install another npm module. Run from terminal:
npm install async --save
Your new app.js file starts like this:
var async = require('async'),
    Bloom = require('bloom-redis'),
    express = require('express'),
    redis = require('redis'),
    app,
    client,
    filter,
    // From Project Gutenberg - opening lines of the top 10 public domain ebooks
    // https://www.gutenberg.org/browse/scores/top
    openingLines = {
      'pride-and-prejudice' :
        'It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.',
      'alices-adventures-in-wonderland' :
        'Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it'
    };
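The heart of this example is the concatenation scheme described above. Here is a minimal sketch of the selection logic, with a plain JavaScript Set standing in for the Bloom filter; in the real server, bloom-redis's contains and add would replace the Set operations (driven through async callbacks), and nextUnseen is a hypothetical helper invented here for illustration.

```javascript
// Find the first piece of content this user has not seen, then mark it seen.
// `seen` stands in for the Bloom filter; `contentIds` are the openingLines keys.
function nextUnseen(username, contentIds, seen) {
  for (const id of contentIds) {
    const key = username + ':' + id; // one filter serves all users via concatenation
    if (!seen.has(key)) {
      seen.add(key); // record that this user has now seen this item
      return id;
    }
  }
  return null; // the user has seen everything
}

const ids = ['pride-and-prejudice', 'alices-adventures-in-wonderland'];
const seen = new Set();
console.log(nextUnseen('kyle', ids, seen)); // 'pride-and-prejudice'
console.log(nextUnseen('kyle', ids, seen)); // 'alices-adventures-in-wonderland'
console.log(nextUnseen('kyle', ids, seen)); // null
```

Because the keys are namespaced per user, a second user starts from the beginning of the list with the very same filter.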
If you pay careful attention to the round-trip times in your development tools, you will notice that the more times you request the path with the same username, the longer it takes. Checking the filter takes a fixed amount of time per item, but here we are checking for the presence of more and more items: Bloom filters are limited in what they can tell you, so you have to test the presence of each item in turn. In our example that is fairly trivial, but testing hundreds of items would be inefficient.
Stale data
In this example, we will build a small Express server that does two things: accepts new data via POST, and displays the current data (on a GET request). When new data is POSTed to the server, the application checks whether it already exists in the filter. If it doesn't exist, we add it to a set in Redis; otherwise we return null. A GET request fetches it from Redis and sends it to the client.
Unlike the first two situations, false positives cannot be tolerated here. Instead, we will use the Bloom filter as a first line of defense. Given the properties of Bloom filters, we can only be sure that something is not in the filter, in which case we can safely let the data through. When the Bloom filter reports that the data might be in the filter, we check against the actual data source.
So, what do we gain? We gain the speed of not having to check the actual source every time. In situations where the data source is slow (an external API, a pokey database, the middle of a flat file), that speed boost is really needed. To demonstrate it, we'll add a realistic 150 ms delay to the example. We'll also use console.time / console.timeEnd to log the difference between a Bloom filter check and a non-Bloom-filter check.
In this example, we'll also use an absurdly limited number of bits: just 1024. It will fill up quickly. As it fills, it will return more and more false positives, and you'll see the response times increase as the false positive rate climbs.
This server uses the same modules as before, so set up your app.js file like this:
var async = require('async'),
    Bloom = require('bloom-redis'),
    bodyParser = require('body-parser'),
    express = require('express'),
    redis = require('redis'),
    app,
    client,
    filter,
    currentDataKey = 'current-data',
    usedDataKey = 'used-data';

app = express();
client = redis.createClient();

filter = new Bloom.BloomFilter({
  client : client,
  key    : 'stale-bloom-filter',
  //for illustration purposes, this is a super small filter. It should fill up
  //at around 500 items, so for a production load you'd need something much larger!
  size      : 1024,
  numHashes : 20
});

app.post('/', bodyParser.text(), function(req, res, next) {
  var used;

  console.log('POST -', req.body); //log the current data being posted
  console.time('post'); //start measuring the time it takes to complete our filter and conditional verification process

  //async.series is used to manage multiple asynchronous function calls
  async.series([
    function(cb) {
      filter.contains(req.body, function(err, filterStatus) {
        if (err) {
          cb(err);
        } else {
          used = filterStatus;
          cb(err);
        }
      });
    },
    function(cb) {
      if (used === false) {
        //Bloom filters have no false negatives, so we need no further verification
        cb(null);
      } else {
        //it *may* be in the filter, so we need to do a follow-up check
        //for the purposes of the tutorial, we'll add a 150ms delay here, since Redis
        //can be fast enough to make the difference hard to measure and the delay
        //will simulate a slow database or API call
        setTimeout(function() {
          console.log('possible false positive');
          client.sismember(usedDataKey, req.body, function(err, membership) {
            if (err) {
              cb(err);
            } else {
              //sismember returns 0 if a member is not part of the set and 1 if it is.
              //This transforms those results into booleans for consistent logic comparison
              used = membership === 0 ? false : true;
              cb(err);
            }
          });
        }, 150);
      }
    },
    function(cb) {
      if (used === false) {
        console.log('Adding to filter');
        filter.add(req.body, cb);
      } else {
        console.log('Skipped filter addition, [false] positive');
        cb(null);
      }
    },
    function(cb) {
      if (used === false) {
        client.multi()
          .set(currentDataKey, req.body) //fresh data is set for easy access at the 'current-data' key
          .sadd(usedDataKey, req.body)   //and added to a set for easy verification later
          .exec(cb);
      } else {
        cb(null);
      }
    }
  ], function(err) {
    if (err) {
      next(err);
    } else {
      console.timeEnd('post'); //logs the amount of time since the console.time call above
      res.send({ saved : !used }); //whether the item was saved: true for fresh data, false for stale data
    }
  });
});

app.get('/', function(req, res, next) {
  //just return the fresh data
  client.get(currentDataKey, function(err, data) {
    if (err) {
      next(err);
    } else {
      res.send(data);
    }
  });
});

app.listen(8012);
Since POSTing to the server from a browser can be tricky, let's use curl to test it:
curl --data "your data goes here" --header "Content-Type: text/plain" http://localhost:8012/
A quick bash script can show what filling up the entire filter looks like:
#!/bin/bash
for i in `seq 1 500`;
do
  curl --data "data $i" --header "Content-Type: text/plain" http://localhost:8012/
done
Watching a filling (or full) filter is interesting. Since this one is so small, you can easily inspect it with redis-cli. By running redis-cli get stale-bloom-filter from the terminal between additions, you will see the individual bytes increase. A full filter will be \xff at every byte. At that point, the filter will always return positive.
Conclusion
Bloom filters are not a cure-all, but in the right situation, a Bloom filter can provide a fast, efficient complement to other data structures.
The above is the detailed content of Explore the power of Bloom Filters using Node.js and Redis.
