
GeoIP working in Nginx without caching the DB to RAM?

Discussion in 'Server Configuration and Hosting' started by HittingSmoke, Feb 5, 2014.

  1. HittingSmoke

    HittingSmoke Active Member

    I can't seem to find a solution to this. mod_geoip2 reads the GeoIP database from disk by default. You can set it to cache the database in RAM, but that's not the default. Unfortunately I can't find a way to reproduce this with any non-Apache server config. I can add GeoIP support to Nginx or uWSGI, but both only offer the option of caching the database in RAM, inflating the process size many times over. I can add it to PHP, but then it would only be available in PHP apps, not any other language running through uWSGI on my server.

    Is there a solution for creating stack-wide GeoIP support with Nginx without caching the entire GeoIP database in RAM for what amounts to less than half a percent of operations on my server?
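
    For reference, this is roughly the mod_geoip2 behaviour I'm talking about (flag names as I understand them from MaxMind's docs, so verify against your module version):
    Code:
    # Apache sketch: "Standard" reads the database from disk on every
    # lookup, "MemoryCache" loads the whole file into RAM at startup
    GeoIPEnable On
    GeoIPDBFile /usr/share/GeoIP/GeoIP.dat Standard
    #GeoIPDBFile /usr/share/GeoIP/GeoIP.dat MemoryCache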
     
  2. Floren

    Floren Well-Known Member

    Have you tried the legacy version 1.6.0?
    https://www.axivo.com/packages/geoip.43/

    I use it with Nginx.
    Code:
    # free -m
                 total       used       free     shared    buffers     cached
    Mem:          7770        824       6945          0         91        235
    -/+ buffers/cache:        497       7272
    Swap:         8191          0       8191
    
    # smem -kU nginx
      PID User     Command                         Swap      USS      PSS      RSS 
    20932 nginx    nginx: cache manager proces        0   564.0K     4.3M    22.6M 
    20928 nginx    nginx: worker process              0     2.3M     6.0M    25.5M 
    20930 nginx    nginx: worker process              0     2.4M     6.1M    25.6M 
    20929 nginx    nginx: worker process              0     2.5M     6.3M    25.8M 
    20931 nginx    nginx: worker process              0     2.9M     6.5M    25.8M 
    
    # smem -kU mysql
      PID User     Command                         Swap      USS      PSS      RSS 
     2237 mysql    /usr/libexec/mysqld --based        0   210.9M   211.0M   212.6M 
    
    # ls -lah /var/lib/GeoIP/
    total 42M
    drwxr-xr-x.  2 root root 4.0K Feb  5 19:16 .
    drwxr-xr-x. 23 root root 4.0K Dec 30 18:26 ..
    -rw-r--r--.  1 root root 3.4M Feb  3 14:39 GeoIPASNum.dat
    -rw-r--r--.  1 root root 3.6M Mar 18  2013 GeoIPASNumv6.dat
    -rw-r--r--.  1 root root  17M Feb  5 10:55 GeoIPCity.dat
    -rw-r--r--.  1 root root  17M Feb  5 10:49 GeoIPCityv6.dat
    -rw-r--r--.  1 root root 598K Feb  5 10:59 GeoIP.dat
    -rw-r--r--.  1 root root 1.1M Feb  5 10:59 GeoIPv6.dat
    As you can see, the memory usage is minimal compared to MySQL.
    I use geoip-update to update the databases once a week:
    https://www.axivo.com/packages/geoip-update.45/
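
    If you prefer a plain cron entry, something along these lines does the same job (geoipupdate is MaxMind's own updater binary; adjust the path and schedule to whatever tool you actually install):
    Code:
    # /etc/cron.d/geoip -- refresh the MaxMind databases every Sunday at 04:00
    0 4 * * 0 root /usr/bin/geoipupdate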

    Nginx configuration:
    Code:
    geoip_country                   /var/lib/GeoIP/GeoIP.dat;
    geoip_city                      /var/lib/GeoIP/GeoIPCity.dat;
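    The module then exposes variables like $geoip_country_code and $geoip_city, which you can hand off to your application; for example, with PHP-FPM (the parameter names below are just what I'd pick, nothing mandated):
    Code:
    # the app then sees $_SERVER['GEOIP_COUNTRY_CODE'] / ['GEOIP_CITY']
    fastcgi_param  GEOIP_COUNTRY_CODE  $geoip_country_code;
    fastcgi_param  GEOIP_CITY          $geoip_city;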
    I also use php-pecl-geoip to perform various checks directly in PHP.
    Code:
    # yum --enablerepo=axivo list php-pecl-geoip
    Loaded plugins: fastestmirror
    Loading mirror speeds from cached hostfile
    * base: www.cubiculestudio.com
    * extras: centos.mirror.netelligent.ca
    * updates: centos.mirror.iweb.ca
    Installed Packages
    php-pecl-geoip.x86_64        1.0.8-1.el6         @axivo
    
    # yum info php-pecl-geoip
    Loaded plugins: fastestmirror
    Loading mirror speeds from cached hostfile
    * base: www.cubiculestudio.com
    * extras: centos.mirror.netelligent.ca
    * updates: centos.mirror.iweb.ca
    Installed Packages
    Name        : php-pecl-geoip
    Arch        : x86_64
    Version     : 1.0.8
    Release     : 1.el6
    Size        : 69 k
    Repo        : installed
    From repo   : axivo
    Summary     : Extension for mapping IP addresses to geographic places
    URL         : http://pecl.php.net/package/geoip
    License     : PHP
    Description : php-pecl-geoip allows you to find the location of an IP address City, State,
                : Country, Longitude, Latitude, and other information as all, such as ISP and
                : connection type. It makes use of Maxminds geoip database.
     
    Last edited: Feb 6, 2014
  3. HittingSmoke

    HittingSmoke Active Member

    That appears to be the GeoIP library, not the module. The library used will have no effect on Nginx's RAM usage.

    You're caching the GeoIP database in RAM. Your 25MB per worker is a huge amount of RAM for Nginx. Nginx processes are measured in KB with the default modules, not MB. It's highly inefficient unless you're using GeoIP as a primary, user-facing portion of your app. I only ever use GeoIP for analytics, and I don't care if my country map loads a second slower because it's pulling the database from disk each time. I don't want RAM used up full-time for something I check once every week or two.
     
    jeffwidman likes this.
  4. Floren

    Floren Well-Known Member

    The country database will add 0.8MB per worker, not 25. The city database will add 19MB. I don't see a total of 80MB allocated to 4 workers as exaggerated at all for Nginx usage. I use more than that on OPcache. Note that on my server, MySQL uses 200MB and the whole system 800MB, out of 8GB of RAM.

    With both country and city databases enabled:
    Code:
    # smem -kU nginx
      PID User     Command                         Swap      USS      PSS      RSS
    31915 nginx    nginx: cache loader process        0   352.0K     3.2M    20.9M
    31914 nginx    nginx: cache manager proces        0   380.0K     3.2M    20.9M
    31909 nginx    nginx: worker process              0     1.7M     4.5M    22.2M
    31910 nginx    nginx: worker process              0     1.7M     4.5M    22.2M
    31911 nginx    nginx: worker process              0     1.7M     4.5M    22.2M
    31913 nginx    nginx: worker process              0     1.7M     4.5M    22.2M
    With country only database enabled:
    Code:
    # smem -kU nginx
      PID User     Command                         Swap      USS      PSS      RSS
    31851 nginx    nginx: cache loader process        0   348.0K   857.0K     4.4M
    31850 nginx    nginx: cache manager proces        0   376.0K   885.0K     4.5M
    31845 nginx    nginx: worker process              0     1.7M     2.2M     5.8M
    31846 nginx    nginx: worker process              0     1.7M     2.2M     5.8M
    31847 nginx    nginx: worker process              0     1.7M     2.2M     5.8M
    31848 nginx    nginx: worker process              0     1.7M     2.2M     5.8M
    Without GeoIP databases enabled:
    Code:
    # smem -kU nginx
      PID User     Command                         Swap      USS      PSS      RSS
    31820 nginx    nginx: cache loader process        0   344.0K   767.0K     3.9M
    31819 nginx    nginx: cache manager proces        0   372.0K   795.0K     3.9M
    31814 nginx    nginx: worker process              0     1.7M     2.1M     5.2M
    31815 nginx    nginx: worker process              0     1.7M     2.1M     5.2M
    31816 nginx    nginx: worker process              0     1.7M     2.1M     5.2M
    31818 nginx    nginx: worker process              0     1.7M     2.1M     5.2M
    I mean, memory is the cheapest component of a server. I use several ram disks for various directories, which helps speed things up overall. Plus, in your case, you only use the country database, so the memory usage is 4MB.
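
    For the ram disks, I just mean plain tmpfs mounts along these lines in /etc/fstab (the path and size are only an example, pick whatever fits your setup):
    Code:
    # keep a hot cache directory in RAM, capped at 256MB
    tmpfs  /var/cache/nginx  tmpfs  defaults,size=256m  0 0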
     
    Last edited: Feb 8, 2014
  5. HittingSmoke

    HittingSmoke Active Member

    You're comparing opcache and database size to GeoIP database size, but the two are completely incomparable for reasons I've already pointed out. Opcode caching speeds up literally all PHP execution on the server. The database is a major component of any interactive app. The GeoIP database is used in extremely niche circumstances, and unless your web app uses it for a primary function there's absolutely zero reason to have it cached in RAM, quadrupling your worker size. I'm not sure where you read that I'm only using the country database. I use the country, city, and business databases.

    Anyway, I never asked for a debate on what you think an appropriate process size is. The fact is it's highly inefficient, and there are use cases which are not your own. Which is why I asked if there's a way to have Nginx read the database from disk instead of caching it in RAM.
     
  6. p4guru

    p4guru Well-Known Member

    If your concern is the needless increase in average Nginx worker size for non-GeoIP use cases, why not set up load balancing of some kind and direct GeoIP traffic to a specific Nginx backend compiled with GeoIP, and everything else to a backend compiled without it?
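
    Roughly something along these lines on the front proxy (the ports and the /stats/ path are made up, just to show the idea):
    Code:
    # only the analytics path that actually needs GeoIP goes to the
    # geoip-enabled backend; everything else hits the lean build
    upstream backend_geoip { server 127.0.0.1:8081; }
    upstream backend_plain { server 127.0.0.1:8080; }

    server {
        listen 80;

        location /stats/ {
            proxy_pass http://backend_geoip;
        }

        location / {
            proxy_pass http://backend_plain;
        }
    }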
     
  7. HittingSmoke

    HittingSmoke Active Member

    Because that's a ridiculously convoluted solution to a fairly simple problem, which is that mod_geoip2 reads from disk and Nginx's GeoIP module won't. It's possible to do, and all I wanted to know is if there was a solution for Nginx that supported the same functionality.
     
  8. p4guru

    p4guru Well-Known Member

    HittingSmoke likes this.
  9. HittingSmoke

    HittingSmoke Active Member

    Yep. I've asked on the official Nginx forums, Serverfault, reddit, etc. Looks like the next step is looking through the source to see if I can create a simple patch for it myself or posting a bounty for a feature request.
     
  10. HittingSmoke

    HittingSmoke Active Member

    A comment from the uWSGI dev led me in the right direction on getting this working in Nginx.

    In src/http/modules/ngx_http_geoip_module.c in the Nginx source, do a find/replace of _MEMORY_CACHE with _STANDARD. Recompile with the http_geoip_module enabled and Nginx will no longer cache the database in RAM on start.
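
    In shell terms it amounts to something like this (the full constant names are from libGeoIP as I understand it, and your configure options will obviously differ):
    Code:
    # switch the libGeoIP open mode from in-RAM caching to plain disk reads
    sed -i 's/GEOIP_MEMORY_CACHE/GEOIP_STANDARD/g' \
        src/http/modules/ngx_http_geoip_module.c

    # then rebuild with the GeoIP module enabled
    ./configure --with-http_geoip_module
    make && make install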

    Nginx is now reading my GeoIP database from disk.

    Shouldn't be too hard to create a patch to make this a config option.
     
    jeffwidman and p4guru like this.
  11. p4guru

    p4guru Well-Known Member

    sweet !
     
  12. Floren

    Floren Well-Known Member

    @HittingSmoke, nice find, and no reply so far to your post.
    http://forum.nginx.org/read.php?11,246982

    I'm still on the fence about the workers' memory usage. I don't see what the disadvantage is of having the workers store the data in memory, as memory consumption is not an issue for me. Are you saying the memory could be better allocated to other resources, since you rarely perform a lookup against the MaxMind databases (and therefore have no need for a quick response)? Thanks for your thoughts.
     
  13. HittingSmoke

    HittingSmoke Active Member

    Exactly. Take two scenarios: a shared host you can compile on, or a cheap VPS. In both situations you have very limited RAM, especially on the VPS, where OS overhead eats into your RAM allotment.

    A compiled Nginx server with very basic modules is going to run ~700KB per worker. For the sake of taking the moderate ground let's throw the Pagespeed module in the mix to bring each worker to ~4MB. Now let's say I run Piwik analytics and I want location mapping on my activity charts.

    In this instance the GeoIP lookups are only going to be called by my analytics suite. On each page load my analytics suite records data and writes it to the database. The GeoIP lookup is not done on the database write; it simply writes the IP entry for each visit. When I load up my analytics software and check my location map chart, the IP addresses are translated via the MaxMind database.

    I check my analytics maybe twice per week. I don't need my analytics software to be extremely responsive as it's not a user-facing service. The only part of my analytics software I need to be responsive is the JavaScript loading. Database writes are done after the DOM is loaded, so they do not affect page load times.

    Let's say I run 8 Nginx workers with ngx_pagespeed. At 4MB per worker that's ~32MB; at 24MB per worker it's nearly 200MB. That's a massive increase for the benefit of a service I'm going to use in a way that doesn't benefit my users at all. That's RAM I could allocate to my InnoDB buffer pool or my opcode cache. It's a complete waste to have it in RAM unless I'm running on a dedicated server with a medium size or smaller database.

    These are the nuances of server RAM allocation and optimization. I run a very tight server stack, and I do it with a 0.36 second average page load time on my XenForo site with a heavy theme and lots of image-heavy threads. Multiplying my Nginx worker size by six cuts into where I can make other optimizations to keep my page load times to the absolute minimum. I'm a min/maxer when it comes to my server stack, and while GeoIP is useful, it's not six-times-the-memory-usage useful.

    I hope I explained that clearly. I hammered it out a bit quickly between jobs, so I might have gotten a little rambly.
     
    jeffwidman likes this.
