GeoIP working in Nginx without caching the DB to RAM?

HittingSmoke

Active member
I can't seem to find a solution to this. mod_geoip2 by default reads the GeoIP database from disk. You can set it to cache the database in RAM, but that's not the default. Unfortunately, I can't find a way to reproduce this using any non-Apache server config. I can add GeoIP support to Nginx or uWSGI, but both only offer the option of caching the database in RAM, inflating the process size many times over. I can add it to PHP, but then it would only be available in PHP apps, not any other language running through uWSGI on my server.

Is there a solution for creating stack-wide GeoIP support using Nginx without caching the entire GeoIP database to RAM for what amounts to less than half a percent of operations on my server?
 
Have you tried the legacy version 1.6.0?
https://www.axivo.com/packages/geoip.43/

I use it with Nginx.
Code:
# free -m
             total       used       free     shared    buffers     cached
Mem:          7770        824       6945          0         91        235
-/+ buffers/cache:        497       7272
Swap:         8191          0       8191

# smem -kU nginx
  PID User     Command                         Swap      USS      PSS      RSS 
20932 nginx    nginx: cache manager proces        0   564.0K     4.3M    22.6M 
20928 nginx    nginx: worker process              0     2.3M     6.0M    25.5M 
20930 nginx    nginx: worker process              0     2.4M     6.1M    25.6M 
20929 nginx    nginx: worker process              0     2.5M     6.3M    25.8M 
20931 nginx    nginx: worker process              0     2.9M     6.5M    25.8M 

# smem -kU mysql
  PID User     Command                         Swap      USS      PSS      RSS 
 2237 mysql    /usr/libexec/mysqld --based        0   210.9M   211.0M   212.6M 

# ls -lah /var/lib/GeoIP/
total 42M
drwxr-xr-x.  2 root root 4.0K Feb  5 19:16 .
drwxr-xr-x. 23 root root 4.0K Dec 30 18:26 ..
-rw-r--r--.  1 root root 3.4M Feb  3 14:39 GeoIPASNum.dat
-rw-r--r--.  1 root root 3.6M Mar 18  2013 GeoIPASNumv6.dat
-rw-r--r--.  1 root root  17M Feb  5 10:55 GeoIPCity.dat
-rw-r--r--.  1 root root  17M Feb  5 10:49 GeoIPCityv6.dat
-rw-r--r--.  1 root root 598K Feb  5 10:59 GeoIP.dat
-rw-r--r--.  1 root root 1.1M Feb  5 10:59 GeoIPv6.dat
As you can see, the memory usage is minimal compared to MySQL.
I use geoip-update to update the databases once a week:
https://www.axivo.com/packages/geoip-update.45/

Nginx configuration:
Code:
geoip_country                   /var/lib/GeoIP/GeoIP.dat;
geoip_city                      /var/lib/GeoIP/GeoIPCity.dat;
I also use php-pecl-geoip to perform various checks directly in PHP.
Code:
# yum --enablerepo=axivo list php-pecl-geoip
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: www.cubiculestudio.com
* extras: centos.mirror.netelligent.ca
* updates: centos.mirror.iweb.ca
Installed Packages
php-pecl-geoip.x86_64        1.0.8-1.el6         @axivo

# yum info php-pecl-geoip
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: www.cubiculestudio.com
* extras: centos.mirror.netelligent.ca
* updates: centos.mirror.iweb.ca
Installed Packages
Name        : php-pecl-geoip
Arch        : x86_64
Version     : 1.0.8
Release     : 1.el6
Size        : 69 k
Repo        : installed
From repo   : axivo
Summary     : Extension for mapping IP addresses to geographic places
URL         : http://pecl.php.net/package/geoip
License     : PHP
Description : php-pecl-geoip allows you to find the location of an IP address City, State,
            : Country, Longitude, Latitude, and other information as all, such as ISP and
            : connection type. It makes use of Maxminds geoip database.
 
That appears to be the GeoIP library, not the module. The library used will have no effect on Nginx's RAM usage.

You're caching the GeoIP database to RAM. Your 25MB per worker is a huge amount of RAM for Nginx. Nginx processes are measured in KB with default modules, not MB. It's highly inefficient unless you're using GeoIP as a primary, user-facing portion of your app. I only ever use GeoIP for analytics, and I don't care if my country map loads a second slower because it's pulling the database from disk each time. I don't want that using up RAM full-time for something I check once every week or two.
 
The country database will add 0.8MB per worker, not 25. The city database will add 19MB. I don't see the total of 80MB allocated to 4 workers as excessive at all for Nginx. I use more than that on OPCache. Note that on my server, MySQL uses 200MB and the whole system 800MB, out of 8GB of RAM.

With both country and city databases enabled:
Code:
# smem -kU nginx
  PID User     Command                         Swap      USS      PSS      RSS
31915 nginx    nginx: cache loader process        0   352.0K     3.2M    20.9M
31914 nginx    nginx: cache manager proces        0   380.0K     3.2M    20.9M
31909 nginx    nginx: worker process              0     1.7M     4.5M    22.2M
31910 nginx    nginx: worker process              0     1.7M     4.5M    22.2M
31911 nginx    nginx: worker process              0     1.7M     4.5M    22.2M
31913 nginx    nginx: worker process              0     1.7M     4.5M    22.2M
With country only database enabled:
Code:
# smem -kU nginx
  PID User     Command                         Swap      USS      PSS      RSS
31851 nginx    nginx: cache loader process        0   348.0K   857.0K     4.4M
31850 nginx    nginx: cache manager proces        0   376.0K   885.0K     4.5M
31845 nginx    nginx: worker process              0     1.7M     2.2M     5.8M
31846 nginx    nginx: worker process              0     1.7M     2.2M     5.8M
31847 nginx    nginx: worker process              0     1.7M     2.2M     5.8M
31848 nginx    nginx: worker process              0     1.7M     2.2M     5.8M
Without GeoIP databases enabled:
Code:
# smem -kU nginx
  PID User     Command                         Swap      USS      PSS      RSS
31820 nginx    nginx: cache loader process        0   344.0K   767.0K     3.9M
31819 nginx    nginx: cache manager proces        0   372.0K   795.0K     3.9M
31814 nginx    nginx: worker process              0     1.7M     2.1M     5.2M
31815 nginx    nginx: worker process              0     1.7M     2.1M     5.2M
31816 nginx    nginx: worker process              0     1.7M     2.1M     5.2M
31818 nginx    nginx: worker process              0     1.7M     2.1M     5.2M
I mean, memory is the cheapest component for a server. I use several RAM disks for various directories, which helps speed things up overall. Plus, in your case, you use only the country database, so the memory usage is 4MB.
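To compare the listings above as totals rather than per-process lines, the size columns can be summed with awk. This is only a sketch: `sum_mb` is a hypothetical helper, not part of smem, and the sample is the "country and city databases enabled" listing pasted in as text. Keep in mind that RSS counts shared pages once for every process that maps them, while PSS splits them between the processes, so the two totals answer different questions.

```shell
#!/bin/sh
# Sample input: the "country and city databases enabled" listing from the post.
sample='  PID User     Command                         Swap      USS      PSS      RSS
31915 nginx    nginx: cache loader process        0   352.0K     3.2M    20.9M
31914 nginx    nginx: cache manager proces        0   380.0K     3.2M    20.9M
31909 nginx    nginx: worker process              0     1.7M     4.5M    22.2M
31910 nginx    nginx: worker process              0     1.7M     4.5M    22.2M
31911 nginx    nginx: worker process              0     1.7M     4.5M    22.2M
31913 nginx    nginx: worker process              0     1.7M     4.5M    22.2M'

# Hypothetical helper: sum one size column of `smem -k` style output, in MB.
# $1 is the column offset counted from the right: 0 = RSS, 1 = PSS, 2 = USS.
# Only the K and M suffixes seen in the sample are handled.
sum_mb() {
    LC_ALL=C awk -v off="$1" 'NR > 1 && NF > 3 {
        v = $(NF - off)                      # e.g. "4.5M" or "352.0K"
        unit = substr(v, length(v))          # trailing unit letter
        n = substr(v, 1, length(v) - 1) + 0  # numeric part
        if (unit == "K") n /= 1024           # convert KB to MB
        total += n
    } END { printf "%.1f\n", total }'
}

pss_total=$(printf '%s\n' "$sample" | sum_mb 1)
rss_total=$(printf '%s\n' "$sample" | sum_mb 0)
echo "PSS total: ${pss_total}M, RSS total: ${rss_total}M"
```

On a live server the same helper could be fed directly, for example `smem -kU nginx | sum_mb 1`.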
 
You're comparing opcache and database size to GeoIP database size, but the two are completely incomparable for reasons I've already pointed out. Opcode cache speeds up literally all PHP execution on the server. The database is a major component of any interactive app. The GeoIP database is used in extremely niche circumstances, and unless your web app uses it for a primary function, there's absolutely zero reason to have it cached in RAM, quadrupling your server worker size. I'm not sure where you read that I'm only using the country database. I use country, city, and business databases.

Anyway, I never asked for a debate on what you think appropriate process size is. The fact is it's highly inefficient and there are use cases which are not your own. Which is why I asked if there's a way to read the database from disk instead of caching it in RAM.
 
If your concern is the needless increase in average nginx size for non-GeoIP use cases, why not set up load balancing of some kind and direct GeoIP requests to a specific nginx backend with geoip compiled in, and non-GeoIP requests to an nginx backend without it?
 
Because that's a ridiculously convoluted solution to a fairly simple problem, which is that mod_geoip2 can read the database from disk and Nginx GeoIP won't. It's possible to do, and all I wanted to know is whether there was a solution for Nginx that supported the same functionality.
 
A comment from the uWSGI dev led me in the right direction on getting this working in Nginx.

In src/http/modules/ngx_http_geoip_module.c in the Nginx source code, do a find/replace that turns every _MEMORY_CACHE into _STANDARD (so the GEOIP_MEMORY_CACHE flag becomes GEOIP_STANDARD). Recompile with --with-http_geoip_module and Nginx will no longer cache the database in RAM on start.

Nginx is now handling my GeoIP database from disk.

Shouldn't be too hard to create a patch to make this a config option.
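The find/replace described above can be tried with sed on a scratch file before touching a real source tree. This is a minimal sketch: the sample line is an assumed stand-in for the GeoIP_open() call in ngx_http_geoip_module.c, and the actual source may look different depending on the Nginx version.

```shell
#!/bin/sh
# Sketch only: run the substitution on a scratch file first.
# The sample line is an ASSUMED stand-in for the GeoIP_open() call in
# ngx_http_geoip_module.c; check the real file for your Nginx version.
scratch=$(mktemp)
printf '%s\n' 'gcf->country = GeoIP_open((char *) value[1].data, GEOIP_MEMORY_CACHE);' > "$scratch"

# Swap the cache-in-RAM flag for the read-from-disk flag.
sed 's/GEOIP_MEMORY_CACHE/GEOIP_STANDARD/g' "$scratch" > "$scratch.patched"

# Count lines that still carry the old flag (0 means the swap applied everywhere).
# grep -c exits non-zero when the count is 0, hence the "|| true".
remaining=$(grep -c 'GEOIP_MEMORY_CACHE' "$scratch.patched" || true)
patched_line=$(cat "$scratch.patched")

echo "$patched_line"
echo "lines still using the old flag: $remaining"
rm -f "$scratch" "$scratch.patched"
```

Against an unpacked Nginx source tree, the same sed expression would be pointed at src/http/modules/ngx_http_geoip_module.c, followed by a rebuild with ./configure --with-http_geoip_module plus whatever other options the existing build uses.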
 
@HittingSmoke, nice find. No reply so far to your post:
http://forum.nginx.org/read.php?11,246982

I'm still undecided about the workers' memory usage. I don't see the disadvantage of having the workers store the data in memory, since for me memory consumption is not an issue. Are you saying that the memory could be allocated to other resources, since you rarely query the MaxMind databases (and so have no need for a quick response)? Thanks for your thoughts.
 
Exactly. Take two scenarios: a shared host you can compile on, or a cheap VPS. In both situations you have very limited RAM, especially on the VPS, where you have OS overhead eating into your RAM allotment.

A compiled Nginx server with very basic modules is going to run ~700KB per worker. For the sake of taking the moderate ground let's throw the Pagespeed module in the mix to bring each worker to ~4MB. Now let's say I run Piwik analytics and I want location mapping on my activity charts.

In this instance the GeoIP lookups are only going to be called by my analytics suite. On each page load my analytics suite records data and writes it to the database. The GeoIP lookup is not done on the database write. It simply writes the IP entries for each visit. When I load up my analytics software and check my location map chart, the IP addresses are translated by the MaxMind database.

I check my analytics maybe twice per week. I don't need my analytics software to be extremely responsive, as it's not a user-facing service. The only part of my analytics software I need to be responsive is the JavaScript loading. Database writes are done after the DOM is loaded and do not affect page load times.

Let's say I run 8 Nginx workers with ngx_pagespeed. 4MB per worker is ~32MB. 24MB per worker is nearly 200MB. That's a massive increase for the benefit of a service I'm going to use in a way which will not benefit my users in any way. That's RAM I could allocate to my innodb cache or my opcode cache. It's a complete waste to have in RAM unless I'm running on a dedicated server with a medium size or smaller database.
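The arithmetic above is easy to rerun for a different worker count. A quick sketch using the figures from this post (8 workers, about 4MB each with ngx_pagespeed, about 24MB each with the GeoIP databases cached); the variable names are just for illustration:

```shell
#!/bin/sh
workers=8      # number of Nginx workers in the example
base_mb=4      # approximate per-worker size with ngx_pagespeed
geoip_mb=24    # approximate per-worker size with the GeoIP databases cached in RAM

total_base_mb=$((workers * base_mb))
total_geoip_mb=$((workers * geoip_mb))
extra_mb=$((total_geoip_mb - total_base_mb))

echo "without cached GeoIP: ${total_base_mb}MB"
echo "with cached GeoIP:    ${total_geoip_mb}MB"
echo "extra RAM:            ${extra_mb}MB"
```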

These are the nuances of server RAM allocation and optimization. I run a very tight server stack, and I do it with a 0.36-second average page load time on my XenForo site with a heavy theme and lots of image-heavy threads. Multiplying my Nginx worker size by six cuts into the other optimizations I can make to keep my page load times to the absolute minimum. I'm a min/maxer when it comes to my server stack, and while GeoIP is useful, it's not 6x-memory-usage useful.

I hope I explained that clearly. I hammered it out a bit quickly between jobs so I might have got a little rambly.
 