
Re: Slow to add 1 million items

To: Christopher Wood <christopher_wood@pobox.com>, "openldap-technical@openldap.org" <openldap-technical@openldap.org>
Subject: Re: Slow to add 1 million items
From: Brent Bice <bbice@sgi.com>
Date: Mon, 10 Feb 2014 17:11:53 -0700
In-reply-to: <20140207214053.GA11552@iniquitous.heresiarch.ca>
References: <CAL_tfFf6LkngCwA5WvCcMO-f=icYmttKWq+jv+dCjOC=Fmw9_w@mail.gmail.com> <52F51CA9.9040908@sgi.com> <20140207214053.GA11552@iniquitous.heresiarch.ca>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0

On 02/07/2014 02:40 PM, Christopher Wood wrote:

On Fri, Feb 07, 2014 at 10:49:29AM -0700, Brent Bice wrote:

(SNIP)

    I've got a few OpenLDAP instances that I use for writing log data
to, so write performance is critical, but since I'm building it from
log data, absitively, posolutely, guaranteed perfect DB consistency
isn't. I can always replay log data to rebuild the DB if, say, I had
a power outage, the UPS failed, the RAID write-cache failed, the
planets aligned, and I lost data. :-)


Out of interest, what are you using this log data for, and have you tested
how many reads you are getting?

In the recent past, I've set up a Java program to log postfix/sendmail/cuda logs to OpenLDAP and some simple PHP scripts to query it. Makes it easier for junior admins and managerial types to be able to track how an email got from point A to point B. Say, an Exchange user sent an email to an internal list server - so it went from Exchange to a postfix relay to the list server, then back to the postfix relays, then to some recips on Exchange, some recips on other lists, some on departmental mail servers using sendmail, etc. I can search by to/from and/or date/time, find the email, then click on the message-ID to search by that and show the email every hop along the way as well as all the recipients who got it. Makes it faster to sort out those "I sent an email to list ABC and user XYZ didn't get it! Why not!" problems. The answer usually is "user XYZ did get it and here's the log showing it". :-)
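As a rough illustration of the to/from, date/time and message-ID searches described above, here is a minimal sketch of how such an LDAP search filter could be assembled. It is not the actual script: the attribute names (mailFrom, mailTo, logTime, messageId) are hypothetical, and the date range only works if logTime is stored with a generalized-time syntax and an ordering matching rule. Values are escaped following RFC 4515.

```python
from datetime import datetime, timezone

# Characters that RFC 4515 requires to be escaped inside a filter value.
_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\x00": r"\00"}


def escape_filter_value(value):
    """Escape a user-supplied string for use as an LDAP filter value."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def generalized_time(dt):
    """Format a datetime as an LDAP generalized time in UTC.

    A naive datetime is interpreted as local time by astimezone().
    """
    return dt.astimezone(timezone.utc).strftime("%Y%m%d%H%M%SZ")


def build_mail_filter(sender=None, recipient=None, start=None, end=None,
                      message_id=None):
    """AND together whichever search criteria were supplied.

    Attribute names are hypothetical placeholders for this sketch.
    """
    clauses = []
    if sender:
        clauses.append("(mailFrom=%s)" % escape_filter_value(sender))
    if recipient:
        clauses.append("(mailTo=%s)" % escape_filter_value(recipient))
    if start:
        clauses.append("(logTime>=%s)" % generalized_time(start))
    if end:
        clauses.append("(logTime<=%s)" % generalized_time(end))
    if message_id:
        clauses.append("(messageId=%s)" % escape_filter_value(message_id))
    if not clauses:
        return "(objectClass=*)"  # no criteria: match every entry
    if len(clauses) == 1:
        return clauses[0]
    return "(&" + "".join(clauses) + ")"
```

The first search would pass sender, recipient and a date range; the follow-up "click on the message-ID" search would pass only message_id, which returns one entry per hop if each hop was logged as its own entry.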

I also recently started logging DHCP client hostnames, IPs, MAC addresses, and (if the dhcp request came from our VPN hardware) username. That way when I'm sifting through snort/FireEye/PaloAlto logs and I see some IP with a dhcp hostname of "MyPC" I can quickly tell which user's home machine is infected with malware-du-jour. I can see who was on which IPs when.

Yeah, I coulda used MySQL or Postgres or something else. The first one (the relay logs) started off as a weekend project to edjimicate myself on the LDAP API in Java (or one of 'em). It proved useful enough we just kept it. And since I had that in place, adding on the vpn/dhcp stuff later was easy. I use the dds overlay to automagically throw away records older than X days.

For both of those, the number of writes per second we do is low - around 4 or 5 per second last I checked.

However, we have a lot of DNS servers in a lot of different geographies and I've thought about trying to centralize their logs. But the query logs can be substantial - a terabyte per region per day - more than I really want to shove over the WAN to a central spot. So it occurred to me one morning that I could leave the log data distributed, but centralize how I query it. I could have one LDAP server that had referrals to other LDAP servers, one per region, and have all the DNS servers in a given region log their queries to their local LDAP server. Then a simple PHP script can do one query against the root server and find any query handled by any DNS server in any region. (Useful when handling an intrusion event, for instance, and you want to know every DNS query made by some system between certain dates/times.)
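To make the referral idea above a bit more concrete, here is a minimal sketch that generates the LDIF for one referral entry per region on the root server. The base DN, region names and hostnames are all hypothetical; the entry shape (objectClass referral plus a ref attribute holding an LDAP URL) follows the named subordinate reference convention from RFC 3296. With OpenLDAP, adding such entries normally requires the ManageDsaIT control (for example ldapadd -M), and the searching client still has to chase the referrals it gets back.

```python
from urllib.parse import quote


def referral_ldif(base_dn, regions):
    """Return LDIF text with one referral entry per region.

    regions maps a region name to the hostname of that region's LDAP
    server. Both are hypothetical values supplied by the caller.
    """
    entries = []
    for region, host in sorted(regions.items()):
        dn = "ou=%s,%s" % (region, base_dn)
        # Percent-encode the DN for the LDAP URL, leaving "," and "=" as-is.
        url = "ldap://%s/%s" % (host, quote(dn, safe=",="))
        entries.append("\n".join([
            "dn: %s" % dn,
            "objectClass: referral",
            "objectClass: extensibleObject",
            "ou: %s" % region,
            "ref: %s" % url,
        ]))
    # LDIF separates entries with a blank line.
    return "\n\n".join(entries) + "\n"


if __name__ == "__main__":
    print(referral_ldif("dc=dnslog,dc=example", {
        "emea": "ldap-emea.example.net",
        "apac": "ldap-apac.example.net",
    }))
```

A subtree search against the root server's base DN would then come back with a referral for each regional subtree, and the query script decides whether to follow them.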

But SGI sells HPC equipment (big storage too, btw - grin). So it's not unheard of for someone to spin up a big cluster in one location and generate thousands of DNS queries per second. So any sort of logging I do has to scale well or it just won't work. There's likely a better way, but this gave me a good excuse to try out OpenLDAP + mdb on xfs and to see if PHP's LDAP API would chase referrals. :-) I'll probably wind up using some tool to index the textual query logs and some way to search all the indexes on all the regional log servers with regex patterns instead or somethin'...

(scrolls back) Oh yeah... Reads... I haven't been paying close attention to the number of reads per second I've been getting as writes and deletes were the bottleneck I was curious about. But the last time I checked, I was getting something like 30k+ queries per second with 8 threads on one client. But this is with zero tuning of the filesystem options and with a really simple-minded bit of Java - this shouldn't be taken as any sort of serious benchmark. I've learned that proper benchmarking is HARD and I only use the Java tool for rough guesstimates (and comparing how different config options may improve performance - or not - in a relative sort of way).
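For readers who want the same kind of rough, relative guesstimate, here is a minimal sketch of a multi-threaded operations-per-second harness. It is not the Java tool mentioned above; the function name and parameters are made up for illustration, and the operation passed in is whatever single LDAP search, add or delete you want to time. Like the original, it should not be read as a serious benchmark.

```python
import threading
import time


def measure_ops_per_second(operation, threads=8, ops_per_thread=1000):
    """Run operation() from several threads; return (total_ops, ops_per_sec).

    operation is any zero-argument callable, e.g. a closure that performs
    one search against a test server.
    """
    # The barrier releases all workers and the timer at the same moment.
    barrier = threading.Barrier(threads + 1)

    def worker():
        barrier.wait()
        for _ in range(ops_per_thread):
            operation()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    barrier.wait()
    started = time.perf_counter()
    for t in pool:
        t.join()
    elapsed = time.perf_counter() - started

    total = threads * ops_per_thread
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, rate


if __name__ == "__main__":
    # Stand-in operation; replace with a real LDAP call to get useful numbers.
    total, rate = measure_ops_per_second(lambda: time.sleep(0.001),
                                         threads=8, ops_per_thread=100)
    print("%d operations, roughly %.0f per second" % (total, rate))
```

Running it twice with only one config option changed gives the relative comparison described above. Python threads are adequate here only because the timed operation is expected to spend its time waiting on the network.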


Brent

