reduce some mutex performance problems in profile library
In profile_node_iterator we unlock a mutex in order to call
profile_update_file_data, which wants to lock that mutex itself, and
then when it returns we re-lock the mutex. (We don't use recursive
mutexes, and I would continue to argue that we shouldn't.) On the
Mac, when running multiple threads, it appears that this results in
very poor peformance, and much system and user CPU time is spent
working with the locks. (Linux doesn't seem to suffer as much.)
So: Split profile_update_file_data into a locking wrapper, and an
inner routine that does the real work but requires that the lock be
held on entry. Call the latter from profile_node_iterator *without*
unlocking first, and only unlock if there's an error. This doesn't
move any significant amount of work into the locking region; it pretty
much just joins locking regions that were disjoint for no good reason.
On my tests on an 8-core Mac, in a test program running
gss_init_sec_context in a loop in 6 threads, this brought CPU usage
per call down by 40%, and improved wall-clock time even more.
Single-threaded performance improved very slightly, probably in the
noise.
Linux showed modest improvement (5% or less) in CPU usage in a
3-thread test on a 4-core system.
Similar tests with gss_accept_sec_context showed similar contention
around the profile-library mutexes, but I haven't analyzed the
performance changes there from this patch.
More work is needed, but this will help.
ticket: 6515
tags: pullup
target_version: 1.7.1
version_reported: 1.7
git-svn-id: svn://anonsvn.mit.edu/krb5/trunk@22418
dc483132-0cff-0310-8789-
dd5450dbe970