fork in ActivePerl - あまつぶ＠はてなダイアリー

I looked into the ActivePerl documentation that was pointed out in a comment on yesterday's entry.
First, the Release Notes say:

The fork() emulation has known limitations. See perlfork for a detailed summary. In particular, fork() emulation will not work correctly with extensions that are either not thread-safe, or maintain internal state that cannot be cloned in the psuedo-child process. This caveat currently applies to extensions such as Tk and Storable.

So, as I wrote yesterday, it seems that things do not work properly when a module that is not thread-safe is used. The notes say to see perlfork for details, and perlfork says:

On some platforms such as Windows where the fork() system call is not available, Perl can be built to emulate fork() at the interpreter level. While the emulation is designed to be as compatible as possible with the real fork() at the level of the Perl program, there are certain important differences that stem from the fact that all the pseudo child ``processes'' created this way live in the same real process as far as the operating system is concerned.

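In other words, on non-Unix platforms such as Windows there is no fork() system call, so fork() is emulated at the interpreter level. From the Perl program's point of view it looks the same as a real fork(), but from the OS's point of view everything runs inside the same process (these are pseudo child processes). That matches what I observed.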
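As a way to see this for yourself, here is a minimal sketch that forks once and prints `$$` on both sides. It uses only core Perl; what it prints depends on the platform, so no expected output is shown. On a Unix-like system this is a real fork(), while on a Windows Perl built with fork emulation the child would be a pseudo-process inside the same OS process.

```perl
use strict;
use warnings;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child (a real process on Unix, a pseudo-process under emulation).
    print "child:  \$\$ = $$\n";
    exit 0;
}

# Parent: wait for the child, then report both IDs.
waitpid($pid, 0);
print "parent: \$\$ = $$, fork() returned $pid\n";
```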
Reading further:

Thread-safety of extensions
Since the fork() emulation runs code in multiple threads, extensions calling into non-thread-safe libraries may not work reliably when calling fork(). As Perl's threading support gradually becomes more widely adopted even on platforms with a native fork(), such extensions are expected to be fixed for thread-safety.

This tells us that fork() is emulated with multiple threads. It also repeats that libraries that are not thread-safe will not work reliably.
So what should we do? perlthrtut (tutorial on threads in Perl) says:

If you're using a module that's not thread-safe for some reason, you can protect yourself by using it from one, and only one thread at all. If you need multiple threads to access such a module, you can use semaphores and lots of programming discipline to control access to it. Semaphores are covered in Basic semaphores.

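So when you use a module that is not thread-safe, you should make sure it is called from only one thread. To do that, you use semaphores and "lots of programming discipline" to control access. There seem to be various approaches, but I don't know what they all are, so I looked into the one that is named concretely: semaphores. Further down, the section on semaphores says: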

Basic semaphores
Semaphores have two methods, down() and up(): down() decrements the resource count, while up increments it. Calls to down() will block if the semaphore's current count would decrement below zero. This program gives a quick demonstration:

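A usage example follows that text. A semaphore has two methods, down() and up(): you call down() before the region you want to lock and up() after it. Calling down() decreases the semaphore's count, and calling up() increases it. Before decreasing, down() checks whether the resulting count would be below 0. If it would, the call waits until the count can be decreased. If not, the count is decreased and execution continues.
Consider the simplest case: the initial count is 1, and both down() and up() change the count by 1. In the initial state the semaphore's count is 1. When down() is called, the count is decreased by 1 and becomes 0. If another thread calls down() in this state, subtracting 1 from 0 would give a negative value, so that thread waits until the subtraction becomes possible (until the count is 1 again). When the first thread then calls up(), 1 is added and the count returns to 1, which lets the waiting thread continue. No matter how many threads there are, once one thread has called down(), the others wait until that thread calls up(), so the protected code is always called from only one thread at a time.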
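The simplest case described above can be sketched as follows. This is only an illustration, not the code from MailParse.pm: `unsafe_call` is a hypothetical stand-in for a call into a module that is not thread-safe, and the counters exist purely to observe how many threads are inside the guarded region at once. It assumes a Perl built with thread support.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Semaphore;

# Semaphore with an initial count of 1: at most one thread holds it at a time.
my $sem = Thread::Semaphore->new(1);

my $inside     :shared = 0;   # threads currently inside the guarded region
my $max_inside :shared = 0;   # highest value of $inside ever observed

# Hypothetical stand-in for a call into a non-thread-safe module.
sub unsafe_call {
    {
        lock($inside);
        $inside++;
        $max_inside = $inside if $inside > $max_inside;
    }
    threads->yield();          # give other threads a chance to run
    {
        lock($inside);
        $inside--;
    }
}

sub worker {
    for (1 .. 100) {
        $sem->down();          # count 1 -> 0; other callers block here
        unsafe_call();
        $sem->up();            # count 0 -> 1; one waiting thread continues
    }
}

my @workers = map { threads->create(\&worker) } 1 .. 4;
$_->join() for @workers;

print "max threads inside guarded region: $max_inside\n";
```

Because every call is bracketed by down() and up(), the observed maximum stays at 1 even with four threads competing.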

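So I rewrote MailParse.pm to use Thread::Semaphore. In addition to the modules bundled with POPFile, this needs threads, threads::shared, Thread::Semaphore, attributes and so on, so I copied them over from ActivePerl. Perhaps because the Perl versions do not match, the following appeared at exit: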

Unbalanced scopes: 2 more ENTERs than LEAVEs
Unbalanced saves: 3 more saves than restores

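These error messages appeared, but everything worked without problems. Also, this handling is only needed when fork() is used under ActivePerl, so along with the condition ($^O eq 'MSWin32') I added the condition ($$ < 0). This is because the process ID of a pseudo-process created by ActivePerl's fork() is a negative value. If fork() has not been used (in which case the process ID is positive), the code is never called from another thread at the same time, so there is no need to use the semaphore.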
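The combined condition can be sketched like this. The helper name `in_pseudo_fork_child` is hypothetical and is not taken from the actual patch. It takes the OS name and process ID as arguments so that the logic can be checked on any platform, and in real use it would be called with `$^O` and `$$`.

```perl
use strict;
use warnings;

# Hypothetical helper: true only for a pseudo-process created by the
# fork() emulation on Windows, identified here by a negative process ID
# as described in the text above.
sub in_pseudo_fork_child {
    my ($os, $pid) = @_;
    return ($os eq 'MSWin32' && $pid < 0) ? 1 : 0;
}

# Decide whether the semaphore is needed in the current process.
if (in_pseudo_fork_child($^O, $$)) {
    print "pseudo-process: guard the call with the semaphore\n";
} else {
    print "ordinary process: no semaphore needed\n";
}
```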
This looks like a better approach than using flock, so I plan to rewrite the patch.