Hello

I have a large list of URLs (from a database, generated automatically during tests) that I want to download using several wget processes at the same time. With our internal web servers, this will be a lot faster than downloading the pages one at a time with a single process.

So I create 20 pipes in my script with `mkfifo` and connect the read end of each one to a new wget process for that fifo. The write end of each pipe is then connected to my script, with shell commands like `exec 18>>fifo_file_name`.

Then my script outputs, in a loop, one line with a URL to each of the pipes in turn, and then starts over again with the first pipe until there are no more URLs from the database client.

Much to my dismay I find that there is no concurrent / parallel download with the child `wget` processes, and that for some strange reason only one wget process can download pages at a time, and only after that process completes can another one begin.

My script does feed *all* the pipes with data, one line to each pipe in turn, and has all the pipes written and closed before the first child process has even finished downloading.

Do you know why my child processes wait in turn for each other before they start reading the fifo and downloading?

I figure it must be something about the pipes, because if I use regular files instead (and reverse the order: first write the URLs, then start wget to read them) then the child processes run in parallel as expected. The child processes also run in parallel if I open the write end of the pipes first, and then start the wget processes for the read end.

They have even run in parallel with my pipes set up as described, but I saw that happen only once in all my attempts. I do not know what was special about that attempt; it happened at the beginning of the day, and the computers were not restarted nor logged off overnight. The pipes are created and deleted on every run, with mkfifo and rm.

Is there something special about fifos to make them run in sequence if I open the read end first?

My script is attached here, I believe it is nicely formatted and clear enough.

Thank you,
Timothy Madden

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: pipes.visit_boards
URL: <http://lists.centos.org/pipermail/centos/attachments/20111125/0da44a9e/attachment.ksh>

This really belongs on a shell list rather than the centos list, but:

On Fri, Nov 25, 2011 at 1:05 PM, Timothy Madden <terminatorul at gmail.com> wrote:
>
> So I create 20 pipes in my script with `mkfifo` and connect the read end of
> each one to a new wget process for that fifo. The write end of each pipe is
> then connected to my script, with shell commands like `exec
> 18>>fifo_file_name`.
>
> Then my script outputs, in a loop, one line with a URL to each of the
> pipes, in turn, and then starts over again with the first pipe until there
> are no more URLs from the database client.
>
> Much to my dismay I find that there is no concurrent / parallel download
> with the child `wget` processes, and that for some strange reason only one
> wget process can download pages at a time, and after that process completes,
> another one can begin.

I believe the problem is with creating all the fifos and their readers first and then creating the writers.

What happens is that you create wget #1, which has some file descriptors associated with both it and the parent shell. Next you create wget #2, which (because it was forked from the parent shell) shares all the file descriptors that the shell had open to wget #1, including, for example, the input side of the fifo. Repeat for all the rest of the wget processes. By the time you have created the last one, each of them has a set of descriptors shared with every other one that was created ahead of it.

Thus, even though you write to the fifo for wget #2 and close it from the parent shell, wget #2 doesn't actually see EOF and begin processing its input until the corresponding descriptor shared by wget #1 is closed, which happens when wget #1 exits. wget #3 then doesn't see EOF until #2 exits (#3 would have waited for #1, too, except #1 is already gone by then). Then #4 waits for #3, etc.

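
The inherited-descriptor effect can be reproduced in isolation. The following is only a minimal sketch, not the attached script: `cat` stands in for a wget reader, `sleep` stands in for any other child forked from the shell, and the fifo name and URL are made up. Which process ends up waiting for which in the real script depends on the order in which it forks and opens things.

```shell
#!/bin/sh
# Sketch only: `cat` and `sleep` stand in for the wget children (assumption).
dir=$(mktemp -d)
mkfifo "$dir/f1"

# Reader: blocks in open() until some process opens the fifo for writing.
cat "$dir/f1" > "$dir/out1" &
reader=$!

# The parent shell opens the write end on descriptor 3.
exec 3>"$dir/f1"

# Any child forked from now on inherits descriptor 3.
sleep 3 &
holder=$!

echo "http://example.invalid/page1" >&3
exec 3>&-     # the parent closes its copy, but the sleep child still holds one

sleep 1
if kill -0 "$reader" 2>/dev/null; then
    still_waiting=yes    # the reader has not seen EOF yet
else
    still_waiting=no
fi

wait "$holder"   # when the holder exits, the last write descriptor goes away
wait "$reader"   # and only then does the reader get EOF and finish
received=$(cat "$dir/out1")
rm -rf "$dir"
echo "reader still waiting after parent closed: $still_waiting"
```

The reader only finishes once every process holding the write end has closed it, not just the shell that wrote the data.
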
So you're either going to need to do a lot more clever descriptor wrangling to make sure wget #1 is not holding open any descriptors visible to wget #2, or you're going to have to use a simpler concurrency scheme that doesn't rely on having all those fifos opened ahead of time.

> The child processes also run in parallel if I open the write end of the
> pipes first, and then start the wget processes for the read end.

Probably you inadvertently resolved the shared open descriptor problem by whatever change you made to the script to invert that ordering.
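
One possible form of that descriptor wrangling is to close, in each new child, the write descriptors the shell already has open, using the `N>&-` redirection. This is a sketch under assumptions, not a fix for the attached script: `cat` again stands in for the wget reader, and the descriptor numbers, fifo names, and URLs are invented for illustration.

```shell
#!/bin/sh
# Sketch only: `cat` stands in for each wget reader (assumption).
dir=$(mktemp -d)
mkfifo "$dir/f1" "$dir/f2"

# Reader 1 is forked before any write end is open, so it inherits none.
cat "$dir/f1" > "$dir/out1" &
pid1=$!
exec 3>"$dir/f1"          # the shell opens the write end of fifo 1

# Reader 2 is forked while descriptor 3 is open; `3>&-` closes the
# inherited copy in the child before the reader starts.
cat "$dir/f2" > "$dir/out2" 3>&- &
pid2=$!
exec 4>"$dir/f2"          # the shell opens the write end of fifo 2

echo "http://example.invalid/a" >&3
echo "http://example.invalid/b" >&4

exec 3>&-                 # the last writer of fifo 1 is gone
wait "$pid1"              # so reader 1 sees EOF without waiting for reader 2

# Reader 2 is still blocked reading, because the shell still holds fd 4.
if kill -0 "$pid2" 2>/dev/null; then r2_running=yes; else r2_running=no; fi

exec 4>&-
wait "$pid2"
out1=$(cat "$dir/out1")
out2=$(cat "$dir/out2")
rm -rf "$dir"
echo "reader 2 still running when reader 1 finished: $r2_running"
```

With 20 fifos the same idea applies, but every child needs every previously opened write descriptor closed, which is why a scheme that does not keep all the fifos open at once may be simpler.
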