Re: [Systems Programming] Tutorial: Introduction to fork, exec*, pipe, dup2 - SuperTree board


Board SuperTree

Author dick51207 (dick51207.bbs@ptt.cc)
Title Re: [Systems Programming] Tutorial: Introduction to fork, exec*, pipe, dup2
Time Mon. Aug 14, 2017, 08:07:49 PM

※ Forwarded from dick51207.bbs@ptt.cc

Board b97902HW

Author LoganChien (簡子翔)
Title Re: [Systems Programming] Tutorial: Introduction to fork, exec*, pipe, dup2
Time Fri Mar 19 07:06:36 2010

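Introduction to fork, exec*, dup2, pipe

Implementing Pipelines in a Command Interpreter: a Combined Exercise for the Previous Post

After reading the previous post, you should be able to write a simple
command interpreter with pipeline support. A command interpreter is a
program like bash, ksh, or tcsh; we also call it a shell. It is usually
the first program that runs after you log in to a system.

The pipeline we are talking about is a bit like IO redirection. Suppose I
enter the following command: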

command1 | command2 | command3

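Here the stdout of command1 becomes the stdin of command2, and the stdout
of command2 becomes the stdin of command3. While this command runs, the
standard output of command1 and command2 is not shown on the screen; only
the output of command3 is.

For example, cat /etc/passwd prints the contents of the file /etc/passwd
to stdout, and grep username reads stdin line by line and writes every
line that contains username to standard output. So when the two are
combined with a pipeline: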

cat /etc/passwd | grep username

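the screen shows only the lines of /etc/passwd that contain username.
Used flexibly, pipelines let a handful of commands be combined into many
different functions, which is why they matter so much in *nix
environments. Can you write a command interpreter with pipeline support
using open/close/dup2/exec*/fork?

Below is a half-finished program of mine. It can already turn the user's
input into several argv arrays that can be passed to execvp; only the
pipeline part is unfinished. You can try writing it yourself: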

http://w.csie.org/~b97073/B/todo-pipeline-shell.c
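
To give a rough idea of what "turning input into an argv for execvp" involves, here is a minimal sketch. It is not the parser from the file linked above: split_args is a hypothetical helper, and it assumes that arguments are separated only by spaces or tabs, with no quoting and no '|' handling.

```c
#include <string.h>

/* Hypothetical helper (not taken from the author's file): split `cmd` in
 * place on spaces, tabs, and newlines. Up to max_args - 1 pointers are
 * stored in argv, and the array is terminated with NULL, which is the
 * layout execvp() expects. max_args must be at least 1.
 * Returns the number of arguments stored. */
int split_args(char *cmd, char **argv, int max_args)
{
    int argc = 0;
    char *token = strtok(cmd, " \t\n");

    while (token != NULL && argc < max_args - 1)
    {
        argv[argc++] = token;
        token = strtok(NULL, " \t\n");
    }

    argv[argc] = NULL;
    return argc;
}
```

Because strtok writes '\0' bytes into the string it scans, the buffer passed as cmd has to be writable (an array, not a string literal). The resulting argv can be handed to execvp(argv[0], argv) as long as the buffer stays alive.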

(Spoiler guard: press Page Down to continue reading)

You can also simply download the version I threw together:

http://w.csie.org/~b97073/B/simple-pipeline-shell.c

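There is really nothing new in this code. It just uses what was introduced
earlier: IO redirection (the method used in red.c) and fork/exec to
create child processes.

When I run command1, I redirect its stdout to a file. After it finishes, I
feed that file to command2 as stdin, and redirect command2's stdout to
another file, and so on.

Let's still take a look at two of its functions, creat_proc and
execute_cmd_seq: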

/* Purpose: Create child process and redirect io. */
void creat_proc(char **argv, int fd_in, int fd_out)
{
    /* creat_proc creates a child process and sets up its IO redirection.
       It takes three parameters: argv is what will later be passed to
       execvp; fd_in and fd_out are the input and output file
       descriptors. */

    pid_t proc = fork();

    if (proc < 0)
    {
        fprintf(stderr, "Error: Unable to fork.\n");
        exit(EXIT_FAILURE);
    }
    else if (proc == 0)
    {
        if (fd_in != STDIN_FILENO)
        {
            /* Duplicate fd_in onto STDIN_FILENO */
            dup2(fd_in, STDIN_FILENO);

            /* fd_in is no longer needed, so close it */
            close(fd_in);
        }

        if (fd_out != STDOUT_FILENO)
        {
            /* Duplicate fd_out onto STDOUT_FILENO */
            dup2(fd_out, STDOUT_FILENO);

            /* fd_out is no longer needed, so close it */
            close(fd_out);
        }

        /* Load the executable; argv[0] is used directly as its name */
        if (execvp(argv[0], argv) == -1)
        {
            fprintf(stderr,
                    "Error: Unable to load the executable %s.\n",
                    argv[0]);

            exit(EXIT_FAILURE);
        }

        /* NEVER REACH */
        exit(EXIT_FAILURE);
    }
    else
    {
        int status;
        wait(&status); /* Wait for the program to finish */
    }
}

/* Purpose: Create several child processes and redirect the standard output
 * to the standard input of the later process.
 */
void execute_cmd_seq(char ***argvs)
{
    int C;
    for (C = 0; C <= MAX_CMD_COUNT; ++C)
    {
        char **argv = argvs[C];

        if (!argv) { break; }

        int fd_in = STDIN_FILENO;
        int fd_out = STDOUT_FILENO;

        if (C > 0)
        {
            /* Open the temporary file written by the previous command */
            fd_in = open(pipeline_tmp_[C - 1], O_RDONLY);

            if (fd_in == -1)
            {
                fprintf(stderr, "Error: Unable to open pipeline tmp r.\n");
                exit(EXIT_FAILURE);
            }
        }

        if (C < MAX_CMD_COUNT && argvs[C + 1] != NULL)
        {
            /* Open the temporary file that the next command will read */
            fd_out = open(pipeline_tmp_[C],
                          O_WRONLY | O_CREAT | O_TRUNC,
                          0644);

            if (fd_out == -1)
            {
                fprintf(stderr, "Error: Unable to open pipeline tmp w.\n");
                exit(EXIT_FAILURE);
            }
        }

        creat_proc(argv, fd_in, fd_out);

        if (fd_in != STDIN_FILENO) { close(fd_in); }
        if (fd_out != STDOUT_FILENO) { close(fd_out); }
    }
}

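Drawbacks of Implementing Pipelines Directly with Temporary Files

So what are the drawbacks of implementing a pipeline directly with
temporary files, as above?

(1) It is slow! All we want is for two programs to talk to each other, so
there is really no need to write the data to disk. It may also use up a
considerable amount of space. For example, running this command is bound
to cost a lot of time and disk space: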

tar c / | tar xv -C .

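(2) command1, command2, .. commandN can only run one after another. If
command1 has not finished writing and command2 reads faster, command2 may
wrongly conclude that command1's output has ended. To avoid incomplete
data, we can only start command2 after command1 has finished, which tends
to waste time.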
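
Point (2) can be observed directly. The following is a minimal sketch, not part of the author's code: read_past_unfinished_writer is a hypothetical function, and the temporary file name is an assumption. A writer puts part of its output into a regular file and keeps the file open; a reader that has caught up then gets 0 from read(), which is exactly what a real end of file looks like.

```c
#define _POSIX_C_SOURCE 200809L  /* for mkstemp() under strict -std modes */

#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns what read() yields once a reader has caught up with a writer
 * that still has the regular file open. Expected result: 0, the same
 * value read() returns at a real end of file. Returns -1 on failure. */
ssize_t read_past_unfinished_writer(void)
{
    char path[] = "/tmp/pipeline-demo-XXXXXX"; /* hypothetical name */
    int fd_w = mkstemp(path);                   /* the writer's descriptor */
    if (fd_w == -1) { return -1; }

    int fd_r = open(path, O_RDONLY);            /* an independent reader */
    unlink(path);            /* the file disappears once both fds close */
    if (fd_r == -1) { close(fd_w); return -1; }

    char buf[16];
    ssize_t result = -1;

    /* The writer has produced only the first part of its output. */
    if (write(fd_w, "part 1\n", 7) == 7 &&
        read(fd_r, buf, sizeof buf) == 7)
    {
        /* fd_w is still open, yet the reader cannot tell: it sees EOF. */
        result = read(fd_r, buf, sizeof buf);
    }

    close(fd_r);
    close(fd_w);
    return result;
}
```

With a regular file the reader has no way to distinguish "nothing more yet" from "nothing more ever", which is why the temporary-file shell has to wait for each command to exit.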

Is there a way around this? That brings us to the next system call: pipe().

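pipe: A Bridge for Communication Between Two Processes

As the name suggests, a pipe is a water pipe. When we call pipe, it opens
two file descriptors for us: one for writing data and one for reading
data. Its main use is to let two processes communicate with each other
(Inter-process Communication, IPC). On most systems a pipe uses memory as
its buffer, so it is more efficient than writing a file to disk. The
prototype of pipe is: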

int pipe(int fds[2]);

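When we call pipe, we must pass in an int array of size at least 2. pipe
returns a read-only file descriptor in fds[0] and a write-only file
descriptor in fds[1]. When two processes want to communicate, the sender
simply uses the write system call to put data into the pipe, and the
receiver uses read to take it out.

Also, unlike with a regular file, read on an empty pipe keeps waiting for
new input instead of reporting EOF, unless every copy of the pipe's
write-end has been closed.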
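
The three states of a pipe described above can be checked inside a single process. This is only a sketch under stated assumptions: read_would_return and pipe_eof_demo are hypothetical names, and poll() with a zero timeout is used here merely to ask "would read() return right now?" without actually blocking.

```c
#include <poll.h>
#include <unistd.h>

/* Returns 1 if read() on fd would return immediately (data or EOF),
 * 0 if it would block, and -1 on error. */
int read_would_return(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int n = poll(&p, 1, 0);   /* timeout 0: just report the current state */
    return (n < 0) ? -1 : (n > 0);
}

/* Walks one pipe through its three states.
 * Returns 1 if every step behaved as described, 0 otherwise. */
int pipe_eof_demo(void)
{
    int fds[2];
    char buf[8];

    if (pipe(fds) == -1) { return 0; }

    /* 1. Data is waiting: read() hands it over. */
    int ok = write(fds[1], "hi", 2) == 2
          && read(fds[0], buf, sizeof buf) == 2;

    /* 2. The pipe is empty but a write-end is still open:
          read() would block, and it is not EOF. */
    ok = ok && read_would_return(fds[0]) == 0;

    /* 3. The last write-end is closed: read() returns 0, meaning EOF. */
    close(fds[1]);
    ok = ok && read_would_return(fds[0]) == 1
            && read(fds[0], buf, sizeof buf) == 0;

    close(fds[0]);
    return ok;
}
```

Step 2 is the behaviour a temporary file cannot provide, and step 3 is why every unused write-end has to be closed.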

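Note: although we arrived at pipe() by way of pipelines, a pipeline does
not have to be implemented with pipe(), and the uses of pipe() are not
limited to pipelines. Still, pipe() is a very efficient way to implement a
pipeline; as far as I know, GNU bash uses pipe() to implement its
pipelines.

Let's look at a simple example, a multiprocess random number generator: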

/* Source file: pipe-example.c */

#include <stdlib.h>
#include <stdio.h>
#include <time.h>

#include <unistd.h>

enum { RANDOM_NUMBER_NEED_COUNT = 10 };

int main()
{
    int pipe_fd[2];

    if (pipe(pipe_fd) == -1) /* Create the pipe */
    {
        fprintf(stderr, "Error: Unable to create pipe.\n");
        exit(EXIT_FAILURE);
    }

    pid_t pid;

    if ((pid = fork()) < 0) /* Note: fork duplicates the pipe's fds */
    {
        fprintf(stderr, "Error: Unable to fork process.\n");
        exit(EXIT_FAILURE);
    }
    else if (pid == 0)
    {
        /* -- In the Child Process -------- */

        /* The child only acts as the writer, so it first closes the
           read end of the pipe, which it does not need. */
        close(pipe_fd[0]);

        /* My Random Number Generator */
        srand(time(NULL));

        int i;
        for (i = 0; i < RANDOM_NUMBER_NEED_COUNT; ++i)
        {
            sleep(1); /* wait 1 second */

            int randnum = rand() % 100;

            /* Write the data out */
            write(pipe_fd[1], &randnum, sizeof(int));
        }

        exit(EXIT_SUCCESS);
    }
    else
    {
        /* -- In the Parent Process -------- */

        /* A process that does not use the write end must close it;
           otherwise the read end of the pipe will never see EOF. */
        close(pipe_fd[1]);

        int i;
        for (i = 0; i < RANDOM_NUMBER_NEED_COUNT; ++i)
        {
            int gotnum;

            /* Take the data out of the read end; stop on EOF or error */
            if (read(pipe_fd[0], &gotnum, sizeof(int)) != (ssize_t)sizeof(int))
            {
                break;
            }

            printf("got number : %d\n", gotnum);
        }
    }

    return EXIT_SUCCESS;
}
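
A read() on a pipe may return fewer bytes than requested, or 0 if the writer exits early, so code that exchanges fixed-size records often wraps read() in a small loop. The following is one possible sketch; read_full is a hypothetical helper and not part of pipe-example.c.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read exactly `len` bytes from fd into buf.
 * Returns 1 when the whole record arrived, 0 if EOF came first,
 * and -1 on a read error. */
int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;

    while (done < len)
    {
        ssize_t n = read(fd, p + done, len - done);

        if (n == 0) { return 0; }          /* every write-end was closed */

        if (n < 0)
        {
            if (errno == EINTR) { continue; } /* interrupted: try again */
            return -1;
        }

        done += (size_t)n;
    }

    return 1;
}
```

In a reader loop this could be used as `while (read_full(pipe_fd[0], &gotnum, sizeof gotnum) == 1) { ... }`, which ends cleanly once the writer closes its end.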

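The example above shows how two processes can communicate, but it does
not by itself show the value of pipe. Our second example uses pipe to
intercept the standard output of another program.

In the second example there are two programs, that is, two executables.
One is solely responsible for producing random numbers and writes each
32-bit int straight to standard output. The other invokes that random
number generator and intercepts its standard output.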

/* Source file: random-gen.c */
/* Nothing special about this file; it just keeps producing random numbers */

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#include <unistd.h>

enum { RANDOM_NUMBER_NEED_COUNT = 10 };

int main()
{
    srand(time(NULL));

    int i;
    for (i = 0; i < RANDOM_NUMBER_NEED_COUNT; ++i)
    {
        sleep(1); /* Wait 1 second. Simulate the complex process of
                     generating the safer random number. */

        int randnum = rand() % 100;

        /* Note: this writes to stdout. */
        write(STDOUT_FILENO, &randnum, sizeof(int));
    }

    return EXIT_SUCCESS;
}

/* Source file: pipe-example-2.c */

#include <stdio.h>
#include <stdlib.h>

#include <unistd.h>

enum { RANDOM_NUMBER_NEED_COUNT = 10 };

int main()
{
    /* -- Prepare Pipe -------- */
    int pipe_fd[2];

    if (pipe(pipe_fd) == -1)
    {
        fprintf(stderr, "Error: Unable to create pipe.\n");
        exit(EXIT_FAILURE);
    }

    /* -- Create Child Process -------- */
    pid_t pid;
    if ((pid = fork()) < 0)
    {
        fprintf(stderr, "Error: Unable to create child process.\n");
        exit(EXIT_FAILURE);
    }
    else if (pid == 0) /* In Child Process */
    {
        /* Close read end, since we don't need it. */
        close(pipe_fd[0]);

        /* Bind write end to standard out: duplicate file descriptor
           pipe_fd[1] onto file descriptor STDOUT_FILENO. */
        dup2(pipe_fd[1], STDOUT_FILENO);

        /* Close the original pipe_fd[1] file descriptor */
        close(pipe_fd[1]);

        /* After the three steps above, file descriptor 1 of this child
           process is the write-end of the pipe, so everything written to
           standard output goes into our pipe, and the read-end on the
           other side receives the standard output of random-gen. */

        /* Load Another Executable */
        execl("random-gen", "./random-gen", (char *)0);

        /* This Process Should Never Go Here */
        fprintf(stderr, "Error: Unexpected flow of control.\n");
        exit(EXIT_FAILURE);
    }
    else /* In Parent Process */
    {
        /* Close write end, since we will not use it. */
        close(pipe_fd[1]);

        /* Read Random Number From Pipe */
        int i;
        for (i = 0; i < RANDOM_NUMBER_NEED_COUNT; ++i)
        {
            int gotnum = -1;

            /* Stop on EOF or error, e.g. if random-gen failed to start */
            if (read(pipe_fd[0], &gotnum, sizeof(int)) != (ssize_t)sizeof(int))
            {
                break;
            }

            printf("got number : %d\n", gotnum);
        }
    }

    return EXIT_SUCCESS;
}
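
For comparison, the C library's popen() packages the same pipe/fork/dup2/exec sequence behind a FILE pointer. The sketch below is an aside and not part of the author's code: first_line_of is a hypothetical helper, the command is run through the shell, and it reads a line of text rather than the raw ints that random-gen writes.

```c
#define _POSIX_C_SOURCE 200809L  /* for popen()/pclose() under strict -std modes */

#include <stdio.h>

/* Hypothetical helper: run `command` through the shell and copy the first
 * line of its standard output into `line` (at most size - 1 characters).
 * Returns 0 on success and -1 on failure. */
int first_line_of(const char *command, char *line, int size)
{
    FILE *fp = popen(command, "r");  /* "r": we read the child's stdout */
    if (fp == NULL) { return -1; }

    int got_line = fgets(line, size, fp) != NULL;

    /* pclose() closes the read end and waits for the child to finish. */
    int status = pclose(fp);

    return (got_line && status != -1) ? 0 : -1;
}
```

popen() is convenient for one-off captures, while the explicit pipe() version keeps full control over which descriptors the child inherits, which is what a shell needs.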

Back to the Command Interpreter: Can You Do Better with the pipe() System Call?

Here is another version I wrote (the one that uses pipe()):

http://w.csie.org/~b97073/B/faster-pipeline-shell.c

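This time I first count how many '|' characters the command contains,
which tells me how many pipes to prepare. Then I use fork to create one
process for each commandI, so that all the processes can run at the same
time.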
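
The counting step can be sketched in a few lines. This is an illustration only: count_pipes is a hypothetical helper, it is not taken from faster-pipeline-shell.c, and it assumes that '|' never appears inside quotes or escapes.

```c
/* Hypothetical helper: count the '|' separators in a command line.
 * A line with N commands contains N - 1 separators, so the result is
 * also the number of pipes the shell has to create. */
int count_pipes(const char *line)
{
    int count = 0;

    for (; *line != '\0'; ++line)
    {
        if (*line == '|') { ++count; }
    }

    return count;
}
```

For "command1 | command2 | command3" this gives two pipes for three commands: one between each pair of neighbours.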

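Implementing it with pipe() has another advantage: if command2 wants to
read something but command1 has not produced it yet, command2's read
simply waits. So we no longer have to run the commands one after another.
All the processes can run concurrently, except when blocked on IO. Using
pipe() also spares us the trouble of naming temporary files.

But with the pipe version you must be careful: every process that does
not need a write-end must remember to close it. Otherwise programs such
as cat or grep will never see EOF, and so will never terminate!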
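
The closing rule is easiest to see with exactly two commands. The following is a minimal self-contained sketch, not the author's shell: spawn and run_two_stage are hypothetical names, and error handling is kept to the bare minimum. It runs `left | right` and returns the exit status of right.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that runs argv with fd_in as stdin and fd_out as stdout.
 * fds holds both ends of the pipe; the child closes them after dup2(). */
static pid_t spawn(char **argv, int fd_in, int fd_out, int fds[2])
{
    pid_t pid = fork();

    if (pid == 0)
    {
        if (fd_in != STDIN_FILENO) { dup2(fd_in, STDIN_FILENO); }
        if (fd_out != STDOUT_FILENO) { dup2(fd_out, STDOUT_FILENO); }

        /* Only the dup2()'d copies are needed from here on. */
        close(fds[0]);
        close(fds[1]);

        execvp(argv[0], argv);
        _exit(127); /* reached only if execvp failed */
    }

    return pid;
}

/* Run `left | right`. Returns right's exit status, or -1 on failure. */
int run_two_stage(char **left, char **right)
{
    int fds[2];
    if (pipe(fds) == -1) { return -1; }

    pid_t p1 = spawn(left, STDIN_FILENO, fds[1], fds);
    pid_t p2 = spawn(right, fds[0], STDOUT_FILENO, fds);

    /* The parent must close its own copies too. If fds[1] stayed open
       here, `right` would never see EOF and waitpid() would hang. */
    close(fds[0]);
    close(fds[1]);

    int status = 0;
    int result = -1;

    if (p1 > 0) { waitpid(p1, NULL, 0); }
    if (p2 > 0 && waitpid(p2, &status, 0) == p2 && WIFEXITED(status))
    {
        result = WEXITSTATUS(status);
    }

    return result;
}
```

Commenting out the parent's close(fds[1]) is a quick way to reproduce the hang described above with a reader such as cat or grep.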

Let's take a quick look at the two functions execute_cmd_seq and creat_proc:

/* Purpose: Create several child processes and redirect the standard output
 * to the standard input of the later process.
 */
void execute_cmd_seq(char ***argvs)
{
    int C, P;

    int cmd_count = 0;
    while (argvs[cmd_count]) { ++cmd_count; }

    int pipeline_count = cmd_count - 1;

    int pipes_fd[MAX_CMD_COUNT][2];

    /* Prepare enough pipes */
    for (P = 0; P < pipeline_count; ++P)
    {
        if (pipe(pipes_fd[P]) == -1)
        {
            fprintf(stderr, "Error: Unable to create pipe. (%d)\n", P);
            exit(EXIT_FAILURE);
        }
    }

    for (C = 0; C < cmd_count; ++C)
    {
        int fd_in = (C == 0) ? (STDIN_FILENO) : (pipes_fd[C - 1][0]);
        int fd_out = (C == cmd_count - 1) ? (STDOUT_FILENO) : (pipes_fd[C][1]);

        /* Call creat_proc (below) to create the child process */
        creat_proc(argvs[C], fd_in, fd_out, pipeline_count, pipes_fd);
    }

    /* Once all the child processes have been created, the parent process
       itself no longer needs the pipes, so close every file descriptor. */
    for (P = 0; P < pipeline_count; ++P)
    {
        close(pipes_fd[P][0]);
        close(pipes_fd[P][1]);
    }

    /* Wait for all the programs to finish */
    for (C = 0; C < cmd_count; ++C)
    {
        int status;
        wait(&status);
    }
}

/* Purpose: Create child process and redirect io. */
void creat_proc(char **argv,
                int fd_in, int fd_out,
                int pipes_count, int pipes_fd[][2])
{
    pid_t proc = fork();

    if (proc < 0)
    {
        fprintf(stderr, "Error: Unable to fork.\n");
        exit(EXIT_FAILURE);
    }
    else if (proc == 0)
    {
        /* Use fd_in and fd_out as stdin and stdout respectively. */
        if (fd_in != STDIN_FILENO) { dup2(fd_in, STDIN_FILENO); }
        if (fd_out != STDOUT_FILENO) { dup2(fd_out, STDOUT_FILENO); }

        /* Apart from stdin and stdout, every file descriptor (pipe)
           must be closed. */
        int P;
        for (P = 0; P < pipes_count; ++P)
        {
            close(pipes_fd[P][0]);
            close(pipes_fd[P][1]);
        }

        if (execvp(argv[0], argv) == -1)
        {
            fprintf(stderr,
                    "Error: Unable to load the executable %s.\n",
                    argv[0]);

            exit(EXIT_FAILURE);
        }

        /* NEVER REACH */
        exit(EXIT_FAILURE);
    }
}
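
The wait loop in execute_cmd_seq discards each status. If the shell should know whether the commands succeeded, the status can be decoded with the macros from <sys/wait.h>. A minimal sketch follows; wait_for_children is a hypothetical helper, not part of the author's file.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: wait for `count` child processes.
 * Returns how many of them exited normally with status 0,
 * or -1 if wait() itself fails (for example, no children left). */
int wait_for_children(int count)
{
    int succeeded = 0;

    for (int i = 0; i < count; ++i)
    {
        int status;

        if (wait(&status) == -1) { return -1; }

        /* WIFEXITED: the child ended via exit or return, not a signal.
           WEXITSTATUS: the low 8 bits of the code it exited with. */
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        {
            ++succeeded;
        }
    }

    return succeeded;
}
```

wait() returns children in whatever order they finish, so this counts successes without tying a status to a particular command; waitpid() with a saved pid would be needed for that.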

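Conclusion

We started from a simple IO redirection program and went on to introduce
the exec, fork, dup2, and pipe system calls, and even wrote a simple
command interpreter. I hope these two short posts have helped you become
more familiar with these four system calls.

Note: most of the code from these two posts is available at the following
address: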

http://w.csie.org/~b97073/B/sp-article2.tar.gz

(End)

--
LoganChien ----- from PTT2 personal board logan -----

--
※ Posted from: 批踢踢實業坊 (ptt.cc)
◆ From: 140.112.247.159
※ Edited: LoganChien from: 140.112.247.159 (03/19 07:10)

→ xflash96: Push. (1F, 03/19 07:53)

Push qcl: Push! (2F, 03/19 09:33)

Push louisyou: Push! (3F, 03/19 09:36)

Push hanabi: Big push! (4F, 03/19 13:13)

→ Daniel1147: Push (5F, 03/19 20:45)

Push moonblack: Push (6F, 03/22 16:27)

→ dennis2030: Push (7F, 03/26 00:05)

Push averangeall: Amazing!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! (8F, 04/18 17:39)

→ Bingojkt: Push for the whole tutorial series (9F, 04/19 18:15)
