
Checking Whether a File Exists

Related functions

  • access
  • stat
  • inotify
  • opendir
  • readdir

Notes

The file system does not refresh its caches in real time, especially on network file systems. As a result, access/stat may keep returning "No such file" even though the file has already been created.
Yet ls can see the file, because ls and access are implemented differently.

For example:

On an NFS file system, running stat immediately after creating a file may report that the file does not exist.
Conversely, running stat immediately after deleting a file may still show the file.

We can switch to readdir and scan the directory instead, because directory contents are refreshed in the cache sooner.
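
For illustration, a minimal sketch of such a check (the exists_in_dir helper and the /tmp/myfile.txt names are made up for this example):

#include <dirent.h>
#include <cstdio>
#include <cstring>

// Returns true if `name` appears in directory `dir`.
// Unlike access()/stat(), this forces an actual directory read, and
// directory contents are typically refreshed sooner (e.g. on NFS).
bool exists_in_dir(const char* dir, const char* name) {
    DIR* dp = opendir(dir);
    if (dp == nullptr) return false;
    bool found = false;
    for (struct dirent* ent = readdir(dp); ent != nullptr; ent = readdir(dp)) {
        if (std::strcmp(ent->d_name, name) == 0) {
            found = true;
            break;
        }
    }
    closedir(dp);
    return found;
}

int main() {
    std::printf("%s\n", exists_in_dir("/tmp", "myfile.txt") ? "exists" : "missing");
    return 0;
}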

rpcgen

$ which rpcgen
/bin/rpcgen
$ rpcgen
usage: rpcgen infile
rpcgen [-abkCLNTM][-Dname[=value]] [-i size] [-I [-K seconds]] [-Y path] infile
rpcgen [-c | -h | -l | -m | -t | -Sc | -Ss | -Sm] [-o outfile] [infile]
rpcgen [-s nettype]* [-o outfile] [infile]
rpcgen [-n netid]* [-o outfile] [infile]
options:
-a generate all files, including samples
-b backward compatibility mode (generates code for SunOS 4.1)
-c generate XDR routines
-C ANSI C mode
-Dname[=value] define a symbol (same as #define)
-h generate header file
-i size size at which to start generating inline code
-I generate code for inetd support in server (for SunOS 4.1)
-K seconds server exits after K seconds of inactivity
-l generate client side stubs
-L server errors will be printed to syslog
-m generate server side stubs
-M generate MT-safe code
-n netid generate server code that supports named netid
-N supports multiple arguments and call-by-value
-o outfile name of the output file
-s nettype generate server code that supports named nettype
-Sc generate sample client code that uses remote procedures
-Ss generate sample server code that defines remote procedures
-Sm generate makefile template
-t generate RPC dispatch table
-T generate code to support RPC dispatch tables
-Y path directory name to find C preprocessor (cpp)

For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.

Note:

A % can be used to escape a line: any line that begins with % is treated as a literal string and placed directly into the output file.
Note that rpcgen may move such lines around, so check their placement in the output file carefully.

rpcgen provides an additional preprocessing feature: any line that begins with a percent sign (%) is passed directly to the output file, with no action on the line’s content. Use caution because rpcgen does not always place the lines where you intend. Check the output source file and, if needed, edit it.

For example:

#include "abc.h"

When `rpcgen infile` is executed, an error may be reported because "abc.h" does not exist.

But if a % is added at the beginning of the line, the check is bypassed.

%#include "abc.h"

From ChatGPT:

In general, when a program enters a signal handler in a multi-threaded environment, the behavior regarding other threads depends on how the signal handler is set up and the specific signal that is being handled.

  1. Default Behavior: By default, when a signal is delivered to a process, it interrupts the thread that is currently running and executes the signal handler in the context of that thread. Other threads in the process continue running unless they are also interrupted by signals.
  2. Thread-Specific Signal Handling: Some signals, such as SIGINT (interrupt signal), SIGTERM (termination signal), or SIGABRT (abort signal), are typically delivered to the entire process, which means they can interrupt any thread. However, other signals, like SIGSEGV (segmentation fault) or SIGILL (illegal instruction), are usually delivered to the specific thread that caused the signal.
  3. Signal Masking: In a multi-threaded program, you can use signal masking (pthread_sigmask on POSIX systems; sigprocmask has unspecified behavior in multi-threaded programs) to block certain signals in specific threads. This can affect whether a signal handler interrupts a particular thread or not (see the sketch after this list).
  4. Asynchronous-Signal-Safe Functions: Signal handlers should only call functions that are considered "asynchronous-signal-safe" according to POSIX standards. These functions are designed to be safe to call from within a signal handler. Using non-safe functions in a signal handler can lead to undefined behavior.
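
As a sketch of points 2 and 3 (my own example, not part of the quoted answer): a common multi-threaded pattern is to block a process-wide signal in every thread and receive it synchronously in one dedicated thread via sigwait, so that no handler ever interrupts the workers:

#include <pthread.h>
#include <signal.h>
#include <cstdio>
#include <thread>

int main() {
    // Block SIGINT in the main thread; threads created afterwards
    // inherit this signal mask.
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    pthread_sigmask(SIG_BLOCK, &set, nullptr);

    // A dedicated thread receives SIGINT synchronously via sigwait,
    // so no signal handler interrupts the worker threads.
    std::thread sig_thread([set] {
        int sig = 0;
        sigwait(&set, &sig);
        // This runs in normal thread context, not in a signal handler,
        // so non-async-signal-safe functions like printf are fine here.
        std::printf("received signal %d\n", sig);
    });

    // ... worker threads would run here, undisturbed by SIGINT ...

    sig_thread.join();
    return 0;
}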

1. Controlling the Number of Threads

  • Method 1: Use the environment variable TBB_NUM_THREADS for the global setting.
export TBB_NUM_THREADS=4

TODO: It doesn’t seem to work!

  • Method 2: Use tbb::task_arena or tbb::task_scheduler_init (Deprecated).

TBB will use this setting locally within the scope of the tbb::task_arena.

// Sketch assumes oneTBB (tbb::make_filter / tbb::filter_mode).
#include <tbb/parallel_pipeline.h>
// Deprecated:
// #include <tbb/task_scheduler_init.h>
#include <tbb/task_arena.h>

int main() {
    // Deprecated: tbb::task_scheduler_init init(4);
    tbb::task_arena arena(4); // 4 threads

    // The pipeline runs inside arena.execute() so that the arena's
    // concurrency limit actually applies to it.
    arena.execute([] {
        int item = 0;
        tbb::parallel_pipeline(
            /* max_number_of_live_tokens */ 4,
            // Input stage: produce items; stop the pipeline when there is no more data.
            tbb::make_filter<void, int>(tbb::filter_mode::serial_in_order,
                [&item](tbb::flow_control& fc) -> int {
                    if (item >= 100) {
                        fc.stop(); // Inform the pipeline that there is no more data
                        return 0;
                    }
                    return item++;
                }) &
            // Processing stage: your pipeline logic here
            tbb::make_filter<int, void>(tbb::filter_mode::parallel,
                [](int /* value */) {
                    // ...
                }));
    });

    return 0;
}

2. parallel_for

API: parallel_for

  1. my_parallel_for simulates the implementation of parallel_for:
my_parallel_for.cpp
#include <tbb/tbb.h>
#include <functional>
#include <iostream>
#include <vector>

// Simulate the internal implementation of parallel_for
void my_parallel_for(const tbb::blocked_range<size_t>& range,
                     const std::function<void(const tbb::blocked_range<size_t>&)>& body) {
    if (range.is_divisible()) {
        // Split the range in half
        tbb::blocked_range<size_t> left(range.begin(), range.begin() + (range.end() - range.begin()) / 2);
        tbb::blocked_range<size_t> right(left.end(), range.end());

        // Recurse on both halves in parallel
        tbb::parallel_invoke(
            [&] { my_parallel_for(left, body); },
            [&] { my_parallel_for(right, body); }
        );
    } else {
        // Process the current range
        body(range);
    }
}

int main() {
    std::vector<int> data(100);
    for (int i = 0; i < 100; ++i) {
        data[i] = i;
    }

    // Use the custom parallel_for for parallel processing
    my_parallel_for(tbb::blocked_range<size_t>(0, data.size()), [&](const tbb::blocked_range<size_t>& r) {
        for (size_t i = r.begin(); i != r.end(); ++i) {
            data[i] *= 2; // Example operation: double each element
        }
    });

    // Print the result
    for (const auto& val : data) {
        std::cout << val << " ";
    }
    std::cout << std::endl;

    return 0;
}
  2. The thread that issues the tasks also becomes one of the worker threads and takes part in executing them. Test code below:
test_parallel_for.cpp
#include <tbb/tbb.h>

#include <algorithm>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>
using namespace std;
using namespace tbb;

static std::atomic<int> total_blocks(0);  // Atomic counter to track the number of tasks processed
static std::atomic<int> total_blocks2(0); // Atomic counter to track the number of tasks processed

// Function to process each data element
void process_data(int i, std::mutex& mtx) {
    // std::this_thread::sleep_for(std::chrono::milliseconds(1)); // Simulate some processing time
    for (volatile int k = 0; k < 100; ++k)
        ;
}

int main(int argc, char* argv[]) {
    if (argc < 3) {
        std::cerr << "Usage: " << argv[0] << " <number_of_elements> <grain_size>" << std::endl;
        return 1;
    }

    int num_elements = std::stoi(argv[1]);
    int grain_size = std::stoi(argv[2]);

    std::mutex mtx;
    std::vector<int> data(num_elements);
    for (int i = 0; i < num_elements; ++i) {
        data[i] = i;
    }

    // {
    //     std::lock_guard<std::mutex> lock(mtx);
    //     std::cout << "Main thread ID: " << std::this_thread::get_id() << std::endl;
    // }

    tbb::concurrent_unordered_map<std::thread::id, tbb::concurrent_vector<int>> thread_task_counts;  // To store task counts for each thread
    tbb::concurrent_unordered_map<std::thread::id, tbb::concurrent_vector<int>> thread_task_counts2; // To store task counts for each thread

    tbb::parallel_for(0, static_cast<int>((data.size() + grain_size - 1) / grain_size), [&](int i) {
        total_blocks++;
        int cnt = 0; // Thread-local counter to avoid data races
        for (int j = i * grain_size; j < std::min(static_cast<int>(data.size()), (i + 1) * grain_size); ++j) {
            process_data(j, mtx);
            ++cnt;
        }
        thread_task_counts[std::this_thread::get_id()].push_back(cnt);
    });

    tbb::parallel_for(blocked_range<int>(0, static_cast<int>(data.size()), grain_size), [&](const blocked_range<int>& r) {
        total_blocks2++;
        int cnt = 0; // Thread-local counter to avoid data races
        for (int i = r.begin(); i < r.end(); ++i) {
            process_data(i, mtx);
            ++cnt;
        }
        thread_task_counts2[std::this_thread::get_id()].push_back(cnt);
    });

    std::cout << "Total blocks processed: " << total_blocks.load() << std::endl;
    for (const auto& pair : thread_task_counts) {
        std::cout << "Thread " << pair.first << " processed: ";
        for (const auto& task_count : pair.second)
            std::cout << task_count << ", ";
        std::cout << std::endl;
    }

    std::cout << "Total blocks2 processed: " << total_blocks2.load() << std::endl;
    for (const auto& pair : thread_task_counts2) {
        std::cout << "Thread " << pair.first << " processed: ";
        for (const auto& task_count : pair.second)
            std::cout << task_count << ", ";
        std::cout << std::endl;
    }

    return 0;
}

Test results:

$ ./test_parallel_for 
Main thread ID: 140220582070080
Processing data: 2 on thread 140220582070080
Processing data: 6 on thread 140220557755968
Processing data: 4 on thread 140220574795328
Processing data: 8 on thread 140220566275648
Processing data: 10 on thread 140220562015808

As shown, data 2 was processed by the main thread. In other words, although parallel_for is called a blocking parallel construct, the calling thread is not actually blocked while it waits for all tasks to finish: it can also act as a worker thread and execute tasks from the task pool.

Simulating the wait of parallel_for in code:

my_task_scheduler.cpp
#include <iostream>
#include <vector>
#include <thread>
#include <functional>
#include <condition_variable>
#include <queue>

class TaskScheduler {
public:
    TaskScheduler(size_t numThreads);
    ~TaskScheduler();
    void enqueue(std::function<void()> task);
    void wait();

private:
    std::vector<std::thread> workers;
    std::queue<std::function<void()>> tasks;
    std::mutex queueMutex;
    std::condition_variable condition;
    std::condition_variable finished;
    bool stop;
    size_t activeTasks;

    void workerThread();
    void executeTask();
};

TaskScheduler::TaskScheduler(size_t numThreads) : stop(false), activeTasks(0) {
    for (size_t i = 0; i < numThreads; ++i) {
        workers.emplace_back(&TaskScheduler::workerThread, this);
    }
}

TaskScheduler::~TaskScheduler() {
    {
        std::unique_lock<std::mutex> lock(queueMutex);
        stop = true;
    }
    condition.notify_all();
    for (std::thread &worker : workers) {
        worker.join();
    }
}

void TaskScheduler::enqueue(std::function<void()> task) {
    {
        std::unique_lock<std::mutex> lock(queueMutex);
        tasks.push(std::move(task));
    }
    condition.notify_one();
}

void TaskScheduler::wait() {
    std::unique_lock<std::mutex> lock(queueMutex);
    while (!tasks.empty() || activeTasks > 0) {
        // If tasks remain, execute one instead of blocking the current thread.
        // executeTask() locks queueMutex itself, so release the lock first to
        // avoid recursive locking.
        if (!tasks.empty()) {
            lock.unlock();
            executeTask();
            lock.lock();
        } else {
            finished.wait(lock);
        }
    }
}

void TaskScheduler::workerThread() {
    while (true) {
        std::function<void()> task;
        {
            std::unique_lock<std::mutex> lock(queueMutex);
            condition.wait(lock, [this] { return stop || !tasks.empty(); });
            if (stop && tasks.empty()) return;
            task = std::move(tasks.front());
            tasks.pop();
            ++activeTasks;
        }
        task();
        {
            std::unique_lock<std::mutex> lock(queueMutex);
            --activeTasks;
            if (tasks.empty() && activeTasks == 0) {
                finished.notify_all();
            }
        }
    }
}

void TaskScheduler::executeTask() {
    std::function<void()> task;
    {
        std::unique_lock<std::mutex> lock(queueMutex);
        if (tasks.empty()) return;
        task = std::move(tasks.front());
        tasks.pop();
        ++activeTasks;
    }
    task();
    {
        std::unique_lock<std::mutex> lock(queueMutex);
        --activeTasks;
        if (tasks.empty() && activeTasks == 0) {
            finished.notify_all();
        }
    }
}

void parallel_for(int start, int end, std::function<void(int)> func) {
    static TaskScheduler scheduler(std::thread::hardware_concurrency());
    for (int i = start; i < end; ++i) {
        scheduler.enqueue([i, &func] { func(i); });
    }
    scheduler.wait();
}

int main() {
    const int N1 = 100;
    const int N2 = 100;

    // The first parallel loop.
    parallel_for(0, N1, [&](int i) {
        // The second parallel loop.
        parallel_for(0, N2, [&](int j) {
            // Some work
        });
        // After issuing the inner parallel_for, this thread must wait for all
        // tasks of the inner parallel loop to finish; while waiting, it may
        // keep taking and executing tasks of the outer parallel loop.
    });

    return 0;
}

3. The TBB Thread Pool

TBB appears to always use the same global thread pool. Test code below:

tbb_thread_pool.cpp
#include <tbb/tbb.h>
#include <unistd.h>

#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

using namespace std;

std::mutex mtx;

void task_group_function() {
    tbb::task_group tg;
    int max_concurrency = tbb::this_task_arena::max_concurrency();
    {
        std::lock_guard<std::mutex> lock(mtx);
        cout << "Task group max concurrency: " << max_concurrency << endl;
    }
    for (int i = 0; i < 16; ++i) {
        tg.run([i] {
            {
                std::lock_guard<std::mutex> lock(mtx);
                std::cout << "Task group thread " << std::this_thread::get_id() << " is running." << std::endl;
            }
            sleep(1);
        });
    }
    tg.wait();
}

void task_arena_function() {
    tbb::task_arena arena(4);
    int max_concurrency = arena.max_concurrency();
    {
        std::lock_guard<std::mutex> lock(mtx);
        cout << "Task arena max concurrency: " << max_concurrency << endl;
    }
    arena.execute([] {
        tbb::parallel_for(0, 16, [](int i) {
            {
                std::lock_guard<std::mutex> lock(mtx);
                std::cout << "Task arena thread " << std::this_thread::get_id() << " is running." << std::endl;
            }
            sleep(2);
        });
    });
}

int main() {
    // Query the maximum concurrency of the default task_arena
    int arena_max_concurrency = tbb::this_task_arena::max_concurrency();
    std::cout << "Default task_arena max concurrency: " << arena_max_concurrency << std::endl;

    // Create two threads
    std::thread tg_thread(task_group_function);
    std::thread ta_thread(task_arena_function);

    // Wait for both threads to finish
    tg_thread.join();
    ta_thread.join();

    return 0;
}

Test:

$ mkdir build && cd build && cmake .. && make
$ ./tbb_thread_pool > result.txt
$ cat result.txt | grep running | sort | uniq
Task arena thread 140667163379264 is running.
Task arena thread 140667167639104 is running.
Task arena thread 140667184678464 is running.
Task arena thread 140667201848896 is running.
Task group thread 140667167639104 is running.
Task group thread 140667171898944 is running.
Task group thread 140667176158784 is running.
Task group thread 140667180418624 is running.
Task group thread 140667188938304 is running.
Task group thread 140667210303040 is running.

The following two log lines show that the arena and the group reused the same thread ID, which indicates that they belong to the same global thread pool.

Task arena thread 140667167639104 is running.
Task group thread 140667167639104 is running.

Furthermore, we find that the total number of threads in the global thread pool is adaptive: in this example it is 10, which is neither the task_group's 8
nor the task_arena's 4:

TODO

$ cat result.txt | grep running | sort | uniq | wc -l
10

4. The Task Scheduler

The Task Scheduler

4.1. Task-Based Programming

When performance matters, it is recommended to program in terms of logical tasks rather than threads, for the following reasons:

  • Matches parallelism to available resources
  • Faster task startup and shutdown
  • More efficient evaluation order
  • Improved load balancing
  • Higher-level thinking

TODO

4.2. How the Task Scheduler Works

How Task Scheduler Works

4.2.1. Depth-First Execution

Each thread has its own double-ended queue (deque); its head is called the top and its tail the bottom.
The bottom is the deepest (most recent) end of the deque: tasks at the bottom are the newest, and tasks at the top are the oldest.

Depth-first execution has the following benefits:

  • Hot cache: the newest task has the hottest cache, so new tasks are executed first.
  • Minimal space: breadth-first execution would create an exponential number of coexisting nodes, whereas depth-first execution creates the same total number of nodes but keeps only a linear number alive at any time, since it maintains a stack of the other ready tasks.

Production: when a thread spawns a task, it pushes it onto the bottom of its own deque.

Consumption: when a thread needs a task to execute, it picks one according to the following rules, in order:

  • Rule 1: take the task returned by the previous task, if there is one;
  • Rule 2: take a task from the bottom of the thread's own deque (depth-first), if there is one;
  • Rule 3: pick another thread's deque at random and steal a task from its top (breadth-first). If the chosen deque is empty, repeat this rule until it succeeds.

Rule 1 is known as task scheduler bypass.

Rule 2 is depth-first: it lets the current thread keep executing the newest tasks until it has finished all of its work.

Rule 3 is temporary breadth-first execution: it converts potential parallelism into actual parallelism. A minimal sketch of Rules 2 and 3 follows below.
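
The sketch is illustrative only: the per-deque mutex and all names are invented here, and real schedulers such as TBB's use lock-free deques rather than a mutex per deque.

#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <random>
#include <vector>

using Task = std::function<void()>;

struct WorkerDeque {
    std::mutex m;
    std::deque<Task> q; // back = bottom (newest), front = top (oldest)
};

std::vector<WorkerDeque> deques(4); // one deque per worker thread

// Rule 2: the owner pops the newest task from the bottom (depth-first).
std::optional<Task> pop_own(int self) {
    std::lock_guard<std::mutex> lk(deques[self].m);
    if (deques[self].q.empty()) return std::nullopt;
    Task t = std::move(deques[self].q.back());
    deques[self].q.pop_back();
    return t;
}

// Rule 3: steal the oldest task from the top of a random victim's deque
// (breadth-first). This sketch gives up after one pass over all victims
// instead of retrying forever.
std::optional<Task> steal(int self) {
    static thread_local std::mt19937 rng{std::random_device{}()};
    int n = static_cast<int>(deques.size());
    int start = std::uniform_int_distribution<int>(0, n - 1)(rng);
    for (int k = 0; k < n; ++k) {
        int victim = (start + k) % n;
        if (victim == self) continue;
        std::lock_guard<std::mutex> lk(deques[victim].m);
        if (!deques[victim].q.empty()) {
            Task t = std::move(deques[victim].q.front());
            deques[victim].q.pop_front();
            return t;
        }
    }
    return std::nullopt;
}

int main() {
    deques[0].q.push_back([] { /* old task */ });
    deques[0].q.push_back([] { /* new task */ });
    if (auto t = pop_own(0)) (*t)(); // worker 0 runs its newest task first
    if (auto t = steal(1)) (*t)();   // worker 1 steals worker 0's oldest task
    return 0;
}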

4.2.2. Task Scheduler Bypass

From creation to execution, a task goes through the following steps:

  • Push the new task onto the thread's deque.
  • Run the current task to completion.
  • Pop a task from the thread's deque and execute it, unless it has been stolen by another thread.

Steps 1 and 3 introduce unnecessary deque operations; worse, allowing the task to be stolen here can hurt locality without adding significant parallelism.
The task scheduler bypass technique avoids these problems by directly returning the next task to execute instead of spawning it:
by Rule 1, the new task produced by the previous task becomes the first candidate.
Moreover, this technique almost guarantees that the new task is executed by the current thread rather than by another one.
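
A conceptual sketch of the idea (invented types, not the TBB API): a task's execute() returns the next task to run, and the worker loop executes it immediately, skipping the spawn/pop deque round trip.

#include <cstdio>

struct Task {
    virtual ~Task() = default;
    virtual Task* execute() = 0; // returns the bypass task, or nullptr
};

struct Second : Task {
    Task* execute() override {
        std::printf("second task\n");
        return nullptr; // nothing to bypass to
    }
};

struct First : Task {
    Task* execute() override {
        std::printf("first task\n");
        static Second next;
        return &next; // bypass: hand the successor straight back to the worker
    }
};

int main() {
    First first;
    Task* t = &first;
    while (t != nullptr) {
        t = t->execute(); // worker loop: keep executing bypassed tasks
    }
    return 0;
}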

Note: currently, the only way to use this optimization technique is through tbb::task_group.

4.3. Guiding Task Scheduler Execution

Guiding Task Scheduler Execution

By default, the task scheduler tries to use all available compute resources. In some cases, you may want to configure it to use only some of them.

Note: guiding the task scheduler's execution may cause composability problems.

TBB provides the task_arena interface to guide how tasks are executed inside an arena, by:

  • setting the preferred compute units;
  • restricting the arena to a subset of the compute units.
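
For example, a minimal sketch (assuming oneTBB) that caps a parallel loop at two threads by running it inside a constrained arena:

#include <oneapi/tbb/parallel_for.h>
#include <oneapi/tbb/task_arena.h>

int main() {
    // An arena whose concurrency is capped at 2; work submitted through
    // arena.execute() will use at most 2 threads.
    oneapi::tbb::task_arena arena(2);
    arena.execute([] {
        oneapi::tbb::parallel_for(0, 1000, [](int) {
            /* Some work */
        });
    });
    return 0;
}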

4.4. Work Isolation

Work Isolation

work_isolation_eg1.cpp
// The first parallel loop.
oneapi::tbb::parallel_for( 0, N1, []( int i ) {
    // The second parallel loop.
    oneapi::tbb::parallel_for( 0, N2, []( int j ) { /* Some work */ } );
} );

If the current thread is "blocked" by the inner parallel_for (it is not truly blocked, which is why parallel_for is merely called a blocking parallel construct), the thread is allowed to pick up and execute tasks of the first loop. As a result, out-of-order execution can occur even within a single thread. In most cases this is harmless.

In a few cases, however, this can cause errors; for example, a thread-local variable may be unexpectedly modified outside the nested parallel construct:

work_isolation_eg2.cpp
oneapi::tbb::enumerable_thread_specific<int> ets;
oneapi::tbb::parallel_for( 0, N1, [&ets]( int i ) {
    // Set a thread specific value
    ets.local() = i;
    oneapi::tbb::parallel_for( 0, N2, []( int j ) { /* Some work */ } );
    // While executing the above parallel_for, the thread might have run iterations
    // of the outer parallel_for, and so might have changed the thread specific value.
    assert( ets.local()==i ); // The assertion may fail!
} );

In other scenarios, this behavior may lead to deadlocks or other problems. In such cases, a stronger guarantee of execution order within a thread is needed. For this purpose, TBB provides ways to isolate the execution of a parallel construct, so that its tasks do not interfere with other tasks running concurrently.

One approach is to run the inner loop in a separate task_arena:

work_isolation_eg3.cpp
oneapi::tbb::enumerable_thread_specific<int> ets;
oneapi::tbb::task_arena nested;
oneapi::tbb::parallel_for( 0, N1, [&]( int i ) {
    // Set a thread specific value
    ets.local() = i;
    nested.execute( []{
        // Run the inner parallel_for in a separate arena to prevent the thread
        // from taking tasks of the outer parallel_for.
        oneapi::tbb::parallel_for( 0, N2, []( int j ) { /* Some work */ } );
    } );
    assert( ets.local()==i ); // Valid assertion
} );

However, using a separate arena for work isolation is not always convenient and may introduce noticeable overhead. To address these drawbacks, TBB provides the this_task_arena::isolate function, which runs a user-provided function object in isolation by restricting the calling thread to process only tasks scheduled within the scope of that function object (also called the isolation region).

When a thread enters a task wait call or blocks in a parallel construct inside an isolation region, it may only execute tasks spawned within that isolation region, together with their child tasks (in other words, even if a child task was spawned by another thread, the current thread may execute it as long as it belongs to the current isolation region). The thread is forbidden from executing any outer-level tasks or tasks belonging to other isolation regions.

The following example shows how this_task_arena::isolate guarantees that a thread-local variable is not unexpectedly modified during a nested parallel construct:

work_isolation_eg4.cpp
#include "oneapi/tbb/task_arena.h"
#include "oneapi/tbb/parallel_for.h"
#include "oneapi/tbb/enumerable_thread_specific.h"
#include <cassert>


int main() {
const int N1 = 1000, N2 = 1000;
oneapi::tbb::enumerable_thread_specific<int> ets;
oneapi::tbb::parallel_for( 0, N1, [&ets]( int i ) {
// Set a thread specific value
ets.local() = i;
// Run the second parallel loop in an isolated region to prevent the current thread
// from taking tasks related to the outer parallel loop.
oneapi::tbb::this_task_arena::isolate( []{
oneapi::tbb::parallel_for( 0, N2, []( int j ) { /* Some work */ } );
} );
assert( ets.local()==i ); // Valid assertion
} );
return 0;
}

Supplement: a simple example of how other threads can spawn child tasks inside an isolation region, and how the current thread can execute those child tasks.

Suppose an isolation region contains two threads, thread A and thread B. We spawn some tasks inside the region, and those tasks may in turn spawn child tasks.

work_isolation_eg5.cpp
#include <tbb/tbb.h>
#include <iostream>

void taskA() {
    std::cout << "Task A executed by thread " << tbb::this_task_arena::current_thread_index() << std::endl;
    tbb::parallel_invoke(
        [] {
            std::cout << "Subtask A1 executed by thread " << tbb::this_task_arena::current_thread_index() << std::endl;
        },
        [] {
            std::cout << "Subtask A2 executed by thread " << tbb::this_task_arena::current_thread_index() << std::endl;
        }
    );
}

void taskB() {
    std::cout << "Task B executed by thread " << tbb::this_task_arena::current_thread_index() << std::endl;
    tbb::parallel_invoke(
        [] {
            std::cout << "Subtask B1 executed by thread " << tbb::this_task_arena::current_thread_index() << std::endl;
        },
        [] {
            std::cout << "Subtask B2 executed by thread " << tbb::this_task_arena::current_thread_index() << std::endl;
        }
    );
}

int main() {
    tbb::task_arena arena;
    arena.execute([&] {
        tbb::this_task_arena::isolate([&] {
            tbb::parallel_invoke(taskA, taskB);
        });
    });
    return 0;
}

In this example:

taskA and taskB are tasks spawned inside the isolation region.
taskA spawns two child tasks, Subtask A1 and Subtask A2.
taskB spawns two child tasks, Subtask B1 and Subtask B2.
Suppose thread A executes taskA and thread B executes taskB. Inside the isolation region, thread A and thread B may execute each other's child tasks: thread A may execute Subtask B1 or Subtask B2, and thread B may execute Subtask A1 or Subtask A2, as long as those subtasks belong to the same isolation region.

5. Recommended Reading

5.1. Books

  1. Intel Threading Building Blocks. James Reinders.
  2. Patterns for Parallel Programming. Timothy Mattson et al.
  3. Design Patterns: Elements of Reusable Object-Oriented Software (Addison-Wesley). Gamma, Helm, Johnson, and Vlissides.

gcc

gcc is a compiler suite that includes compilers for C, C++, and Fortran.

glibc

glibc is a library that provides basic common functionality for C programs, including system calls, math functions, and other core components.
Both the Linux platform and VS Code appear to depend on glibc: if you carelessly change LD_LIBRARY_PATH to point at a different glibc version, bash will crash immediately.

glibc ships the following binaries and libraries:

$ cd glibc-v2.34/Linux/RHEL7.0-2017-x86_64/bin && ls
catchsegv getconf iconv locale makedb pcprofiledump sotruss tzselect zdump
gencat getent ldd localedef mtrace pldd sprof xtrace

# Running ls inside another glibc version's lib directory fails, probably because that glibc's libraries conflict with the system libraries.
$ cd ../lib && ls
ls: relocation error: ./libc.so.6: symbol __tunable_get_val, version GLIBC_PRIVATE not defined in file ld-linux-x86-64.so.2 with link time reference

$ cd .. && ls lib
Mcrt1.o libanl.so.1 libm.so libnss_hesiod.so.2
Scrt1.o libc.a libm.so.6 libpcprofile.so
audit libc.so libmcheck.a libpthread.a
crt1.o libc.so.6 libmemusage.so libpthread.so.0
crti.o libc_malloc_debug.so libmvec.a libresolv.a
crtn.o libc_malloc_debug.so.0 libmvec.so libresolv.so
gconv libc_nonshared.a libmvec.so.1 libresolv.so.2
gcrt1.o libcrypt.a libnsl.so.1 librt.a
ld-linux-x86-64.so.2 libcrypt.so libnss_compat.so librt.so.1
libBrokenLocale.a libcrypt.so.1 libnss_compat.so.2 libthread_db.so
libBrokenLocale.so libdl.a libnss_db.so libthread_db.so.1
libBrokenLocale.so.1 libdl.so.2 libnss_db.so.2 libutil.a
libSegFault.so libg.a libnss_dns.so.2 libutil.so.1
libanl.a libm-2.34.a libnss_files.so.2
libanl.so libm.a libnss_hesiod.so

Check the glibc version:

# As shown above, ldd is one of glibc's core components
$ ldd --version

Find the path of libc.so:

$ locate libc.so
/usr/lib/x86_64-linux-gnu/libc.so
/usr/lib/x86_64-linux-gnu/libc.so.6
$ locate libstdc++.so
/usr/lib/gcc/x86_64-linux-gnu/11/libstdc++.so
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30
/usr/share/gdb/auto-load/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30-gdb.py

Install glibc:

On Ubuntu:

sudo apt-get install libc6

On RedHat:

sudo yum install glibc

Check the version of the GNU C++ Library (libstdc++):

$ strings /usr/lib/libstdc++.so.* | grep LIBCXX
GLIBCXX_3.4
GLIBCXX_3.4.1
GLIBCXX_3.4.2
...
GLIBCXX_3.4.19
GLIBCXX_DEBUG_MESSAGE_LENGTH

$ strings /usr/lib/libc.so.* | grep GLIBC
GLIBC_2.0
GLIBC_2.1
GLIBC_2.1.1
...
GLIBC_2.17
GLIBC_PRIVATE

If you have a specific binary or application that uses libstdc++, you can check which version it links against with:

$ ldd <your_binary_or_application> | grep libstdc++

When using VS Code's "Remote SSH" extension to connect to a Linux host, you may see errors like:

Warning: Missing GLIBCXX >= 3.4.25! from /usr/lib64/libstdc++.so.6.0.19
Warning: Missing GLIBC >= 2.28! from /usr/lib64/libc-2.17.so
Error: Missing required dependencies. Please refer to our FAQ https://aka.ms/vscode-remote/faq/old-linux for additional information.

This happens because the glibc/libstdc++ on the Linux system is too old to provide GLIBCXX_3.4.25 or later. In that case you need to downgrade VS Code (the recommended approach) or upgrade glibc (which appears to be hard).

times

  1. bash built-in
times
  2. function
#include <sys/times.h>

clock_t times(struct tms *buf);
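
A minimal usage sketch (the busy loop is just a stand-in workload): times() reports CPU times in clock ticks, which can be converted to seconds via sysconf(_SC_CLK_TCK).

#include <sys/times.h>
#include <unistd.h>
#include <cstdio>

int main() {
    struct tms start_cpu, end_cpu;
    clock_t wall_start = times(&start_cpu); // also returns elapsed wall-clock ticks

    for (volatile long i = 0; i < 100000000L; ++i) {} // stand-in workload

    clock_t wall_end = times(&end_cpu);
    const double ticks = static_cast<double>(sysconf(_SC_CLK_TCK));

    std::printf("real: %.2fs\n", (wall_end - wall_start) / ticks);
    std::printf("user: %.2fs\n", (end_cpu.tms_utime - start_cpu.tms_utime) / ticks);
    std::printf("sys:  %.2fs\n", (end_cpu.tms_stime - start_cpu.tms_stime) / ticks);
    return 0;
}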

malloc/free

See this example

char ** backtrace_symbols (void *const *buffer, int size) 

The return value of backtrace_symbols is a pointer obtained via the malloc function, and it is the responsibility of the caller to free that pointer. Note that only the return value need be freed, not the individual strings.

Question: Why does it say “only the return value need be freed, not the individual strings”?

Let us look at the definitions of the malloc/free functions first:

void *malloc( size_t size );
void free( void *ptr );

free takes a void* pointer to the memory to deallocate; it does not care about the pointer's type, even if it is a multi-level pointer. This implies that malloc stores the size of the allocation somewhere, and free looks it up before deallocating the memory.

Let us return to the question. The pointer returned by backtrace_symbols has type char**, but it points to a single contiguous block of memory obtained with one malloc call and cast to char** on return. So when we free that block, the allocator finds its actual size and deallocates the whole block, strings included, at once.

Example:

#include <string.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
    char** strings = (char**)malloc(3 * sizeof(char*) + 3 * 50); // assuming a maximum of 50 characters per sentence
    char* block = (char*)(strings + 3);
    char* s1 = strcpy(block, "The first sentence");  block += strlen(s1) + 1;
    char* s2 = strcpy(block, "The second sentence"); block += strlen(s2) + 1;
    char* s3 = strcpy(block, "The third sentence");
    strings[0] = s1;
    strings[1] = s2;
    strings[2] = s3;
    for (int i = 0; i < 3; ++i) {
        printf("%s\n", strings[i]);
    }
    free(strings); // deallocate all memory at once

    return 0;
}

More elegant but less economical code:

#include <string.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
    char** strings = (char**)malloc(3 * sizeof(char*) + 3 * 50);
    char* block = (char*)(strings + 3);
    for (int i = 0; i < 3; ++i) {
        strings[i] = block + i * 50; // Assuming a maximum of 50 characters per sentence
    }
    strcpy(strings[0], "The first sentence");
    strcpy(strings[1], "The second sentence");
    strcpy(strings[2], "The third sentence");

    for (int i = 0; i < 3; ++i) {
        printf("%s\n", strings[i]);
    }

    free(strings); // deallocate all memory at once

    return 0;
}

Reference

shuf

cut

tr

lp

sort

Options:

-t, --field-separator=SEP
    use SEP instead of non-blank to blank transition

-k, --key=POS1[,POS2]
    start a key at POS1 (origin 1), end it at POS2 (default end of line)

-h, --human-numeric-sort
    compare human readable numbers (e.g., 2K 1G)

-n, --numeric-sort
    compare according to string numerical value

nproc

print the number of processing units available.

od / xxd / hexdump

read binary files.

Notes: byte order. xxd prints the bytes in file order, while hexdump by default groups them into 16-bit little-endian words, so the bytes 0x41 0x42 ("AB") appear as 4142 under xxd but as 4241 under hexdump:

$ echo -n "ABCD" | xxd
00000000: 4142 4344 ABCD
$ echo -n "ABCD" | hexdump
0000000 4241 4443
0000004

Reference

comm / diff / tkdiff / cmp

Can be used to compare binary or non-binary files.

comm

compare two sorted files line by line.

$ cat file1.txt 
apple
banana
cherry

$ cat file2.txt
banana
cherry
date
erase

$ comm file1.txt file2.txt
apple
		banana
		cherry
	date
	erase

The files must be sorted before using the comm command. Otherwise it will complain that:

comm: file 1 is not in sorted order

and cannot work correctly. For example,

$ cat file1.txt 
apple
cherry
banana

$ cat file2.txt
banana
cherry
date
erase

$ comm file1.txt file2.txt
apple
	banana
		cherry
comm: file 1 is not in sorted order
banana
	date
	erase
comm: input is not in sorted order

diff

Syntax:

diff -u file1 file2

Options:

-e, --ed
    output an ed script

-u, -U NUM, --unified[=NUM]
    output NUM (default 3) lines of unified context
    (that is, print NUM lines before and after the difference line)

tkdiff

Use a GUI to display the differences.

cmp

Prints less information compared to diff.

Syntax:

cmp file1 file2

ed/vim/sed/awk