How to Optimize the Efficiency of C++ Multithreading

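Multithreading lets a program take advantage of multi-core processors and run faster. It is not easy to get right, though: developers have to think about thread safety, synchronization, and performance. This article walks through several ways to make multithreaded C++ code more efficient.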

1. Use a thread pool

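Creating and destroying threads is expensive, and doing it frequently costs performance. A thread pool creates a fixed set of worker threads once and reuses them for every task, which avoids that repeated cost.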

#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <iostream>
#include <memory>
#include <mutex>
#include <queue>
#include <stdexcept>
#include <thread>
#include <vector>

class ThreadPool {
public:
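    ThreadPool(size_t numThreads) : stop(false) {
        for (size_t i = 0; i < numThreads; ++i) {
            // Each worker loops, pulling tasks off the shared queue.
            workers.emplace_back([this] {
                for (;;) {
                    std::function<void()> task;
                    {
                        std::unique_lock<std::mutex> lock(queueMutex);
                        // Sleep until there is work to do or the pool is shutting down.
                        condition.wait(lock, [this] { return stop || !tasks.empty(); });
                        // Exit only after the queue has been drained.
                        if (stop && tasks.empty()) return;
                        task = std::move(tasks.front());
                        tasks.pop();
                    }
                    // Run the task outside the lock so other workers can dequeue.
                    task();
                }
            });
        }
    }

    ~ThreadPool() {
        {
            std::unique_lock<std::mutex> lock(queueMutex);
            stop = true;
        }
        // Wake every worker so each one can observe stop and exit.
        condition.notify_all();
        for (std::thread& worker : workers) {
            worker.join();
        }
    }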

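    // Note: std::result_of is deprecated in C++17 and removed in C++20;
    // use std::invoke_result when building with those standards.
    template<class F, class... Args>
    auto enqueue(F&& f, Args&&... args) -> std::future<typename std::result_of<F(Args...)>::type> {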
        using return_type = typename std::result_of<F(Args...)>::type;
        auto task = std::make_shared<std::packaged_task<return_type()>>(std::bind(std::forward<F>(f), std::forward<Args>(args)...));
        std::future<return_type> res = task->get_future();
        {
            std::unique_lock<std::mutex> lock(queueMutex);
            if (stop) {
                throw std::runtime_error("enqueue on stopped ThreadPool");
            }
            tasks.emplace([task]() { (*task)(); });
        }
        condition.notify_one();
        return res;
    }

private:
    std::vector<std::thread> workers;
    std::queue<std::function<void()>> tasks;

    std::mutex queueMutex;
    std::condition_variable condition;
    bool stop;
};

int main() {
    ThreadPool pool(4);

    std::vector<std::future<int>> results;
    for (int i = 0; i < 8; ++i) {
        results.emplace_back(pool.enqueue([i] {
            std::this_thread::sleep_for(std::chrono::seconds(1));
            return i * i;
        }));
    }

    for (auto&& result : results) {
        std::cout << result.get() << ' ';
    }
    std::cout << std::endl;

    return 0;
}

Output:

0 1 4 9 16 25 36 49

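2. Reduce lock contention

When several threads compete for the same lock, they spend time waiting instead of working, so locking should be kept to a minimum. Contention can be reduced in the following ways:

Use finer-grained locking: keep each critical section as small as possible so that the lock is held for the shortest possible time.
Use lock-free techniques: std::atomic types and std::atomic_flag support atomic operations without an explicit lock. Only std::atomic_flag is guaranteed to be lock-free; for other types, is_lock_free() reports whether the implementation uses a lock internally.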
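
As a rough illustration of shortening the time a lock is held, the sketch below has each thread accumulate into a local variable and take the mutex once to merge its result, instead of locking on every addition. The function name sum_with_local_accumulation and its parameters are made up for this example.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: every thread sums 1..perThread, and the partial sums
// are merged into one shared total.
long long sum_with_local_accumulation(std::size_t numThreads, int perThread) {
    long long total = 0;
    std::mutex totalMutex;
    std::vector<std::thread> threads;
    threads.reserve(numThreads);

    for (std::size_t t = 0; t < numThreads; ++t) {
        threads.emplace_back([&total, &totalMutex, perThread] {
            // The bulk of the work touches only thread-local state: no lock.
            long long local = 0;
            for (int i = 1; i <= perThread; ++i) {
                local += i;
            }
            // The critical section is a single addition, entered once per thread.
            std::lock_guard<std::mutex> lock(totalMutex);
            total += local;
        });
    }

    for (std::thread& th : threads) {
        th.join();
    }
    return total;
}
```

Calling sum_with_local_accumulation(4, 1000) acquires the mutex four times in total. Locking around every addition would acquire it four thousand times for the same result.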

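3. Avoid frequent memory allocation and deallocation

Frequent allocation and deallocation fragments memory and slows the program down. In a multithreaded program, threads may also end up contending inside the allocator. The following approaches help avoid this:

Use an object pool: allocate a number of objects up front, take one from the pool when needed, and return it when finished, instead of allocating and freeing each time.
Use a memory pool: implement a pool that hands out memory from large pre-allocated blocks, which reduces calls into the general-purpose allocator (and, through it, system calls) and makes allocation cheaper.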
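
The object pool idea can be sketched as follows. This is a minimal, mutex-protected version written for illustration: the class name ObjectPool, its acquire/release/available methods, and the choice to fall back to a fresh allocation when the pool is empty are all assumptions of this example, not part of any standard library.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical minimal object pool. T must be default-constructible.
template <typename T>
class ObjectPool {
public:
    // Pre-allocate initialSize objects up front.
    explicit ObjectPool(std::size_t initialSize) {
        free_.reserve(initialSize);
        for (std::size_t i = 0; i < initialSize; ++i) {
            free_.push_back(std::unique_ptr<T>(new T()));
        }
    }

    // Hand out a pooled object, or allocate a new one if the pool is empty.
    std::unique_ptr<T> acquire() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!free_.empty()) {
                std::unique_ptr<T> obj = std::move(free_.back());
                free_.pop_back();
                return obj;
            }
        }
        // Pool exhausted: allocate outside the lock to keep the lock short.
        return std::unique_ptr<T>(new T());
    }

    // Return an object to the pool so a later acquire() can reuse it.
    void release(std::unique_ptr<T> obj) {
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(std::move(obj));
    }

    // Number of objects currently waiting in the pool.
    std::size_t available() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::unique_ptr<T>> free_;
};
```

A released object keeps whatever state it had, so the caller is expected to reset it before reuse. A typical use would be `auto buf = pool.acquire();` followed later by `pool.release(std::move(buf));`.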

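4. Use a lock-free queue

Multithreaded programs often pass tasks or data between threads through a queue. Protecting that queue with a lock makes it thread-safe, but the lock can become a bottleneck under contention. A lock-free queue is one way to improve performance.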

#include <atomic>
#include <iostream>
#include <thread>

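template <typename T>
class LockFreeQueue {
public:
    LockFreeQueue() {
        // Start with a dummy node so head and tail are never null.
        Node* dummy = new Node;
        head.store(dummy);
        tail.store(dummy);
    }

    ~LockFreeQueue() {
        // Runs after all threads are done, so a plain traversal is fine.
        Node* node = head.load();
        while (node != nullptr) {
            Node* next = node->next.load();
            delete node;
            node = next;
        }
    }

    void push(const T& value) {
        Node* newNode = new Node(value);
        for (;;) {  // retry until the node has been linked
            Node* oldTail = tail.load();
            Node* oldNext = oldTail->next.load();
            if (oldTail != tail.load()) continue;  // tail moved, start over
            if (oldNext == nullptr) {
                // Tail is the last node: try to link the new node after it.
                if (oldTail->next.compare_exchange_weak(oldNext, newNode)) {
                    // Swing tail forward; a failure means another thread already did.
                    tail.compare_exchange_strong(oldTail, newNode);
                    return;
                }
            } else {
                // Tail is lagging behind: help advance it, then retry.
                tail.compare_exchange_strong(oldTail, oldNext);
            }
        }
    }

    // Returns false if the queue was empty.
    // Note: deleting the old head immediately is only safe with a single
    // producer and a single consumer, as in main() below. With several of
    // either, nodes must be reclaimed with a scheme such as hazard pointers.
    bool pop(T& value) {
        for (;;) {
            Node* oldHead = head.load();
            Node* oldTail = tail.load();
            Node* oldNext = oldHead->next.load();
            if (oldHead != head.load()) continue;  // head moved, start over
            if (oldNext == nullptr) return false;  // only the dummy node is left
            if (oldHead == oldTail) {
                // Tail is lagging behind: help advance it, then retry.
                tail.compare_exchange_strong(oldTail, oldNext);
                continue;
            }
            value = oldNext->data;
            if (head.compare_exchange_weak(oldHead, oldNext)) {
                delete oldHead;  // oldNext becomes the new dummy node
                return true;
            }
        }
    }

private:
    struct Node {
        T data;
        std::atomic<Node*> next;  // must be atomic: both threads read and CAS it

        Node() : data(), next(nullptr) {}
        Node(const T& value) : data(value), next(nullptr) {}
    };

    std::atomic<Node*> head;
    std::atomic<Node*> tail;
};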

int main() {
    LockFreeQueue<int> queue;

    std::thread producer([&queue] {
        for (int i = 0; i < 10; ++i) {
            queue.push(i);
        }
    });

    std::thread consumer([&queue] {
        int value = 0;
        while (value != 9) {
            if (queue.pop(value)) {
                std::cout << "Consumer: " << value << std::endl;
            }
        }
    });

    producer.join();
    consumer.join();

    return 0;
}

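5. Use atomic operations

An atomic operation is indivisible: other threads see it either as not yet started or as fully completed, never half-done. C++11 provides the std::atomic template for defining data types that support atomic operations. For simple shared state such as counters and flags, atomics can replace a lock and improve performance.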

#include <iostream>
#include <thread>
#include <atomic>

std::atomic<int> counter(0);

void increment() {
    for (int i = 0; i < 1000000; ++i) {
        counter++;
    }
}

void decrement() {
    for (int i = 0; i < 1000000; ++i) {
        counter--;
    }
}

int main() {
    std::thread t1(increment);
    std::thread t2(decrement);

    t1.join();
    t2.join();

    std::cout << "Counter: " << counter << std::endl;

    return 0;
}

Output:

Counter: 0

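In the example above, counter is defined as a std::atomic<int>, and two threads increment and decrement it one million times each. Because every increment and decrement is a single atomic read-modify-write, no explicit lock is needed and the final value is always 0.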
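
Increment and decrement are built into std::atomic, but other updates can be made atomic with a compare-and-swap loop. The sketch below keeps a running maximum this way. The helper names atomic_update_max and max_across_threads are invented for this example.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: raise target to value if value is larger.
void atomic_update_max(std::atomic<int>& target, int value) {
    int current = target.load();
    // On failure, compare_exchange_weak stores the latest value of target
    // into current, so the loop condition is re-checked against fresh data.
    while (current < value && !target.compare_exchange_weak(current, value)) {
        // retry
    }
}

// Hypothetical driver: thread t reports the values t*1000 .. t*1000+999.
int max_across_threads(int numThreads) {
    std::atomic<int> maxSeen(0);
    std::vector<std::thread> threads;
    for (int t = 1; t <= numThreads; ++t) {
        threads.emplace_back([&maxSeen, t] {
            for (int i = 0; i < 1000; ++i) {
                atomic_update_max(maxSeen, t * 1000 + i);
            }
        });
    }
    for (std::thread& th : threads) {
        th.join();
    }
    return maxSeen.load();
}
```

With four threads, the largest value any thread reports is 4 * 1000 + 999, so max_across_threads(4) returns 4999 regardless of how the threads interleave.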

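6. Use memory fences

To keep data consistent across threads, it is sometimes necessary to stop the compiler and CPU from reordering memory operations. C++11 provides the std::atomic_thread_fence function for this purpose.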

#include <atomic>
#include <iostream>
#include <thread>

std::atomic<int> x, y;
std::atomic<bool> flag;

void write_x_then_y() {
    x.store(1, std::memory_order_relaxed);
    // Release fence: the store to x above cannot be reordered after the store to flag below.
    std::atomic_thread_fence(std::memory_order_release);
    flag.store(true, std::memory_order_relaxed);
}

void read_y_then_x() {
    // Spin until the writer has set the flag.
    while (!flag.load(std::memory_order_relaxed));
    // Acquire fence: pairs with the release fence, so the load of x below sees 1.
    std::atomic_thread_fence(std::memory_order_acquire);
    int read_x = x.load(std::memory_order_relaxed);
    std::cout << "Read x: " << read_x << std::endl;
}

int main() {
    flag.store(false, std::memory_order_relaxed);
    std::thread t1(write_x_then_y);
    std::thread t2(read_y_then_x);

    t1.join();
    t2.join();

    return 0;
}

Output: