<!DOCTYPE html>
<html lang="en">
<head>
<title>Supercharge Your Bash Scripts with Multiprocessing :: Fr1nge's Personal Blog</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="Bash is a great tool for automating tasks and improving your workflow. However, it is SLOW. Adding multiprocessing to the scripts you write can greatly improve their performance." />
<meta name="keywords" content="bash, scripting" />
<meta name="robots" content="noodp" />
<link rel="canonical" href="http://fr1nge.xyz/posts/supercharge-your-bash-scripts-with-multiprocessing/" />
<link rel="stylesheet" href="http://fr1nge.xyz/assets/style.css">
<link rel="stylesheet" href="http://fr1nge.xyz/assets/blue.css">
<link rel="apple-touch-icon" href="http://fr1nge.xyz/img/apple-touch-icon-192x192.png">
<link rel="shortcut icon" href="http://fr1nge.xyz/img/favicon/blue.png">
<meta name="twitter:card" content="summary" />
<meta name="twitter:site" content="yigitcolakoglu.com" />
<meta name="twitter:creator" content="theFr1nge" />
<meta property="og:locale" content="en" />
<meta property="og:type" content="article" />
<meta property="og:title" content="Supercharge Your Bash Scripts with Multiprocessing">
<meta property="og:description" content="Bash is a great tool for automating tasks and improving your workflow. However, it is SLOW. Adding multiprocessing to the scripts you write can greatly improve their performance." />
<meta property="og:url" content="http://fr1nge.xyz/posts/supercharge-your-bash-scripts-with-multiprocessing/" />
<meta property="og:site_name" content="Fr1nge's Personal Blog" />
<meta property="og:image" content="http://fr1nge.xyz/">
<meta property="og:image:width" content="2048">
<meta property="og:image:height" content="1024">
<meta property="article:published_time" content="2021-05-05T17:08:12+03:00" />
</head>
<body class="blue">
<div class="container center headings--one-size">
<header class="header">
<div class="header__inner">
<div class="header__logo">
<a href="/">
<div class="logo">
fr1nge.xyz
</div>
</a>
</div>
<div class="menu-trigger">menu</div>
</div>
<nav class="menu">
<ul class="menu__inner menu__inner--desktop">
<li><a href="/about">About</a></li>
<li><a href="/awards">Awards & Certificates</a></li>
<li><a href="/projects">Projects</a></li>
</ul>
<ul class="menu__inner menu__inner--mobile">
<li><a href="/about">About</a></li>
<li><a href="/awards">Awards & Certificates</a></li>
<li><a href="/projects">Projects</a></li>
</ul>
</nav>
</header>
<div class="content">
<div class="post">
<h1 class="post-title">
<a href="http://fr1nge.xyz/posts/supercharge-your-bash-scripts-with-multiprocessing/">Supercharge Your Bash Scripts with Multiprocessing</a></h1>
<div class="post-meta">
<span class="post-date">
2021-05-05 [Updated: 2021-05-05]
</span>
<span class="post-author">:: Yigit Colakoglu</span>
</div>
<span class="post-tags">
#<a href="http://fr1nge.xyz/tags/bash/">bash</a>
#<a href="http://fr1nge.xyz/tags/scripting/">scripting</a>
#<a href="http://fr1nge.xyz/tags/programming/">programming</a>
</span>
<div class="post-content"><div>
|
|
<p>Bash is a great tool for automating tasks and improving you work flow. However,
|
|
it is <em><strong>SLOW</strong></em>. Adding multiprocessing to the scripts you write can improve
|
|
the performance greatly.</p>
|
|
<h2 id="what-is-multiprocessing">What is multiprocessing?<a href="#what-is-multiprocessing" class="hanchor" ariaLabel="Anchor">⌗</a> </h2>
|
|
<p>In the simplest terms, multiprocessing is the principle of splitting the
|
|
computations or jobs that a script has to do and running them on different
|
|
processes. In even simpler terms however, multiprocessing is the computer
|
|
science equivalent of hiring more than one
|
|
worker when you are constructing a building.</p>
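<p>To see that the work really does land in separate processes, here is a small sketch. The helper name <code>report_pid</code> is made up for illustration, and it assumes Bash 4 or newer for <code>$BASHPID</code>, which (unlike <code>$$</code>) reports the PID of the process actually running the code. The <code>&</code> and <code>wait</code> it relies on are explained in the next section.</p>

```bash
#!/usr/bin/env bash
# Sketch: each job started with & runs in its own forked bash process.
# Requires Bash 4+ for $BASHPID.

report_pid() {
    # hypothetical helper: print the PID of the process executing this function
    echo "$BASHPID"
}

echo "main script PID: $$"
for job in 1 2 3; do
    report_pid &    # fork a separate background process for each job
done
wait                # block until all three background processes exit
```

<p>The three PIDs printed by the background jobs differ from each other and from the main script's PID.</p>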
<h3 id="introducing-">Introducing “&”<a href="#introducing-" class="hanchor" ariaLabel="Anchor">⌗</a> </h3>
<p>When implementing multiprocessing, the <code>&</code> operator is going to be our best
friend. It is essential if you are writing bash scripts, and a very useful tool in
general when you are in the terminal. Adding <code>&</code> to the end of a command makes
that command run in the background, so the rest of the script keeps running while
the command executes. One thing to keep in mind is that the background command runs
in a forked copy (a subshell) of the current shell: it gets its own copy of every
variable at the moment it starts, so if the script changes a variable afterwards,
the background command will not see the change. Here is a simple example:</p>
<div class="collapsable-code">
<input id="1" type="checkbox" />
<label for="1">
<span class="collapsable-code__language">bash</span>
<span class="collapsable-code__toggle" data-label-expand="Show" data-label-collapse="Hide"></span>
</label>
<pre class="language-bash" ><code>
foo="yeet"

function run_in_background(){
    # Give the main script time to change foo before we print it
    sleep 0.5
    echo "The value of foo in the function run_in_background is $foo"
}

run_in_background & # Spawn the function run_in_background in the background
foo="YEET"          # Only changes the main script's copy of foo
echo "The value of foo changed to $foo."
wait                # Wait for the background process to finish
</code></pre>
</div>
<p>This should output:</p>
<pre><code>The value of foo changed to YEET.
The value of foo in the function run_in_background is yeet
</code></pre><p>As you can see, the value of <code>foo</code> did not change in the background process, even though
we changed it in the main script before the background process printed it.</p>
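<p>Variable isolation aside, the reason we care about <code>&</code> is speed. The rough sketch below runs the same three jobs one after another and then all at once in the background, timing both with Bash's built-in <code>SECONDS</code> counter. The <code>slow_job</code> helper is hypothetical and just sleeps for a second; since <code>SECONDS</code> only has whole-second resolution, treat the numbers as approximate.</p>

```bash
#!/usr/bin/env bash
# Rough timing sketch: the same three 1-second jobs, first one after another,
# then all at once in the background. SECONDS has whole-second resolution.

slow_job() {
    sleep 1    # hypothetical stand-in for real work
}

run_sequential() {
    local start=$SECONDS
    slow_job; slow_job; slow_job
    echo $(( SECONDS - start ))
}

run_background() {
    local start=$SECONDS
    slow_job & slow_job & slow_job &
    wait       # block until all three background jobs have finished
    echo $(( SECONDS - start ))
}

echo "one after another: ~$(run_sequential)s"
echo "in the background: ~$(run_background)s"
```

<p>On a typical machine the first number comes out around 3 and the second around 1, because the three sleeps overlap instead of queueing up.</p>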
<h2 id="baby-steps">Baby steps…<a href="#baby-steps" class="hanchor" ariaLabel="Anchor">⌗</a> </h2>
<p>As with most things in computer science, there is more than one way of
achieving our goal. We are going to take the easier, less intimidating but less
efficient route first, before moving on to the big boy implementation. Let’s open up vim and get to scripting!
First of all, let’s write a very simple function that allows us to easily test
our implementation:</p>
<div class="collapsable-code">
<input id="1" type="checkbox" />
<label for="1">
<span class="collapsable-code__language">bash</span>
<span class="collapsable-code__toggle" data-label-expand="Show" data-label-collapse="Hide"></span>
</label>
<pre class="language-bash" ><code>
function tester(){
    # A function that takes an int as a parameter and sleeps for that many seconds
    echo "$1"
    sleep "$1"
    echo "ENDED $1"
}
</code></pre>
</div>
<p>Now that we have something to run in our processes, we need to spawn several
of them in a controlled manner. Controlled being the keyword here. That’s because
each user can only run a limited number of processes at once (you can check that
limit with <code>ulimit -u</code>), and spawning far more processes than you have CPU cores
rarely makes CPU-bound work any faster. In our case, we want to limit the number of
processes running at once to the variable <code>num_processes</code>. Here is the implementation:</p>
<div class="collapsable-code">
<input id="1" type="checkbox" />
<label for="1">
<span class="collapsable-code__language">bash</span>
<span class="collapsable-code__toggle" data-label-expand="Show" data-label-collapse="Hide"></span>
</label>
<pre class="language-bash" ><code>
# Maximum number of tester processes running at once (first script argument)
num_processes=${1:?"usage: $0 NUM_PROCESSES"}
pcount=0

for i in {1..10}; do
    ((pcount=pcount%num_processes))
    # At the start of every new batch, wait for the previous batch to finish
    ((pcount++==0)) && wait
    tester "$i" &
done

wait # Wait for the last batch too, otherwise the script exits before it finishes
</code></pre>
</div>
<p>This loop takes the maximum number of processes you would like to run at once as
an argument and runs <code>tester</code> ten times, with at most that many copies running at
any given moment. Go ahead and test it out (with <code>tester</code> defined in the same script)!
You might notice, however, that the processes are run in batches, and that the size of
each batch is <code>num_processes</code>. This happens because every time we spawn
<code>num_processes</code> processes, we <code>wait</code> for all of them to end before starting the
next batch, so a single slow job holds up the whole batch. This implementation is not
a problem in itself; there are many cases where it works perfectly fine. However, if
you don’t want this to happen, we have to dump this naive approach altogether
and improve our tool belt.</p>
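<p>To put a number on that bottleneck, here is a back-of-the-envelope sketch. It assumes job <em>i</em> takes exactly <em>i</em> seconds (as <code>tester $i</code> does) and ignores process start-up cost, then computes the total run time of ten jobs on three slots in two ways: the batched loop above, where each batch lasts as long as its slowest job, and a pool, where every job goes to whichever worker frees up first. The function names are made up for this illustration.</p>

```bash
#!/usr/bin/env bash
# Back-of-the-envelope makespan comparison (hypothetical helpers).
# Assumes job i takes exactly i seconds and process start-up is free.

# Batched loop: start `slots` jobs, wait for all of them, repeat.
# Each batch lasts as long as its slowest job.
batch_makespan() {
    local slots=$1; shift
    local total=0 batch_max=0 count=0 d
    for d in "$@"; do
        (( d > batch_max )) && batch_max=$d
        if (( ++count == slots )); then
            (( total += batch_max ))
            batch_max=0 count=0
        fi
    done
    (( total += batch_max ))    # the last, possibly partial, batch
    echo "$total"
}

# Pool: every job goes to whichever worker becomes free first.
pool_makespan() {
    local slots=$1; shift
    local -a free_at=()
    local i d best makespan=0
    for (( i = 0; i < slots; i++ )); do free_at[i]=0; done
    for d in "$@"; do
        best=0                  # find the worker that frees up earliest
        for (( i = 1; i < slots; i++ )); do
            (( free_at[i] < free_at[best] )) && best=$i
        done
        (( free_at[best] += d ))
        (( free_at[best] > makespan )) && makespan=${free_at[best]}
    done
    echo "$makespan"
}

durations=( {1..10} )
echo "batched, 3 at a time: $(batch_makespan 3 "${durations[@]}")s"   # 28s
echo "pool of 3 workers:    $(pool_makespan 3 "${durations[@]}")s"    # 22s
```

<p>Under these assumptions the batched loop needs 28 seconds while the pool finishes in 22, because no worker sits idle waiting for a slow sibling. That idle time is exactly what the job pool approach gets rid of.</p>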
<h2 id="real-chads-use-job-pools">Real Chads use Job Pools<a href="#real-chads-use-job-pools" class="hanchor" ariaLabel="Anchor">⌗</a> </h2>
<p>The solution to the bottleneck introduced by our previous approach lies
in using job pools. A job pool is a queue that the main process sends jobs to,
where they wait to be executed. This approach solves our problem because, instead of
spawning a new process for every job and waiting for a whole batch of processes to
finish, we only create a fixed number of processes (workers), which
continuously pick up jobs from the job pool without waiting for any other process to finish.
Here is the implementation that uses job pools. Brace yourselves, because it is
kind of complicated.</p>
<div class="collapsable-code">
<input id="1" type="checkbox" />
<label for="1">
<span class="collapsable-code__language">bash</span>
<span class="collapsable-code__toggle" data-label-expand="Show" data-label-collapse="Hide"></span>
</label>
<pre class="language-bash" ><code>
job_pool_end_of_jobs="NO_JOB_LEFT"              # marker that tells workers to exit
job_pool_job_queue=/tmp/job_pool_job_queue_$$   # the fifo holding queued jobs
job_pool_progress=/tmp/job_pool_progress_$$
job_pool_pool_size=-1                           # -1 means "not initialised yet"
job_pool_nerrors=0

# Remove the temporary files created by job_pool_init
function job_pool_cleanup()
{
    rm -f "${job_pool_job_queue}"
    rm -f "${job_pool_progress}"
}

function job_pool_exit_handler()
{
    job_pool_stop_workers
    job_pool_cleanup
}

# A worker: repeatedly takes one job (one line) off the fifo and runs it,
# until it reads the end-of-jobs marker
function job_pool_worker()
{
    local id=$1
    local job_queue=$2
    local cmd=
    local args=

    # Open the fifo read-write as fd 7 so flock has a descriptor to lock
    exec 7<> "${job_queue}"
    while [[ "${cmd}" != "${job_pool_end_of_jobs}" && -e "${job_queue}" ]]; do
        # Only one worker at a time may read from the queue
        flock --exclusive 7
        # Fields of a job are separated by vertical tabs (see job_pool_run)
        IFS=$'\v'
        read -r cmd args <"${job_queue}"
        set -- ${args}    # intentionally unquoted: splits args on \v
        unset IFS
        flock --unlock 7
        if [[ "${cmd}" == "${job_pool_end_of_jobs}" ]]; then
            # Put the marker back so the next worker sees it and exits too
            echo "${cmd}" >&7
        else
            { ${cmd} "$@" ; }
        fi
    done
    exec 7>&-
}

# Queue the end-of-jobs marker and wait for every worker to exit
function job_pool_stop_workers()
{
    echo "${job_pool_end_of_jobs}" >> "${job_pool_job_queue}"
    wait
}

# Spawn job_pool_pool_size workers in the background
function job_pool_start_workers()
{
    local job_queue=$1
    local i    # local, so the caller's own loop variable is not clobbered
    for ((i=0; i<${job_pool_pool_size}; i++)); do
        job_pool_worker "${i}" "${job_queue}" &
    done
}

# Create the fifo and start the workers (the pool size defaults to 1)
function job_pool_init()
{
    local pool_size=$1
    job_pool_pool_size=${pool_size:=1}
    rm -rf "${job_pool_job_queue}"
    rm -rf "${job_pool_progress}"
    touch "${job_pool_progress}"
    mkfifo "${job_pool_job_queue}"
    echo 0 >"${job_pool_progress}" &
    job_pool_start_workers "${job_pool_job_queue}"
}

# Let the workers finish every queued job, then remove the temporary files
function job_pool_shutdown()
{
    job_pool_stop_workers
    job_pool_cleanup
}

# Queue a job: the command and its arguments, \v-separated, on one line
function job_pool_run()
{
    if [[ "${job_pool_pool_size}" == "-1" ]]; then
        job_pool_init
    fi
    printf "%s\v" "$@" >> "${job_pool_job_queue}"
    echo >> "${job_pool_job_queue}"
}

# Wait until every job queued so far has finished, then restart the workers
function job_pool_wait()
{
    job_pool_stop_workers
    job_pool_start_workers "${job_pool_job_queue}"
}
</code></pre>
</div>
<p>Ok… But what the actual fuck is going on in here???</p>
<h3 id="fifo-and-flock">fifo and flock<a href="#fifo-and-flock" class="hanchor" ariaLabel="Anchor">⌗</a> </h3>
<p>In order to understand what this code is doing, you first need to understand two
key tools that we are using: fifos (which we create with <code>mkfifo</code>) and <code>flock</code>.
Despite their complicated names, they are actually quite simple. Let’s check their
man pages to figure out their purposes, shall we?</p>
<h4 id="man-fifo">man fifo<a href="#man-fifo" class="hanchor" ariaLabel="Anchor">⌗</a> </h4>
<p>fifo’s man page (<code>man 7 fifo</code>) tells us that:</p>
<pre><code>NAME
       fifo - first-in first-out special file, named pipe

DESCRIPTION
       A FIFO special file (a named pipe) is similar to a pipe, except that
       it is accessed as part of the filesystem. It can be opened by multiple
       processes for reading or writing. When processes are exchanging data
       via the FIFO, the kernel passes all data internally without writing it
       to the filesystem. Thus, the FIFO special file has no contents on the
       filesystem; the filesystem entry merely serves as a reference point so
       that processes can access the pipe using a name in the filesystem.
</code></pre><p>So, put in <strong>very</strong> simple terms, a fifo is a named pipe that allows
communication between processes. Using a fifo allows us to loop through the jobs
in the pool without having to delete them manually, because once we read a job
with <code>read cmd args < ${job_queue}</code>, it is out of the pipe and the next
read returns the next job in the pool. However, having multiple worker processes
introduces one caveat: what if two of them read from the pipe at the same time?
Bash’s <code>read</code> pulls data from a pipe one byte at a time, so two concurrent reads
could interleave and each end up with a mangled piece of a job instead of a whole one.
We don’t want that, so we resort to using <code>flock</code>.</p>
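<p>Here is a minimal sketch of that “read it and it is gone” behaviour, outside the job pool. <code>fifo_demo</code> is a hypothetical helper: a background writer pushes three lines into a fresh fifo, and each <code>read</code> on the other end takes exactly one line out of it.</p>

```bash
#!/usr/bin/env bash
# Sketch: a writer pushes three lines into a named pipe and the reader
# consumes them one `read` at a time; nothing has to be deleted afterwards.

fifo_demo() {
    local dir line
    dir=$(mktemp -d)
    mkfifo "$dir/queue"

    # Writer in the background: opening a fifo for writing blocks until
    # some process opens it for reading.
    printf '%s\n' "job one" "job two" "job three" > "$dir/queue" &

    exec 3< "$dir/queue"                   # reader side, kept open as fd 3
    while IFS= read -r line <&3; do        # each read takes one line out of the pipe
        echo "got: $line"
    done                                   # the loop ends at EOF, once the writer closes
    exec 3<&-
    wait
    rm -rf "$dir"
}

fifo_demo
```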
<h4 id="man-flock">man flock<a href="#man-flock" class="hanchor" ariaLabel="Anchor">⌗</a> </h4>
<p>flock’s man page defines it as:</p>
<pre><code>SYNOPSIS
       flock [options] file|directory command [arguments]
       flock [options] file|directory -c command
       flock [options] number

DESCRIPTION
       This utility manages flock(2) locks from within shell scripts or from
       the command line.

       The first and second of the above forms wrap the lock around the
       execution of a command, in a manner similar to su(1) or newgrp(1).
       They lock a specified file or directory, which is created (assuming
       appropriate permissions) if it does not already exist. By default, if
       the lock cannot be immediately acquired, flock waits until the lock is
       available.

       The third form uses an open file by its file descriptor number. See
       the examples below for how that can be used.
</code></pre><p>Cool. Translated into the modern English that we regular folks use, <code>flock</code> is a thin
wrapper around the <code>flock(2)</code> system call (see <code>man 2 flock</code> if you are
interested). It is used to manage locks and has several forms. The one we are
interested in is the third one, which, according to the man page, uses an open file
by its <strong>file descriptor number</strong>. Aha! So that was the purpose of the <code>exec 7<> ${job_queue}</code> call in the <code>job_pool_worker</code> function. It opens
the fifo <code>job_queue</code> for both reading and writing as file descriptor 7 (on Linux, opening
a fifo read-write never blocks waiting for the other end), and the worker then locks it with
<code>flock --exclusive 7</code> before every read and releases it with <code>flock --unlock 7</code>
afterwards. Cool. This way, only one process at a time can read from
the fifo <code>job_queue</code>.</p>
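<p>The same file-descriptor pattern works for any critical section. In this sketch (the helper <code>locked_increments</code> is hypothetical, and <code>flock</code> is assumed to be the util-linux tool shown above), several background workers bump a shared counter file, and each read-modify-write happens while holding an exclusive lock on fd 9.</p>

```bash
#!/usr/bin/env bash
# Sketch: several background workers bump a shared counter file. Each
# read-modify-write is done while holding an exclusive flock on fd 9,
# so no increment gets lost.

locked_increments() {
    local workers=$1 per_worker=$2
    local dir w
    dir=$(mktemp -d)
    echo 0 > "$dir/counter"

    for (( w = 0; w < workers; w++ )); do
        (
            exec 9> "$dir/lock"            # each worker opens the lock file as fd 9
            for (( i = 0; i < per_worker; i++ )); do
                flock --exclusive 9        # block until no other worker holds the lock
                n=$(< "$dir/counter")
                echo $(( n + 1 )) > "$dir/counter"
                flock --unlock 9           # let the next worker in
            done
        ) &
    done
    wait                                   # wait for every worker subshell
    cat "$dir/counter"
    rm -rf "$dir"
}

locked_increments 4 50    # prints 200
```

<p>Drop the two <code>flock</code> lines and the final count will often come out lower than 200, because workers overwrite each other's updates.</p>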
<h2 id="great-but-how-do-i-use-this">Great! But how do I use this?<a href="#great-but-how-do-i-use-this" class="hanchor" ariaLabel="Anchor">⌗</a> </h2>
<p>It depends on your preference: you can either save the job pool code in a file (e.g.
job_pool.sh) and <code>source</code> it from your bash script, or simply paste it
into an existing bash script. Whatever tickles your fancy. I have also
provided an example that replicates our first implementation. Just paste the
code below under our “chad” job pool script.</p>
<div class="collapsable-code">
<input id="1" type="checkbox" />
<label for="1">
<span class="collapsable-code__language">bash</span>
<span class="collapsable-code__toggle" data-label-expand="Show" data-label-collapse="Hide"></span>
</label>
<pre class="language-bash" ><code>
function tester(){
    # A function that takes an int as a parameter and sleeps for that many seconds
    echo "$1"
    sleep "$1"
    echo "ENDED $1"
}

num_workers=$1                 # Number of worker processes in the pool
job_pool_init "$num_workers"

for i in {1..10}; do
    job_pool_run tester "$i"   # Queue the job; a free worker will pick it up
done

job_pool_wait                  # Block until every queued job has finished
job_pool_shutdown              # Stop the workers and remove the temporary files
</code></pre>
</div>
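<p>One detail worth spelling out is how a job survives its trip through the fifo. <code>job_pool_run</code> writes the command and each argument followed by a vertical tab (<code>\v</code>) and ends the line with a newline, and <code>job_pool_worker</code> splits it back apart with <code>IFS=$'\v'</code>. The sketch below replays that round trip in isolation, without any fifo; <code>encode_job</code> and <code>decode_job</code> are hypothetical helper names that simply mirror those two steps. It shows that an argument containing spaces comes out as a single argument.</p>

```bash
#!/usr/bin/env bash
# Sketch of the job encoding used by job_pool_run / job_pool_worker.
# encode_job and decode_job are hypothetical helpers for illustration only.

# Same format job_pool_run writes: every field followed by a vertical tab
encode_job() {
    printf '%s\v' "$@"
}

# Same parsing job_pool_worker does after reading one line from the fifo
decode_job() {
    local cmd args
    local IFS=$'\v'        # split on vertical tabs only; restored on return
    read -r cmd args <<< "$1"
    set -- ${args}         # deliberately unquoted, like in job_pool_worker
    printf 'cmd=%s argc=%d\n' "$cmd" "$#"
    printf '[%s]\n' "$@"
}

decode_job "$(encode_job tester "hello world" 3)"
# prints:
# cmd=tester argc=2
# [hello world]
# [3]
```

<p>The flip side is that an argument which itself contains a vertical tab gets split in two, and because the expansion is unquoted, an argument such as <code>*</code> may be glob-expanded by the worker.</p>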
<p>Hopefully this article was (or will be) helpful to you. From now on, you never
have to write single-process bash scripts like normies :)</p>
</div></div>
<div id="disqus_thread"></div>
<script type="application/javascript">
    var disqus_config = function () {
    };
    (function() {
        if (["localhost", "127.0.0.1"].indexOf(window.location.hostname) != -1) {
            document.getElementById('disqus_thread').innerHTML = 'Disqus comments not available by default when the website is previewed locally.';
            return;
        }
        var d = document, s = d.createElement('script'); s.async = true;
        s.src = '//' + "fr1nge-xyz" + '.disqus.com/embed.js';
        s.setAttribute('data-timestamp', +new Date());
        (d.head || d.body).appendChild(s);
    })();
</script>
<noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
<a href="https://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a>
</div>
</div>
<footer class="footer">
<div class="footer__inner">
<div class="copyright copyright--user">
<span>Yigit Colakoglu</span>
<span>:: Theme made by <a href="https://twitter.com/panr">panr</a></span>
</div>
</div>
</footer>
<script src="http://fr1nge.xyz/assets/main.js"></script>
<script src="http://fr1nge.xyz/assets/prism.js"></script>
</div>
</body>
</html>