|
|
- <!DOCTYPE html>
- <html lang="en">
- <head>
-
- <title>Supercharge Your Bash Scripts with Multiprocessing :: Fr1nge's Personal Blog</title>
-
- <meta http-equiv="content-type" content="text/html; charset=utf-8">
- <meta name="viewport" content="width=device-width, initial-scale=1.0">
- <meta name="description" content="Bash is a great tool for automating tasks and improving you work flow. However, it is ***SLOW***. Adding multiprocessing to the scripts you write can improve the performance greatly." />
- <meta name="keywords" content="bash, scripting" />
- <meta name="robots" content="noodp" />
- <link rel="canonical" href="http://fr1nge.xyz/posts/supercharge-your-bash-scripts-with-multiprocessing/" />
-
-
-
-
- <link rel="stylesheet" href="http://fr1nge.xyz/assets/style.css">
-
- <link rel="stylesheet" href="http://fr1nge.xyz/assets/blue.css">
-
-
-
-
-
-
- <link rel="apple-touch-icon" href="http://fr1nge.xyz/img/apple-touch-icon-192x192.png">
-
- <link rel="shortcut icon" href="http://fr1nge.xyz/img/favicon/blue.png">
-
-
-
- <meta name="twitter:card" content="summary" />
-
-
- <meta name="twitter:site" content="yigitcolakoglu.com" />
-
- <meta name="twitter:creator" content="theFr1nge" />
-
-
-
- <meta property="og:locale" content="en" />
- <meta property="og:type" content="article" />
- <meta property="og:title" content="Supercharge Your Bash Scripts with Multiprocessing">
- <meta property="og:description" content="Bash is a great tool for automating tasks and improving you work flow. However, it is ***SLOW***. Adding multiprocessing to the scripts you write can improve the performance greatly." />
- <meta property="og:url" content="http://fr1nge.xyz/posts/supercharge-your-bash-scripts-with-multiprocessing/" />
- <meta property="og:site_name" content="Fr1nge's Personal Blog" />
-
- <meta property="og:image" content="http://fr1nge.xyz/">
-
- <meta property="og:image:width" content="2048">
- <meta property="og:image:height" content="1024">
-
-
- <meta property="article:published_time" content="2021-05-05 17:08:12 +0300 +03" />
-
-
-
-
-
-
-
-
-
-
-
-
- </head>
- <body class="blue">
-
-
- <div class="container center headings--one-size">
-
- <header class="header">
- <div class="header__inner">
- <div class="header__logo">
- <a href="/">
- <div class="logo">
- fr1nge.xyz
- </div>
- </a>
-
- </div>
-
- <div class="menu-trigger">menu</div>
-
- </div>
-
- <nav class="menu">
- <ul class="menu__inner menu__inner--desktop">
-
-
-
- <li><a href="/about">About</a></li>
-
-
-
- <li><a href="/awards">Awards & Certificates</a></li>
-
-
-
- <li><a href="/projects">Projects</a></li>
-
-
-
-
-
-
- </ul>
-
- <ul class="menu__inner menu__inner--mobile">
-
-
- <li><a href="/about">About</a></li>
-
-
-
- <li><a href="/awards">Awards & Certificates</a></li>
-
-
-
- <li><a href="/projects">Projects</a></li>
-
-
-
- </ul>
- </nav>
-
-
- </header>
-
-
- <div class="content">
-
- <div class="post">
- <h1 class="post-title">
- <a href="http://fr1nge.xyz/posts/supercharge-your-bash-scripts-with-multiprocessing/">Supercharge Your Bash Scripts with Multiprocessing</a></h1>
- <div class="post-meta">
-
- <span class="post-date">
- 2021-05-05 [Updated: 2021-05-05]
- </span>
-
-
- <span class="post-author">:: Yigit Colakoglu</span>
-
- </div>
-
-
- <span class="post-tags">
-
- #<a href="http://fr1nge.xyz/tags/bash/">bash</a>
-
- #<a href="http://fr1nge.xyz/tags/scripting/">scripting</a>
-
- #<a href="http://fr1nge.xyz/tags/programming/">programming</a>
-
- </span>
-
-
-
-
-
-
- <div class="post-content"><div>
- <p>Bash is a great tool for automating tasks and improving you work flow. However,
- it is <em><strong>SLOW</strong></em>. Adding multiprocessing to the scripts you write can improve
- the performance greatly.</p>
- <h2 id="what-is-multiprocessing">What is multiprocessing?<a href="#what-is-multiprocessing" class="hanchor" ariaLabel="Anchor">⌗</a> </h2>
- <p>In the simplest terms, multiprocessing is the principle of splitting the
- computations or jobs that a script has to do and running them on different
- processes. In even simpler terms however, multiprocessing is the computer
- science equivalent of hiring more than one
- worker when you are constructing a building.</p>
- <h3 id="introducing-">Introducing “&”<a href="#introducing-" class="hanchor" ariaLabel="Anchor">⌗</a> </h3>
- <p>While implementing multiprocessing the sign <code>&</code> is going to be our greatest
- friend. It is an essential sign if you are writing bash scripts and a very
- useful tool in general when you are in the terminal. What <code>&</code> does is that it
- makes the command you added it to the end of run in the background and allows
- the rest of the script to continue running as the command runs in the
- background. One thing to keep in mind is that since it creates a fork of the
- process you ran the command on, if you change a variable that the command in the
- background uses while it runs, it will not be affected. Here is a simple
- example:</p>
-
-
-
- <div class="collapsable-code">
- <input id="1" type="checkbox" />
- <label for="1">
- <span class="collapsable-code__language">bash</span>
-
- <span class="collapsable-code__toggle" data-label-expand="Show" data-label-collapse="Hide"></span>
- </label>
- <pre class="language-bash" ><code>
- foo="yeet"
-
- function run_in_background(){
- sleep 0.5
- echo "The value of foo in the function run_in_background is $foo"
- }
-
- run_in_background & # Spawn the function run_in_background in the background
- foo="YEET"
- echo "The value of foo changed to $foo."
- wait # wait for the background process to finish
- </code></pre>
- </div>
-
-
- <p>This should output:</p>
- <pre><code>The value of foo changed to YEET.
- The value of foo in here is yeet
- </code></pre><p>As you can see, the value of <code>foo</code> did not change in the background process even though
- we changed it in the main function.</p>
- <h2 id="baby-steps">Baby steps…<a href="#baby-steps" class="hanchor" ariaLabel="Anchor">⌗</a> </h2>
- <p>Just like anything related to computer science, there is more than one way of
- achieving our goal. We are going to take the easier, less intimidating but less
- efficient route first before moving on to the big boy implementation. Let’s open up vim and get to scripting!
- First of all, let’s write a very simple function that allows us to easily test
- our implementation:</p>
-
-
-
- <div class="collapsable-code">
- <input id="1" type="checkbox" />
- <label for="1">
- <span class="collapsable-code__language">bash</span>
-
- <span class="collapsable-code__toggle" data-label-expand="Show" data-label-collapse="Hide"></span>
- </label>
- <pre class="language-bash" ><code>
- function tester(){
- # A function that takes an int as a parameter and sleeps
- echo "$1"
- sleep "$1"
- echo "ENDED $1"
- }
- </code></pre>
- </div>
-
-
- <p>Now that we have something to run in our processes, we now need to spawn several
- of them in controlled manner. Controlled being the keyword here. That’s because
- each system has a maximum number of processes that can be spawned (You can find
- that out with the command <code>ulimit -u</code>). In our case, we want to limit the
- processes being ran to the variable <code>num_processes</code>. Here is the implementation:</p>
-
-
-
- <div class="collapsable-code">
- <input id="1" type="checkbox" />
- <label for="1">
- <span class="collapsable-code__language">bash</span>
-
- <span class="collapsable-code__toggle" data-label-expand="Show" data-label-collapse="Hide"></span>
- </label>
- <pre class="language-bash" ><code>
- num_processes=$1
- pcount=0
- for i in {1..10}; do
- ((pcount=pcount%num_processes));
- ((pcount++==0)) && wait
- tester $i &
- done
- </code></pre>
- </div>
-
-
- <p>What this loop does is that it takes the number of processes you would like to
- spawn as an argument and runs <code>tester</code> in that many processes. Go ahead and test it out!
- You might notice however that the processes are run int batches. And the size of
- batches is the <code>num_processes</code> variable. The reason this happens is because
- every time we spawn <code>num_processes</code> processes, we <code>wait</code> for all the processes
- to end. This implementation is not a problem in itself, there are many cases
- where you can use this implementation and it works perfectly fine. However, if
- you don’t want this to happen, we have to dump this naive approach all together
- and improve our tool belt.</p>
- <h2 id="real-chads-use-job-pools">Real Chads use Job Pools<a href="#real-chads-use-job-pools" class="hanchor" ariaLabel="Anchor">⌗</a> </h2>
- <p>The solution to the bottleneck that was introduced in our previous approach lies
- in using job pools. Job pools are where jobs created by a main process get sent
- and wait to get executed. This approach solves our problems because instead of
- spawning a new process for every copy and waiting for all the processes to
- finish we instead only create a set number of processes(workers) which
- continuously pick up jobs from the job pool not waiting for any other process to finish.
- Here is the implementation that uses job pools. Brace yourselves, because it is
- kind of complicated.</p>
-
-
-
- <div class="collapsable-code">
- <input id="1" type="checkbox" />
- <label for="1">
- <span class="collapsable-code__language">bash</span>
-
- <span class="collapsable-code__toggle" data-label-expand="Show" data-label-collapse="Hide"></span>
- </label>
- <pre class="language-bash" ><code>
- job_pool_end_of_jobs="NO_JOB_LEFT"
- job_pool_job_queue=/tmp/job_pool_job_queue_$$
- job_pool_progress=/tmp/job_pool_progress_$$
- job_pool_pool_size=-1
- job_pool_nerrors=0
-
- function job_pool_cleanup()
- {
- rm -f ${job_pool_job_queue}
- rm -f ${job_pool_progress}
- }
-
- function job_pool_exit_handler()
- {
- job_pool_stop_workers
- job_pool_cleanup
- }
-
- function job_pool_worker()
- {
- local id=$1
- local job_queue=$2
- local cmd=
- local args=
-
- exec 7<> ${job_queue}
- while [[ "${cmd}" != "${job_pool_end_of_jobs}" && -e "${job_queue}" ]]; do
- flock --exclusive 7
- IFS=$'\v'
- read cmd args <${job_queue}
- set -- ${args}
- unset IFS
- flock --unlock 7
- if [[ "${cmd}" == "${job_pool_end_of_jobs}" ]]; then
- echo "${cmd}" >&7
- else
- { ${cmd} "$@" ; }
- fi
-
- done
- exec 7>&-
- }
-
- function job_pool_stop_workers()
- {
- echo ${job_pool_end_of_jobs} >> ${job_pool_job_queue}
- wait
- }
-
- function job_pool_start_workers()
- {
- local job_queue=$1
- for ((i=0; i<${job_pool_pool_size}; i++)); do
- job_pool_worker ${i} ${job_queue} &
- done
- }
-
- function job_pool_init()
- {
- local pool_size=$1
- job_pool_pool_size=${pool_size:=1}
- rm -rf ${job_pool_job_queue}
- rm -rf ${job_pool_progress}
- touch ${job_pool_progress}
- mkfifo ${job_pool_job_queue}
- echo 0 >${job_pool_progress} &
- job_pool_start_workers ${job_pool_job_queue}
- }
-
- function job_pool_shutdown()
- {
- job_pool_stop_workers
- job_pool_cleanup
- }
-
- function job_pool_run()
- {
- if [[ "${job_pool_pool_size}" == "-1" ]]; then
- job_pool_init
- fi
- printf "%s\v" "$@" >> ${job_pool_job_queue}
- echo >> ${job_pool_job_queue}
- }
-
- function job_pool_wait()
- {
- job_pool_stop_workers
- job_pool_start_workers ${job_pool_job_queue}
- }
- </code></pre>
- </div>
-
-
- <p>Ok… But that the actual fuck is going in here???</p>
- <h3 id="fifo-and-flock">fifo and flock<a href="#fifo-and-flock" class="hanchor" ariaLabel="Anchor">⌗</a> </h3>
- <p>In order to understand what this code is doing, you first need to understand two
- key commands that we are using, <code>fifo</code> and <code>flock</code>. Despite their complicated
- names, they are actually quite simple. Let’s check their man pages to figure out
- their purposes, shall we?</p>
- <h4 id="man-fifo">man fifo<a href="#man-fifo" class="hanchor" ariaLabel="Anchor">⌗</a> </h4>
- <p>fifo’s man page tells us that:</p>
- <pre><code>NAME
- fifo - first-in first-out special file, named pipe
-
- DESCRIPTION
- A FIFO special file (a named pipe) is similar to a pipe, except that
- it is accessed as part of the filesystem. It can be opened by multiple
- processes for reading or writing. When processes are exchanging data
- via the FIFO, the kernel passes all data internally without writing it
- to the filesystem. Thus, the FIFO special file has no contents on the
- filesystem; the filesystem entry merely serves as a reference point so
- that processes can access the pipe using a name in the filesystem.
- </code></pre><p>So put in <strong>very</strong> simple terms, a fifo is a named pipe that can allows
- communication between processes. Using a fifo allows us to loop through the jobs
- in the pool without having to delete them manually, because once we read them
- with <code>read cmd args < ${job_queue}</code>, the job is out of the pipe and the next
- read outputs the next job in the pool. However the fact that we have multiple
- processes introduces one caveat, what if two processes access the pipe at the
- same time? They would run the same command and we don’t want that. So we resort
- to using <code>flock</code>.</p>
- <h4 id="man-flock">man flock<a href="#man-flock" class="hanchor" ariaLabel="Anchor">⌗</a> </h4>
- <p>flock’s man page defines it as:</p>
- <pre><code> SYNOPSIS
- flock [options] file|directory command [arguments]
- flock [options] file|directory -c command
- flock [options] number
-
- DESCRIPTION
- This utility manages flock(2) locks from within shell scripts or from
- the command line.
-
- The first and second of the above forms wrap the lock around the
- execution of a command, in a manner similar to su(1) or newgrp(1).
- They lock a specified file or directory, which is created (assuming
- appropriate permissions) if it does not already exist. By default, if
- the lock cannot be immediately acquired, flock waits until the lock is
- available.
-
- The third form uses an open file by its file descriptor number. See
- the examples below for how that can be used.
- </code></pre><p>Cool, translated to modern English that us regular folks use, <code>flock</code> is a thin
- wrapper around the C standard function <code>flock</code> (see <code>man 2 flock</code> if you are
- interested). It is used to manage locks and has several forms. The one we are
- interested in is the third one. According to the man page, it uses and open file
- by its <strong>file descriptor number</strong>. Aha! so that was the purpose of the <code>exec 7<> ${job_queue}</code> calls in the <code>job_pool_worker</code> function. It would essentially
- assign the file descriptor 7 to the fifo <code>job_queue</code> and afterwards lock it with
- <code>flock --exclusive 7</code>. Cool. This way only one process at a time can read from
- the fifo <code>job_queue</code></p>
- <h2 id="great-but-how-do-i-use-this">Great! But how do I use this?<a href="#great-but-how-do-i-use-this" class="hanchor" ariaLabel="Anchor">⌗</a> </h2>
- <p>It depends on your preference, you can either save this in a file(e.g.
- job_pool.sh) and source it in your bash script. Or you can simply paste it
- inside an existing bash script. Whatever tickles your fancy. I have also
- provided an example that replicates our first implementation. Just paste the
- below code under our “chad” job pool script.</p>
-
-
-
- <div class="collapsable-code">
- <input id="1" type="checkbox" />
- <label for="1">
- <span class="collapsable-code__language">bash</span>
-
- <span class="collapsable-code__toggle" data-label-expand="Show" data-label-collapse="Hide"></span>
- </label>
- <pre class="language-bash" ><code>
- function tester(){
- # A function that takes an int as a parameter and sleeps
- echo "$1"
- sleep "$1"
- echo "ENDED $1"
- }
-
- num_workers=$1
- job_pool_init $num_workers
- pcount=0
- for i in {1..10}; do
- job_pool_run tester "$i"
- done
-
- job_pool_wait
- job_pool_shutdown
- </code></pre>
- </div>
-
-
- <p>Hopefully this article was(or will be) helpful to you. From now on, you don’t
- ever have to write single threaded bash scripts like normies :)</p>
-
- </div></div>
-
-
-
-
-
- <div id="disqus_thread"></div>
- <script type="application/javascript">
- var disqus_config = function () {
-
-
-
- };
- (function() {
- if (["localhost", "127.0.0.1"].indexOf(window.location.hostname) != -1) {
- document.getElementById('disqus_thread').innerHTML = 'Disqus comments not available by default when the website is previewed locally.';
- return;
- }
- var d = document, s = d.createElement('script'); s.async = true;
- s.src = '//' + "fr1nge-xyz" + '.disqus.com/embed.js';
- s.setAttribute('data-timestamp', +new Date());
- (d.head || d.body).appendChild(s);
- })();
- </script>
- <noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
- <a href="https://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a>
-
- </div>
-
- </div>
-
-
- <footer class="footer">
- <div class="footer__inner">
-
- <div class="copyright copyright--user">
- <span>Yigit Colakoglu</span>
-
- <span>:: Theme made by <a href="https://twitter.com/panr">panr</a></span>
- </div>
- </div>
- </footer>
-
- <script src="http://fr1nge.xyz/assets/main.js"></script>
- <script src="http://fr1nge.xyz/assets/prism.js"></script>
-
-
-
-
-
-
-
-
- </div>
-
- </body>
- </html>
|