Untangling the web using the simplest language possible.


JavaScript Closures: A use case



I recently started working for a small dev agency here, and the first item I was supposed to take over was a data scraping tool. If you don't know what that is, basically it goes through websites, targets specific elements, and grabs the data off of them. How do you do that? Well, you can use something like Puppeteer to run an automated Chromium browser that can carry out tasks such as "navigate to this page" and "click the element with this CSS selector". In this project we were grabbing XHR responses from the server (if you open your browser dev tools, go to 'Network' and click 'XHR', you can see requests and responses, their headers, etc.) and then saving the JSON in those responses to our SQL database.

Puppeteer has a handy method that places a listener for requests or responses, so any time the browser receives a response it fires off a callback. It was within this callback that I had an async function saving to the database. At this point I had two problems:
  1. We were saving thousands of items to the database in rapid succession
  2. We were sending thousands of items over the wire to the SQL database (the scraper ran on one cloud machine and the database on a separate one)
Now, in hindsight, neither of these was causing my problem, BUT it was better that I fixed them. Sequelize has this cool method called bulkCreate() that takes an array of values and saves them to your database all at once. That sure beats constantly sending data over the wire and constantly giving instructions to the database.
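
To make the difference concrete, here is a rough sketch. FakeModel is a made-up stand-in for a Sequelize model (so this runs without a database); it only mimics create() and bulkCreate() and counts round trips, and the items are dummy data.

```javascript
// Hypothetical stand-in for a Sequelize model. It mimics only the two calls
// discussed here and counts round trips instead of touching a database.
class FakeModel {
  constructor() {
    this.rows = [];
    this.roundTrips = 0;
  }

  async create(record) {
    this.roundTrips += 1;
    this.rows.push(record);
    return record;
  }

  async bulkCreate(records) {
    this.roundTrips += 1;
    this.rows.push(...records);
    return records;
  }
}

// Dummy data standing in for scraped items.
const items = Array.from({ length: 1000 }, (_, i) => ({ id: i }));

async function compare() {
  const oneByOne = new FakeModel();
  for (const item of items) {
    await oneByOne.create(item); // one round trip per item
  }

  const batched = new FakeModel();
  await batched.bulkCreate(items); // one round trip for the whole array

  return { oneByOne: oneByOne.roundTrips, batched: batched.roundTrips };
}

compare().then((result) => console.log(result));
// logs { oneByOne: 1000, batched: 1 }
```

Same rows end up saved either way; the batched version just gets there in one trip.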

My problem THEN became my callback: it's just a function that fires up, does its thing, and ends. Anything declared inside it is gone by the next call, so there was no way to build up the array from one response to the next inside the callback itself. Enter the closure.

What is a closure?


A closure in JavaScript is a function bundled together with the variables that were in scope where it was created (this is called the lexical environment). The usual way to make one is to have one function return another function. So what's the big deal? Well, when that inner function is returned, it keeps access to all of the variables in the first function's scope, and they get sent along with it.

The best analogy I've seen is thinking about a closure like a storage box for a function; when you make a closure it creates a "box" that holds all the variables in the scope of the function. So, you write a function, that function has variables, then at the end it returns another function, and that returned function can operate on those variables even after the return has happened. That's the cool part: after we call the outer function and store what it returns in a variable, we would think those initial scope variables would disappear, but they don't. They get boxed up and shipped along with the returned function. If you're familiar with React, you could think of creating a closure almost like having state in a component.

Here's an example:
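
A minimal sketch of the idea (the names makeCounter, increment and count are made up for illustration):

```javascript
function makeCounter() {
  let count = 0; // lives in makeCounter's scope

  // The returned function closes over `count`.
  return function increment() {
    count += 1;
    return count;
  };
}

const counter = makeCounter(); // makeCounter has already returned here
console.log(counter()); // 1
console.log(counter()); // 2

const other = makeCounter(); // a separate call gets its own `count`
console.log(other()); // 1
```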


So above we can see that I still have access to the variables in the initial function inside my returned function.  That is what a closure does.  Pretty neat!

In my case, I created a closure outside of the listener with the variables I needed, along with an array.  Then, every time the listener fired I had it push items to the array if they met some logic requirements (using data from the XHR response to tell what the data was, if it fit what I wanted, it got added).  So now it's adding the data I want on the page to the array, but how does it know when it's done with the page?  Another if clause that fires when the page source changes; basically everything had a "page type" id on it, and when that changed in the response, I knew that we were done with that page.  Since my callback fired on every response, the first one that wasn't what I wanted tripped the if, and inside that block I had Sequelize fire off the bulkCreate method with my array for one clean insert into the database.
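
Roughly, that setup could look like the sketch below. This is not the project's actual code: makeCollector, the pageType field, and the 'product' / 'category' values are all hypothetical, and saveBatch is a stand-in for whatever does the insert (for example a function that calls bulkCreate() on a Sequelize model).

```javascript
// Builds the handler once, outside the listener. The returned function
// closes over `batch`, so the array survives between calls.
function makeCollector(wantedType, saveBatch) {
  let batch = [];

  return async function onResponse(data) {
    // Data we want from the current page: add it to the array and wait for more.
    if (data.pageType === wantedType) {
      batch.push(data);
      return;
    }

    // First response with a different page type: the page is done.
    if (batch.length > 0) {
      const toSave = batch;
      batch = []; // swap first, so anything arriving during the save isn't lost
      await saveBatch(toSave); // one insert for the whole array
    }
  };
}

// Usage with fake responses and a fake save function.
const saved = [];
const onResponse = makeCollector('product', async (rows) => {
  saved.push(rows);
});

const responses = [
  { pageType: 'product', name: 'a' },
  { pageType: 'product', name: 'b' },
  { pageType: 'category', name: 'c' },
];

async function run() {
  for (const data of responses) {
    await onResponse(data);
  }
  console.log(saved.length, saved[0].length); // 1 2
}

run();
```

With Puppeteer, the returned function would be what gets called from the response listener, something along the lines of page.on('response', ...) passing in the parsed JSON from each response.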

JavaScript closures are pretty handy, and for some reason people seem to think they're this lofty uber concept. There are many use cases for them and this is just one. I hope this has helped you some and hopefully you can apply something I've said to your own code!

Thanks for reading!

// Corey
