
Recently I was asked to help with a side optimization project at our company. I've done a fair amount of research, but I'm still not 100% sure this is the most efficient way to do it.

Problem:

  • Scraping over a dozen different pieces of information from an internal system (a website) and passing them into a Microsoft Office document template.

Restrictions:

  • The website works only in IE 9
  • The system does not have any API or web services
  • This will be used on more than 100 different workstations
  • The workstations have only IE 9; Firefox and Chrome are not allowed
  • Getting approval to install any software on the workstations beyond the default Windows tools is almost impossible

We have built a small working proof of concept. It uses a Visual Basic + JavaScript combination. In short: Visual Basic opens an instance of IE, then JavaScript logs into the system, navigates to the tabs we need and finds the relevant information, and finally the data is pushed into the Office template.
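
The last step of that pipeline (pushing scraped values into a template) is the easiest part to illustrate in isolation. Below is a minimal TypeScript sketch, not the proof of concept's actual code, assuming the template has been reduced to text with hypothetical `{{fieldName}}` placeholders; a real Word template would more likely use bookmarks, content controls, or mail-merge fields. `fillTemplate` and `ScrapedRecord` are illustrative names. The sketch fails loudly when a placeholder has no scraped value, which helps surface silent breakage when the scraped page changes.

```typescript
// Hypothetical shape of the data scraped from the internal system:
// field name -> text value.
type ScrapedRecord = Record<string, string>;

// Replace every {{key}} placeholder with the matching scraped value.
// Placeholders without a value are collected and reported as an error,
// so a changed page layout shows up immediately instead of as a blank field.
function fillTemplate(template: string, data: ScrapedRecord): string {
  const missing: string[] = [];
  const result = template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) => {
    if (Object.prototype.hasOwnProperty.call(data, key)) {
      return data[key];
    }
    missing.push(key);
    return match;
  });
  if (missing.length > 0) {
    throw new Error(`No scraped value for: ${missing.join(", ")}`);
  }
  return result;
}

// Example with made-up values.
const letter = fillTemplate(
  "Dear {{customerName}}, your case {{caseId}} is {{status}}.",
  { customerName: "Jane Doe", caseId: "A-1042", status: "open" },
);
console.log(letter); // Dear Jane Doe, your case A-1042 is open.
```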

It is working, but I'm not really sure that this approach is the best one.

We have tried several other solutions (a Node.js server, Selenium, and some other web scrapers), but they all seem to have limitations.

Adam Zuckerman
kuba
  • How much would it cost to create an option in the internal website to directly download the data in Excel format? – miniBill Jul 08 '15 at 19:39
  • @miniBill, according to the team behind this system, around 2017 :-)) seriously – kuba Jul 08 '15 at 20:13
  • Is this something that will be needed on those 100 workstations for any length of time? For a short-term project, either get something that works or suggest they just enter the data manually. – JeffO Jul 08 '15 at 21:12


Uncle Bob once said: "We don't ship shit...".

By working with legacy tools (IE 9) and scraping a website, you are:

  1. Spending a significant amount of time on unscalable, potentially unreliable technology.
  2. Introducing a highly custom component that future developers will need a lot of time to understand, which will potentially cost you a lot more in the long run.
  3. Running the risk that Windows gets upgraded on those machines and IE 9 can no longer be used, which would require an application update, etc.

What I would do if I were in this situation:

Option 1: An API and a Node.js-based client. This one is simple and probably the smartest choice in terms of both cost and future-proofing. I would try to convince the managing party that we need additional software installed on the machines. I understand their policy is not to allow that, but they have to understand that they are the ones requesting extra functionality, so they have to make a choice there.

Update the system with API capabilities; you might need them elsewhere in the future. For example, if your company introduces mobile or tablet-based terminals, there will be no IE 9 on them, so an API is a good long-term investment.
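
For comparison, a client for a JSON API needs no page navigation at all. The TypeScript sketch below assumes a hypothetical endpoint of the form `GET {baseUrl}/api/cases/{id}` returning an object with `caseId`, `customerName`, and `status`; the endpoint, the field names, `CaseRecord`, and the injected `HttpGet` function are all assumptions, since the system has no API today. In Node.js 18+ `HttpGet` could wrap the built-in `fetch`; it is passed in here so the client can be tried without a server.

```typescript
// Hypothetical record the internal system could expose through an API.
interface CaseRecord {
  caseId: string;
  customerName: string;
  status: string;
}

// Any function that performs an HTTP GET and resolves to the response body.
// Injected so the client can be exercised without a live server.
type HttpGet = (url: string) => Promise<string>;

// Validate a parsed JSON value instead of trusting the server blindly.
function toCaseRecord(value: unknown): CaseRecord {
  if (typeof value !== "object" || value === null) {
    throw new Error("Expected a JSON object");
  }
  const obj = value as Record<string, unknown>;
  for (const field of ["caseId", "customerName", "status"]) {
    if (typeof obj[field] !== "string") {
      throw new Error(`Missing or non-string field: ${field}`);
    }
  }
  return {
    caseId: obj["caseId"] as string,
    customerName: obj["customerName"] as string,
    status: obj["status"] as string,
  };
}

// Hypothetical endpoint layout: GET {baseUrl}/api/cases/{id}
async function getCase(baseUrl: string, id: string, httpGet: HttpGet): Promise<CaseRecord> {
  const body = await httpGet(`${baseUrl}/api/cases/${encodeURIComponent(id)}`);
  return toCaseRecord(JSON.parse(body));
}

// Usage with a stubbed HTTP layer instead of a live server.
const fakeGet: HttpGet = async () =>
  JSON.stringify({ caseId: "A-1042", customerName: "Jane Doe", status: "open" });

getCase("http://intranet.example", "A-1042", fakeGet).then((record) => {
  console.log(record.customerName); // Jane Doe
});
```

Because the data arrives already structured, the only fragile part left is payload validation in `toCaseRecord`, rather than tracking changes to the page layout.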

Option 2: A Node.js-based scraper without an API. Again, this requires installing an additional piece of software on the Windows machines; there is no way around that, and the business has to understand it. With Node.js you can use a number of modules that simulate the DOM, so you can navigate the page just as you would in a browser, only without a UI. It will probably take longer to build than an API and cost more, so this option is mainly viable if you don't own the site being scraped :)
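
To give a feel for the extraction step, here is a deliberately naive TypeScript sketch. It does not use a DOM-simulating module (jsdom, for example, would be the usual choice); instead it swaps in a regular expression that pulls label/value pairs out of table rows shaped like `<tr><th>Label</th><td>Value</td></tr>`. That markup pattern, the sample page, and the `extractFields` name are hypothetical. Its brittleness is instructive: any change to the page's markup breaks this kind of code, which is the reliability concern in point 1.

```typescript
// Deliberately naive stand-in for a DOM library: collects label -> value
// pairs from rows shaped like <tr><th>Label</th><td>Value</td></tr>.
// It trims surrounding whitespace, does not decode HTML entities, and
// skips rows whose cells contain nested tags. A real scraper should use
// a proper HTML parser instead.
function extractFields(html: string): Record<string, string> {
  const fields: Record<string, string> = {};
  // Created per call, so the /g regex's lastIndex always starts at 0.
  const rowPattern =
    /<th(?:\s[^>]*)?>\s*([^<]*?)\s*<\/th>\s*<td(?:\s[^>]*)?>\s*([^<]*?)\s*<\/td>/gi;
  let match: RegExpExecArray | null;
  while ((match = rowPattern.exec(html)) !== null) {
    fields[match[1]] = match[2];
  }
  return fields;
}

// Hypothetical fragment of the internal system's page.
const page = `
  <table>
    <tr><th>Case ID</th><td> A-1042 </td></tr>
    <tr><th class="label">Status</th><td>open</td></tr>
  </table>`;

console.log(extractFields(page)["Status"]); // open
```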

Option 3: Proceed with your approach, though I personally wouldn't. You were hired as a professional developer, and part of being a professional is letting the business know when an approach is too flawed to proceed with :)

Alexus
  • There will definitely not be any tablet/mobile version of this system, but the first option is still the best one for me too. As you surely know, it is sometimes really hard to convince the business to accept anything more technical than IE9 and MS Excel. – kuba Jul 08 '15 at 20:17
  • When working with a business unit, it's important to base your arguments on things that are "IMPORTANT TO THEM" :) Most of the time what they value is MONEY, so explain that they will spend a lot of money and get outdated software right away, with extra maintenance costs going forward. Alternatively, they can spend the same amount or a tad more and get a properly built, scalable system that is up to modern standards and can be used for years with little maintenance. – Alexus Jul 08 '15 at 20:31