This handout is a supplement to the full presentation given by Mike Schrenk at the CIJ Summer School 2009.
Online research often requires repetitive downloading of web pages. That process, along with extracting information found on websites, is tedious and error-prone. Screen scraping and iMacros allow journalists to automate the process of computer-aided research.
screen scraper
A screen scraper is software that conducts automated browsing activities on the internet. The primary purpose of a screen scraper is to extract information from websites.
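As a rough illustration of the extraction step, the sketch below pulls every link and its text out of a piece of HTML using only Python's standard library. It is not part of the original presentation: the names LinkExtractor, extract_links and SAMPLE_PAGE are hypothetical, and the sample page is invented so that the example runs without a network connection.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from the anchors in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the anchor currently open, if any
        self._text = []      # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an anchor with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return all (href, text) pairs found in the given HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample page; a real scraper would download this text first.
SAMPLE_PAGE = """
<html><body>
  <a href="/reports/2008.html">Annual report 2008</a>
  <a href="/reports/2009.html">Annual report 2009</a>
</body></html>
"""

links = extract_links(SAMPLE_PAGE)
for href, text in links:
    print(href, "|", text)
```

A full scraper would add the browsing half (fetching pages, following links, filling in forms), which is the part iMacros automates inside the browser.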
iMacros
iMacros is a browser plug-in that lets you write ‘macros’: simple, program-like scripts that automate standard programs (such as browsers).
iMacros is available for Internet Explorer and Firefox. I have had better results with Firefox and highly recommend its use over Internet Explorer.
Location for iMacros download (for Firefox)
https://addons.mozilla.org/en-US/firefox/addon/3863
initiating iMacros
The iMacros button in Firefox is located in the browser toolbar, next to the URL.
resources
iMacros for Firefox download page https://addons.mozilla.org/en-US/firefox/addon/3863
iMacros home page http://www.iopus.com/
iMacros command reference http://wiki.imacros.net/Command_Reference
iMacros user forums http://forum.iopus.com/
Demo website http://www.schrenk.com/cij/imacros_demo.php
command reference
The following is a list of all available iMacros commands. Each command takes zero or more parameters. Optional parameters are enclosed in square brackets. If several choices are possible for the same parameter, they appear in parentheses and are separated by the | character. Integer numbers are denoted by the letters n or m; all other names denote a series of characters (strings).
The ' character indicates a comment. If a line starts with ', everything after it is ignored. This is typically used for comments or to disable specific parts of a macro. Note: a macro cannot contain empty lines, because an empty line indicates the end of the macro. Every line in the macro must therefore contain at least the comment symbol.
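Because a stray empty line ends a macro early, it can help to check a draft before loading it into iMacros. The sketch below is one possible way to do that in Python, assuming the comment character is a straight apostrophe; the function names fill_blank_lines and command_lines are hypothetical helpers, not part of iMacros.

```python
COMMENT = "'"  # iMacros comment character (a straight apostrophe)


def fill_blank_lines(macro_text):
    """Replace each empty or whitespace-only line with a bare comment line."""
    return "\n".join(
        line if line.strip() else COMMENT
        for line in macro_text.splitlines()
    )


def command_lines(macro_text):
    """Return the lines that are neither blank nor comments."""
    return [
        line for line in macro_text.splitlines()
        if line.strip() and not line.startswith(COMMENT)
    ]


# A draft macro with an accidental empty line in the middle.
draft = "\n".join([
    "' Open the demo page",
    "URL GOTO=http://www.schrenk.com/cij/imacros_demo.php",
    "",
    "' Reload it",
    "REFRESH",
])

safe = fill_blank_lines(draft)
print(safe)
```

After the call, the empty third line has become a lone comment symbol, so the macro keeps its visual spacing without being cut short.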
ADD result_var added_value
Adds a value to a variable.
BACK
Opens the previously visited web page.
CLEAR
Clears browser cache and cookies on the hard drive.
CLICK X=n Y=m [CONTENT=some_content]
“Clicks” on the element at the specified X/Y coordinates.
CMDLINE variable default_value
Sets the variable to a value retrieved from the command line.
DISCONNECT
Disconnects the current dial-up connection.
EXTRACT POS=[R]n TYPE=(TXT|HREF|TITLE|ALT) ATTR=Anchor*
Extracts data from websites.
FILEDELETE NAME=file_name
Deletes a file.
FILTER TYPE=IMAGES STATUS=(ON|OFF)
Filters web site elements. Currently the support for filtering is experimental. If you need any other data filtered, please let us know what kind of filter you would like to see added.
FRAME F=n
Directs all following TAG or EXTRACT commands to the specified frame.
IMAGECLICK IMAGE=image_file CONFIDENCE=n [CONTENT=some_content]
Sends a WINCLICK command to the specified image.
IMAGESEARCH IMAGE=image_file CONFIDENCE=n
Searches for the input image specified via the IMAGE attribute.
ONCERTIFICATEDIALOG C=n
Selects the client side certificate from a dialog.
ONDIALOG POS=n BUTTON=(YES|NO|CANCEL) [CONTENT=some_content]
Handles JavaScript dialogs.
ONDOWNLOAD FOLDER=folder_name FILE=file_name
Handles download dialogs.
ONERRORDIALOG BUTTON=(YES|NO) CONTINUE=(YES|NO)
Handles error dialogs.
ONLOGIN USER=username PASSWORD=password
Handles login dialogs.
ONSECURITYDIALOG BUTTON=(YES|NO) CONTINUE=(YES|NO)
Handles security dialogs.
ONWEBPAGEDIALOG KEYS=some_keys
Handles web page dialogs.
PRINT
Prints the current browser window.
PROMPT prompt_text variable_name [default_value]
Displays a popup to ask for a value. This value is stored in the variable.
PROXY ADDRESS=proxy_URL:port [BYPASS=page_name]
Connects to a proxy server to run the current macro.
REDIAL ISP
Redials a connection.
REFRESH
Refreshes (reloads) the current browser window.
SAVEAS TYPE=(CPL|MHT|HTM|TXT|EXTRACT|BMP) FOLDER=folder_name FILE=file_name
Saves information to a file.
SET variable_name variable_value
Assigns values to built-in variables.
SIZE X=n Y=m
Resizes the iMacros Browser Window.
STOPWATCH ID=id
Measures the time taken by a macro or by a block of commands within it.
TAB T=(n|OPEN|CLOSE|CLOSEALLOTHERS)
Sets focus on the tab with number n, or opens or closes tabs.
TAG POS=n TYPE=type [FORM=form] ATTR=attr [CONTENT=some_content]
Selects a webpage element.
URL GOTO=some_URL
Navigates to a URL in the currently active tab.
VERSION BUILD=4213805
Specifies the version of iMacros that created this macro.
WAIT SECONDS=(n|#DOWNLOADCOMPLETE#)
Waits for a specific time.
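To show how a few of these commands combine, the sketch below uses Python to write out a macro that visits a list of pages and saves each one as HTML. It is an illustration under assumptions rather than a tested macro: build_download_macro is a hypothetical helper, the page_1.htm file naming is invented, and FOLDER=* is assumed to select the default download folder. Only TAB, URL, WAIT and SAVEAS from the list above are used.

```python
def build_download_macro(urls, wait_seconds=2):
    """Return macro text that opens each URL and saves the page as HTML."""
    lines = [
        "' Generated macro: save each page as HTML",
        "TAB T=1",
    ]
    for number, url in enumerate(urls, start=1):
        # A comment line separates the steps; empty lines are not allowed.
        lines.append(f"' Page {number}")
        lines.append(f"URL GOTO={url}")
        lines.append(f"WAIT SECONDS={wait_seconds}")
        # Hypothetical file naming; FOLDER=* is assumed to be the default folder.
        lines.append(f"SAVEAS TYPE=HTM FOLDER=* FILE=page_{number}.htm")
    return "\n".join(lines)


macro = build_download_macro(["http://www.schrenk.com/cij/imacros_demo.php"])
print(macro)
```

The printed text can be pasted into the iMacros editor and adjusted by hand. URLs that contain spaces would need encoding first, which this sketch does not attempt.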
For more notes and demos please visit http://www.schrenk.com/cij/