@rem
@rem This is a compilation of several scripts I wrote several years ago,
@rem along with comments. If anyone finds them of interest, please feel
@rem free to use the idea. They were written to extract data from Web
@rem log files: custom log file data extraction, scripting, creation
@rem of poll results and so forth.
@rem
@rem
@rem e-comm2.bat awk script
@rem 1) Set up copy-new.bat with the name of the website you want to evaluate.
@rem 2) Type [exit] on the DOS command line to exit 4DOS, and only then run
@rem copy-new.bat (it will create f-name.bat, fname2.bat, fname2.awk).
@rem 3) Edit the 3 files as copy-new.bat loads each of them in turn
@rem for editing.
@rem 4) To start gathering data from the log files below for the web site
@rem you want, simply type the name of that site's f-name.bat,
@rem e.g. poll for the /pollster/poll.html site.
@rem ------------------------------------------------------------------------
@rem Script e-comm2.bat (harvests data)
@rem
awk.exe e-comm2.awk < log-0510.txt > e-comm2.dat
awk.exe e-comm2.awk < log-1110.txt >> e-comm2.dat
awk.exe e-comm2.awk < log-1610.txt >> e-comm2.dat
awk.exe e-comm2.awk < log-2010.txt >> e-comm2.dat
awk.exe e-comm2.awk < log-2610.txt >> e-comm2.dat
awk.exe e-comm2.awk < log-2810.txt >> e-comm2.dat
awk.exe e-comm2.awk < log.txt >> e-comm2.dat
@rem ---------------------------------------
@rem 1) run e-comm2.bat (creates database e-comm2.dat)
@rem 2) sort e-comm2.dat > e-comm2.srt
@rem 3) sed -f e-comm.ex e-comm2.srt > e-comm2.op
@rem
@rem Note: when running copy-new.bat, always leave the following files
@rem alone, exactly as they are!
@rem e-comm.ex, e-comm3.awk and e-comm4.pl must always remain the same,
@rem even when writing new extraction scripts for other web sites!
@rem -------------------------------------------------------------
@rem Poll.bat file contents:
@rem
@rem call poll2.bat                            (creates database poll2.dat)
@rem sort poll2.dat > poll2.srt                sorted poll2.srt
@rem awk e-comm3.awk < poll2.srt > poll3.op    extract specified fields
@rem sed -f e-comm.ex poll2.srt > poll2.op     clean up data gathered (no need)
@rem sort poll2.op > poll2.htm                 sorted poll2.op
@rem perl386 e-comm4.pl poll2.dat > poll6.op   clean up sorted data gathered
@rem sort poll6.op > poll6.srt                 sorted again on new fields
@rem
@rem
@rem ----------------------------------------------------
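@rem Sed program e-comm.ex
@rem Different search engines have different signatures, and their
@rem search signatures need to be removed.
@rem As listed here, the text after each command is commentary only;
@rem sed would reject it unless it is dropped or prefixed with #.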
@rem {
@rem /6&layer=/d aol layer has no search data (so delete line)
@rem /browse.psp?/d netscape with this signature has no search data
@rem s/.*query?q=//g ALTAVISTA
@rem s/.*query?p=//g GOOGLE.YAHOO Visited by Crawler SE BOT
@rem s/.*query_ca?p=//g ca.google.yahoo
@rem s/.*query?pg=//g ALTAVISTA Visited by Scooter, Mercator or Scrub
@rem s/.*showmore&query=//g aolsearch.aol
@rem s/.*prevq=//g ALTAVISTA
@rem s/.*query=//g AOL LYCOS Visited by T-Rex spider BOT
@rem Had to replace s/.*search=//g with the more specific versions that follow
@rem s/.*8&search=//g netscape utilizes namely or trivial as BOT
@rem s/.*es&search=//g |
@rem s/.*op&search=//g |
@rem s/.*xr&search=//g |
@rem s/.*rc&search=//g |
@rem s/.*ll&search=//g |
@rem s/.*psp&search=//g |
@rem s/.*bot&search=//g |
@rem s/.*psp?search=//g |
@rem s/.*scnetscape?search=//g netscape
@rem s/.*?keywords=//g |
@rem s/.*_psp&Keywords=//g earthlink
@rem s/.*MT=//g Hotbot.lycos
@rem s/.*?MT=//g search.msn
@rem s/.*adp?//g teensearch (AOL)
@rem s/.*search?z//g vivisimo
@rem s/.*search?hl//g google canada, germany
@rem s/.*search?q=//g google uk, zdnet, alltheweb
@rem s/.*www.google.com\/search?q=//g google uses crawler as SE bot
@rem s/.*search?as_q=//g google
@rem s/.*&q=//g google
@rem s/.*ie?q=//g google
@rem s/.*as_q=//g google
@rem s/.*q=cache://g google
@rem s/.*url=//g google images
@rem s/.*images%3Fq//g google images
@rem s/.*html?q=//g radarvol.vol.com
@rem s/.*gw?=web//g excite visited by Architext spider
@rem s/.*search.gw?c=//g excite also atext
@rem s/.*search.gw?//g excite also multitext
@rem s/.*web?q=//g altavista
@rem s/.*aq&q=//g fr.altavista
@rem s/.*q&q=//g uk.altavista
@rem s/.*q?pg=//g uk.altavista
@rem s/.*?c=//g excite.com
@rem s/.*?wf,//g webferret
@rem s/.*cc%3A//g redaruol.uol
@rem s/.*srch&qt=//g search.com
@rem s/.*cgi?keywords=//g search.com
@rem s/.*search?channel//g search.com
@rem s/.*ch&q=//g search.com channel
@rem s/.*php?qry=//g directhit
@rem s/.*results.asp?q=//g msn
@rem s/.*&title=//g websearch.cs.com
@rem s/.*&ask=//g askjeeves
@rem s/.*dir.asp?cat=//g open directory
@rem s/.*fir&search=//g netscape
@rem ---------
@rem s/.infoseek uses sidewinder and WISEnotbot
@rem ---------
@rem s/.*?cat=//g netmenu.nl
@rem slurp@inktomi spider feeds inktomi
@rem to many search engines.
@rem ---------
@rem infoseek uses sidewinder BOT
@rem ---------
@rem s/.*search?cat=//g alltheweb uses FAST-search as BOT
@rem ---------
@rem s/.*nlquery.fcg?cb=0&qr=//g northernlight Visited by Gulliver BOT
@rem s/+/ /g and clean up the URL-encoded characters (+ and %XX codes)
@rem s/%20/ /g
@rem s/%22/ /g
@rem s/%26/ /g
@rem s/%27/ /g
@rem s/%28/ /g
@rem s/%29/ /g
@rem s/%2b/ /g
@rem s/%2c/ /g
@rem s/%2d/-/g
@rem s/%2D/-/g
@rem s/%2e/ /g
@rem s/%2F/ /g
@rem s/%2B/ /g
@rem s/%2f/ /g
@rem s/%3A/ /g
@rem s/%3b/ /g
@rem s/%3D/ /g
@rem s/%3F/-/g
@rem s/%40/@/g %40 is the URL encoding of @
@rem s/%60/'/g
@rem s/%B4/ /g
@rem s/%E9/ /g
@rem ---------
@rem }
@rem s/&.*// then, on all lines, strip everything from the first & onward
@rem s/^ *// get rid of white spaces at beginning of lines
@rem s/[,,].*// strip from the first comma to EOL ([,,] matches a single comma)
@rem s/num.*// get rid of annoying EOL text that starts with num
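@rem ---------
@rem Worked example (a sketch only: e-comm2.awk is not shown here, so it
@rem assumes each input line is a referrer URL; the URL below is made up).
@rem In these basic sed patterns ? and + are literal characters.
@rem   input             http://www.google.com/search?q=web+log+analysis&hl=en
@rem   s/.*search?q=//g  leaves  web+log+analysis&hl=en
@rem   s/+/ /g           leaves  web log analysis&hl=en
@rem   s/&.*//           leaves  web log analysis
@rem The other rules do not match this line, so it passes through them unchanged.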