This post continues my notes from Introduction to Databases, a MOOC taught by Stanford’s Jennifer Widom. The topic of this post is XQuery.

XQuery Defition

XQuery is a query and functional programming language that is designed to query collections of XML data.

XQuery Syntax: FLWOR

FLWOR Syntax
1
2
3
4
5
for $var in expr
let $var := expr
where expr
order by expr
return expr

A couple examples

An example with last posts countries.xml file:

FLWOR Syntax
1
2
3
4
for $c in doc('countries.xml')/countries/country
where $c/@population > 100000000
order by $c/@population
return $c/@name

Grab countries with a city larger than 10,000,000:

FLWOR Syntax
1
2
3
4
for $c in doc('countries.xml')/countries/country
where $c/city/population > 10000000
order by $c/@name
return $c/@name

All of the expr in the syntax can be XPath expressions. The FLWOR syntax should be simple to pick up for anyone familiar with scripting languages or SQL.

This spring I took Introduction to Databases a MOOC (Massive Open Online Class) from Stanford’s Jennifer Widom. It was very interesting, I learned a great deal from it. And one of the sections of the class that I really enjoyed, even though it sounded like it was going to be pretty dry, was the section on XML. I took quite a few pages of notes on the different XMl tools, so I figured I might as well post them to my blog.

XPath Definition

XPath is a query language for selecting nodes from an XML document.

from Wikipedia

XML for the Examples

countries.xml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
<countries>
  <country name="Afghanistan" population="22664136" area="647500">
    <language percentage="11">Turkic</language>
    <language percentage="35">Pashtu</language>
    <language percentage="50">Afghan Persian</language>
  </country>
  <country name="Albania" population="3249136" area="28750"/>
  <country name="Algeria" population="29183032" area="2381740">
    <city>
      <name>Algiers</name>
      <population>1507241</population>
    </city>
  </country>
</countries>

Basic Path

Selects all of countries from the country root node.

1
doc('countries.xml')/countries/country

Path with Alternation

This will select countries with populations that speak Pashtu or Swedish.

1
doc('countries.xml')/countries/country[language(Pashtu|Swedish)]

Attributes are prepended with “@”

This will select the names of all countries

1
doc('countries.xml')/countries/country/@name

Path with Conditional

Select the names of all countries that have a population of less than 1,000,000.

1
doc('countries.xml')/countries/country[@population < 1000000]/@name

Complex Conditional

Select all countries that have a population of less than 1,000,000.

1
doc('countries.xml')/countries/country[@population < 1000000 and city/population > 50000]

data() prints out just the text value.

1
doc('countries.xml')/countries/country[@population < 1000000]/data(@name)

Just a quick post to write up an bash alias I just created to list the sites enabled on a server and each sites location.

I often need to look up the list of sites enabled on a web server and also the document roots of each of these sites. In the past this meant loading the Apache configuration file into emacs or vim and searching for the site I was looking for. Today I decided to create an alias for this task. Here is the original combination of grep commands.

Grep Command

1
2
grep ''^ServerName" -A 3 /etc/apache2/sites-available/default | \
   grep "^ServerName\|^DocumentRoot\|^--"
  1. ^ServerName:
    • Searches the file for all lines that begin with ServerName at the beginning of the line.
  2. -A 3
    • Prints the 3 lines after the match.
  3. 3. grep “^ServerName\|^DocumentRoot\|^–”’
    • Prints only the lines that we are interested (ie lines that begin with ServerName or DocumentRoot or –)
    • The “\|” in the second grep command separates the regex with an “or” statement.

Alias Command

  1. Choose a name for the command, in my case, show-sites
  2. Add the alias to the .bashrc or .bash_alias file in your home directory
1
2
alias show-sites='grep "^ServerName" -A 3 /etc/apache2/sites-available/default | \
 grep "^ServerName\|^DocumentRoot\|^--"'

Reload Your Bash Configuration

1
. ~/.bashrc

Show All Current Aliases

Type alias at the command line to view all of the currently defined aliases.

1
$ alias

Bash prints out all of your aliases, like so:

1
2
3
4
5
6
7
8
9
10
11
12
13
alias ducks='du -cks * |sort -rn |head -11'
alias gb='git branch'
alias gba='git branch -a'
alias gc='git commit -v'
alias gca='git commit -v -a'
alias gd='git diff | vim -'
alias gl='git log --stat'
alias gp='git push'
alias graceful='sudo apache2ctl graceful'
alias gst='git status'
alias ls='ls --color=auto'
alias show-sites='grep '\''^ServerName'\'' -A 3 /etc/apache2/sites-available/default \
  | grep '\''^ServerName\|^DocumentRoot\|^--'\'''

Drupal 7 includes jQuery 1.4.4 by default. So getting started using jQuery in your module or theme is really simple. The easiest way to include a javascript file is to include it in a .info file with a line like “scripts[]= myfile.js”.

How to include your js file in a module or theme

Add this to the .info file

1
scripts[] = js/myjavascript.js

this will include the javascript file on every page that includes your module or theme.

Include it in your Theme’s template.php

Themes have more granular control of where javascript file is included in template.php

1
drupal_add_js(drupal_get_path('theme','kepler6_omega') . 'js/myjavascript.js');

Use drupal_add_js in your module

Include js in yarr.module

1
2
3
4
function yarr_form_alter(&amp;$form, &amp;$form_state, $form_id) {
  drupal_add_js(drupal_get_path('module,'yarr') . 'js/myjavascript.js');
  ...
}

Use drupal_add_library to add libraries included in Drupal

example add jQueryUI Accordion library

1
drupal_add_library('system','ui.accordion');

While jQuery core is included by default, jQuery UI is not, so you need to include that with drupal_add_library if you need it, as above.)

Drupal Behaviors

Drupal behaviors are used instead of $(document).ready();

$(document).ready() is used to wait until the entire DOM is loaded before adding your javascript functionality to the DOM.

Drupal adds its own way to add your javascript to the DOM

Drupal attach behaviors

attach your functionality to the Drupal objects behavior array

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
(function($) {

Drupal.behaviors.yarr = {
  attach: function(context, settings) {
    $('h1 &gt; a').each(function(i) {
      var h1text = $(this).text();
      $(this).text("Yarrr, matey! " + h1text);
    });
    $('h1:not(:has(a))').each(function(i) {
      var h1text = $(this).text();
      $(this).text("Yarr, " + h1text);
    });
  }
};

})(jQuery)

Notes on the code example

  • The outside wrapper is required if you wish to use the short jquery “$” syntax.
  • $(‘h1 > a’) selects all of the H1 headers (and the inner anchor) that have immediate anchor children
  • $(‘h1:not(:has(a))’) selects all H1 headers that do not have anchors embedded in them.
  • .each, iterates over each element in the selector.

Inside the function the “this” object refers to the current DOM element. If you want the jQuery object use the $(this) syntax.

Further Reading

Drupal Documentation on Javascript and jQuery

jQuery Documentation

jQuery API Documentation

spam filters fall apart; the centre cannot hold.

Spam Gets By …

From time to time, despite precautions, spammers get by the Drupal filters and captchas. Or perhaps, the problem gets put in your lap and you have the pleasure of deleting thousands of spam comments.

The Drupal Administrative interface doesn’t provide many resources unless you enjoy clicking “select all” and “delete selected comments”, page after page, 50 comments at a time.

Drush to the Rescue

But, if you are comfortable with sql, one easy solution is to use drush sql-query to delete the comments.

Before you start Deleting anything…

If you are unfamiliar at all with sql, first back up your drupal database.

1
drush sql-dump --result-file=/path/to/dump-file.sql

You could also dump just the comment table. (The ‘~’ says to save it in your home directory.)

1
drush sql-dump --result-file=~/comment-table.sql --tables-list=comment

Now let’s get busy!

First take a look at the comment table:

1
drush sqlq "desc comment"

( drush sqlq is the short form of drush sql-query)

You could also use drush sql-cli to view the comment table:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
$ drush sql-cli
....
mysql> desc comment;
+----------+---------------------+------+-----+---------+----------------+
| Field    | Type                | Null | Key | Default | Extra          |
+----------+---------------------+------+-----+---------+----------------+
| cid      | int(11)             | NO   | PRI | NULL    | auto_increment |
| pid      | int(11)             | NO   | MUL | 0       |                |
| nid      | int(11)             | NO   | MUL | 0       |                |
| uid      | int(11)             | NO   | MUL | 0       |                |
| subject  | varchar(64)         | NO   |     |         |                |
| hostname | varchar(128)        | NO   |     |         |                |
| created  | int(11)             | NO   | MUL | 0       |                |
| changed  | int(11)             | NO   |     | 0       |                |
| status   | tinyint(3) unsigned | NO   |     | 1       |                |
| thread   | varchar(255)        | NO   |     | NULL    |                |
| name     | varchar(60)         | YES  |     | NULL    |                |
| mail     | varchar(64)         | YES  |     | NULL    |                |
| homepage | varchar(255)        | YES  |     | NULL    |                |
| language | varchar(12)         | NO   |     |         |                |
+----------+---------------------+------+-----+---------+----------------+
14 rows in set (0.00 sec)

Next take a look at the comments:

1
drush sqlq "select subject,cid from comment "

You can show comments from the last 30 days with the following.

The created field is stored as a unix timestamp (seconds since the Unix Epoch), so to get last month timestamp:

1
Seconds to subtract = (seconds in an hour)*(24 hours in a day)*(30 days) = 3600*24*30
1
drush sqlq "select subject,cid from comment where created > unix_timestamp(now())-3600*24*30"

Delete comments within a range of comment ids.

If you know the range of spam comment ids, you can give sql a range of comment ids:

Show the range of comments:

1
drush sqlq "select subject,cid from comment where cid > 3013 and cid < 3134 "

Delete the range of comments:

1
drush sqlq "delete from comment where cid > 3013 and cid < 3134 "

Show comments, group by name of commenter

1
drush sqlq "select subject,name,cid from comment where cid > 300 group by name"

But, honestly, there’s probably a drush delete-comments module that needs to be written!

I’ll keep you posted of my progress on that. In the meantime, you can try using the previous commands. Just remember to backup your database first, its only a single drush command after all!

1
drush dump --result-file=~/drupal-backup.sql

After reading Randall Degges excellent post How I learned to Program, I decided to give creating a screenshot program a try. It’s always a bit of work to take a screenshot, upload it to the web and then grab its permanent link. So being a geek, I thought it high time this process was automated!

I tried writing the program in python only, so that there were fewer dependencies, but using the commandline program scrot as the screenshot snapping program added a lot of flexibility to the options for the screenshot.

Without further ado, here are the parts of the program.

Take Screenshot with scrot

I start with a bash script that calls scrot, and passes in my program as a parameter to scrot.

screen-upload.bash

1
2
#!/bin/bash
/usr/local/bin/scrot '%m-%d-%Y-screenshot.png' -d 5 -u -e '/home/bart/projects/screen-upload/screen-upload.py $f' 

Next, chmod +x the bash script:

1
chmod +x screen-upload.bash

Scrot Options explained

  • -d 5: Delays the snap for 5 seconds.
  • -u: Use the currently focused window.
  • -e ‘script’: Execute the following script after the screenshot is snapped.
  • $f: passes in filename to script.

On to the Python Picture Uploading Program

Scrot takes a screenshot after a delay of 5 seconds of the currently active window, it names the file with the current date-screenshot.png, it then passes the filename into my program, “screen-upload.py”.

Upload Image to Picasaweb

For uploading the image to Picasaweb, I used the gdata module. Installing it involved the standard: Download the module, extract it and cd into the directory and run “sudo python setup.py install”.

In version 2 of screen-upload, I will rewrite the authentication to use Oauth2, so that the authentication is more secure, for the alpha version, I am just passing in the users email and password into the gd_client object.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
import gdata.photos.service
import gdata.media
import gdata.geo

filename = fname
gd_client = gdata.photos.service.PhotosService()
gd_client.email = username
gd_client.password = passwd
gd_client.source = 'screen-upload-1'
gd_client.ProgrammaticLogin()
print 'uploading',  filename

album_url = '/data/feed/api/user/%s/albumid/%s' % (username, album_id)
photo = gd_client.InsertPhotoSimple(album_url, 'New Photo',
               'Uploaded using the API', filename, content_type='image/png')

Copy URL into the Clipboard

Copying the URL to the clipboard is easy as long as you are not too concerned with cross-platform compatibility. I used the gtk module to gain access to the clipboard on Linux systems.

1
2
3
4
5
6
from gtk import Clipboard

def clip_store(pic_url):
    cb = Clipboard()
    cb.set_text(pic_url)
    cb.store()

This can be made more cross-platform with a series of try statements that load platform specific libraries.

1
2
3
4
5
6
try:
  import win32

...
try:
  from gtk import Clipboard

Setting up the Keyboard Shortcut in Gnome 3.

I had a bit of a bother creating a keyboard shortcut in Gnome. I created a new shortcut in the keyboard settings configuration, but nothing happened.

Long story short: I created a shortcut under “Custom Shortcuts”. I called it “Take Screenshot” and I named the script “screen-upload”. Then I created a symbolic link to the script in /usr/local/bin.

1
ln -s ~/projects/screen-upload/screen-upload.bash /usr/local/bin/screen-upload.

This did the trick, now when I press the PrtSc button, a screenshot is taken of the current window, and its uploaded to Picasa and the URL of the image is copied in to my clipboard!

"Screenshot of screen-upload.py in Emacs"

References

NearlyFreeSpeech.net is a good simple, bare-bones fit for a Jekyll site once everything is set up. The price can’t be beat, you can put down a $10 payment and just try it out. With a Jekyll site, that $10 investment should last a while!

Set Up SSH Alias

The first thing to do is set up an ssh alias in your .bashrc file. The hostname and username that are assigned at nfsn are generally pretty easy to forget, so I always create an alias for them, like:

alias sshnfsn='username_sitename@ssh.yourhostname.nearlyfreespeech.net';

Once you save the .bashrc file, you can source it with:

. ~/.bashrc

and then test that it works with by running it.

sshnfsn

Aside: Public, Private, Protected Directories

On a nfsn site, there are 3 directories in your /home folder.

  • The /home/public folder is your public facing directory. This is your website’s public Document Root.

  • The /home/private folder is your private folder where you can store git repos, your ruby gems, anything private you need to use or back up.

  • The /home/protected: let me know if find a use for this.

Set Up Git on NFSN

Once you are ssh’d into nfsn, cd into /home/private and create a directory for your git repositories and create a bare git repository for your site.

cd /home/private
mkdir git
cd git
mkdir mysite.git
cd mysite.git
git init --bare

Now you have a bare git repository for your jekyll site. The following script will create a git post-receive hook, that will run every time you push to your git repository.

#!/bin/sh

REPONAME=mysite.git
GIT_REPO=$HOME/git/$REPONAME
TMP_GIT_CLONE=$HOME/git/tmp_deploy/$REPONAME
PUBLIC_WWW=/home/public

# clone the git repo into the tmp_deploy directory.
git clone $GIT_REPO $TMP_GIT_CLONE
# jekyll-ify the site and place html files in /home/public
jekyll --no-auto $TMP_GIT_CLONE $PUBLIC_WWW

rm -Rf $HOME/git/tmp_deploy/$REPONAME/.git/objects
rm -Rf $HOME/git/tmp_deploy/$REPONAME/.git
rm -Rf $HOME/git/tmp_deploy/$REPONAME

exit

Be sure to chmod +x the file:

chmod ug+x /home/private/git/mysite.git/post-receive

Now, every time you “git push” into nfsn this post-receive hook will auto-regenerate your jekyll site.

Set Up a Git Remote to NFSN

Back on your home computer or wherever you are building your jekyll site, you need to create a git remote for your nsfn git repository.

cd ~/projects/mysite.git
# substitute in your username, sitename and nfsn hostname
git remote add nfsn ssh://&lt;nsfnusername&gt;-&lt;sitename&gt;@ssh.&lt;hostname&gt;.nearlyfreespeech.net/home/private/git/mysite.git

Now that you have a git remote set up for your jekyll site, you can Create a page in your jekyll “_post” directory.
Run “jekyll –server” to test it locally at http://localhost:4000.

jekyll --server

If it looks good, add it to your git repository.

git commit -am "new brilliant post!"

Next, push it to your website on nfsn.

git push nfsn master

The post-receive script you created earlier will generate your new public jekyll website automatically.

Test it and enjoy!

How I did part of a SAS assignment in R

I was working on a SAS assignment for my Regression Analysis class. The residual plot in SAS was corrupted, luckily I was able to recreate it in R. (And actually it looks better in R too.)

# read in data from file 
gpa.data <- read.table("GPA.txt");
# create multiple linear regression model
gpa.fit <- lm(V4 ~ V1 + V2 + V3 + V5 + V6 + V7 + V8, data=gpa.data)

# save residuals to a variable
gpa.resid <- residuals(gpa.fit)
# save Predicted values to a variable
gpa.yhat <- fitted.values(gpa.fit)

# create png file, that plot will be saved to.
#png("resid.png")
# create a plot of residuals vs predicted values
plot(gpa.yhat,gpa.resid, ylab="Residuals", xlab="Predicted Values of Cum GPA", main="Plot of Residuals*Predicted Values")
# create a line#
abline(0,0)
# write plot to file
#dev.off()

resid.png

ANOVA table of the Multiple Regression Model

Here is R’s ANOVA command. Run it on the regression model gpa.fit.
anova(gpa.fit)
Here is the output of the ANOVA command.
Analysis of Variance Table

Response: V4
           Df  Sum Sq Mean Sq F value    Pr(> F)    
V1          1   0.567  0.5669  1.8709  0.172001    
V2          1  26.488 26.4877 87.4135 < 2.2e-16 ***
V3          1   9.096  9.0964 30.0194 6.839e-08 ***
V5          1   2.446  2.4459  8.0717  0.004683 ** 
V6          1   1.032  1.0324  3.4069  0.065524 .  
V7          1   0.020  0.0199  0.0655  0.798068    
V8          1   0.278  0.2784  0.9187  0.338288    
Residuals 492 149.084  0.3030                      
---
codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The stars in R’s output are the signoificance code. In the above summary output, V2, V3, V5 are very significant. V1, V7, and V8 are not very significant; the regression model should be tested without them.

Summary of Linear Regression Model

summary(gpa.fit)
Call:
lm(formula = V4 ~ V1 + V2 + V3 + V5 + V6 + V7 + V8, data = gpa.data)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.3345 -0.3387 -0.0059  0.3429  1.9661 

Coefficients:
              Estimate Std. Error t value Pr(> |t|)    
(Intercept)  1.7792451  0.5165056   3.445  0.00062 ***
V1          -0.0383924  0.0527532  -0.728  0.46710    
V2           0.2716557  0.0417213   6.511 1.84e-10 ***
V3           0.0010404  0.0003173   3.279  0.00111 ** 
V5           0.0010332  0.0003425   3.016  0.00269 ** 
V6          -0.0391177  0.0239319  -1.635  0.10279    
V7          -0.0006554  0.0028471  -0.230  0.81805    
V8          -0.0054118  0.0056462  -0.958  0.33829    
---
codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.5505 on 492 degrees of freedom
Multiple R-squared: 0.2112, Adjusted R-squared:   0.2 
F-statistic: 18.82 on 7 and 492 DF,  p-value: < 2.2e-16
Vim

So, I was trying to convert an orgmode file to html. There is an orgmode-export-as-html function, but it was choking on something, so I needed to do the conversion by hand. I tried many, many times to convert the file with emacs regular expressions, but was unsuccessful. The difficulty was the orgmode headlines, which look like this:
*** Use the sort function
I wanted to simply transform this into
<h3&gt;Use the sort function</h3&gt;

Vim Regexp

I was finally able to match the headline in vim with the following regular expression:
\*\*\* \(\(w*\).*\)
  • The literal asterisks had to be matched with \*.
  • Next, to match to the end of the line I used: \(\(w*\).*\)
  • \( \) creates a matching group, which can be used later in the replacement string with \1 or \2 etc.
  • Also \(w*\).* matches 1 or more words

    Vim Substitution Expression

    %s/\*\*\* \(\(w*\).*\)/<h3&gt;\1<\/h3&gt;/
  • The % matches the entire file
  • The \1 matches the first matching group in the search string That is, \1 will match everything on the line after the 3 literal asterisks.

    Testing Matching String

    Its easy to test a regular expression in vim. While in command mode (hit Esc if you are in input mode), type a / and then type your regular expression and then press “return”, like so:
    /\*\*\* [RETURN]
    Your matches will be highlighted. Click ‘n’ to jump to the next match.

    Getting back to previous substitution

    Vim has a cool, intuitive way to explore your command history. In command mode, type “:” as for typing an ex command, and then press the up arrow to scroll up to through your command history. This is really handy for using previous, complex regular expressions.
  • The Data Step

    Subsetting IF

    In a Data Step you can exclude some observations from the dataset with an if statement.
    data tornados_1980s;
      infile FileName;
      input year city damages;
      * this limits input data to 1980s data;
      if 1980 <= year <= 1989;
    run;
  • IF .. IN statement
    if year in (1980, 1981, 1982);
  • AND, OR
    if year = '1980' and city = 'Baltimore';

    infile ‘filename’

    In the data step, import data from a file with the infile command.
    data tornados;
      infile 'tornados.dat';
      input year city cost;
    run;

    Set

  • Use the set command to create a new data set from an already created set. The following creates a dataset of 1980s tornado data from the larger set of tornado data.
    data tornados_1980s;
      set tornados;
      if 1980 <= year <= 1989;
    run;

    The PROC Step

    PROC SORT

  • The sort procedure, sorts data. You can sort by multiple fields.
  • Also you can print by a field.
    proc sort data=tornados;
      by year city;
    proc print data=tornados;
      by year;
    run;

    PROC Univariate

    PROC Univariate generates descriptive statistics
    proc univariate data=tornados;
      histogram year;
    run;

    PROC means

    Use proc means when you are only interested in basic descriptive statistics.

    PROC freq

  • generates tables for data in categories.

    PROC gplot

    proc gplot data=tornados;
      plot year*cost;
      title 'Year by Cost tornados';
    run;

    PROC corr

    compute the correlation

    proc corr data=grades;
      var exam1 exam2 hwscore;
    run;

    PROC reg

  • p: prints obs, predicted, residuals
  • r: same as p, plus more
  • clm: 95% conf interval for mean of each obs
  • cli: 95% prediction intervals.
    proc reg data=grades;
      model final=exam1 hwscore / p r cli clm;
      plot final*hwscore;
    run;

    Multiple Regression Analysis

    Variable Selection

    SAS has several methods for selecting variables
    proc reg data=cdi;
      model y = x1-x8 /selection=rsquare best=1;
      model y = x1-x8 /selection=adjrsq best=5;
      model y = x1-x8 /selection=cp best=10;
      model y = x1-x8 /selection=forward slentry=0.10;
      model y = x1-x8 /selection=stepwise slentry=0.10 slstay=0.10;
      model y = x1-x8 /selection=backward slstay=0.10;
    run;
    additional pages to try: more sas