kindageeky

Tag Results

8 posts tagged R

Twitter Sentiment & Brand Management

The last post talked about R in the cloud and made reference to how United Airlines was possibly influencing brand sentiment through a contest they are having for a trip to the Olympics.  After a bit of hacking in R this is the recent sentiment I’ve seen after treating the campaign #FlyMetoLondon as neutral (as these are optionally generated after a user registers for a contest).  I would say this is a very effective campaign for brand management.

Nearly 1200 of the most recent 1500 posts were coming from this contest, more than offsetting the otherwise neutral to negative sentiment from other tweets.  I guess we’ll see how this campaign shifts the ambient sentiment when the campaign is over, as not scientifically meaningful, but previous assessment showed an approval rating of 27% (i.e. 73% neutral to negative). 

Update


This is from sentiment140 run at 11pm EST using @united as the search query. 

To clarify, last time I measured the sentiment, it was seemingly very negative overall (as roughly confirmed by the chart above), now factoring in the campaign, it is seemingly very positive, but what will be the sentiment when the campaign is over?

Here is the relevant R code for reference …

library(twitteR)
united.tweets = searchTwitter('@united', n=1500)
tweet = united.tweets[[1]]
class(tweet)
tweet$screenName
tweet$text

library(plyr)
united.text = laply(united.tweets, function(t) t$getText())
head(united.text, 5)

hu.liu.pos = scan('/users/kindageeky/Downloads/positive-words.txt', what='character', comment.char=';')
hu.liu.neg = scan('/users/kindageeky/Downloads/negative-words.txt', what='character', comment.char=';')

pos.words = c(hu.liu.pos, 'upgraded', 'early', 'safe')
neg.words = c(hu.liu.neg, 'wtf', 'delay', 'late', 'broke', 'lost', 'dated', 'damage', 'herd', 'epicfail', 'sitting', 'mech', 'tarmac', 'ugh' )
 
score.sentiment = function(sentences, pos.words, neg.words, .progress='none', brand)
{
	require(plyr)
	require(stringr)
	
	# we got a vector of sentences. plyr will handle a list or a vector as an "l" for us
	# we want a simple array of scores back, so we use "l" + "a" + "ply" = laply:
	scores = laply(sentences, function(sentence, pos.words, neg.words) {
		
		# clean up sentences with R's regex-driven global substitute, gsub():
		sentence = gsub('[[:punct:]]', '', sentence)
		sentence = gsub('[[:cntrl:]]', '', sentence)
		sentence = gsub('\\d+', '', sentence)
		# and convert to lower case, may blow up:
		t  0)
		{
			#print('marketing') 
			return(0)		
		}
		else {
			# compare our words to the dictionaries of positive & negative terms
			pos.matches = match(words, pos.words)
			neg.matches = match(words, neg.words)
	
			# match() returns the position of the matched term or NA
			# we just want a TRUE/FALSE:
			pos.matches = !is.na(pos.matches)
			neg.matches = !is.na(neg.matches)

			# and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
			score = sum(pos.matches) - sum(neg.matches)
		}
			return(score)
		
	}, pos.words, neg.words, .progress=.progress )

	scores.df = data.frame(score=scores, text=sentences)
	return(scores.df)
}

united.scores = score.sentiment(united.text, pos.words, neg.words, .progress='text', 'flymetolondon')

library(ggplot2)
qplot(united.scores$score)
hist(united.scores$score)
  

Project Euler #9

This is what Pier came up with using only Math … plan to whip up an R implementation soon … brilliant

Without programming:

\(a= 2mn; b= m^2 -n^2;\)

\(c= m^2 + n^2;\)

\(a + b + c = 1000;\)

\(2mn + (m^2 -n^2) + (m^2 + n^2) = 1000;\)

\(2mn + 2m^2 = 1000;\)

\(2m(m+n) = 1000;\)

\(m(m+n) = 500;\)

\(m>n;\)

//Simple R routine goes here …

\(m= 20; n= 5; a= 200; b= 375; c= 425;\)

project euler #4 - largest palindrome from the product of two 3 digit numbers

more fun with R parsing, yick!

reverse <- function(x)
{
    sapply(lapply(strsplit(x, NULL), rev), paste,collapse=”“)
}

isPalindrome <- function(x, y)
{
    x<-toString(x)
    y<-reverse(x)
    a <- strtoi(x, 10)
    b <- strtoi(y, 10)
    if(a==b){ return(TRUE) } else {return(FALSE)}
}
   
maxPalindrome <- 0
number1 <- 0
number2 <- 0
for(i in 100:999)
{
    for(j in 100:999)
    {
        k <- i*j
        if(k>maxPalindrome)
        {
            if(isPalindrome(k))
            {
                maxPalindrome <- k
                number1 <- i
                number2 <- j
            }
        }
    }
}
print(maxPalindrome)
print(number1)
print(number2)

project euler #8 - parsing in R, well it’s good at other things

what I learned tonight is that string parsing in R kinda sucks … overcame limitations with the stringr library … problem was finding the max product of a 5 digit sequence in a 1000 digit number.

productMax <- 0
for(i in 1:996)
{
    j <- i+4
    x <-str_sub(raw, i, j)
    #print(x)
    y <- as.integer(str_sub(x, 1, 1))
    for(k in 2:5)
    {
        y <- c(y,as.integer(str_sub(x, k, k)))
    }
    #print(y)
    z <- 1
    for(m in 1:5)
    {
        z <- z*y[m];
       
        if(z==0)
        {
            break
        }
    }
    #print(z)   
    if(z>productMax) {productMax <- z}

}
print(productMax)

project euler #6 - difference in square of sums and sum of squares 1 to 100

pretty simple …

x <- seq(1,100,1)
> x
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26
 [27]  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52
 [53]  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78
 [79]  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
> sumOfSquares <- 0
> for(i in x){sumOfSquares <- c(sumOfSquares, i^2)}
> sumOfSquares
  [1]     0     1     4     9    16    25    36    49    64    81   100   121   144   169   196   225   256
 [18]   289   324   361   400   441   484   529   576   625   676   729   784   841   900   961  1024  1089
 [35]  1156  1225  1296  1369  1444  1521  1600  1681  1764  1849  1936  2025  2116  2209  2304  2401  2500
 [52]  2601  2704  2809  2916  3025  3136  3249  3364  3481  3600  3721  3844  3969  4096  4225  4356  4489
 [69]  4624  4761  4900  5041  5184  5329  5476  5625  5776  5929  6084  6241  6400  6561  6724  6889  7056
 [86]  7225  7396  7569  7744  7921  8100  8281  8464  8649  8836  9025  9216  9409  9604  9801 10000
> sum(sumOfSquares)
[1] 338350
> sum(x)^2
[1] 25502500
> sum(x)^2 - sum(sumOfSquares)
[1] 25164150

project euler #5 - smallest number evenly divisible by 1 thru 20

this one was pretty straightforward, trick was starting and stepping by the smallest number evenly divisible by 1 thru 10 which was in the example, execution time 2 seconds.

isDivisible <- function(x)
{
    for(i in 2:20)
    {
        if(x%%i==0){next}
        else return(F);
    }
    return(T);
}

i <- 2520
> while(i < y){if(isDivisible(i)) {print(i); break}; i <- i+2520; if(i%%1000000==0) print(“…”)}

project euler #3 - prime factorization

I was turned onto this project by Bobby at Auerilius who works on the Tinkerpop project.  Basically project euler is a collection of math problems that demand computational solutions.  Since i’m hobbying in math and trying to get acquainted with R, thought i’d start chipping away at this one.  First couple problems were very easy, though some time was spent just figuring out R language constructs to program the solutions.  The third was interesting, prime factorization to 600851475143.  Below is my cleaned up R console solution … only perf tricks I used were only inspecting values to \(\sqrt600851475143\), testing for even divisibility for modulus before testing whether it was a prime, and short circuiting the prime test.  Perf threshold for solutions on project euler is 1 minute, guessing I can improve upon the 15 seconds below.  Lastly, in futzing about for prime factorization algorithms, came across this ancient algorithm - Sieve of Eratosthenes, which is pretty impractical for the >600B number above, but nonetheless very interesting for smaller numbers.  Apparently sieving is very useful in cryptography.

> isPrime <- function(x)
+ {
+     y = sqrt(x)
+     for(i in 2:y)
+     {
+         if(x%%i==0) return(F)
+     }
+     return(T)
+ }
> for(i in 1:y){if(600851475143%%i==0 && isPrime(i)) {x<-c(x, i); print(i)}; if(i%%1000000==0) print(“…”)}
[1] 71
[1] 839
[1] 1471
[1] 6857
[1] “…”
[1] “…”
[1] “…”
[1] “…”
[1] “…”