Code Slinger

Computer hardware is great, and as the owner of a computer company I have lots of it. However, when you want that cold plastic, composite and metal thingy to actually do something... you need code.

Code Projects
  • Code Projects in C++,C#,PHP,Java

  • Future Tech

X-NLP

Extreme Natural Language Processing.

Data Bots

Internet Bots for Crawling, Scraping and Data Mining

Vid Automatic

Creating rich real time video products and streaming services.

VR Hacker

The Hacking of Virtual Reality with the Oculus Rift.

XAMPP Blocked by W3PS on Windows 10

When trying to run XAMPP (a local web server and PHP stack) on my newly upgraded Windows 10 machine, I was getting the following message:

(OS 10013) An attempt was made to access a socket in a way forbidden by its access permissions. : make_sock: could not bind to address [::]:80
(OS 10013) An attempt was made to access a socket in a way forbidden by its access permissions. : make_sock: could not bind to address 0.0.0.0:80
no listening sockets available, shutting down
unable to open logs


The problem is the installed IIS component World Wide Web Publishing Service (W3PS; its Windows service name is W3SVC), which lets the computer accept HTTP requests and host pages just as XAMPP does. While this service is running it holds port 80, so other programs, such as XAMPP's Apache, cannot bind to it. To fix the problem, stop the service and keep it from starting automatically. Here is the procedure.


In the Windows 10 search box type in "Service" and click on "View Local Services".



Find and right click on "World Wide Web Publishing Service" and select "Properties".



Click on "Stop" and wait for it to stop the service.


From the "Startup type" drop-down, select "Manual", click "Apply", then click "OK".

Once the W3PS service is stopped and set to Manual, you should be able to start XAMPP normally. Happy XAMPPing :)
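
To double-check that port 80 is really free before starting XAMPP, a small Node.js probe can simply try to bind it. This is a minimal sketch, assuming Node.js is installed; `canBind` is a hypothetical helper name, not part of XAMPP. It reports `false` both when another program already holds the port and when the OS refuses access.

```javascript
const net = require('net');

// Hypothetical helper: resolves true if this process can listen on `port`,
// false if the bind fails (EADDRINUSE when another program holds the port,
// EACCES when the OS forbids the bind).
function canBind(port, host = '0.0.0.0') {
  return new Promise(resolve => {
    const server = net.createServer();
    server.once('error', () => resolve(false));
    // Bind succeeded: release the port again before reporting success.
    server.once('listening', () => server.close(() => resolve(true)));
    server.listen(port, host);
  });
}

canBind(80).then(ok => {
  console.log(ok ? 'Port 80 is free for XAMPP' : 'Port 80 is in use or blocked');
});
```

Run it with `node` before and after stopping the service; the message should flip once W3PS lets go of port 80.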

My Blogging New Year's Resolution for 2016


Not a week goes by, and sometimes it happens several times a day, that I don't catch myself in the middle of a project or a brainstorming session saying, “Hey, that would make a great blog article, I should write it.” Of course, as this blog site clearly shows, I have neglected to act on those statements. Usually I am more eager to dive right into the idea itself and work on it, and before long something else comes along and that idea is pushed aside.

So I am writing this article as a proclamation to myself to do better this year and write at least one blog post a week. Each post should be no shorter than two paragraphs, and I should either know the subject well or spend half an hour or more researching the topic before writing about it. I know that most of the posts will revolve around whatever task or project I am muddling around in at the time, which is fine. That will actually let me look back and see what I have been working on and over what time frames.

OK 2016, let's do this!

PHP CSV to MySQL


This PHP program is intended as a heavy-duty automated tool for converting CSV (comma-separated values) files into a MySQL import file or inserting the data directly into a MySQL database. It can be used simply, with a single static method call, or with more flexibility and power by creating it as an object.

Download this CSV to MySQL project from GitHub.

It is not as trivial as one may first think to convert data from one format to another. To take full advantage of MySQL's querying capabilities, each column must be assigned a proper data type, such as INT, FLOAT, VARCHAR, TEXT, TIME and so on. This script does that with regular expressions: regex pattern matching is used to pigeonhole the data into the most appropriate type for MySQL to use. Every entry in a column must be scanned. A column containing only integers would of course become an INT, but if even one of ten thousand entries has a decimal point, the entire column must become a FLOAT, DOUBLE or NUMERIC.

Integer data stored as a VARCHAR is not very useful when you need to query it as numbers (a VARCHAR column sorts '10' before '9', for example). The default regex rules file is “regex_mysql_data.txt”; comment lines in it start with the # character. You may want to modify this file to fit your needs or to improve its matching capabilities.

Besides regex pattern matching, the length of the data string is also considered. This check is done first, to decide whether the data should be compared against a given regex type at all. It is most useful for text data, to decide between VARCHAR, TEXT, MEDIUMTEXT and LONGTEXT.

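
The scan-every-value idea can be sketched in a few lines. This is a minimal JavaScript illustration, not the class's PHP code: the regexes stand in for the rules in “regex_mysql_data.txt”, `inferColumnType` is a hypothetical name, and the 255-character VARCHAR cutoff is an assumption. It shows the key behaviors: blank-tagged values are skipped, one non-matching value demotes the whole column, and the longest value picks the text type.

```javascript
// Illustrative rules only; the real ones live in regex_mysql_data.txt.
const INT_RE = /^[-+]?\d+$/;
const FLOAT_RE = /^[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?$/;
const INT_MAX = 2147483647; // largest signed MySQL INT

// Hypothetical helper: pick a MySQL type for one CSV column by scanning every value.
function inferColumnType(values, blankTags = ['']) {
  // Values matching a blank tag (e.g. 'NA', '-') are treated as empty and ignored.
  const present = values.map(v => v.trim()).filter(v => !blankTags.includes(v));
  if (present.length === 0) return 'VARCHAR(255)';

  // Every value must match; one decimal among thousands of integers demotes the column.
  if (present.every(v => INT_RE.test(v))) {
    const fitsInt = present.every(v => Math.abs(Number(v)) <= INT_MAX);
    return fitsInt ? 'INT' : 'BIGINT'; // values beyond BIGINT are not handled here
  }
  if (present.every(v => FLOAT_RE.test(v))) return 'DOUBLE';

  // Text: the longest value decides between VARCHAR and the TEXT family.
  // (Lengths are counted in characters, which only equals bytes for ASCII data.)
  const maxLen = present.reduce((m, v) => Math.max(m, v.length), 0);
  if (maxLen <= 255) return `VARCHAR(${maxLen})`;
  if (maxLen <= 65535) return 'TEXT';
  if (maxLen <= 16777215) return 'MEDIUMTEXT';
  return 'LONGTEXT';
}

console.log(inferColumnType(['12', '7', 'NA', '40'], ['', 'NA'])); // INT
console.log(inferColumnType(['12', '7.5', '40']));                 // DOUBLE
```

One design caveat: a purely numeric column such as zip codes ('02134') would be typed INT and lose its leading zeros, which is one reason you might tune the rules file.
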
Here is a simple static example to create a CSV to MySQL import.

CSVtoMySQL::ToHTML('test.csv');

All the static methods assume the first row of the CSV file is a header. This static method tries to detect a primary key; if it cannot find a suitable one, it adds an INT column named ‘id’ at the beginning.

If you do not want to rely upon the auto detection of a primary key use this example:

CSVtoMySQL::ToHTMLMyKey('test.csv', 'MyID');

Here “MyID” (optional) becomes the name of the new primary key, and no auto detection is attempted.

-= THE STATIC METHODS =-

ToString – These static methods display no output; they only return the results as a string.

$string = CSVtoMySQL::ToString( $in_file [,$delim = ','] )
$string = CSVtoMySQL::ToStringMyKey( $in_file [,$my_key = 'id' [,$delim = ',']] )

ToFile – These methods write the output to the file supplied as $out_file.

Null = CSVtoMySQL::ToFile( $in_file, $out_file [,$delim = ','] )
Null = CSVtoMySQL::ToFileMyKey( $in_file, $out_file [,$my_key = 'id' [,$delim = ',']] )

ToScreen – These methods print the MySQL import statements directly to the screen.

Null = CSVtoMySQL::ToScreen( $in_file [,$delim = ','] )
Null = CSVtoMySQL::ToScreenMyKey( $in_file [,$my_key = 'id' [,$delim = ',']] )

ToHTML – Like the ToScreen methods, but they also add an HTML line break tag at each new line.

Null = CSVtoMySQL::ToHTML( $in_file [,$delim = ','] )
Null = CSVtoMySQL::ToHTMLMyKey( $in_file [,$my_key = 'id' [,$delim = ',']] )

ToMySQL – These methods use your existing MySQL connection to send the queries directly to the database. You must already be connected to the MySQL server and have selected the database before calling either of these methods.

Null = CSVtoMySQL::ToMySQL( $in_file [,$delim = ','] )
Null = CSVtoMySQL::ToMySQLMyKey( $in_file [,$my_key = 'id' [,$delim = ',']] )

-= CLASS USAGE =-

Creating a class object is more powerful than using the static methods, as there are a lot of helper methods for fine tuning and debugging.

To create as an object:

$c2m = new CSVtoMySQL('test.csv');

//Then you can do something like:
$c2m->add_blank_tag('NA');
$c2m->add_blank_tag('M','PHONE');
$c2m->set_mysql_file('mymysql.sql');
$c2m->detect_primary_key();
$c2m->to_file();

Here is another example that converts the CSV file and imports it directly into the MySQL database.


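<?php

require_once('CSVtoMySQL.php');

// Connect first: to_mysql() sends its queries over the already-open connection
$sql = mysql_connect('xxx.xxx.xxx.xxx', 'user', 'password');
mysql_select_db('database', $sql);

$c2m = new CSVtoMySQL('test.csv');
$c2m->set_table_name('mytable');
if ($c2m->detect_primary_key() == false)
	{
	// No suitable column was found, so add an auto-increment INT key named 'id'
	$c2m->add_primary_key('id');
	}
$c2m->to_mysql();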

-= CLASS METHODS =-

The constructor:
__construct($csv [,$mysql = 'mysql.sql' [,$hashead = true]])

This method loads the regex rules file; supply $regex_file to load a custom regex file instead.
Null = load_regex($regex_file = '')

Reserved words are words that conflict with MySQL syntax, such as VARCHAR, INSERT, UPDATE and DATABASE. To prevent conflicts, a rule file named “reserved_mysql_words.txt” is loaded and compared against the CSV header names, and any matches are renamed. You can override this file with your own using this method.
Null = load_reserved_words($f = '')
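
A minimal sketch of that header check, assuming a case-insensitive comparison and a hypothetical `_field` suffix for renamed columns (the class's own renaming scheme may differ); `renameReserved` is an illustrative name, not a method of CSVtoMySQL.

```javascript
// Hypothetical helper: rename CSV header names that collide with MySQL reserved words.
// The comparison ignores case because MySQL keywords are case-insensitive.
function renameReserved(headers, reservedWords, suffix = '_field') {
  const reserved = new Set(reservedWords.map(w => w.toUpperCase()));
  return headers.map(name => (reserved.has(name.toUpperCase()) ? name + suffix : name));
}

console.log(renameReserved(['id', 'insert', 'Name'], ['INSERT', 'UPDATE', 'DATABASE']));
// ['id', 'insert_field', 'Name']
```

An alternative design is to keep the original name and quote it with backticks in the generated SQL, since MySQL accepts reserved words as identifiers when they are quoted that way.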

Method to set the CSV file
Null = set_csv_file($file)

Method to set the path and name of the mysql output file, but only needed if actually creating an out file.
Null = set_mysql_file($file)

By default the CSV delimiter (the data separator character) is a comma “,”, but there is an auto-detect pass that tries to match other common delimiters (such as |, tabs and spaces). If you need to set the delimiter manually, use this method.
Null = set_delimiter($v)
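
An auto-detect pass like that could work along these lines. This is a hedged sketch, not the class's implementation: `detectDelimiter` is a hypothetical name, and it ignores quoted fields, so a comma inside quotes would skew the counts. It picks the candidate that appears the same non-zero number of times on every sample line.

```javascript
// Hypothetical helper: guess the delimiter from a few sample lines of the file.
function detectDelimiter(lines, candidates = [',', '|', '\t', ';', ' ']) {
  let best = ',';    // fall back to comma when nothing is consistent
  let bestCount = 0;
  for (const d of candidates) {
    // How many times does this candidate appear on each line?
    const counts = lines.map(line => line.split(d).length - 1);
    const consistent = counts.every(c => c === counts[0]);
    if (consistent && counts[0] > bestCount) {
      best = d;
      bestCount = counts[0];
    }
  }
  return best;
}

console.log(detectDelimiter(['name|age', 'Ann|41'])); // |
```

Requiring the same count on every line is what keeps spaces inside values (for example "New York") from winning over the real delimiter.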

Use this method to set the mysql table name, by default the table name is the name of the CSV file itself minus the extension.
Null = set_table_name($s)

When reading the CSV file line by line, the max length of each line is set to 0, which in PHP 5.1+ means unlimited (read to the end of the line). If you need to set a specific length, use this method.
Null = set_max_line_length($v)

This method inserts a new field that does not exist in the CSV file. $v is the name of the field, and the optional second value is its type, which defaults to VARCHAR(255).
Null = add_field($v [,$type = 'VARCHAR(255)'])

This method changes a field name: $n can be an index number or a name, and $name is the new name to be given.
Bool = change_field_name($n, $name)

Use this method to set the primary key index. If $v is a number then the key is the field index, if a name it is matched against the header field name.
Bool = primary_key($v)

Like the method above, but only matches against the field name, not the index.
Bool = primary_key_col_by_name($s)

Like above but only applies to setting the primary key by index, where the first field index = 0, not 1!
Bool = primary_key_col_by_number($n)

Add your own custom primary key with this method. It should be an INT, as it will also be set to auto increment. The starting point of the auto increment is set with the public variable $user_primary_key_inc (default 0).
Null = add_primary_key([$name = 'id' [,$type = 'INT' [,$start_at = -1]]])

This method tries to detect which field in the CSV file should be used as the primary key. It begins with the first column and looks for any INT or VARCHAR column whose values are all unique and never empty; as soon as it finds one, it sets that as the primary key. Also see the notes in the “regex_mysql_data.txt” file. If $n is supplied, it can be either a number matching the index of the CSV column (where the first column is 0, not 1) or the name of the column. This method returns true if it was able to match a primary key and false if it failed.
Bool = detect_primary_key($n = '')
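
The scan just described can be sketched as follows. This is an illustration under assumptions: `detectPrimaryKey` is a hypothetical name, rows are arrays of strings, and the column types are already known (for example from a prior type-detection pass). It returns the index of the first all-unique, never-empty INT or VARCHAR column, or -1 when none qualifies.

```javascript
// Hypothetical helper: first column (left to right) usable as a primary key.
function detectPrimaryKey(rows, types) {
  for (let col = 0; col < types.length; col++) {
    // Only integer or VARCHAR columns are candidates.
    if (!/^(INT|BIGINT|VARCHAR)/.test(types[col])) continue;
    const seen = new Set();
    let ok = true;
    for (const row of rows) {
      const v = (row[col] ?? '').trim();
      // An empty value or a duplicate disqualifies the column.
      if (v === '' || seen.has(v)) { ok = false; break; }
      seen.add(v);
    }
    if (ok) return col; // the first qualifying column wins
  }
  return -1; // the caller would then add its own auto-increment 'id' column
}

console.log(detectPrimaryKey([['x', '1'], ['x', '2']], ['VARCHAR(1)', 'INT'])); // 1
```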

A helper method to test the types of fields detected
Null = print_types()

Same as above but outputs as HTML
Null = print_html_types()

The method to call for returning the results as a string.
String = to_string()

Sends the output to the screen. I use it when I am working over telnet or SSH.
Null = to_screen()

Send the output like the to_screen() method but includes html breaks at the new line locations.
Null = to_html()

This method writes the output to a file; if you haven't already set the output file name, you can supply it here.
Bool = to_file([$file = ''])

This method sends the parsed CSV file directly to the MySQL database, you must have a connection already established (see usage above for an example.)
Bool = to_mysql()

Adds a blank tag identifier to the blank_tags array. Sometimes a CSV file contains data that should be treated as blank, such as ‘NA’, ‘-’ or the like. This method adds blank tags that cause such data to be ignored, i.e. treated as empty. If you supply a column field name, the blank tag applies to just that column; otherwise it applies globally to all columns.
Null = add_blank_tag($v [,$col = ''])

This method is run automatically by several functions, but you can call it yourself if needed. It attempts to determine each column's data type using the “regex_mysql_data.txt” file and its rules.
Null = detect_types()

Tries to detect whether the CSV file contains a header. This is very problematic and not 100% accurate. By default the public variable $detect_header = false, and it must be set to true for this method to work; otherwise the class assumes there is a header. The method returns true if it detected a header and false if it did not.
Bool = detect_header($s)

-= ADDITIONAL CLASS HELPERS =-

CSVtoMySQL_DetectType is a class that is created and stored in the $regex_match_file array that contains the information from the “regex_mysql_data.txt” file.

CSVtoMySQL_FieldType is a class that is created and stored in the $fields array; it contains information about each CSV column and its fields.

Norton Anti-Virus Live Update Bug Renders Internet Explorer Unusable


On Feb 20th 2015, a Norton Anti-Virus LiveUpdate was rolled out with a bug that made Internet Explorer unusable for millions of users. Other browsers such as Chrome and Firefox seem to be unaffected, but unless you already had one of these alternatives installed, it would be hard to find out any details on what is going on. How else do you download an alternative browser when your only browser doesn't work?

The common error dump for this bug is:


Description
Faulting Application Path: C:\Program Files (x86)\Internet Explorer\iexplore.exe
Problem signature
Problem Event Name: BEX
Application Name: IEXPLORE.EXE
Application Version: 11.0.9600.17631
Application Timestamp: 54b31a70
Fault Module Name: IPSEng32.dll
Fault Module Version: 14.2.1.9
Fault Module Timestamp: 54c8223b
Exception Offset: 000c61e2
Exception Code: c0000417
Exception Data: 00000000
OS Version: 6.1.7601.2.1.0.256.48
Locale ID: 4105
Additional Information 1: 4f07
Additional Information 2: 4f072c04aa91eb87d88d7dd565652530
Additional Information 3: a15b
Additional Information 4: a15b24e56acca2f6a7c59c85b7f20aea

The file reported to be causing the error is IPSEng32.dll, part of Norton's Identity Safe (NIS); however, just turning that protection off or uninstalling NIS does not fix the problem. The only solution I have found so far is to remove the Norton product entirely.

After nearly 24 hours, Norton has yet to release a patch to fix this problem.

The community forums are going crazy over the bug, starting with the thread “Tonight's update crashing IE11” started by Sunfox.

How High Can I get This Blog Post on Google Search?


This is just a quick experiment to see where this post turns up in Google's search results for “How High Can I get This Blog Post on Google Search?”. It is not intended to be any kind of SEO trick or gimmick. There are no links to it other than from this blog, and I am not really trying to do any keyword stuffing either; I am just writing a few paragraphs of whatever comes to mind.

If you have a unique title, I think you can get to the top of the SERPs pretty easily. Not in all cases, though: for some long-tail titles the engines will cherry-pick a few keywords instead of using an exact match. Anyway, after I post this I will add one link to the search query for the title, and we can watch where it lands. I will follow up with info in the comments section. Here goes...

Simple Ajax Tutorial


Ajax is at the heart of Web 2.0 design; it is used across the most popular websites, from Facebook to Twitter. What is Ajax? It is a way for a web page to request data from another file or program and then display that data dynamically, without reloading the page.

There are lots of Ajax libraries available, most of which claim to be easy to set up and understand. I find that claim incorrect in most instances: their APIs are fairly bloated, and it can be difficult to tell which part is the real Ajax and which part is the rest of the API. So here I will explain just the most basic, fundamental aspect of Ajax.

Ajax uses just a few lines of JavaScript to request another page, similar to how a form accesses another page using either the GET or POST method. However, it is not necessary to send any GET or POST data unless the page being accessed is itself dynamic and responds to the fields of the GET or POST request. In this example we will not send any data, and rather than calling a dynamic PHP, Perl, ASP or CGI page, we will simply request a regular text file.

The comments in the code below should tell you everything you need to know. I've stripped the Ajax process down to its most basic elements so you can look at the code and see what it is, without the extra baggage that can make it confusing.

File: blurb.txt (make this file and save it to your webserver)

My voice is my password, verify.

File: ajax.html (make this file and save it to your webserver)

<html>
<head>
<title>Simple Ajax Example</title>

<script type="text/javascript">
function simpleAjax()
	{
	var xmlhttp = new XMLHttpRequest(); // XMLHttpRequest is the built-in browser object that actually does the Ajax call
	xmlhttp.onreadystatechange = function () // this function is called every time the request's state changes
		{
		if (xmlhttp.readyState == 4 && xmlhttp.status == 200) // readyState 4 = request finished, status 200 = the fetched page is OK
			{
				var x = xmlhttp.responseText; // get the results of the fetched page and put them in the variable "x"
				document.getElementById("data").innerHTML = x; // replace the contents of the DIV with id "data" with the value of "x"
			}
		}
	xmlhttp.open("GET", "blurb.txt", true); // the page to request, using GET (could be POST), located in the same directory as the calling file
	xmlhttp.send(); // send the request; the function above processes the results when they arrive
	}
</script>

</head>
<body>

<a href="" onclick="simpleAjax();return false;">Click Here for Ajax</a>

<div id="data">
	What is your Password?
</div>

</body>
</html>

Go to http://(yourdomain)/ajax.html and click on “Click Here for Ajax”. The text in the DIV with id="data" should change to the contents of blurb.txt. That is all Ajax is.
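
The example above deliberately sends no data. When the requested page is a dynamic script that reads GET or POST fields, the data has to be URL-encoded and attached differently for each method. Here is a minimal browser-side sketch; `ajaxRequest`, `encodeParams` and `lookup.php` are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: turn { user: 'ann' } into "user=ann", escaping special characters.
function encodeParams(params) {
  return Object.keys(params)
    .map(k => encodeURIComponent(k) + '=' + encodeURIComponent(params[k]))
    .join('&');
}

// Hypothetical helper: same pattern as simpleAjax(), but with GET or POST data attached.
function ajaxRequest(method, url, params, onDone) {
  const xmlhttp = new XMLHttpRequest(); // browser only
  const body = encodeParams(params);
  xmlhttp.onreadystatechange = function () {
    if (xmlhttp.readyState === 4 && xmlhttp.status === 200) {
      onDone(xmlhttp.responseText);
    }
  };
  if (method === 'GET') {
    // GET data travels in the query string of the URL
    xmlhttp.open('GET', url + '?' + body, true);
    xmlhttp.send();
  } else {
    // POST data travels in the request body, labeled as form data
    xmlhttp.open('POST', url, true);
    xmlhttp.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
    xmlhttp.send(body);
  }
}

// Usage in the page, with a hypothetical server script:
// ajaxRequest('POST', 'lookup.php', { user: 'ann' }, text => {
//   document.getElementById('data').innerHTML = text;
// });
```

On the server, a PHP script would read these fields from $_GET or $_POST just as it would for a normal form submission.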

Text to Speech with PHP (TTS)


A really quick and easy Text to Speech class for PHP that generates an MP3 file. You can also easily set up an Ajax call on a website to play the text-to-speech audio file as you generate it. The quality is pretty good compared to other solutions, but I haven't figured out how to adjust pitch and tone or mix background music with it (yet).

This is not really a solid solution for robust TTS, as it relies on Google's TTS service; for more advanced solutions with lots of controls and embedding into videos, we use Microsoft's TTS system. But for easy-to-deploy, on-demand web services this solution is a cinch.

This Text to Speech with PHP version can be found on my CodeSlinger GitHub page.

<?php
/******************************************************************
Projectname:   PHP Text 2 Speech Class 
Version:       1.0 
Author:        Radovan Janjic <rade@it-radionica.com> 
Last modified: 11 06 2013 
Copyright (C): 2012 IT-radionica.com, All Rights Reserved 

* GNU General Public License (Version 2, June 1991) 
* 
* This program is free software; you can redistribute 
* it and/or modify it under the terms of the GNU
* General Public License as published by the Free
* Software Foundation; either version 2 of the License, 
* or (at your option) any later version.
* 
* This program is distributed in the hope that it will
* be useful, but WITHOUT ANY WARRANTY; without even the 
* implied warranty of MERCHANTABILITY or FITNESS FOR A 
* PARTICULAR PURPOSE. See the GNU General Public License 
* for more details. 

Description: 

PHP Text 2 Speech Class 

This class converts text to speech using Google text to  
speech API to transform text to mp3 file which will be  
downloaded and later used as eg. embed file.  

Example: 

****************************************************************** 
<?php
$t2s = new PHP_Text2Speech; 
?> 

// Simple example 
<audio controls="controls" autoplay="autoplay"> 
  <source src="<?php echo $t2s->speak('If you hear this sound it means that you are using PHP text to speech class.'); ?>" type="audio/mp3" /> 
</audio>

// Example use of other language 
<audio controls="controls" autoplay="autoplay"> 
  <source src="<?php echo $t2s->speak('Wie geht es Ihnen', 'de'); ?>" type="audio/mp3" /> 
</audio> 

******************************************************************/ 

class PHP_Text2Speech { 
     
    /** Max text characters
     * @var    Integer  
     */ 
    var $maxStrLen = 100; 
     
    /** Text len
     * @var    Integer  
     */ 
    var $textLen = 0; 
     
    /** No of words 
     * @var    Integer  
     */ 
    var $wordCount = 0; 
     
    /** Language of text (ISO 639-1) 
     * @var    String  
     * @link https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes 
     */ 
    var $lang = 'en'; 
     
    /** Text to speak 
     * @var    String  
     */ 
    var $text = NULL; 
     
    /** File name format 
     * @var    String  
     */ 
    var $mp3File = "%s.mp3"; 
     
    /** Directory to store audio file
     * @var    String  
     */ 
    var $audioDir = "audio/"; 

    /** Contents 
    * @var    String 
    */ 
    var $contents = NULL; 
     
    /** Function make request to Google translate, download file and returns audio file path 
     * @param     String     $text        - Text to speak 
     * @param     String     $lang         - Language of text (ISO 639-1) 
     * @return     String     - mp3 file path 
     * @link https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
     */ 
    function speak($text, $lang = NULL) { 
         
        if ($lang !== NULL) { 
            $this->lang = $lang; 
        } 

        // Create dir if not exists 
        if (!is_dir($this->audioDir)) { 
            mkdir($this->audioDir, 0755) or die('Could not create audio dir: ' . $this->audioDir); 
        } 
         
        // Try to set writing permissions for audio dir. 
        if (!is_writable($this->audioDir)) {  
            chmod($this->audioDir, 0755) or die('Could not set appropriate permissions for audio dir: ' . $this->audioDir); 
        } 
         
        // Can not handle more than 100 characters so split text 
        if (strlen($text) > $this->maxStrLen) {
            $this->text = $text;

            // Generate unique mp3 file name 
            $file = sprintf($this->mp3File, $this->audioDir . md5($this->text)); 
            if (!file_exists($file)) {
                $texts = array();
                $words = explode(' ', $this->text);
                $i = 0;
                $texts[$i] = NULL;
                foreach ($words as $w) {
                    $w = trim($w);
                    if (strlen($texts[$i] . ' ' . $w) < $this->maxStrLen) {
                        $texts[$i] = $texts[$i] . ' ' . $w;
                        if (preg_match('/[:;,.!?-]$/', $w)) { $i++; } // separate at common breaks
                    } else {
                        $texts[++$i] = $w;
                    } 
                }

                // Get the separate files' contents and merge them into one
                foreach ($texts as $txt) {
                    $pFile = $this->speak($txt, $this->lang); 
                    $this->contents .= $this->stripTags(file_get_contents($pFile)); 
                    unlink($pFile);
                }
                unset($words, $texts); 
                 
                // Save file
                file_put_contents($file, $this->contents); 
                $this->contents = NULL;
            }
        } else {
             
            // Generate unique mp3 file name 
            $file = sprintf($this->mp3File, $this->audioDir . md5($text)); 

            if (!file_exists($file)) { 
                // Text length 
                $this->textLen = strlen($text); 
                 
                // Words count 
                $this->wordCount = str_word_count($text); 

                // Encode string 
                $text = urlencode($text);

                // Download new file
                $this->download("http://translate.google.com/translate_tts?ie=UTF-8&q={$text}&tl={$this->lang}&total={$this->wordCount}&idx=0&textlen={$this->textLen}", $file);
            }
        } 
         
        // Returns mp3 file path 
        return $file; 
    } 
     
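    /** Function to find the beginning of the mp3 file 
     * @param     String     $contents        - File contents 
     * @return     Integer|FALSE    - offset of the first 0xFF byte, or FALSE if none
     */  
    function getStart($contents) { 
        for($i=0; $i < strlen($contents); $i++){ 
            if(ord(substr($contents, $i, 1)) == 255){ 
                return $i; 
            } 
        } 
        return FALSE; // no frame sync byte found, so stripTags() can detect the failure
    } 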
     
    /** Function to find the end of the mp3 file 
     * @param     String     $contents        - File contents 
     * @return     Integer 
     */  
    function getEnd($contents) { 
        $c = substr($contents, (strlen($contents) - 128)); 
        if(strtoupper(substr($c, 0, 3)) == 'TAG'){ 
            return $c; 
        }else{ 
            return FALSE; 
        } 
    } 

    /** Function to remove the ID3 tags from mp3 files 
     * @param     String     $contents        - File contents
     * @return     String
     */
    function stripTags($contents) {
        // Remove start: drop everything before the first 0xFF (frame sync) byte
        $start = $this->getStart($contents);
        if ($start === FALSE) {
            return FALSE;
        }
        $contents = substr($contents, $start);
        // Remove end tag: an ID3v1 tag is the last 128 bytes and starts with "TAG"
        if ($this->getEnd($contents) !== FALSE) {
            $contents = substr($contents, 0, strlen($contents) - 128);
        }
        return $contents;
    } 

    /** Function to download and save file 
     * @param     String     $url        - URL 
     * @param     String     $path         - Local path 
     */
    function download($url, $path) {  
        // Is curl installed? 
        if (!function_exists('curl_init')){ // use file get contents  
            $output = file_get_contents($url);
        }else{ // use curl  
            $ch = curl_init();  
            curl_setopt($ch, CURLOPT_URL, $url);  
            curl_setopt($ch, CURLOPT_AUTOREFERER, true);
            curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1");  
            curl_setopt($ch, CURLOPT_HEADER, 0);  
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  
            curl_setopt($ch, CURLOPT_TIMEOUT, 10);
            $output = curl_exec($ch);  
            curl_close($ch);
        } 
        // Save file
        file_put_contents($path, $output);
    }
}

Here is an example of using the TTS class in PHP.

<?php
include 'PHP_Text2Speech.class.php';

$t2s = new PHP_Text2Speech;
?>

<audio controls="controls" autoplay="autoplay">
  <source src="<?php echo $t2s->speak('What are you looking at? Wipe that face off your head.'); ?>" type="audio/mp3" />
</audio>
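
The least obvious part of the class is how speak() works around the 100-character limit: it splits the text into pieces, preferring to break right after punctuation, fetches an MP3 for each piece, strips the tags and concatenates the audio. Here is just the chunking step, sketched in JavaScript; `splitForTTS` is a hypothetical name, and this is not a line-for-line port of the PHP.

```javascript
// Hypothetical helper: split text into chunks shorter than maxLen characters,
// closing a chunk early whenever a word ends in punctuation (a natural pause).
function splitForTTS(text, maxLen = 100) {
  const chunks = [];
  let current = '';
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length < maxLen) {
      current = candidate;
      if (/[:;,.!?-]$/.test(word)) { // break at common pauses
        chunks.push(current);
        current = '';
      }
    } else {
      if (current) chunks.push(current);
      current = word; // start the next chunk with the word that did not fit
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(splitForTTS('Hello there. How are you today?'));
// ['Hello there.', 'How are you today?']
```

Breaking at punctuation rather than at exactly 100 characters keeps the joins between the separately generated MP3 pieces at points where a speaker would pause anyway.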

New gTLD Domain Extensions Timeline (spreadsheet)


There is a paradigm shift coming in the way we view domain names. I predict the .com will become a dinosaur in the following years. I am so sure of that I'm betting on it by acquiring some of the new gTLD names that fit my current business model. However, the new gTLD release dates have been extremely confusing.

First there is the Sunrise phase, which allows national and international registered trademark holders to secure their brand names. Self-use, local and pending trademark applications need not apply in this phase; you must wait until the next phase, if applicable.

Landrush typically comes after Sunrise (however, some gTLDs do not offer it). During Landrush you can pay extra to get a name sooner and beat out the competition.

Finally there is General Availability, and that is where the scraps are picked up. Sure, you can still get lucky at this point; I have. But keep in mind that many services allow pre-registration and payment for General Availability, so if several others have also pre-registered the same domain (which is common), your chances of getting that stellar name are very low.

To make the gTLD domain name release dates easier to digest, I put together a Google Spreadsheet: https://docs.google.com/spreadsheet/ccc?key=0AiSCqnEYeYZcdEkyLXRncUdKaGx0THJnNmxjM1duVVE&usp=sharing

    KEY:

  • SR (yellow) = Sunrise
  • PL (pink) = Pre Land Rush
  • LR (orange) = Land Rush
  • PG (light green) = Pre General Availability
  • GA (green) = General Availability

PHP Class Wrapper for Stanford Part of Speech Tagger


File: class_Stanford_POS_Tagger.php

Over the last several years I have been dabbling in part-of-speech (POS) tagging using various natural language processing (NLP) systems. I especially wanted something that would work with PHP, as most of my web programming is done in this scripting language. PHP offers a lot of advantages for quick prototyping and testing with its very flexible use of variables, strings and arrays. Unfortunately, nearly every POS tagger written in PHP that I tested was poorly designed, broken, produced erroneous results or was no longer supported, in many cases all of the above. The best results of any NLP tagging system I tried came from the one developed by Stanford, but it is only available in Java.

I had written a wrapper in C# some years ago for testing, but it was not very useful for the projects I had in mind. Eventually I came up with the following PHP class to wrap the tagger as well. This class includes a lot more functionality than a simple tagger, with settings you can change. Most of these options are necessary for my own project, but others may find them useful as well.

You will need to download the Stanford POS tagger from here: http://nlp.stanford.edu/downloads/tagger.shtml

This in turn requires that you have Java 1.6 or newer installed.

When using this class, you will need to pass the directory location of the Stanford tagger to the constructor like so:


$pos = new Stanford_POS_Tagger('somewhere/StanfordNLP/stanford-postagger-2014-01-04');
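
The tagger's plain-text output joins each token to its tag with a separator, such as John_NNP where `_` is the separator. If you want to post-process that output yourself, a minimal parser could look like this (a JavaScript sketch; `parseTagged` is a hypothetical name, not a method of the class). Splitting on the last separator keeps tokens that themselves contain an underscore intact.

```javascript
// Hypothetical helper: turn "The_DT cow_NN" into [['The', 'DT'], ['cow', 'NN']].
function parseTagged(output, separator = '_') {
  return output
    .trim()
    .split(/\s+/)
    .filter(Boolean)
    .map(token => {
      // Use the LAST separator so a token like "New_York_NNP" keeps "New_York".
      const at = token.lastIndexOf(separator);
      return at > 0 ? [token.slice(0, at), token.slice(at + separator.length)] : [token, ''];
    });
}

console.log(parseTagged('The_DT cow_NN jumped_VBD ._.'));
// [['The', 'DT'], ['cow', 'NN'], ['jumped', 'VBD'], ['.', '.']]
```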

<?php

/**
 * PHP Class Stanford POS Tagger 1.1.0 - PHP Wrapper for Stanford's Part of Speech Java Tagger
 * Copyright (C) 2014 Charles R Hays http://www.charleshays.com
 *
 * file: class_Stanford_POS_Tagger.php
 *
 * @version 1.1.0 (2/4/2014)
 *		1.0.0 - release
 *		1.1.0 - added merge cardinal numbers
 *
 * @requirements
 *		1)Requires stanford postagger 3.3.1 or newer. Download @ http://nlp.stanford.edu/downloads/tagger.shtml
 *
 *		2)In turn the stanford postagger requires Java 1.6+ to be installed and about 60MB of memory.
 *
 * @example
 * 		require('class_Stanford_POS_Tagger.php');
 *		$pos = new Stanford_POS_Tagger();
 * 		print_r($pos->array_tag("The cow jumped over the moon and the dish ran away with the spoon."));
 *

    This library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
    License as published by the Free Software Foundation; either
    version 2.1 of the License, or (at your option) any later version.

    This library is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
    Lesser General Public License for more details.

    You should have received a copy of the GNU Lesser General Public
    License along with this library; if not, write to the Free Software
    Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
 */

class Stanford_POS_Tagger
	{
	////////////////////////////////////////////////////////////////////////////
	// POS TAGGER MODELS
	////////////////////////////////////////////////////////////////////////////
	/*
	english-bidirectional-distsim.tagger
	Trained on WSJ sections 0-18 using a bidirectional architecture and
	including word shape and distributional similarity features.
	Penn Treebank tagset.
	*/
	//private $model = 'english-bidirectional-distsim.tagger'; // 97.32% accuracy - slow

	/*
	english-left3words-distsim.tagger
	Trained on WSJ sections 0-18 and extra parser training data using the
	left3words architecture and includes word shape and distributional
	similarity features. Penn tagset.
	*/
	private $model = 'english-left3words-distsim.tagger'; // 96.97% accuracy - fast

	////////////////////////////////////////////////////////////////////////////
	// Java variables
	////////////////////////////////////////////////////////////////////////////
	private $java_path = 'java'; // the command to run java
	private $java_options = array(); // array of java switch options
	private $jar = 'stanford-postagger.jar'; // the jar to use located in $path
	private $path = '';	// path to where the stanford postagger directory resides

	////////////////////////////////////////////////////////////////////////////
	// Temporary files - the text is stored in a tmp file which is parsed
	////////////////////////////////////////////////////////////////////////////
	private $tmp_path = '/tmp'; // directory to store tmp file
	private $tmp_prefix = 'posttagger'; // prefix of tmp file
	private $tmp_permission = 0644; // permission to set tmp file

	////////////////////////////////////////////////////////////////////////////
	// POS Tag separator such as John_NNP where _ is the separator
	////////////////////////////////////////////////////////////////////////////
	private $separator = '_'; // used for tagged output
	private $best_separator = '#_#'; // used for better separation when used in array output

	////////////////////////////////////////////////////////////////////////////
	// Sanitizing text
	////////////////////////////////////////////////////////////////////////////
	private $use_pspell = true; // Use Pspell for spell checking (if installed)

	////////////////////////////////////////////////////////////////////////////
	// In Array Tag Options - For use with array_tag() method only
	////////////////////////////////////////////////////////////////////////////
	private $hash_type = 'md5'; // Hash types for sentence include 'none', 'md5', 'base64', 'sha1' (http://us3.php.net/manual/en/function.hash.php)

	private $merge_proper_nouns = true; // so "John_NNP" "Smith_NNP" becomes "John Smith_NNP"
	private $merge_cardinal_numbers = true; // so "one hundred and thirty" or "two and a half" is grouped as a single CD

	private $sequence_tags = true; // numbers the order of each tag occurrence

	private $tag_mask_types = true; // adds a 'mask' field to each array record; tags in the list below are masked with a *
	private $tag_mask_list = array(
		//'#',		// Pound sign
		//'$',		// Dollar sign
		//'"',		// Close double quote
		//'``',		// Open double quote
		//"'",		// Close single quote
		//'`',		// Open single quote
		//',',		// Comma
		'.',		// Final punctuation
		//':',		// Colon or semi-colon
		//'-LRB-',// Left bracket
		//'-RRB-',// Right bracket
		//'CC',		// Coordinating conjunction : and, but, or, yet, for, nor, so
		'CD',			// Cardinal number 1, 2, 3, one, two, three hundred

		//'DT',			// Determiner :
		//'EX',			// Existential there : There is a cult of ignorance in the United States.
		//'FW',			// Foreign word :
		//'IN',			// Preposition : links nouns, pronouns and phrases to other words in a sentence. on, beneath, against, beside, over, during

		'JJ',			// adjective : sweet, angry, bright, cold, long : also ordinal numbers like "3rd" fastest, "6th" place
		'JJR',		// comparative adjective : sweeter, angrier, brighter, colder, longer
		'JJS',		// superlative adjective : sweetest, angriest, brightest, coldest, longest

		//'LS',			// List item marker :
		//'MD',			// Modal : can, may, must, should, would

		'NN', 		// singular noun : girl, mother, nurse, city, town, bicycle, doll, train, dream, truth, pride, colony, team, litter, covey
		'NNS',	 	// plural noun : children, men, girls, mothers, nurses, cities, towns, bikes, dolls, trains, dreams, colonies, teams, litters
		'NNP',		// proper singular noun : John, Smith, Pizza Hut
		'NNPS',		// proper plural noun: Kennedys

		//'PDT',	// predeterminer : all, both, half
		//'POS',	// possessive ending : 's, s'
		//'PRP',	// personal pronoun : I, me, myself, we, us, ourselves, you, yourself (http://en.wikipedia.org/wiki/English_personal_pronouns)
		//'PRP$',	// possessive pronouns : her, your, his, hers, my, their, yours, whose, one's, theirs, its, our (http://examples.yourdictionary.com/examples-of-possessive-pronouns.html)
		'RB',		// adverb : slowly, now, soon, suddenly (http://en.wikipedia.org/wiki/Adverb)
		'RBR',		// comparative adverb : more quietly, more carefully, more happily, harder, faster, earlier
		'RBS',		// superlative adverb : most quietly, most carefully, most happily, hardest, fastest, earliest
		'RP',		// particle : prepositions that modify a verb instead of a noun. along, away, back, by, down, forward, in, off, on, out, over, round, under, up
		//'SYM',	// symbol :
		//'TO',		// to
		'UH',		// interjection : ah, oh, brrr, oops, huh?, booh, eh, mwahaha, bwahaha, yay, yuck, yeah (http://www.vidarholen.net/contents/interjections/)

		'VB',		// verb, base form : walk, skip, jump
		'VBD',		// verb, past tense : walked, shipped, jumped
		'VBG',		// verb, gerund/present participle : walking, skipping, jumping
		'VBN',		// verb, past participle : have walked, have skipped, have jumped
		'VBP',		// verb, non-3rd person singular present : walk, skip, jump
		'VBZ',		// verb, 3rd person singular present : walks, skips, jumps
		//'WDT',		// wh-determiner : what, which, whose, whatever, whichever
		//'WP',		// wh-pronoun : what, which, where, when, who, whom, whose. (And maybe: whether.)
		//'WP$',		// possessive wh-pronoun : whose
		//'WRB',		// wh-adverb : how, where, when
		//' '			// blank space
		);

	////////////////////////////////////////////////////////////////////////////
	// Methods
	////////////////////////////////////////////////////////////////////////////

	public function __construct($path = '', $java_options = array('-mx300m'))
		{
		if(trim($path) == '')
			{
			$path = __DIR__;
			}
		$this->set_path($path);
		$this->set_java_options($java_options);
		$this->set_model($this->model);
		}

	public function set_path($path)
		{
		$this->path = trim(rtrim(trim($path),'/')).'/';
		}

	public function merge_proper_nouns($val = true)
		{
		$this->merge_proper_nouns = $val;
		}

	public function sequence_tags($val = true)
		{
		$this->sequence_tags = $val;
		}

	public function tag_mask_types($val = true)
		{
		$this->tag_mask_types = $val;
		}

	public function tag_mask_list($taglist = array())
		{
		$this->tag_mask_list = $taglist; // replaces the list of tags that get masked with a *
		}

	public function set_hash($val = '')
		{
		if($val == '') $val = 'none';

		$this->hash_type = $val;
		}


	public function set_stanford_path($path)
		{
		$this->path = trim(rtrim($path,'/'));
		}

	public function set_model($model)
		{
		$this->model = trim(ltrim($model));
		}

	public function get_model()
		{
		return rtrim($this->path,'/').'/models/'.ltrim($this->model,'/');
		}

	public function get_jar()
		{
		return rtrim($this->path,'/').'/'.ltrim($this->jar,'/');
		}

	public function set_jar($jar)
		{
		$this->jar = trim(ltrim($jar));
		}

	public function set_java_path($java_path)
		{
		$this->java_path = trim($java_path);
		}

	public function set_java_options($java_options = array())
		{
		$this->java_options = $java_options;
		}

	public function set_tmp_path($path)
		{
		$this->tmp_path = trim(rtrim($path,'/'));
		}

	public function set_tmp_prefix($prefix)
		{
		$this->tmp_prefix = trim(ltrim($prefix,'/'));
		}

	public function set_tmp_permission($perm)
		{
		$this->tmp_permission = $perm;
		}

	public function set_tag_separator($separator = '_')
		{
		$this->separator = trim($separator);
		}

	public function get_tag_separator()
		{
		return $this->separator;
		}

	public function tag($txt,$normalize = true,$separator = '')
		{
		if(!file_exists($this->get_jar()))
			{
			throw new Exception("Jar not found: ".$this->get_jar());
			}
		if(!file_exists($this->get_model()))
			{
			throw new Exception("Model not found: ".$this->get_model());
			}
		if($separator == '')
			{
			$separator = $this->separator;
			}

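		$tf = tempnam($this->tmp_path, $this->tmp_prefix);
		if($tf === false)
			{
			throw new Exception("Could not create temporary file in: ".$this->tmp_path);
			}
		// $tmp_permission is already an octal integer such as 0644, so no octdec()
		chmod($tf, $this->tmp_permission);

		if($this->use_pspell)
			{
			$txt = $this->spellcheck($txt);
			}

		file_put_contents($tf, $txt);

		// escape each java switch (e.g. -mx300m) individually
		$options = implode(' ', array_map('escapeshellarg', $this->java_options));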

		$descriptorspec = array(
			0 => array("pipe", "r"),  // stdin
			1 => array("pipe", "w"),  // stdout
			2 => array("pipe", "w")   // stderr
			);

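		// use the configured java binary and escape every argument on its own;
		// jar and model paths resolve from the current working directory,
		// the same way the file_exists() checks above resolve them
		$cmd = escapeshellcmd($this->java_path).' '.$options
			.' -cp '.escapeshellarg($this->get_jar())
			.' edu.stanford.nlp.tagger.maxent.MaxentTagger'
			.' -model '.escapeshellarg($this->get_model())
			.' -textFile '.escapeshellarg($tf)
			.' -outputFormat slashTags'
			.' -tagSeparator '.escapeshellarg($separator)
			.' -encoding utf8';

		$process = proc_open($cmd, $descriptorspec, $pipes);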

		$output = null;
		$errors = null;
		if(is_resource($process))
			{
			// no input is sent on stdin
			fclose($pipes[0]);

			// get stdout - tagged output
			$output = stream_get_contents($pipes[1]);
			fclose($pipes[1]);

			// get stderr - java and tagger messages
			$errors = stream_get_contents($pipes[2]);
			fclose($pipes[2]);

			// all pipes are closed before proc_close to prevent a deadlock
			$return_value = proc_close($process);
			if($return_value != 0)
				{
				unlink($tf); // remove the tmp file before bailing out
				throw new Exception("Java process error (exit code ".$return_value."): ".$cmd."\n".$errors);
				}
			}

		unlink($tf);

		return $output;
		}

	public function array_tag($txt,$normalize = true)
		{
		return $this->tagged_to_array($this->tag($txt,$normalize,$this->best_separator),$this->best_separator);
		}

	public function tagged_to_array($tagged, $separator)
		{
		$arr = array();

		if(!$tagged) return $arr;

		if($separator == '')
			{
			$separator = $this->separator;
			}

		$sentences = explode("\n", $tagged);
		foreach($sentences as $k => $v)
			{
			$sequence = array();
			if(trim($v) == '')
				{
				continue;
				}
			$tagrec = array();
			$tags = explode(' ', trim($v));
			$last_tag = 'START';
			$i = 0;
			foreach($tags as $vv)
				{
				// split on the last separator so a token containing the separator stays whole
				$vv = trim($vv);
				$pos = strrpos($vv, $separator);
				if($pos === false)
					{
					continue; // skip anything that is not a token/tag pair
					}
				$parts = array(substr($vv, 0, $pos), substr($vv, $pos + strlen($separator)));
				$tag = array();

				// start - merge proper nouns
				if($this->merge_proper_nouns)
					{
					if(($parts[1] == 'NNP') || ($parts[1] == 'NNPS'))
						{
						if(($last_tag == 'NNP') || ($last_tag == 'NNPS'))
							{
							$tagrec[$i - 1]['token'] .= ' '.$parts[0]; // append this word to last token
							$tagrec[$i - 1]['tag'] = $parts[1]; // the final proper noun type is used
							continue;
							}
						}
					}
				// end - merge proper nouns

				// start - merge cardinal numbers
				if($this->merge_cardinal_numbers)
					{
					if(($parts[1] == 'CD') && ($last_tag == 'CD'))
						{
						$tagrec[$i - 1]['token'] .= ' '.$parts[0]; // append this word to last token
						continue;
						}
					}
				// end - merge cardinal numbers

				$last_tag = $parts[1];

				$tag['token'] = $parts[0];
				$tag['tag'] = $parts[1];

				// start - sequence tags: count occurrences of each tag within the sentence
				if($this->sequence_tags)
					{
					$sequence[$parts[1]] = isset($sequence[$parts[1]]) ? $sequence[$parts[1]] + 1 : 1;
					$tag['seq'] = $sequence[$parts[1]];
					}
				// end - sequence tags

				// start - tag masking
				if($this->tag_mask_types)
					{
					if(in_array($parts[1], $this->tag_mask_list))
						{
						$tag['mask'] = '*';
						}
					else
						{
						$tag['mask'] = $parts[0];
						}
					}
				// end - tag masking

				$tagrec[] = $tag;
				$i++;
				}

			$tagdata = array();
			$tagdata['tagged'] = $tagrec;

			$tagdata['sentence'] = '';
			$tagdata['tag_set'] = '';
			$tagdata['mask_set'] = '';
			foreach($tagrec as $rec)
				{
				// label such as {NN-2}, or {NN} when sequencing is off
				$label = $this->sequence_tags ? '{'.$rec['tag'].'-'.$rec['seq'].'}' : '{'.$rec['tag'].'}';

				// sentence
				if($tagdata['sentence'] != '') $tagdata['sentence'] .= ' ';
				$tagdata['sentence'] .= $rec['token'];

				// tag set
				if($tagdata['tag_set'] != '') $tagdata['tag_set'] .= ' ';
				$tagdata['tag_set'] .= $label;

				// mask set - masked words are replaced by their tag label
				if($tagdata['mask_set'] != '') $tagdata['mask_set'] .= ' ';
				if(isset($rec['mask']) && $rec['mask'] == '*')
					{
					$tagdata['mask_set'] .= $label;
					}
				else
					{
					$tagdata['mask_set'] .= isset($rec['mask']) ? $rec['mask'] : $rec['token'];
					}
				}

			// generate hashes
			if($this->hash_type == 'md5')
				{
				$tagdata['hash_sentence'] = md5($tagdata['sentence']);
				$tagdata['hash_tag_set'] = md5($tagdata['tag_set']);
				$tagdata['hash_mask_set'] = md5($tagdata['mask_set']);
				}
			else if($this->hash_type == 'base64')
				{
				$tagdata['hash_sentence'] = base64_encode($tagdata['sentence']);
				$tagdata['hash_tag_set'] = base64_encode($tagdata['tag_set']);
				$tagdata['hash_mask_set'] = base64_encode($tagdata['mask_set']);
				}
			else if($this->hash_type == 'sha1')
				{
				$tagdata['hash_sentence'] = sha1($tagdata['sentence']);
				$tagdata['hash_tag_set'] = sha1($tagdata['tag_set']);
				$tagdata['hash_mask_set'] = sha1($tagdata['mask_set']);
				}

			$arr[] = $tagdata; // add sentence array to output array
			}

		return $arr;
		}

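	public function spellcheck($txt)
		{
		// pspell is optional - return the text untouched if the extension is missing
		if(!function_exists('pspell_new'))
			{
			return $txt;
			}
		$pspell_link = pspell_new("en");
		if($pspell_link === false)
			{
			return $txt; // no English dictionary available
			}
		$words = explode(' ', $txt);
		foreach($words as $k => $v)
			{
			// only check plain alphabetic words; punctuation, numbers and blanks are left alone
			if(preg_match('/^[A-Za-z]+$/', $v) && !pspell_check($pspell_link, $v))
				{
				$suggestions = pspell_suggest($pspell_link, $v);
				if(!empty($suggestions))
					{
					$words[$k] = $suggestions[0]; // replace with the best suggestion
					}
				}
			}
		return implode(' ', $words);
		}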

	}

// EOF
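The proper noun and cardinal number merging inside tagged_to_array() is easier to follow in isolation. The sketch below reimplements just that step as a standalone function over slashTags-style output with the default `_` separator. `merge_tagged()` is a hypothetical name, not part of the class, and this is an illustration rather than a drop-in replacement.

```php
<?php
// Standalone sketch of the merge_proper_nouns / merge_cardinal_numbers idea:
// a run of NNP/NNPS tokens, or a run of CD tokens, is joined into one record.
function merge_tagged($tagged, $separator = '_')
{
    $records = array();
    $last_tag = 'START';
    foreach (preg_split('/\s+/', trim($tagged)) as $item) {
        // split on the LAST separator so tokens that contain '_' stay intact
        $pos = strrpos($item, $separator);
        if ($pos === false) {
            continue; // not a token_TAG pair
        }
        $token = substr($item, 0, $pos);
        $tag = substr($item, $pos + strlen($separator));

        $proper = array('NNP', 'NNPS');
        $both_proper = in_array($tag, $proper) && in_array($last_tag, $proper);
        $both_cardinal = ($tag == 'CD' && $last_tag == 'CD');
        if ($both_proper || $both_cardinal) {
            $i = count($records) - 1;
            $records[$i]['token'] .= ' ' . $token; // append to the previous token
            $records[$i]['tag'] = $tag;            // the last tag in the run wins
        } else {
            $records[] = array('token' => $token, 'tag' => $tag);
        }
        $last_tag = $tag;
    }
    return $records;
}

print_r(merge_tagged('John_NNP Smith_NNP bought_VBD two_CD hundred_CD apples_NNS ._.'));
```

"John Smith" and "two hundred" each come back as a single record, which is what lets the class count them once when it numbers tags.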

A quick example:


require('class_Stanford_POS_Tagger.php');
$pos = new Stanford_POS_Tagger();
print_r($pos->array_tag("The cow jumped over the moon and the dish ran away with the spoon."));

Resulting output:

Array
(
    [0] => Array
        (
            [tagged] => Array
                (
                    [0] => Array
                        (
                            [token] => The
                            [tag] => DT
                            [seq] => 1
                            [mask] => The
                        )

                    [1] => Array
                        (
                            [token] => cow
                            [tag] => NN
                            [seq] => 1
                            [mask] => *
                        )

                    [2] => Array
                        (
                            [token] => jumped
                            [tag] => VBD
                            [seq] => 1
                            [mask] => *
                        )

                    [3] => Array
                        (
                            [token] => over
                            [tag] => IN
                            [seq] => 1
                            [mask] => over
                        )

                    [4] => Array
                        (
                            [token] => the
                            [tag] => DT
                            [seq] => 2
                            [mask] => the
                        )

                    [5] => Array
                        (
                            [token] => moon
                            [tag] => NN
                            [seq] => 2
                            [mask] => *
                        )

                    [6] => Array
                        (
                            [token] => and
                            [tag] => CC
                            [seq] => 1
                            [mask] => and
                        )

                    [7] => Array
                        (
                            [token] => the
                            [tag] => DT
                            [seq] => 3
                            [mask] => the
                        )

                    [8] => Array
                        (
                            [token] => dish
                            [tag] => NN
                            [seq] => 3
                            [mask] => *
                        )

                    [9] => Array
                        (
                            [token] => ran
                            [tag] => VBD
                            [seq] => 2
                            [mask] => *
                        )

                    [10] => Array
                        (
                            [token] => away
                            [tag] => RB
                            [seq] => 1
                            [mask] => *
                        )

                    [11] => Array
                        (
                            [token] => with
                            [tag] => IN
                            [seq] => 2
                            [mask] => with
                        )

                    [12] => Array
                        (
                            [token] => the
                            [tag] => DT
                            [seq] => 4
                            [mask] => the
                        )

                    [13] => Array
                        (
                            [token] => spoon
                            [tag] => NN
                            [seq] => 4
                            [mask] => *
                        )

                    [14] => Array
                        (
                            [token] => .
                            [tag] => .
                            [seq] => 1
                            [mask] => *
                        )

                )

            [sentence] => The cow jumped over the moon and the dish ran away with the spoon .
            [tag_set] => {DT-1} {NN-1} {VBD-1} {IN-1} {DT-2} {NN-2} {CC-1} {DT-3} {NN-3} {VBD-2} {RB-1} {IN-2} {DT-4} {NN-4} {.-1}
            [mask_set] => The {NN-1} {VBD-1} over the {NN-2} and the {NN-3} {VBD-2} {RB-1} with the {NN-4} {.-1}
            [hash_sentence] => c2e9c7366d2f86736fa292b1425e9cf8
            [hash_tag_set] => 8bcfb4f7c0bc8de88bcc7252ace64267
            [hash_mask_set] => da050153d01b3e8045c9c7afe12d7945
        )

)
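The tag_set and mask_set strings follow directly from the token records. The sketch below rebuilds them for short sentences. `build_sets()` is a hypothetical function, and the three-tag mask list is shortened for brevity; neither is part of the class. One usage idea is an assumption, not something the class states. Masked words are replaced by their tag labels, so sentences with the same structure, such as "The cow jumped" and "The dog barked", produce the same mask_set. They therefore also share the same hash_mask_set, which could serve as a key for grouping similar sentences.

```php
<?php
// Standalone sketch of how array_tag() builds tag_set and mask_set:
// each tag is numbered by occurrence, and tags in the mask list replace the word.
function build_sets(array $pairs, array $mask_list)
{
    $seq = array();
    $tag_set = array();
    $mask_set = array();
    foreach ($pairs as $pair) {
        list($token, $tag) = $pair;
        $seq[$tag] = isset($seq[$tag]) ? $seq[$tag] + 1 : 1; // occurrence count per tag
        $label = '{' . $tag . '-' . $seq[$tag] . '}';
        $tag_set[] = $label;
        $mask_set[] = in_array($tag, $mask_list) ? $label : $token; // masked tags become labels
    }
    $tag_set = implode(' ', $tag_set);
    $mask_set = implode(' ', $mask_set);
    return array(
        'tag_set' => $tag_set,
        'mask_set' => $mask_set,
        'hash_mask_set' => md5($mask_set),
    );
}

$mask_list = array('NN', 'VBD', '.');
$a = build_sets(array(array('The', 'DT'), array('cow', 'NN'), array('jumped', 'VBD')), $mask_list);
$b = build_sets(array(array('The', 'DT'), array('dog', 'NN'), array('barked', 'VBD')), $mask_list);
echo $a['mask_set'], "\n";                             // The {NN-1} {VBD-1}
var_dump($a['hash_mask_set'] === $b['hash_mask_set']); // bool(true)
```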

This project is also available on GitHub at https://github.com/TheCodeSlinger/PHP-Class-Stanford-POS-Tagger

Who is Googhydr-20? It is Amazon!


You probably found this blog by searching for the keyword “googhydr”, ending in either -20 or -21. This is an Amazon affiliate ID: the -20 suffix indicates the US market, while -21 indicates the UK market. I did some digging into this too after researching keywords, SERP listings and AdWords listings, and kept seeing this Amazon affiliate linking directly to Amazon. Being an Amazon affiliate myself, it was easy to pick out the affiliate ID googhydr-20 from the link.
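The ID sits in plain sight in the link. Below is a minimal sketch that assumes the tracking ID is carried in the link's `tag` query parameter, as it is in standard Associates links. It maps only the two suffixes mentioned above. The function names and the example URL are hypothetical.

```php
<?php
// Sketch: read the Associates tracking ID from an Amazon link and map its suffix
// to a marketplace. Assumes the ID is carried in the link's "tag" query parameter.
function affiliate_id($url)
{
    $query = parse_url($url, PHP_URL_QUERY);
    if (!$query) {
        return null; // no query string, so no tracking ID
    }
    parse_str($query, $params);
    return isset($params['tag']) ? $params['tag'] : null;
}

function affiliate_market($id)
{
    // only the two suffixes discussed in this post are mapped
    $markets = array('20' => 'US', '21' => 'UK');
    $dash = strrpos($id, '-');
    if ($dash === false) {
        return 'unknown';
    }
    $suffix = substr($id, $dash + 1);
    return isset($markets[$suffix]) ? $markets[$suffix] : 'unknown';
}

// hypothetical product link; only the tag parameter matters here
$id = affiliate_id('https://www.amazon.com/dp/EXAMPLE?tag=googhydr-20&ref=example');
echo $id . ' ' . affiliate_market($id) . "\n"; // googhydr-20 US
```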

The name is kind of ominous if googhydr is pronounced “Goog Hider”, which could mean “Google Hider”. So is the associate trying to hide from Google?

I had been contemplating doing the same thing this googhydr character is doing for some time. It seems so easy, so why isn’t everyone doing it? I started digging around and reading up on Amazon’s rules.

This AdWords listing by the Amazon affiliate was doing exactly what Amazon associates are prohibited from doing. From the Amazon Associates Operating Agreement, section 7:

“Prohibited Paid Search Placement” means an advertisement that you purchased through bidding on keywords, search terms, or other identifiers (including Proprietary Terms) or other participation in keyword auctions. “Proprietary Term” means keywords, search terms, or other identifiers that include the word “amazon,” “Kindle,” “myhabit,” or “Javari,” or any other trademark of Amazon or its affiliates ( see a non-exhaustive list of our trademarks), or variations or misspellings of any of those words (e.g., “ammazon,” “amaozn,” “kindel,” and “javary”). “Redirecting Link” means a link that sends users indirectly to the Amazon Site via an intermediate site or webpage and without requiring the user to click on a link or take some other affirmative action on that intermediate site or webpage. “Search Engine” means Google, Yahoo, Bing, or any other search engine, portal, sponsored advertising service, or other search or referral service, or any site that participates in any of their respective networks.
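For an affiliate screening a keyword list, the “Proprietary Term” definition can be roughly approximated in code. The sketch below is a hypothetical helper, not anything Amazon provides. It assumes the term list is just the examples quoted above. It also assumes that an edit distance of 2 or less (PHP's levenshtein()) catches the kind of misspellings listed, such as “ammazon”, “amaozn”, “kindel” and “javary”. It will over-flag some innocent words (for example “kindly”), so treat hits as a prompt for review, not a verdict.

```php
<?php
// Rough screen for keywords that contain, or come close to, a proprietary term.
// Assumptions: the term list is just the examples quoted above, and an edit
// distance of 2 or less counts as a "variation or misspelling".
function proprietary_terms_in($keyword,
    $terms = array('amazon', 'kindle', 'myhabit', 'javari'), $max_distance = 2)
{
    $hits = array();
    foreach (preg_split('/\s+/', strtolower(trim($keyword))) as $word) {
        foreach ($terms as $term) {
            // exact containment ("amazonbooks") or a near miss ("amaozn")
            if (strpos($word, $term) !== false || levenshtein($word, $term) <= $max_distance) {
                $hits[] = $word;
                break; // one matching term is enough for this word
            }
        }
    }
    return $hits;
}

print_r(proprietary_terms_in('cheap kindel cases')); // flags "kindel"
```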

If you search for googhydr you might come across the website http://www.googhydr.com/, where the page claims to be an independent reseller who has tagged every possible Google search keyword. The page then goes on to try to convince you to click a link to buy a book on “Exploiting Amazon and Adwords PPC”. I can tell you that the claim on the website is false and a scam.

Googhydr is Amazon itself running its own Adwords campaigns.

It might not seem fair that Amazon, which already dominates the organic search results for just about every product, also bids on AdWords keywords for itself while at the same time not allowing any of its associates to do the same. Well, as they say, life isn’t fair.

A few related discussions about this topic:

  • http://forums.prospero.com/n/mb/message.asp?webtag=am-associhelp&msg=32189.1&search=y
  • http://www.warriorforum.com/main-internet-marketing-discussion-forum/289507-google-amazon-affiliate.html