Simple Ajax Tutorial

May 15, 2014 by Charles Hays in Ajax, Javascript

Ajax is at the heart of Web 2.0 design and is used across the most popular websites, from Facebook to Twitter. What is Ajax? It is a way for a web page to request data from another file or program and then show that data dynamically, without reloading the page.

There are lots of Ajax libraries available, most of which claim to be easy to set up and understand. I find that untrue in most cases: their APIs are fairly bloated, and it can be difficult to tell which part is the actual Ajax and which part is the rest of the API. So here I will attempt to explain just the most basic, fundamental part of Ajax.

Ajax uses just a few lines of JavaScript to request another page, similar to how a form accesses another page using either the GET or POST method. However, it is not necessary to send any GET or POST data unless the page being requested is itself dynamic and responds to those data fields. In this example we will not send any data, and we will forget about the requested page being any type of dynamic PHP, Perl, ASP or CGI page; we will simply request a regular text file.

The comments in the code below should tell you everything you need to know. I’ve stripped the entire Ajax process down to its most basic elements so you can look at the code and see what it is, with no extra baggage that can make it confusing.

File: blurb.txt (make this file and save it to your webserver)

My voice is my password, verify.

File: ajax.html (make this file and save it to your webserver)

<html>
<head>
<title>Simple Ajax Example</title>

<script type="text/javascript">
function simpleAjax()
	{
	var xmlhttp = new XMLHttpRequest(); // XMLHttpRequest is the built-in object that actually does the Ajax call
	xmlhttp.onreadystatechange = function () // this function is called every time the request changes state, not only once at the end
		{
		if (xmlhttp.readyState == 4 && xmlhttp.status == 200) // readyState 4 = request finished, status 200 = page fetched OK
			{
				var x = xmlhttp.responseText; // get the contents of the fetched page and put them in the variable "x"
				document.getElementById("data").innerHTML = x; // change the contents of the DIV with id "data" to the value of "x"
			}
		}
	xmlhttp.open("GET", "blurb.txt", true); // the request to make: method GET (could be POST), a file in the same directory as the calling page, true = asynchronous
	xmlhttp.send(); // send the request; the function above processes the result when it arrives
	}
</script>

</head>
<body>

<a href="" onclick="simpleAjax();return false;">Click Here for Ajax</a>

<div id="data">
	What is your Password?
</div>

</body>
</html>

Go to http://(yourdomain)/ajax.html and click on “Click Here for Ajax”. You should see the text in the DIV with id="data" change. That is all Ajax is.
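If you want to reuse the request on more than one page, the same few lines can be wrapped in a small function. The following is only a sketch under my own assumptions: fetchText and its optional XhrCtor parameter are illustrative names, not part of any library, and the callback reports a failed request instead of silently doing nothing.

```javascript
// Sketch: the same GET request wrapped in a reusable function.
// fetchText and XhrCtor are illustrative names, not a library API.
function fetchText(url, onDone, XhrCtor) {
	// In a browser XhrCtor can be left out; passing one in makes the function easy to test.
	var Ctor = XhrCtor || XMLHttpRequest;
	var xhr = new Ctor();
	xhr.onreadystatechange = function () {
		if (xhr.readyState !== 4) { return; } // request not finished yet
		if (xhr.status === 200) {
			onDone(null, xhr.responseText); // success: no error, pass the fetched text
		} else {
			onDone(new Error("Request failed with status " + xhr.status), null);
		}
	};
	xhr.open("GET", url, true);
	xhr.send();
}

// Usage in ajax.html:
// fetchText("blurb.txt", function (err, text) {
//	document.getElementById("data").innerHTML = err ? "Could not load blurb.txt" : text;
// });
```

The callback takes the error first and the text second, so the calling code has to decide what to show when the file is missing.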

Text to Speech with PHP (TTS)

April 13, 2014 by Charles Hays in PHP, Text To Speech

A really quick and easy Text to Speech class for PHP that will generate an MP3 file. You can also easily set up an Ajax call on a website to play the text to speech audio file as you generate it. The quality is pretty good compared to other solutions, but I haven’t figured out how to adjust pitch and tone or mix background music with it (yet).

This is not really a solid solution for robust TTS, as it relies on Google’s TTS API service; for more advanced solutions with lots of controls and embedding into videos we use Microsoft’s TTS system. But for easy-to-deploy, on-demand web services this solution is a cinch.

This Text to Speech with PHP version can be found on my CodeSlinger GitHub page.

<?php
/******************************************************************
Projectname:   PHP Text 2 Speech Class 
Version:       1.0 
Author:        Radovan Janjic <rade@it-radionica.com> 
Last modified: 11 06 2013 
Copyright (C): 2012 IT-radionica.com, All Rights Reserved 

* GNU General Public License (Version 2, June 1991) 
* 
* This program is free software; you can redistribute 
* it and/or modify it under the terms of the GNU
* General Public License as published by the Free
* Software Foundation; either version 2 of the License, 
* or (at your option) any later version.
* 
* This program is distributed in the hope that it will
* be useful, but WITHOUT ANY WARRANTY; without even the 
* implied warranty of MERCHANTABILITY or FITNESS FOR A 
* PARTICULAR PURPOSE. See the GNU General Public License 
* for more details. 

Description: 

PHP Text 2 Speech Class 

This class converts text to speech using the Google text to 
speech API. The text is transformed into an mp3 file, which is 
downloaded and can later be used, e.g. as an embedded audio file. 

Example: 

****************************************************************** 
<?php
$t2s = new PHP_Text2Speech; 
?> 

// Simple example 
<audio controls="controls" autoplay="autoplay"> 
  <source src="<?php echo $t2s->speak('If you hear this sound it means that you are using PHP text to speech class.'); ?>" type="audio/mp3" /> 
</audio>

// Example use of other language 
<audio controls="controls" autoplay="autoplay"> 
  <source src="<?php echo $t2s->speak('Wie geht es Ihnen', 'de'); ?>" type="audio/mp3" /> 
</audio> 

******************************************************************/ 

class PHP_Text2Speech { 
     
    /** Max text characters
     * @var    Integer  
     */ 
    var $maxStrLen = 100; 
     
    /** Text len
     * @var    Integer  
     */ 
    var $textLen = 0; 
     
    /** No of words 
     * @var    Integer  
     */ 
    var $wordCount = 0; 
     
    /** Language of text (ISO 639-1) 
     * @var    String  
     * @link https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes 
     */ 
    var $lang = 'en'; 
     
    /** Text to speak 
     * @var    String  
     */ 
    var $text = NULL; 
     
    /** File name format 
     * @var    String  
     */ 
    var $mp3File = "%s.mp3"; 
     
    /** Directory to store audio file
     * @var    String  
     */ 
    var $audioDir = "audio/"; 

    /** Contents 
    * @var    String 
    */ 
    var $contents = NULL; 
     
    /** Function make request to Google translate, download file and returns audio file path 
     * @param     String     $text        - Text to speak 
     * @param     String     $lang         - Language of text (ISO 639-1) 
     * @return     String     - mp3 file path 
     * @link https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
     */ 
    function speak($text, $lang = NULL) { 
         
        if ($lang !== NULL) { 
            $this->lang = $lang; 
        } 

        // Create dir if not exists 
        if (!is_dir($this->audioDir)) { 
            mkdir($this->audioDir, 0755) or die('Could not create audio dir: ' . $this->audioDir); 
        } 
         
        // Try to set writing permissions for audio dir. 
        if (!is_writable($this->audioDir)) {  
            chmod($this->audioDir, 0755) or die('Could not set appropriate permissions for audio dir: ' . $this->audioDir); 
        } 
         
        // Can not handle more than 100 characters so split text 
        if (strlen($text) > $this->maxStrLen) {
            $this->text = $text;

            // Generate unique mp3 file name 
            $file = sprintf($this->mp3File, $this->audioDir . md5($this->text)); 
            if (!file_exists($file)) {
                $texts = array();
                $words = explode(' ', $this->text);
                $i = 0;
                $texts[$i] = NULL;
                foreach ($words as $w) {
                    $w = trim($w);
                    if (strlen($texts[$i] . ' ' . $w) < $this->maxStrLen) {
                        $texts[$i] = $texts[$i] . ' ' . $w;
                        if (preg_match('/[:;,.!?-]$/', $w)) { $i++; } // separate at common breaks
                    } else {
                        $texts[++$i] = $w;
                    } 
                }

                // Get the contents of the separate files and merge them into one
                foreach ($texts as $txt) {
                    $pFile = $this->speak($txt, $this->lang); 
                    $this->contents .= $this->stripTags(file_get_contents($pFile)); 
                    unlink($pFile);
                }
                unset($words, $texts); 
                 
                // Save file
                file_put_contents($file, $this->contents); 
                $this->contents = NULL;
            }
        } else {
             
            // Generate unique mp3 file name 
            $file = sprintf($this->mp3File, $this->audioDir . md5($text)); 

            if (!file_exists($file)) { 
                // Text length 
                $this->textLen = strlen($text); 
                 
                // Words count 
                $this->wordCount = str_word_count($text); 

                // Encode string 
                $text = urlencode($text);

                // Download new file
                $this->download("http://translate.google.com/translate_tts?ie=UTF-8&q={$text}&tl={$this->lang}&total={$this->wordCount}&idx=0&textlen={$this->textLen}", $file);
            }
        } 
         
        // Returns mp3 file path 
        return $file; 
    } 
     
    /** Function to find the beginning of the mp3 file 
     * @param     String     $contents        - File contents 
     * @return     Integer 
     */  
    function getStart($contents) { 
        for($i=0; $i < strlen($contents); $i++){ 
            if(ord(substr($contents, $i, 1)) == 255){ 
                return $i; 
            } 
        } 
        return FALSE; // no 0xFF frame sync byte found
    } 
     
    /** Function to find the ID3v1 tag at the end of the mp3 file 
     * @param     String     $contents        - File contents 
     * @return     String|Boolean     - the 128-byte tag, or FALSE if there is none 
     */  
    function getEnd($contents) { 
        $c = substr($contents, (strlen($contents) - 128)); 
        if(strtoupper(substr($c, 0, 3)) == 'TAG'){ 
            return $c; 
        }else{ 
            return FALSE; 
        } 
    } 

    /** Function to remove the ID3 tags from mp3 files 
     * @param     String     $contents        - File contents
     * @return     String
     */
    function stripTags($contents) {
        // Remove everything before the first frame sync byte
        $start = $this->getStart($contents);
        if ($start === FALSE) { 
            return FALSE;
        } 
        $contents = substr($contents, $start);
        // Remove the trailing 128-byte ID3v1 tag, if there is one
        if ($this->getEnd($contents) !== FALSE){ 
            $contents = substr($contents, 0, (strlen($contents) - 128));
        } 
        return $contents;
    } 

    /** Function to download and save file 
     * @param     String     $url        - URL 
     * @param     String     $path         - Local path 
     */
    function download($url, $path) {  
        // Is curl installed? 
        if (!function_exists('curl_init')){ // use file get contents  
            $output = file_get_contents($url);
        }else{ // use curl  
            $ch = curl_init();  
            curl_setopt($ch, CURLOPT_URL, $url);  
            curl_setopt($ch, CURLOPT_AUTOREFERER, true);
            curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1");  
            curl_setopt($ch, CURLOPT_HEADER, 0);  
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  
            curl_setopt($ch, CURLOPT_TIMEOUT, 10);
            $output = curl_exec($ch);  
            curl_close($ch);
        } 
        // Save file
        file_put_contents($path, $output);
    }
}
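The class above works around the 100-character limit by splitting long text into pieces between words, fetching each piece and joining the results. The splitting step on its own can be sketched as follows, written in JavaScript to match the Ajax example on this blog. This is a simplified illustration, not a port of the PHP: splitIntoChunks is a made-up name, and unlike the class it does not also break at punctuation.

```javascript
// Sketch: split text into chunks of at most maxLen characters, breaking only between words.
// splitIntoChunks is a hypothetical helper, not part of the PHP class.
function splitIntoChunks(text, maxLen) {
	var words = text.split(/\s+/).filter(function (w) { return w !== ""; });
	var chunks = [];
	var current = "";
	words.forEach(function (w) {
		var candidate = current === "" ? w : current + " " + w;
		if (current === "" || candidate.length <= maxLen) {
			current = candidate; // the word fits (a single over-long word gets a chunk of its own)
		} else {
			chunks.push(current); // the chunk is full, start a new one with this word
			current = w;
		}
	});
	if (current !== "") { chunks.push(current); }
	return chunks;
}

// Usage:
// splitIntoChunks("one two three four", 9) returns ["one two", "three", "four"]
```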

Here is an example of using the TTS class in PHP.

<?php
include 'PHP_Text2Speech.class.php';

$t2s = new PHP_Text2Speech;
?>

<audio controls="controls" autoplay="autoplay">
  <source src="<?php echo $t2s->speak('What are you looking at? Wipe that face off your head.'); ?>" type="audio/mp3" />
</audio>
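To play the audio from an Ajax call instead of embedding it when the page loads, something along these lines could work. It is only a sketch under stated assumptions: speak.php is a hypothetical script (not part of the class) that would call $t2s->speak() with the text and lang parameters and print the resulting mp3 path, and buildSpeakUrl and speakViaAjax are illustrative names.

```javascript
// Sketch: build the request URL for a hypothetical speak.php endpoint.
function buildSpeakUrl(text, lang) {
	var url = "speak.php?text=" + encodeURIComponent(text);
	if (lang) { url += "&lang=" + encodeURIComponent(lang); }
	return url;
}

// Ask the server for the mp3 path, then play the file. Browser only.
function speakViaAjax(text, lang) {
	var xhr = new XMLHttpRequest();
	xhr.onreadystatechange = function () {
		if (xhr.readyState === 4 && xhr.status === 200) {
			// Assumption: the response body is just the path of the generated mp3 file.
			var audio = new Audio(xhr.responseText.trim());
			audio.play();
		}
	};
	xhr.open("GET", buildSpeakUrl(text, lang), true);
	xhr.send();
}

// Usage, for example from an onclick handler:
// speakViaAjax("Wie geht es Ihnen", "de");
```

Some browsers block audio playback that is not started by a user action, so calling speakViaAjax from a click handler is the safer choice.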

New gTLD Domain Extensions Timeline (spreadsheet)

March 26, 2014 by Charles Hays in Domains, Spreadsheet

There is a paradigm shift coming in the way we view domain names. I predict the .com will be a dinosaur in the coming years. I am so sure of it that I’m betting on it by acquiring some new gTLD domains that fit my current business model. However, the new gTLD release dates have been extremely confusing.

First there is the Sunrise phase, which allows holders of nationally and internationally registered trademarks to secure their brand names. Self-use, local and pending applications need not apply in this phase; you must wait until the next phase, if applicable.

Landrush typically comes after Sunrise (however, some gTLDs do not offer it). During Landrush you can pay extra to get a domain sooner and beat out the competition.

Finally there is General Availability, and that is where the scraps are picked up. Sure, you can still get lucky at this point; I have. But keep in mind that many services allow pre-registration and payment for General Availability, so if several others have also pre-registered the same domain (which is common), your chances of getting that stellar name are very low.

To make the gTLD domain name release dates a bit easier to digest, I put together a Google Spreadsheet: https://docs.google.com/spreadsheet/ccc?key=0AiSCqnEYeYZcdEkyLXRncUdKaGx0THJnNmxjM1duVVE&usp=sharing

SR (yellow) = Sunrise
PL (pink) = Pre Land Rush
LR (orange) = Land Rush
PG (light green) = Pre General Availability
GA (green) = General Availability

PHP Class Wrapper for Stanford Part of Speech Tagger

February 3, 2014 by Charles Hays in NLP

File: class_Stanford_POS_Tagger.php

Over the last several years I have been dabbling in part of speech tagging, using various natural language processing (NLP) systems. I especially wanted something that would work with PHP, as most of my web programming is done in this scripting language. PHP offers a lot of advantages for quick prototyping and testing with its very flexible use of variables, strings and arrays. Unfortunately, nearly every POS tagger written in PHP that I tested was either poorly designed, broken, error-prone or no longer supported, and in many cases all of the above. The best results of any NLP tagging system seemed to come from the one developed by Stanford, which is only available in Java.

I had written a wrapper in C# some years ago for testing, but it was not very useful for the projects I had in mind. Eventually I came up with the following PHP class to wrap the tagger as well. This class includes a lot more functionality than a simple tagger, and has settings you can change. Most of these options are necessary for my own project, but others may find them useful as well.

You will need to download the Stanford POS tagger from here: http://nlp.stanford.edu/downloads/tagger.shtml

This in turn requires that you have Java 1.6 or newer installed to run it.
When using this class you will need to pass the directory location of the above Stanford tagger to the constructor like so:

$pos = new Stanford_POS_Tagger('somewhere/StanfordNLP/stanford-postagger-2014-01-04');

<?php

/**
 * PHP Class Stanford POS Tagger 1.1.0 - PHP Wrapper for Stanford's Part of Speech Java Tagger
 * Copyright (C) 2014 Charles R Hays http://www.charleshays.com
 *
 * file: class_Stanford_POS_Tagger.php
 *
 * @version 1.1.0 (2/4/2014)
 *		1.0.0 - release
 *		1.1.0 - added merge cardinal numbers
 *
 * @requirements
 *		1)Requires stanford postagger 3.3.1 or newer. Download @ http://nlp.stanford.edu/downloads/tagger.shtml
 *
 *		2)In turn the stanford postagger requires Java 1.6+ to be installed and about 60MB of memory.
 *
 * @example
 * 		require('class_Stanford_POS_Tagger.php');
 *		$pos = new Stanford_POS_Tagger();
 * 		print_r($pos->array_tag("The cow jumped over the moon and the dish ran away with the spoon."));
 *

    This library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
    License as published by the Free Software Foundation; either
    version 2.1 of the License, or (at your option) any later version.

    This library is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
    Lesser General Public License for more details.

    You should have received a copy of the GNU Lesser General Public
    License along with this library; if not, write to the Free Software
    Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
 */

class Stanford_POS_Tagger
	{
	////////////////////////////////////////////////////////////////////////////
	// POS TAGGER MODELS
	////////////////////////////////////////////////////////////////////////////
	/*
	english-bidirectional-distsim.tagger
	Trained on WSJ sections 0-18 using a bidirectional architecture and
	including word shape and distributional similarity features.
	Penn Treebank tagset.
	*/
	//private $model = 'english-bidirectional-distsim.tagger'; // 97.32% accuracy - slow

	/*
	english-left3words-distsim.tagger
	Trained on WSJ sections 0-18 and extra parser training data using the
	left3words architecture and includes word shape and distributional
	similarity features. Penn tagset.
	*/
  	private $model = 'english-left3words-distsim.tagger'; // 96.97% accuracy - fast

	////////////////////////////////////////////////////////////////////////////
	// Java variables
	////////////////////////////////////////////////////////////////////////////
	private $java_path = 'java'; // the command to run java
	private $java_options = array(); // array of java switch options
	private $jar = 'stanford-postagger.jar'; // the jar to use located in $path
	private $path = '';	// path to where the stanford postagger directory resides

	////////////////////////////////////////////////////////////////////////////
	// Temporary files - the text is stored in a tmp file which is parsed
	////////////////////////////////////////////////////////////////////////////
	private $tmp_path = '/tmp'; // directory to store tmp file
	private $tmp_prefix = 'posttagger'; // prefix of tmp file
	private $tmp_permission = 0644; // permission to set tmp file

	////////////////////////////////////////////////////////////////////////////
	// POS Tag separator such as John_NNP where _ is the separator
	////////////////////////////////////////////////////////////////////////////
	private $separator = '_'; // used for tagged output
	private $best_separator = '#_#'; // used for better separation when used in array output

	////////////////////////////////////////////////////////////////////////////
	// Sanitizing text
	////////////////////////////////////////////////////////////////////////////
	private $use_pspell = true; // Use Pspell for spell checking (if installed)

	////////////////////////////////////////////////////////////////////////////
	// In Array Tag Options - For use with array_tag() method only
	////////////////////////////////////////////////////////////////////////////
	private $hash_type = 'md5'; // Hash types for sentence include 'none', 'md5', 'base64', 'sha1' (http://us3.php.net/manual/en/function.hash.php)

	private $merge_proper_nouns = true; // so "John_NNP" "Smith_NNP" becomes "John Smith_NNP"
	private $merge_cardinal_numbers = true; // so "one hundred and thirty" or "two and a half" is grouped as a single CD

	private $sequence_tags = true; // numbers the order of each tag occurrence

	private $tag_mask_types = true; // adds new field in array records that masks with a * specified tag types in list below.
	private $tag_mask_list = array(
		//'#',		// Pound sign
		//'$',		// Dollar sign
		//'"',		// Close double quote
		//'``',		// Open double quote
		//"'",		// Close single quote
		//'`',		// Open single quote
		//',',		// Comma
		'.',		// Final punctuation
		//':',		// Colon or semi-colon
		//'-LRB-',// Left bracket
		//'-RRB-',// Right bracket
		//'CC',		// Coordinating conjunction : and, but, or, yet, for, nor, so
		'CD',			// Cardinal number 1, 2, 3, one, two, three hundred

		//'DT',			// Determiner :
		//'EX',			// Existential there : There is a cult of ignorance in the United States.
		//'FW',			// Foreign word :
		//'IN',			// Preposition : links nouns, pronouns and phrases to other words in a sentence. on, beneath, against, beside, over, during

		'JJ',			// adjective : sweet, angry, bright, cold, long : also ordinal numbers like "3rd" fastest, "6th" place
		'JJR',		// comparative adjective : sweeter, angrier, brighter, colder, longer
		'JJS',		// superlative adjective : sweetest, angriest, brightest, coldest, longest

		//'LS',			// List item marker :
		//'MD',			// Modal : can, may, must, should, would

		'NN', 		// singular noun : girl, mother, nurse, city, town, bicycle, doll, train, dream, truth, pride, colony, team, litter, covey
		'NNS',	 	// plural noun : children, men, girls, mothers, nurses, cities, towns, bikes, dolls, trains, dreams, colonies, teams, litters
		'NNP',		// proper singular noun : John, Smith, Pizza Hut
		'NNPS',		// proper plural noun: Kennedys

		//'PDT',	// predeterminer : all, both, half
		//'POS',	// possessive ending : 's, s'
		//'PRP',	// personal pronoun : I, me, myself, we us, ourselves, you, yourself (http://en.wikipedia.org/wiki/English_personal_pronouns)
		//'PRP$',	// possessive pronouns : her, your, his, hers, my, their, yours, whose, one's, theirs, its, our (http://examples.yourdictionary.com/examples-of-possessive-pronouns.html)
		'RB',		// adverb : slowly, now, soon, suddenly (http://en.wikipedia.org/wiki/Adverb)
		'RBR',		// comparative adverb : more quietly, more carefully, more happily, harder, faster, earlier
		'RBS',		// superlative adverb : most quietly, most carefully, most happily, hardest, fastest, earliest
		'RP',		// particle : prepositions that modify a verb instead of a noun. along, away, back, by, down, forward, in, off, on, out, over, round, under, up
		//'SYM',	// symbol :
		//'TO',		// to
		'UH',		// interjection : ah, oh, brrr, oops, huh?, booh, eh, mwahaha, bwahaha, yay, yuck, yeah (http://www.vidarholen.net/contents/interjections/)

		'VB',		// verb, base form : walk, skip, jump
		'VBD',		// verb, past tense : walked, skipped, jumped
		'VBG',		// verb, gerund/present participle : walking, skipping, jumping
		'VBN',		// verb, past participle : have walked, have skipped, have jumped
		'VBP',		// verb, non 3rd person: sing, present :
		'VBZ',		// verb, 3rd person: sing, present :
		//'WDT',		// wh-determiner : what, which, whose, whatever, whichever
		//'WP',		// wh-pronoun : what, which, where, when, who, whom, whose. (And maybe: whether.)
		//'WP$',		// possessive wh-pronoun : whose
		//'WRB',		// wh-adverb : how, where, when
		//' '			// blank space
		);

	////////////////////////////////////////////////////////////////////////////
	// Methods
	////////////////////////////////////////////////////////////////////////////

	public function __construct($path = '', $java_options = array('-mx300m'))
		{
		if(trim($path) == '')
			{
			$path = __DIR__;
			}
		$this->set_path($path);
		$this->set_java_options($java_options);
		$this->set_model($this->model);
		}

	public function set_path($path)
		{
		$this->path = trim(rtrim(trim($path),'/')).'/';
		}

	public function merge_proper_nounds($val = true)
		{
		$this->merge_proper_nouns = $val;
		}

	public function sequence_tags($val = true)
		{
		$this->sequence_tags = $val;
		}

	public function tag_mask_types($val = true)
		{
		$this->tag_mask_types = $val;
		}

	public function tag_mask_list($taglist = array())
		{
		$this->tag_mask_list = $taglist;
		}

	public function set_hash($val = '')
		{
		if($val == '') $val = 'none';

		$this->hash_type = $val;
		}


	public function set_stanford_path($path)
		{
		$this->path = trim(rtrim($path,'/'));
		}

	public function set_model($model)
		{
		$this->model = trim(ltrim($model));
		}

	public function get_model()
		{
		return rtrim($this->path,'/').'/models/'.ltrim($this->model,'/');
		}

	public function get_jar()
		{
		return rtrim($this->path,'/').'/'.ltrim($this->jar,'/');
		}

	public function set_jar($jar)
		{
		$this->jar = trim(ltrim($jar));
		}

	public function set_java_path($java_path)
		{
		$this->java_path = trim($java_path);
		}

	public function set_java_options($java_options = array())
		{
		$this->java_options = $java_options;
		}

	public function set_tmp_path($path)
		{
		$this->tmp_path = trim(rtrim($path,'/'));
		}

	public function set_tmp_prefix($prefix)
		{
		$this->tmp_prefix = trim(ltrim($prefix,'/'));
		}

	public function set_tmp_permission($perm)
		{
		$this->tmp_permission = $perm;
		}

	public function set_tag_separator($separator = '_')
		{
		$this->separator = trim($separator);
		}

	public function get_tag_separator()
		{
		return $this->separator;
		}

	public function tag($txt,$normalize = true,$separator = '')
		{
		if(!file_exists($this->get_jar()))
			{
			throw new Exception("Jar not found: ".$this->get_jar());
			}
		if(!file_exists($this->get_model()))
			{
			throw new Exception("Model not found: ".$this->get_model());
			}
		if($separator == '')
			{
			$separator = $this->separator;
			}

		$tf = tempnam($this->tmp_path, $this->tmp_prefix);
		chmod($tf, $this->tmp_permission); // $tmp_permission is already an octal integer such as 0644, so no octdec() is needed

		$words = explode(' ',$txt);

		if($this->use_pspell)
			{
			$txt = $this->spellcheck($txt);
			}

		file_put_contents($tf, $txt);

		$options = implode(' ', $this->java_options);
		$model = $this->path.'/'.$this->model;

		$descriptorspec = array(
			0 => array("pipe", "r"),  // stdin
			1 => array("pipe", "w"),  // stdout
			2 => array("pipe", "w")   // stderr
			);

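		// Build the command from individually escaped arguments. The process is started in the
		// jar's directory (see proc_open below), so the jar file name alone is enough for -cp.
		$cmd = escapeshellcmd($this->java_path).' '.$options
			.' -cp '.escapeshellarg(basename($this->get_jar()))
			.' edu.stanford.nlp.tagger.maxent.MaxentTagger'
			.' -model '.escapeshellarg($this->get_model())
			.' -textFile '.escapeshellarg($tf)
			.' -outputFormat slashTags'
			.' -tagSeparator '.escapeshellarg($separator)
			.' -encoding utf8';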


		$process = proc_open($cmd, $descriptorspec, $pipes, dirname($this->get_jar()));

		$output = null;
		$errors = null;
		if(is_resource($process))
			{
			// ignore stdin - input
			fclose($pipes[0]);

			// get stdout - output
			$output = stream_get_contents($pipes[1]);
			fclose($pipes[1]);

			// get stderr - errors
			$errors = stream_get_contents($pipes[2]);
			fclose($pipes[2]);

			// prevent deadlock by closing pipe before calling proc_close
			$return_value = proc_close($process);
			if($return_value == -1)
				{
				throw new Exception("Java process error: ".$cmd);
				}
			}

		unlink($tf);

		return $output;
		}

	public function array_tag($txt,$normalize = true)
		{
		return $this->tagged_to_array($this->tag($txt,$normalize,$this->best_separator),$this->best_separator);
		}

	public function tagged_to_array($tagged, $separator)
		{
		$arr = array();

		if(!$tagged) return $arr;

		if($separator == '')
			{
			$separator = $this->separator;
			}

		$sentences = explode("\n", $tagged);
		foreach($sentences as $k => $v)
			{
			$sequence = array();
			if(trim($v) == '')
				{
				continue;
				}
			$tagrec = array();
			$tags = explode(' ', trim($v));
			$last_tag = 'START';
			$i = 0;
			foreach($tags as $kk => $vv)
				{
				$parts = explode($separator, trim($vv));
				$tag = array();

				// start - merge proper nouns
				if($this->merge_proper_nouns)
					{
					if(($parts[1] == 'NNP') || ($parts[1] == 'NNPS'))
						{
						if(($last_tag == 'NNP') || ($last_tag == 'NNPS'))
							{
							$tagrec[$i - 1]['token'] .= ' '.$parts[0]; // append this word to last token
							$tagrec[$i - 1]['tag'] = $parts[1]; // the final proper noun type is used
							continue;
							}
						}
					}

				// end - merge proper nouns

				// start - merge cardinal numbers
				if($this->merge_cardinal_numbers)
					{
					if($parts[1] == 'CD')
						{
						if($last_tag == 'CD')
							{
							$tagrec[$i - 1]['token'] .= ' '.$parts[0]; // append this word to last token
							continue;
							}
						}
					}

				// end - merge cardinal numbers

				$last_tag = $parts[1];

				// array keys are quoted strings; bare words such as [token] are an error in PHP 8
				$tag['token'] = $parts[0];
				$tag['tag'] = $parts[1];

				// start - sequence tags
				if($this->sequence_tags)
					{
					if(isset($sequence[$parts[1]]))
						{
						$sequence[$parts[1]]++;
						}
					else
						{
						$sequence[$parts[1]] = 1;
						}
					$tag['seq'] = $sequence[$parts[1]];
					}
				// end - sequence tags

				// start - tag masking
				if($this->tag_mask_types)
					{
					if(in_array($parts[1],$this->tag_mask_list))
						{
						$tag['mask'] = '*';
						}
					else
						{
						$tag['mask'] = $parts[0];
						}

					}
				// end - tag masking

				$tagrec[] = $tag;
				$i++;

				}

			$tagdata = array();
			$tagdata['tagged'] = $tagrec;

			$tagdata['sentence'] = '';
			$tagdata['tag_set'] = '';
			$tagdata['mask_set'] = '';
			foreach($tagrec as $k => $v)
				{
				// sentence
				if($tagdata['sentence'] != '') $tagdata['sentence'] .= ' ';
				$tagdata['sentence'] .= $v['token'];

				// tag set
				if($tagdata['tag_set'] != '') $tagdata['tag_set'] .= ' ';
				if($this->sequence_tags)
					{
					$tagdata['tag_set'] .= '{'.$v['tag'].'-'.$v['seq'].'}';
					}
				else
					{
					$tagdata['tag_set'] .= '{'.$v['tag'].'}';
					}

				// mask set
				if($tagdata['mask_set'] != '') $tagdata['mask_set'] .= ' ';
				if(isset($v['mask']) && $v['mask'] == '*')
					{
					if($this->sequence_tags)
						{
						$tagdata['mask_set'] .= '{'.$v['tag'].'-'.$v['seq'].'}';
						}
					else
						{
						$tagdata['mask_set'] .= '{'.$v['tag'].'}';
						}
					}
				else
					{
					// when tag masking is switched off there is no mask field, so fall back to the token
					$tagdata['mask_set'] .= isset($v['mask']) ? $v['mask'] : $v['token'];
					}
				}

			// generate hashes
			if($this->hash_type == 'md5')
				{
				$tagdata['hash_sentence'] = md5($tagdata['sentence']);
				$tagdata['hash_tag_set'] = md5($tagdata['tag_set']);
				$tagdata['hash_mask_set'] = md5($tagdata['mask_set']);
				}
			else if($this->hash_type == 'base64')
				{
				$tagdata['hash_sentence'] = base64_encode($tagdata['sentence']);
				$tagdata['hash_tag_set'] = base64_encode($tagdata['tag_set']);
				$tagdata['hash_mask_set'] = base64_encode($tagdata['mask_set']);
				}
			else if($this->hash_type == 'sha1')
				{
				$tagdata['hash_sentence'] = sha1($tagdata['sentence']);
				$tagdata['hash_tag_set'] = sha1($tagdata['tag_set']);
				$tagdata['hash_mask_set'] = sha1($tagdata['mask_set']);
				}

			$arr[] = $tagdata; // add sentence array to output array
			}

		return $arr;
		}

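	public function spellcheck($txt)
		{
		if(function_exists('pspell_new'))
			{
			$pspell_link = pspell_new("en");
			if($pspell_link)
				{
				$words = explode(' ', $txt);
				foreach($words as $k => $v)
					{
					if(trim($v) != '' && !pspell_check($pspell_link, $v))
						{
						// pspell_suggest() returns an array; use the first suggestion if there is one
						$suggestions = pspell_suggest($pspell_link, $v);
						if(!empty($suggestions))
							{
							$words[$k] = $suggestions[0];
							}
						}
					}
				// correctly spelled words are kept as they are
				$txt = implode(' ', $words);
				}
			}
		return $txt;
		}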

	}

// EOF

A quick example:


require('class_Stanford_POS_Tagger.php');
$pos = new Stanford_POS_Tagger();
print_r($pos->array_tag("The cow jumped over the moon and the dish ran away with the spoon."));

Resulting output:

This project is also available on GitHub at https://github.com/TheCodeSlinger/PHP-Class-Stanford-POS-Tagger
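To illustrate what the merge_proper_nouns option does, here is a small JavaScript sketch of the same idea. It is not the class's code: mergeProperNouns is a hypothetical helper, and the tagged string in the usage note is hand-written for illustration, not real tagger output.

```javascript
// Sketch: parse "word_TAG word_TAG ..." and merge consecutive proper nouns (NNP/NNPS) into one token.
// mergeProperNouns is a hypothetical helper, not part of the PHP class.
function mergeProperNouns(tagged, separator) {
	var result = [];
	tagged.trim().split(/\s+/).forEach(function (pair) {
		var cut = pair.lastIndexOf(separator);
		if (cut < 0) { return; } // skip anything that carries no tag
		var token = pair.slice(0, cut);
		var tag = pair.slice(cut + separator.length);
		var last = result[result.length - 1];
		var isProper = tag === "NNP" || tag === "NNPS";
		if (isProper && last && (last.tag === "NNP" || last.tag === "NNPS")) {
			last.token += " " + token; // append this word to the previous token
			last.tag = tag; // the final proper noun type is used
		} else {
			result.push({ token: token, tag: tag });
		}
	});
	return result;
}

// Usage with a hand-written tagged string:
// mergeProperNouns("John_NNP Smith_NNP ate_VBD pizza_NN", "_")
// returns three entries, the first being { token: "John Smith", tag: "NNP" }
```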

Who is Googhydr-20? It is Amazon!

January 27, 2014 by Charles Hays in Amazon

You probably found this blog through the keyword “googhydr”, ending in either -20 or -21. This is an Amazon affiliate ID; the -20 indicates the US market, while -21 is the UK market. I did some digging into this myself after researching keywords, SERP listings and Adwords listings, where I kept seeing this Amazon affiliate linking directly to Amazon. Being an Amazon affiliate myself, it was easy to figure out the affiliate ID googhydr-20 from the link.
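Amazon affiliate links carry the associate ID in the tag query parameter, which is how the ID can be read straight off the link. A minimal sketch, assuming a made-up example URL; affiliateTag and marketFromTag are illustrative names, and the suffix-to-market mapping covers only the two cases described above.

```javascript
// Sketch: return the affiliate ID from an Amazon link, or null if the link has no tag parameter.
function affiliateTag(link) {
	return new URL(link).searchParams.get("tag");
}

// Map the ID suffix to a market, using only the two suffixes mentioned in this post.
function marketFromTag(tag) {
	if (/-20$/.test(tag)) { return "US"; }
	if (/-21$/.test(tag)) { return "UK"; }
	return "unknown";
}

// Usage with a made-up link:
// affiliateTag("https://www.amazon.com/dp/EXAMPLE?tag=googhydr-20") returns "googhydr-20"
// marketFromTag("googhydr-20") returns "US"
```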

The name is kind of ominous if googhydr is pronounced “Goog Hider”, which could mean “Google Hider”. So is the associate trying to hide from Google?

I had been contemplating doing the same thing this googhydr character is doing for some time. It seems so easy, so why isn’t everyone doing it? I started digging around and reading up on Amazon’s rules.

This Adwords listing by the Amazon affiliate was doing exactly what Amazon associates are prohibited from doing. From the Amazon Associates Operating Agreement, section 7:

“Prohibited Paid Search Placement” means an advertisement that you purchased through bidding on keywords, search terms, or other identifiers (including Proprietary Terms) or other participation in keyword auctions. “Proprietary Term” means keywords, search terms, or other identifiers that include the word “amazon,” “Kindle,” “myhabit,” or “Javari,” or any other trademark of Amazon or its affiliates ( see a non-exhaustive list of our trademarks), or variations or misspellings of any of those words (e.g., “ammazon,” “amaozn,” “kindel,” and “javary”). “Redirecting Link” means a link that sends users indirectly to the Amazon Site via an intermediate site or webpage and without requiring the user to click on a link or take some other affirmative action on that intermediate site or webpage. “Search Engine” means Google, Yahoo, Bing, or any other search engine, portal, sponsored advertising service, or other search or referral service, or any site that participates in any of their respective networks.

If you do a search for Googhydr you might come across the website http://www.googhydr.com/ where the page claims to be an independent reseller who has tagged every possible Google search keyword. The page then goes on to try to convince you to click on a link to buy a book on “Exploiting Amazon and Adwords PPC”. I can tell you that the claim on the website is false and a scam.

Googhydr is Amazon itself running its own Adwords campaigns.

It might not seem fair that Amazon, which already dominates the organic search results for just about every product, also bids on Adwords keywords and at the same time won’t allow any of its associates to do the same. Well, as they say, life isn’t fair.

A few related discussions about this topic:

http://forums.prospero.com/n/mb/message.asp?webtag=am-associhelp&msg=32189.1&search=y
http://www.warriorforum.com/main-internet-marketing-discussion-forum/289507-google-amazon-affiliate.html

Htaccess Password Protect Word Press Admin

January 21, 2014 by Charles Hays in Security | Leave a comment

Brute force cracking of WordPress admin logins is rampant because it is very easy. WordPress doesn’t offer much in the way of protecting your blog from such attacks. A brute force attack tries to guess the password, either systematically or from a dictionary of common passwords. One of the easiest ways to deter would-be hackers is to add a second layer of password authentication to the administration area. Using htaccess rules to require a password before getting to the WP admin login will thwart nearly all the cracking bots out there. Just be sure to make the new htaccess user name and password completely different from the ones used by WP.

To set up htaccess password protection for your WordPress admin area, you first need to create a text file called .htpasswd. You can use either the htpasswd command in a Linux shell or the online tool HTPasswd Generator. Once you have created this file, save or upload it to your wp-admin directory. (It needs to have the period at the front of the file name. The leading period hides the file from normal directory listings, and Apache’s stock configuration refuses to serve files whose names start with .ht.)
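
If neither tool is handy, an entry for the .htpasswd file can also be produced from PHP. This is only a minimal sketch: htpasswd_line() is a hypothetical helper name, the user name and password are placeholders, and it assumes Apache 2.4 or later, which accepts bcrypt hashes (the $2y$ format that password_hash() produces).

```php
<?php
// Hypothetical helper: builds one "user:hash" line for a .htpasswd file.
// Assumption: Apache 2.4+ is in use, so bcrypt ($2y$) hashes are accepted.
function htpasswd_line(string $user, string $password): string
{
    // A colon separates user and hash in the file, so it cannot appear in the name
    if ($user === '' || strpos($user, ':') !== false) {
        throw new InvalidArgumentException('User name must be non-empty and contain no colon');
    }
    return $user . ':' . password_hash($password, PASSWORD_BCRYPT);
}

// Placeholder credentials; choose your own, unrelated to the WP login
$line = htpasswd_line('gatekeeper', 'a-long-unrelated-passphrase');
echo $line . "\n";
```

Paste the printed line into .htpasswd, one line per user.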

Next you need to edit or create a file called .htaccess in your wp-admin directory. It should look like this:


ErrorDocument 401 "Access Denied"
ErrorDocument 403 "Access Denied"
AuthName "AuthorizedAccess"
AuthUserFile "/home/your site/www/wp-admin/.htpasswd"
AuthType Basic
require valid-user

In the line that starts with AuthUserFile, adjust the path to wherever your .htpasswd file is located. On most Linux servers the path needs to be absolute, so it has to start at /home or whatever the root of your account is. You can’t just write AuthUserFile .htpasswd, because Apache resolves a relative path against its ServerRoot, not against the directory containing the .htaccess file.
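
If you are not sure what the absolute path is on your host, a throwaway PHP script can tell you. The sketch below is an illustration only: auth_user_file_directive() is a made-up helper name, and it assumes the script is uploaded into the same wp-admin directory as the .htpasswd file.

```php
<?php
// Hypothetical helper: returns the AuthUserFile line for a .htpasswd file
// that sits in the given directory.
function auth_user_file_directive(string $dir): string
{
    return 'AuthUserFile "' . rtrim($dir, '/') . '/.htpasswd"';
}

// __DIR__ is the absolute directory of this script on the server
echo auth_user_file_directive(__DIR__) . "\n";
```

Load the script once in your browser, copy the line it prints into .htaccess, then delete the script so it does not keep exposing your server paths.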

Now when you access your wp-admin page, the browser will first prompt for the user name and password from the .htpasswd file before you can reach the WordPress login.

Automating Google Webmaster Tools Validation (PHP)

January 20, 2014 by Charles Hays in PHP, SEO | Leave a comment

In order to claim a website with Google’s Webmaster Tools, you have to verify that you own it or at least have access to its file system. To do so, Google gives you a file to upload to the domain’s www directory, named something like this: google9bf026a5h34deb40d.html

It used to be that this file was completely empty. However, this probably led to some hacking abuse, because certain websites were misconfigured to return an empty page for missing files instead of a proper 404 (page not found). It doesn’t take a genius to figure out that anyone could register such a website as their own, since the request for the validation file would return exactly what Google was looking for.

The validation file is no longer empty, and Google won’t accept empty files. They must now contain the following:

google-site-verification: google9bf026a5h34deb40d.html

You can see that the name of the file is now part of the content, which helps eliminate validation hijacking.

Suppose you have a lot of friends or accounts on Google that you want to give access to your site through their Webmaster Tools. Updating their individual validation files can be a chore, so here is an easy way to automate the process.

This example is not exact; you will need to adjust it to your own needs.

First add a rewrite rule to your .htaccess file (this assumes mod_rewrite is available and RewriteEngine On is already set).

RewriteRule ^google([a-zA-Z0-9]+)\.html$ googlevalidateme.php?v=$1 [NC,L]

Then create a PHP script called googlevalidateme.php:

<?php
// Token captured by the rewrite rule; keep letters and digits only
$v = isset($_GET['v']) ? preg_replace('/[^a-zA-Z0-9]/', '', $_GET['v']) : '';
echo 'google-site-verification: google'.$v.'.html';

So with this example, any call to googlexxxx.html will pass xxxx as a value to googlevalidateme.php, which will return the correct validation code for Google.

Of course you might want to improve this with a call to a master list someplace so that unwanted others can’t validate and spy on you.
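
One way such a master list could look is sketched below. It is only an illustration: $allowed_tokens and verification_body() are made-up names, the list is hard-coded where a real setup might read it from a file or database, and the single token shown is the one from the example file name above.

```php
<?php
// Hypothetical master list of approved verification tokens, i.e. the part
// between "google" and ".html" in each validation file name.
$allowed_tokens = array('9bf026a5h34deb40d');

// Returns the verification body for an approved token, or null otherwise.
function verification_body($token, array $allowed)
{
    if (!is_string($token) || !in_array($token, $allowed, true)) {
        return null;
    }
    return 'google-site-verification: google' . $token . '.html';
}

$body = verification_body(isset($_GET['v']) ? $_GET['v'] : '', $allowed_tokens);
if ($body === null) {
    http_response_code(404); // unknown token: behave like a missing file
} else {
    echo $body;
}
```

With this in place of the one-line script, only tokens you have added to the list get a valid response, and every other googlexxxx.html request gets a 404.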

Science Jobs for Young Earth Creationists

December 3, 2013 by Charles Hays in Science | Leave a comment

There is a huge demand for qualified and skilled scientists in biochemistry and genetics to help find cures for diseases and viruses. This research over the last 100 years has saved millions if not billions of lives and improved our quality of health and life beyond measure. These fields today require an understanding of real-world biology and genetics, and evolution is an important tool. Evolution requires time to work as well, time for genetic patterns to emerge, mutate and change again. Young Earth Creationists (YEC) hamper the search for truth and actually harm the rest of mankind because they do not do unbiased research (if any real research at all). The fallacy is that they must find data to support their beliefs rather than objectively see what the evidence is saying.

There are no real practical scientific jobs for a YEC who holds such beliefs over evolution.

If you want scientists to help find cures for HIV and cancer they must use the scientific method and tools that work. YEC cannot help cure these or any disease with such beliefs because a “belief” is not a tool to help solve such problems.

This is why the entire debate over teaching Creationism alongside Evolution in science classes is not only idiotic, it is also disingenuous and DANGEROUS! The less our next generation knows about REAL SCIENCE, the greater the chance you or a loved one will die from something that might otherwise have been curable.

Today even the Vatican supports the theory of evolution. Only biblical literalists want to fight the only working tools we have to help create future cures because if evolution is true then the literal word of the bible cannot be.

So when the next pandemic hits, do we want a generation of scientists to help cure it and save us all, or a bunch of believers who feel good that the bible is the infallible word of God?

Auto Post to Posthaven via Email with Images (PHP)

November 5, 2013 by Charles Hays in PHP, Web Bots | Leave a comment

If you were a fan of Posterous before it was bought out by Twitter, you will want to check out Posthaven. It is as of right now still in development but looks very promising, offering a more flexible and simpler blogging solution than other sites. This new blogging service isn’t free, but at $5 a month for 10 subdomain blogs it is extremely affordable. If you want to lock in a wealth of prime real estate on this upcoming service in the way of subdomain keywords, now is the time.

Like the services covered in my previous articles on automating blog posting, Posthaven works the same way and is really easy. However, there is a huge advantage to using Posthaven over the other blog services: your automated posts can also be put right onto a Facebook page and Twitter account at the same time. That is cool! And it leverages your blogs big time.

Start by creating a Posthaven account, then login and create one or more site accounts.

To post to your account via email, click on “Edit Your Account” and go to the section “Post by Email Settings”. Click on the checkbox next to “Use a secret word to verify my emails” and enter a secret word in the field.

You also need to make sure that the email address you will be sending from is listed at the top where it says “Your Email Addresses”.

For example, let’s say your secret is “mysecret” and your site account subdomain is “abc”. Then the email address you send to in order to post a new blog article would be post.mysecret@abc.posthaven.com

The subject of the email will be the article’s title. The body will be the article itself, and if you want to include an image, send it with the email as an attachment. Let’s see some PHP code that will post to your Posthaven blog by email. (We will use PHPMailer.)

include('class.phpmailer.php');

$posthaven_account = "YOUR SUBDOMAIN NAME"; //Example "abc" NOT "abc.posthaven.com"
$posthaven_secret = "YOUR POSTHAVEN SECRET";

$gmail_your_name = "YOUR NAME";
$gmail_username = "YOUR GMAIL USERNAME";
$gmail_password = "YOUR GMAIL PASSWORD";
$gmail_email = "YOUR GMAIL EMAIL ADDRESS";
$image_location = 'C:/YOUR LOCATION OF IMAGE/IMAGE.JPG';
$email_title = "EMAIL TITLE";
$email_body = "EMAIL BODY"; // (LIMITED) HTML OK

$mail = new PHPMailer();
$mail->IsHTML(true);
$mail->IsSMTP();
$mail->SMTPAuth = true;
$mail->SMTPSecure = "ssl";
$mail->Host = "smtp.gmail.com";
$mail->Port = 465;
$mail->Username = $gmail_username;
$mail->Password = $gmail_password;
$fromname = $gmail_your_name;

$posthaven_blog_email = 'post.'.$posthaven_secret.'@'.$posthaven_account.'.posthaven.com';

$To = trim($posthaven_blog_email,"\r\n");

$mail->AddAttachment($image_location);
$mail->From = $gmail_email;
$mail->FromName = $fromname;
$mail->Subject = $email_title;
$mail->Body = $email_body;
$mail->AddAddress($To);
$mail->Priority = 3; // Priority 1 = High, 3 = Normal, 5 = Low
if(!$mail->Send())
	{
	echo 'Mailer error: '.$mail->ErrorInfo; // report why the post was not sent
	}

If you then want to capture the URL of the newly published Posthaven post, you can use the following code:

$posthaven_url = 'http://'.$posthaven_account.'.posthaven.com';
sleep(30); // give it enough time to receive and update the post (30 seconds)
$bf = file_get_contents(rtrim($posthaven_url,'/').'/posts.atom');
list($t,$b1) = explode("<updated>",$bf,2);
list($t,$b2) = explode('href="',$b1,2);
list($b3,$t) = explode('"',$b2,2);
$bb = trim(str_replace('"','',$b3));
$bb = trim(str_replace("'",'',$bb));
$bb = trim(str_replace(' ','',$bb));
echo '<li>LINK IS='.$bb;
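
The string splitting above depends on the exact order of tags in the feed. As an alternative sketch, the same link can be read with PHP’s SimpleXML extension. This assumes the posts.atom feed is standard Atom with the newest post as the first entry; first_entry_link() is a hypothetical helper name and the sample feed is made up for illustration.

```php
<?php
// Hypothetical helper: returns the href of the first link inside the first
// <entry> of an Atom feed, or null if the feed cannot be parsed.
function first_entry_link(string $atom_xml)
{
    $feed = @simplexml_load_string($atom_xml);
    if ($feed === false) {
        return null;
    }
    // Atom elements live in a namespace, so XPath needs a prefix for it
    $feed->registerXPathNamespace('a', 'http://www.w3.org/2005/Atom');
    $links = $feed->xpath('//a:entry[1]/a:link/@href');
    return empty($links) ? null : (string) $links[0];
}

// Made-up sample feed; in practice pass in the file_get_contents() result
$sample = '<?xml version="1.0"?>'
    . '<feed xmlns="http://www.w3.org/2005/Atom">'
    . '<updated>2013-11-05T00:00:00Z</updated>'
    . '<entry><title>Hello</title>'
    . '<link rel="alternate" href="http://abc.posthaven.com/hello"/></entry>'
    . '</feed>';
echo first_entry_link($sample) . "\n";
```

Because the helper returns null on a failed fetch or a malformed feed, the calling code can retry after another sleep() instead of printing an empty link.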

Block All Internet Traffic from China/Russia/Nigeria on your Linux Server

November 5, 2013 by Charles Hays in Security | Leave a comment

Every server connected to the internet is constantly being attacked with brute force login attempts, software exploits, email spam and more. It is the dirty laundry that everyone in IT security, or anyone who manages their own website or server, knows about. With the extent of dark nets, botnets and abused proxies, this activity runs amok and is pretty much unstoppable. The only thing we can really do is make sure our software is up to date and our passwords are strong.
Just the other day one of my reseller hosting servers located in Germany was terminated and another at Hostgator was suspended. I was told that my WordPress sites were using too much CPU on the server. The log snapshot sent by Hostgator indicated that all of the usage came from the wp-admin.php script. Was this not obvious to them? Someone was trying to brute force open the WordPress admin. After I informed Hostgator that this was not my fault, unless they didn’t think I should be using the most popular blog software, they were quick to start blocking the incoming IPs. The German company (who I won’t name) said this was beyond their capabilities and that their policy was to take down any website that gets attacked…WTF? Yeah, I will be ditching them next week; any policy like that, which penalizes the website owner for an attack rather than simply blocking the attacking IPs, is bullshit.
The German company told me, as did Hostgator, that the attacks were all coming from China and the Ukraine. On my own managed dedicated boxes I have blocked these countries completely, along with other countries that have originated some scams and abuse, such as Nigeria.
If you manage a Linux server this is really easy. Here is how you can block nearly all traffic from specific countries from coming into your website.
First, get and install Advanced Policy Firewall (APF): https://www.rfxn.com/projects/advanced-policy-firewall/
Once you have that installed and configured properly according to the documentation, log in to your shell, find the apf folder (usually at /etc/apf) and edit the file deny_hosts.rules.
Go to wizcrafts.net and find the APF IP lists for the desired countries. Here are some quick links:
Nigeria: http://www.wizcrafts.net/nigerian-iptables-blocklist.html
China: http://www.wizcrafts.net/chinese-iptables-blocklist.html
Russia: http://www.wizcrafts.net/russian-iptables-blocklist.html
South America: http://www.wizcrafts.net/lacnic-iptables-blocklist.html
Other Exploited Networks: http://www.wizcrafts.net/exploited-servers-iptables-blocklist.html

Copy and paste these lists into deny_hosts.rules and then save it.
Restart APF by running apf -r as root.
That’s it.
If you find other IPs in your logs that you want to block, you can just edit this file, add those IP addresses to the list and restart APF.
These lists of IPs change regularly, so you may want to go back once a month and update the file.
If the business your server supports has nothing to do with these countries, there is no real reason not to block them using this or another method.
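
For the monthly refresh, the copy and paste step can be partly scripted. The sketch below is an assumption-heavy illustration: extract_ipv4_entries() is a made-up helper that pulls IPv4 addresses and CIDR ranges out of whatever text you paste in, and it assumes deny_hosts.rules takes one address or range per line. The addresses in the example are placeholders from a documentation range.

```php
<?php
// Hypothetical helper: extracts IPv4 addresses and CIDR ranges from pasted
// text and returns them de-duplicated, in first-seen order.
function extract_ipv4_entries(string $text): array
{
    preg_match_all('/\b(?:\d{1,3}\.){3}\d{1,3}(?:\/\d{1,2})?\b/', $text, $m);
    $out = array();
    foreach ($m[0] as $entry) {
        $parts = explode('/', $entry);
        // Reject impossible octets (e.g. 999.1.1.1) and prefixes above /32
        $ip_ok = filter_var($parts[0], FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false;
        $prefix_ok = !isset($parts[1]) || (int) $parts[1] <= 32;
        if ($ip_ok && $prefix_ok) {
            $out[$entry] = true; // array keys drop duplicates
        }
    }
    return array_keys($out);
}

// Placeholder input; in practice this would be the pasted blocklist text
$pasted = "# sample list\n203.0.113.0/24\n999.1.1.1\n203.0.113.7\n203.0.113.0/24\n";
echo implode("\n", extract_ipv4_entries($pasted)) . "\n";
```

The printed lines can then be written into /etc/apf/deny_hosts.rules before restarting APF. Review the output first, since a bad entry in that file can lock out legitimate visitors.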

If you have an IP address and you’re not sure what country it originates from, use http://www.infosniper.net/ to look it up.
