machine learning - How to quantify these features so they can be analysed using Logistic Regression?

May 15, 2012

I have a short question that has been bothering me for a while. I have a dataset with some interesting features, but some of them are dimensionless quantities, and standardizing them (I have tried z-scores) has only made things worse:

- Timestamp (in a format like YMMDDHMMSMSMI): only the last 9 characters are used.
- User IDs (in a hash-like form): how do I interpret them?
- IP address (you know what those look like): I only remove the first 3 characters.
- City (an ID such as 1, 15, 72): how should I interpret it?
- Region (similar to the city): should I take its mean, or just leave it as is?

The rest of the features are easy to understand: prices, width, and height. Any help or insight would be greatly appreciated, thanks.
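For continuous features such as prices, width, and height, the z-score mentioned in the question rescales each value as (x - mean) / standard deviation. A minimal sketch using only the standard library; the function name `z_scores` and the sample prices are illustrative and not taken from the question's dataset:

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize numbers to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical example prices, not from the question.
prices = [10.0, 20.0, 30.0]
print(z_scores(prices))
```

Standardization is meant for quantities where distances between values are meaningful; applying it to ID-like columns such as city codes does not give those numbers any meaning.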


If the user / city / region is a nominal value, it must be encoded in some way. The most common method is to create as many "dummy" dimensions as there are possible values: if you have 100 cities, you create 100 dimensions, with a "1" in the dimension representing a particular city and 0 in all the others.
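The dummy encoding described above (often called one-hot encoding) can be sketched in plain Python as follows. The helper name `one_hot` and the sample city IDs are illustrative assumptions, not part of the original answer:

```python
def one_hot(values):
    """Dummy-encode a nominal column: one 0/1 dimension per distinct value."""
    # Fix a stable order of categories so each one always maps to the same dimension.
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one "1", at the position of this value
        rows.append(row)
    return categories, rows

# City IDs like those in the question; the numbers are labels, not quantities.
city_ids = [1, 15, 72, 15]
categories, encoded = one_hot(city_ids)
print(categories)
for row in encoded:
    print(row)
```

This sketch raises `KeyError` for a value it has not seen; in practice, `pandas.get_dummies` or scikit-learn's `OneHotEncoder` perform the same encoding with options for handling such cases.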

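The IP address should either be removed or reduced to a small number of groups (for example, based on the network it belongs to, as identified via DNS, or on higher-level groupings), which are then treated as nominal values and dummy-encoded.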
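One simple way to reduce IP addresses to a small number of groups is to keep only the network prefix; this is a swapped-in technique, not the DNS-based grouping the answer mentions. A sketch with Python's standard `ipaddress` module, for IPv4 addresses; the /24 prefix length and the helper name `ip_to_network` are hypothetical choices:

```python
import ipaddress

def ip_to_network(ip, prefix_len=24):
    """Map an IPv4 address to the network block it belongs to (default: /24)."""
    # strict=False lets us pass a host address and get its enclosing network.
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))

# Hypothetical addresses from the documentation ranges, not from the question.
ips = ["192.0.2.10", "192.0.2.77", "198.51.100.5"]
groups = [ip_to_network(ip) for ip in ips]
print(groups)
```

The resulting network labels are nominal values, so they can be dummy-encoded in the same way as the cities.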
