machine learning - How to quantify these features so they can be analysed using Logistic Regression?


I have a short question that has been bothering me for a while. I have a dataset with interesting features, but some of them are dimensionless quantities (I tried using z-scores on them, but that only made things worse):

  • Timestamp (like YMMDDHMMSMSMI): I keep only the last 9 characters.
  • User ID (in hashed form): how do I interpret it?
  • IP address (you know what these look like): I only remove the first 3 characters.
  • City (an ID such as 1, 15, 72): how can I interpret it?
  • Region (an ID, similar to city): should I take the mean, or just leave it as it is?

The remaining features, which I do understand, are prices, width and height. Any help or insight would be greatly appreciated, thanks.


  • The user, city and region are nominal values, so they have to be encoded in some way. The most common method is to create as many "dummy" dimensions as there are possible values: if you have 100 cities, you create 100 dimensions and put a "1" only in the one representing the particular city (and 0 in all the others).
  • The IP address should either be dropped or reduced to a small set of groups (based on DNS/network identity), which are then nominal values and dummy-encoded as above.
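The two points above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the sample city IDs and IP addresses are made up, the helper names `one_hot` and `ip_group` are hypothetical, and grouping addresses by their /16 network is just one possible stand-in for "network identity", not necessarily what the answer had in mind.

```python
import ipaddress


def one_hot(values):
    """Dummy-encode nominal values: one 0/1 column per distinct value."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one "1" per row
        rows.append(row)
    return categories, rows


def ip_group(ip, prefix_len=16):
    """Map an IPv4 address to its enclosing network (assumed grouping: /16)."""
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(network)


# Hypothetical sample data, not taken from the question's dataset.
city_ids = [1, 15, 72, 15]
ips = ["192.168.1.10", "192.168.77.3", "10.0.5.9", "10.0.200.1"]

# City IDs are labels, not magnitudes, so each ID gets its own column.
city_categories, city_rows = one_hot(city_ids)

# Reduce each IP to a coarse group first, then dummy-encode the groups.
ip_groups = [ip_group(ip) for ip in ips]
group_categories, group_rows = one_hot(ip_groups)

# One feature row per sample: city dummies followed by IP-group dummies.
features = [city + group for city, group in zip(city_rows, group_rows)]
```

In practice a library routine such as pandas `get_dummies` or scikit-learn's `OneHotEncoder` would normally do the encoding. With an intercept in the logistic regression, it is also common to drop one dummy column per feature to avoid perfectly collinear inputs.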
