python - Using dynamic lists in query in Pandas

- June 15, 2012

For example, say I have several columns that encode different types of rates ("Annual Rate", "1/2 Annual Rate", etc.). I want to use `query` on my DataFrame to find the rows where any of these rates is above 1.

First I find the columns that I want to use in my query and collect their names in a list, `cols`.
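The question doesn't show how `cols` is built. A minimal sketch, assuming (the question does not say this) that the rate columns are exactly those whose names contain "Rate", and using a small made-up DataFrame with a hypothetical `Account` column that should be left out:

```python
import pandas as pd

# Made-up data: three rate columns plus one hypothetical non-rate column.
df = pd.DataFrame({
    "Account": ["a", "b"],
    "Annual Rate": [0.5, 1.2],
    "1/2 Annual Rate": [0.3, 0.9],
    "Monthly Rate": [1.5, 0.1],
})

# Assumption: every column whose name contains "Rate" is a rate column.
cols = [c for c in df.columns if "Rate" in c]
print(cols)  # ['Annual Rate', '1/2 Annual Rate', 'Monthly Rate']
```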

Then I want to do something like this:

    df.query('any of my columns > 1')


How can I format this for `query`?

`query` performs a full parse of a Python expression (with some limitations; for example, you can't use lambda expressions or ternary if/else expressions). This means that any column you refer to in your query string must be a valid Python identifier (a more formal term for "variable name"). One way to check this is to use the `Name` pattern lurking in the `tokenize` module:

    In [156]: tokenize.Name
    Out[156]: '[a-zA-Z_]\\w*'

    In [157]: def isidentifier(x):
       .....:     return re.match(tokenize.Name, x) is not None
       .....:

    In [158]: isidentifier('adsf')
    Out[158]: True

    In [159]: isidentifier('1adsf')
    Out[159]: False
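In Python 3 a whole-string check is built in as `str.isidentifier()`. This is worth knowing because `re.match` only anchors at the start of the string, so the pattern-based check above accepts `'Annual Rate'` on the strength of its first word; also, in Python 3 `tokenize.Name` is the looser `r'\w+'`, which no longer rejects a leading digit. A rough sketch; `is_query_name` is a hypothetical helper, and the extra keyword check reflects the fact that a name like `and` is an identifier but can't be used as a bare column name:

```python
import keyword

def is_query_name(name: str) -> bool:
    """Hypothetical helper: roughly, can `name` appear bare in a query string?"""
    # str.isidentifier() tests the whole string; keywords are excluded separately.
    return name.isidentifier() and not keyword.iskeyword(name)

for name in ["adsf", "1adsf", "Annual Rate", "annual_rate", "and"]:
    print(repr(name), is_query_name(name))
```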

Now, since your column names contain spaces, each space-separated word will be evaluated as a separate identifier, so you'll end up with something like

    df.query("Annual Rate > 1")

which is invalid Python syntax. Try typing `Annual Rate` into a Python interpreter and you'll get a `SyntaxError` exception.
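The same failure can be reproduced without pandas by asking Python's own compiler to parse each expression in `eval` mode. This only illustrates the syntax problem; pandas runs its own parsing layer on top of Python's, so the exact error it reports may differ. `parses` is a hypothetical helper:

```python
def parses(expr: str) -> bool:
    """Return True if `expr` is a syntactically valid Python expression."""
    try:
        compile(expr, "<query>", "eval")
    except SyntaxError:
        return False
    return True

print(parses("Annual Rate > 1"))  # False: two bare names side by side
print(parses("annual_rate > 1"))  # True
```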

Take-home message: rename your columns so that they are valid variable names. You won't be able to do this programmatically (at least, not easily) unless your column names follow some kind of structure. In your case you could do:

    In [166]: cols
    Out[166]: ['Annual Rate', '1/2 Annual Rate', 'Monthly Rate']

    In [167]: list(map(lambda x: x.replace(' ', '_').replace('1/2', 'half').lower(), cols))
    Out[167]: ['annual_rate', 'half_annual_rate', 'monthly_rate']
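The same renaming can be applied to the DataFrame itself with `DataFrame.rename`, which accepts a function for `columns=`. A sketch with made-up values; `to_identifier` is a hypothetical helper whose rules only cover the three column names in this question:

```python
import pandas as pd

def to_identifier(name: str) -> str:
    """Hypothetical helper: same replace/lower rules as the map/lambda above."""
    return name.replace(" ", "_").replace("1/2", "half").lower()

df = pd.DataFrame(
    [[0.5, 0.3, 1.5], [1.2, 0.9, 0.1]],
    columns=["Annual Rate", "1/2 Annual Rate", "Monthly Rate"],
)

# rename() calls the function once per column label.
df = df.rename(columns=to_identifier)
print(list(df.columns))  # ['annual_rate', 'half_annual_rate', 'monthly_rate']
```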

You can then build the query string from the new names, similar to another answerer's example:

    In [173]: newcols
    Out[173]: ['annual_rate', 'half_annual_rate', 'monthly_rate']

    In [174]: ' or '.join('%s > 1' % c for c in newcols)
    Out[174]: 'annual_rate > 1 or half_annual_rate > 1 or monthly_rate > 1'
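Putting the pieces together, a minimal sketch that builds the expression from a list of already-renamed columns and hands it to `DataFrame.query`. The data is made up; `query` accepts `or` as well as `|`:

```python
import pandas as pd

# Made-up data that already uses identifier-safe column names.
df = pd.DataFrame({
    "annual_rate": [0.5, 1.2, 0.1],
    "half_annual_rate": [0.3, 0.9, 0.2],
    "monthly_rate": [1.5, 0.1, 0.4],
})
newcols = ["annual_rate", "half_annual_rate", "monthly_rate"]

# Build "annual_rate > 1 or half_annual_rate > 1 or monthly_rate > 1".
expr = " or ".join(f"{c} > 1" for c in newcols)
result = df.query(expr)

print(expr)
print(result.index.tolist())  # [0, 1]
```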

Note: you don't actually need to use `query` here:

    In [180]: df = pd.DataFrame(np.random.randn(10, 3), columns=cols)

    In [181]: df
    Out[181]:
       Annual Rate  1/2 Annual Rate  Monthly Rate
    0      -0.6980           0.6322        2.5695
    1      -0.1413          -0.3285       -0.9856
    2       0.8189           0.7166       -1.4302
    3       1.3300          -0.9596       -0.8934
    4      -1.7545          -0.9635        2.8515
    5      -1.1389           0.1055        0.5423
    6       0.2788          -1.3973       -0.9777
    7      -1.8570           1.3781        0.0501
    8      -0.6842          -0.2012       -0.5083
    9      -0.3270          -1.5280        0.2251

    [10 rows x 3 columns]

    In [182]: df.gt(1).any(axis=1)
    Out[182]:
    0     True
    1    False
    2    False
    3     True
    4     True
    5    False
    6    False
    7     True
    8    False
    9    False
    dtype: bool

    In [183]: df[df.gt(1).any(axis=1)]
    Out[183]:
       Annual Rate  1/2 Annual Rate  Monthly Rate
    0      -0.6980           0.6322        2.5695
    3       1.3300          -0.9596       -0.8934
    4      -1.7545          -0.9635        2.8515
    7      -1.8570           1.3781        0.0501

    [4 rows x 3 columns]
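A self-contained version of the same idea, hard-coding the numbers from the session above so the result is reproducible. The `Account` column is a hypothetical addition to show why selecting `df[cols]` first matters when the frame has other columns: only the rate columns should count, and comparing a column of strings against `1` with `gt` would typically raise a `TypeError`.

```python
import pandas as pd

cols = ["Annual Rate", "1/2 Annual Rate", "Monthly Rate"]
df = pd.DataFrame(
    [
        [-0.6980, 0.6322, 2.5695],
        [-0.1413, -0.3285, -0.9856],
        [0.8189, 0.7166, -1.4302],
        [1.3300, -0.9596, -0.8934],
        [-1.7545, -0.9635, 2.8515],
        [-1.1389, 0.1055, 0.5423],
        [0.2788, -1.3973, -0.9777],
        [-1.8570, 1.3781, 0.0501],
        [-0.6842, -0.2012, -0.5083],
        [-0.3270, -1.5280, 0.2251],
    ],
    columns=cols,
)
df["Account"] = list("abcdefghij")  # hypothetical non-rate column

# Compare only the rate columns, then keep rows where any of them exceeds 1.
mask = df[cols].gt(1).any(axis=1)
result = df[mask]
print(result.index.tolist())  # [0, 3, 4, 7]
```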

As Jeff noted in the comments, you can refer to non-identifier column names, albeit in a clunky way:

    pd.eval('df[df["Annual Rate"] > 1]')

If you want to save the lives of kittens, don't write code like this.
