python - Using dynamic lists in query in Pandas
For example, say I have several columns that encode different types of rates ("annual rate", "1/2 annual rate", etc.). I want to use query on my DataFrame to find entries where any of these rates is above 1. First I collect the names of the rate columns in a list.
Then I want to do something like this:
df.query('any of my columns > 1')
How can I format this so that it works with query?
query parses its argument as a full Python expression (with some limitations: for example, you cannot use lambda expressions or ternary if/else expressions). This means that any column you refer to in your query string must be a valid Python identifier (a more formal word for "variable name"). One way to check this is to use the Name pattern lurking in the tokenize module:

In [156]: tokenize.Name
Out[156]: '[a-zA-Z_]\\w*'

In [157]: def isidentifier(x):
   .....:     return re.match(tokenize.Name, x) is not None
   .....:

In [158]: isidentifier('adsf')
Out[158]: True

In [159]: isidentifier('1adsf')
Out[159]: False
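On Python 3 the same check can be done without a regex, since strings have a built-in isidentifier method (and, as far as I can tell, the Name pattern in Python 3's tokenize is looser than the one shown above). A minimal sketch, assuming Python 3; the helper name usable_in_query and the extra 'annual_rate' entry are made up for illustration:

```python
import keyword

def usable_in_query(name):
    """Return True if `name` could appear bare in a query string."""
    # Keywords such as 'class' pass str.isidentifier() but cannot be
    # used as variable names, so they are rejected separately.
    return name.isidentifier() and not keyword.iskeyword(name)

# Column names from the question, plus one hypothetical renamed column.
cols = ['annual rate', '1/2 annual rate', 'monthly rate', 'annual_rate']
checks = {c: usable_in_query(c) for c in cols}
```

Only the last entry passes, which is the reason the original column names cannot be dropped into a query string as they are.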
Now, because your column names contain spaces, each word separated by spaces will be evaluated as a separate identifier, so you will end up with something like
df.query("annual rate > 1")
which is invalid Python syntax. Try typing annual rate into a Python interpreter and you will get a SyntaxError exception.
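The syntax problem can be reproduced without pandas at all. The sketch below uses the standard ast module to test whether a string parses as a Python expression; the helper parses_as_expression is a hypothetical name, and this only approximates what query does, since pandas preprocesses the string before parsing it:

```python
import ast

def parses_as_expression(text):
    """Return True if `text` is a syntactically valid Python expression."""
    try:
        ast.parse(text, mode="eval")
    except SyntaxError:
        return False
    return True

# Two bare words in a row are not valid syntax.
with_space = parses_as_expression("annual rate > 1")
# A single identifier compared with a number is fine.
with_underscore = parses_as_expression("annual_rate > 1")
```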
Take-home message: rename your columns to be valid variable names. You won't be able to do this programmatically (at least, not easily) unless your columns follow some kind of structure. In your case, you could do:

In [166]: cols
Out[166]: ['annual rate', '1/2 annual rate', 'monthly rate']

In [167]: list(map(lambda x: '_'.join(x.split()).replace('1/2', 'half'), cols))
Out[167]: ['annual_rate', 'half_annual_rate', 'monthly_rate']
You can then build the query string, as in another user's example:

In [173]: newcols
Out[173]: ['annual_rate', 'half_annual_rate', 'monthly_rate']

In [174]: ' or '.join('%s > 1' % c for c in newcols)
Out[174]: 'annual_rate > 1 or half_annual_rate > 1 or monthly_rate > 1'
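Putting the two steps together, a rough end-to-end sketch could look like the following. The small DataFrame and the helper to_identifier are invented for illustration; the renaming rule is the one used above and only fits column names shaped like these.

```python
import pandas as pd

# Hypothetical data with the column names from the question.
df = pd.DataFrame(
    {
        "annual rate": [0.5, 1.2, -0.3],
        "1/2 annual rate": [0.1, 0.2, 0.9],
        "monthly rate": [2.0, 0.0, 0.4],
    }
)

def to_identifier(name):
    # Join the words with underscores and spell out the fraction.
    return "_".join(name.split()).replace("1/2", "half")

# rename accepts a function that is applied to every column label.
renamed = df.rename(columns=to_identifier)

# One "<column> > 1" clause per column, combined with "or".
expr = " or ".join(f"{c} > 1" for c in renamed.columns)
result = renamed.query(expr)
```

With this data, result keeps the first two rows, since each has at least one rate above 1.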
Note: You don't actually need to use query here:

In [180]: df = DataFrame(randn(10, 3), columns=cols)

In [181]: df
Out[181]:
   annual rate  1/2 annual rate  monthly rate
0      -0.6980           0.6322        2.5695
1      -0.1413          -0.3285       -0.9856
2       0.8189           0.7166       -1.4302
3       1.3300          -0.9596       -0.8934
4      -1.7545          -0.9635        2.8515
5      -1.1389           0.1055        0.5423
6       0.2788          -1.3973        -0.977
7      -1.8570           1.3781        0.0501
8      -0.6842          -0.2012       -0.5083
9      -0.3270          -1.5280        0.2251

[10 rows x 3 columns]

In [182]: df.gt(1).any(axis=1)
Out[182]:
0     True
1    False
2    False
3     True
4     True
5    False
6    False
7     True
8    False
9    False
dtype: bool

In [183]: df[df.gt(1).any(axis=1)]
Out[183]:
   annual rate  1/2 annual rate  monthly rate
0      -0.6980           0.6322        2.5695
3       1.3300          -0.9596       -0.8934
4      -1.7545          -0.9635        2.8515
7      -1.8570           1.3781        0.0501

[4 rows x 3 columns]
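When the frame also holds columns that are not rates, the same idea works on a dynamically built column list. A minimal sketch with made-up data; the extra "balance" column and the "rate" substring test are assumptions for illustration. The axis is passed by keyword because, as far as I know, recent pandas releases no longer accept it positionally in any.

```python
import pandas as pd

# Hypothetical data: three rate columns plus one unrelated column.
df = pd.DataFrame(
    {
        "annual rate": [0.5, 1.2, -0.3],
        "1/2 annual rate": [0.1, 0.2, 0.9],
        "monthly rate": [2.0, 0.0, 0.4],
        "balance": [100.0, 250.0, 75.0],
    }
)

# Build the column list dynamically from the column names.
rate_cols = [c for c in df.columns if "rate" in c]

# True for each row where at least one rate column exceeds 1.
mask = df[rate_cols].gt(1).any(axis=1)
result = df[mask]
```

Here the large values in "balance" do not affect the mask, because only the columns in rate_cols are compared.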
As Jeff noted in the comments, you can refer to non-identifier column names, albeit in a clunky way:
pd.eval('df[df["annual rate"] > 1]')
I wouldn't recommend writing code like this if you want to save the lives of kittens.
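Later pandas releases added a less clunky route: query accepts column names wrapped in backticks. To my knowledge this arrived in pandas 0.25 for names with spaces, and names containing characters such as "/" need a newer release, so treat the sketch below as assuming a reasonably recent pandas. The data is made up for illustration.

```python
import pandas as pd

# Hypothetical data with the original, non-identifier column names.
df = pd.DataFrame(
    {
        "annual rate": [0.5, 1.2, -0.3],
        "1/2 annual rate": [0.1, 0.2, 0.9],
        "monthly rate": [2.0, 0.0, 0.4],
    }
)

cols = ["annual rate", "1/2 annual rate", "monthly rate"]

# Wrap each name in backticks so query treats it as a single column.
expr = " or ".join(f"`{c}` > 1" for c in cols)
result = df.query(expr)
```

This keeps the original column names intact, at the cost of depending on the pandas version.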