Multiple group-by with one common variable with pandas?


I want to mark duplicate values within an id group. For example,

    id   A   B
    i1  a1  b1
    i1  a1  b2
    i1  a2  b2
    i2  a1  b2

Should be

    id   A   B  an  bn
    i1  a1  b1   2   1
    i1  a1  b2   2   2
    i1  a2  b2   1   2
    i2  a1  b2   1   1

where an and bn count the multiplicity of A and B, respectively, within each id group. How can I do this in pandas? I found groupby, but it was quite messy to keep everything together. I also tried individual groupbys for id, A and id, B. Maybe there is a way to pre-group by id first and then work with all the other variables? (There are many variables, and I have lots of lines!)


I think there is a straightforward way of solving it. As you suggest, you can group by each pair of columns and calculate the size of the group, and use transform so you can easily add the results back to the original dataframe:

    df['an'] = df.groupby(['id', 'A'])['A'].transform(np.size)
    df['bn'] = df.groupby(['id', 'B'])['B'].transform(np.size)
    print(df)

       id   A   B  an  bn
    0  i1  a1  b1   2   1
    1  i1  a1  b2   2   2
    2  i1  a2  b2   1   2
    3  i2  a1  b2   1   1
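For reference, here is the transform approach as a complete, runnable sketch. It reconstructs the question's example frame; recent pandas also accepts the string 'size' in place of np.size, which is used here to avoid the numpy import:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    'id': ['i1', 'i1', 'i1', 'i2'],
    'A':  ['a1', 'a1', 'a2', 'a1'],
    'B':  ['b1', 'b2', 'b2', 'b2'],
})

# Broadcast each (id, A) and (id, B) group size back onto its rows
df['an'] = df.groupby(['id', 'A'])['A'].transform('size')
df['bn'] = df.groupby(['id', 'B'])['B'].transform('size')

print(df)
#   id   A   B  an  bn
# 0 i1  a1  b1   2   1
# 1 i1  a1  b2   2   2
# 2 i1  a2  b2   1   2
# 3 i2  a1  b2   1   1
```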

Of course, if you have many columns, you can loop over them:

    for col in ['A', 'B']:
        df[col + 'n'] = df.groupby(['id', col])[col].transform(np.size)

You can also use the duplicated method to do something similar, but it will mark only the later repeats within each group (the first occurrence stays False):

    for col in ['A', 'B']:
        df[col + 'n'] = df.groupby('id')[col].transform(lambda x: x.duplicated())
    print(df)

       id   A   B     An     Bn
    0  i1  a1  b1  False  False
    1  i1  a1  b2   True  False
    2  i1  a2  b2  False   True
    3  i2  a1  b2  False  False
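As an aside, the same duplicate flags can be computed without a group-wise call, because duplication within an id group is exactly duplication over the (id, col) key pair. A minimal sketch, assuming the column names from the question (the `_dup` suffix is an arbitrary choice here):

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['i1', 'i1', 'i1', 'i2'],
    'A':  ['a1', 'a1', 'a2', 'a1'],
    'B':  ['b1', 'b2', 'b2', 'b2'],
})

# A row is a within-id duplicate exactly when its (id, col) pair
# has already appeared on an earlier row
for col in ['A', 'B']:
    df[col + '_dup'] = df.duplicated(subset=['id', col])

print(df)
```

This avoids the groupby entirely, which can matter when there are many id groups.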

Edit: improving performance for large data. I have run this on a large dataset (4 million rows), and it is quite a bit faster if I avoid transform, although it is less elegant:

    for col in ['A', 'B']:
        x = df.groupby(['id', col]).size()
        df.set_index(['id', col], inplace=True)
        df[col + 'n'] = x
        df.reset_index(inplace=True)
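Another way to avoid transform on large frames, offered here as an untested alternative rather than the answer's own method, is to compute the group sizes once and merge them back on the key columns (a left merge keeps the row order of the left frame):

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['i1', 'i1', 'i1', 'i2'],
    'A':  ['a1', 'a1', 'a2', 'a1'],
    'B':  ['b1', 'b2', 'b2', 'b2'],
})

# Compute each (id, col) group size once, then join the counts
# back onto the original rows by key
for col in ['A', 'B']:
    counts = df.groupby(['id', col]).size().rename(col + 'n').reset_index()
    df = df.merge(counts, on=['id', col], how='left')

print(df)
```

This sidesteps the set_index/reset_index round trip at the cost of building a small counts frame per column.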

