Multiple group-by with one common variable with pandas?
I want to mark duplicate values within an id group. For example,
    id  A   B
    i1  a1  b1
    i1  a1  b2
    i1  a2  b2
    i2  a1  b2
should become

    id  A   B   An  Bn
    i1  a1  b1  2   1
    i1  a1  b2  2   2
    i1  a2  b2  1   2
    i2  a1  b2  1   1

where An and Bn count the multiplicity of A and B within each id group.
How can I do this in pandas? I've found groupby, but it was quite messy to keep everything together. I also tried individual group-bys for (id, A) and (id, B). Maybe there is a way of grouping by id first and then working with all the other variables within each id group? (There are many variables and I have a lot of lines!)
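For concreteness, here is a minimal sketch that reproduces the example frame above (the column names simply mirror the tables; the usual pandas import is assumed):

    import pandas as pd

    # The example data from the question; duplicates of A and B exist within each id group.
    df = pd.DataFrame({
        'id': ['i1', 'i1', 'i1', 'i2'],
        'A':  ['a1', 'a1', 'a2', 'a1'],
        'B':  ['b1', 'b2', 'b2', 'b2'],
    })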
I think there is a straightforward way of solving it: as you suggest, you can do the groupby for each pair and then calculate the size of each group, and use transform so you can easily add the results back to the original dataframe:
    import numpy as np

    df['An'] = df.groupby(['id', 'A'])['A'].transform(np.size)
    df['Bn'] = df.groupby(['id', 'B'])['B'].transform(np.size)

which gives

       id   A   B  An  Bn
    0  i1  a1  b1   2   1
    1  i1  a1  b2   2   2
    2  i1  a2  b2   1   2
    3  i2  a1  b2   1   1
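As a side note (not part of the original answer), recent pandas versions also accept the aggregation name as a string, which avoids reaching for numpy; a sketch of the same two lines:

    # Same counts as above, using the string aggregation name instead of np.size.
    df['An'] = df.groupby(['id', 'A'])['A'].transform('size')
    df['Bn'] = df.groupby(['id', 'B'])['B'].transform('size')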
Of course, if there are too many columns to do this by hand, you can loop:

    for col in ['A', 'B']:
        df[col + 'n'] = df.groupby(['id', col])[col].transform(np.size)
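If there are really many columns, one possible sketch (my addition, not from the original answer) builds all the count columns in a single assign call; the value_cols list here is illustrative:

    # Build every '<col>n' count column in one pass; in practice value_cols
    # would be every column except 'id'.
    value_cols = ['A', 'B']
    df = df.assign(**{c + 'n': df.groupby(['id', c])[c].transform('size')
                      for c in value_cols})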
Edit: improving performance for large data. I did this on a large dataset (4 million lines). If you only want to flag duplicates, the duplicated method can also be used to do something similar, but it only marks the occurrences after the first one within each group. Replacing the per-column transform call with a plain size() and assigning the result back through the index is quite fast (though much less elegant):

    for col in ['A', 'B']:
        x = df.groupby(['id', col]).size()
        df.set_index(['id', col], inplace=True)
        df[col + 'n'] = x
        df.reset_index(inplace=True)
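For reference, a minimal sketch of the duplicated variant mentioned above (the A_dup column name is just illustrative); unlike the counts, it only flags repeats:

    # Flags as True every row whose (id, A) pair has already appeared earlier.
    df['A_dup'] = df.duplicated(subset=['id', 'A'], keep='first')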