Multiple group-by with one common variable with pandas?
I want to mark duplicate values within an id group. For example,
```
id  A   B
i1  a1  b1
i1  a1  b2
i1  a2  b2
i2  a1  b2
```

should become
```
id  A   B   An  Bn
i1  a1  b1  2   1
i1  a1  b2  2   2
i1  a2  b2  1   2
i2  a1  b2  1   1
```

where An and Bn count the multiplicity of A and B within each id group. How can I do this in pandas? I've found a way using groupby, but it was quite messy to keep everything together. I also tried individual groupbys on (id, A) and (id, B). Maybe there is a way to group by id first and then handle all the other variables within each id group? (There are many variables and I have lots of rows!)
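For reference, a minimal sketch of the example input above as a DataFrame (column names id, A, B taken from the tables):

```python
import pandas as pd

# the example input from the tables above
df = pd.DataFrame({
    'id': ['i1', 'i1', 'i1', 'i2'],
    'A':  ['a1', 'a1', 'a2', 'a1'],
    'B':  ['b1', 'b2', 'b2', 'b2'],
})
```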
I think this is a straightforward way of solving it. As you suggest, you can group by each pair and then calculate the size of the group, and use transform so you can easily add the results back to the original dataframe:
```
import numpy as np

df['An'] = df.groupby(['id', 'A'])['A'].transform(np.size)
df['Bn'] = df.groupby(['id', 'B'])['B'].transform(np.size)

   id   A   B  An  Bn
0  i1  a1  b1   2   1
1  i1  a1  b2   2   2
2  i1  a2  b2   1   2
3  i2  a1  b2   1   1
```

Of course, if you have many columns you can loop over them:

```
for col in ['A', 'B']:
    df[col + 'n'] = df.groupby(['id', col])[col].transform(np.size)
```

**Edit**: improving performance for large data. I ran this on a larger dataset (4 million rows) and transform was slow. (The duplicated method can also be used to do something related, but it only flags duplicates within a group rather than counting them.) If you want to avoid transform, you can do something similar with groupby().size() and the index; it is a bit less elegant, but it is quite fast:

```
for col in ['A', 'B']:
    x = df.groupby(['id', col]).size()
    df.set_index(['id', col], inplace=True)
    df[col + 'n'] = x
    df.reset_index(inplace=True)
```
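As a side note beyond the original answer: groupby.transform also accepts the string 'size', which uses pandas' built-in grouped size computation and avoids the numpy import. A sketch of that variant, wrapped in a small helper (the function name add_group_counts is just for illustration):

```python
import pandas as pd

def add_group_counts(df, id_col='id', cols=('A', 'B')):
    """Add a <col>n column counting how often each value repeats within an id group."""
    for col in cols:
        # 'size' dispatches to the built-in grouped size aggregation
        df[col + 'n'] = df.groupby([id_col, col])[col].transform('size')
    return df

df = pd.DataFrame({
    'id': ['i1', 'i1', 'i1', 'i2'],
    'A':  ['a1', 'a1', 'a2', 'a1'],
    'B':  ['b1', 'b2', 'b2', 'b2'],
})
print(add_group_counts(df))
#    id   A   B  An  Bn
# 0  i1  a1  b1   2   1
# 1  i1  a1  b2   2   2
# 2  i1  a2  b2   1   2
# 3  i2  a1  b2   1   1
```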