I wrote this code and it takes too long to run. I was advised to vectorize the operation, but so far I have only found examples that involve multiplication. Here is my code:

my_dict = {}
for i in list(df.index):
    my_dict[i] = myClass(df.loc[i, 'name'])
    my_dict[i].class_method({'col1': df.loc[i, 'col1']})
    my_dict[i].class_method({'col2': df.loc[i, 'col2']})
    ...

and so on until 'col17'. Someone reviewed my code and said to 'use the fact that df is a dataframe and not loop through and don't use the expensive .loc() operation'

The only thing I could come up with is:

my_list = ['col1', 'col2', ..., 'col17']
my_dict = {}

for i in list(df.index):
    my_dict[i] = myClass(df.loc[i, 'name'])
    for col in my_list:
        my_dict[i].class_method({col: df.loc[i, col]})

but this is not really vectorizing anything... are there any secret ways around pandas vectorization that I don't know about?

  • When you perform an operation on columns, it is applied to all the rows at once. I'm not sure what you are trying to do, but instead of iterating through the rows with "for i in list(df.index)" you could try iterating only through the columns. Could you give us more details? What is "class_method" doing? Commented Jan 28 at 14:47
  • I agree with @GustavoAlves's comment. Please include an explanation of what you are trying to do and we can possibly provide alternate means to accomplish the task. Commented Jan 28 at 15:07
  • Please provide a minimal reproducible example. Commented Jan 28 at 16:29
  • What you are doing here, creating an instance of myClass from some value in the "name" column and then calling some method of that instance with various other values from the same row, isn't really vectorizable, but you could at least use something like df.itertuples() instead of for i in df.index: ... in combination with .loc. Commented Jan 28 at 16:31

1 Answer

Each .loc call can be expensive because it first has to check whether it was given a single label, a slice, or an iterable of keys. Converting your dataframe to a dict of dicts up front should give faster lookups:

my_list = ['col1', 'col2', ..., 'col17']
my_dict = {}

# df.T.to_dict() maps each index label to a {column: value} dict for that row
for row_key, row in df.T.to_dict().items():
    my_dict[row_key] = myClass(row['name'])
    for col in my_list:
        my_dict[row_key].class_method({col: row[col]})
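
For reference, here is a minimal runnable sketch of the same idea. It rests on a few assumptions: MyClass below is a hypothetical stand-in (the real myClass is not shown in the question), the frame is a toy one with two value columns rather than seventeen, and it uses df.to_dict(orient="index"), which should give the same row-keyed dict as df.T.to_dict() without transposing the frame first.

```python
import pandas as pd


# Hypothetical stand-in for the asker's myClass; the real class is not shown.
class MyClass:
    def __init__(self, name):
        self.name = name
        self.values = {}

    def class_method(self, mapping):
        # Assumed behavior for illustration only: store the passed values.
        self.values.update(mapping)


# Toy data: two value columns instead of the seventeen in the question.
df = pd.DataFrame(
    {"name": ["a", "b"], "col1": [1, 2], "col2": [10, 20]},
    index=["r1", "r2"],
)
value_cols = ["col1", "col2"]

my_dict = {}
# One conversion up front: {index_label: {column: value}}.
# After this, every lookup is a plain dict access rather than a .loc call.
for row_key, row in df.to_dict(orient="index").items():
    obj = MyClass(row["name"])
    for col in value_cols:
        obj.class_method({col: row[col]})
    my_dict[row_key] = obj
```

This is still a Python-level loop over rows, so it is not vectorized; it only replaces the repeated .loc lookups with dict lookups. Whether that is fast enough depends on what class_method does.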