Setting with Copy Warning

One of the most frustrating issues in Pandas, especially for beginners, is the SettingWithCopyWarning. What makes it even more frustrating is that the documentation is not very helpful, and StackOverflow offers a ton of answers that don't help either.

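The answer is actually quite simple. The warning appears when you assign values to a DataFrame that pandas suspects is a copy of a slice of another DataFrame, such as the result of a boolean filter: pandas cannot tell whether you also meant to change the original. Pandas DataFrames are similar to dictionaries in that every row has a label (its index), which helps ensure that every value is placed in the proper row.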

To make this work without triggering the warning, make sure that the object you create with your desired values retains the row index of the DataFrame it is going into.
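
To see why the index matters, here is a small sketch on a made-up DataFrame (the names df, doubled and the column labels are hypothetical, not from the examples below). A Series is matched to rows by index label, while a plain list is matched purely by position:

```python
import pandas as pd

# Hypothetical DataFrame whose index labels are not 0..n-1
df = pd.DataFrame({"Value": [1.0, 2.0, 3.0]}, index=[10, 20, 30])

# A Series carrying the same labels, but listed in a different order
doubled = pd.Series([60.0, 20.0, 40.0], index=[30, 10, 20])

# Assigning a Series aligns on index labels: row 10 gets 20.0, row 30 gets 60.0
df["Doubled"] = doubled

# A plain list has no labels, so it is assigned strictly by position
df["From list"] = [60.0, 20.0, 40.0]

print(df)
```

The "Doubled" column lines up with "Value" even though the Series was built out of order; the list-based column does not.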

Here are two common situations which you might find:

Add a column based on an existing one

A Pandas noob will be inclined to write the following:

price_lb=prices[prices['Data Item'].str.contains('LB')]
price_lb['Value per pound'] = price_lb['Value']

Which triggers the following warning:

:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation:
price_lb['Value per pound'] = price_lb['Value']

How do we solve this?!?!

Well, we need our good friend pd.concat to solve this problem unambiguously:

temp_df = pd.DataFrame(price_lb['Value'])
temp_df.columns = ['Value per pound']
price_lb=pd.concat([price_lb,temp_df], axis=1)

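This is what you should do to ensure there is no potential mismatching of data. Concatenating along axis=1 lines rows up by index label, and we know this works because temp_df.index is identical to price_lb.index. And because pd.concat builds a brand-new DataFrame instead of writing into the filtered slice, pandas has nothing to warn about.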
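
Here is the same approach run end to end on a tiny made-up stand-in for prices (the rows, the "Data Item" strings and the index labels are all hypothetical). For comparison, it also shows the route pandas itself documents for this warning: calling .copy() on the filtered subset so that it is an independent DataFrame, after which ordinary column assignment is fine. Treat this as a sketch, not a benchmark of which approach is better.

```python
import pandas as pd

# Hypothetical stand-in for the article's `prices` DataFrame
prices = pd.DataFrame(
    {
        "Data Item": ["CORN - PRICE, $ / LB", "HAY - PRICE, $ / TON", "RICE - PRICE, $ / LB"],
        "Value": [1.5, 200.0, 0.3],
    },
    index=[101, 205, 307],
)

# Boolean filter: keeps only the rows whose label mentions LB
price_lb = prices[prices["Data Item"].str.contains("LB")]

# The article's approach: a one-column frame sharing price_lb's index,
# joined back on with pd.concat, which aligns rows by index label
temp_df = pd.DataFrame(price_lb["Value"])
temp_df.columns = ["Value per pound"]
price_lb = pd.concat([price_lb, temp_df], axis=1)

# Alternative: take an explicit copy of the subset, then assign as usual
price_lb_copy = prices[prices["Data Item"].str.contains("LB")].copy()
price_lb_copy["Value per pound"] = price_lb_copy["Value"]

print(price_lb)
```

Both versions keep the original labels 101 and 307, so each "Value per pound" sits next to the "Value" it came from.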

Run an operation on a column

A Pandas noob will be inclined to write the following:

price_ton['Value']=[x.replace(',','').replace(' (S)','').replace(" (NA)",'') for x in price_ton['Value']]

This presents a problem, because the comprehension generates a plain list, which carries no index at all. Pandas can only assign it by position, so nothing ties each value to the row label it came from, and the positions most likely don't match the index of the DataFrame!

So we need to run an operation on this column but also retain the index. Series.apply is your friend in this situation: it returns a Series with the same index as the column it was called on, and you can hand it a lambda. Rewrite your code like so:

values=price_ton['Value'].apply(lambda x: x.replace(',','').replace(' (S)','').replace(" (NA)",''))

values is now a Series that keeps the index of the original DataFrame, along with the name 'Value'!
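
A quick way to check this on toy data (the strings and index labels below are hypothetical). It also shows a swapped-in alternative that is not the article's method: pandas' vectorized .str.replace, which performs the same cleanup and likewise keeps the index. Passing regex=False makes pandas treat "(S)" and "(NA)" as literal text rather than regular-expression groups.

```python
import pandas as pd

# Hypothetical stand-in for `price_ton`, with a non-default index
price_ton = pd.DataFrame(
    {"Value": ["1,250", "980 (S)", "2,100 (NA)"]},
    index=[7, 3, 9],
)

# The article's approach: apply a lambda to each string
values = price_ton["Value"].apply(
    lambda x: x.replace(",", "").replace(" (S)", "").replace(" (NA)", "")
)

# apply returns a Series with the original index and the name "Value"
print(values.index.tolist(), values.name)

# Alternative: vectorized string methods, patterns treated as literal text
vectorized = (
    price_ton["Value"]
    .str.replace(",", "", regex=False)
    .str.replace(" (S)", "", regex=False)
    .str.replace(" (NA)", "", regex=False)
)
```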

You can then use pd.concat() to bring this information back into your original DataFrame. Drop the raw column first; otherwise the result ends up with two columns named 'Value'.

price_ton = pd.concat([price_ton.drop(columns='Value'), values], axis=1)

No warnings, and every datapoint is exactly where it belongs.
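
The whole second workflow can be tried on toy data. This is a sketch on a hypothetical price_ton (the rows and index labels are made up); it drops the raw "Value" column before concatenating so that the result has exactly one "Value" column, holding the cleaned strings aligned by index label.

```python
import pandas as pd

# Hypothetical stand-in for `price_ton`, with a non-default index
price_ton = pd.DataFrame(
    {
        "Data Item": ["HAY - PRICE, $ / TON", "STRAW - PRICE, $ / TON"],
        "Value": ["1,250 (S)", "80 (NA)"],
    },
    index=[4, 8],
)

# Clean the strings; apply keeps price_ton's index and the name "Value"
values = price_ton["Value"].apply(
    lambda x: x.replace(",", "").replace(" (S)", "").replace(" (NA)", "")
)

# Drop the raw column, then let pd.concat align the cleaned Series back in
price_ton = pd.concat([price_ton.drop(columns="Value"), values], axis=1)

print(price_ton)
```

Note that the cleaned values are still strings; converting them to numbers (for example with pd.to_numeric) would be a separate step.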

I hope this helps you write better Python code.
