Assume we have a data frame in Python Pandas that looks like this:
df = pd.DataFrame({'vals': [1, 2, 3, 4], 'ids': [u'aball', u'bball', u'cnut', u'fball']})

Or, in table form:
ids vals
aball 1
bball 2
cnut 3
fball 4

How do I filter rows that contain the keyword "ball"? For example, the output should be:
ids vals
aball 1
bball 2
fball 4

In [3]: df[df['ids'].str.contains("ball")]
Out[3]:
ids vals
0 aball 1
1 bball 2
3 fball 4

- How would you invert this to find all the rows that did not contain the string? – user4896331 Mar 1 '17 at 9:50
- @user4896331 - df[~df['ids'].str.contains("ball")]; ~ negates the condition. – Amit Verma Mar 1 '17 at 12:57
- If it was a specific word, to negate, could you also use: df = df[df.id != "ball"] – Brian Apr 26 '17 at 17:59
- @Brian - Yes, in the above df you can try df = df[df.ids != "aball"] to see it in action. – Amit Verma Apr 27 '17 at 2:03
- @Amit: I need to access columns by id instead of name. However, trying str gives me an error [AttributeError: 'DataFrame' object has no attribute 'str']. Does new pandas not support it, or is it because of number-based access? – Sameer Mahajan Oct 23 '17 at 10:01
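
The points raised in the comments above can be tried out with a short sketch. It rebuilds the question's data frame (with the column order fixed explicitly, which is an assumption made here so that positional access is predictable) and contrasts negating a substring match with an exact-value comparison. The last line shows one possible way to select the column by position; the AttributeError mentioned above typically appears when the selection yields a DataFrame rather than a Series, though the commenter's exact code is not shown, so that diagnosis is only a guess.

```python
import pandas as pd

# Same data as in the question; column order is pinned for positional access.
df = pd.DataFrame({'ids': ['aball', 'bball', 'cnut', 'fball'],
                   'vals': [1, 2, 3, 4]}, columns=['ids', 'vals'])

# Substring match, then its negation with ~
has_ball = df['ids'].str.contains('ball')
without_ball = df[~has_ball]                   # rows whose id lacks 'ball'

# != compares whole values, so it only drops exact matches
not_exactly_aball = df[df['ids'] != 'aball']   # drops the 'aball' row only
not_exactly_ball = df[df['ids'] != 'ball']     # drops nothing: no id equals 'ball'

# Positional access: iloc[:, 0] yields a Series, which has the .str accessor
by_position = df[df.iloc[:, 0].str.contains('ball')]
```

Note that != is not a substitute for a negated str.contains: it tests equality of the whole value, not substring membership.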
df[df['ids'].str.contains('ball', na=False)]  # valid for (at least) pandas version 0.17.1

Step-by-step explanation (from inner to outer):
- df['ids'] selects the ids column of the data frame (technically, the object df['ids'] is of type pandas.Series).
- df['ids'].str allows us to apply vectorized string methods (e.g., lower, contains) to the Series.
- df['ids'].str.contains('ball') checks whether each element of the Series has the string 'ball' as a substring. The result is a Series of Booleans, True or False for each element.
- df[df['ids'].str.contains('ball')] applies the Boolean mask to the data frame and returns a new data frame containing only the matching rows.
- na=False treats NA / NaN values as non-matches (False); otherwise the mask may contain missing values, and indexing with it may raise a ValueError.
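
The steps above can be walked through one at a time. The following is a minimal sketch, assuming a variant of the question's data frame with one missing id added (the extra row is hypothetical, included only to show what na=False does).

```python
import numpy as np
import pandas as pd

# Question's data plus one hypothetical row with a missing id
df = pd.DataFrame({'ids': ['aball', 'bball', np.nan, 'cnut', 'fball'],
                   'vals': [1, 2, 3, 4, 5]})

ids = df['ids']                              # step 1: a pandas.Series
# step 2-3: vectorized substring test; without na=..., the missing entry
# may come through as NaN rather than True/False (this varies by pandas version)
mask_raw = ids.str.contains('ball')
# with na=False the missing entry is reported as False
mask = ids.str.contains('ball', na=False)
result = df[mask]                            # step 4: keep rows where mask is True
```

Here result keeps the rows with vals 1, 2 and 5; the row with the missing id is simply excluded rather than causing an error.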
- str.contains answers are probably the fastest and recommended method for your requirements: pandas.pydata.org/pandas-docs/stable/generated/… – EdChum Jan 16 '15 at 9:00