Assume we have a data frame in Python Pandas that looks like this:
import pandas as pd
df = pd.DataFrame({'vals': [1, 2, 3, 4], 'ids': [u'aball', u'bball', u'cnut', u'fball']})
Or, in table form:
ids vals
aball 1
bball 2
cnut 3
fball 4
How do I filter rows which contain the keyword "ball"? For example, the output should be:
ids vals
aball 1
bball 2
fball 4
In [3]: df[df['ids'].str.contains("ball")]
Out[3]:
ids vals
0 aball 1
1 bball 2
3 fball 4
- How would you invert this to find all the rows that did not contain the string? – user4896331 Mar 1 '17 at 9:50
- @user4896331 - df[~df['ids'].str.contains("ball")], ~ negates the condition – Amit Verma Mar 1 '17 at 12:57
- If it was a specific word, to negate, could you also use: df = df[df.id != "ball"] – Brian Apr 26 '17 at 17:59
- @Brian - Yes, in the above df you can try df = df[df.ids != "aball"] to see it in action. – Amit Verma Apr 27 '17 at 2:03
- @Amit: I need to access columns by id instead of name. However, trying str gives me an error [AttributeError: 'DataFrame' object has no attribute 'str']. Does new pandas not support it, or is it because of number-based access? – Sameer Mahajan Oct 23 '17 at 10:01
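To tie the comments above together, here is a minimal sketch, assuming the same example df as in the question. It shows the ~ negation, the difference between a substring match and an exact comparison, and one way to select the column by position. The last part is only a guess at the cause of the AttributeError mentioned above: .str exists on a Series, not on a DataFrame, so a positional selection that returns a DataFrame (for example, one made with a list of positions) would fail that way. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'vals': [1, 2, 3, 4],
                   'ids': ['aball', 'bball', 'cnut', 'fball']})

# Substring match, then its negation with ~
has_ball = df['ids'].str.contains('ball')
no_ball = df[~has_ball]                     # rows whose ids do NOT contain 'ball'

# Exact comparison: only drops rows whose whole value equals 'aball'
not_aball = df[df['ids'] != 'aball']

# Positional access: a single integer with iloc gives a Series, which has .str
ids_pos = df.columns.get_loc('ids')         # look up the position instead of hard-coding it
by_position = df[df.iloc[:, ids_pos].str.contains('ball')]

# A list of positions gives a DataFrame instead, which has no .str accessor
as_frame = df.iloc[:, [ids_pos]]

print(no_ball)
print(not_aball)
print(by_position)
```

Note that != compares whole values, so it is not a substitute for ~ with str.contains when you want to exclude every value that merely contains the word.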
df[df['ids'].str.contains('ball', na = False)] # valid for (at least) pandas version 0.17.1
Step-by-step explanation (from inner to outer):
- df['ids'] selects the ids column of the data frame (technically, the object df['ids'] is of type pandas.Series).
- df['ids'].str allows us to apply vectorized string methods (e.g., lower, contains) to the Series.
- df['ids'].str.contains('ball') checks each element of the Series for whether its value has the string 'ball' as a substring. The result is a Series of Booleans indicating True or False about the existence of a 'ball' substring.
- df[df['ids'].str.contains('ball')] applies the Boolean 'mask' to the data frame and returns a new data frame containing the matching records.
- na=False treats NA / NaN values as non-matches; otherwise, if the column contains missing values, a ValueError may be raised when the mask is applied.
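The effect of na=False is easiest to see on a column that actually has a missing value. Below is a small sketch under that assumption; df_missing is a made-up frame, not the one from the question. How the default (no na argument) behaves depends on the pandas version and the column dtype, so the sketch only relies on the na=False result.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing id
df_missing = pd.DataFrame({'vals': [1, 2, 3],
                           'ids': ['aball', np.nan, 'cnut']})

# Without na=..., the missing entry may stay missing in the mask
# (on older pandas, indexing with such a mask raises a ValueError)
mask_default = df_missing['ids'].str.contains('ball')

# With na=False, the missing entry is treated as "does not contain"
mask_filled = df_missing['ids'].str.contains('ball', na=False)

result = df_missing[mask_filled]
print(mask_filled)
print(result)
```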
- The str.contains answers are probably the fastest and recommended method for your requirements: pandas.pydata.org/pandas-docs/stable/generated/… – EdChum Jan 16 '15 at 9:00